Research Article
A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification
Table 3
Classification performance of the proposed method on the WHU-RS dataset using different feature extractors and fusion strategies.
| Architecture | Feature size | Training ratio 40% | Training ratio 60% |
| --- | --- | --- | --- |
| Without fusion (CaffeNet(RGB)) | 4096 | 95.79 ± 1.37 | 96.87 ± 0.66 |
| Without fusion (CaffeNet(saliency)) | 4096 | 93.21 ± 1.55 | 95.86 ± 0.50 |
| Without fusion (VGG-Net-16(RGB)) | 4096 | 96.09 ± 0.56 | 96.64 ± 1.08 |
| Without fusion (VGG-Net-16(saliency)) | 4096 | 93.75 ± 0.86 | 95.55 ± 0.89 |
| Without fusion (GoogLeNet(RGB)) | 1024 | 93.77 ± 0.79 | 95.32 ± 1.92 |
| Without fusion (GoogLeNet(saliency)) | 1024 | 91.22 ± 0.78 | 94.10 ± 1.19 |
| Fusion strategy 1 (CaffeNet) | 8192 | 96.78 ± 1.02 | 98.00 ± 0.59 |
| Fusion strategy 2 (CaffeNet) | 4096 | 97.74 ± 0.98 | 98.92 ± 0.52 |
| Fusion strategy 1 (VGG-Net-16) | 8192 | 97.28 ± 0.62 | 97.81 ± 0.87 |
| Fusion strategy 2 (VGG-Net-16) | 4096 | 98.23 ± 0.56 | 98.79 ± 0.99 |
| Fusion strategy 1 (GoogLeNet) | 2048 | 94.78 ± 0.77 | 96.34 ± 1.09 |
| Fusion strategy 2 (GoogLeNet) | 1024 | 95.72 ± 0.87 | 97.29 ± 1.20 |