Research Article

A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification

Table 3

Classification performance of the proposed method on the WHU-RS dataset using different feature extractors and fusion strategies.

| Architecture | Feature size | Training ratio 40% | Training ratio 60% |
|---|---|---|---|
| Without fusion (CaffeNet (RGB)) | 4096 | 95.79 ± 1.37 | 96.87 ± 0.66 |
| Without fusion (CaffeNet (saliency)) | 4096 | 93.21 ± 1.55 | 95.86 ± 0.50 |
| Without fusion (VGG-Net-16 (RGB)) | 4096 | 96.09 ± 0.56 | 96.64 ± 1.08 |
| Without fusion (VGG-Net-16 (saliency)) | 4096 | 93.75 ± 0.86 | 95.55 ± 0.89 |
| Without fusion (GoogLeNet (RGB)) | 1024 | 93.77 ± 0.79 | 95.32 ± 1.92 |
| Without fusion (GoogLeNet (saliency)) | 1024 | 91.22 ± 0.78 | 94.10 ± 1.19 |
| Fusion strategy 1 (CaffeNet) | 8192 | 96.78 ± 1.02 | 98.00 ± 0.59 |
| Fusion strategy 2 (CaffeNet) | 4096 | 97.74 ± 0.98 | 98.92 ± 0.52 |
| Fusion strategy 1 (VGG-Net-16) | 8192 | 97.28 ± 0.62 | 97.81 ± 0.87 |
| Fusion strategy 2 (VGG-Net-16) | 4096 | 98.23 ± 0.56 | 98.79 ± 0.99 |
| Fusion strategy 1 (GoogLeNet) | 2048 | 94.78 ± 0.77 | 96.34 ± 1.09 |
| Fusion strategy 2 (GoogLeNet) | 1024 | 95.72 ± 0.87 | 97.29 ± 1.20 |
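
The feature sizes in Table 3 follow a simple pattern: fusion strategy 1 yields twice the single-stream size, while fusion strategy 2 keeps it unchanged. The sketch below reproduces only that arithmetic. It assumes that strategy 1 concatenates the RGB and saliency stream features and that strategy 2 combines them element-wise (a sum is used here purely for illustration). Neither operator is confirmed by the table itself, and the function names `fuse_concat`, `fuse_sum`, and `fused_sizes` are hypothetical.

```python
# Illustrative sketch only: the two fusion operators are ASSUMPTIONS chosen
# because they reproduce the feature sizes listed in Table 3. They are not
# taken from the table and may differ from the method actually used.

# Single-stream feature sizes as listed in Table 3.
FEATURE_SIZES = {"CaffeNet": 4096, "VGG-Net-16": 4096, "GoogLeNet": 1024}


def fuse_concat(rgb, sal):
    """Assumed strategy 1: concatenate the two streams (size doubles)."""
    return list(rgb) + list(sal)


def fuse_sum(rgb, sal):
    """Assumed strategy 2: element-wise sum (size is unchanged)."""
    if len(rgb) != len(sal):
        raise ValueError("both streams must have the same feature size")
    return [a + b for a, b in zip(rgb, sal)]


def fused_sizes():
    """Return {architecture: (size after concat, size after sum)}."""
    sizes = {}
    for name, dim in FEATURE_SIZES.items():
        rgb = [0.0] * dim  # placeholder RGB-stream feature vector
        sal = [1.0] * dim  # placeholder saliency-stream feature vector
        sizes[name] = (len(fuse_concat(rgb, sal)), len(fuse_sum(rgb, sal)))
    return sizes


if __name__ == "__main__":
    for name, (size_1, size_2) in fused_sizes().items():
        print(f"{name}: strategy 1 -> {size_1}, strategy 2 -> {size_2}")
```

Under these assumptions the computed sizes agree with the "Feature size" column for all three architectures. This shows only that the assumed operators are consistent with the reported dimensions, not that they are the ones used.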