Journal of Ophthalmology

Review Article

Accuracy of Deep Learning Algorithms for the Diagnosis of Retinopathy of Prematurity by Fundus Images: A Systematic Review and Meta-Analysis

Table 1

Characteristics of nine studies for the systematic review and meta-analysis.

General characteristics, dataset characteristics, and definition and grade of ROP

| Author | Year, data source | Camera | Reference standard | Dataset | Identification and grade |
| --- | --- | --- | --- | --- | --- |
| Brown et al. [24] | 2018, i-ROP | RetCam | RSD, images and clinical diagnosis | 5511 images | 4535 N, 805 pre and 172 plus |
| Wang et al. [25] | 2018, hospital and web | RetCam 3 | ICROP, CRYO-ROP, and ETROP | 3722 cases | 2823 N and 899 ROP; 382 Min and 295 S |
| Hu et al. [26] | 2019, hospital | RetCam 3 | Consistent label | 2668 images | 1484 N and 1184 ROP; 382 Mil and 295 S |
| Tan et al. [27] | 2019, ART-ROP | RetCam | Images and clinical diagnosis | 6974 images | 5336 N and 1638 plus |
| Wang et al. [28] | 2019, hospital | | Consistent label | 11000 images | 7559 N and 3441 ROP; 529 Mil and 1204 S |
| Zhang et al. [29] | 2019, hospital | RetCam 2/3 | The same criteria | 19543 images | 11298 N and 8245 ROP |
| Huang et al. [30] | 2020, hospital | RetCam | ICROP + consistent label | 18808 images | 1222 N and 1129 ROP; 1189 Mil and 1174 S |
| Ramachandran et al. [31] | 2021, KIDROP | RetCam 3 | Consistent label | 289 infants | 200 N and 89 plus |
| Wang et al. [32] | 2021, hospital | RetCam 2/3 | Consistent label | 52249 images | 6363 any stage and 42177 N; 885 pre or plus and 17223 N |

DL model characteristics

| Author | Neural network | Algorithm evaluation | Classification |
| --- | --- | --- | --- |
| Brown et al. [24] | CNN: U-Net and Inception V1 | 5-fold cross-validation | N/pre and plus; plus/N and pre |
| Wang et al. [25] | DNN: Id-Net and Gr-Net | | N/ROP; Min/S |
| Hu et al. [26] | CNN: a pretrained ImageNet (VGG16, Inception V2, and ResNet-50) | Select the best module and image size | N/ROP; Mil/S |
| Tan et al. [27] | CNN: Inception V3 | | N/plus |
| Wang et al. [28] | CNN: a pretrained ImageNet (Inception V2, Inception V3, and ResNet-50) | Select the best module | N/ROP; Mil/S |
| Zhang et al. [29] | DNN: AlexNet, VGG16, and GoogLeNet | Select the best module | N/ROP |
| Huang et al. [30] | DNN: VGG16, VGG19, MobileNet, InceptionV3, and DenseNet | Select the best module and then 5-fold cross-validation | N/ROP; Mil/S |
| Ramachandran et al. [31] | CNN: a pretrained ImageNet (Darknet-53 network) | Select the best module | N/plus |
| Wang et al. [32] | CNN: ResNet18, DenseNet121, and EfficientNetB2 | Five independent classifiers validation | Preplus plus/non; any stage/non |

Accuracy values

Author

Negative vs. positive

ACC

AUC

TED

ACC

AUC

Brown et al. [24]

N vs. pre and plus

80%

20%

0.94

100 (from the same set with TD)

0.91

0.93

0.94

N and pre vs. plus

80%

20%

0.98

0.94

Wang et al. [25]

N vs. ROP

2226

298

0.9664

0.9933

0.9949

944 (from web)

0.8491

0.9690

Min vs. S

2004

104

0.8846

0.9231

0.9508

106 (from web)

0.933

0.736

Hu et al. [26]

N vs. ROP

2068

300

0.97

0.96

0.98

0.9922

406 (from the same set with TD)

0.900

0.989

Mil vs. S

466

100

0.84

0.82

0.86

0.9212

31 (from ROP in TED)

0.944

0.923

Tan et al. [27]

N vs. plus

5579

1395

0.973

0.966

0.98

0.993

90 (external set)

0.856

0.939

0.807

Wang et al. [28]

N vs. ROP

8507

1228

0.927

0.8999

1265 (from TD)

Mil vs. S

1175

269

0.785

0.9235

289 (from ROP in TED)

Zhang et al. [29]

N vs. ROP

17801

1742

0.988

0.935

0.995

0.998

1742 (from the same set with TD)

0.988

0.935

0.995

0.998

Huang et al. [30]

N vs. ROP

2351

368 cases

Average 0.911

Average 0.992

101 (from the same set with TD)

0.96

0.966

0.952

0.97

Mil vs. S

2363

339 cases

Average 0.987

Average 0.985

85 (from ROP in TED)

0.988

0.984

0.99

Ramachandran et al. [31]

N vs. plus

About 80%

About 20%

0.99

0.98

0.9947

1610 (from the same set with TD)

0.98

Wang et al. [32]

Non vs. any stage

36235

4813

0.972

0.984

0.9977

7492 (from the same set with TD)

0.982

0.985

0.9981

Non vs. preplus and plus

13524

1866

0.909

0.984

0.9882

2718 (from the same set with TD)

0.918

0.97

0.9827

ROP, retinopathy of prematurity. Reference standard based on images: RSD, reference standard diagnosis; ICROP, International Classification of ROP. Reference standard based on both images and clinical information: CRYO-ROP, Cryotherapy for Retinopathy of Prematurity; ETROP, Early Treatment for ROP. N, normal; pre, preplus disease; plus, plus disease; Min, minor; Mil, mild; S, severe; i-ROP, Imaging and Informatics in Retinopathy of Prematurity; ART-ROP, Auckland Regional Telemedicine ROP image library; KIDROP, Karnataka Internet Assisted Diagnosis of ROP program; DL, deep learning; CNN, convolutional neural network; DNN, deep neural network; DCNN, deep convolutional neural network; TD, training dataset; VD, validation dataset; TED, test dataset (the total dataset comprises TD, VD, and TED); ACC, accuracy; SN, sensitivity; SP, specificity; AUC, area under the receiver operating characteristic curve; NR, not reported.
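For readers less familiar with the abbreviations ACC, SN, and SP, the following minimal Python sketch shows how these three measures are conventionally derived from the four cells of a 2 x 2 confusion matrix. The class name `ConfusionCounts`, the function names, and the counts in `example` are hypothetical and chosen only for illustration; they are not taken from any of the nine included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2 x 2 table for a binary classifier (hypothetical container)."""
    tp: int  # diseased images correctly flagged as positive
    fp: int  # normal images incorrectly flagged as positive
    tn: int  # normal images correctly flagged as negative
    fn: int  # diseased images incorrectly flagged as negative


def accuracy(c: ConfusionCounts) -> float:
    """ACC: share of all images classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """SN: share of diseased images that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """SP: share of normal images that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


# Illustrative counts only (not data from Table 1).
example = ConfusionCounts(tp=90, fp=20, tn=180, fn=10)

if __name__ == "__main__":
    print(f"ACC={accuracy(example):.3f}")
    print(f"SN={sensitivity(example):.3f}")
    print(f"SP={specificity(example):.3f}")
```

AUC is not computed here because it requires the model's continuous output scores rather than a single set of thresholded counts.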