Research Article

Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance

Table 3

Effect of overdispersion on the optimal size of the sample drawn by the RF. Rows are grouped by data type.

Variance-to-mean  Sample    N = 50 (%)                 N = 250 (%)                N = 1250 (%)
relationship      size

Categorical
Linear            0.55      56  46  56  30  49  31     63  71  59  27  37  39     89  89  74  34  39  43
                  0.632     18  30  18  24  17  24     19  12  26  32  25  27     10   9  22  27  25  25
                  0.7       13  11  14  21  16  23     11  12   6  22  16  21      1   2   4  23  21  16
                  0.8       13  13  12  25  18  22      7   5   9  19  22  13      0   0   0  16  15  16
Quadratic         0.55      57  62  57  40  36  37     58  68  62  41  38  37     79  70  86  46  36  43
                  0.632     21  13  21  21  25  30     28  19  23  27  18  25     19  26  10  19  32  26
                  0.7       12  12  10  19  22  13      9   8  12  17  22  20      1   4   0  23  15  17
                  0.8       10  13  12  20  17  20      5   5   3  15  22  18      1   0   4  12  17  14

25% of predictors are quantitative
Linear            0.55     100   0   0   0   0   0    100   0 100 100   0 100    100   0 100   0 100   0
                  0.632      0   0 100 100   0   0      0   0   0   0   0   0      0 100   0 100   0   0
                  0.7        0   0   0   0   0   0      0 100   0   0 100   0      0   0   0   0   0 100
                  0.8        0 100   0   0 100 100      0   0   0   0   0   0      0   0   0   0   0   0
Quadratic         0.55     100   0 100   0   0   0      0   0 100 100   0 100    100   0   0 100 100 100
                  0.632      0   0   0 100   0   0    100   0   0   0 100   0      0   0 100   0   0   0
                  0.7        0 100   0   0 100 100      0 100   0   0   0   0      0   0   0   0   0   0
                  0.8        0   0   0   0   0   0      0   0   0   0   0   0      0 100   0   0   0   0

50% of predictors are quantitative
Linear            0.55     100 100 100 100   0 100      0 100 100   0   0   0    100 100   0   0   0   0
                  0.632      0   0   0   0   0   0      0   0   0   0 100   0      0   0 100 100   0 100
                  0.7        0   0   0   0   0   0    100   0   0 100   0 100      0   0   0   0   0   0
                  0.8        0   0   0   0 100   0      0   0   0   0   0   0      0   0   0   0 100   0
Quadratic         0.55       0 100   0   0   0   0    100 100   0   0   0   0    100 100   0   0   0 100
                  0.632    100   0 100   0 100   0      0   0   0 100   0   0      0   0 100   0   0   0
                  0.7        0   0   0   0   0   0      0   0 100   0   0   0      0   0   0   0 100   0
                  0.8        0   0   0 100   0 100      0   0   0   0 100 100      0   0   0 100   0   0

75% of predictors are quantitative
Linear            0.55       0   0   0   0 100   0      0 100 100   0   0   0    100 100   0 100   0   0
                  0.632    100 100   0 100   0   0      0   0   0 100   0 100      0   0 100   0   0   0
                  0.7        0   0 100   0   0 100      0   0   0   0   0   0      0   0   0   0 100 100
                  0.8        0   0   0   0   0   0    100   0   0   0 100   0      0   0   0   0   0   0
Quadratic         0.55       0   0 100   0   0   0    100   0   0   0   0 100    100   0 100   0   0 100
                  0.632      0 100   0 100   0 100      0 100   0 100   0   0      0   0   0 100 100   0
                  0.7      100   0   0   0 100   0      0   0   0   0 100   0      0 100   0   0   0   0
                  0.8        0   0   0   0   0   0      0   0 100   0   0   0      0   0   0   0   0   0

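Quantitative
Linear            0.55     100   0 100 100   0   0    100 100 100 100   0 100      0 100 100   0 100 100
                  0.632      0 100   0   0   0   0      0   0   0   0 100   0      0   0   0   0   0   0
                  0.7        0   0   0   0 100   0      0   0   0   0   0   0      0   0   0 100   0   0
                  0.8        0   0   0   0   0 100      0   0   0   0   0   0    100   0   0   0   0   0
Quadratic         0.55       0   0 100   0   0   0      0 100   0 100   0   0    100 100 100   0 100 100
                  0.632      0 100   0   0   0   0    100   0   0   0   0   0      0   0   0   0   0   0
                  0.7        0   0   0 100   0 100      0   0 100   0   0   0      0   0   0   0   0   0
                  0.8      100   0   0   0 100   0      0   0   0   0 100 100      0   0   0 100   0   0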
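The "Variance-to-mean relationship" rows distinguish two forms of overdispersion. Assuming the usual parametrisations, a linear relationship means Var(Y) = φμ (as in NB1 or quasi-Poisson models) and a quadratic one means Var(Y) = μ + αμ² (as in the NB2 negative binomial). The Python sketch below is not the article's simulation code. It shows one standard way to generate counts with either relationship: a gamma-Poisson mixture whose gamma shape is set to μ/(φ - 1) in the linear case and to 1/α in the quadratic case. The function names and the single `dispersion` argument (φ or α, depending on the relationship) are hypothetical.

```python
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw: count unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def overdispersed_count(rng: random.Random, mu: float, dispersion: float,
                        relationship: str) -> int:
    """Gamma-Poisson (negative binomial) count with mean mu (hypothetical helper).

    relationship="linear":    Var = dispersion * mu          (needs dispersion > 1)
    relationship="quadratic": Var = mu + dispersion * mu**2  (needs dispersion > 0)
    """
    if relationship == "linear":
        if dispersion <= 1.0:
            raise ValueError("linear overdispersion needs dispersion > 1")
        shape = mu / (dispersion - 1.0)   # mu**2 / shape = (dispersion - 1) * mu
    elif relationship == "quadratic":
        if dispersion <= 0.0:
            raise ValueError("quadratic overdispersion needs dispersion > 0")
        shape = 1.0 / dispersion          # mu**2 / shape = dispersion * mu**2
    else:
        raise ValueError(f"unknown relationship: {relationship!r}")
    # lambda ~ Gamma(shape, scale=mu/shape) has mean mu and variance mu**2 / shape,
    # so Var(Y) = E[lambda] + Var(lambda) = mu + mu**2 / shape.
    lam = rng.gammavariate(shape, mu / shape)
    return poisson_draw(rng, lam)


if __name__ == "__main__":
    rng = random.Random(2024)
    for relationship, dispersion in (("linear", 3.0), ("quadratic", 0.5)):
        for mu in (2.0, 8.0):
            ys = [overdispersed_count(rng, mu, dispersion, relationship)
                  for _ in range(20_000)]
            mean, var = statistics.fmean(ys), statistics.pvariance(ys)
            print(f"{relationship:<9} mu={mu:4.1f} mean={mean:6.2f} "
                  f"var={var:7.2f} var/mean={var / mean:5.2f}")
```

Under the linear form the printed variance-to-mean ratio should stay close to φ whatever the mean, whereas under the quadratic form it grows with the mean as 1 + αμ, which is the contrast the two relationship rows encode.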