Research Article

Machine Learning to Assess Relatedness: The Advantage of Using Firm-Level Data

Table 2

Prediction performance of firm-based models. Our problem is characterized by a huge number of true negatives, and this precludes the use of metrics such as accuracy and ROC-AUC. In any case, the random forest outperforms all other models.

 RCARandom forestProduct spaceTaxonomy network

Best F10.0500.1660.1100.126
AUC-PR0.0060.0830.0460.056
ROC-AUC0.5190.9360.9190.928
Precision@10000.1560.3990.2070.265
mP@50.0470.2640.2370.235
MCC0.0520.1690.1130.130
TP2703155841148212706
FP33387100292126310116773
FN69037561566025859034
TN38952640388857353885971738869254