Machine Learning to Assess Relatedness: The Advantage of Using Firm-Level Data

<div>Performance of the random forest as the hyperparameters and the partitions that define each product's input feature vector are varied. Results are optimized by tuning either <i>max depth</i> (circles) or <i>min sample leaf</i> (triangles). Each point is labeled with the adopted partition, and its color represents the resulting number of blocks. The horizontal axis reports training time and the vertical axis a performance indicator. Using more data (larger blocks for each product) yields better performance but requires longer training; smaller blocks give faster but less precise results. In every case, as the zoomed region delimited by red lines in the left panel shows, the random forest outperforms the other models.</div>
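The hyperparameter sweep described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes scikit-learn's RandomForestClassifier, whose max_depth and min_samples_leaf parameters are the natural analogues of the caption's max depth and min sample leaf. It uses synthetic data from make_classification in place of the firm-level feature vectors, and ROC AUC as a stand-in performance indicator. The partition of features into blocks is not modeled, and any numbers it prints are illustrative, not the paper's results.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def sweep(param_name, values, X_train, y_train, X_test, y_test, seed=0):
    """Fit one random forest per value of `param_name`.

    Returns a list of (value, training_seconds, test_auc) tuples, i.e. the
    quantities on the horizontal and vertical axes of the figure.
    """
    results = []
    for value in values:
        model = RandomForestClassifier(
            n_estimators=50, random_state=seed, n_jobs=1, **{param_name: value}
        )
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        # Probability of the positive class, used as the score for ROC AUC
        # (a stand-in for the paper's performance indicator; assumption).
        scores = model.predict_proba(X_test)[:, 1]
        results.append((value, elapsed, roc_auc_score(y_test, scores)))
    return results


# Synthetic stand-in for the per-product feature vectors (assumption).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Tune one hyperparameter at a time, as in the two marker shapes of the figure.
depth_results = sweep("max_depth", [2, 4, 8, None], X_train, y_train, X_test, y_test)
leaf_results = sweep("min_samples_leaf", [1, 5, 20, 50], X_train, y_train, X_test, y_test)

for name, rows in [("max_depth", depth_results), ("min_samples_leaf", leaf_results)]:
    best = max(rows, key=lambda row: row[2])
    print(f"best {name}={best[0]}: {best[1]:.3f}s, AUC={best[2]:.3f}")
```

Each tuple pairs a hyperparameter value with its training time and test score, so plotting time against score for every tuple reproduces the kind of trade-off the figure shows; enlarging the feature vectors would shift points toward longer training times.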

Complexity

Figure 4