Research Article

Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Table 3

Initial parameter settings for website extraction.

Parameter                                 Value
Number of download threads                15
Detection depth limit                     4
Number of extracted information items     20
Number of web addresses not detected      5000
Number of web addresses detected          211
Data set size                             200 M
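The article does not say how these settings are applied. The sketch below is a minimal illustration only, and it assumes that the detection depth limit bounds a breadth-first crawl over the site's link graph. It stores the Table 3 values in a configuration object and walks an in-memory link graph without downloading anything. The field names, the `ExtractionConfig` class and the `crawl_within_depth` function are hypothetical and are not the authors' implementation. In a real downloader, the thread count would set the size of a worker pool, which is not modelled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionConfig:
    # Hypothetical field names; the values are copied from Table 3.
    download_threads: int = 15
    depth_limit: int = 4
    extracted_items: int = 20
    addresses_detected: int = 211
    addresses_not_detected: int = 5000
    dataset_size: str = "200 M"


def crawl_within_depth(links, start, depth_limit):
    """Breadth-first traversal of an in-memory link graph (assumed model).

    Returns pages in visiting order, never going more than
    depth_limit hops away from the start page.
    """
    depth = {start: 0}          # hop count of every discovered page
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        if depth[page] == depth_limit:
            continue            # do not expand links beyond the depth limit
        for nxt in links.get(page, []):
            if nxt not in depth:  # skip pages already discovered
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return order
```

For example, on a chain of seven pages `p0 -> p1 -> ... -> p6`, calling `crawl_within_depth(links, "p0", ExtractionConfig().depth_limit)` visits only `p0` through `p4`, because pages more than four hops away are never expanded.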