Research Article

A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications

Table 1

Main characteristics of the datasets reviewed.

Owner and year | Size | Topic | Data type | Purpose
De Boer et al. 2011 | Small: 60 screenshots | News, hotels, conferences, and celebrities | Images database | Aesthetics and thematic classification with machine learning
Reinecke et al. 2014 | Small: 430 screenshots | Generic | Images database | Aesthetics classification
López et al. 2017 | Small: 280 web pages | Food, animals, fashion, nature, home, and vehicles | URL and images extracted from HTML | Thematic classification with machine learning
López et al. 2019 | Small: 365 web pages | Food, vehicles, animals, fashion, home design, and landscape | URL and images extracted from HTML | Thematic classification with machine learning
CIRCL, 2019 | Small: 460 screenshots | Phishing | Images database | Analysis of security events
ImageNet, 2009 | Large: 1840 screenshots | Generic | Images database | Resource for image and vision research
Nordhoff et al. 2018 | Large: 80901 screenshots | Generic | URL, metrics, and images | Aesthetics and Web design
CIRCL, 2019 | Large: 37500 screenshots | Onion websites (hidden Web, not indexed) | Images database | Analysis of security events
University of Alicante, 2019 | Large: 8950 labeled screenshots | Good and bad design | Labeled images dataset | Aesthetics Web categorization