Research Article

An Assessment of Lexical, Network, and Content-Based Features for Detecting Malicious URLs Using Machine Learning and Deep Learning Models

Table 2

Final features in the dataset after feature extraction.

| # | Attribute | Description |
|---|---|---|
| 1 | url_len | Length of the URL of the website |
| 2 | geo_loc | Geographical location where the website is hosted |
| 3 | tld | Top-level domain of the website |
| 4 | who_is | Whether the WHOIS domain information is complete or not |
| 5 | https | Whether the website uses the HTTPS protocol |
| 6 | js_len | Length of the JavaScript code present on the website |
| 7 | js_obf_len | Length of the obfuscated JavaScript code present on the website |
| 8 | count_link | Number of occurrences of the JavaScript link() function in the content |
| 9 | count_eval | Number of occurrences of the JavaScript eval() function in the content |
| 10 | count_exec | Number of occurrences of the JavaScript exec() function in the content |
| 11 | count_unescape | Number of occurrences of the JavaScript unescape() function in the content |
| 12 | count_search | Number of occurrences of the JavaScript search() function in the content |
| 13 | count_find | Number of occurrences of the JavaScript find() function in the content |
| 14 | count_escape | Number of occurrences of the JavaScript escape() function in the content |
| 15 | count_all_functions | Total count of the 7 suspicious functions above in the content |
| 16 | Presence_iframe | Whether an iframe tag is present in the content |
| 17 | count_/ | Count of “/” symbols in the URL |
| 18 | count_dot | Count of “.” symbols in the URL |
| 19 | Count_& | Count of “&” symbols in the URL |
| 20 | Count_@ | Count of “@” symbols in the URL |
| 21 | Count_- | Count of “-” symbols in the URL |
| 22 | count_= | Count of “=” symbols in the URL |
| 23 | Count_? | Count of “?” symbols in the URL |
| 24 | Count_; | Count of “;” symbols in the URL |
| 25 | count_digit | Total number of digits in the URL |
| 26 | count_letter | Total number of alphabetical letters in the URL |
| 27 | presence_ebayisapi | Whether “ebayisapi” is present in the URL |
| 28 | presence_getImage | Whether “getImage” is present in the URL |
| 29 | presence_jpg | Whether “jpg” is present in the URL |
| 30 | presence_log | Whether “log” is present in the URL |
| 31 | count_path_dots | Count of dots in the URL path |
| 32 | path_length | Length of the URL path |
| 33 | count_path_slash | Count of slashes in the URL path |
| 34 | host_length | Length of the hostname in the URL |
| 35 | host_Precense_of_digit | Whether the hostname contains digits |
| 36 | count_symbols | Count of all symbols in the URL |
| 37 | presence_obfuscated_code | Whether obfuscated JavaScript code is present |
| 38 | presence_Window.open() | Whether the window.open() function is present in the content |
| 39 | lines_count | Number of lines in the content |
| 40 | Label | Label indicating whether the website is malicious or benign |
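
To make the definitions in Table 2 more concrete, the following is a minimal sketch of how a subset of the lexical and content-based features might be computed. It is not the authors' extraction code. The function names `lexical_features` and `content_features`, the case-insensitive keyword matching, and the regular expressions used to recognize function calls are all assumptions made for illustration. Features that depend on external lookups or on an unspecified definition (geo_loc, who_is, tld, js_obf_len, count_symbols, presence_obfuscated_code) are left out.

```python
import re
from urllib.parse import urlsplit

# Function names behind the count_* content features in Table 2.
SUSPICIOUS_FUNCTIONS = ("link", "eval", "exec", "unescape", "search", "find", "escape")
# Keywords behind the presence_* URL features (matched case-insensitively: an assumption).
URL_KEYWORDS = ("ebayisapi", "getImage", "jpg", "log")
# Feature names for the single-character counts, as spelled in Table 2.
SYMBOL_FEATURES = {
    "count_/": "/", "count_dot": ".", "Count_&": "&", "Count_@": "@",
    "Count_-": "-", "count_=": "=", "Count_?": "?", "Count_;": ";",
}


def lexical_features(url: str) -> dict:
    """Compute a subset of the URL-based features (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path
    features = {
        "url_len": len(url),
        "https": int(parts.scheme == "https"),
        "count_digit": sum(ch.isdigit() for ch in url),
        "count_letter": sum(ch.isascii() and ch.isalpha() for ch in url),
        "count_path_dots": path.count("."),
        "path_length": len(path),
        "count_path_slash": path.count("/"),
        "host_length": len(host),
        "host_Precense_of_digit": int(any(ch.isdigit() for ch in host)),
    }
    for name, symbol in SYMBOL_FEATURES.items():
        features[name] = url.count(symbol)
    for keyword in URL_KEYWORDS:
        features[f"presence_{keyword}"] = int(keyword.lower() in url.lower())
    return features


def content_features(content: str) -> dict:
    """Compute a subset of the content-based features (hypothetical helper)."""
    features = {}
    for name in SUSPICIOUS_FUNCTIONS:
        # The leading \b keeps "escape(" from also matching inside "unescape(".
        pattern = re.compile(r"\b" + name + r"\s*\(")
        features[f"count_{name}"] = len(pattern.findall(content))
    # At this point the dict holds only the 7 per-function counts.
    features["count_all_functions"] = sum(features.values())
    features["Presence_iframe"] = int(
        re.search(r"<iframe\b", content, re.IGNORECASE) is not None
    )
    features["presence_Window.open()"] = int(
        re.search(r"\bwindow\.open\s*\(", content, re.IGNORECASE) is not None
    )
    features["lines_count"] = len(content.splitlines())
    return features
```

A call such as `lexical_features("https://example123.com/a/b.jpg?x=1&y=2")` returns a dictionary keyed by the attribute names in Table 2, which can be merged with the output of `content_features` to form one row of the dataset. The simple regular expressions here would also count function names inside comments or string literals; a real pipeline may need a JavaScript parser to avoid that.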