Research Article

An Efficient Minimal Text Segmentation Method for URL Domain Names

Table 4

Experimental results.

Case 1

Raw data: stdl.qq.com/stdl/qbfilepush/tushu/qqbrowser/cloudctrl/production/15420175063859.txt/leaves.imtt.x2.sched.dcloudstc.com
Method | Segmentation result | F1-score | FL-score
EMTS | [qq, push, file, book, browser, qq, cloud, production, txt, leaf, com] | 0.714 | 0.701
Jieba | [stdl, qq, com, stdl, qbfilepush, qqbrowser, cloud, ctrl, production, 15420175063859, txt, leaves, imtt, x2, sched, dcloudstc, com] | 0.424 | 0.482
Forward maximum | [std, l, qq, com, std, l, q, b, file, push, tush, u, qq, browser, cloud, ct, r, l, production, 15, 42, 0, 175, 0, 6, 38, 59, txt, leave, s, i, mt, t, x, 2, sc, he, d, dc, loud, s, tc, com] | 0.387 | 0.326
Reverse maximum | [s, t, dl, qq, com, s, t, dl, q, b, file, push, tush, u, qq, browser, cloud, ct, r, l, production, 1, 54, 20, 17, 50, 6, 38, 59, txt, l, eaves, i, m, tt, x, 2, sc, h, ed, d, cl, o, u, ds, tc, com] | 0.353 | 0.296
Porter Stemmer | [stdl, qq, com, stdl, qbfilepush, tushu, qqbrowser, cloudctrl, product, 15420175063859, txt, leav, imtt, x2, sched, dcloudstc, com] | 0.258 | 0.332
Lancaster Stemmer | [stdl, qq, com, stdl, qbfilepush, tushu, qqbrowser, cloudctrl, produc, 15420175063859, txt, leav, imtt, x2, sched, dcloudstc, com] | 0.210 | 0.266
Lemmatizer | [stdl, qq, com, stdl, qbfilepush, tushu, qqbrowser, cloudctrl, production, 15420175063859, txt, leaf, imtt, x2, sched, dcloudstc, com] | 0.323 | 0.401

Case 2

Raw data: pcs-sdk-server.alibaba.com/l?umid=&csid=c01549d12f3c447b18bb557a7766f0bb&acnt=&hosttype=1&amp;log=RegisterHostPath:[C:\Program Files (x86)\AliWangWang\ZhiFu]/na61-na62.wagbridge.alibaba.ali
Method | Segmentation result | F1-score | FL-score
EMTS | [server, alibaba, com, mid, type, host, log, path, host, register, program, file, aliwangwang, payment, bridge, wag, Alibaba] | 0.729 | 0.726
Jieba | [pcs, dk, server, alibaba, com, l, umid, amp, csid, c01549d12f3c447b18bb557a7766f0bb, amp, acnt, amp, host, type, 1, amp, log, register, host, path, C, program, file, x86, ali, wangwang, Zhifu, na61, na62, wagbridge, alibaba, ali] | 0.565 | 0.439
Forward maximum | [pc, s, sd, k, server, alibaba, com, l, u, mid, amp, cs, id, c, 0, 15, 49, d, 12, f, 3, c, 44, 7, b, 18, bb, 55, 7, a, 77, 66, f, 0, bb, amp, ac, nt, amp, host, type, 1, amp, log, egis, te, r, os, t, at, h, ro, gram, il, es, x, 86, li, ang, ang, hi, u, na, 61, na, 62, wag, bridge, alibaba, ali] | 0.275 | 0.229
Reverse maximum | [p, cs, sd, k, server, alibaba, com, l, u, mid, amp, cs, id, c, 0, 15, 49, d, 12, f, 3, c, 4, 47, b, 18, bb, 5, 57, a, 77, 66, f, 0, bb, amp, ac, nt, amp, host, type, 1, amp, log, e, gist, er, os, t, a, th, ro, gram, il, es, x, 86, li, ang, ang, hi, u, na, 61, na, 62, wag, bridge, alibaba, ali] | 0.265 | 0.223
Porter Stemmer | [pc, sdk, server, alibaba, com, l, umid, amp, csid, c01549d12f3c447b18bb557a7766f0bb, amp, acnt, amp, hosttyp, 1, amp, log, registerhostpath, C, program, file, x86, aliwangwang, zhifu, na61, na62, wagbridge, alibaba, ali] | 0.292 | 0.370
Lancaster Stemmer | [pcs, sdk, serv, alibab, com, l, umid, amp, csid, c01549d12f3c447b18bb557a7766f0bb, amp, acnt, amp, hosttyp, 1, amp, log, registerhostpa, c, program, fil, x86, aliwangwang, zhifu, na61, na62, wagbridge, alibab, al] | 0.182 | 0.217
Lemmatizer | [pc, sdk, server, alibaba, com, l, umid, amp, csid, c01549d12f3c447b18bb557a7766f0bb, amp, acnt, amp, hosttype, 1, amp, log, RegisterHostPath, C, Program, Files, x86, AliWangWang, ZhiFu, na61, na62, wagbridge, alibaba, ali] | 0.245 | 0.318

Case 3

Raw data: info.pinyin.sogou.com/aserver push/html/tupian file/601386 20190628151003.html
Method | Segmentation result | F1-score | FL-score
EMTS | [pinyin, sogou, com, server, push, image, file, html] | 0.869 | 0.823
Jieba | [info, pinyin, sogou, com, aserver, push, html, tupian, file, 601386, 20190628151003, html] | 0.653 | 0.605
Forward maximum | [info, pinyin, sogou, com, as, er, v, er, push, html, tupi, an, file, 60, 13, 86, 20, 190, 62, 81, 51, 0, 0, 3, html] | 0.535 | 0.499
Reverse maximum | [info, pinyin, sogou, com, a, server, push, html, tupi, an, file, 60, 13, 86, 20, 190, 6, 28, 15, 100, 3, html] | 0.643 | 0.626
Porter Stemmer | [info, pinyin, sogou, com, aserv, push, html, tupian, file, 601386, 20190628151003, html] | 0.653 | 0.605
Lancaster Stemmer | [info, pinyin, sogou, com, aserv, push, html, tup, fil, 601386, 20190628151003, html] | 0.570 | 0.500
Lemmatizer | [info, pinyin, sogou, com, aserver, push, html, tupian, file, 601386, 20190628151003, html] | 0.653 | 0.605
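The forward and reverse maximum matching baselines in Table 4 split the same string quite differently: in Case 1, "stdl" becomes "std, l" under forward matching but "s, t, dl" under reverse matching, and "leaves" becomes "leave, s" versus "l, eaves". The Python sketch below illustrates the two greedy strategies. It is not the implementation used in the experiments. The vocabulary `TOY_VOCAB` and the function names are hypothetical, and the vocabulary was chosen only so that the sketch reproduces those Case 1 fragments.

```python
"""Forward and reverse maximum matching (FMM / RMM) over a toy dictionary.

Illustrative sketch only: TOY_VOCAB is hypothetical and is not the
dictionary used to produce Table 4.
"""

from typing import List, Set

# Hypothetical vocabulary, chosen so the sketch reproduces the
# "stdl" and "leaves" fragments seen in Case 1 of Table 4.
TOY_VOCAB: Set[str] = {"std", "dl", "file", "push", "leave", "eaves"}


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Scan left to right, taking the longest dictionary word at each position.

    A single character is emitted when no dictionary word starts there.
    """
    max_len = max(1, max_len)  # guard: a zero window would never advance
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it; j == i + 1 is the
        # single-character fallback, so the inner loop always breaks.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def reverse_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Scan right to left, taking the longest dictionary word ending at each position."""
    max_len = max(1, max_len)  # guard: a zero window would never advance
    tokens: List[str] = []
    j = len(text)
    while j > 0:
        # Smallest start index first means longest window first;
        # i == j - 1 is the single-character fallback.
        for i in range(max(0, j - max_len), j):
            piece = text[i:j]
            if piece in vocab or i == j - 1:
                tokens.append(piece)
                j = i
                break
    tokens.reverse()  # tokens were collected from right to left
    return tokens


if __name__ == "__main__":
    max_len = max(len(w) for w in TOY_VOCAB)
    for word in ["stdl", "qbfilepush", "leaves"]:
        print(word,
              forward_max_match(word, TOY_VOCAB, max_len),
              reverse_max_match(word, TOY_VOCAB, max_len))
```

Running the script prints `['std', 'l']` and `['s', 't', 'dl']` for "stdl", and `['leave', 's']` and `['l', 'eaves']` for "leaves", while both strategies agree on `['q', 'b', 'file', 'push']` for "qbfilepush", matching the corresponding fragments in the Case 1 rows. Because both strategies commit greedily to the longest dictionary match and fall back to single characters, strings absent from the dictionary, such as identifiers and digit runs, are broken into the many one- and two-character fragments visible in their rows of Table 4.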