Research on Data Analysis and Visualization of Recruitment Positions Based on Text Mining

<table class="algorithm-group"><tr><td><table class="algorithm" id="alg1">
<tr><td> </td><td>LDA modeling using the Gensim library</td></tr>
<tr><td> </td><td>Input: job description text set (<span class="nowrap">texts</span>)</td></tr>
<tr><td> </td><td>Output: topic inference</td></tr>
<tr><td>(1)</td><td> function Gensim(texts)</td></tr>
<tr><td>(2)</td><td>  create part-of-speech table flags and stop-word table stopwords</td></tr>
<tr><td>(3)</td><td>  use the Jieba library to segment and filter</td></tr>
<tr><td>(4)</td><td>  words_ls ← []</td></tr>
<tr><td>(5)</td><td>  for text in texts:</td></tr>
<tr><td>(6)</td><td>   words ← remove_stopwords([w.word for w in jp.cut(text)])</td></tr>
<tr><td>(7)</td><td>   words_ls.append(words)</td></tr>
<tr><td>(8)</td><td>  end for</td></tr>
<tr><td>(9)</td><td>  dictionary ← corpora.Dictionary(words_ls)</td></tr>
<tr><td>(10)</td><td>  corpus ← [dictionary.doc2bow(words) for words in words_ls]</td></tr>
<tr><td>(11)</td><td>  lda ← models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=1)</td></tr>
<tr><td>(12)</td><td>  show the top 30 words in each topic</td></tr>
<tr><td>(13)</td><td>  for topic in lda.print_topics(num_words=30):</td></tr>
<tr><td>(14)</td><td>   print topic</td></tr>
<tr><td>(15)</td><td>  end for</td></tr>
<tr><td>(16)</td><td> end function</td></tr>
</table></td></tr></table>

Advances in Multimedia

Algorithm 1: Research on Data Analysis and Visualization of Recruitment Positions Based on Text Mining
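
Algorithm 1 can be approximated in runnable Python along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names `jieba_tokens`, `preprocess`, and `fit_lda`, the pluggable `tokenize` parameter, and the `seed` argument are hypothetical. Filtering by `flag in flags` is one reading of step (3), which only says "segment and filter". The default `num_topics=1` mirrors step (11). jieba and gensim are imported lazily, so the preprocessing step can be tried without either library installed.

```python
from typing import Callable, Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech flag)


def jieba_tokens(text: str) -> List[Token]:
    """Segment text with jieba.posseg; each item carries .word and .flag."""
    import jieba.posseg as jp  # lazy import: needs `pip install jieba`
    return [(w.word, w.flag) for w in jp.cut(text)]


def preprocess(
    texts: Iterable[str],
    flags: Set[str],
    stopwords: Set[str],
    tokenize: Callable[[str], List[Token]] = jieba_tokens,
) -> List[List[str]]:
    """Steps (2)-(8): segment each text, then keep words whose POS flag is
    allowed and that are not stop words (assumed meaning of "filter")."""
    words_ls = []
    for text in texts:
        words = [
            word
            for word, flag in tokenize(text)
            if flag in flags and word not in stopwords
        ]
        words_ls.append(words)
    return words_ls


def fit_lda(
    words_ls: List[List[str]],
    num_topics: int = 1,
    num_words: int = 30,
    seed: int = 0,
) -> List[Tuple[int, str]]:
    """Steps (9)-(15): build the dictionary and bag-of-words corpus,
    train LDA, and return the top words of each topic."""
    from gensim import corpora, models  # lazy import: needs `pip install gensim`

    dictionary = corpora.Dictionary(words_ls)
    corpus = [dictionary.doc2bow(words) for words in words_ls]
    lda = models.ldamodel.LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        random_state=seed,  # fixed seed so repeated runs are comparable
    )
    # Each entry is (topic_id, "weight*word + weight*word + ...").
    return lda.print_topics(num_words=num_words)
```

With both libraries installed, a call such as `fit_lda(preprocess(texts, flags={"n", "vn", "eng"}, stopwords=my_stopwords))` would return the topic strings that steps (13) to (15) print. The flag set and `my_stopwords` here are placeholders, not values taken from the paper.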