java - Labeled Latent Dirichlet Allocation input values

July 15, 2010

I am doing tag prediction and keyword extraction on StackExchange posts. I have ~36,000 posts, each consisting of a title, a body, and tags. I preprocess them to filter out noisy elements. After that, I perform Labeled Latent Dirichlet Allocation (LLDA) with the implementation obtained here.

When looking at the output, the majority of the first half of the topic-keyword assignments are pretty good, for example:

topic 0: hardware
    hardware  0.01417490938078998
    apple     0.007714736647543383
    macbook   0.004179344296774437
    mac       0.003794235182959134

topic 1: mac
    mac       0.09533364420104305
    os        0.02075003721054881
    mini      0.00682593613383348
    macs      0.00435445224274711

topic 2: powerpc
    powerpc   0.010548590021130589
    ppc       0.007893573342376935
    mac       0.0039821054483700795
    ibook     0.003731934198917873
    os        0.003471650527888505

However, the closer I get to the end of the output file, the weirder the topic-keyword assignments become:

topic 976: shopping-recommendation
    difference  7.5409094336777e-5
    intel       7.5409094336777e-5
    ppc         7.5409094336777e-5
    turn        7.5409094336777e-5

topic 977: pci-card
    difference  7.5409094336777e-5
    intel       7.5409094336777e-5
    ppc         7.5409094336777e-5
    turn        7.5409094336777e-5

topic 978: tmux
    difference  7.5409094336777e-5
    intel       7.5409094336777e-5
    ppc         7.5409094336777e-5
    turn        7.5409094336777e-5

topic 979:
    difference  7.5409094336777e-5
    intel       7.5409094336777e-5
    ppc         7.5409094336777e-5
    turn        7.5409094336777e-5
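One possible reading of these identical values, offered as an assumption and not as a confirmed diagnosis: in the standard smoothed estimate used by collapsed Gibbs sampling for LDA, P(word | topic) = (n_word_in_topic + beta) / (n_tokens_in_topic + V * beta). If a topic never had any tokens assigned to it, every word gets exactly 1 / V. The reported value 7.5409094336777e-5 is very close to 1 / 13261, so the sketch below assumes a vocabulary size of 13261; that number is inferred from the output, not stated anywhere in the post, and the class and method names are made up for illustration.

```java
public class EmptyTopicPhi {
    // Standard smoothed estimate of P(word | topic) in collapsed Gibbs LDA:
    // (count of this word in the topic + beta) / (all tokens in the topic + V * beta).
    static double phi(long wordCountInTopic, long tokensInTopic, double beta, int vocabSize) {
        return (wordCountInTopic + beta) / (tokensInTopic + vocabSize * beta);
    }

    public static void main(String[] args) {
        double beta = 0.1;      // option.beta from the question
        int vocabSize = 13261;  // assumption: inferred from 1 / 7.5409094336777e-5

        // A topic with no tokens assigned: every word has the same probability, 1 / V.
        System.out.println(phi(0, 0, beta, vocabSize));

        // The same value no matter which word is asked about, which is why the
        // "top words" of such a topic are arbitrary and repeat across topics.
        System.out.println(1.0 / vocabSize);
    }
}
```

Under that assumption, both lines print roughly 7.54e-5, which would suggest that these labels are never sampled for any token, for example because they occur too rarely in the training posts after filtering.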

Can someone please explain why there are such wrong assignments at the end? And also, why are the values extremely low?

As I said before, I have ~36,000 posts, and these are the values I use to perform LLDA:

option.est = true;
option.alpha = 50/920; // 920 is the number of topics
option.beta = 0.1;
option.niters = 3000;
option.twords = 15;
option.nburnin = 350;
option.samplinglag = 256;
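A side note on the `50/920` expression: if it is written in Java exactly like this, with two integer literals, the division is done in integer arithmetic before the result is stored, so the value is 0 and not roughly 0.054. Whether the real code does this is not known from the post, and the sketch below assumes that `alpha` is a `double` field. The class and method names are hypothetical.

```java
public class AlphaDivision {
    // Integer division: 50 / 920 is evaluated as int (result 0), then widened to 0.0.
    static double integerAlpha(int numTopics) {
        return 50 / numTopics;
    }

    // Floating-point division: one double operand forces a double result.
    static double realAlpha(int numTopics) {
        return 50.0 / numTopics;
    }

    public static void main(String[] args) {
        int numTopics = 920; // number of topics given in the question's comment
        System.out.println(integerAlpha(numTopics)); // 0.0
        System.out.println(realAlpha(numTopics));    // about 0.0543
    }
}
```

Writing `50.0 / 920` (or `50.0 / numTopics`) avoids the truncation.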

I found little to no documentation on these values; by trial and error, I found that these give the best results I have managed to get so far. However, maybe someone with a better understanding can explain them to me and/or suggest which values would be best?
