Lucene's WordnetSynonymParser
I am trying to use Lucene's WordnetSynonymParser class to create a synonym filter, but I'm not sure which of the Prolog files I'm meant to be passing to the parse() function.
The documentation says:
See http://wordnet.princeton.edu/man/prologdb.5wn.html for a description of the format.
So I've downloaded the Prolog files, but I'm not sure which ones I should be passing in, or how to go about it.
Could someone please point me in the right direction?
Thanks for any help.
Edit:
Thanks to femtoRgon for pointing me in the direction of wn_s.pl. I have now got the following code:
    Analyzer tempAnalyzer = new SimpleAnalyzer(Version.LUCENE_40);
    WordnetSynonymParser synParser = new WordnetSynonymParser(true, true, tempAnalyzer);
    FileReader docToRead = new FileReader("wn_s.pl");
    synParser.parse(docToRead);
    SynonymMap synMap = synParser.build();

    Analyzer analyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            EnglishAnalyzer engAnalyzer = new EnglishAnalyzer(Version.LUCENE_40);
            CharArraySet engStopSet = engAnalyzer.getDefaultStopSet();
            Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
            TokenStream filter = new SynonymFilter(source, synMap, true);
            filter = new StandardFilter(Version.LUCENE_40, filter);
            filter = new LowerCaseFilter(Version.LUCENE_40, filter);
            filter = new StopFilter(Version.LUCENE_40, filter, engStopSet);
            /*TokenStream filter = new StandardFilter(Version.LUCENE_40, source);
            filter = new LowerCaseFilter(Version.LUCENE_40, filter);
            filter = new StopFilter(Version.LUCENE_40, filter, engStopSet);*/
            return new TokenStreamComponents(source, filter);
        }
    };
which I plan on passing to IndexWriterConfig, but I get the following compile error:
    IndexFilesDB.java:133: cannot find symbol
    symbol  : method parse(java.io.FileReader)
    location: class org.apache.lucene.analysis.synonym.WordnetSynonymParser
    synParser.parse(docToRead);
I still don't understand WordnetSynonymParser. Is the error in the class itself, or is it simply that the file is not being passed in correctly?
Thanks for any help.
wn_s.pl
contains the synset pointers (that is, it defines groups of synonyms), which is what you need for a synonym filter, to my knowledge. I'd start with that.
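To make "groups of synonyms" a bit more concrete, here is a minimal sketch in plain Java (no Lucene involved) that groups words by synset ID. It assumes the line format described on the prologdb page linked in the question, s(synset_id,w_num,'word',ss_type,sense_number,tag_count)., with a quote inside a word written as two single quotes. The class name WnSynsetSketch, the method groupBySynset, and the sample lines are all made up for illustration; this is not how WordnetSynonymParser itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: groups the words found in
// wn_s.pl-style lines by synset ID.
class WnSynsetSketch {

    // Assumed format: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    // Group 1 is the synset ID, group 2 is the (still escaped) word.
    private static final Pattern S_LINE =
        Pattern.compile("^s\\((\\d+),\\d+,'((?:[^']|'')*)',.*\\)\\.$");

    static Map<String, List<String>> groupBySynset(List<String> lines) {
        // LinkedHashMap keeps synsets in the order they first appear.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = S_LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip anything that is not an s(...) fact
            }
            String word = m.group(2).replace("''", "'"); // undo quote doubling
            groups.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative lines in the assumed format, not copied from WordNet.
        List<String> sample = Arrays.asList(
            "s(100000001,1,'car',n,1,0).",
            "s(100000001,2,'auto',n,1,0).",
            "s(100000002,1,'bike',n,1,0).");
        // prints {100000001=[car, auto], 100000002=[bike]}
        System.out.println(groupBySynset(sample));
    }
}
```

Every word that shares a synset ID ends up in the same list, which is the kind of grouping a synonym filter needs; the other Prolog files describe other relations and should not be needed for this.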