lucene - Elasticsearch: shingles with stop words elimination

January 15, 2013

I am trying to implement an Elasticsearch mapping to optimize phrase search in a large body of text. As per the suggestions in this article, I am using a shingle filter to build multiple unigrams per phrase.
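For context, the kind of analysis chain I mean looks roughly like the sketch below, written as a Python dict that could be serialized into the index settings. The names `phrase_stop`, `phrase_shingle` and `phrase_analyzer` are made up for illustration, and the sizes are arbitrary. The parameter names are those of the Elasticsearch `stop` and `shingle` token filters as I understand them; `filler_token` in particular may not exist in older versions, so treat this as an assumption and not as the article's exact configuration.

```python
import json

# Hypothetical analysis settings; filter and analyzer names are illustrative.
settings = {
    "analysis": {
        "filter": {
            # Removes English stopwords, leaving position gaps behind.
            "phrase_stop": {"type": "stop", "stopwords": "_english_"},
            # Builds 2- and 3-word shingles; gaps left by removed
            # stopwords show up as the filler token "_".
            "phrase_shingle": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 3,
                "output_unigrams": False,
                "filler_token": "_",
            },
        },
        "analyzer": {
            "phrase_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "phrase_stop", "phrase_shingle"],
            }
        },
    }
}

# Request body that would accompany index creation.
body = json.dumps({"settings": settings}, indent=2)
print(body)
```

The order of the filters matters here: the stop filter runs before the shingle filter, which is why the shingles see the gaps at all.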

Two questions:

In the article mentioned, the stopwords are filtered and the shingles take care of the missing positions by inserting "_" tokens. These tokens should be eliminated from the unigrams indexed by the engine. The point of this elimination is to be able to respond to phrase queries that contain all sorts of "useless" words. The standard solution (as mentioned in the article) is no longer possible, given that Lucene is deprecating the feature (enable_position_increments) needed for this kind of behaviour. How do I solve this kind of issue?
Given the elimination of punctuation, I routinely see unigrams resulting from the shingling process that cover both phrases. From the point of view of search, a result that contains words from 2 separate phrases is not correct. How do I avoid (or mitigate) this kind of issue?
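To make both issues concrete, here is a minimal pure-Python sketch of the behaviour. It is not Lucene's implementation and its output may differ in detail from what the real shingle filter emits. The stopword list, the helper names (`analyze`, `drop_gaps`, `shingles`, `shingles_per_sentence`) and the idea of splitting on sentence punctuation before shingling are all assumptions made for illustration.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "in", "to"}  # illustrative list only
FILLER = "_"

def analyze(text):
    """Lowercase, split on non-word characters, mark removed stopwords as None."""
    words = re.findall(r"\w+", text.lower())
    return [None if w in STOPWORDS else w for w in words]

def drop_gaps(tokens):
    """Emulate ignoring position increments: stopword gaps vanish entirely."""
    return [t for t in tokens if t is not None]

def shingles(tokens, size=2):
    """Join every window of `size` tokens; gaps are rendered as the filler."""
    filled = [FILLER if t is None else t for t in tokens]
    return [" ".join(filled[i:i + size]) for i in range(len(filled) - size + 1)]

def shingles_per_sentence(text, size=2):
    """Shingle each sentence separately so no shingle crosses a boundary."""
    out = []
    for sentence in re.split(r"[.!?]+", text):
        out.extend(shingles(drop_gaps(analyze(sentence)), size))
    return out

text = "The king of Spain arrived. Rain fell in Madrid."

# Question 1: with gaps kept, filler tokens leak into the shingles.
print(shingles(analyze("The king of Spain arrived")))
# With gaps dropped, "king spain" becomes matchable as a phrase.
print(shingles(drop_gaps(analyze("The king of Spain arrived"))))

# Question 2: shingling the whole text yields "arrived rain",
# which spans two sentences; per-sentence shingling does not.
print(shingles(drop_gaps(analyze(text))))
print(shingles_per_sentence(text))
```

The second half only shows the effect I am after. Whether it is better to split sentences before indexing or to handle this inside the analysis chain is exactly what I am asking.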
