lucene - Elasticsearch: shingles with stop words elimination
I am trying to implement an Elasticsearch mapping to optimize phrase search in a large body of text. Following the suggestions in this article, I am using a shingle filter to build multiple unigrams per phrase.
I have two questions:
In the article mentioned, the stop words are filtered out and the shingle filter fills the resulting gaps by inserting "_" tokens. These tokens should be eliminated from the unigram that the engine indexes. The point of the elimination is to be able to answer phrase queries that contain all sorts of "useless" words. The standard solution (as mentioned in the article) is no longer possible, given that Lucene is deprecating the feature (enable_position_increments) needed for this kind of behaviour. How can I solve this kind of issue?
Given the elimination of punctuation, I routinely see unigrams produced by the shingling process that span two phrases. From the point of view of the search, a result that contains words from 2 separate phrases is not correct. How can I avoid (or mitigate) this kind of issue?
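To make the desired behaviour concrete, here is a minimal Python sketch of the token stream both questions are asking for: stop words are dropped without leaving a "_" placeholder, and no shingle crosses a punctuation boundary. It is only an illustration, not Elasticsearch's analyzer. The function name `shingles`, the stop word list, and the punctuation set are assumptions made up for the example. On the Elasticsearch side, the shingle token filter has a `filler_token` parameter (default "_") that may be worth experimenting with for the first question; the sketch does not model it.

```python
import re

# Illustrative stop word list (an assumption, not the list from the article).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def shingles(text, min_size=2, max_size=3, stop_words=STOP_WORDS):
    """Return word shingles that skip stop words and stay inside one phrase."""
    result = []
    # Split on phrase-ending punctuation first, so a shingle can never
    # combine words from two separate phrases (question 2).
    for phrase in re.split(r"[.!?;:]+", text):
        # Drop stop words entirely, leaving no filler token behind (question 1).
        tokens = [t for t in re.findall(r"\w+", phrase.lower())
                  if t not in stop_words]
        for size in range(min_size, max_size + 1):
            for start in range(len(tokens) - size + 1):
                result.append(" ".join(tokens[start:start + size]))
    return result


if __name__ == "__main__":
    for s in shingles("The king of Spain arrived. Crowds cheered in Madrid."):
        print(s)
```

With this input, "king spain" is produced even though "of" sat between the two words, while "arrived crowds" is never produced because the two words belong to different phrases. Applying the same phrase split to the text before it is indexed is one possible way to mitigate the second issue, at the cost of extra preprocessing.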