Elasticsearch: mapping a text field for search optimization
I have implemented a text search application that indexes news articles and allows users to search for keywords, phrases, or dates inside these texts.
After considering the options (mainly Solr vs. Elasticsearch), I ended up testing Elasticsearch.
The part I am now stuck on concerns the mapping and the query-construction options best suited to some special cases I have encountered. The current mapping has one field that contains the text and needs to be analyzed in order to be searchable.
The relevant part of the mapping for that field:
"txt": { "type" : "string", "term_vector" : "with_positions_offsets", "analyzer" : "shingle_analyzer" }
where shingle_analyzer is:
"analysis" : { "filter" : { "filter_snow": { "type":"snowball", "language":"romanian" }, "shingle":{ "type":"shingle", "max_shingle_size":4, "min_shingle_size":2, "output_unigrams":"true", "filler_token":"" }, "filter_stop":{ "type":"stop", "stopwords":["_romanian_"] } }, "analyzer" : { "shingle_analyzer" : { "type" : "custom", "tokenizer" : "standard", "filter" : ["lowercase","asciifolding", "filter_stop","filter_snow","shingle"] } }}
My question concerns the following situations:
- When I search for "ing", several results for "ing." are returned.
- When I search for "e!", the analyzer strips the punctuation and there are no results.
- When I search for uppercased common terms used as company names (like "Apple", among multiple others), the lowercase filter produces useless results.
The idea I have is to build different fields with different filters to cover these possible issues, along the lines of the sketch below.
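Something like this is what I have in mind; a sketch only, where the sub-field names (exact, cased) and the two extra analyzers are my own invention:

"txt": {
    "type": "string",
    "term_vector": "with_positions_offsets",
    "analyzer": "shingle_analyzer",
    "fields": {
        "exact": {
            "type": "string",
            "analyzer": "exact_analyzer"
        },
        "cased": {
            "type": "string",
            "analyzer": "cased_analyzer"
        }
    }
}

with two additional custom analyzers in the analysis settings:

"analyzer": {
    "exact_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase", "asciifolding"]
    },
    "cased_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["asciifolding"]
    }
}

The whitespace tokenizer keeps punctuation attached to the token, so "ing." and "e!" would survive as searchable terms in txt.exact, while cased_analyzer omits the lowercase filter so "Apple" would stay distinguishable from "apple" in txt.cased.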
Three questions:
- Is splitting the field into three sub-fields with different analyzers the correct way to go?
- How do I use the different fields when searching? (A sketch of what I currently imagine is below.)
- Could someone explain how scoring would work across these fields?
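For the second question, this is the kind of query I imagine; a sketch using multi_match over the hypothetical sub-fields from the mapping above, with per-field boosts:

{
    "query": {
        "multi_match": {
            "query": "Apple",
            "fields": ["txt", "txt.exact^2", "txt.cased^3"]
        }
    }
}

My understanding is that with the default best_fields behaviour each field is scored independently (its own TF/IDF relevance multiplied by the ^boost) and the highest-scoring field determines the document score, but I would like this confirmed for the multi-field case.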