java - Solr: Cannot correctly tokenize query terms
I have the following analyzer setup for both query and index on a particular field type. It should split terms such as "java/cpp" => "java" "cpp", i.e. two tokens, because of the PatternTokenizer I have defined in schema.xml. This is applied correctly at index time but not at query time. When I test the analyzer with the GUI tester in Solr, it seems to work correctly. Here's the start of the analyzer chain:
<analyzer type="query">
  <!-- collapse hyphens around alpha words -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])(-+)([a-zA-Z])" replacement="$1$3"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([0-9])(,)([0-9])" replacement="$1$3"/>
  <!-- else split on hyphens (numbers, etc), plus other splitable chars -->
  <tokenizer class="solr.PatternTokenizerFactory" pattern="([\\*/\),\(\-]|\s)+" />
  ...........
</analyzer>

If I send in a query in which the q (query) parameter is set to java/cpp (no quotes) and the query field (qf) points to a field of this fieldType (title), the debugQuery option shows the following in the response:
<lst name="debug">
  <str name="rawquerystring">java/cpp</str>
  <str name="querystring">java/cpp</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((title:"java cpp")~1.0))/no_coord</str>
  <str name="parsedquery_toString">+(title:"java cpp")~1.0</str>

So it appears to send java/cpp on as a phrase query, despite the omission of quotes in the query I am sending in. It appears to be applying the other analyzer transforms from schema.xml, but it does not seem to be tokenizing the words in the query appropriately. It should be splitting them into two terms because of the PatternTokenizerFactory defined above. What am I doing wrong?
It turns out I had the autoGeneratePhraseQueries attribute of the fieldType node set to true. This generates phrase queries from the terms split by the tokenizer, which is not what I wanted. Setting it to false fixed the issue.
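For anyone hitting the same problem, the change can be sketched roughly as follows. This is only an illustration: the fieldType name text_split is hypothetical (the original schema's name is not shown above), solr.TextField is assumed as the field class, and the elided char filters and token filters are left out. The only part taken from the fix itself is the autoGeneratePhraseQueries attribute.

```xml
<!-- Hypothetical field type name; the fix is autoGeneratePhraseQueries="false" -->
<fieldType name="text_split" class="solr.TextField" autoGeneratePhraseQueries="false">
  <analyzer type="query">
    <!-- char filters from the original chain go here, unchanged -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="([\\*/\),\(\-]|\s)+" />
    <!-- remaining filters from the original chain go here, unchanged -->
  </analyzer>
</fieldType>
```

After changing schema.xml and reloading the core, running the same query with debugQuery enabled should show java and cpp as separate term clauses in parsedquery rather than the single phrase title:"java cpp".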