Get list of matching lexemes from PostgreSQL full text search? -
the full text search ranking documentation suggests that
you can write own ranking functions and/or combine results additional factors fit specific needs.
but haven't been able find examples of how custom ranking functions can built.
specifically, haven't been able figure out how extract list of lexemes in tsvector match given tsquery… this:
> select ts_matching_lexemes('cat in hat'::tsvector, 'cat'::tsquery); ts_matching_lexems ------------------ 'cat':1
so, how can figure out lexemes in tsvector match given tsquery?
it looks ts_headline
function internally, it's deep in c source , outputs string. can, however, use prepare input string parsing result (this relatively slow compared c functions):
code:
create or replace function ts_matching_lexemes(tsv tsvector, tsq tsquery) returns tsvector $$ proc ( select ts_headline(tsv::text, tsq, 'startsel = <;>, stopsel = <;>') tsh ) , parts ( select unnest(regexp_split_to_array(tsh, '<;>')) p proc ) , parts_enum ( select p, lead(p, 1) on (), row_number() on () parts ) select (string_agg(p || substring(split_part(lead, ' ', 1) 2), ' '))::tsvector parts_enum row_number % 2 = 0 $$ language sql;
e.g.:
select ts_matching_lexemes(to_tsvector('cat in hat'), to_tsquery('cat')) union select ts_matching_lexemes(to_tsvector('cats , bikes in hat'), to_tsquery('cat & bike')) ts_matching_lexemes tsvector ------------------- 'cat':1 'bike':3 'cat':1
notes:
- passing text representation of
tsvector
ts_headline
reduce redundant work - it's approximately 10x slower
ts_headline(text, to_tsquery(...))
, can sped removing ctes - of course, faster solution add functionality directly in c source. should fast
tsvector @@ tsquery
Comments
Post a Comment