2
votes

I have been successfully able to concat arrays of single words into strings for to_tsquery but phraseto_tsquery in postgres 9.6 only allows one keyword phrase. Does anyone know of a solution to query a tsvector (whether in Sql or full-text-search features) in such a way where I can (OR/AND) a dynamic amount of phrases into a query. The select blocks are all arrays of text.

First tries:

SELECT to_tsvector('english','Try not to become a man of successful companies, but rather try to become a man of value')
   @@ (to_tsquery('english','man & become')
       && phraseto_tsquery('english','man of value')
       && phraseto_tsquery('english','company')
       || phraseto_tsquery('english', 'company | man of value')
   );

Example of the real world problem searching for animals:

-- with statements here of opp_tsv and tp
SELECT
  tp.id,
  tp.keywords, --['giraffes','lions', 'monkeys']
  tp.phrase_keywords, --['pygmy marmocet','African Lion']
  tp.neg_keywords, --['aliens', 'spaceships', 'space']
  tp.neg_phrase_keywords --['Andromedan Alien', 'Nibiru Reptilian']
FROM tp, opp_tsv,
  -- string logic for ts_query
      concat(array_to_string(tp.keywords, ' | ')) AS kws_concat,
      concat(array_to_string(tp.neg_keywords, ' | ')) AS     neg_kws_concat,
      to_tsquery('english', kws_concat) query,
      to_tsquery('english', concat(neg_kws_concat)) neg_query
  -- Case logic for phrase queries

  -- .... -> phrase_query,
      phraseto_tsquery('phrase to search | Need this phrase too')
  -- .... -> phrase_neg_query,

WHERE
  (
    opp_tsv.doc @@ query --pos
    OR
    opp_tsv.doc @@ phrase_query --pos
  )
  AND NOT (
    opp_tsv.doc @@ neg_query --neg
    OR
    opp_tsv.doc @@ phrase_neg_query --neg
  )
ORDER BY rank_cd DESC;

Thoughts: generate dynamically according to array length

opp_tsv.doc @@ (phrase_query || phrase_query2)

or achieve this somehow

opp_tsv.doc @@ phraseto_tsquery('big messy phrase | more messy wordphrases')

EDIT: SELECT phraseto_tsquery('phrase to search | Need this phrase too') result = 'phrase' <-> 'to' <-> 'search' <-> 'need' <-> 'this' <-> 'phrase' <-> 'too' What I am looking for is the result of 'phrase<->to<->search' | 'need<->this<->phrase<->too'

1
You could either a) CREATE AGGRETAGE functions around the tsquery operators && and ||, or b) just simply use the string representation of outputs of these phraseto_tsquery calls, using correct parentheses, e.g. concat('(', array_to_string(tp.phrase_keywords, ') | ('), ')') ... - pozs
... but I'm not sure you understood every aspect of the phraseto_tsquery function: Like plainto_tsquery, the phraseto_tsquery function will not recognize tsquery operators, weight labels, or prefix-match labels in its input, so phraseto_tsquery('phrase to search | Need this phrase too') is equivalent to phraseto_tsquery('phrase to search need this phrase too') - pozs
Sidenote: you could have a single predicate in your WHERE clause: opp_tsv.doc @@ ((query || phrase_query) && !!(neg_query || phrase_neg_query)) -- And, if you extract the construction of this tsquery into a CTE, you'll more likely use an index for that search. - pozs
I've tried this => phraseto_tsquery('phrase to search | Need this phrase too') but this treats the entire input as one phrase. phraseto_query can only take 1 phrase. Thanks for the side note! That can significantly clean up my query. But I still need a programmatic way to add multiple phraseto_query in my query depending on the amount of phrases. @pozs - sizzle

1 Answers

2
votes

You can define your own aggregate over tsquery's or (||) operator:

CREATE AGGREGATE tsquery_or_agg(tsquery) (
  SFUNC = tsquery_or,
  STYPE = tsquery
);

Note: the aggregate above relies on the fact that tsquery's || operator is backed by the tsquery_or(tsquery, tsquery) function. You can check that with:

SELECT *
FROM   pg_operator
WHERE  oprname  = '||'
AND    oprleft  = regtype 'tsquery'
AND    oprright = regtype 'tsquery';

If you don't want to rely on this (undocumented) function's name (even if it's unlikely to be changed), you can create your own function to serve as the base function (SFUNC) for your aggregate:

CREATE FUNCTION my_tsquery_or(tsquery, tsquery)
  RETURNS tsquery
  LANGUAGE sql
  IMMUTABLE
  STRICT
  AS 'SELECT $1 || $2';

After that, your query will be something like:

WITH tp(id, keywords, phrase_keywords, neg_keywords, neg_phrase_keywords ) AS (
  VALUES (42, ARRAY['giraffes', 'lions', 'monkeys']::text[],
              ARRAY['pygmy marmocet', 'African Lion']::text[],
              ARRAY['aliens', 'spaceships', 'space']::text[],
              ARRAY['Andromedan Alien', 'Nibiru Reptilian']::text[])
),
tq(id, query) AS (
  SELECT   tp.id,
           (((SELECT tsquery_or_agg(plainto_tsquery(kw)) FROM unnest(keywords) kw) ||
             (SELECT tsquery_or_agg(phraseto_tsquery(pk)) FROM unnest(phrase_keywords) pk)) &&
             !!((SELECT tsquery_or_agg(plainto_tsquery(nk)) FROM unnest(neg_keywords) nk) ||
                (SELECT tsquery_or_agg(phraseto_tsquery(np)) FROM unnest(neg_phrase_keywords) np)))
  FROM     tp
),
opp_tsv(doc) AS (
  VALUES (to_tsvector('Earth''s African Lions')),
         (to_tsvector('Andromedan Alien''s space monkeys'))
)
SELECT   tp.id,
         tp.keywords,
         tp.phrase_keywords,
         tp.neg_keywords,
         tp.neg_phrase_keywords,
         opp_tsv.doc
FROM     opp_tsv, tp
JOIN     tq USING (id)
WHERE    opp_tsv.doc @@ tq.query
ORDER BY ts_rank_cd(opp_tsv.doc, tq.query) DESC;

Also, if fields in tp can contain phrases like 'big messy phrase | more messy wordphrases', then you didn't split your input properly in the first place. You can split such phrases/keywords with the regexp_split_to_table() function. With that, the tq CTE should look something like:

tq(id, query) AS (
  SELECT   tp.id,
           (((SELECT tsquery_or_agg(plainto_tsquery(kw)) FROM unnest(keywords) kwb, regexp_split_to_table(kwb, '\|') kw) ||
             (SELECT tsquery_or_agg(phraseto_tsquery(pk)) FROM unnest(phrase_keywords) pkb, regexp_split_to_table(pkb, '\|') pk)) &&
             !!((SELECT tsquery_or_agg(plainto_tsquery(nk)) FROM unnest(neg_keywords) nkb, regexp_split_to_table(nkb, '\|') nk) ||
                (SELECT tsquery_or_agg(phraseto_tsquery(np)) FROM unnest(neg_phrase_keywords) npb, regexp_split_to_table(npb, '\|') np)))
  FROM     tp
),