2
votes

Task: Implement full text search in multiple tables in PostgreSQL.

For example, a table of projects - projects: There are two fields in which need to search: title and description. Create an index on this table:

ALTER TABLE public.projects ADD COLUMN search_fts tsvector;
UPDATE public.projects SET search_fts =
to_tsvector(
    coalesce(title, '') ||' '||
    coalesce(description, '') || ' '
);
CREATE INDEX in_projects_idx ON public.projects
  USING gin (search_fts);

Now search is simple:

SELECT
    DISTINCT p.id,
    p.title,
    ts_rank(
        p.search_fts,
        to_tsquery('word1 | word2')
    ) as rank
FROM
    projects p
WHERE
    p.search_fts @@ to_tsquery('word1 | word2')
ORDER BY rank DESC;

Sophisticate. Now we need to look at the table and project categories - project_categories. Similar to create tsvector and for the table on the field name. Search query now is:

SELECT
    DISTINCT p.id,
    p.title,
    category.name as categoryName,
    ts_rank(
        (
            coalesce(p.search_fts, '') ||' '||
            coalesce(category.search_fts, '')
        ),
        to_tsquery('word1 | word2 | categoryName')
    ) as rank
FROM
    projects p
LEFT JOIN project_categories category
    ON p.category_id = category.category_id
WHERE
    (
        coalesce(p.search_fts, '') ||' '||
        coalesce(category.search_fts, '')
    ) @@ to_tsquery('word1 | word2 | categoryName')
ORDER BY rank DESC;

More sophisticate. Now the search has to work on dozens of tables related to projects as a one-to-many and many-to-many.

Number join's growing. Combine to a growing number of fields.

Actually the question the right way I'm going? How would you solve this problem?

I also think about the version with view. For example, built on such a request:

(
  SELECT 
      p.id as project_id,
      p.search_fts
  FROM projects p
) UNION ALL (
  SELECT 
      p.id as project_id,
      category.search_fts
  FROM projects p
  JOIN project_categories category
      ON p.category_id = category.category_id
)

Finally, we obtain the overall index. By which to search. But in the view have to combine a dozen requests for various communication options. Interested in your views on this option.

Sorry for my english.

1

1 Answers

3
votes

You have a couple of options:

  • Use triggers on all the input tables to maintain a summary table that combines all the records of interest in text rows. This works OK for simple-ish cases like a customer that has 1 or more phone and one or more address entries, which you can put in the summary search table as |customername|customerid|phone1|phone2|phone3|...|address1|address2|... using simple concatenation and string_agg over joined tables. It scales poorly to deeply nested relationships, though, and can impose some locking and concurrency problems.

  • Use an external search system like Apache Solr that's designed to solve complex search problems.