7
votes

I am building a site in which I want to implement text search over the title and description of some objects. Since I will have only a small number of objects (~500 documents), I am not considering Haystack and the like.

I only need 2 features:

  • Be able to prioritize matches on the title over the description (with some kind of weight).
  • Allow partial matches of the search phrase. For example, if I search for 'ice cream', also return the results for 'ice' and for 'cream'.

I have looked into django-watson and django-full-text-search but I am not sure if they allow partial matching. Any ideas?

6
What is the underlying database? - Tisho

6 Answers

3
votes

How many hits per second does your site get? How much data does each document store?

If we are talking about 500 docs and a few hits per minute, perhaps the Django ORM is enough:

from django.db.models import Q

# Require every search word to appear in the title or in the description.
q = Q()
for word in search_string.split():
    q &= Q(title__icontains=word) | Q(description__icontains=word)

result = Document.objects.filter(q)

Have you considered this option?

Be careful:

  • This approach doesn't prioritize the title over the description.
  • Only documents that match all of the words appear in the results.
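For a corpus this small, both caveats could be worked around by scoring in Python after fetching the rows. This is only a sketch, not part of the ORM query above: the function name rank_documents, the weights 3 and 1, and the dict-shaped documents are all assumptions made for illustration.

```python
def rank_documents(documents, search_string, title_weight=3, description_weight=1):
    """Return documents matching ANY search word, best match first.

    `documents` is assumed to be an iterable of dicts with 'title' and
    'description' keys (for example from Document.objects.values(...)).
    The default weights are arbitrary illustrative values.
    """
    words = search_string.lower().split()
    scored = []
    for doc in documents:
        title = doc["title"].lower()
        description = doc["description"].lower()
        # Substring test mirrors icontains; a title hit counts more.
        score = sum(
            title_weight * (word in title) + description_weight * (word in description)
            for word in words
        )
        if score > 0:
            scored.append((score, doc))
    # The sort is stable, so equally scored documents keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Searching for 'ice cream' with this function would put a document titled 'Vanilla ice cream' ahead of one that only mentions 'ice' in its description, while still returning both.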
3
votes

As the creator of django-watson, I can confirm that, with some database backends, it allows partial matches. Specifically, on MySQL and PostgreSQL, it allows prefix matching, which is a partial match from the beginning of a word.
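To make the idea of prefix matching concrete, here is a tiny plain-Python illustration of the semantics. It is not how django-watson is implemented (there the matching happens in the database), and prefix_match is a hypothetical helper named only for this sketch.

```python
def prefix_match(term, text):
    """Return True if any word in `text` starts with `term` (case-insensitive)."""
    term = term.lower()
    return any(word.startswith(term) for word in text.lower().split())
```

Under these semantics 'crea' matches 'Ice cream', but 'ream' does not, because the match has to start at the beginning of a word.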

Check out this database comparison page on the wiki:

https://github.com/etianen/django-watson/wiki/Database-support

3
votes

Using the new full-text search in django.contrib.postgres as a starting point, one can extend SearchQuery to create a version that also matches when only the beginning of the final word has been typed:

from psycopg2.extensions import adapt
from django.contrib.postgres.search import SearchQuery


class PrefixedPhraseQuery(SearchQuery):
    """
    Alter the tsquery executed by SearchQuery
    """

    def as_sql(self, compiler, connection):
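        # Join the words with & (AND); the <-> phrase operator, available
        # from Postgres 9.6, could be used instead. The trailing :* turns
        # the final word into a prefix match.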
        value = adapt('%s:*' % ' & '.join(self.value.split()))

        if self.config:
            config_sql, config_params = compiler.compile(self.config)
            template = 'to_tsquery({}::regconfig, {})'\
                .format(config_sql, value)
            params = config_params

        else:
            template = 'to_tsquery({})'\
                .format(value)
            params = []

        if self.invert:
            template = '!!({})'.format(template)

        return template, params

Refer to the Postgres docs for the tsquery syntax.
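To see what string ends up inside to_tsquery, the expression from as_sql can be pulled out into a plain function. This is a sketch for inspection only; build_prefix_tsquery is a hypothetical name, and the empty-input guard is an addition that the class above does not have.

```python
def build_prefix_tsquery(search_string):
    """AND all words together and mark the last one as a prefix match."""
    words = search_string.split()
    if not words:
        # Avoid emitting a bare ':*' when there is nothing to search for.
        return ""
    return "%s:*" % " & ".join(words)
```

For the input 'ice cre' this produces 'ice & cre:*', so 'cre' would match 'cream'. Input that itself contains tsquery operators such as & or ! is passed through unchanged here, so real code would likely need to sanitize it first.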

You can then use PrefixedPhraseQuery in a query like so:

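from django.contrib.postgres.search import SearchVector

# Use the same 'simple' config on both sides so no stemming is applied.
vector = SearchVector(
    'first_name',
    'last_name',
    'email',
    config='simple')
# Here `query` starts out as the raw search string typed by the user.
query = PrefixedPhraseQuery(query, config='simple')
queryset = queryset\
    .annotate(vector=vector)\
    .filter(vector=query)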

You could also write a startswith lookup; refer to the implementation of SearchVectorExact for guidance.

2
votes

Check out this article. It has information about what you are trying to do.

Take a look at Haystack as well. Whoosh seems to be a good option.

0
votes

I have used Apache Solr in my projects; it is very good and well documented. Also check out sunburnt, pysolr and solrpy.

0
votes

Full-text search is now supported by Django: Django Full Text Search.

IMPORTANT: It seems this is only available for the PostgreSQL database backend.

# Example based on the Django docs.
from django.contrib.postgres.search import SearchVector

Entry.objects.annotate(
    search=SearchVector('title', 'description'),
).filter(search='some_text')

You could also use the search lookup (this requires 'django.contrib.postgres' in INSTALLED_APPS):

Entry.objects.filter(title__search='Cheese')
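To prioritize the title over the description with the same module, the search vectors can be given weights and the results ordered with SearchRank. The following is a minimal sketch, assuming a PostgreSQL backend and a model with title and description fields; the function name weighted_search and the rank cutoff of 0 are illustrative choices, not something taken from the answers above.

```python
def weighted_search(queryset, text):
    """Order `queryset` so that title matches outrank description matches.

    Assumes a PostgreSQL backend and a model with 'title' and
    'description' fields.
    """
    # Imported here so the sketch can be loaded without Django being configured.
    from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

    # Weight 'A' counts for more than 'B' in PostgreSQL's default ranking weights.
    vector = SearchVector("title", weight="A") + SearchVector("description", weight="B")
    query = SearchQuery(text)
    return (
        queryset.annotate(rank=SearchRank(vector, query))
        .filter(rank__gt=0)  # drop rows that do not match at all
        .order_by("-rank")
    )
```

It would be called as weighted_search(Entry.objects.all(), 'cheese'). Note that a plain SearchQuery matches whole words only, so this covers the weighting requirement but not partial matches.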