
I need to configure Thinking Sphinx with Spanish stemming, and I can't get it to work.

I learned [1] that I needed to compile the Sphinx source code with the libstemmer_c library and install it. Additionally, I had to change the Thinking Sphinx configuration by adding the libstemmer_es stemmer to the morphology setting.

In detail, this is what I did:

  1. Remove the existing Sphinx installation with apt-get

     apt-get remove sphinxsearch
    
  2. Download and unpack the Sphinx source code and the libstemmer_c library, then copy the latter's contents into Sphinx's libstemmer_c directory

     wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
     tar xvf sphinx-2.2.11-release.tar.gz
     wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
     tar xvf libstemmer_c.tgz
     cp -rf libstemmer_c/* sphinx-2.2.11-release/libstemmer_c/
    
  3. Configure, compile and install Sphinx with the libstemmer_c library

    cd sphinx-2.2.11-release
    ./configure --with-mysql-includes=/usr/include/mysql --with-mysql-libs=/usr/lib/x86_64-linux-gnu --with-libstemmer
    make          
    make install
    
  4. Add the libstemmer_es stemmer to morphology in thinking_sphinx.yml

    development:
      mysql41: 3563
      address: <%= ENV['SPHINX_HOST'] || '' %>
      enable_star: true
      charset_type: utf-8
      min_infix_len: 2
      morphology: libstemmer_es
      ...
    
  5. Reconfigure Sphinx and regenerate the indices

    bundle exec rake ts:configure
    bundle exec rake ts:generate
    
  6. Restart the Docker containers and the Rails server
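To double-check step 4 outside Rails, the snippet below reads a thinking_sphinx.yml-style file the way an ERB-templated YAML file is usually read (render the ERB tags first, then parse the YAML) and pulls out the morphology value for one environment. It is a minimal sketch: the helper name sphinx_settings and the config/thinking_sphinx.yml path are assumptions, and it only confirms what the YAML says, not what searchd actually loaded.

```ruby
require "erb"
require "yaml"

# Hypothetical helper: render ERB tags such as <%= ENV['SPHINX_HOST'] || '' %>
# first, then parse the YAML and return the settings for one environment.
def sphinx_settings(yaml_text, environment)
  rendered = ERB.new(yaml_text).result
  YAML.safe_load(rendered).fetch(environment, {})
end

# Trimmed copy of the settings from step 4.
config = <<~YAML
  development:
    mysql41: 3563
    address: <%= ENV['SPHINX_HOST'] || '' %>
    morphology: libstemmer_es
YAML

settings = sphinx_settings(config, "development")
puts settings["morphology"] # prints "libstemmer_es"

# For the real file (path assumed):
#   sphinx_settings(File.read("config/thinking_sphinx.yml"), "development")
```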

I'm working on a website with various products that are indexed with Sphinx. With stemming enabled, searching for "cameras" should return all products containing "cameras" or "camera". Currently, searching for "cameras" only returns products with "cameras" in the string, not products that contain only "camera".
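For context, stemming works by reducing both the indexed words and the query words to a common stem, so "cameras" and "camera" end up as the same token and match each other. The toy code below only illustrates that principle: it strips a trailing "s" and is not the Snowball algorithm that libstemmer_es implements (toy_stem and toy_match? are hypothetical names).

```ruby
# Deliberately naive stemmer, NOT Snowball/libstemmer_es: drop one
# trailing "s" from words longer than three characters.
def toy_stem(word)
  word = word.downcase
  word.length > 3 && word.end_with?("s") ? word.chomp("s") : word
end

# The same normalisation is applied to the query word and to every
# indexed word, so matching happens on stems rather than raw words.
def toy_match?(query_word, document_words)
  stem = toy_stem(query_word)
  document_words.any? { |w| toy_stem(w) == stem }
end

puts toy_match?("cameras", %w[digital camera]) # prints "true"
puts toy_match?("cameras", %w[digital lens])   # prints "false"
```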

I'm using Rails 3.2, thinking-sphinx 3.2 and Sphinx 2.2.11 on Ubuntu 14.04.4 LTS. It may be worth mentioning that I'm using Docker containers: searchd runs in a separate container from the Rails application.

UPDATE 1: I can't run rake ts:regenerate, since searchd runs in a separate Docker container (my sphinx container). Instead, I stop the sphinx container, enter a worker container and run rake ts:clear_rt and rake ts:configure, then restart the sphinx container (which also restarts searchd), enter it, and finally run rake ts:generate.
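The UPDATE 1 sequence can be written down as an ordered list of Docker commands, which makes it easier to repeat exactly the same way each time. This is a sketch under stated assumptions: the container names sphinx and worker and the helper reindex_commands are hypothetical, and the script only prints the commands unless it is invoked with --run.

```ruby
# Hypothetical container names; substitute the ones from your own setup.
SPHINX_CONTAINER = "sphinx"
WORKER_CONTAINER = "worker"

# The UPDATE 1 steps, in order, as shell commands.
def reindex_commands(sphinx: SPHINX_CONTAINER, worker: WORKER_CONTAINER)
  [
    "docker stop #{sphinx}",
    "docker exec #{worker} bundle exec rake ts:clear_rt",
    "docker exec #{worker} bundle exec rake ts:configure",
    "docker start #{sphinx}", # per UPDATE 1, this also (re)starts searchd
    "docker exec #{sphinx} bundle exec rake ts:generate"
  ]
end

# Dry run by default: print each command, execute only with --run.
execute = ARGV.include?("--run")
reindex_commands.each do |cmd|
  puts cmd
  next unless execute
  system(cmd) or abort("failed: #{cmd}")
end
```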

UPDATE 2: The content of log/development.searchd.log is:

[Thu Mar 16 12:24:59.147 2017] [  127] listening on all interfaces, port=3563
[Thu Mar 16 12:24:59.161 2017] [  127] binlog: replaying log .../development/binlog.001
[Thu Mar 16 12:24:59.161 2017] [  127] binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 indexes
[Thu Mar 16 12:24:59.162 2017] [  127] binlog: finished replaying /opt/sharetribe/tmp/binlog/development/binlog.001; 0.0 MB in 0.000 sec
[Thu Mar 16 12:24:59.162 2017] [  127] binlog: finished replaying total 1 in 0.001 sec
[Thu Mar 16 12:24:59.163 2017] [  127] DEBUG: SaveMeta: Done.
[Thu Mar 16 12:24:59.163 2017] [  127] accepting connections
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */
[Thu Mar 16 12:28:50.467 2017] [  128] listening on all interfaces, port=3563
[Thu Mar 16 12:28:50.478 2017] [  128] DEBUG: SaveMeta: Done.
[Thu Mar 16 12:28:50.478 2017] [  128] accepting connections
[Thu Mar 16 12:28:55.503 2017] [  128] DEBUG: ReadLock 0x1522ef8
[Thu Mar 16 12:28:55.503 2017] [  128] DEBUG: Unlock 0x1522ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */
[Thu Mar 16 12:29:09.806 2017] [  128] caught SIGHUP (seamless=1, in queue=1)
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: CheckRotate invoked
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: /opt/sharetribe/db/sphinx/development/custom_field_value_core.new.sph is not readable. Skipping
[Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: /opt/sharetribe/db/sphinx/development/listing_core.new.sph is not readable. Skipping
[Thu Mar 16 12:29:09.806 2017] [  128] WARNING: nothing to rotate after SIGHUP ( in queue=0 )
[Thu Mar 16 12:29:10.541 2017] [  128] DEBUG: ReadLock 0x1522ef8
[Thu Mar 16 12:29:10.541 2017] [  128] DEBUG: Unlock 0x1522ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: ReadLock 0xe42ef8
[Thu Mar 16 12:25:04.175 2017] [  127] DEBUG: Unlock 0xe42ef8
... /* many more ReadLock and Unlock */
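The lines worth a closer look are the ones where searchd skips the .new.sph files as not readable and then reports nothing to rotate after the SIGHUP. A small filter like the one below pulls those out of the many ReadLock/Unlock lines; suspicious_log_lines is a hypothetical helper, and matching on "WARNING:" and "is not readable" is an assumption based only on the lines shown above.

```ruby
# Keep only WARNING lines and "is not readable" skips from a searchd log.
def suspicious_log_lines(log_text)
  log_text.each_line.map(&:chomp).select do |line|
    line.include?("WARNING:") || line.include?("is not readable")
  end
end

sample = <<~LOG
  [Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: CheckRotate invoked
  [Thu Mar 16 12:29:09.806 2017] [  128] DEBUG: /opt/sharetribe/db/sphinx/development/listing_core.new.sph is not readable. Skipping
  [Thu Mar 16 12:29:09.806 2017] [  128] WARNING: nothing to rotate after SIGHUP ( in queue=0 )
  [Thu Mar 16 12:29:10.541 2017] [  128] DEBUG: ReadLock 0x1522ef8
LOG

suspicious_log_lines(sample).each { |line| puts line }

# Against the real log:
#   suspicious_log_lines(File.read("log/development.searchd.log"))
```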

UPDATE 3: I'm defining a real-time index on the product listings, with fields such as title, description, author name, etc.:

ThinkingSphinx::Index.define :listing, :with => :real_time do
  indexes title
  indexes description
  indexes custom_field_values_sphinx
  indexes origin_loc.google_address
  indexes author.given_name
  indexes author.username
  indexes location.province
...

This is the underlying model:

class Listing < ActiveRecord::Base 

  after_save ThinkingSphinx::RealTime.callback_for(:listing)
...

The Listing.search method is called from a public method of the model:

  Listing.search(
      escaped_query,
      :select => "*, #{SPHINX_WEIGHT_FUNCTION} as w",
      :sql => {:include => params[:include]},
      :star => true,
      :with => with,
      :with_all => with_all,
      :order => params[:sort],
      :per_page => per_page,
      :page => page
  )
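Because this call passes :star => true, Thinking Sphinx wraps each query term in asterisks, so "cameras" is sent as "*cameras*". The sketch below approximates that transformation (it is not Thinking Sphinx's own code; starred and wildcard_match? are hypothetical names) and shows that, read as a plain substring pattern, "*cameras*" cannot match the keyword "camera". Whether Sphinx applies the stemmer to wildcard terms is worth checking, and repeating the search with :star => false is a quick way to compare.

```ruby
# Approximation of :star => true: wrap every word of the query in asterisks.
def starred(query)
  query.split.map { |word| "*#{word}*" }.join(" ")
end

# A single "*term*" pattern matches any keyword that contains "term".
def wildcard_match?(pattern, keyword)
  keyword.include?(pattern.delete("*"))
end

q = starred("cameras")
puts q                             # prints "*cameras*"
puts wildcard_match?(q, "cameras") # prints "true"
puts wildcard_match?(q, "camera")  # prints "false"
```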

[1] http://freelancing-gods.com/thinking-sphinx/advanced_config.html#word-stemming--morphology


1 Answer


From what I can see, you've got everything configured correctly.

You may want to run rake ts:regenerate to ensure Sphinx has loaded the new configuration correctly (ts:generate updates the data, but doesn't update the configuration).

If that doesn't change anything (and the morphology setting appears in the generated configuration file), then the problem may not be with TS but with Sphinx itself. I wonder if the Sphinx logs have any clues.
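To check whether the morphology setting made it into the generated configuration file, a short scan can list every index section that lacks it. This is a sketch: it assumes the usual sphinx.conf layout (an "index name" line followed by a braced block of key = value lines) and the default Thinking Sphinx output path config/development.sphinx.conf; indexes_without_morphology is a hypothetical helper, and the other_core index in the sample is made up.

```ruby
# List index sections that have no "morphology = <expected>" line.
# Distributed indexes are skipped, since they hold no data of their own.
def indexes_without_morphology(conf_text, expected = "libstemmer_es")
  missing = []
  current = nil # index section currently being scanned, if any

  conf_text.each_line do |raw|
    line = raw.strip
    if (m = line.match(/\Aindex\s+(\S+)/))
      current = { name: m[1], ok: false, distributed: false }
    elsif current.nil?
      next
    elsif line =~ /\Atype\s*=\s*distributed\b/
      current[:distributed] = true
    elsif line =~ /\Amorphology\s*=\s*#{Regexp.escape(expected)}\b/
      current[:ok] = true
    elsif line == "}"
      missing << current[:name] unless current[:ok] || current[:distributed]
      current = nil
    end
  end
  missing
end

sample = <<~CONF
  index listing_core
  {
    type = rt
    morphology = libstemmer_es
  }
  index other_core
  {
    type = rt
  }
  index listing
  {
    type = distributed
    local = listing_core
  }
CONF

p indexes_without_morphology(sample) # prints ["other_core"]

# Against the generated file (path assumed):
#   indexes_without_morphology(File.read("config/development.sphinx.conf"))
```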