0
votes

I have the following manual Sphinx query (via the mySQL client), that is producing proper results, and I would like to call it through Thinking Sphinx from Rails. For the life of me, I am struggling with how to make a 'distinct' query work in Thinking Sphinx.

mysql> select merchant_name, count (distinct part_number) from product_core group by merchant_name;

+-----------------------+-----------------------------------------+
| merchant_name         | count (distinct part_number)            |
+-----------------------+-----------------------------------------+
|            1962041491 |                                       1 |
|            3208850848 |                                       1 |
|            1043652526 |                                   48754 |
|             770188128 |                                       1 |
|             374573991 |                                   34113 |
+-----------------------+-----------------------------------------+

Please note: This mySQL query is agaist Sphinx, NOT mySQL. I use the mySQL client to connect to Sphinx, as: mysql -h 127.0.0.1 -P 9306. This works well for debugging/development. My actual db, is Postgres.

Given this, and to add more context, I am attempting to combine a group_by in thinking Sphinx, with a count('Distinct' ...).

So, this query works:

Product.search group_by: :merchant_name

... and, this query works:

Product.count ('DISTINCT part_number')

... but, this combined query throws an error:

Product.search group_by: :merchant_name, count ('DISTINCT part_number')

SyntaxError: (irb):90: syntax error, unexpected ( arg, expecting keyword_do or '{' or '('
...merchant_name, count ('DISTINCT part_num...

Both merchant_name and part_number are defined as attributes.

Environment:

Sphinx 2.2.10-id64-release (2c212e0)
thinking-sphinx 3.1.4
rails 4.2.4
postgres (PostgreSQL) 9.3.4

I have also tried using Facets, but to no avail:

Product.search group_by: :merchant_name, facets: :part_number
Product.facets :part_number, group_by: :merchant_name

For additional information, and to see if this could be accomplished through a Thinking Sphinx call, here is a basic example. I have one product table (and associated index), that lists both merchants, and their products (I agree, it could be normalized, but its coming in from a data feed, and Sphinx can handle it as is):

+-----------------+-------------------+
| merchant        | product           |
+-----------------+-------------------+
|   Best Buy      |   Android phone   |
|   Best Buy      |   Android phone   |
|   Best Buy      |   Android phone   |
|   Best Buy      |   iPhone          |
|   Amazon        |   Android phone   |
|   Amazon        |   iPhone          |
|   Amazon        |   iPhone          |
|   Amazon        |   iPhone          |
|   Amazon        |   Onkyo Receiver  |
+-----------------+-------------------+

With Thinking Sphinx, I want to: a) group the rows by merchant, and b) create a “distinct” product count for each group.

The above example, should give the following result:

+-----------------+------------------------+
| merchant        | count(DISTINCT product |
+-----------------+------------------------+
|   Best Buy      |   2                    |
|   Amazon        |   3                    |
+-----------------+------------------------+
1
What's the issue with the query? Is it that merchant_name is returning integers? Or beyond that, are the aggregate values not accurate? What version of Sphinx are you using?pat
Pat, my raw query works perfectly against Sphinx via the MySQL client (which proves my indexes and attributes work), I just am not able to write the 'district' portion of the query into a ThinkingSphinx call.Martin Sommer

1 Answers

1
votes

You're not going to be able to run this query through a model's search call, because that's set up to always return instances of a model, whereas what you're wanting is raw results. The following code should do the trick:

ThinkingSphinx::Connection.take do |connection|
  result = connection.execute <<-SQL
  SELECT merchant_name, COUNT(distinct part_number)
  FROM product_core
  GROUP BY merchant_name
  SQL
  result.to_a
end

Or, I think this will work to go through a normal search call:

Product.search(
  select:     "merchant_name, COUNT(distinct part_number) AS count",
  group_by:   :merchant_name,
  middleware: ThinkingSphinx::Middlewares::RAW_ONLY
)