
I'm working with a Rails 3 application to allow people to apply for grants and such. We're using Elasticsearch/Tire as a search engine.

Documents (e.g., grant proposals) are composed of many answers of varying types, such as contact information or essays. In AR (and in relational DBs in general) you can't specify a polymorphic "has_many" relation directly, so instead:

class Document < ActiveRecord::Base
  has_many :answerings
end

class Answering < ActiveRecord::Base
  belongs_to :document
  belongs_to :question
  belongs_to :payload, :polymorphic => true
end

"Payloads" are models for individual answer types: contacts, narratives, multiple choice, and so on. (These models are namespaced under "Answerable.")

class Answerable::Narrative < ActiveRecord::Base
  has_one :answering, :as => :payload
  validates_presence_of :narrative_content
end

class Answerable::Contact < ActiveRecord::Base
  has_one :answering, :as => :payload
  validates_presence_of :fname, :lname, :city, :state, :zip # ...
end
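As background for the workaround above: `belongs_to :payload, :polymorphic => true` stores two columns on `answerings`, `payload_type` (a class name such as `"Answerable::Narrative"`) and `payload_id`, and `answering.payload` is resolved by looking up that class and finding the row with that id. The plain-Ruby sketch below imitates that lookup with in-memory `Struct`s; `TABLES` and this `payload` method are hypothetical stand-ins, not ActiveRecord internals.

```ruby
# In-memory stand-ins for the payload models (hypothetical; not ActiveRecord).
module Answerable
  Narrative = Struct.new(:id, :narrative_content)
  Contact   = Struct.new(:id, :fname, :lname, :city, :state, :zip)
end

# Hypothetical "tables", keyed by payload class, then by id.
TABLES = {
  Answerable::Narrative => { 7 => Answerable::Narrative.new(7, "Back in the day...") },
  Answerable::Contact   => { 3 => Answerable::Contact.new(3, "Jane", "Doe", "Springfield", "IL", "12345") }
}

# Mirrors the two columns a polymorphic belongs_to adds: payload_type and payload_id.
Answering = Struct.new(:id, :document_id, :question_id, :payload_type, :payload_id) do
  # Resolve the payload from (type, id), roughly what `answering.payload` does.
  def payload
    klass = Object.const_get(payload_type) # "Answerable::Narrative" -> the class
    TABLES.fetch(klass).fetch(payload_id)
  end
end

a = Answering.new(123, 24, 10, "Answerable::Narrative", 7)
puts a.payload.class
puts a.payload.narrative_content
```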

Conceptually, an answer is composed of an answering (which functions like a join table and stores metadata common to all answers) and an answerable (which stores the actual content of the answer). This works great for writing data. Search and retrieval, not so much.

I want to use Tire/ES to expose a saner representation of my data for searching and reading. In a normal Tire setup, I'd wind up with (a) an index for answerings and (b) separate indices for narratives, contacts, multiple choices, and so on. Instead, I'd like to store just Documents and Answers, possibly as parent/child. The Answers index would merge data from Answerings (id, question_id, updated_at...) and Answerables (fname, lname, email...). This way, I can search Answers from a single index and filter by type, question_id, document_id, etc. The updates would be triggered from Answering, and each answering would then pull in information from its answerable. I'm using RABL to template my search engine inputs, so that part is easy enough.

Answering.find(123).to_indexed_json  # let's say it's a narrative
=> { id: 123, question_id: 10, document_id: 24, updated_at: ..., updated_by: "...", narrative_content: "Back in the day, when I was a teenager, before I had...", answerable_type: "narrative" }
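The merge described above can be sketched in plain Ruby: take the payload's own attributes, overlay the answering's common metadata, and tag the result with a short type name. `Struct` stands in for the ActiveRecord models, and `indexed_hash` is a hypothetical helper, not a Tire or RABL method; in the real app the RABL template would play this role.

```ruby
require 'json'

Narrative = Struct.new(:id, :narrative_content)
Answering = Struct.new(:id, :question_id, :document_id, :payload)

# Hypothetical helper: flatten an answering plus its payload into one document.
def indexed_hash(answering)
  payload = answering.payload
  # Drop the payload's own id so it cannot clash with the answering's id.
  payload_fields = payload.to_h.reject { |key, _| key == :id }
  payload_fields.merge(
    id:              answering.id,
    question_id:     answering.question_id,
    document_id:     answering.document_id,
    # Narrative -> "narrative"; a namespaced Answerable::Narrative gives the same
    answerable_type: payload.class.name.split("::").last.downcase
  )
end

answering = Answering.new(123, 10, 24, Narrative.new(7, "Back in the day..."))
puts indexed_hash(answering).to_json
```

Because the answering's keys are merged last, its metadata wins if a payload ever defines a field with the same name.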

So, I have a couple of questions.

  1. The goal is to provide a single-query solution for all answers, regardless of underlying (answerable) type. I've never set something like this up before. Does this seem like a sane approach to the problem? Can you foresee wrinkles I can't? Alternatives/suggestions/etc. are welcome.
  2. The tricky part, as I see it, is mapping. My plan is to put explicit mappings in the Answering model for the fields that need indexing options, and just let the default mappings take care of the rest:

    mapping do
      indexes :question_id, :index => :not_analyzed
      indexes :document_id, :index => :not_analyzed
      indexes :narrative_content, :analyzer => :snowball
      indexes :junk_collection_total, :index => :not_analyzed
      indexes :some_other_crazy_field # , :index => ...
      # [...]
    end

    If I don't specify a mapping for some field (say, "fname"), will Tire/ES fall back on dynamic mapping? (Should I explicitly map every field that will be used?)
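On the unmapped-field question: Elasticsearch enables dynamic mapping by default, so a field such as `fname` with no explicit mapping is added automatically the first time a document containing it is indexed, with a type guessed from its value and the default analyzer. The plain-Ruby sketch below only illustrates that split between explicit and dynamic fields; `EXPLICIT`, `guessed_type`, and `mapping_report` are hypothetical, and `guessed_type` is a rough approximation of the 0.x/1.x-era rules (the real ones also include date detection), not Elasticsearch's code. Explicit mappings still matter for fields you filter on exactly or whose type you need to control, because an existing field's mapping cannot be changed without reindexing.

```ruby
require 'json'

# Explicit options, mirroring the `mapping do ... end` block above.
EXPLICIT = {
  question_id:           { index: "not_analyzed" },
  document_id:           { index: "not_analyzed" },
  narrative_content:     { analyzer: "snowball" },
  junk_collection_total: { index: "not_analyzed" }
}

# Very rough approximation of dynamic type guessing for JSON values.
def guessed_type(value)
  case value
  when Integer     then "long"
  when Float       then "double"
  when true, false then "boolean"
  else "string"
  end
end

# Split a document's fields into explicitly mapped vs. left to dynamic mapping.
def mapping_report(doc)
  doc.each_with_object({ explicit: {}, dynamic: {} }) do |(field, value), report|
    if EXPLICIT.key?(field)
      report[:explicit][field] = EXPLICIT[field]
    else
      report[:dynamic][field] = { type: guessed_type(value) }
    end
  end
end

doc = { id: 5, question_id: 11, document_id: 24, fname: "Jane", lname: "Doe", zip: "12345" }
report = mapping_report(doc)
puts JSON.pretty_generate(report)
```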

Thanks in advance. Please let me know if I can be more specific.


1 Answer


Indexing is the right way to go about this. Along with indexing field names, you can index the results of methods.

mapping do
  indexes :payload_details, :as => 'payload_details', :analyzer => 'snowball', :boost => 0
end

def payload_details
  "#{payload.fname} #{payload.lname}" #etc.
end
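Since `payload_details` above calls `payload.fname`, it would raise `NoMethodError` for a narrative or multiple-choice payload. One way around that, sketched here in plain Ruby with `Struct` stand-ins, is to let each payload type decide which text it contributes; `searchable_text` is a hypothetical method name, not part of Tire. In the real models it would be an instance method on each `Answerable::*` class, and `payload_details` would still be indexed with `:as => 'payload_details'`.

```ruby
module Answerable
  Narrative = Struct.new(:narrative_content) do
    # Hypothetical: each payload type decides what text it contributes.
    def searchable_text
      narrative_content.to_s
    end
  end

  Contact = Struct.new(:fname, :lname, :city, :state, :zip) do
    def searchable_text
      [fname, lname, city, state, zip].compact.join(" ")
    end
  end
end

Answering = Struct.new(:payload) do
  # Same role as payload_details in the answer, but safe for every payload type.
  def payload_details
    payload.respond_to?(:searchable_text) ? payload.searchable_text : ""
  end
end

puts Answering.new(Answerable::Contact.new("Jane", "Doe", "Springfield", "IL", nil)).payload_details
puts Answering.new(Answerable::Narrative.new("Back in the day...")).payload_details
```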

The indexed result behaves like a duck type of the model, so if you index all of the values that you reference in your view, the data will be available. If you access an attribute of the indexed item's model that is not indexed, it will grab the instance from ActiveRecord. If you access an attribute of a related model, I am pretty sure you get a reference error, but the dynamic finder may take over.
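The duck-type behaviour described above can be pictured as a wrapper around the stored JSON that answers attribute calls from the indexed hash and falls back to a database load otherwise. This `ResultItem` is a simplified, hypothetical sketch, not how Tire implements its result items; the `loader` lambda stands in for an ActiveRecord `find`.

```ruby
# Hypothetical wrapper: answers attribute calls from the indexed hash first,
# and only loads the full record when a key is missing from the index.
class ResultItem
  def initialize(attributes, loader)
    @attributes = attributes # what was stored in the index
    @loader     = loader     # stands in for e.g. Answering.find(id)
  end

  def method_missing(name, *args, &block)
    return @attributes[name] if @attributes.key?(name)
    record = (@record ||= @loader.call(@attributes[:id])) # load at most once
    record.respond_to?(name) ? record.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

FullRecord = Struct.new(:id, :created_at, :internal_notes)
loads = 0
loader = ->(id) { loads += 1; FullRecord.new(id, "2013-01-01", "not indexed") }

item = ResultItem.new({ id: 123, narrative_content: "Back in the day..." }, loader)
puts item.narrative_content # served from the indexed hash, no load
puts item.internal_notes    # not indexed: triggers a one-time load
puts loads
```

An attribute that neither the index nor the loaded record has still raises `NoMethodError`, which is roughly the error case mentioned above.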