36
votes

We have a new project that needs to index a large amount of data and serve it in real time. I also have complex searches with facets, full text, geospatial queries...

The first prototype indexes into MongoDB and then into Elasticsearch, because I had read that Elasticsearch did not apply a checksum to stored files and that the index could not be fully trusted. But recent versions (since version 1.5) do include a checksum, so I am wondering whether we can use Elasticsearch as the primary data store. And what is the benefit of using MongoDB in addition to Elasticsearch?

I can't find an up-to-date answer about these features in Elasticsearch.

Thanks a lot

4
That heavily depends on your use cases and overall application design. This question is too broad to be answered here in a sensible way. - Markus W Mahlberg
Ok, but is there any contraindication to using only Elasticsearch? Is there functionality provided by MongoDB but not by Elasticsearch? - user1853777
Storing arbitrary data, for example? - Markus W Mahlberg
ElasticSearch is aimed at indexing various data sources. MongoDB is a NoSQL database. While you can use the latter for indexing content, you'll have a hard time storing your data entities in the former. - Markus W Mahlberg
SO seems inconsistent with these types of questions. This question calls for an opinion or a view and hence cannot be verified, only debated. Since people have different backgrounds and biases, the answers will vary. When I ask this type of question I get a minus rank and a warning email. If SO is going to be the forum for FB, Mongo, then it needs to be fairer and more consistent. - Trevor Oakley

4 Answers

48
votes

Here are some arguments for using Mongo instead of, or together with, ES:

  1. User/role management.

    • Built into MongoDB. It may not fit all your needs and may be clumsy in places, but it exists and was implemented a long time ago.
    • The only security option for ES is Shield, but it ships only with a Gold/Platinum subscription for production use.
  2. Schema

    • ES is schemaless, but it is built on top of Lucene and written in Java. The core idea of the tool is to index and search documents, and working this way requires index consistency. At the back end, every document has to fit into a flat Lucene index, so you need some understanding of how ES deals with your nested documents and values, and of how to organize your indexes to balance speed against data completeness/consistency. Working with ES requires you to keep the schema in mind constantly. For example, you can index almost anything into ES without defining a mapping in advance, and ES will "guess" the mapping on the fly, but it sometimes guesses wrong, and an implicit mapping can hurt, because once a field mapping is set it can't be changed without reindexing the whole index. So it is better not to treat ES as a schemaless store, because sooner or later you will step on a rake (and it will be painful :) ), but rather to treat it as schema-intensive, at least when you work with documents that can be sliced into concrete fields.
    • Mongo, on the other hand, can "chew and leave no crumbs" out of almost anything you put into it. Most of your queries will work fine, as long as you remember how Mongo deals with your data from a JavaScript perspective. And since JS is weakly typed, you can have a truly schemaless workflow (if you need one, of course).
  3. Handling non-table-like data.

    • ES has only limited support for holding data without putting it into the search index. That is good enough when you need to store and retrieve some extra data alongside the data you search against.
    • MongoDB supports GridFS. This lets you handle large chunks of data behind the same interface, i.e. you can store binary data in Mongo and retrieve it through the same interface from your code's perspective.
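
To illustrate the schema point in item 2, here is a minimal sketch of what an explicit mapping might look like, written as a plain Python dict. The field names are hypothetical, and the body shape assumes a recent, typeless Elasticsearch version (older releases nested the properties under a type name and used different field type names). Setting `"dynamic": "strict"` asks ES to reject documents with undeclared fields instead of guessing a mapping for them. The small helper only mimics that check locally; it is not part of any ES client.

```python
import json

# Hypothetical index layout: field names and types are illustrative only.
explicit_mapping = {
    "mappings": {
        "dynamic": "strict",  # reject unknown fields instead of guessing a type
        "properties": {
            "title": {"type": "text"},
            "price": {"type": "double"},
            "created_at": {"type": "date"},
            "location": {"type": "geo_point"},
        },
    }
}


def unmapped_fields(document, mapping):
    """Return the top-level document keys that the mapping does not declare."""
    declared = mapping["mappings"]["properties"]
    return sorted(key for key in document if key not in declared)


# "colour" is not declared, so a strict mapping would refuse this document.
doc = {"title": "Espresso machine", "price": 199.90, "colour": "red"}

print(json.dumps(explicit_mapping, indent=2))
print(unmapped_fields(doc, explicit_mapping))
```

The JSON printed here is the kind of body you would send when creating the index, so the mapping is decided up front rather than inferred from the first document that happens to arrive.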
2
votes

Well, choose the right tool for the job. If you need search capabilities such as full-text search, faceting, etc., nothing beats a full-fledged search engine. Whether you pick Elasticsearch (ES) or Solr is a matter of choice.

You can feed (index) documents into ES for searching and then fetch the complete details of a particular entry from MongoDB or any other database.
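
A minimal sketch of that split, assuming the search engine returns only ranked ids and the database holds the full records. The two callables and the in-memory `FAKE_DB` are hypothetical stand-ins for real ES and MongoDB clients, used so the example runs on its own.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]


def search_then_hydrate(
    search_ids: Callable[[str], List[str]],
    fetch_by_ids: Callable[[List[str]], List[Document]],
    query: str,
) -> List[Document]:
    """Ask the search engine for ranked ids, then load full records from the database."""
    ranked_ids = search_ids(query)
    if not ranked_ids:
        return []
    by_id = {doc["_id"]: doc for doc in fetch_by_ids(ranked_ids)}
    # Keep the search engine's relevance order; skip ids missing from the database.
    return [by_id[i] for i in ranked_ids if i in by_id]


# Hypothetical in-memory stand-ins for the two stores.
FAKE_DB = {
    "a1": {"_id": "a1", "title": "Red bike"},
    "b2": {"_id": "b2", "title": "Blue bike"},
    "c3": {"_id": "c3", "title": "Red car"},
}


def fake_search(query: str) -> List[str]:
    # Pretend relevance ranking: matching ids in reverse insertion order.
    hits = [i for i, d in FAKE_DB.items() if query.lower() in d["title"].lower()]
    return hits[::-1]


def fake_fetch(ids: List[str]) -> List[Document]:
    # Databases usually return records in their own order, not the requested one.
    return [d for i, d in FAKE_DB.items() if i in ids]


results = search_then_hydrate(fake_search, fake_fetch, "bike")
print([d["_id"] for d in results])
```

The reordering step matters because a database lookup by a list of ids generally does not preserve the relevance order produced by the search engine.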

To make your task easier, take a look at my open source work, which uses MongoDB, ES, Redis and RabbitMQ all integrated in one place, here on GitHub.

Please note that the application is built in .NET C#.

1
votes

After having used Elasticsearch in production, I can add a few notes to this thread:

  • We secured our Elasticsearch cluster with a reverse proxy that checks the client certificate's authenticity at request time before letting the query in, which shows that there are multiple ways to add authentication anyway. (If you need finer-grained security, such as roles, there are a few plugins that can be added to manage permissions.)
  • Elasticsearch mappings and settings (tuning) are really important concepts to understand fully before going to production, and it's not that easy to grasp quickly how everything works.
  • Clustering and horizontal scaling are very flexible and easy to set up.
  • The tool suite (Kibana, Beats, etc.) is a very convenient way to gather logs, expose key data, etc.
  • Search features are extremely advanced; you can do really amazing things once you have a grasp of how full-text search works (fuzziness, boosting, scoring, stemming, tokenizers, analyzers, and so on).
  • The APIs are a bit scattered and there is often more than one way to achieve something. Some APIs are really WTF to use, like the bulk insert API: you need to send the payload as raw, newline-delimited JSON (ofc don't forget the end-of-line characters), repeating some fields multiple times. This is very verbose and I guess it's legacy code like we all have in our projects ;).
  • One last thing: if you develop a Java project, do not use Hibernate Search to duplicate data from a datasource to your ES cluster. We had so many issues with Hibernate Search that, if we had to do it again, we would do it manually.
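
To make the bulk API remark concrete, here is a small sketch that builds a bulk request body by hand. The bulk endpoint expects newline-delimited JSON: one action line followed by one source line per document, with every line (including the last) terminated by a newline. The index name and documents are hypothetical, and the action metadata assumes a recent, typeless Elasticsearch version (older releases also expected a `_type` entry).

```python
import json


def build_bulk_body(index_name, documents):
    """Build a newline-delimited JSON body for index actions.

    `documents` is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in documents:
        # Action line: repeated for every document, hence the verbosity.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: json.dumps never emits raw newlines, so it stays on one line.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body(
    "products",  # hypothetical index name
    [("1", {"title": "Red bike"}), ("2", {"title": "Blue bike"})],
)
print(body, end="")
```

A body like this would be posted to the `_bulk` endpoint with the `application/x-ndjson` content type; official client libraries ship helpers that hide this format.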

Now about the real question:

To my mind, using only Elasticsearch is sufficient and may reduce the complexity of running multiple NoSQL storage systems.

I think combining two systems is worthwhile when you pair a relational, transactional database with a NoSQL search engine, but having two systems that serve roughly the same purpose is a bit overkill.

0
votes

I recently developed a feature at my company: we wanted to perform searches and rank the results by relevance across multiple factors and conditions.

Our application was already using MongoDB as its database.

So I exported to an Elasticsearch index the MongoDB fields I wanted to search and filter on. Based on the required conditions, I prepared both a Mongo query and an Elasticsearch query and ran the search, then filtered and sorted the results as needed. The whole flow was designed so that, even if ES returns an error, Mongo still fetches the records. If I do get a result from ES, the Mongo result depends on the ES result. This is how I used Mongo and ES in combination.
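
The flow described above could be sketched roughly as follows. This is only an illustration of the fallback idea, not the actual implementation: the three callables, the record layout and the broad `except Exception` are all assumptions, and a real client would catch its own specific error types.

```python
def search_with_fallback(es_search, mongo_find, mongo_find_by_ids, criteria):
    """Use ES for ranking when it works, otherwise answer from MongoDB alone."""
    try:
        ranked_ids = es_search(criteria)
    except Exception:
        # Assumption: any ES failure means we fall back to a plain Mongo query.
        return mongo_find(criteria)
    # ES answered: the Mongo result now depends on the ES result.
    records = {r["_id"]: r for r in mongo_find_by_ids(ranked_ids)}
    return [records[i] for i in ranked_ids if i in records]


# Hypothetical in-memory stand-ins so the sketch runs on its own.
RECORDS = [
    {"_id": "1", "name": "alpha", "active": True},
    {"_id": "2", "name": "beta", "active": True},
    {"_id": "3", "name": "gamma", "active": False},
]


def mongo_find(criteria):
    return [r for r in RECORDS if r["active"] == criteria["active"]]


def mongo_find_by_ids(ids):
    return [r for r in RECORDS if r["_id"] in ids]


def working_es(criteria):
    return ["2", "1"]  # pretend relevance order


def broken_es(criteria):
    raise ConnectionError("search cluster unreachable")


ranked = search_with_fallback(working_es, mongo_find, mongo_find_by_ids, {"active": True})
degraded = search_with_fallback(broken_es, mongo_find, mongo_find_by_ids, {"active": True})
print([r["_id"] for r in ranked])
print([r["_id"] for r in degraded])
```

In the degraded path the records still come back, just without the relevance ranking that ES would have provided.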

Also, don't forget to properly handle all updates, deletes and new record insertions.

And just so you know, the results were really good for me.