1
votes

I have scenario where I need to store unstructured data but the rest of my data is structured and relational. An example of the type of unstructured data is as explained below:

User Type 1:

How do you blah blah : 5 fields

User Type 2 :

How do you blah blah : 3 fields

User Type 3 :

How do you blah blah : 7 fields

All 3 types are asked the same question "How do you blah blah" but each user type answers it using different number of fields. And there can be a lot of different user types.

For the relational data, I'm using MySQL but i'm a little confused on how to store this unstructured data:

  1. Serialize as JSON and store in MySQL
  2. Use NoSQL

My requirements are high reads, average updates, average inserts & no deletes. No JOINS needed. I need guaranteed writes & high availability. If I were to choose a NoSQL, it'd be a AP type according to the CAP theorem. I won't be hitting millions of records anytime soon.

I also plan to provide a text search for this data in the future, but it doesn't need to be a real time search, so I can always index the data using Lucene periodically. But of course, document based NoSQL implementations do provide this out of the box. But I have read in a few places where people have advised against storing JSON data in MySQL. But adding a NoSQL layer could be overkill.

What do I do & if you do advise me to go for NoSQL DB, which one should I choose?

EDIT: To clarify, I don't need to query the specific fields from the data that I'm storing. If I need the data, then i'll need the whole data together, not specific fields. I do need full text search, which I can accomplish on MySQL too using Lucene.

2

2 Answers

2
votes

You could probably make it work with MySQL by having a row ID and a single text column, but then you wouldn't be able to query the fields. You could also consider table inheritance, but if you do have lots of types, this is gonna be a mess. The bottom line is that you have a good reason for considering an alternate solution instead of bending relationnal databases.

So from what you're saying, I think it would indeed be a good use case for polyglot persistence. With that said, MySQL + NoSQL will increase the overall complexity of your application, so you need to make sure to abstract both data access layers.

For the database choice, document oriented solutions seem like a good fit when looking at your data (dynamic, isolated aggregates). I'd look into MongoDB or CouchDB, even if the second option seems more appropriate (AP, Master/master, Lucene integration...).

EDIT : See comments.

2
votes

I recently worked on a platform that made heavy use of SQL Server, MySQL and Mongo. The data we stored was spread across these three database systems.

It made me long for only one database technology.

I would advise from experience to just make a text field and store JSON in there. You can't query the field directly, but you can make static fields next to the text field that will be queryable.

It is definitely non-trivial to introduce another system into the mix.

Some reasons for this:

  1. There is a high learning curve for document modeling. You don't normalize, you denormalize data - and doing so is a bit of an art.
  2. Having configured CouchDB and MongoDB clusters, I can tell you it's not a no brainer - especially when you move to production.
  3. It's certainly non-trivial to query across database technologies.

I would only introduce a separate NoSQL solution as a last resort.