Brief overview of the setup:

5 x SolrCloud (Solr 4.6.1) node instances (separate machines).
The setup is intended to store the last 48 hours of webapp logs, which are fairly high-volume (about 3 MB/sec).

"logs" collection has 5 shards (one per node instance).
One log line is one document in the "logs" collection.


If I keep storing log documents in this "logs" collection, the cores on each shard grow very large, and the CPU graphs show the instances spending more and more time waiting on disk I/O.

So my idea is to create a new collection every 15 minutes, named like "logs-201402051400", with shards spread across the 5 instances. Document writers will start writing to the new collection as soon as it is created. After a while I will end up with a list of collections like this:

...
logs-201402051400
logs-201402051415
logs-201402051430
logs-201402051445
logs-201402051500
...
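
For illustration, the naming scheme above could be sketched as follows. The function names `collection_name` and `live_collections` and the `logs` prefix default are hypothetical; only the 15-minute interval, the 48-hour retention, and the name format come from the description above.

```python
from datetime import datetime, timedelta
from typing import List

INTERVAL_MINUTES = 15   # rotation interval described above
RETENTION_HOURS = 48    # retention window described above


def collection_name(ts: datetime, prefix: str = "logs") -> str:
    """Return the name of the 15-minute window that contains ts."""
    floored = ts.replace(minute=ts.minute - ts.minute % INTERVAL_MINUTES,
                         second=0, microsecond=0)
    return f"{prefix}-{floored:%Y%m%d%H%M}"


def live_collections(now: datetime, prefix: str = "logs") -> List[str]:
    """List the names of all windows inside the retention period, oldest first."""
    count = RETENTION_HOURS * 60 // INTERVAL_MINUTES  # 192 windows
    start = now - timedelta(minutes=INTERVAL_MINUTES * (count - 1))
    return [collection_name(start + timedelta(minutes=INTERVAL_MINUTES * i), prefix)
            for i in range(count)]
```

A writer would call `collection_name(datetime.now())` to pick its target, and anything not in `live_collections(...)` is a candidate for deletion.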

At any given time there will be up to 192 collections (48 hours × 4 per hour), which is about 1000 cores (192 × 5 shards = 960) in the SolrCloud cluster, so it seems that search performance would degrade drastically.

So I would like to merge the collections that are no longer being written to into one large collection (still sharded across the 5 instances). I have found information on how to merge cores, but how can I merge collections?

1 Answer

This might not be a complete answer to your question, but it looks to me like you need to rethink the design of your collection.

This is a classic debate between using a Single Collection with Multiple Shards versus Multiple Collections.

I think you ought to set up a single collection and then use SolrCloud's implicit router, which lets you manage shards yourself, to add new shards for newer 15-minute intervals and delete old shards for older ones.
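
As a rough sketch of what that shard rotation could look like, the snippet below only builds the Collections API URLs for the `CREATESHARD` and `DELETESHARD` actions; it does not send them. It assumes the "logs" collection was created with the implicit router. The base URL, the helper names, and the `logs_201402051400` shard naming are hypothetical, so check the parameters against the documentation for your Solr version.

```python
from urllib.parse import urlencode

# Assumption: address of any node in the cluster; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action: str, **params: str) -> str:
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def create_shard_url(collection: str, shard: str) -> str:
    """URL that adds a named shard to an implicit-router collection."""
    return collections_api_url("CREATESHARD", collection=collection, shard=shard)


def delete_shard_url(collection: str, shard: str) -> str:
    """URL that removes a named shard, e.g. one older than 48 hours."""
    return collections_api_url("DELETESHARD", collection=collection, shard=shard)


if __name__ == "__main__":
    # Hypothetical shard names, one per 15-minute window.
    print(create_shard_url("logs", "logs_201402051500"))
    print(delete_shard_url("logs", "logs_201402031500"))
```

A scheduled job could request the first URL every 15 minutes and the second for whichever shard has just fallen out of the 48-hour window.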

Managing a single collection means that you have a single endpoint, and it saves you the complexity of querying multiple collections.

Take a look at one of the answers to this question, which talks about using the implicit router for dynamic sharding in SolrCloud:

How to add shards dynamically to collection in solr?