1 vote

We use Solr 4.8 for our project.

One colleague created two cores in the same instance to index 80 GB of XML documents, all from the same source. He said that a single core can hold a maximum of 50 GB of indexed data, so we split the 80 GB across the two cores. Both cores have the same config files and schema. For indexing, he puts odd-numbered documents in the first core and even-numbered documents in the second core. For search, he uses the SolrJ API to query across all documents from both cores.
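As I understand it, the search side looks roughly like this (a simplified sketch; the host and core names are placeholders, and I'm assuming the standard distributed-search `shards` parameter):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoCoreQuery {
    public static void main(String[] args) throws SolrServerException {
        // Send the query to one core, but fan it out to both cores
        // with the "shards" parameter (placeholder host/core names).
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        SolrQuery query = new SolrQuery("title:example");
        query.set("shards", "localhost:8983/solr/core1,localhost:8983/solr/core2");
        QueryResponse response = server.query(query);
        System.out.println("Found " + response.getResults().getNumFound() + " documents");
    }
}
```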

Since we have only one server, distribution and replication are not used in this project.

My question: is this architecture a correct use case for Solr multiple cores? Does anyone have suggestions?


3 Answers

3 votes

Instead of keeping two separate indexes and manually deciding which documents go to which core, you should set up SolrCloud, which distributes the data among shards automatically. It also allows you to spread your data across multiple machines.

It will also improve performance, make querying much easier, and let you add multiple collections (with different schemas) as well.
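With SolrCloud, the SolrJ client talks to ZooKeeper and the routing happens for you. Something like this should work with the 4.x client (the ZooKeeper address and collection name are just placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexAndQuery {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper (placeholder address); SolrCloud decides
        // which shard each document lands on, so no odd/even logic is needed.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("mycollection");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title", "hello solrcloud");
        server.add(doc);
        server.commit();

        // A query against the collection fans out to all shards automatically.
        QueryResponse rsp = server.query(new SolrQuery("title:hello"));
        System.out.println(rsp.getResults().getNumFound());
        server.shutdown();
    }
}
```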

2 votes

You should be using SolrCloud, with a collection that has two shards. Take a look at https://cwiki.apache.org/confluence/display/solr/SolrCloud
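Creating such a collection goes through the Collections API. One way to call it from SolrJ (the collection name and ZooKeeper address below are placeholders):

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("localhost:2181");

        // Collections API: CREATE a collection with 2 shards.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "CREATE");
        params.set("name", "mycollection");
        params.set("numShards", "2");
        params.set("replicationFactor", "1");
        // Allow both shards to live on the single server.
        params.set("maxShardsPerNode", "2");

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");
        server.request(request);
        server.shutdown();
    }
}
```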

1 vote

Generally, cores are created to separate application data into distinct collection entities. This also comes in handy when migrating core data from a lower Solr version to a higher one. You can have many cores in Solr. For example, if you harvest data from two different sources, X and Y, you would typically store them in two separate cores.
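As an illustration, one core per source just means one endpoint per core on the same instance (a sketch with hypothetical core names "source_x" and "source_y"):

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PerSourceCores {
    public static void main(String[] args) throws Exception {
        // One core per data source on the same Solr instance.
        HttpSolrServer coreX = new HttpSolrServer("http://localhost:8983/solr/source_x");
        HttpSolrServer coreY = new HttpSolrServer("http://localhost:8983/solr/source_y");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "x-1");
        coreX.add(doc);   // documents harvested from source X
        coreX.commit();

        doc = new SolrInputDocument();
        doc.addField("id", "y-1");
        coreY.add(doc);   // documents harvested from source Y
        coreY.commit();
    }
}
```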

In your case, splitting the same collection of data across two cores can make sense given the large data volume, but a single core can generally accommodate a large amount of data on its own. In my opinion, it mostly comes down to your resource capacity (hardware configuration such as RAM and disk).