SolrCloud - Multiple Collections or Shards

Question

I currently use an older version of Solr - 4.7.2. It runs in standalone mode - only one solr node with multiple cores. Each core is protected by ldap groups.

I am looking to be able to search against a single core and also now add searching across multiple cores. Since distributed searching is considered legacy, I believe SolrCloud must be the way to go. I have installed the latest version of solr locally.

I have been reading up on this and I am still not sure how to do this.

There are roughly 100 cores right now. All have the same schema.

Do I convert each core to a collection where each collection is still protected by ldap groups? And then can you search across multiple collections?

Or do you set up one collection with multiple cores? Is each core then a shard, and can I still LDAP-protect each shard? Can users then search within a single shard/core or across all of them within the collection?

Then what happens if you search across multiple collections or shards (depending on which of the above scenarios is the way to go) and the user does not have access to a collection or shard? Do you need to know ahead of time where the user can search so there are no errors, or will it bypass the ones you do not have access to?

Thank you for any insight you can provide.

Andrea Andrea · Accepted Answer · 2018-03-10T21:03:08

Well, there are a lot of points here. Let's see how I can help you.

I currently use an older version of Solr - 4.7.2. It runs in standalone mode - only one solr node with multiple cores. Each core is protected by ldap groups.

Ok

I am looking to be able to search against a single core and also now add searching across multiple cores. Since distributed searching is considered legacy, I believe SolrCloud must be the way to go. I have installed the latest version of solr locally. I have been reading up on this and I am still not sure how to do this. There are roughly 100 cores right now. All have the same schema. Do I convert each core to a collection where each collection is still protected by ldap groups? And then can you search across multiple collections?

This is one possible scenario. I'm not sure whether the LDAP auth will still work as you've currently implemented it. Keep in mind that interacting with SolrCloud is different: it involves a third component (ZooKeeper), which is absent both in the standalone scenario and in (manually) distributed search.
Starting from Solr 5 (I may be wrong about the exact version), the /admin endpoint also offers an authentication / authorisation API, and the underlying mechanism, such as LDAP, can be plugged in.

Just one doubt: 100 cores with the same schema means 100 collections with the same schema, and that could require a significant amount of resources to manage what you can think of as 100 distributed Lucene indexes. Assuming that you are currently on a single server (which suggests you don't have a lot of data), why don't you merge everything into a single collection, adding an additional "source" field to discriminate between documents?
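A minimal sketch of that single-collection idea, assuming a hypothetical collection named `documents` and a hypothetical string field `source` that holds the name of the original core. It only builds the request URL with a filter query on `source`; it does not contact Solr, and the host and names are made up for illustration.

```python
from urllib.parse import urlencode


def build_select_params(query, sources=None, rows=10):
    """Build /select parameters, optionally restricted to some sources."""
    params = [("q", query), ("rows", str(rows))]
    if sources:
        # Quote each value so it is matched as a literal term of the
        # hypothetical "source" field (backslashes and quotes escaped).
        quoted = " OR ".join(
            '"{}"'.format(s.replace("\\", "\\\\").replace('"', '\\"'))
            for s in sources
        )
        params.append(("fq", "source:({})".format(quoted)))
    return params


def build_select_url(base_url, collection, params):
    """Join base URL, collection name and encoded parameters."""
    return "{}/{}/select?{}".format(
        base_url.rstrip("/"), collection, urlencode(params)
    )


if __name__ == "__main__":
    params = build_select_params("title:solr", sources=["core1", "core2"])
    print(build_select_url("http://localhost:8983/solr", "documents", params))
```

With this layout, searching "one core" becomes a filter on a single `source` value, and searching "across cores" just lists more values or drops the filter.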

Or do you set up one collection with multiple cores?

See above; it's basically up to you. You can do either.

Is each core then a shard, and can I still LDAP-protect each shard? Can users then search within a single shard/core or across all of them within the collection?

It's not strictly correct to think of core = shard, but considering the step you're taking, yes, I think it can help you understand how things work at the very beginning. However, I would have a look at the reference guide. And yes, the client can search wherever you want, targeting one or more collections.
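As a rough illustration of targeting more than one collection: in SolrCloud, a request sent to one collection can carry a `collection` parameter that lists the collections to search. The sketch below only builds such a URL; the host and collection names are assumptions, and nothing is sent to Solr.

```python
from urllib.parse import urlencode


def multi_collection_url(base_url, collections, query):
    """Build a /select URL that searches all the given collections."""
    if not collections:
        raise ValueError("at least one collection is required")
    params = {
        "q": query,
        # Comma-separated list of the collections to search.
        "collection": ",".join(collections),
    }
    # The request is addressed to the first collection; the "collection"
    # parameter widens the search to the others.
    return "{}/{}/select?{}".format(
        base_url.rstrip("/"), collections[0], urlencode(params)
    )


if __name__ == "__main__":
    print(multi_collection_url(
        "http://localhost:8983/solr", ["core1", "core2"], "solr"
    ))
```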

Then what happens if you search across multiple collections or shards (depending on which of the above scenarios is the way to go) and the user does not have access to a collection or shard? Do you need to know ahead of time where the user can search so there are no errors, or will it bypass the ones you do not have access to?

I think that the auth protection you're currently using is completely external to Solr, so I guess your assumption is right: you should know in advance where a given user can go; otherwise some requests would return a 403 error (or something like that).
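One way to "know in advance" is to work out the permitted collections on the client side before building the query. This is only a sketch under assumptions: the `collection_groups` mapping and the group names are hypothetical, and in practice the user's groups would come from your LDAP lookup rather than a hard-coded list.

```python
def allowed_collections(user_groups, collection_groups, requested=None):
    """Return the collections the user may search, sorted by name.

    user_groups: iterable of group names the user belongs to.
    collection_groups: dict mapping collection name -> groups allowed on it.
    requested: optional iterable limiting the check to some collections.
    """
    user_groups = set(user_groups)
    candidates = collection_groups.keys() if requested is None else requested
    return sorted(
        name
        for name in candidates
        # Keep a collection only if the user shares at least one group with it.
        if user_groups & set(collection_groups.get(name, ()))
    )


if __name__ == "__main__":
    # Hypothetical access table for illustration only.
    acl = {"core1": {"sales"}, "core2": {"hr"}, "core3": {"sales", "hr"}}
    print(allowed_collections(["sales"], acl))
```

Only the names returned here would then be used in the search request, so the user never hits a collection that would be refused.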
