This is mostly a Design Pattern Question for Elastic Search.
If I wanted to index The Internet with Elastic Search, what would be the most efficient way to organize such a task?
@kimchy talks about different patterns and Rafal Kuc discusses scaling massive clusters, but I didnt get a sense of how to organize an index of the internet after watching these.
I think logically you could organize such an effort by creating a new index for each domain. So you could shard heavily on indexes like Stackoverflow.com but maybe have as little as 1 shard for indexes like momandpopsite.com
Does that look efficient to you ES Community? I'm not sure because we can very quickly get into millions of indexes not to mention their individual shards. And now I'm wondering if there is a lot of overhead associated with this type of design and it becomes bloated. (That is, does this pattern's structure create too much overhead?).
I know this question has to be theoretical because resources are not specified. But if you could use your imagination and try to stick purely to a design strategy -- how would you index the world wide web? Lets say there are 275 million domains. What is the most efficient design pattern for indexing the internet using Elastic Search?