
For an item called "some page", Sitecore automatically makes the URL "/some-page" but the page can also be reached by "/some page". The URLs are different, but point to the same Sitecore item.

Content authors can use both URLs in links on other pages, and for the current question, let's assume we can't change their behavior.

If both links are used, Lucene will add both to the search index, i.e. the same page is indexed twice. Both have the same "_id" value, so they are recognized as being the same item.

How can we make sure that Lucene does NOT add duplicate entries? How can we configure it to never store duplicate entries for the same "_id" value?


1 Answer


Sitecore does not look at URLs when indexing items, regardless of whether the index is Lucene or Solr.

Sitecore checks the item's ID, language, version, and database, and uses that combination to uniquely identify the Lucene document.

If you open your index with a tool such as Luke, you will see a _uniqueid field in every document, with a value like sitecore://web/{d376c64b-866d-4725-8606-d0462b6ef28a}?lang=en&ver=1.
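To make the structure of that value concrete, here is a minimal Python sketch that splits a _uniqueid string into the four parts described above. It is only an illustration of the format, not Sitecore code: the names UniqueId, parse_unique_id and build_index, and the "url" key on the sample documents, are made up for this example.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, urlsplit


class UniqueId(NamedTuple):
    """The four parts that together identify one indexed document."""
    database: str
    item_id: str
    language: str
    version: int


def parse_unique_id(value: str) -> UniqueId:
    """Split a value like sitecore://web/{guid}?lang=en&ver=1 into its parts."""
    parts = urlsplit(value)
    query = parse_qs(parts.query)
    return UniqueId(
        database=parts.netloc,
        # Drop the leading slash and the braces around the GUID.
        item_id=parts.path.lstrip("/").strip("{}").lower(),
        language=query["lang"][0],
        version=int(query["ver"][0]),
    )


def build_index(docs):
    """Toy index keyed by the parsed unique ID (hypothetical helper).

    A later document with the same key replaces the earlier one, so the
    URL a page was reached by never creates a second entry.
    """
    index = {}
    for doc in docs:
        index[parse_unique_id(doc["_uniqueid"])] = doc
    return index


if __name__ == "__main__":
    uid = "sitecore://web/{d376c64b-866d-4725-8606-d0462b6ef28a}?lang=en&ver=1"
    docs = [
        {"_uniqueid": uid, "url": "/some-page"},  # "url" is illustrative only
        {"_uniqueid": uid, "url": "/some page"},
    ]
    print(parse_unique_id(uid))
    print(len(build_index(docs)))
```

In this sketch the key never includes a URL, which mirrors the point of the answer: two different URLs for the same item, language, version and database still map to a single entry.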

The item ID on its own (which is stored in the _group field of the Lucene document) is not used to uniquely identify Lucene documents.

As for linking to your pages: assuming you are talking about Sitecore internal links, authors only select the target item, and it is Sitecore that generates the "user-friendly" version of the link. For that reason, you should not see different URLs for the same page.