
I have a Solr index that uses quite a few dynamic fields. I've recently changed my code to reduce the amount of data we index with Solr, significantly reducing the number of dynamic fields that are in use.

I've reindexed my data, and the doc count (as displayed in the admin schema browser) for the old fields has dropped to zero. But I'm confused as to why the fields still exist. I've done an optimize, and restarted the server, but I can't find any information on whether there's a way to get these fields to disappear.

Am I now stuck with these fields unless I create an index from scratch? We're talking about a significant reduction in fields (about 200 -> 30), and I'm worried about the performance impact of keeping them floating around.

I'm using Solr 1.4.

Edit: The dynamic field definitions still exist in schema.xml because I'm still using them in a few cases. It's just that far fewer fields are now generated from them.

Edit:

None of these fields are stored, only indexed. So I can't see them just by inspecting the documents returned, but I can facet on them.

Here are my results when faceting on a field that I'm still using:

Query:

/?q=*:*&facet=on&facet.field=books_isbn_10_s_exact

Result:

<lst name="books_isbn_10_s_exact">
    <int name="1010102457">2</int>
    <int name="1110011010">2</int>
    <int name="1110011013">2</int>
    ...

And here are the results for one of the empty fields:

Query:

/?q=*:*&facet=on&facet.field=mobiles_infrared_s_exact

Result:

<lst name="mobiles_infrared_s_exact"/>

Both fields are using this field definition in my schema.xml:

<dynamicField name="*_s_exact"  type="string"  indexed="true"  stored="false" termVectors="true" omitNorms="true" multiValued="false" />
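The faceting check above can be scripted one field at a time. This is a minimal sketch. It assumes the JSON response writer (wt=json) with its default json.nl=flat layout, in which facet_fields maps each field to a flat [term, count, term, count, ...] list. The base URL, the /select path and the helper names facet_probe_url and field_has_terms are illustrative assumptions, not taken from the question.

```python
from urllib.parse import urlencode


def facet_probe_url(base_url: str, field: str) -> str:
    """Build a q=*:* facet query that returns at most one term for `field`."""
    params = {
        "q": "*:*",
        "rows": 0,             # only the facet counts are needed, not documents
        "facet": "on",
        "facet.field": field,
        "facet.limit": 1,      # one term is enough to show the field has values
        "facet.mincount": 1,   # skip terms that match no documents
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


def field_has_terms(response: dict, field: str) -> bool:
    """True if the parsed JSON response lists at least one facet term for `field`."""
    # With json.nl=flat the list alternates term, count, term, count, ...
    flat = response["facet_counts"]["facet_fields"].get(field, [])
    return len(flat) > 0
```

An empty facet list, like the `<lst name="mobiles_infrared_s_exact"/>` result above, makes field_has_terms return False.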

The only place I'm seeing the old fields (e.g. mobiles_infrared_s_exact and about 100 others) is Solr's schema browser in /admin/, where I can see every dynamic field I've ever used, even though the doc count for most of them is 0.

I'm just trying to find out whether there's a way to remove them from the schema browser, and whether keeping them around has any performance implications, given that my index has 10M documents.

Just in case, did you remove the dynamic field declarations from your schema? - Mauricio Scheffer
I haven't. I still need some instances of the field, just not the majority of them. I'll update the question to clarify. - Andrew Ingram
Then I don't understand. Can you tell us where you see these unwanted fields? - Mauricio Scheffer
In the admin schema browser, which is how I know the doc count is zero. It lists every field in the index, including those generated from dynamic field definitions, and generated fields that are no longer used still appear there, even though I'd have expected an optimize to remove them. - Andrew Ingram
Try using TermsComponent (wiki.apache.org/solr/TermsComponent) to list the terms (and their document counts) in these unwanted fields. - Mauricio Scheffer

2 Answers

Answer 1

What happens when you do something like this:

/?q=mobiles_infrared_s_exact:xyzzy

Do you get zero documents returned or do you get an error?
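This probe can be scripted as a minimal sketch. Instead of the dummy term xyzzy, it swaps in the open range query field:[* TO *]. That substitution is not part of the answer. The range matches every document with at least one indexed term in the field, so numFound doubles as a document count. Because mobiles_infrared_s_exact still matches the *_s_exact dynamic field rule, this query should return zero documents rather than an error. The helper names probe_params and classify are hypothetical. The sketch treats any non-200 HTTP status as an error rather than assuming a particular error-body format, since that format differs between Solr versions.

```python
from typing import Optional
from urllib.parse import urlencode


def probe_params(field: str) -> str:
    """Build a query string that counts documents with any value in `field`."""
    # [* TO *] is an open-ended range: it matches every document that has at
    # least one indexed term in the field. rows=0 skips fetching documents.
    return urlencode({"q": f"{field}:[* TO *]", "rows": 0, "wt": "json"})


def classify(status: int, body: Optional[dict]) -> str:
    """Summarise a probe response as 'error', 'empty' or 'N documents'."""
    if status != 200 or body is None:
        # e.g. an HTTP 400 when no field or dynamic field rule matches the name
        return "error"
    found = body["response"]["numFound"]
    return "empty" if found == 0 else f"{found} documents"
```

The query string would be appended to a search handler URL such as a hypothetical http://localhost:8983/solr/select?, and the HTTP status and parsed JSON body passed to classify.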

Answer 2

I have seen this on multiple Solr cores after several rounds of schema migration. You can automate the check by pulling the field list directly from the Lucene index through the Luke request handler:

/solr/your_core/admin/luke?numTerms=0&wt=json

{
  // ...
  "fields": {
    "_version_": {
      "type": "long",
      "schema": "I-S-----OF------",
      "index": "-TS-------------",
      "docs": 761997
    },
    "abstract_display": {
      "type": "string",
      "schema": "--S-M----------l",
      "dynamicBase": "*_display"
    },
    "abstract_t": {
      "type": "text",
      "schema": "ITS-M-----------",
      "dynamicBase": "*_t"
    }
    // ...
  }
}

Then filter the fields, keeping those with a nonzero docs count. As for removing the others from the schema browser, I have only managed that when migrating to a new Solr installation or rebuilding the core from scratch. There may be other ways, but this really isn't something Solr is set up to manipulate; it probably treats the leftover entries as an internal artifact of the index.
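The filtering step can be sketched in Python, assuming the Luke output has already been parsed with json.loads into a dict shaped like the excerpt above. The function name split_fields is hypothetical. In that excerpt, fields without indexed terms have no "docs" key at all, so the sketch treats a missing key as a zero count. Note that stored-only fields (such as abstract_display, whose schema flags show it is not indexed) also land in the second list. The result is therefore a list of candidates to review, not proof that a field is unused.

```python
def split_fields(luke_response: dict) -> tuple[list[str], list[str]]:
    """Split Luke-reported fields into (with_docs, without_docs), both sorted."""
    with_docs, without_docs = [], []
    for name, info in luke_response.get("fields", {}).items():
        # A missing "docs" entry is treated the same as a zero count.
        if info.get("docs", 0) > 0:
            with_docs.append(name)
        else:
            without_docs.append(name)
    return sorted(with_docs), sorted(without_docs)
```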

Effectively, this is more a question about the Solr schema browser than about Solr itself.