
I am using Solr 4.10.3. Documents are indexed using Apache Nutch 2.3. There is a field in schema.xml, tstamp, that contains information about when each document was indexed. This field is stored but not indexed in Solr. I want to count the number of documents indexed by Nutch in Solr. It is clear that I have to use the tstamp field. How can I do it?

Please explain in detail.


1 Answer


The default nutch-default.xml config file does not have the index-more plugin activated. You can enable it by adding it to the plugin chain.

Look for the plugin.includes property and change it from

<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

to

<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|more)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

With index-more enabled, the fetch date is indexed in the date field. To find out how many documents have been indexed, run one of these Solr queries:

All documents: *:*

Documents indexed in the last 24 hours: date:[NOW-1DAY TO NOW]
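
If you only need the count and not the documents themselves, you can send either query to Solr's select handler with rows=0 and read numFound from the JSON response. Here is a minimal Python sketch of that idea. The host, port and core name (localhost:8983, collection1) are assumptions about your setup, and the helper names count_url, parse_count and count_documents are made up for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Solr location and core name; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def count_url(query, base=SOLR_SELECT):
    """Build a select URL that asks for the hit count only (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return base + "?" + urlencode(params)


def parse_count(body):
    """Extract the total number of matching documents from a JSON response."""
    return json.loads(body)["response"]["numFound"]


def count_documents(query, base=SOLR_SELECT):
    """Run the query against Solr and return the number of matches."""
    with urlopen(count_url(query, base)) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Solr instance at SOLR_SELECT.
    print("all documents:", count_documents("*:*"))
    print("last 24 hours:", count_documents("date:[NOW-1DAY TO NOW]"))
```

Note that the range query only matches documents indexed after index-more was enabled, since older documents have no date field.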