
All documents in a Solr index have an "added" field containing the ISO 8601 date on which the document was added to Solr.

<result name="response" numFound="34587104" start="0">
    <doc>
        <date name="added">2013-03-04T01:00:26Z</date>
        <str name="text">Hello, world!</str>
        <str name="id">93416604d274d28a44e14a9535bb9e6e1db3d851</str>
        <str name="_version_">1428536769315340290</str>
    </doc>
</result>

Assuming that no documents are removed, how might I get a count of how many documents exist in the index per day? For instance, to know how many documents were in the index on 2013-03-05, I could query q=added:[* TO 2013-03-05T00:00:00Z]. However, I need to know how many documents were in the index for each day from one month ago until today.

One solution might be to query how many documents were in the index one month ago, then facet on how many documents were added each day, and add those daily counts to a running total. Pseudocode:

initial_count = q=added:[* TO NOW/MONTH-1MONTH]
running_total = initial_count;
daily_added_array = facet.range=added
                    & f.added.facet.range.start=NOW/MONTH-1MONTH
                    & f.added.facet.range.end=NOW/DAY-1DAY
                    & f.added.facet.range.gap=+1DAY

foreach (daily_added_array as day) {
    running_total += day;
    printf(running_total);
}
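The forward-accumulation idea in the pseudocode above can be sketched in Python as follows. This is a minimal sketch, assuming the range facet comes back in Solr's default JSON layout, where facet_counts.facet_ranges.added.counts is a flat list alternating bucket start date and count. The function name `forward_totals` and the sample numbers are hypothetical, and fetching the response over HTTP is left out.

```python
from typing import List, Tuple


def forward_totals(initial_count: int, flat_counts: list) -> List[Tuple[str, int]]:
    """Turn a flat [date, count, date, count, ...] facet list into running totals.

    initial_count: documents added before the first facet bucket starts.
    Returns (bucket start, documents in the index at the end of that bucket).
    """
    totals = []
    running_total = initial_count
    # Pair each bucket start date with the number of documents added in it.
    for day, added in zip(flat_counts[0::2], flat_counts[1::2]):
        running_total += added
        totals.append((day, running_total))
    return totals


# Hypothetical value of facet_counts.facet_ranges.added.counts
sample = ["2013-03-04T00:00:00Z", 5, "2013-03-05T00:00:00Z", 3]
for day, total in forward_totals(100, sample):
    print(day, total)
```

Keeping the date strings paired with the totals makes it harder to silently misalign a count with the wrong day, which is one of the ways this approach can go wrong.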

However, this method seems fragile and error-prone.

Is there a way to get the cumulative amount of documents in the index per day?

I have a field tstamp in my schema that is stored but not indexed. How can I count the documents indexed per day in Solr? Its format is like "tstamp": "2015-03-05T04:57:54.032Z" (comment by Hafiz Muhammad Shafiq)

2 Answers


I don't think there is a better way than faceting to pull out the daily counts, and using date math is preferable to any attempts to calculate the specific date strings, so I think you already have it right on those aspects.

About the only improvement I can see is to query *:* and take the hit count (numFound) from that. You can then use the daily counts to generate your running totals backwards by subtraction, rather than forwards by addition. This should perform a little better than your method, since *:* requires no filtering work or score calculation at all for Solr. It also gives you one less date math expression to write :)
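A minimal Python sketch of the backwards approach: start from the *:* total and subtract each day's facet count, walking from the newest bucket to the oldest. It assumes the facet buckets extend up to the present (so no documents fall after the last bucket) and that the counts use Solr's flat [date, count, ...] JSON layout; the function name `backward_totals` and the sample numbers are hypothetical.

```python
def backward_totals(total: int, flat_counts: list) -> list:
    """Running totals computed backwards from the *:* hit count.

    total: numFound from q=*:* (all documents currently in the index).
    flat_counts: [date, count, ...] buckets assumed to reach the present.
    Returns (bucket start, documents in the index at the end of that bucket),
    oldest bucket first.
    """
    days = flat_counts[0::2]
    added = flat_counts[1::2]
    totals = []
    running_total = total
    # Newest bucket first: its end-of-day total is the current total,
    # and removing its additions gives the total at the end of the day before.
    for day, count in zip(reversed(days), reversed(added)):
        totals.append((day, running_total))
        running_total -= count
    totals.reverse()
    return totals


# Hypothetical numFound and facet counts
sample = ["2013-03-04T00:00:00Z", 5, "2013-03-05T00:00:00Z", 3]
print(backward_totals(108, sample))
```

With these made-up numbers the result matches what forward accumulation from an initial count of 100 would give, which is a cheap sanity check if you ever compute both ways.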

Speaking of which, I suspect NOW/MONTH-1MONTH isn't what you meant. That says: "NOW rounded down to the start of the current month, minus one month", which will be anywhere from 28 to 61 days ago depending on what day NOW is (if NOW is March 1 you get Feb 1; if NOW is Dec 31 you get Nov 1). That won't correspond to your stated requirement:

I need to know how many documents were in the index for each day from one month ago until today

I think you probably want NOW/DAY-1MONTH. Also, it seems you are excluding today's documents with the upper bound of your facets; is that intended? If so, my method still works, but you have to extend the upper bound of your facets to NOW/DAY+1DAY and simply ignore today's total when generating your running-total list (still computed backwards).
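To make the difference between the two date math expressions concrete, here is a small Python emulation. It is a sketch, not Solr's implementation: the helper names are hypothetical, and it assumes Solr clamps the day of month when subtracting a month (for example March 31 minus one month giving Feb 28), as Java's calendar arithmetic does.

```python
import calendar
from datetime import datetime


def minus_months(dt: datetime, months: int) -> datetime:
    """Subtract whole months, clamping the day to the target month's length."""
    year, month0 = divmod(dt.year * 12 + (dt.month - 1) - months, 12)
    month = month0 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def now_month_minus_month(now: datetime) -> datetime:
    """Emulates NOW/MONTH-1MONTH: round down to the month start, then go back a month."""
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return minus_months(start, 1)


def now_day_minus_month(now: datetime) -> datetime:
    """Emulates NOW/DAY-1MONTH: round down to midnight, then go back a month."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return minus_months(start, 1)


for now in (datetime(2013, 3, 1, 12), datetime(2013, 12, 31, 12)):
    start = now_month_minus_month(now)
    print(now.date(), start.date(), (now - start).days, now_day_minus_month(now).date())
```

For the two example dates, NOW/MONTH-1MONTH reaches back 28 and 60 days respectively, while NOW/DAY-1MONTH always lands on roughly the same day of the previous month.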


Depending on your Solr version, you may be able to use Result Grouping (also known as Field Collapsing) together with the group.func parameter. http://wiki.apache.org/solr/FieldCollapsing

set rows=0&group.field=added&group.func=rint(div(ms(added),mul(24,mul(60,mul(60,1000)))))

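The group.func function converts the date to milliseconds since the epoch, divides by the number of milliseconds in a day, and rounds the result to an integer, so each group corresponds to one day number. The number of groups returned is what you want. You can filter it to the last month, etc., as you like.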
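The arithmetic inside group.func can be checked outside Solr with a short Python emulation. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes Solr's rint behaves like Java's Math.rint (round to the nearest integer, ties to even), which Python's round() also does. One consequence worth checking against your data: rounding to the nearest integer puts a timestamp after 12:00 UTC into the next day's bucket, whereas integer (floor) division keeps it in its own calendar day.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 24 * 60 * 60 * 1000  # same value as mul(24,mul(60,mul(60,1000)))


def epoch_ms(ts: str) -> int:
    """Milliseconds since the epoch for a 'YYYY-MM-DDTHH:MM:SSZ' timestamp (like ms(added))."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def rint_day_bucket(ts: str) -> int:
    """Emulates rint(div(ms(added), 86400000)): nearest whole day number."""
    return round(epoch_ms(ts) / MS_PER_DAY)


def floor_day_bucket(ts: str) -> int:
    """Calendar day number (UTC) for comparison: floor division instead of rounding."""
    return epoch_ms(ts) // MS_PER_DAY


for ts in ("2013-03-04T01:00:26Z", "2013-03-04T13:00:00Z"):
    print(ts, rint_day_bucket(ts), floor_day_bucket(ts))
```

The first timestamp is the "added" value from the question; the second is a made-up afternoon timestamp on the same date, included only to show where the two bucketing rules disagree.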