
I need to perform deep pagination using R and the solr package (Solr 7.2.1 server, R 3.4.3).

I can't figure out how to get nextCursorMark from the resulting data frame. I usually do this in Python, but this one has me stumped.

res <- solr_all(base = myBase, rows = 100, verbose=TRUE,
                sort = "unique_id asc",
                fq="*:*",
                cursorMark="*"
               )

I cannot get the nextCursorMark from the result. Any help would be appreciated.

I have noticed that if I pass nextCursorMark to pageDoc, the value is returned when parsetype is set to "json", but not when the result is parsed into a data frame. So the other part of my question is: where does that value end up when a data frame is returned?
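For reference, in a JSON response Solr returns nextCursorMark as a top-level field next to responseHeader and response, not inside the documents. That is presumably why it gets lost when the solr package flattens the result into a data frame of docs. A minimal sketch using jsonlite on a hand-written sample response (the cursor value and documents below are invented for illustration, not taken from a real server):

```r
library(jsonlite)

# Hand-written sample of a Solr cursor response (illustrative values only)
sample_json <- '{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {"numFound": 3, "start": 0,
               "docs": [{"unique_id": "a"}, {"unique_id": "b"}]},
  "nextCursorMark": "AoE/AWI="
}'

parsed <- fromJSON(sample_json)

# nextCursorMark lives beside "response", not inside the docs
cursor <- parsed$nextCursorMark

# fromJSON() simplifies the array of doc objects into a data frame
docs <- parsed$response$docs

print(cursor)
print(nrow(docs))
```

Once docs is a data frame, count documents with nrow(); length() on a data frame returns the number of columns, not rows.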

I've rebooted the solr package as solrium, and I will soon take solr off CRAN. There is currently no way to get the cursorMark value. I've opened an issue (github.com/ropensci/solrium/issues/114) and should have a fix up soon. - sckott
I did find a way to make it work: if you request the JSON response, nextCursorMark comes back as part of the payload. I'll switch to solrium. Thanks! - WmSadler

1 Answer


I finally found a way to make this work. It is not optimal (the proper solution is in the GitHub issue referenced in the comments), but it works:

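library(solr)

dat <- "http://yadda.com"   # base URL of the Solr search endpoint
cM <- "*"                   # a cursorMark of "*" starts a new cursor
done <- FALSE
rowCount <- 0
a <- list()                 # append() below returns a list, so start from one

while (!done) {
  # parsetype = "json" keeps nextCursorMark in the parsed response
  Data <- solr_search(base = dat, rows = 100, verbose = FALSE,
                      # cursor paging needs a sort on the uniqueKey field
                      # (assumed here to be unique_id)
                      sort = "unique_id asc",
                      fq = "*:*",
                      parsetype = "json",
                      cursorMark = cM,
                      pageDoc = "nextCursorMark")
  # Solr returns the cursorMark you sent once there are no more results
  if (cM == Data$nextCursorMark) {
    done <- TRUE
  } else {
    cM <- Data$nextCursorMark
  }
  a <- append(x = a, Data$response$docs)
  rowCount <- rowCount + length(Data$response$docs)
  print(rowCount)
}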
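To check the termination logic without a server, here is a minimal self-contained sketch in base R. fetch_page() is a hypothetical stand-in for a Solr request, not part of the solr or solrium packages: it pages through a sorted vector of ids, uses the last returned id as the cursor, and hands back the cursor unchanged once nothing is left, mirroring how Solr signals the end of a cursor. It also collects pages in a list and combines them once at the end, rather than growing the result on every iteration.

```r
# Hypothetical stand-in for a Solr cursor request (not a real package API).
# Returns up to `rows` ids after `cursor`, plus the next cursor; when no ids
# remain, the cursor that was sent is returned unchanged.
fetch_page <- function(ids, cursor, rows) {
  ids <- sort(ids)
  remaining <- if (cursor == "*") ids else ids[ids > cursor]
  page <- head(remaining, rows)
  next_cursor <- if (length(page) == 0) cursor else page[length(page)]
  list(docs = page, nextCursorMark = next_cursor)
}

# Walk the cursor until nextCursorMark stops changing
collect_all <- function(ids, rows = 100) {
  cursor <- "*"
  pages <- list()
  repeat {
    res <- fetch_page(ids, cursor, rows)
    if (length(res$docs) > 0) pages[[length(pages) + 1]] <- res$docs
    # End of results: the cursor that came back is the one we sent
    if (identical(res$nextCursorMark, cursor)) break
    cursor <- res$nextCursorMark
  }
  unlist(pages, use.names = FALSE)
}

all_ids <- collect_all(sprintf("id%03d", 1:250), rows = 100)
print(length(all_ids))
```

With 250 ids and rows = 100 this makes four requests (100, 100, 50, then an empty page whose cursor matches the one sent), which is the same stopping rule the loop above relies on.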