I am using Nutch to crawl a list of URLs specified in the seed file, with depth 100 and topN 10,000 to ensure a full crawl. I am also trying to ignore URLs with repeated strings in their path using regex-urlfilter: http://rubular.com/r/oSkwqGHrri
However, I would like to know which URLs were ignored during the crawl. Is there any way I can log the list of URLs "ignored" by Nutch while it crawls?
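
To make clear what I mean by "ignored", here is a minimal offline sketch of the check, outside of Nutch. It is only an illustration under assumptions: the class name `IgnoredUrlCheck` and the method `rejected` are hypothetical, and the pattern is the stock loop-breaking rule from Nutch's default regex-urlfilter.txt, not necessarily the exact regex behind the rubular link above. It uses `find()` because, as far as I understand, that is how the regex filter matches its rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class IgnoredUrlCheck {
    // Assumption: the default "break loops" rule from regex-urlfilter.txt,
    // without the leading '-' that marks it as an exclusion rule.
    static final Pattern REPEATED_SEGMENT =
            Pattern.compile(".*(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Returns the URLs that the exclusion rule would match (i.e. "ignore").
    static List<String> rejected(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            if (REPEATED_SEGMENT.matcher(url).find()) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/a/b/a/c/a/",  // path segment "/a" repeats three times
                "http://example.com/a/b/c/");     // no repeated segment
        for (String url : rejected(urls)) {
            System.out.println("ignored: " + url);
        }
    }
}
```

This only tells me what the rule would reject for URLs I already have in hand; what I am really after is the same list produced by Nutch itself during the crawl.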