I have 50 million documents in my MarkLogic database. I'd like to analyze the content to find out what the main categories of documents are.
Each of my documents is in a specific directory (e.g. "/books/") and belongs to a specific collection (e.g. "/type/books").
I'd like to generate a CSV with two columns: name_of_the_collection;count_distinct_value
Example:
Collection;count
books;437438
cars;46565
cats;457373
And the same for the directories:
directory;count
/animals/cats/;437438
/animals/dogs;46565
/animals/cow;457373
I tried to list all the distinct categories/collections and to count the number of documents, but I was not able to combine the two.
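To make the combination I am after concrete, here is a minimal sketch in plain Python. It is not MarkLogic code: the `docs` sample data and the helper names `directory_of` and `counts_to_csv` are hypothetical and only illustrate the grouping and the CSV layout I expect.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: (document URI, collections of that document).
docs = [
    ("/animals/cats/1.xml", ["/type/cats"]),
    ("/animals/cats/2.xml", ["/type/cats"]),
    ("/animals/dogs/1.xml", ["/type/dogs"]),
]

def directory_of(uri):
    # Keep everything up to and including the last "/".
    return uri[: uri.rfind("/") + 1]

def counts_to_csv(header, counts):
    # Write "name;count" rows, sorted by name, with ";" as the delimiter.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow([header, "count"])
    for name, n in sorted(counts.items()):
        writer.writerow([name, n])
    return buf.getvalue()

# Combine "list the distinct values" and "count the documents" in one pass.
collection_counts = Counter(c for _, cols in docs for c in cols)
directory_counts = Counter(directory_of(uri) for uri, _ in docs)

collection_csv = counts_to_csv("collection", collection_counts)
directory_csv = counts_to_csv("directory", directory_counts)
print(collection_csv)
print(directory_csv)
```

With 50 million documents I obviously cannot pull every URI out of the database like this, so I am looking for the equivalent done inside MarkLogic.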
Could you please help me?
Thanks, Romain.