How to get duplicates in Alfresco?

Question

I have a task: get duplicates (documents with the same property value) from the Alfresco database, together with the number of occurrences of each value. In MySQL it would look something like this:

SELECT COUNT(*) AS repetitions, last_name, first_name
FROM cat_mailing
GROUP BY last_name, first_name
HAVING repetitions > 1;

But I have read that "The CMIS query language in Alfresco does not support GROUP BY or HAVING." Is there any query (in any supported language) that can perform this task? Thank you!

UPD: for now I am counting in the JVM like this (Form20Row overrides hashCode/equals):

// Form20Row overrides hashCode/equals, so equal rows collapse into one map entry.
Map<Form20Row, Form20Row> rowsMap = results.stream().parallel().map(doc -> {
    Form20Row row = new Form20Row();
    String propMark = propertyHelper.getNodeProp(doc, NDBaseDocumentModel.PROP_MARK);
    row.setGroupName(systemMessages.getString("form20.nss.name"));
    row.setDocMark(propMark);
    row.setDupesNumber(1); // every document starts as a single occurrence
    return row;
}).collect(Collectors.toConcurrentMap(form20Row -> form20Row, form20Row -> form20Row,
        (existing, replacement) -> {
            // Same key seen again: add the counts and keep the first row.
            existing.setDupesNumber(existing.getDupesNumber() + replacement.getDupesNumber());
            return existing;
        }));
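
For comparison, the same in-JVM counting can probably be expressed with Collectors.groupingBy and Collectors.counting, which mirrors GROUP BY / HAVING more directly and does not mutate row objects inside the stream. This is only a sketch: DuplicateCounter and findDuplicates are hypothetical names, and plain strings stand in for the PROP_MARK values read from each document. Note that groupingBy rejects null keys, so documents without the property would have to be filtered out first.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Alfresco: counts how often each key occurs
// and keeps only the keys seen more than once.
class DuplicateCounter {

    static <T, K> Map<K, Long> findDuplicates(List<T> items, Function<T, K> keyOf) {
        // Equivalent of GROUP BY key with COUNT(*); LinkedHashMap keeps first-seen order.
        Map<K, Long> counts = items.stream()
                .collect(Collectors.groupingBy(keyOf, LinkedHashMap::new, Collectors.counting()));
        // Equivalent of HAVING COUNT(*) > 1.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Sample values standing in for the document marks.
        List<String> marks = List.of("A-1", "B-2", "A-1", "C-3", "B-2", "A-1");
        System.out.println(findDuplicates(marks, Function.identity())); // prints {A-1=3, B-2=2}
    }
}
```

With real documents, keyOf would be the lambda that reads the property, for example doc -> propertyHelper.getNodeProp(doc, NDBaseDocumentModel.PROP_MARK).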

Heiko Robert · Accepted Answer · 2020-08-24T11:05:25

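Alfresco uses Solr to search nodes, but Solr is very limited when it comes to joins, aggregate functions, counting, and so on. What you can do is query the Solr index using facets, for example facet.field=field1&facet.mincount=1. Raising facet.mincount to 2 restricts the result to values that occur more than once.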
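
As a rough illustration of the facet approach, the request parameters could be assembled like this. The parameters themselves (q, rows, facet, facet.field, facet.mincount, facet.limit) are standard Solr ones; FacetQuery, duplicateFacetParams, and the field name are hypothetical, and how the property is actually named in the Alfresco index is an assumption you would need to check. Keep in mind that faceting on a tokenized field counts tokens rather than whole values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds standard Solr facet parameters for a single field.
class FacetQuery {

    static String duplicateFacetParams(String query, String field) {
        return "q=" + enc(query)
                + "&rows=0"                      // only the facet counts are needed, not the documents
                + "&facet=true"
                + "&facet.field=" + enc(field)  // placeholder: the indexed name of the property
                + "&facet.mincount=2"           // skip values that occur only once
                + "&facet.limit=-1";             // do not cap the number of facet values
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(duplicateFacetParams("TYPE:\"cm:content\"", "field1"));
    }
}
```

The response would then list each value of the field with its count, which is the closest Solr gets to GROUP BY with HAVING.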

Personally, I would prefer to query the Alfresco database directly to find nodes that share the same value for specific properties. This does not depend on the Solr index and gives you the full flexibility of SQL.
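
A minimal sketch of such a direct query over JDBC might look like the following. The table and column names (alf_node_properties, alf_node, alf_store, alf_qname, alf_namespace) reflect my understanding of the Alfresco schema and should be verified against your version; DuplicatePropertyQuery and findDuplicates are hypothetical names. It assumes a single-valued text property stored in string_value, and it should only ever be run read-only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts live nodes that share the same value of one text property.
class DuplicatePropertyQuery {

    // Assumed schema; restricts the search to workspace://SpacesStore so that
    // archived nodes and version history are not counted.
    static final String SQL =
            "SELECT p.string_value, COUNT(*) AS repetitions "
          + "FROM alf_node_properties p "
          + "JOIN alf_node n ON n.id = p.node_id "
          + "JOIN alf_store s ON s.id = n.store_id "
          + "JOIN alf_qname q ON q.id = p.qname_id "
          + "JOIN alf_namespace ns ON ns.id = q.ns_id "
          + "WHERE s.protocol = 'workspace' AND s.identifier = 'SpacesStore' "
          + "AND ns.uri = ? AND q.local_name = ? "
          + "GROUP BY p.string_value "
          + "HAVING COUNT(*) > 1 "
          + "ORDER BY repetitions DESC";

    static Map<String, Long> findDuplicates(Connection db, String namespaceUri, String localName)
            throws SQLException {
        Map<String, Long> result = new LinkedHashMap<>();
        try (PreparedStatement st = db.prepareStatement(SQL)) {
            st.setString(1, namespaceUri); // namespace URI of the property's content model
            st.setString(2, localName);    // local name of the property
            try (ResultSet rs = st.executeQuery()) {
                while (rs.next()) {
                    result.put(rs.getString(1), rs.getLong(2));
                }
            }
        }
        return result;
    }
}
```

HAVING COUNT(*) > 1 is used instead of the alias so that the statement is not tied to MySQL.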
