I'm working on a C# application that extracts metadata from all the documents in a Lotus Notes database (.nsf file, no Domino server) like this:

NotesDocumentCollection documents = _notesDatabase.AllDocuments;
if (documents.Count>0)
{
      NotesDocument document = documents.GetFirstDocument();
      while (document!=null)
      { 
           //processing 
           document = documents.GetNextDocument(document);
      } 
}

This works well, except we also need to record all the views that a document appears in. We iterate over all the views like this:

foreach (var viewName in _notesDatabase.Views)
{
        NotesView view = _notesDatabase.GetView(viewName);
        if (view != null)
        {
            if (view.AllEntries.Count > 0)
            {
                folderCount = view.AllEntries.Count;
                NotesDocument document = view.GetFirstDocument();
                while (document!=null)
                {
                    //record the document/view cross reference
                    document = view.GetNextDocument(document);
                }
            }
            Marshal.ReleaseComObject(view);
            view = null;
        }
}

Here are my problems and questions:

  1. We fairly regularly encounter documents in a view that were not found in the NotesDatabase.AllDocuments collection. How is that possible? Is there a better way to get all the documents in a Notes database?

  2. Is there a way to find out all the views a document is in without looping through all the views and documents? This part of the process can be very slow, especially on large .nsf files (35 GB!). I'd love to find a way to get just a list of view names and Document.UniversalID values.

  3. If there is not a more efficient way to find all the document + view information, is it possible to do this in parallel, with a separate thread/worker/whatever processing each view?

Thanks!

2 Answers

0
votes

Answering questions in the same order:

  1. I'm not sure how this is possible either unless perhaps there's a special type of document that doesn't get returned by that AllDocuments property. Maybe replication conflicts are excluded?

  2. Unfortunately there's no better way. Views are really just a saved query into the database that return a list of matching documents. There's no list of views directly associated with a document.

  3. You may be able to do this in parallel by processing each view on its own thread, but the bottleneck may be refreshing the view indexes (on the Domino server or, with a local .nsf file, in the Notes runtime), so it might not gain much.

One other note: the "AllEntries" in a view is different from all the documents in the view. Entries can include things like category rows, which are just groupings and aren't backed by an actual document. In other words, the count of AllEntries might be more than the count of all documents.
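
To make points 2 and 3 concrete, here is a minimal sketch of turning the per-view loop into a lookup from document UNID to view names, with optional parallelism. It is not tied to the Notes API: `ViewCrossReference`, `Build`, and the `readUnids` callback are hypothetical names, and the callback is assumed to walk one view and return the UniversalID of each document. Whether Notes COM objects can safely be used from several threads is not established here, so the degree of parallelism defaults to 1.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ViewCrossReference
{
    // Builds a map of document UNID -> names of the views that contain it.
    // readUnids is a hypothetical callback: given a view name, it returns the
    // UniversalID of every document found in that view.
    public static Dictionary<string, List<string>> Build(
        IEnumerable<string> viewNames,
        Func<string, IEnumerable<string>> readUnids,
        int maxParallelism = 1)
    {
        var map = new ConcurrentDictionary<string, ConcurrentBag<string>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(viewNames, options, viewName =>
        {
            foreach (string unid in readUnids(viewName))
            {
                map.GetOrAdd(unid, _ => new ConcurrentBag<string>()).Add(viewName);
            }
        });

        // De-duplicate (a document can appear more than once in a view) and
        // sort, so the result does not depend on thread timing.
        return map.ToDictionary(
            pair => pair.Key,
            pair => pair.Value.Distinct()
                              .OrderBy(name => name, StringComparer.Ordinal)
                              .ToList());
    }
}
```

Starting with `maxParallelism` at 1 and raising it only if the view reads prove safe and actually faster would be the cautious way to try this.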

0
votes

Well, first of all, it's possible that documents are being created while your job runs. It takes time to cycle through AllDocuments, and then it takes time to cycle through all the views. Unless you are working on a copy or replica of the database that is isolated from all other possible users, you can easily run into a case where a document was created after you loaded AllDocuments but before you accessed one of the views.

Also, it may be possible that some of the objects returned by the view.getXXXDocument() methods are deleted documents. You should probably be checking document.isValid() to avoid trying to process them.
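
A check along those lines might look like the sketch below. The helper name `IsUsable` is hypothetical, and it assumes the COM document object exposes `IsValid` and `IsDeleted` as boolean properties under exactly those names. The parameter is `dynamic` so the snippet compiles without a reference to the Domino interop assembly.

```csharp
public static class NotesDocumentChecks
{
    // Returns true when the document looks safe to process.
    // Assumption: the object has boolean IsValid and IsDeleted properties.
    public static bool IsUsable(dynamic document)
    {
        if (document == null) return false;
        return (bool)document.IsValid && !(bool)document.IsDeleted;
    }
}
```

Inside the view loop, a document that fails this check would be skipped before its cross reference is recorded, but the loop should still advance with GetNextDocument.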

I suggest using a NotesNoteCollection as a check on AllDocuments. If either AllDocuments or the NotesNoteCollection (after selecting documents and building the collection) returns the full set of documents, then there is a way to do this that should be faster than iterating each view.

(1) Read all the selection formulas from the views, removing the word 'SELECT' and saving them in a list of pairs of {view name, formula}.

(2) Iterate through the documents (from the NotesNoteCollection or AllDocuments) and for each doc you can use foreach to iterate through the list of view/formula pairs. Use the NotesSession.Evaluate method on each formula, passing the current document in for the context. A return of True from any evaluated formula tells you the document is in the view corresponding to the formula.

It's still brute force, but it's got to be faster than iterating all views.
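
Steps (1) and (2) could be sketched roughly as follows. `ViewFormulaMatcher`, `StripSelect`, and `ViewsContaining` are hypothetical names. The `evaluate` callback is assumed to wrap NotesSession.Evaluate and turn its result into a boolean, and `StripSelect` only handles the simple case where the formula begins with the SELECT keyword.

```csharp
using System;
using System.Collections.Generic;

public static class ViewFormulaMatcher
{
    // Step (1): drop a leading SELECT keyword from a view selection formula.
    public static string StripSelect(string selectionFormula)
    {
        string trimmed = selectionFormula.Trim();
        const string keyword = "SELECT";
        if (trimmed.StartsWith(keyword, StringComparison.OrdinalIgnoreCase)
            && trimmed.Length > keyword.Length
            && char.IsWhiteSpace(trimmed[keyword.Length]))
        {
            return trimmed.Substring(keyword.Length).TrimStart();
        }
        return trimmed;
    }

    // Step (2): return the names of the views whose formula is true for the
    // given document. viewFormulas holds {view name, formula} pairs.
    public static List<string> ViewsContaining<TDocument>(
        TDocument document,
        IEnumerable<KeyValuePair<string, string>> viewFormulas,
        Func<string, TDocument, bool> evaluate)
    {
        var matches = new List<string>();
        foreach (var pair in viewFormulas)
        {
            if (evaluate(pair.Value, document)) matches.Add(pair.Key);
        }
        return matches;
    }
}
```

One limitation to keep in mind: folders hold documents that were placed in them explicitly rather than selected by a formula, so they would still need to be walked directly.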