First of all, to do the merge you're talking about, you probably want to use set (or map's) merge member function, which will let you merge some existing map into this one. The advantage of doing this (and the reason you might not want to, depending your usage pattern) is that the items being merged in are actually moved from one set to the other, so you don't have to allocate new nodes (which can save a fair amount of time). The disadvantage is that the nodes then disappear from the source set, so if you need each local histogram to remain intact after being merged into the global histogram, you don't want to do this.
You can typically do better than O(log N) when searching a sorted vector. Assuming reasonably predictable distribution you can use an interpolating search to do a search in (typically) around O(log log N), often called "pseudo-constant" complexity.
Given that you only do insertions relatively infrequently, you might also consider a hybrid structure. This starts with a small chunk of data that you don't keep sorted. When you reach an upper bound on its size, you sort it and insert it into a sorted vector. Then you go back to adding items to your unsorted area. When it reaches the limit, again sort it and merge it with the existing sorted data.
Assuming you limit the unsorted chunk to no larger than log(N), search complexity is still O(log N)--one log(n) binary search or log log N interpolating search on the sorted chunk, and one log(n) linear search on the unsorted chunk. Once you've verified that an item doesn't exist yet, adding it has constant complexity (just tack it onto the end of the unsorted chunk). The big advantage is that this can still easily use a contiguous structure such as a vector, so it's much more cache friendly than a typical tree structure.
Since your global histogram is (apparently) only ever populated with data coming from the local histograms, it might be worth considering just keeping it in a vector, and when you need to merge in the data from one of the local chunks, just use std::merge to take the existing global histogram and the local histogram, and merge them together into a new global histogram. This has O(N + M) complexity (N = size of global histogram, M = size of local histogram). Depending on the typical size of a local histogram, this could pretty easily work out as a win.
ktimes1equalskNfor sufficiently large values of1:) Sorry, typo, I edited it out. - penelopestd::setis what you're looking for. Using it to keep sorted collections isn't a good idea. You usestd::setwhen you need to rely on the property that insertions don't invalidate iterators and references. To keep sorted collections,std::vectorand<algorithm>generally are the best choice. And what you ask for is a no brainer on twovectors - papagagastd::maps, which represent, in a way, very sparse histograms. I calculate several of these histograms (about 10, not thousands). The calculation is the complicated bit, where I need to add and update many, many elements out of order, so I need find/insert O(logN) and thestd::mapis the correct data structure for this. After each calculation, I need to update the "globalHistogram" by adding the information of the current histogram to it which is what my question refers to. I asked the question withstd::setfor simplicity instead. - penelope