I'm trying to collect a stream while throwing away rarely used items, as in this example:

    import java.util.*;
    import java.util.function.Function;

    import static java.util.stream.Collectors.*;
    import static org.hamcrest.MatcherAssert.assertThat;
    import static org.hamcrest.Matchers.containsInAnyOrder;

    import org.junit.Test;

    // Wrapper class added so the snippet compiles; its name is illustrative only.
    public class CommonWordsTest {

        @Test
        public void shouldFilterCommonlyUsedWords() {
            // given
            List<String> allWords = Arrays.asList(
                "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

            // when: count occurrences per word, then keep words seen more than twice
            Set<String> commonlyUsed = allWords.stream()
                .collect(groupingBy(Function.identity(), counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 2)
                .map(Map.Entry::getKey)
                .collect(toSet());

            // then
            assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
        }
    }

I have a feeling that this can be done much more simply. Am I right?

[…] `new ArrayList<>(Arrays.asList(…))` by a simple `Arrays.asList(…)`. There is only one way to avoid a map, which is calculating the frequency again for each item, but that's O(n²) CPU complexity, so I guess you better live with the intermediate map… – Holger

[…] `SELECT word FROM allWords GROUP BY word HAVING count(*) > 2`. The `groupingBy` `Collector` does the job of `GROUP BY`, but there is no `HAVING` clause equivalent. It would be good for Java to add that functionality, e.g. something like `groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream, Predicate<? super D> having)`. – rgettman

[…] `HAVING` clause in SQL? It's just hiding the fact that the group by and aggregate operations happen before the aggregate values are filtered. One could write `SELECT word FROM (SELECT word, count(*) c FROM allWords GROUP BY word) WHERE c > 2;`.

[…] Having `HAVING` allows more concise code. One can certainly use your solution; it will work. I just pointed out that it would be good for Java to have the `HAVING` option that would simplify the code. – rgettman
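
To make the two ideas from the comments concrete, here is a rough sketch, not a definitive implementation. The JDK has no `groupingBy` overload that takes a `having` predicate, so `groupingByHaving` below is a hypothetical helper built from `collectingAndThen`; `HavingSketch`, `frequentByHaving` and `frequentByRescan` are likewise made-up names. The second method is the map-free variant that re-counts every item, which is the O(n²) approach mentioned above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class HavingSketch {

    // Hypothetical helper (not a JDK method): groups like groupingBy, then
    // removes every group whose aggregated value fails the `having` predicate.
    public static <T, K, A, D> Collector<T, ?, Map<K, D>> groupingByHaving(
            Function<? super T, ? extends K> classifier,
            Collector<? super T, A, D> downstream,
            Predicate<? super D> having) {
        // Ask for a HashMap explicitly so the finisher may safely mutate it.
        Collector<T, ?, Map<K, D>> grouping =
                Collectors.groupingBy(classifier, HashMap::new, downstream);
        return Collectors.collectingAndThen(grouping, groups -> {
            groups.values().removeIf(value -> !having.test(value));
            return groups;
        });
    }

    // GROUP BY word HAVING count(*) > minExclusive, using the helper above.
    public static Set<String> frequentByHaving(List<String> words, long minExclusive) {
        Map<String, Long> frequent = words.stream()
                .collect(groupingByHaving(
                        Function.identity(),
                        Collectors.counting(),
                        count -> count > minExclusive));
        return frequent.keySet();
    }

    // Map-free alternative: Collections.frequency rescans the whole list for
    // every element, so the total work grows quadratically with the list size.
    public static Set<String> frequentByRescan(List<String> words, int minExclusive) {
        return words.stream()
                .filter(word -> Collections.frequency(words, word) > minExclusive)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
                "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        // Both calls should yield the same set of words; iteration order is unspecified.
        System.out.println(frequentByHaving(allWords, 2));
        System.out.println(frequentByRescan(allWords, 2));
    }
}
```

The intermediate map is still there in the first variant; it is only hidden inside the collector, so this mainly buys a shorter call site rather than less work.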