I'm trying to count how many values in a data set match a filter condition, but I'm running into issues when the filter matches no entries.
There are a lot of columns in my data structure, but there's only three of use for this example: key - data key for the set (not unique), value - float value as recorded, nominal_value - float representing the nominal value.
Our use case right now is to find the number of values that are 10% or more below the nominal value.
I'm doing something like this:
filtered_data = FILTER data BY value <= (0.9 * nominal_value);
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT(filtered_data.value);
DUMP filtered_count;
In most cases, there are no values that fall outside of the nominal range, so filtered_data is empty (or null. Not sure how to tell which.). This results in filtered_count also being empty/null, which is not desirable.
How can I construct a statement that will return a value of 0 when filtered_data is empty/null? I've tried a couple of options that I've found online:
-- Extra parens in COUNT required to avoid syntax error
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT((filtered_data.value is null ? {} : filtered_data.value));
which results in:
Two inputs of BinCond must have compatible schemas. left hand side: #1259:bag{} right hand side: #1261:bag{#1260:tuple(cf#1038:float)}
And:
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE (filtered_data.value is null ? 0 : COUNT(filtered_data.value));
which results in an empty/null result.