Is it possible to apply a stateful transformation to only the values in a keyed PCollection?
For the sake of example, let's say this PCollection is keyed on zip codes, and the values are dictionaries that contain a user_id key. In this stateful DoFn, I want to keep track of all of the user_ids I have seen per zip code. Given the sheer number of zip codes, it becomes intractable to store every (zip code, user_id) pair in state. However, if the stateful DoFn were applied per key, I wouldn't need to store the zip code in state explicitly; each key's state would only hold that zip code's user_ids.
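To make the per-key idea concrete, here is a minimal plain-Python model (not Beam code) of how a runner scopes stateful-DoFn state. The runner, not the DoFn, keeps one state cell per key, and the DoFn logic only ever sees the cell for the current element's key, so it never stores the zip code itself. `run_stateful`, `track_user_ids` and the sample records are hypothetical. In the Beam Python SDK the cell would instead be declared with a state spec such as `BagStateSpec` and injected through `DoFn.StateParam`; the Beam programming guide describes that state as partitioned per key and window.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Dict[str, str]


def track_user_ids(record: Record, seen: Set[str]) -> Iterator[str]:
    """Per-element logic: `seen` holds the user_ids of ONE zip code only."""
    user_id = record["user_id"]
    if user_id not in seen:
        seen.add(user_id)
        yield user_id  # emit each user_id the first time it shows up for this key


def run_stateful(elements: Iterable[Tuple[str, Record]]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a runner that keeps one state cell per key."""
    state_by_key: Dict[str, Set[str]] = defaultdict(set)
    out: List[Tuple[str, str]] = []
    for zip_code, record in elements:
        # The DoFn logic never sees other keys' state; the "runner" picks the cell.
        for user_id in track_user_ids(record, state_by_key[zip_code]):
            out.append((zip_code, user_id))
    return out


events = [
    ("10001", {"user_id": "alice"}),
    ("94105", {"user_id": "alice"}),
    ("10001", {"user_id": "bob"}),
    ("10001", {"user_id": "alice"}),
]
print(run_stateful(events))
# [('10001', 'alice'), ('94105', 'alice'), ('10001', 'bob')]
```

The same user_id under a different zip code counts as new, because each key's set is independent.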
From the Python documentation, it doesn't look like this is possible. Would the best way be to abuse a custom CombineFn?
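For comparison, a combine-per-key approach also gives each zip code its own accumulator, so nothing but user_ids is stored per key. The sketch below is plain Python that mirrors the four methods a Beam `CombineFn` defines (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`); the class `DistinctUserIds` and the `combine_per_key` driver are hypothetical, and in Beam you would subclass `beam.CombineFn` and apply it with `beam.CombinePerKey`. One assumed trade-off: it yields a combined set per key rather than reacting to each element as it arrives, as the stateful DoFn does.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]


class DistinctUserIds:
    """Mirrors the beam.CombineFn method names; plain Python here."""

    def create_accumulator(self) -> Set[str]:
        return set()

    def add_input(self, acc: Set[str], record: Record) -> Set[str]:
        acc.add(record["user_id"])
        return acc

    def merge_accumulators(self, accs: Iterable[Set[str]]) -> Set[str]:
        merged: Set[str] = set()
        for acc in accs:
            merged |= acc
        return merged

    def extract_output(self, acc: Set[str]) -> List[str]:
        return sorted(acc)


def combine_per_key(fn: DistinctUserIds,
                    elements: List[Tuple[str, Record]]) -> Dict[str, List[str]]:
    """Hypothetical driver: two 'workers' build partial accumulators, then merge."""
    halves = [elements[::2], elements[1::2]]
    partials: Dict[str, List[Set[str]]] = {}
    for half in halves:
        local: Dict[str, Set[str]] = {}
        for zip_code, record in half:
            acc = local.setdefault(zip_code, fn.create_accumulator())
            local[zip_code] = fn.add_input(acc, record)
        for zip_code, acc in local.items():
            partials.setdefault(zip_code, []).append(acc)
    return {k: fn.extract_output(fn.merge_accumulators(v)) for k, v in partials.items()}


events = [
    ("10001", {"user_id": "alice"}),
    ("94105", {"user_id": "alice"}),
    ("10001", {"user_id": "bob"}),
    ("10001", {"user_id": "alice"}),
]
print(combine_per_key(DistinctUserIds(), events))
# {'10001': ['alice', 'bob'], '94105': ['alice']}
```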
Thanks!
… DoFn implementation? – Nick_Kh