I'm designing a pipeline with the following functionality:
- Read events from different Pub/Sub topics, providing me with objects from which I can extract a StrId (String)
- Load a mapping table from Bigtable with KV<StrId, IntId>, where IntId is a unique integer
- Look up the StrId in that mapping:
  - If StrId is found, return the corresponding IntId
  - If StrId is not found, generate a new IntId sequentially, add it to the mapping and also write it to Bigtable
- Pass the object and IntId downstream
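
To make the intended lookup-or-assign step concrete, here is a minimal sketch in plain Python. It is not Beam or Bigtable code: `InMemoryIdStore`, `get_or_create`, `assign_int_ids` and the `"str_id"` event key are hypothetical names chosen for illustration, and the in-process lock only stands in for whatever atomic operation the shared store would have to provide.

```python
import itertools
import threading
from typing import Dict, Iterable, Iterator, Tuple


class InMemoryIdStore:
    """Hypothetical stand-in for the persistent StrId -> IntId table."""

    def __init__(self) -> None:
        self._mapping: Dict[str, int] = {}
        self._next_id = itertools.count(1)  # sequential IntIds, starting at 1
        self._lock = threading.Lock()       # assumption: stands in for an atomic store operation

    def get_or_create(self, str_id: str) -> int:
        # Lookup and insert happen under one lock, so two callers can
        # never assign different IntIds to the same StrId.
        with self._lock:
            int_id = self._mapping.get(str_id)
            if int_id is None:
                int_id = next(self._next_id)
                self._mapping[str_id] = int_id
            return int_id


def assign_int_ids(
    events: Iterable[dict], store: InMemoryIdStore
) -> Iterator[Tuple[dict, int]]:
    """Yield each event together with the IntId for its StrId."""
    for event in events:
        yield event, store.get_or_create(event["str_id"])


store = InMemoryIdStore()
events = [{"str_id": "a"}, {"str_id": "b"}, {"str_id": "a"}]
print([int_id for _, int_id in assign_int_ids(events, store)])  # [1, 2, 1]
```

The key point of the sketch is that "look up" and "generate if missing" form a single atomic step. With several workers, that atomicity would have to come from the shared store itself rather than from a lock inside one process.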
I'm wondering whether the state approach would fit my needs here, and whether Bigtable is the right storage technology to use. The mapping between StrId and IntId would have to be persisted across all workers in order to keep IntIds unique.
Also, any links to code examples would be greatly appreciated. I'm aware of this Stack Overflow question and this blog post.
(For the downstream calculations, I need integer Ids, so there's no way around that)