I am having trouble understanding how we are supposed to test our pipeline on Google Dataflow (based on Apache Beam) with the Python SDK.
https://beam.apache.org/documentation/pipelines/test-your-pipeline/ https://cloud.google.com/dataflow/pipelines/creating-a-pipeline-beam
The above links are ONLY for Java. I am pretty confused as to why Google would point to Java-only Apache Beam testing docs for this.
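For reference, this is the kind of unit test I was hoping to be able to write in Python. It is only a sketch based on my reading of the apache_beam.testing module (TestPipeline, assert_that, equal_to), so I may be misusing it:

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def test_cogroupbykey():
    with TestPipeline() as p:  # runs on the local test runner when the block exits
        # Small in-memory stand-ins for my two BigQuery reads.
        dwsku = p | 'CreateDwsku' >> beam.Create([('sku1', 'a'), ('sku2', 'b')])
        product = p | 'CreateProduct' >> beam.Create([('sku1', 'x')])

        joined = {'dwsku': dwsku, 'product': product} | beam.CoGroupByKey()

        # Each output should be (key, {'dwsku': [...], 'product': [...]}).
        assert_that(joined, equal_to([
            ('sku1', {'dwsku': ['a'], 'product': ['x']}),
            ('sku2', {'dwsku': ['b'], 'product': []}),
        ]))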
I want to be able to view the results of a CoGroupByKey join on two PCollections. I am coming from a Python background and have little to no experience with Beam/Dataflow.
Could really use any help. I know this is open-ended to an extent, but basically I need a way to view results from within my pipeline, and not knowing how is preventing me from seeing the output of my CoGroupByKey join.
Code Below
# dwsku and product are PCollections read from BigQuery.
# product contains nested values; dwsku does not.
d1 = {'dwsku': dwsku, 'product': product}
results = d1 | beam.CoGroupByKey()
print results
What is printed:
PCollection[CoGroupByKey/Map(_merge_tagged_vals_under_key).None]
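From what I can tell, the PCollection is just a deferred handle, so print only shows the transform node rather than the data. Is something like the sketch below (mapping a print over the joined collection and letting the default DirectRunner execute it) the intended way to actually see the elements? The small Create inputs are just stand-ins for my BigQuery reads:

import apache_beam as beam

def debug_print(element):
    # Side-effect print of each joined record as it flows through the pipeline.
    print(element)
    return element

with beam.Pipeline() as p:  # DirectRunner by default; pipeline runs when the block exits
    dwsku = p | 'dwsku' >> beam.Create([('sku1', 'a')])
    product = p | 'product' >> beam.Create([('sku1', 'x')])

    results = {'dwsku': dwsku, 'product': product} | beam.CoGroupByKey()
    results | beam.Map(debug_print)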