I have several functions (or transforms): func1, func2, func3, ...
and a dict that holds the functions
FUNCS = {
'1': func1,
'2': func2,
...
}
What I'm thinking about is to pass a parameter funcs
, that accepts a string of integers, and use for
to loop through funcs
, and execute the functions.
For example:
say I pass funcs="1321"
, then the functions are executed as:
with beam.Pipeline as p:
lines = (
p
| "read file" >> beam.io.ReadFromText('gs://some/inputData.txt')
)
for f in funcs: # 1321
lines = lines | FUNCS[f](#some other params)
and the functions are executed in order: func1, func3, func2, func1.
Is there any difference compared with:
with ...
lines = ...
lines = lines | func1 | func3 | func2 | func1
It's possible, I think; but is this even a good idea? Will there be any disadvantage about the parallel things of beam?
The true question is:
Is the pipeline get built first, THEN get executed?
Will the
for
loop and hard coded steps above end up with same pipeline? What effects doesfor
loop have on efficiency and final result?
I'm using flex template of Google Dataflow btw.