Dataflow will eventually optimize the machine type for you. In the meantime here are some scenarios I can think of where you might want to change the machine type.
If your ParDO operation needs a lot of memory you might want to change the machine type to one of the high memory machines that Google Compute Engine provides.
Optimizing for cost and speed. If your CPU utilization is less than 100% you could probably reduce the cost of your job by picking a machine with fewer CPUs. Alternatively, if you increase the number of machines and reduce the number of CPUs per machine (so total CPUs stays approximately constant) you can make your job run faster but cost approximately the same.
Can you please elaborate more on what type of system failures you are seeing? A large class of failures (e.g. VM interruptions) are probabilistic so you would expect to see a larger absolute number of failures as the number of machines increases. However, failures like VM interruptions should be fairly rare so I'd be surprised if you noticed an increase unless you were using order of magnitude more VMs.
On the other hand, its possible you are seeing more failures because of resource contention due to the increased parallelism of using more machines. If that's the case we'd really like to know about it to see if this is something we can address.