The answer here depends mainly on how parallel the machine is (how many cores it has) and how parallel the algorithm is. In most cases a single CPU core is far faster than the network connection and could easily saturate it on its own, so on a typical system option (1) will give significantly better performance and a lower drop rate.
This is because sharing a UDP socket between several threads or processes carries significant overhead: the OS has to lock internally to ensure that packet contents are not interleaved and corrupted. That locking costs performance and noticeably raises the chance of packet loss, since while threads contend for the socket the kernel can run out of buffer space and simply discard your pending packets.
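To make option (1) concrete, here is a minimal sketch of a single-threaded receiver in Python: one socket, one consumer loop, so there is no cross-thread contention on the socket at all. The loopback sender and the packet contents are purely illustrative.

```python
import socket

# Option (1): one socket drained by a single thread.
# Binding to port 0 lets the OS pick any free port; a real
# server would bind a fixed, well-known port instead.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
addr = recv_sock.getsockname()

# Stand-in for a remote sender, just for this demonstration.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(3):
    send_sock.sendto(f"packet-{i}".encode(), addr)

# The receive loop: a single consumer, so the kernel never has to
# arbitrate between competing readers on this socket.
received = []
for _ in range(3):
    data, _ = recv_sock.recvfrom(2048)
    received.append(data.decode())

print(received)
```

In a real application the body of the loop would hand each datagram off to a processing queue; the key point is that only one thread ever calls `recvfrom` on the socket.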
In the extreme case where your cores are very slow and your connection extremely fast (say, a 500-core supercomputer with a 10-100 Gbit fibre connection), option (2) could become more feasible: the connection would be fast enough to keep many cores busy without them tripping over each other, so lock contention would be rarer. This will *not* increase reliability (it may even decrease it slightly), but it might increase throughput, depending on your architecture.
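If you do go down the multi-worker road, the usual way to avoid the shared-socket locking problem on Linux (kernel 3.9+) is `SO_REUSEPORT`: each worker opens its *own* socket bound to the same port, and the kernel load-balances incoming flows across them. This is a swapped-in technique rather than anything from the question itself, and it is Linux-specific; the sketch below just shows the socket setup, with the worker threads omitted.

```python
import select
import socket

def make_worker_socket(port: int) -> socket.socket:
    """One socket per worker, all bound to the same port (Linux 3.9+)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    return s

# The first worker picks a free port; the second joins the same port.
w1 = make_worker_socket(0)
port = w1.getsockname()[1]
w2 = make_worker_socket(port)

# Two distinct sender sockets create two flows, which the kernel may
# hash onto different worker sockets.
for _ in range(2):
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"hello", ("127.0.0.1", port))
    sender.close()

# Drain whichever worker socket(s) the datagrams landed on.
received = 0
while received < 2:
    ready, _, _ = select.select([w1, w2], [], [], 2.0)
    if not ready:
        break  # timed out; should not happen over loopback
    for s in ready:
        s.recvfrom(2048)
        received += 1

print(received)
```

In production each worker thread would own one of these sockets and loop on `recvfrom` independently, so no two threads ever touch the same socket.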
Overall, in nearly every case I would suggest option (1). If you really do face an extreme-throughput situation you should look into other methods, and if you are writing software for that class of system you would probably benefit from some more general training in massively parallel systems.
I hope this helps; if you have any questions, please leave a comment.