What are the most common uses for distributed computing?

Question

I wrote a very simple distributed computing platform (based on the Map/Reduce paradigm), and I'm in the process of writing some demos and showcases. I have a very small team and have to prioritize which demos I'll write first.

To prioritize, I need to rank the demos with about 70% of the weight on being a relevant, common, significant use case of distributed computing and 30% on being easy to write.

So far I have it ordered like this:

1. Discovering pi digits with Monte Carlo
2. Numerical integration with Monte Carlo
3. Large matrix multiplication (dense matrices)
4. Linear regressions
5. Large matrix inversion
6. Multiple regressions
7. Sorting
8. Clustering (K-Means)
9. Clustering (Hierarchical)

Number 1 is on the list because it took 10 minutes to write, although it's completely useless (I'm not sure, but I figure there aren't a lot of people trying to find more digits of pi).
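For readers unfamiliar with item 1, here is a minimal single-process sketch of how a Monte Carlo pi estimate splits into a map step and a reduce step. It is not the platform's code: `map_task`, `reduce_counts`, and `estimate_pi` are hypothetical names, and Python's built-in `map` and `functools.reduce` stand in for whatever the platform actually distributes across nodes.

```python
import random
from functools import reduce


def map_task(task):
    """Map step: count random points that fall inside the unit quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # one independent, reproducible stream per task
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, n_samples


def reduce_counts(a, b):
    """Reduce step: add up (hits, samples) pairs from two partial results."""
    return a[0] + b[0], a[1] + b[1]


def estimate_pi(n_tasks=8, samples_per_task=50_000):
    # Each task is (seed, sample count); tasks share no state, so they could
    # run on separate workers (assumption: here they run in one process).
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    hits, total = reduce(reduce_counts, map(map_task, tasks))
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / total


if __name__ == "__main__":
    print(estimate_pi())
```

The tasks are independent and the reduce is a single addition per task, which is why this fits an embarrassingly parallel platform. The statistical error only shrinks in proportion to one over the square root of the sample count, so it is a poor way to actually find digits.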

Due to the nature of my platform, it will shine most on problems that are embarrassingly parallel, and not I/O-bound or reduce-dominated.

How would you change my list? What would you add to it? Is sorting useful at all in the enterprise world or is it only for benchmarking distributed computing platforms?

High Performance Mark · Accepted Answer · 2012-08-21T07:06:43

Your list suggests that you are not distinguishing between parallel computing and distributed computing. This is not necessarily wrong, but someone looking for a demonstration of the excellence of a distributed computing platform might be left tepidly enthused upon seeing parallel computations, such as your items 2-5, being performed.

Sorting is certainly useful everywhere there is data: large enterprises, small enterprises, in your desk drawers, across the Googlesphere. So too is searching, which is a surprising omission from your list. The other omission which strikes me immediately is any sort of data fusion, merging large datasets to get information from their intersections beyond what can be extracted from the datasets individually.
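One possible reading of "data fusion" in Map/Reduce terms is a reduce-side join, where records from two datasets are tagged with their source, grouped by a shared key, and paired in the reducer. The sketch below is an illustration under that assumption, not a method prescribed by this answer; all function names are hypothetical, and the in-memory `shuffle` stands in for the grouping a real platform would do across nodes.

```python
from collections import defaultdict


def map_tagged(source, records):
    """Map step: emit (key, (source, value)) so the reducer can tell datasets apart."""
    for key, value in records:
        yield key, (source, value)


def shuffle(pairs):
    """Group emitted values by key (assumption: done in memory, not across nodes)."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_join(key, tagged_values):
    """Reduce step: pair every left value with every right value for one key."""
    left = [v for s, v in tagged_values if s == "left"]
    right = [v for s, v in tagged_values if s == "right"]
    return [(key, l, r) for l in left for r in right]


def join(left_records, right_records):
    pairs = list(map_tagged("left", left_records))
    pairs += list(map_tagged("right", right_records))
    result = []
    for key, values in sorted(shuffle(pairs).items()):
        result.extend(reduce_join(key, values))
    return result


if __name__ == "__main__":
    # Made-up sample data: customer names and orders keyed by customer id.
    customers = [(1, "Ada"), (2, "Grace")]
    orders = [(2, "disk"), (3, "cable"), (2, "fan")]
    print(join(customers, orders))
```

Only keys present in both datasets produce output, which is the "intersection" the paragraph above refers to. Note that a join like this moves a lot of data through the shuffle, so it exercises a platform's I/O and reduce paths far more than the Monte Carlo items do.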
