There is a large and repetitive Hough transform in a piece of code I'm vaguely attached to. The maintainer of that part of the code has been experimenting with a sparse array for the accumulator (actually, a C++ std::map keyed on the cell index, if I understood his presentation right) with some success.
I presume the speed-up is related to cache locality, and it certainly depends on the data being sparse.
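To give a feel for the idea (this is my own sketch, not the code in question; the binning scheme and names are made up):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Sketch of a sparse 2D Hough accumulator over (theta, rho) bins.
// The cell index is flattened into a single map key.
class SparseHoughAccumulator {
public:
    SparseHoughAccumulator(int nTheta, int nRho, double rhoMax)
        : nTheta_(nTheta), nRho_(nRho), rhoMax_(rhoMax) {}

    // Vote along the sinusoid rho = x*cos(theta) + y*sin(theta) of point (x, y).
    void vote(double x, double y) {
        for (int t = 0; t < nTheta_; ++t) {
            double theta = t * kPi / nTheta_;
            double rho = x * std::cos(theta) + y * std::sin(theta);
            int r = static_cast<int>((rho + rhoMax_) / (2 * rhoMax_) * nRho_);
            if (r < 0 || r >= nRho_) continue;
            // Only cells that actually receive votes are ever allocated.
            ++cells_[static_cast<std::int64_t>(t) * nRho_ + r];
        }
    }

    // Return (theta bin, rho bin) pairs whose vote count reaches a threshold.
    std::vector<std::pair<int, int>> peaks(int threshold) const {
        std::vector<std::pair<int, int>> out;
        for (const auto& [key, count] : cells_) {
            if (count >= threshold)
                out.emplace_back(static_cast<int>(key / nRho_),
                                 static_cast<int>(key % nRho_));
        }
        return out;
    }

private:
    int nTheta_, nRho_;
    double rhoMax_;
    std::map<std::int64_t, int> cells_;  // flattened cell index -> vote count
};
```

The map only stores cells that actually receive votes, which is where both the memory saving and (presumably) the friendlier cache behaviour come from when the data is sparse.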
Update: The software referenced above is intended to serve many particle-physics experiments, but it was originally used on a test-bed project (i.e. small scale). As we've gotten serious about doing larger projects and started running Monte Carlo for them, the Hough transform has become a bit of a bottleneck again, even with the sparse matrix.
As yet we do not have a solution, but one of my colleagues found Gandalf, which includes a "fast Hough transform" that appears to evaluate the transform in a quad-tree-like way (in 2D; presumably you'd use an oct-tree in 3D) to reduce the order of work. We're probably going to experiment with it.
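For what it's worth, the coarse-to-fine idea looks roughly like this (again my own sketch, assuming a vote-count threshold for pruning; I haven't read Gandalf's actual implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// A rectangular cell of (theta, rho) parameter space.
struct Cell { double t0, t1, r0, r1; };

// Does the sinusoid rho(theta) = x*cos(theta) + y*sin(theta) of point (x, y)
// pass through this cell? Written as R*cos(theta - phi), its extrema on
// [t0, t1] are at the endpoints or where theta == phi (mod pi).
static bool intersects(double x, double y, const Cell& c) {
    const double R = std::hypot(x, y);
    const double phi = std::atan2(y, x);
    double lo = std::min(R * std::cos(c.t0 - phi), R * std::cos(c.t1 - phi));
    double hi = std::max(R * std::cos(c.t0 - phi), R * std::cos(c.t1 - phi));
    for (double ext : {phi, phi + kPi, phi - kPi}) {  // interior extrema
        if (ext > c.t0 && ext < c.t1) {
            lo = std::min(lo, R * std::cos(ext - phi));
            hi = std::max(hi, R * std::cos(ext - phi));
        }
    }
    return hi >= c.r0 && lo <= c.r1;
}

// Recursively quarter cells, pruning any whose vote count falls below the
// threshold; surviving leaf cells at the target depth are candidate lines.
void fastHough(const std::vector<std::pair<double, double>>& pts,
               const Cell& cell, int threshold, int depth,
               std::vector<Cell>& out) {
    int votes = 0;
    for (const auto& [x, y] : pts)
        if (intersects(x, y, cell)) ++votes;
    if (votes < threshold) return;           // prune this whole subtree
    if (depth == 0) { out.push_back(cell); return; }
    const double tm = 0.5 * (cell.t0 + cell.t1);
    const double rm = 0.5 * (cell.r0 + cell.r1);
    for (const Cell& child : {Cell{cell.t0, tm, cell.r0, rm},
                              Cell{tm, cell.t1, cell.r0, rm},
                              Cell{cell.t0, tm, rm, cell.r1},
                              Cell{tm, cell.t1, rm, cell.r1}})
        fastHough(pts, child, threshold, depth - 1, out);
}
```

Starting from a single root cell covering theta in [0, pi) and rho in [-rhoMax, rhoMax], whole subtrees of empty parameter space get discarded at coarse resolution, which is where the reduction in the order of work comes from.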
Further update: A colleague eventually implemented a progressive, probabilistic Hough transform in our code, and it currently seems to be the fastest version we've got. It works best if you don't require that every point gets assigned to a line.
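Our code isn't built on OpenCV, but cv::HoughLinesP is a readily available implementation of the same progressive probabilistic transform if you want to try the idea before rolling your own (the parameter values here are illustrative only):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect line segments in a binary edge image with OpenCV's progressive
// probabilistic Hough transform (Matas, Galambos & Kittler).
std::vector<cv::Vec4i> detectSegments(const cv::Mat& binaryEdges) {
    std::vector<cv::Vec4i> segments;  // each entry: x1, y1, x2, y2
    cv::HoughLinesP(binaryEdges, segments,
                    1.0,             // rho resolution in pixels
                    CV_PI / 180.0,   // theta resolution in radians
                    50,              // accumulator vote threshold
                    30.0,            // minimum segment length
                    5.0);            // maximum gap bridged within a segment
    return segments;
}
```

It votes with randomly sampled points and removes a point's votes once it has been claimed by a detected segment, so it can stop early; that's also why it pairs naturally with not insisting that every point end up on some line.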