Most clustering algorithms require a distance matrix. Building one is easy when the data is low-dimensional, but what about time series with around 8,000 points each?
# Fill the pairwise distance matrix: O(n^2) distance calls for n series
for i in range(total_series):
    for j in range(total_series):
        dis[i][j] = distance(series[i], series[j])
Clearly, the minimum time required to build this matrix is of order O(n^2) in the number of series. On top of that, each distance computation compares all ~8,000 points of the two series, so the total cost becomes very high. I am only talking about an aligned distance (Euclidean) here, not some edit distance.
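For concreteness, here is a minimal sketch of the aligned, point-by-point Euclidean distance I have in mind (assuming numpy and equal-length series; the function name distance just matches the loop above). It is still O(length of the series) per pair:

import numpy as np

def distance(a, b):
    # Aligned, point-by-point Euclidean distance between two equal-length series.
    # Illustrative helper for the loop above; still O(length) per call.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))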
Since we have around 50,000 time series to cluster, O(n^2) means roughly 1.25 billion pairs just for those for loops. I need to compute the distance function in minimal time through some indexing or pre-processing technique. Note that the distance function is a point-by-point comparison.
Can somebody suggest techniques to compute the distance between two time series in less than O(length of the time series) through some pre-processing? Or suggest methods to cluster without building the full distance matrix with O(n^2) time complexity?