I need your expertise:
I am about to implement a graph class in c++ and thinking about the right representation. The graphs are simple and undirected. Number of vertices get for now just up to 1000 but maybe higher in the future. Number of edges up to 200k and maybe higher. Each vertex got a color (int) and an id (int). Edges transport no more information than connecting to vertices.
I store the graph and just need to access if x and y are connected or not - this I need very often. After initialising i never remove or add new vertices or edges (N = Number of vertices and M=number of edges given from the start)!
The one representation which is already available to me:
An adjacency list rolled out into just one long list. Along with this representation goes an array with starting indices for each vertex. Storage O(2M) and check if edge between x and y in an average of O(n/m)
A representation I thought of:
the idea is to, instead of rolling out the adjacency list into one array, do it with the adjacency matrix. So storage O(N^2)? Yes but I want to store an edge in one bit except of one byte.(actually 2 bits symmetricallywise) Example: Say N=8, then create an vector<uint8_t> of length 8 (64 bit). Init each entry on 0. If there is an edge between vertex 3 and vertex 5, then add pow(2,5) to the entry of my vector belonging to vertex 3 and symmetrically. So there is a 1 in the entry of vertex 3 at position of vertex 5 exactly when there is an edge between 3 and 5. After inserting my graph into this data structure I think one should be able to access neighborhood in constant time by just a binary operation: Are 3 and 5 connected? Yes if v[3] ^ pow(2,5) == 0. When there are more vertices than 8, then every vertex needs to get more than one entry in the vector and I need to perform one modulo and one division operation for accessing the correct spot.
What do you think of the second solution - is it maybe already known and in use? Am I wrong by thinking about an access of O(1)? Is it to much effort for no real performance improvement?
The reason for loading both representations in one big list is due to cache improvements I was told.
I am happy to get some feedback on this idea. I might be way off - pls be kind in that case :D