0
votes

I am hierarchically clustering gene expression data. My result data is shaped like dendrogram. I want to keep the whole tree in some data structure in python and do some calculations in each node (I think recursively). For every node I know the genes in there and some extra information (GO, p-values etc..) Do you have any suggestions on how to store this kind of data in python in a way that I can traverse the whole tree? My first thought was a list of dictionaries:

clusters=[{'id': 1, 'cluster': [gen1, gen2,...], 'size': ... , 'ChildIDs': ... , 'ParentID': ... , 'distance': ..., 'score': ...}, {'id': 2, ...}, ... ]

But since the clusters are nested, then storing the genes for every cluster is not efficient, I think.

If anyone has a better idea how to keep this kind of info, I would appreciate it:)

1
You could maybe take a look to this implementation of hierarchical clustering. It stores the information in a Dendogram class. If you look at the method show of the class Dendogram you might get an idea of what it is doing. I don't know how many genes you're talking about, probably a lot, and how efficient this implementation will be... Hope it helps anyway.lrnzcig

1 Answers

0
votes

Short of writing your own datastructure, collections.defaultdict might be appropriate for a tree. See here.