1
votes

I am trying to save a large dendrogram made from a large table (10000+ rows, 18 columns), and I came up with this code:

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import pandas as pd

data = pd.read_csv("Input.txt", header = 0, index_col = None,\
               sep = "\t", memory_map = True)
data = data.fillna(0)
Matrix = data.iloc[:,-18:]

Linkage_Matrix = linkage (Matrix, "ward")
fig=plt.figure(figsize=(20, 200))
#fig, ax = plt.subplots(1, 1, tight_layout=False)
ax = fig.add_axes([0.1,0.1,0.75,0.75])
#fig.title('Hierarchical Clustering Dendrogram')
ax.set_title("Hierarchical Clustering Dendrogram")
ax.set_xlabel("distance")
ax.set_xlabel("name")
dendrogram(
    Linkage_Matrix,
    orientation ="left",
    leaf_rotation=0., 
    leaf_font_size=12.,  
    labels = list(data.loc[:,"name"])
)    
ax.set_yticklabels(list(data.loc[:,"name"]), minor=False)
ax.yaxis.set_label_position('right')
ax.yaxis.tick_right()

plt.savefig("plt1.png", dpi = 320, format= "png", bbox_inches=None)

Unfortunately, the saved figure does not include the axes, even though I left some space for them as shown in these questions:
Matplotlib savefig does not save axes
Why is my xlabel cut off in my matplotlib plot?
Matplotlib savefig image trim
Plotting hierarchical clustering dendrograms for large data sets
Dendrogram generated by scipy-cluster customisation

I get a correct display in the console, which I can save, but the dpi is not good, and ideally I would also like to switch to SVG so that I can adjust the level of readability afterwards.

Any insights would be greatly appreciated.

1
What exactly do you mean by "it doesn't save the axis"? - ImportanceOfBeingErnest
Exactly what it says: the axis and the elements that depend on it, such as ticks and labels, are absent from the saved figure. - Ando Jurai
But is the dendrogram itself shown correctly, just without axis spines, ticks and labels, or is the plot completely empty? Could you provide a minimal reproducible example that can be copied and run to reproduce the problem? Just looking at the code, I do not see what's wrong. - ImportanceOfBeingErnest

1 Answer

3
votes

Removing this line

ax = fig.add_axes([0.1,0.1,0.75,0.75])

and setting bbox_inches='tight' in plt.savefig() makes it work for me.
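To make that change easier to try without the original file, here is a minimal sketch of the same two steps. It assumes synthetic random data in place of Input.txt, made-up row names, and a small figure size, and it writes to an in-memory buffer instead of plt1.png; none of that comes from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Synthetic stand-in for the real table (assumption): 30 rows, 18 columns.
rng = np.random.default_rng(0)
values = rng.normal(size=(30, 18))
names = [f"row{i}" for i in range(30)]  # hypothetical labels

link_matrix = linkage(values, "ward")

# Let plt.subplots place the axes instead of calling fig.add_axes by hand.
fig, ax = plt.subplots(figsize=(6, 8))
result = dendrogram(link_matrix, orientation="left", labels=names, ax=ax)
ax.set_xlabel("distance")
ax.set_ylabel("name")

# bbox_inches="tight" makes savefig work out the saved area from the drawn
# artists, which includes the tick labels and axis labels.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100, bbox_inches="tight")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Passing a file name instead of the buffer should behave the same way; the buffer only keeps the sketch free of side effects.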

Also, since you are loading the data with pandas, you can declare the 'name' column as the index and use the index values as labels.
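Here is the index idea in isolation, as a small sketch. It assumes a tiny made-up tab-separated table with hypothetical columns a and b, read from a string rather than from the real file.

```python
import io

import pandas as pd

# Hypothetical tab-separated input standing in for the real file (assumption).
raw = "name\ta\tb\nalpha\t1.0\t2.0\nbeta\t\t4.0\ngamma\t5.0\t6.0\n"

# index_col="name" moves the names out of the numeric columns and into the index.
data = pd.read_csv(io.StringIO(raw), header=0, index_col="name", sep="\t")
data = data.fillna(0)  # the missing value for "beta" becomes 0

# The index values can be passed straight to dendrogram(labels=...).
labels = data.index.values
```

With the names in the index, the frame passed to linkage contains only the numeric columns. The full script that follows applies the same idea to the real file.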

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import pandas as pd


# Use the 'name' column as the index so it stays out of the numeric data
data = pd.read_csv('input.txt', header=0, index_col=['name'], sep="\t")
data = data.fillna(0)

link_matrix = linkage(data, 'ward')
# Let subplots create the axes instead of calling fig.add_axes
fig, ax = plt.subplots(1, 1, figsize=(20, 200))
ax.set_title('Hierarchical Clustering Dendrogram')
ax.set_xlabel('distance')
ax.set_ylabel('name')
dendrogram(
    link_matrix,
    orientation='left',
    leaf_rotation=0.,
    leaf_font_size=12.,
    labels=data.index.values,
    ax=ax  # draw on the axes created above
)
# Put the leaf labels and the y-axis label on the right-hand side
ax.yaxis.set_label_position('right')
ax.yaxis.tick_right()
# bbox_inches='tight' keeps the tick labels and axis labels in the saved file
plt.savefig('plt1.png', dpi=320, format='png', bbox_inches='tight')
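On the SVG part of the question: savefig also accepts format='svg', so the same figure can be written as a vector image. The sketch below is only an illustration, not something I ran on the 10000-row table. It assumes synthetic random data and a small figure, and it writes to an in-memory buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Synthetic stand-in for the real table (assumption): 20 rows, 18 columns.
rng = np.random.default_rng(1)
values = rng.normal(size=(20, 18))
link_matrix = linkage(values, "ward")

fig, ax = plt.subplots(figsize=(6, 8))
dendrogram(link_matrix, orientation="left", ax=ax)

# format="svg" writes vector output; lines and text stay sharp when zoomed.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg", bbox_inches="tight")
svg_text = svg_buffer.getvalue().decode("utf-8")
plt.close(fig)
```

For the real script, replacing the last line with plt.savefig('plt1.svg', format='svg', bbox_inches='tight') should be the equivalent change. How quickly a viewer opens an SVG with that many leaves is something you would need to check.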