I am using "plt.subplots(2, 2, sharex=True, sharey=True)" to draw a 2x2 grid of subplots. Each subplot has two Y axes and shows a normal distribution curve over a histogram. Note that I deliberately set "sharex=True, sharey=True" so that all subplots share the same X axis and Y axis.
After running my code, everything is fine except in the second, third, and fourth subplots, where the normal distribution curve doesn't fit the histogram very well (please see the figure here).
I searched online but could not solve this issue. However, if I set "sharex=True, sharey=False" in my code, the figure looks correct, but each subplot then uses its own Y axis, which isn't what I want. Please see the figure here.
I hope this issue can be fixed by the experts on StackOverflow. Many thanks in advance!
Below is my code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

def align_yaxis(ax1, v1, ax2, v2):
    # adjust ax2 ylimit so that v2 in ax2 is aligned to v1 in ax1
    _, y1 = ax1.transData.transform((0, v1))
    _, y2 = ax2.transData.transform((0, v2))
    inv = ax2.transData.inverted()
    _, dy = inv.transform((0, 0)) - inv.transform((0, y1-y2))
    miny, maxy = ax2.get_ylim()
    ax2.set_ylim(miny+dy, maxy+dy)

def drawSingle(myax, mydf, title, offset):
    num_bins = 200
    xs = mydf["gap"]
    x = np.linspace(-1, 1, 1000)
    mu = np.mean(x)
    sigma = np.std(xs)
    # histogram of raw counts on the left Y axis
    n, bins, patche = myax.hist(xs, num_bins, alpha=0.8, facecolor='blue', density=False)
    myax.set_ylabel('frequency', color="black", fontsize=12, weight="bold")
    myax.set_xlabel('X', fontsize=12, weight="bold", horizontalalignment='center')
    # normal curve on a second Y axis
    ax_twin = myax.twinx()
    y_normcurve = norm.pdf(bins, mu, sigma)
    ax_twin.plot(bins, y_normcurve, 'r--')
    align_yaxis(myax, 0, ax_twin, 0)
    peakpoint = norm.pdf(mu, loc=mu, scale=sigma)
    plt.vlines(mu, 0, peakpoint, 'y', '--', label='example')
    ax_twin.set_ylabel("probability density", color="black", fontsize=12, weight="bold")

def drawSubplots(mydf1, mydf2, mydf3, mydf4, pos1, pos2, pos3, pos4, title, filename):
    plt.rcParams['figure.figsize'] = (18, 15)
    my_x_ticks = np.arange(-0.8, 0.8, 0.1)
    rows, cols = 2, 2
    fig, ax = plt.subplots(2, 2, sharex=True, sharey=True)
    drawSingle(ax[0][0], mydf1, "Subplot1", pos1)
    drawSingle(ax[0][1], mydf2, "Subplot2", pos2)
    drawSingle(ax[1][0], mydf3, "Subplot3", pos3)
    drawSingle(ax[1][1], mydf4, "Subplot4", pos4)
    plt.text(-1, -1, title, horizontalalignment='center', fontsize=18)
    plt.show()

drawSubplots(df1, df2, df3, df4, 3.2, 3.1, 2.7, 2.85, "test9", "test9")
With sharey=True, the histograms of dataframes with fewer rows will be smaller. If you want those to have a similar height, you need to normalize the heights with hist(..., density=True) (this scales their area to 1). – JohanC
density=True is needed to properly fit the normal curve. Otherwise, multiply y_normcurve by len(xs) and by the bin width (y_normcurve*len(xs)*(bins[1]-bins[0])) so that the areas are equal. Better to leave out the twinx() and plot everything on the same ax. Also, mu=np.mean(xs) is more appropriate than mu=np.mean(x). Finally, the curve would look better with y_normcurve = norm.pdf(x, mu, sigma), drawn as myax.plot(x, ....) – JohanC
With density=True, the y-axis will be the height of the "probability density function". Note that "frequency" is only a useful measure if you have a well-defined bin width. If you use bins=200, the bin width will be (xs.max() - xs.min()) / 200, which is different in the 4 plots. Multiplying (y_normcurve*len(xs)*(bins[1]-bins[0])) is only needed if you use density=False. – JohanC
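The suggestions in the comments above can be put together roughly as follows. This is only a minimal sketch, not a tested fix for the original data: the names draw_single and draw_subplots are illustrative, the synthetic samples are an assumed stand-in for the "gap" columns of df1 to df4, and evaluating the curve over the data range (rather than over -1 to 1) is a choice made here for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm


def draw_single(ax, xs, title, num_bins=200):
    """Draw a density-normalized histogram and a normal curve on the same axis."""
    xs = np.asarray(xs, dtype=float)
    mu = xs.mean()      # mean of the data, not of the plotting grid
    sigma = xs.std()
    # density=True rescales the bars so the histogram area is 1,
    # which is the same scale norm.pdf uses; no twinx() is needed.
    heights, edges, _ = ax.hist(xs, bins=num_bins, density=True,
                                alpha=0.8, facecolor="blue")
    # Evaluate the curve on a fine grid instead of on the bin edges.
    x = np.linspace(xs.min(), xs.max(), 1000)
    ax.plot(x, norm.pdf(x, mu, sigma), "r--")
    # Mark the mean, up to the peak of the fitted curve.
    ax.vlines(mu, 0, norm.pdf(mu, mu, sigma), colors="y", linestyles="--")
    ax.set_title(title)
    return heights, edges


def draw_subplots(samples, titles):
    """Draw four samples on a 2x2 grid that shares both axes."""
    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(18, 15))
    results = [draw_single(ax, xs, t)
               for ax, xs, t in zip(axes.flat, samples, titles)]
    for ax in axes[:, 0]:
        ax.set_ylabel("probability density")
    for ax in axes[-1, :]:
        ax.set_xlabel("X")
    return fig, results


# Assumed synthetic data with different row counts, in place of df1..df4.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, s, n)
           for s, n in [(0.2, 5000), (0.25, 800), (0.15, 300), (0.3, 1500)]]
fig, results = draw_subplots(samples, ["Subplot1", "Subplot2", "Subplot3", "Subplot4"])
```

Calling plt.show() afterwards displays the figure. If the y-axis should stay in raw counts (density=False), the alternative from the comments is to multiply the curve by len(xs) * (edges[1] - edges[0]) instead.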