
I'm trying to create a simple scatter plot with metrics data I collect from my experiments. Each day, I test multiple experimental samples and the number of samples varies. I'm trying to create a scatter plot with the days as the x values, and all the experimental values collected on that day as the y values.

I've tried several approaches so far.

I'll spare the full code, but here is an example of what the data looks like:

XVals = ['10-Dec-18', '11-Dec-18']
YVals = [[0.88, 0.78, 0.92, 0.98, 0.91],[0.88, 0.78, 0.92, 0.98]]

Since pyplot wants x and y to have the same dimensions, I tried the following suggestion:

for xe, ye in zip(XVals, YVals):
    plt.scatter([xe] * len(ye), ye)

This raises a ValueError because my x values are strings:

ValueError: could not convert string to float: '10-Dec-18'

I have also tried generating the plot in the following fashion, but again I get an error message because x and y are of different dimensions:

fig, ax = plt.subplots()
ax.scatter(XVals, YVals)
plt.show()

This gives me the obvious error:

ValueError: x and y must be the same size

I haven't been able to find any examples of a similar plot (multiple Y values with categorical X values). Any help would be appreciated!

I would like to mention that this is version dependent. With matplotlib 2.2 or higher the first code runs fine and produces the desired plot. Also, the answer below will only work for matplotlib 2.2 or higher. Which version are you using? Is it an option to update, or are you specifically asking for a solution for your version? - ImportanceOfBeingErnest
Wow! Nice catch. I'm using the Anaconda distribution on my work PC. Interestingly enough, matplotlib 2.0.2 was installed and up to date according to Anaconda. I've updated the package manually to the latest stable release and it's working just fine. Thanks again! - JeremyD
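
For anyone unsure which matplotlib version they have, a quick check along these lines may help. This is only a sketch: the 2.2 threshold is taken from the comment above and has not been verified independently, and `parse_version` and `supports_string_x` are hypothetical helper names, not part of matplotlib.

```python
def parse_version(version):
    """Turn a version string such as '2.0.2' into a tuple of ints, e.g. (2, 0, 2)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each piece ('0rc1' -> '0').
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric pieces such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def supports_string_x(version, minimum=(2, 2)):
    """Return True if `version` is at least `minimum`.

    The (2, 2) default is an assumption taken from the comment above.
    """
    return parse_version(version) >= minimum


try:
    import matplotlib
except ImportError:
    print("matplotlib is not installed")
else:
    print(matplotlib.__version__, supports_string_x(matplotlib.__version__))
```

If the second printed value is False, string x values will likely fail as described in the question, and updating the package (as JeremyD did) is the simplest fix.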

1 Answer


One option is to create flattened lists for the data. The first list, X, will contain the day of each data point. Each day is repeated n times, where n is the number of data points for that day. The second list Y is simply a flattened version of YVals.

import matplotlib.pyplot as plt

XVals = ['10-Dec-18', '11-Dec-18']
YVals = [[0.88, 0.78, 0.92, 0.98, 0.91],[0.88, 0.78, 0.92, 0.98]]

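# Repeat each day's label once per data point recorded on that day
X = [day for day, data in zip(XVals, YVals) for _ in data]
# Flatten the nested list of measurements into a single list
Y = [val for data in YVals for val in data]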

plt.scatter(X, Y)
plt.show()
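
If updating matplotlib is not an option, one possible workaround is to plot against integer positions and relabel the ticks afterwards. The following is a sketch under that assumption rather than part of the original answer; `flatten_groups` and `plot_groups` are hypothetical helper names, and it assumes there is exactly one label per group of y values.

```python
def flatten_groups(groups):
    """Return (positions, values): one integer x position per y value.

    Group i gets x position i, repeated once for every value in that group.
    """
    positions = [i for i, group in enumerate(groups) for _ in group]
    values = [value for group in groups for value in group]
    return positions, values


def plot_groups(labels, groups):
    """Scatter each group at its integer position, then label the ticks."""
    # Imported here so flatten_groups can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    positions, values = flatten_groups(groups)
    fig, ax = plt.subplots()
    ax.scatter(positions, values)
    # Put one tick at each group position and show the date string there.
    ax.set_xticks(list(range(len(labels))))
    ax.set_xticklabels(labels)
    plt.show()


XVals = ['10-Dec-18', '11-Dec-18']
YVals = [[0.88, 0.78, 0.92, 0.98, 0.91], [0.88, 0.78, 0.92, 0.98]]

positions, values = flatten_groups(YVals)
print(positions)  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Calling `plot_groups(XVals, YVals)` then draws the figure. Because the x values are plain integers, this does not depend on matplotlib accepting strings on the x axis.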
