
I have simple x, y data from a CSV file, and I want to plot a linear fit. I followed the example in the first answer to this question: Linear regression with matplotlib / numpy

My code looks like this:

#!/usr/bin/env python
import matplotlib.axes as ax
import matplotlib.pyplot as plt
import numpy as np
import csv
import seaborn
from scipy import stats

x = []
y = []
z = []

with open('Data.csv', 'r') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        x.append(float(row[0]))
        y.append(float(row[2]))

xarray = np.array(x)  # Convert data from csv into arrays
yarray = np.array(y)

m, b = np.polyfit(xarray, yarray, 1)
plt.plot(xarray, yarray, 'b+', m*xarray+b, '--k')
plt.plot(x, y, 'ko')

# this is the array resulting from m*x+b
f = [28.45294177, 61.06207611, 85.51892687, 115.21653136, 143.7495239]

plt.plot(m*xarray+b)
plt.plot(x, f, 'r+')
plt.xlabel('Masse [kg]')
plt.ylabel('Auslenkung[mm]')
ax = plt.gca()
ax.set_xlim([0, 0.3])
plt.title('')
plt.grid(True, linestyle='--')  # enable grid, dashed linestyle

plt.show()

The output is the following graph (image not included).

However, the resulting graph (blue line) is not at all what I expected: the slope is way too small. When I take the values of the array that results from m*x+b and plot them (red pluses), they match the expected linear regression and the actual data.

Honestly, I am at my wits' end here. I can't figure out where my mistake is, and I don't understand where the blue line comes from.

Any help would be greatly appreciated.

Could you please fix your indentation and perhaps also provide the Data.csv file (perhaps copy-paste it here; it looks like it is only 5 points)? – norok2

2 Answers


plt.plot(m*xarray+b) should be plt.plot(xarray, m*xarray+b). Otherwise matplotlib uses range(0, (m*xarray+b).size) for the x axis, as described in the docs (third line here):

>>> plot(x, y)        # plot x and y using default line style and color
>>> plot(x, y, 'bo')  # plot x and y using blue circle markers
>>> plot(y)           # plot y using x as index array 0..N-1 <HERE>
>>> plot(y, 'r+')     # ditto, but with red plusses
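A minimal sketch to check this behaviour, assuming the headless Agg backend and using the five points extracted in the other answer as stand-in data (the real Data.csv is not shown). It draws the fit line both ways and inspects the x values matplotlib actually stored with Line2D.get_xdata().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: values read off the plot in the other answer (assumption: close to Data.csv)
xarray = np.array([5.233e-02, 1.088e-01, 1.507e-01, 2.023e-01, 2.494e-01])
yarray = np.array([3.060e+01, 5.881e+01, 8.541e+01, 1.161e+02, 1.444e+02])

m, b = np.polyfit(xarray, yarray, 1)  # slope and intercept of the least-squares line

fig, ax = plt.subplots()

# Buggy form: only y is given, so x defaults to the indices 0..N-1
(index_line,) = ax.plot(m * xarray + b)

# Fixed form: x and y are passed explicitly
(fit_line,) = ax.plot(xarray, m * xarray + b, "--k")

# Grouped call as in the question: the second group (y, fmt) has no x either
(_, grouped_fit) = ax.plot(xarray, yarray, "b+", m * xarray + b, "--k")

print("index-based x:", index_line.get_xdata())
print("explicit x:   ", fit_line.get_xdata())
print("grouped x:    ", grouped_fit.get_xdata())
```

The same rule appears to apply to the question's grouped call plt.plot(xarray, yarray, 'b+', m*xarray+b, '--k'): matplotlib splits it into (xarray, yarray, 'b+') and (m*xarray+b, '--k'), so the dashed black line is also drawn against 0..4. With ax.set_xlim([0, 0.3]), only the first short stretch of these index-based lines is visible, which would explain why the blue line looks almost flat.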

I extracted data from your plot for analysis. Here is a graphical Python polynomial fitter that uses numpy.polyfit() for fitting and numpy.polyval() for evaluation. You can set the polynomial order at the top of the code. It also draws a scatterplot of the regression error. Replace the hard-coded data in the example with the xarray and yarray data from your CSV file and you should be done.

import numpy, matplotlib
import matplotlib.pyplot as plt

xData = numpy.array([5.233e-02, 1.088e-01, 1.507e-01, 2.023e-01, 2.494e-01])
yData = numpy.array([3.060e+01, 5.881e+01, 8.541e+01, 1.161e+02, 1.444e+02])


polynomialOrder = 1 # example linear equation


# curve fit the test data
fittedParameters = numpy.polyfit(xData, yData, polynomialOrder)
print('Fitted Parameters:', fittedParameters)

# predict a single value
print('Single value prediction:', numpy.polyval(fittedParameters, 0.175))

# Use polyval to find model predictions
modelPredictions = numpy.polyval(fittedParameters, xData)
regressionError = modelPredictions - yData

SE = numpy.square(regressionError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(regressionError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = numpy.polyval(fittedParameters, xModel)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_title('numpy polyfit() and polyval() example') # add a title
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot


def RegressionErrorPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    axes.plot(yData, regressionError, 'D')

    axes.set_title('Regression error') # add a title
    axes.set_xlabel('Y Data') # X axis data label
    axes.set_ylabel('Regression Error') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot



graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
RegressionErrorPlot(graphWidth, graphHeight)
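For the "replace the hard-coded data with your CSV data" step, a minimal sketch, assuming Data.csv has no header row and keeps x in column 0 and y in column 2 (the same columns the question reads with row[0] and row[2]). It swaps the question's csv.reader loop for numpy.loadtxt; load_xy is a hypothetical helper, and the in-memory file is made-up stand-in content (the answer's extracted points plus a filler middle column) because the real file is not shown.

```python
import io
import numpy as np

def load_xy(source):
    """Return (x, y) from column 0 and column 2 of a comma-separated file.

    Assumption: no header row and every value is numeric.
    """
    x, y = np.loadtxt(source, delimiter=",", usecols=(0, 2), unpack=True)
    return x, y

# Hypothetical stand-in for Data.csv: the middle column is filler
fake_csv = io.StringIO(
    "0.05233,0,30.60\n"
    "0.1088,0,58.81\n"
    "0.1507,0,85.41\n"
    "0.2023,0,116.1\n"
    "0.2494,0,144.4\n"
)

xData, yData = load_xy(fake_csv)  # with the real file: load_xy("Data.csv")

fittedParameters = np.polyfit(xData, yData, 1)  # [slope, intercept] for order 1
modelPredictions = np.polyval(fittedParameters, xData)

print("Fitted Parameters:", fittedParameters)
print("Max abs residual:", np.max(np.abs(modelPredictions - yData)))
```

Because the returned names match the answer's xData and yData, those two lines could replace the hard-coded arrays directly.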