
I am analyzing a dataset and want to fit a single linear regression to the whole dataset. The dataset has several dependent variables but only one independent variable.

I have tried simple linear regression with `stat_smooth()` from the ggplot2 package. This gives an individual regression line for each series, but what I want is to combine these regression lines into a single regression line that represents all of them, perhaps as an average of several regression lines.

[Image of the plot]

    library(ggplot2)

    # df: hypothetical name for the data frame holding DateAndTime, T_423, T_422
    ggplot(df) +
      geom_point(aes(x = DateAndTime, y = T_423), na.rm = TRUE, color = "purple", shape = 19, size = 3) +
      geom_point(aes(x = DateAndTime, y = T_422), na.rm = TRUE, color = "red", shape = 8, size = 1) +
      ggtitle("Module Temperature STP423 - Total distribution") +
      xlab("Date") + ylab("Module Temperature (C)")


The data looks something like this:

           Dates          X1          X2
    1 2014-01-04   8.0645816   7.2969667
    2 2014-01-06   7.7804850   7.1507470
    3 2014-01-07   8.8772607   8.6917391
    4 2014-01-08   8.8943146   8.3475009
    5 2014-01-10  11.6734008  10.6493480
    6 2014-01-11   9.0915727   8.5793932
    7 2014-01-12   9.5216658   9.4891858
    8 2014-01-13  -6.2493962  -6.9360515

Gregor Thomas: Combine them into a long dataset, with an extra column indicating the source, which you can set the color by. If you need additional help, please provide a small sample of input data, preferably shared in a copy/pasteable way (`dput(x)` produces a copy/pasteable representation of `x`, or share code to simulate/create example data).
Gregor Thomas: This FAQ is basically what you want to do. It's close enough that I'm tempted to close this question as a duplicate... if you want to keep this as a separate question, it certainly needs sample data.
Basant: @Gregor This is a different question. Maybe I couldn't explain it well. What I want to achieve is a general linear regression (both line and equation) in the graph for multiple parameters. Date is the independent variable here, and X1 and X2 are the temperatures at two different locations. Each has its own dependence on the date. What I want is a single regression line representing both X1 and X2.
Gregor Thomas: I understand perfectly. The same solution will work.

1 Answer


ggplot2 works best with long-format data. `geom_smooth()`, for example, needs all its y-values in a single column, so we need to convert your data to long format. I re-use the code from this FAQ in particular (because it's about ggplot), but another FAQ, How to reshape data from wide to long?, covers several other methods.
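If you would rather not load reshape2, the same wide-to-long step can be done in base R. This is a minimal sketch, assuming the sample data from the question (columns `Dates`, `X1`, `X2`); it should produce the same `Dates` / `variable` / `value` layout that `melt()` gives, with all X1 rows first and then all X2 rows. `value_cols` is just a helper name introduced here.

```r
# Base-R alternative to reshape2::melt() for the sample data in the question.
# Builds a Dates / variable / value layout without extra packages.
test_data <- data.frame(
  Dates = as.Date(c("2014-01-04", "2014-01-06", "2014-01-07", "2014-01-08",
                    "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13")),
  X1 = c(8.0645816, 7.7804850, 8.8772607, 8.8943146,
         11.6734008, 9.0915727, 9.5216658, -6.2493962),
  X2 = c(7.2969667, 7.1507470, 8.6917391, 8.3475009,
         10.6493480, 8.5793932, 9.4891858, -6.9360515)
)

value_cols <- c("X1", "X2")  # wide columns to stack into one "value" column

test_data_long <- data.frame(
  # repeat the dates once per stacked column
  Dates = rep(test_data$Dates, times = length(value_cols)),
  # label each row with the column it came from (all X1 rows, then all X2 rows)
  variable = factor(rep(value_cols, each = nrow(test_data)), levels = value_cols),
  # unlist() concatenates the columns in the same order: X1 values, then X2
  value = unlist(test_data[value_cols], use.names = FALSE)
)

str(test_data_long)
```

The resulting `test_data_long` can be passed to the same `ggplot()` call as the reshape2 version.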

    test_data = read.table(text = '        Dates            X1            X2
    1    2014-01-04      8.0645816      7.2969667
    2    2014-01-06      7.7804850      7.1507470
    3    2014-01-07      8.8772607      8.6917391
    4    2014-01-08      8.8943146      8.3475009
    5    2014-01-10      11.6734008     10.6493480
    6    2014-01-11      9.0915727      8.5793932
    7    2014-01-12      9.5216658      9.4891858
    8    2014-01-13     -6.2493962     -6.9360515', header = TRUE)

    test_data$Dates = as.Date(test_data$Dates)

    # code copy/pasted from linked FAQ, only changed id = "date" to id = "Dates"
    library("reshape2")
    library("ggplot2")

    test_data_long <- melt(test_data, id = "Dates")  # convert to long format

    # now we can plot:

    ggplot(test_data_long, aes(x = Dates, y = value)) +
      geom_point(aes(color = variable, size = variable, shape = variable)) +
      # color/size/shape are mapped only in geom_point(), so geom_smooth() treats
      # all points as one group and fits a single line through both series;
      # method = "lm" makes it a linear regression (the default here is loess)
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(title = "Module Temperature STP423 - Total distribution",
           x = "Date",
           y = "Module Temperature (C)") +
      scale_size_manual(values = c(1, 3)) +
      scale_color_manual(values = c("red", "purple")) +
      scale_shape_manual(values = c(8, 19))
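The question also asks for the regression equation. One way to get it is to fit the same pooled model with `lm()` yourself. This is a sketch using the question's sample values; measuring x as days since the first date (an assumption made here so the intercept is readable) changes only the intercept's parameterization, not the fitted line. The resulting string could then be placed on the plot with `annotate("text", ...)`.

```r
# Sketch: fit ONE linear regression to the pooled X1 + X2 points and report
# its equation. Uses the question's sample values; x is days since the first date.
dates <- as.Date(c("2014-01-04", "2014-01-06", "2014-01-07", "2014-01-08",
                   "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13"))
x1 <- c(8.0645816, 7.7804850, 8.8772607, 8.8943146,
        11.6734008, 9.0915727, 9.5216658, -6.2493962)
x2 <- c(7.2969667, 7.1507470, 8.6917391, 8.3475009,
        10.6493480, 8.5793932, 9.4891858, -6.9360515)

# Long format: one row per (date, location) pair
test_data_long <- data.frame(
  Dates    = rep(dates, 2),
  variable = factor(rep(c("X1", "X2"), each = length(dates))),
  value    = c(x1, x2)
)
test_data_long$days <- as.numeric(test_data_long$Dates - min(test_data_long$Dates))

# No grouping term in the formula, so every point feeds a single line
fit <- lm(value ~ days, data = test_data_long)
b <- coef(fit)

# Build a human-readable equation string, including R-squared
eq <- sprintf("y = %.3f %+.3f * days since %s  (R^2 = %.3f)",
              b[["(Intercept)"]], b[["days"]],
              format(min(test_data_long$Dates)), summary(fit)$r.squared)
print(eq)
```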

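On the "average of regression lines" idea from the question: when every series is observed at exactly the same dates with no missing values, the pooled least-squares line coincides with averaging the per-series intercepts and slopes, because the x deviations are identical for each series and the pooled slope works out to (b1 + b2) / 2. With different dates or NAs in one series the pooled fit weights each series by its number of points, so the two approaches can differ. A small check on the question's sample values (variable names here are illustrative):

```r
# Compare (a) the average of the per-location regression lines with
# (b) one regression on the pooled points. Question's sample values.
dates <- as.Date(c("2014-01-04", "2014-01-06", "2014-01-07", "2014-01-08",
                   "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13"))
x1 <- c(8.0645816, 7.7804850, 8.8772607, 8.8943146,
        11.6734008, 9.0915727, 9.5216658, -6.2493962)
x2 <- c(7.2969667, 7.1507470, 8.6917391, 8.3475009,
        10.6493480, 8.5793932, 9.4891858, -6.9360515)
days <- as.numeric(dates - min(dates))  # x as days since the first date

# (a) separate fits, then average intercepts and slopes
fit_x1 <- lm(x1 ~ days)
fit_x2 <- lm(x2 ~ days)
avg_coef <- (coef(fit_x1) + coef(fit_x2)) / 2

# (b) single fit on the stacked points
pooled_fit <- lm(value ~ day,
                 data = data.frame(day = rep(days, 2), value = c(x1, x2)))
pooled_coef <- coef(pooled_fit)

# Same dates in both series and no NAs, so the two sets should agree
print(rbind(average = unname(avg_coef), pooled = unname(pooled_coef)))
print(isTRUE(all.equal(unname(avg_coef), unname(pooled_coef))))
```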