1
votes

I have been trying to plot two columns of a data frame that I created against each other. The first column, "Time", holds daily dates (format YYYY-MM-DD), and the second column, "data1", holds the precipitation magnitude as a numeric value.

The data come from an Excel file, "St Lucia3", which has a total of 11598 data points and stores daily precipitation data from 1981 to 2018 in two columns:

1) YearMonthDay (format- "YYYYMMDD", example "19810501")

2) Rainfall (mm)

The code for importing data into R:

library(readxl)
StLucia <- read_excel("C:/Users/hp/Desktop/St Lucia3.xlsx")

The code for time data "Time" :

Time <- as.Date(as.character(StLucia$YearMonthDay), format= "%Y%m%d")

The code for precipitation data "data1" :

# na.ma() comes from the imputeTS package
library(imputeTS)
data1 <- na.ma(StLucia$`Rainfall (mm)`, k = 4, weighting = "exponential")

The code for data frame "Precip1" :

Precip1 <- data.frame(Time, data1, check.rows=TRUE)

The code for ggplot is:

library(ggplot2)
ggplot(data = Precip1, mapping = aes(x = Time, y = data1)) + geom_line()

Plotting "data1" against "Time" with ggplot gives this result: (link to the Rplot of "data1" against "Time")

Can someone please explain why there is an unusual kink at the right end of the graph, even though there are no such values in the column "data1"?

The plot of "data1" against its index looks like this: (link to the Rplot of "data1" against its index)

The code for this plot is:

plot(data1, type = "l")

Any help would be highly appreciated. Thanks!

2
There is no data between 2015 and where the plot picks up again at ~2017. You are plotting a line chart so it is connecting these points. Perhaps you have some missing data? - Chris
Can you include your call to ggplot? It looks like you're using some kind of line plot where you want to use a bar plot. - divibisan
It looks like there is a gap in the data. It has data up to sometime in 2014, then has nothing until late 2017 or early 2018. Try searching for dates in 2015 or 2016. I'll bet there are none. - G5W
@KaranChaudhary Data can still go 1,2,3,4 in the file, but not go in consecutive order by date. So the data will not skip lines in the index because you don't have those dates in the file itself. Therefore 12/1/2015 can be right before 5/31/2017, you would only see this gap if you graphed by dates, because in the index there is no gap. - Chabo
@KaranChaudhary: usually it's better to use geom_col for plotting precipitation data - Tung

2 Answers

3
votes

Using pad() from the padr package, we can insert rows for the missing dates and assign them NA, so that nothing is plotted in the region of missing data.

library(padr)
library(zoo)

YearMonthDay<-c(19810501,19810502,19810504,19810505)
Data<-c(1,2,3,4)

StLucia<-data.frame(YearMonthDay,Data)

StLucia$YearMonthDay <- as.Date(as.character(StLucia$YearMonthDay), format = "%Y%m%d")

> StLucia
  YearMonthDay Data
1   1981-05-01    1
2   1981-05-02    2
3   1981-05-04    3
4   1981-05-05    4

Note: a date is missing (1981-05-03), yet rows 2 and 3 are still adjacent. Plotting against the index therefore shows no gap.
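To confirm where such gaps sit before padding, one option is to look at the spacing between consecutive dates. This is a minimal base R sketch, not part of the original answer; the vector name `dates` and the sample values are assumptions that mirror the four rows above.

```r
# Hypothetical sample mirroring the four rows above: 1981-05-03 is absent
dates <- as.Date(c("1981-05-01", "1981-05-02", "1981-05-04", "1981-05-05"))

# Number of days between consecutive rows; anything above 1 marks a gap
step <- as.numeric(diff(dates))

# Build the full daily sequence and keep the dates that never appear
all_days <- seq(min(dates), max(dates), by = "day")
missing_days <- all_days[!all_days %in% dates]

print(step)
print(missing_days)
```

On the real data, the same two checks applied to the converted date column should show whether the years around 2015 to 2017 are absent from the file.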

So let's add the missing date:

 StLucia<-pad(StLucia,interval="day")

> StLucia
   YearMonthDay Data
 1   1981-05-01    1
 2   1981-05-02    2
 3   1981-05-03   NA
 4   1981-05-04    3
 5   1981-05-05    4

 plot(StLucia, type = "l")


If you want to fill in those NA values, use na.locf() from the zoo package.
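As a rough illustration of that fill step, the sketch below carries the last observed value forward over an NA. The vector `rain` is a hypothetical stand-in for the padded Data column, and `na.rm = FALSE` is a choice made here so that the result keeps its length even if the series starts with NA.

```r
library(zoo)

# Hypothetical padded series: NA marks the day that padding added
rain <- c(1, 2, NA, 3, 4)

# Last observation carried forward; na.rm = FALSE keeps leading NAs
# in place instead of dropping them, so the length is unchanged
filled <- na.locf(rain, na.rm = FALSE)

print(filled)
```

Bear in mind that for rainfall this repeats the previous day's value on days with no record, so the filled values are a guess, not a measurement.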

1
votes

Here is a reproducible example - change the names to match your data.

library(ggplot2)

# create sample data
set.seed(47)
dd = data.frame(t = Sys.Date() + c(0:5, 30:32), y = runif(9))

# demonstrate problem
ggplot(dd, aes(t, y)) +
    geom_point() +
    geom_line()


The easiest solution, as Tung points out, is to use a more appropriate geom, like geom_col:

ggplot(dd, aes(t, y)) +
    geom_col()


If you really want to use lines, you should fill in the missing dates with NA for rainfall.

# calculate all days
all_days = data.frame(t = seq.Date(from = min(dd$t), to = max(dd$t), by = "day"))
# join to original data
library(dplyr)
dd_complete = left_join(all_days, dd, by = "t")

# ggplot won't connect lines across missing values
ggplot(dd_complete, aes(t, y)) +
    geom_point() +
    geom_line()


Alternatively, you could replace the missing values with 0s so that the line runs along the axis. I think it's nicer not to plot the line at all, though: a break implies missing data, whereas plotted 0s imply no rainfall.
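For completeness, the zero-fill variant might look like the sketch below. It uses base R's merge() in place of left_join() so that it runs without extra packages; the sample values and the name `dd_zero` are assumptions made for illustration.

```r
# Hypothetical data with a gap: days 4 to 6 of the range are absent
dd <- data.frame(
  t = as.Date("2019-01-01") + c(0:2, 6:7),
  y = c(0.4, 0.9, 0.2, 0.7, 0.5)
)

# One row per calendar day across the observed range
all_days <- data.frame(t = seq(min(dd$t), max(dd$t), by = "day"))

# Left join: days without a record get NA for y
dd_zero <- merge(all_days, dd, by = "t", all.x = TRUE)

# Replace the NAs with 0 so a line plot runs along the axis in the gap
dd_zero$y[is.na(dd_zero$y)] <- 0

print(dd_zero)
```

Passing `dd_zero` to the same ggplot() call with geom_line() then draws a continuous line that drops to zero across the gap.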