1
votes

I was interested in looking at the GDP of a few states over a span of several years. After I imported the .csv file, I renamed the columns and then removed the irrelevant rows. As a result, the row numbers skip 10: they run from 1 to 9 and then continue at 11.

When I did the same with a similar data frame imported from an .xls file, the row numbers did not skip 10.

library(dplyr)    # %>% and rename()
library(readxl)   # read_excel()

# CSV import
gdp <- read.csv("GDP_per.csv", skip = 4)
gdp <- gdp %>%
  rename(
    "2014" = X2013.2014,
    "2015" = X2014.2015,
    "2016" = X2015.2016,
    "2017" = X2016.2017,
    "2018" = X2017.2018
  )
gdp <- gdp[-c(10, 53:64), ]   # drop row 10 and rows 53 to 64

# Excel import
gdp2 <- read_excel("GDP_dol.xls", skip = 5)
gdp2 <- gdp2[, c(2, 20:24)]
gdp2 <- gdp2[-c(10, 53:64), ]

gdp (from the .csv file):

9 Delaware 10.7 5.5 -0.7 2.5 3.9
11 Florida 4.9 6.5 5.0 4.4 5.8

vs. gdp2 (from the .xls file):

9 Delaware 67178.9 70896.2 70379.8 72167.2 74973.3
10 Florida 839706.0 894044.0 938370.3 979464.6 1036323.2

2
Please make this question reproducible. This includes sample code (including listing non-base R packages), sample unambiguous data (e.g., dput(head(x)) or data.frame(x=...,y=...)), and expected output. Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info. (r2evans)

2 Answers

0
votes

The read.csv function returns a data.frame while read_excel returns a tibble. They are not the same and do not necessarily behave the same way. A data frame retains the original row names until you change them, e.g.

(x <- data.frame(V1=1:10, V2=11:20))
(x2 <- x[-5, ])                # Row name 5 is missing
rownames(x2) <- NULL
x2                             # Row names 1 - 9

A tibble does not keep row names, so its rows are automatically renumbered:

library(tibble)
xt <- as_tibble(x)             # Convert the data frame to a tibble
(xt[-5, ])                     # Rows are numbered 1 - 9, no gap
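
Applied to a case like the one in the question, the same fix might look like the sketch below. The gdp_demo data frame is a made-up stand-in for the imported GDP data (its column names and values are assumptions, not the real file), built from the state.name vector that ships with base R.

```r
# Hypothetical stand-in for the imported GDP data: 12 rows, two columns.
gdp_demo <- data.frame(
  state  = state.name[1:12],
  growth = seq(0.5, 6, by = 0.5)
)

# Dropping row 10 keeps the original row names, so "10" goes missing.
gdp_demo <- gdp_demo[-10, ]
names_before <- rownames(gdp_demo)
names_before

# Resetting the row names renumbers the rows consecutively.
rownames(gdp_demo) <- NULL
names_after <- rownames(gdp_demo)
names_after
```

Only the row labels change here; the data itself is the same before and after the reset.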
0
votes

I would suggest using the read_csv() function from the readr package, which imports the file as a tibble and so gives the same behaviour as read_excel().
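
A minimal sketch of that approach, using a small temporary file in place of GDP_per.csv. The header text "2013-2014" is an assumption, guessed from the X2013.2014 names that read.csv() produces; the real file may differ. Note that read_csv() keeps header text as-is, so a later rename() would likely need backticks, e.g. rename("2014" = `2013-2014`).

```r
library(readr)

# Hypothetical stand-in for GDP_per.csv: write a tiny CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "state,2013-2014,2014-2015",
  "Alabama,1.0,2.0",
  "Alaska,3.0,4.0",
  "Arizona,5.0,6.0"
), csv_path)

# read_csv() returns a tibble and does not alter the column names.
# col_types = cols() lets readr guess the types without printing a message.
gdp_demo <- read_csv(csv_path, col_types = cols())
names(gdp_demo)

# Dropping a row from a tibble leaves no gap in the row numbers.
gdp_demo <- gdp_demo[-2, ]
rownames(gdp_demo)
```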