How can I add a row to a data frame in R?

160 votes

In R, how do you add a new row to a data frame once the data frame has already been initialized?

So far I have this:

df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")

#I am trying to add "hola" and "ciao" as a new row
de <- data.frame("hola", "ciao")

merge(df, de) # Adds to the same row as new columns

# Unfortunately, I couldn't find an rbind() solution that wouldn't give me an error

Any help would be appreciated

r dataframe

Assign names to de too: names(de) <- c("hello","goodbye"), then rbind. – Khashaa

Or in one line rbind(df, setNames(de, names(df))) – Rich Scriven

This really is an area which base R fails miserably at, and has for a long time: stackoverflow.com/questions/13599197/… – thelatemail

@thelatemail I disagree. Data frames are a special structure in R: a list of equal-length columns with common dimnames, attributes, and methods. I think it is entirely expected that one cannot rbind(data.frame(a = 1), data.frame(b = 2)). Why would you want to? I would hope that throws an error regardless. It's like merging on a random by variable. And this is 2015; doesn't everyone set options(stringsAsFactors = FALSE)? – rawr

@rawr - sure, different names shouldn't be bound, but R can't handle binding no names to no names, binding names to no names with the same dimensions, or binding new data to incorporate new factor levels. I think that's a weakness. Particularly when it can handle binding repeated names and all NA names. And setting stringsAsFactors=FALSE can be a quick fix, but changing the defaults that other people are going to have set differently can really ruin a day. – thelatemail

155 votes

As @Khashaa and @Richard Scriven point out in the comments, you have to set consistent column names for all the data frames you want to append.

You only set column names for the first data frame, df, so you also need to explicitly declare the column names for the second data frame, de, and then use rbind():

df<-data.frame("hi","bye")
names(df)<-c("hello","goodbye")

de<-data.frame("hola","ciao")
names(de)<-c("hello","goodbye")

newdf <- rbind(df, de)
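A minimal sketch of why the names matter, reusing the question's df and de: rbind() on data frames matches columns by name, so the auto-generated names on de cause an error until they are replaced, here with the setNames() one-liner from the comments. The names res and newdf are only illustrative.

```r
# Same setup as the question: df gets explicit names, de gets auto-generated ones
df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")
de <- data.frame("hola", "ciao")
print(names(de))  # names derived from the values, not "hello"/"goodbye"

# rbind() matches data frame columns by name, so this raises an error
res <- tryCatch(rbind(df, de), error = function(e) conditionMessage(e))
print(res)

# Giving de the same names as df lets the bind succeed
newdf <- rbind(df, setNames(de, names(df)))
print(newdf)
```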

148 votes

Let's make it simple:

df[nrow(df) + 1,] = c("v1","v2")

49 votes

Or, as inspired by @MatheusAraujo:

df[nrow(df) + 1,] = list("v1","v2")

This allows mixed data types, because a list keeps each value's type, whereas c() coerces all the values to one common type.
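A small sketch of the difference, assuming a hypothetical data frame with one character column and one numeric column (the names df, df_c and df_l are illustrative):

```r
# A one-row data frame with a character and a numeric column
df <- data.frame(name = "a", value = 1, stringsAsFactors = FALSE)

# c() first coerces "b" and 2 to one common type (character), so the
# numeric column 'value' is silently converted to character
df_c <- df
df_c[nrow(df_c) + 1, ] <- c("b", 2)
str(df_c)

# list() keeps each element's own type, so 'value' stays numeric
df_l <- df
df_l[nrow(df_l) + 1, ] <- list("b", 2)
str(df_l)
```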

45 votes

There is now add_row() from the tibble package (also attached by library(tidyverse)).

library(tidyverse)
df %>% add_row(hello = "hola", goodbye = "ciao")

Unspecified columns get an NA.

18 votes

I like list instead of c because it handles mixed data types better. Adding an additional column to the original poster's question:

#Create an empty data frame
df <- data.frame(hello=character(), goodbye=character(), volume=double())
de <- list(hello="hi", goodbye="bye", volume=3.0)
df = rbind(df,de, stringsAsFactors=FALSE)
de <- list(hello="hola", goodbye="ciao", volume=13.1)
df = rbind(df,de, stringsAsFactors=FALSE)

Note that some additional control is required if the string/factor conversion is important.

Or using the original variables with the solution from MatheusAraujo/Ytsen de Boer:

df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen", volume=20.2)

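Note that this solution doesn't work well with strings unless there is already data in the data frame. In R versions before 4.0.0, data.frame() creates factor columns from character vectors by default, and a factor column only accepts values that are already among its levels.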

13 votes

Not terribly elegant, but:

data.frame(rbind(as.matrix(df), as.matrix(de)))

From the documentation of the rbind function:

For rbind column names are taken from the first argument with appropriate names: colnames for a matrix...
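One thing to watch with the matrix route, sketched below with hypothetical columns: as.matrix() on a data frame with mixed types produces a character matrix, so numeric columns come back as character, and the column names of the result come from the first argument only.

```r
df <- data.frame(hello = "hi", n = 1, stringsAsFactors = FALSE)
de <- data.frame(x = "hola", y = 2, stringsAsFactors = FALSE)

# as.matrix() turns each mixed-type data frame into a character matrix;
# rbind() on matrices takes the column names from the first argument
m <- rbind(as.matrix(df), as.matrix(de))
out <- data.frame(m, stringsAsFactors = FALSE)

# 'n' is now a character column, not numeric
str(out)
```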

3 votes

If you want to make an empty data frame and add contents in a loop, the following may help:

# Number of students in class
student.count <- 36

# Gather data about the students
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# Create empty data frame
student.data <- data.frame()

# Populate the data frame using a for loop
for (i in 1 : student.count) {
    # Get the row data
    age <- student.age[i]
    gender <- student.gender[i]
    marks <- student.marks[i]

    # Populate the row
    new.row <- data.frame(age = age, gender = gender, marks = marks)

    # Add the row
    student.data <- rbind(student.data, new.row)
}

# Print the data frame
student.data

Hope it helps :)
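Calling rbind() inside the loop copies the whole data frame on every iteration, which gets slow as the number of rows grows. A sketch of a common alternative, assuming the same simulated student vectors (the set.seed() call is an addition, only for reproducibility): collect the rows in a preallocated list and bind them once with do.call(rbind, ...), or skip the loop entirely when the columns already exist as vectors.

```r
student.count <- 36
set.seed(42)  # assumption: fixed seed only to make the example reproducible
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# Collect one single-row data frame per student in a preallocated list
rows <- vector("list", student.count)
for (i in seq_len(student.count)) {
  rows[[i]] <- data.frame(age = student.age[i],
                          gender = student.gender[i],
                          marks = student.marks[i],
                          stringsAsFactors = FALSE)
}

# Bind all rows in a single call instead of once per iteration
student.data <- do.call(rbind, rows)

# When the columns already exist as vectors, no loop is needed at all
student.data2 <- data.frame(age = student.age, gender = student.gender,
                            marks = student.marks, stringsAsFactors = FALSE)
```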

1 votes

I needed to add stringsAsFactors=FALSE when creating the data frame; without it, the empty character columns become factors with no levels:

> df <- data.frame("hello"= character(0), "goodbye"=character(0))
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "hi") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "bye") :
  invalid factor level, NA generated
> df
  hello goodbye
1  <NA>    <NA>
>

With stringsAsFactors=FALSE, the same steps work:

> df <- data.frame("hello"= character(0), "goodbye"=character(0), stringsAsFactors=FALSE)
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
> df[nrow(df) + 1,] = list("hola","ciao")
> df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen")
> df
  hello         goodbye
1    hi             bye
2  hola            ciao
3 hallo auf wiedersehen
>
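Whether this happens depends on the R version: stringsAsFactors defaulted to TRUE before R 4.0.0 and defaults to FALSE since R 4.0.0. A sketch that sets the argument explicitly, so both behaviors can be reproduced on any version (df_f and df_c are illustrative names):

```r
# Factor columns (the pre-4.0.0 default), forced explicitly here
df_f <- data.frame(hello = character(0), goodbye = character(0),
                   stringsAsFactors = TRUE)
# "hi" and "bye" are not existing levels, so NA is stored (with a warning)
suppressWarnings(df_f[nrow(df_f) + 1, ] <- list("hi", "bye"))
print(df_f)

# Character columns: the new strings are stored as-is
df_c <- data.frame(hello = character(0), goodbye = character(0),
                   stringsAsFactors = FALSE)
df_c[nrow(df_c) + 1, ] <- list("hi", "bye")
print(df_c)
```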

1 votes

Make certain to specify stringsAsFactors=FALSE when creating the dataframe:

> rm(list=ls())
> trigonometry <- data.frame(character(0), numeric(0), stringsAsFactors=FALSE)
> colnames(trigonometry) <- c("theta", "sin.theta")
> trigonometry
[1] theta     sin.theta
<0 rows> (or 0-length row.names)
> trigonometry[nrow(trigonometry) + 1, ] <- list("0", sin(0))
> trigonometry[nrow(trigonometry) + 1, ] <- list("pi/2", sin(pi/2))
> trigonometry
  theta sin.theta
1     0         0
2  pi/2         1
> typeof(trigonometry)
[1] "list"
> class(trigonometry)
[1] "data.frame"

Failing to use stringsAsFactors=FALSE when creating the data frame (on R versions before 4.0.0, where TRUE was the default) results in the following warning, and an NA instead of the new value, when attempting to add the new row:

> trigonometry[nrow(trigonometry) + 1, ] <- list("0", sin(0))
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "0") :
  invalid factor level, NA generated
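If you actually want theta to stay a factor, one option, sketched here with a hypothetical trig data frame, is to declare the levels before adding rows. Using list() instead of c() also keeps sin.theta numeric.

```r
# theta is deliberately a factor in this sketch
trig <- data.frame(theta = factor(character(0)), sin.theta = numeric(0))

# Declare the levels you intend to use before adding rows
levels(trig$theta) <- c("0", "pi/2")

# list() keeps sin.theta numeric; c() would coerce it to character
trig[nrow(trig) + 1, ] <- list("0", sin(0))
trig[nrow(trig) + 1, ] <- list("pi/2", sin(pi / 2))
str(trig)
```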

0 votes

There is a simpler way to append a record from one data frame to another if you know that the two data frames share the same columns and types. To append one row from xx to yy, do the following, where i is the index of the row in xx:

yy[nrow(yy)+1,] <- xx[i,]

Simple as that, no messy binds. If you need to append all of xx to yy, either use a loop or take advantage of R's vectorized indexing:

yy[nrow(yy) + seq_len(nrow(xx)),] <- xx
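A runnable sketch of both assignments with made-up yy and xx data frames. It assumes, as the answer says, that both share the same columns, and in the same order, since this kind of assignment fills columns by position:

```r
# Two data frames with the same columns and types
yy <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
xx <- data.frame(id = 3:5, label = c("c", "d", "e"), stringsAsFactors = FALSE)

# Append one row of xx (row i) to the end of yy
i <- 2
yy[nrow(yy) + 1, ] <- xx[i, ]

# Append every row of xx; seq_len() yields no indices when xx is empty
yy[nrow(yy) + seq_len(nrow(xx)), ] <- xx
print(yy)
```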

0 votes

To formalize the setNames() approach from the comments:

add_row <- function(original_data, new_vals_list){ 
  # appends row to dataset while assuming new vals are ordered and classed appropriately. 
  # new_vals must be a list not a single vector. 
  rbind(
    original_data,
    setNames(data.frame(new_vals_list), colnames(original_data))
    )
  }

It preserves each column's class where the new value is compatible, and otherwise passes R's own warnings and errors through.

m <- mtcars[ ,1:3]
m$cyl <- as.factor(m$cyl)
str(m)

#'data.frame':  32 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...

The factor is preserved when adding 4, even though it was passed as a number.

str(add_row(m, list(20,4,160)))
#'data.frame':  33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ... 
# $ disp: num  160 160 108 258 360 ...

Passing a value other than 4, 6, or 8 produces a warning that the factor level is invalid, and NA is stored instead:

str(add_row(m, list(20,3,160)))
# 'data.frame': 33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 3) :
  invalid factor level, NA generated
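A variant sketch of the same helper with an explicit length check. The name append_row is hypothetical, chosen so it does not mask tibble::add_row(); it also passes stringsAsFactors = FALSE so new strings are not turned into factors on older R versions. Otherwise it follows the function above.

```r
# Hypothetical helper: append_row (avoids masking tibble::add_row)
append_row <- function(original_data, new_vals_list) {
  # Fail early with a clear message instead of a confusing rbind() error
  if (length(new_vals_list) != ncol(original_data)) {
    stop("new_vals_list must have one element per column (",
         ncol(original_data), "), got ", length(new_vals_list))
  }
  # Build a one-row data frame and give it the original column names
  new_row <- setNames(data.frame(new_vals_list, stringsAsFactors = FALSE),
                      names(original_data))
  rbind(original_data, new_row)
}

m <- mtcars[, 1:3]
m$cyl <- as.factor(m$cyl)

r <- append_row(m, list(20, 4, 160))
str(r)

# A wrong number of values is rejected before rbind() is called
err <- tryCatch(append_row(m, list(20, 4)), error = function(e) "length error")
print(err)
```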
