13
votes

How can I simplify or perform the following operations using dplyr:

  1. Run a function on all data.frame names, like mutate_each(funs()) for values, e.g.

    names(iris) <- make.names(names(iris))
    
  2. Delete columns that do NOT exist (i.e. delete nothing), e.g.

    iris %>% select(-matches("Width")) # ok
    iris %>% select(-matches("X"))     # returns empty data.frame, why?
    
  3. Add a new column by name (string), e.g.

    iris %>% mutate_("newcol" = 0) # ok
    
    x <- "newcol"
    iris %>% mutate_(x = 0) # adds a column with name "x" instead of "newcol"
    
  4. Rename a data.frame colname that does not exist

    names(iris)[names(iris)=="X"] <- "Y"
    
    iris %>% rename(sl=Sepal.Length) # ok
    iris %>% rename(Y=X)             # error, instead of no change
    
4
For number 3, why not iris %>% mutate_('x' = 0)? – IRTFM
@BondedDust, that adds a column named "x", while they want it named "newcol" or whatever name is stored in x. – talat
x <- "Sepal.Length"; iris %>% rename_(.dots = setNames(x, "sl")) works, but it cannot be used for (4) because a missing column name throws an error. – ckluss
It looks like iris %>% select(-matches("X")) now returns the full iris data.frame. The everything() argument in the answer below isn't necessary anymore. – Tedward

4 Answers

12
votes
  1. I would use setNames for this:

iris %>% setNames(make.names(names(.)))
  2. Include everything() as an argument for select:

iris %>% select(-matches("Width"), everything())
iris %>% select(-matches("X"), everything())
  3. To my understanding there's no other shortcut than explicitly naming the string like you already do:

iris %>% mutate_("newcol" = 0)
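
For later dplyr versions, points 1 and 3 can probably be written without the underscore verbs, which have since been deprecated. The following is only a sketch, assuming dplyr 1.0 or newer; the object names renamed and with_col are made up for illustration.

```r
library(dplyr)

# Point 1: apply a function to all column names
renamed <- iris %>% rename_with(make.names)

# Point 3: add a column whose name is stored in a string,
# using unquoting (!!) on the left-hand side of :=
x <- "newcol"
with_col <- iris %>% mutate(!!x := 0)

names(with_col)
```

Here !!x is replaced by the string stored in x, so the new column is called "newcol" rather than "x".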
2
votes

I came up with the following solution for #4:

iris %>% 
  rename_at(vars(everything()), 
            function(nm)
              recode(nm, 
                     Sepal.Length="sl",
                     Sepal.Width = "sw",
                     X = "Y")) %>%
  head()

The last line is just for convenient output, of course.
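
As a possible alternative for #4, and assuming dplyr 1.0 or newer, rename() can take a named character vector wrapped in any_of(), which skips names that are not present. This is a minimal sketch; lookup and renamed are illustrative names.

```r
library(dplyr)

# Named vector in the form new_name = "old_name".
# "X" is not a column of iris, so that entry should be ignored.
lookup <- c(sl = "Sepal.Length", sw = "Sepal.Width", Y = "X")

renamed <- iris %>% rename(any_of(lookup))
head(renamed)
```

Using all_of() instead of any_of() would make a missing column an error again, which may be preferable when the column is expected to exist.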

1
votes

1 through 3 are answered above. I came here because I had the same problem as number 4. Here is my solution:

df <- iris

Set a name key with the columns to be renamed and the new values:

name_key <- c(
  sl = "Sepal.Length",
  sw = "Sepal.Width",
  Y = "X"
)

Set entries whose column is not in the data frame to NA. This works better for my purpose; you could probably just remove them from name_key instead.

for (var in names(name_key)) {
  if (!(name_key[[var]] %in% names(df))) {
    name_key[var] <- NA
  }
}

Get a vector of the new names for the columns that do exist in the data frame.

cols <- names(name_key[!is.na(name_key)])

Rename columns

for (nm in names(name_key)) {
  names(df)[names(df) == name_key[[nm]]] <- nm
}

Select columns

df2 <- df %>%
  select(all_of(cols))  # all_of() makes clear that cols is an external character vector

I'm almost positive this can be done more elegantly, but this is what I have so far. Hope this helps, if you haven't solved it already!
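
One more compact variant of the steps above, offered as a sketch and assuming dplyr 1.0 or newer: select() with any_of() and a named vector keeps only the columns that exist and renames them in the same call. The name_key vector is the one defined earlier, repeated so the snippet runs on its own.

```r
library(dplyr)

name_key <- c(
  sl = "Sepal.Length",
  sw = "Sepal.Width",
  Y = "X"
)

# any_of() skips "X" because iris has no such column;
# the names of name_key become the new column names
df2 <- iris %>% select(any_of(name_key))
names(df2)
```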

1
votes

Answer to question 2:

You can use the function any_of if you want to explicitly give the full names of the columns.

iris %>% 
    select(-any_of(c("X", "Sepal.Width","Petal.Width")))

This ignores the non-existent column X and removes the other two listed columns.

Otherwise, the solution with matches works fine, as does a combination of any_of and matches.

iris %>% 
    select(-any_of("X")) %>% 
    select(-matches("Width"))

This explicitly removes X if it exists, along with every column that matches the pattern. Multiple patterns are also possible.

iris %>% 
    select(-any_of("X")) %>%
    select(-matches(c("Width", "Spec"))) # use c for multiple matches