1
votes

I'm a beginner learning how to subset specific rows and columns from a dataset in R. I am using the state.x77 dataset in RStudio as practice. When I try to select specific columns, I get the following error message:

library(dplyr)
library(tibble)

select(state.x77, Income, HS Grad)
Error: unexpected symbol in "select(state.x77, Income, HS Grad"

I do not understand what symbol in that line of code is not correct.

Also, if I am trying to filter a certain state in addition to selecting certain columns (variables), how do I use the filter function when the list of states is the row names? When I try:

rownames_to_column(state.x77, var = "State")

it creates a column called State for the state names but it does not seem to be permanent when I go to view state.x77 (and thus I can't use the filter function).

I am sorry, I am very much a beginner. Any help would be appreciated.

Thank you.

2 Answers

1
votes

There are two issues. First, state.x77 is a matrix, and select() from the dplyr package expects a data frame as its first argument, so you need to convert it first. Second, a column name that contains a space must be enclosed in backticks (`HS Grad`) or double quotes ("HS Grad"); otherwise R reads HS and Grad as two separate symbols, which is what triggers the "unexpected symbol" error.

# Load package
library(dplyr)

# Show the class of state.x77
class(state.x77)
# [1] "matrix" "array"   (R >= 4.0.0; older versions print only "matrix")

# Convert state.x77 to a data frame
state.x77_df <- as.data.frame(state.x77)

# Show the class of state.x77_df
class(state.x77_df)
# [1] "data.frame"

# Select Income and `HS Grad` columns
# All the following will work
select(state.x77_df, Income, `HS Grad`)
select(state.x77_df, "Income", "HS Grad")
select(state.x77_df, c("Income", "HS Grad"))
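
If you do not need dplyr for this step, the matrix can also be subset directly with bracket indexing. This is a minimal base R sketch; the object names income_hs and arizona are only illustrative.

```r
# state.x77 ships with base R (datasets package) as a numeric matrix
# with state names as row names.

# Keep all rows, but only the two columns of interest.
# Quoted names work even when they contain a space.
income_hs <- state.x77[, c("Income", "HS Grad")]
head(income_hs, 3)

# Pick a single state by its row name and the same two columns.
# The result is a named numeric vector.
arizona <- state.x77["Arizona", c("Income", "HS Grad")]
arizona
```

Bracket indexing returns a matrix (or a vector for a single row), so convert to a data frame first if you want to continue with dplyr verbs.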

For your second question, rownames_to_column() returns a new data frame rather than modifying its input, so you have to assign the result back to an object as follows.

library(tibble)

state.x77_df <- rownames_to_column(state.x77_df, var = "State")
head(state.x77_df)
       State Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
1    Alabama       3615   3624        2.1    69.05   15.1    41.3    20  50708
2     Alaska        365   6315        1.5    69.31   11.3    66.7   152 566432
3    Arizona       2212   4530        1.8    70.55    7.8    58.1    15 113417
4   Arkansas       2110   3378        1.9    70.66   10.1    39.9    65  51945
5 California      21198   5114        1.1    71.71   10.3    62.6    20 156361
6   Colorado       2541   4884        0.7    72.06    6.8    63.9   166 103766
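
Once the state names are in a regular column, filtering a state and selecting columns can be combined in one pipeline. The following is a sketch assuming dplyr and tibble are installed; the name arizona_df is only illustrative.

```r
library(dplyr)
library(tibble)

arizona_df <- state.x77 %>%
  as.data.frame() %>%                    # matrix -> data frame, names kept as-is
  rownames_to_column(var = "State") %>%  # row names -> regular State column
  filter(State == "Arizona") %>%         # keep only the row for Arizona
  select(State, Income, `HS Grad`)       # backticks because of the space

arizona_df
```

Because the result is assigned to arizona_df, it stays available for later steps, while state.x77 itself is left unchanged.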
0
votes
# Convert state.x77 to a data frame and move the row names into a State column
df <- tibble::rownames_to_column(data.frame(state.x77), var = "State")

## You can select any columns by their names or by index
# By column name
col_names <- c("Income", "HS.Grad")
df[, col_names]

# By column index
col_index <- c(3, 7)
df[, col_index]

# Filter (subset) the data by state
subset(df, State == "Arizona")

#     State Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
# 3 Arizona       2212   4530        1.8    70.55    7.8    58.1    15 113417
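
Note that the column here is HS.Grad rather than HS Grad. That is because data.frame() makes column names syntactically valid by default (check.names = TRUE), while as.data.frame() leaves a matrix's column names untouched. A small base R sketch of the difference; the variable names are only illustrative:

```r
# Column names as produced by the two conversion functions
kept    <- names(as.data.frame(state.x77))  # names copied from the matrix
mangled <- names(data.frame(state.x77))     # spaces replaced by dots

# Names that exist only in the as.data.frame() version
setdiff(kept, mangled)

# data.frame() keeps the original names if check.names is switched off
preserved <- names(data.frame(state.x77, check.names = FALSE))
identical(preserved, kept)
```

So use "HS.Grad" with data.frame(state.x77), and `HS Grad` in backticks (or "HS Grad" in quotes) with as.data.frame(state.x77).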