R: How to get the levels from a tibble column

Question

The overall goal:

I have a tibble with a column of country names, a column that contains either the characters "Military" or "Economic", and a column of numbers (that were imported as characters). I need to aggregate the military spending for each country and the economic spending for each country.

My strategy:

I imported the data with

data <- read_excel("/path/name.xlsx")

which now makes data a tibble, but all the values are automatically characters. To make the numeric columns summable, I write

data[,8] <- sapply(data[,8], as.numeric)

Seems to work, not sure if it's the best want to accomplish the goal. Anyway, now I want to get the factors of the country column, so that I can loop over every country name, and then loop over every row of the tibble, aggregating the military spending, and so on.

Question:

However,

levels(data[,3])

returns an error. So is there a way to get the levels without looping over the rows to collect them into a vector, and then going back over the rows trying to collect the military spending sum?

You probably should supply the column types (col_types) as an argument to read_excel readxl.tidyverse.org/articles/cell-and-column-types.html. You can also access columns with data$your_column_name. data[,N] is pretty hard to read. — Jack Brookes
Also you don't need to sapply. Or "loop over" as per your final comment. Most functions in R are vectorized, so no need for loops. — Jack Brookes

Jack Brookes Jack Brookes · Accepted Answer · 2018-07-20T21:00:47

read_excel reads strings as strings and doesn't convert to a factor.

Read the data:

data <- read_excel("/path/name.xlsx",
                       col_types = c("guess", "guess", "guess", "numeric", "guess", "guess", "guess", "text"))
    # I dont know your other column types

Option one, convert to a factor:

data$country <- as.factor(data$country)
levels(data$country)

Option two, use unique, which may give you the same thing depending on what you are doing

unique(data$country)

However, looking at your overall goal, you want to get the sum per country. Using dplyr.

# read fake data
mydata <- read.table(text = "
country spending_type spending
uk military 100
uk military 800
uk military 300
uk military 700
uk economic 500
uk economic 100
uk economic 80
uk economic 200
usa military 3100
usa military 2800
usa military 4300
usa military 3700
usa economic 2500
usa economic 1100
usa economic 280
usa economic 3200", header = TRUE, stringsAsFactors = FALSE)

The magic of R, no loops needed:

library(dplyr)

mydata %>% 
  group_by(country, spending_type) %>% 
  summarise(total_spending = sum(spending))

Output:

  country spending_type total_spending
  <chr>   <chr>                  <int>
1 uk      economic                 880
2 uk      military                1900
3 usa     economic                7080
4 usa     military               13900

R: How to get the levels from a tibble column

1 Answers