5
votes

When I read a CSV file containing a trailing delimiter using readr::read_csv, I get a warning that a missing column name was filled in. Here are the contents of a short example CSV file that reproduces this warning (save the following snippet as example.csv):

A,B,C,
2,1,1,
14,22,5,
9,-4,8,
17,9,-3,

Note the trailing comma at the end of each line. Now if I load this file with

read_csv("example.csv")

I get the following warning:

Missing column names filled in: 'X4'

Even if I want to explicitly load only the 3 columns with

read_csv("example.csv", col_types=cols_only(A=col_integer(),
                                            B=col_integer(),
                                            C=col_integer()))

I still get the warning message.

Is this the expected behavior or is there some way to tell read_csv that it is supposed to ignore all columns except the ones I specify? Or is there another way to tidy up this (apparently malformed) CSV so that trailing delimiters are deleted/ignored?

2
Can you add a small example that shows the problem? Does the warning affect the output in some way or is it just a message? – aosmith
It is just a warning message, but it seems strange that even with cols_only all columns seem to be imported. I edited my question to include a small example CSV file to show the problem. – cbrnr

2 Answers

3
votes

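I don't think you can. As far as I can tell, cols_only() only controls which columns are kept in the result; read_csv still appears to parse the whole header row, so the empty fourth column name is filled in as X4 and the warning is raised regardless.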

However, the fread() function from the data.table package lets you select specific columns by name as the file is read in:

library(data.table)
DT <- fread("filename.csv", select = c("colA","colB"))

2
votes

Here's another example that produces a warning message.

> read_csv("1,2,3\n4,5,6", col_names = c("x", "y"))
Warning: 2 parsing failures.
# A tibble: 2 x 5
    row   col  expected    actual         file
  <int> <chr>     <chr>     <chr>        <chr>
1     1  <NA> 2 columns 3 columns literal data
2     2  <NA> 2 columns 3 columns literal data

# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5

Here is the fix/hack: wrap the call in suppressWarnings(). Also see the Stack Overflow question "Suppress reader parse problems in r".

> suppressWarnings(read_csv("1,2,3\n4,5,6", col_names = c("x", "y")))
# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5
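
If you would rather remove the trailing delimiter than hide the warning, one possible approach is to strip it from each line before parsing. This is only a minimal sketch, not something taken from the readr documentation: it recreates the example file from the question in a temporary location (the path variable is just for illustration), and it assumes that every line ends in exactly one unwanted comma and that no row has a legitimately empty last field.

```r
library(readr)

# Recreate the example file from the question (trailing comma on every line).
path <- tempfile(fileext = ".csv")
writeLines(c("A,B,C,", "2,1,1,", "14,22,5,", "9,-4,8,", "17,9,-3,"), path)

# Drop a single trailing comma from each line.
# Assumption: the last field is never meant to be empty.
lines <- sub(",$", "", readLines(path))

# A string containing newlines is treated by read_csv as literal data.
df <- read_csv(paste(lines, collapse = "\n"),
               col_types = cols(A = col_integer(),
                                B = col_integer(),
                                C = col_integer()))
```

Because the header line no longer ends in a comma, there is no empty fourth column name left to fill in. For large files, reading everything into memory with readLines() may not be practical, in which case suppressWarnings() as shown above is the simpler option.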