I am trying to import xls data into R with the readxl package. The specific xls file has 18 columns and 472 rows; the first 7 rows contain descriptive text that needs to be skipped. I only want to select columns 1, 3 and 6:9 out of the 18 for EDA. These columns have mixed types, including date, numeric and text.
readxl does not seem to be able to import non-contiguous columns directly. My plan is to use skip = 7 to read the entire sheet first and then use select in the next step. However, the problem is that readxl guesses the date column as numeric by default. Is there a way in readxl to specify col_types by column name?
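The read-then-select plan can be sketched in base R as follows. Because the real file is not available, a small hypothetical data frame stands in for what read_excel(..., skip = 7) might return, and it assumes the date column arrives as Excel serial day numbers (which is what a date guessed as numeric usually holds). The names raw, obs_date, site and value are made up for illustration.

```r
# Hypothetical stand-in for: raw <- readxl::read_excel("my_file.xls", skip = 7)
# Column 1 holds Excel date serials that were guessed as numeric.
raw <- data.frame(
  obs_date = c(43101, 43102, 43103),
  site     = c("A", "B", "C"),
  value    = c(1.5, 2.0, 2.5)
)

# Keep only the wanted columns by position
# (c(1, 3, 6:9) for the real 18-column sheet; c(1, 3) for this stand-in).
# dplyr::select(raw, 1, 3) would do the same.
eda <- raw[, c(1, 3)]

# Convert Excel serial day counts to Date. An origin of 1899-12-30 matches
# Excel's default 1900 date system for dates from 1900-03-01 onward.
eda$obs_date <- as.Date(eda$obs_date, origin = "1899-12-30")

str(eda)
```

A workbook that uses the 1904 date system would need origin = "1904-01-01" instead.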
Here is reproducible code with an example xlsx file to demonstrate a workaround.
library(readxl)
xlsx_example <- readxl_example("datasets.xlsx")
# read the entire table
read_excel(xlsx_example)
# specify the type of a single column by name - the following code does not work
read_excel(xlsx_example, col_types=col (Sepal.Length = "numeric"))
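col_types in read_excel() takes a plain character vector with one entry per column in sheet order (or a single value that is recycled), not a col()-style specification like readr's cols(), which is why the call above fails. One possible workaround, sketched on the iris sheet of the bundled datasets.xlsx: read only the header with n_max = 0, build a types vector named by column, and mark unwanted columns as "skip", which also covers the non-contiguous selection. The variable names are illustrative; for the question's file, skip = 7 would be passed to both read_excel() calls.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Read the header only (zero data rows) to learn the column names.
# For the question's file: read_excel(path, skip = 7, n_max = 0)
col_names <- names(read_excel(path, n_max = 0))

# Default every column to "skip", then switch on the wanted ones by name
types <- setNames(rep("skip", length(col_names)), col_names)
types[c("Sepal.Length", "Petal.Length")] <- "numeric"
types["Species"] <- "text"

# col_types is positional, so pass the values in sheet order without names
iris_sub <- read_excel(path, col_types = unname(types))

str(iris_sub)
```

A date column would be switched on the same way with "date" as its type, using whatever name its header actually has.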
Use col_types = "text". This will read all columns as text by default. From there you can select the relevant columns to keep and convert each column to an appropriate type after import. – markdly
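The approach in the comment (read everything as text, select, then convert) might look like this on the bundled iris sheet. The column positions and conversions are illustrative choices; type.convert() is another option for the conversion step.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Read every column as text so that nothing is mis-guessed
all_text <- read_excel(path, col_types = "text")

# Keep the wanted columns by position (c(1, 3, 6:9) in the question)
iris_sel <- all_text[, c(1, 3, 5)]

# Convert each kept column to the type it should have
iris_sel$Sepal.Length <- as.numeric(iris_sel$Sepal.Length)
iris_sel$Petal.Length <- as.numeric(iris_sel$Petal.Length)
iris_sel$Species      <- factor(iris_sel$Species)

str(iris_sel)
```

If a date column comes back as serial-number strings such as "43101", something like as.Date(as.numeric(x), origin = "1899-12-30") should convert it, assuming the workbook uses Excel's default 1900 date system.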