I'm attempting to read this fixed width file into R using read.fwf:
http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for
When I perform this function I'm getting some weird errors that I cannot sort out unless I read it a very specific way:
> fwf <- read.fwf("getdata_wksst8110.for", 1:9, skip = 4)
> head(fwf)
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 NA 3 JAN 1990 NA 23.4-0 0.4 25.1-0.3 26.6
2 NA 10 JAN 1990 NA 23.4-0 0.8 25.2-0.3 26.6
3 NA 17 JAN 1990 NA 24.2-0 0.3 25.3-0.3 26.5
4 NA 24 JAN 1990 NA 24.4-0 0.5 25.5-0.4 26.5
5 NA 31 JAN 1990 NA 25.1-0 0.2 25.8-0.2 26.7
6 NA 7 FEB 1990 NA 25.8 0 0.2 26.1-0.1 26.8
However, comparing the output to the original file clearly shows it's not right. There should indeed be 9 columns, but it's cutting up my date column and the other columns.
If I use a sep = " " argument it just throws an error:
> fwf <- read.fwf("getdata_wksst8110.for", 1:9, skip = 4, sep = " ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6 did not have 25 elements
Could someone, please, help me figure out why this isn't reading in the way I would expect?
This is a helpful link I found related to using this function but it's more of a performance related question. The author never defined his widths = col arguments.
Thank you for your consideration of this puny question.
So I re-ran the operation using the vector of widths as recommended by @MrFlick and the data is looking a lot better. However, what I am seeing is that the "sep" argument is clearly wreaking havoc. If I use sep = " " it throws a strange error, but if I don't use sep then it jerks up my column results.
Non-jerked results using widths = c(10, 4, 4, 4, 4, 4, 4, 4, 4)
> head(fwf)
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 03JAN1990 NA 23 4-0. 4 25 .1-0 0.3 2
2 10JAN1990 NA 23 4-0. 8 25 .2-0 0.3 2
3 17JAN1990 NA 24 2-0. 3 25 .3-0 0.3 2
4 24JAN1990 NA 24 4-0. 5 25 .5-0 0.4 2
5 31JAN1990 NA 25 1-0. 2 25 .8-0 0.2 2
6 07FEB1990 NA 25 8 0. 2 26 .1-0 0.1 2
Jerked results using:
> fwf <- read.fwf("getdata_wksst8110.for", widths = c(10, 4, 4, 4, 4, 4, 4, 4, 4), skip = 4, sep = " ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6 did not have 25 elements
Am I missing something with sep?
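A possible explanation for the sep behaviour, going by the read.fwf documentation: sep is the separator read.fwf uses internally when it rewrites the split fields to a temporary file, so it should be a character that does not occur in the data. The padding in this file is made of spaces, so sep = " " collides with it. The sketch below is only an illustration: it uses three rows copied from the output above, truncated to the first two SST/SSTA pairs (an assumption made to keep it short), and compares the default tab separator with sep = " ".

```r
# Three rows taken from the output above, truncated to two SST/SSTA pairs
# (illustrative only; the real file has more columns and header lines).
sample_lines <- c(
  " 03JAN1990     23.4-0.4     25.1-0.3",
  " 10JAN1990     23.4-0.8     25.2-0.3",
  " 07FEB1990     25.8 0.2     26.1-0.1"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Widths for this truncated layout: 10 for the date, then 9 + 4 per pair.
w <- c(10, 9, 4, 9, 4)

# Default internal separator (tab): one column per width.
ok <- read.fwf(tmp, widths = w, strip.white = TRUE)

# sep = " ": the padding spaces are now also treated as separators, so the
# result is either an error or a data frame with the wrong number of columns.
bad <- tryCatch(read.fwf(tmp, widths = w, sep = " "),
                error = function(e) e)

print(ok)
print(if (inherits(bad, "error")) conditionMessage(bad) else dim(bad))
```

If that reading is right, the fix is simply to leave sep at its default and get the widths right.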
A modification of the awesome @MrFlick's script appears to have fit the bill (more or less)! That first row remained troublesome and made it impossible for me to summarize/sum on hd[4]. Oddly enough, removing the first row with hd[-1,] didn't seem to help at all. Oh well.
hd<-read.fwf("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for",
widths=c(10,rep(c(9,4),4)), skip=3)
trim <- function(x) gsub("^\\s+|\\s+$","",x)
main <- paste0(trim(hd[1,seq(2, ncol(hd), by=2)]), trim(hd[1,seq(3, ncol(hd), by=2)]))
sub <- trim(as.vector(hd[2,]))
names(hd) <- make.names(c(sub[1],paste(rep(main, each=2), sub[-1])))
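One possible reason hd[-1,] does not help, offered as a guess rather than a diagnosis: with skip=3 both header rows are read as data, so every column comes in as text, and dropping a single row still leaves the second header row and the text type in place. The sketch below uses a small hypothetical stand-in for hd (the RegionA label and the three-column layout are made up for illustration; the values are copied from the output above) and drops both header rows before converting the columns to numbers.

```r
# Hypothetical stand-in for hd: two header rows followed by data rows,
# all stored as text because the headers were read in as data.
hd_demo <- data.frame(
  V1 = c("", "Week", "03JAN1990", "10JAN1990"),
  V2 = c("RegionA", "SST", "23.4", "23.4"),
  V3 = c("", "SSTA", "-0.4", "-0.8"),
  stringsAsFactors = FALSE
)

# Drop BOTH header rows, not just the first one.
dat <- hd_demo[-(1:2), ]

# Convert every column except the date to numeric. The as.character() step
# also covers older R versions where the columns would be factors.
dat[-1] <- lapply(dat[-1], function(x) as.numeric(as.character(x)))

# Summaries now work on the numeric columns.
total_ssta <- sum(dat$V3)
print(total_ssta)
```

Applied to the real hd, the same two steps would come after the names(hd) assignment, since that line still needs the header rows.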
1:9 is doing? That parameter should be specifying the width of each column (in terms of number of characters). It doesn't seem as though you've correctly specified the column widths at all. Also, you may want to look at the read_fwf function from the readr package because the base read.fwf function is pretty inefficient (should that be a concern). – MrFlick
c(8, 4, ...). You specify a width for each of the 9 columns. – MrFlick
widths = 4 means you have just one column with width 4. If you have 9 columns of width 4, you would do widths=c(4,4,4,4,4,4,4,4,4) or, more succinctly, widths=rep(4,9). That's the thing with fixed-width files: you need to specify all the widths of all the columns; that's the only way to know how to parse the file. – MrFlick
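To make the point about widths concrete, here is a minimal sketch that slices one row by hand, the way a fixed-width reader would. The row is reconstructed from the first output in the question, so it covers only the first 45 characters of the real line; slice_fixed is a hypothetical helper written for this illustration, not part of base R.

```r
# First 45 characters of the first data row, reconstructed from the
# question's output (the real line is longer).
row1 <- " 03JAN1990     23.4-0.4     25.1-0.3     26.6"

# Hypothetical helper: cut a string into consecutive fields of given widths.
slice_fixed <- function(line, widths) {
  ends <- cumsum(widths)
  starts <- ends - widths + 1
  substring(line, starts, ends)
}

# widths = 1:9 means "a 1-character column, then a 2-character column, ...",
# which chops the date and the numbers apart.
chopped <- slice_fixed(row1, 1:9)

# One width per real column: 10 for the date, then 9 (SST) and 4 (SSTA).
aligned <- slice_fixed(row1, c(10, 9, 4, 9, 4, 9))

print(chopped)
print(trimws(aligned))

# A quick sanity check: the widths should add up to the line length.
stopifnot(sum(c(10, 9, 4, 9, 4, 9)) == nchar(row1))
```

Checking that the widths sum to the line length is a cheap way to catch a wrong widths vector before calling read.fwf.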