Here's my code.

require(RCurl)

u="https://docs.google.com/spreadsheet/pub?key=0AmKR51OFQvWqdFd5XzgwQXlrZ2t3a2dYUXFSMl85TFE&single=true&gid=0&output=csv"

# 1st attempt
tc <- getURL(u, ssl.verifypeer=FALSE, .encoding="UTF-8")
tcc <- textConnection(tc, encoding="UTF-8")
readLines(tcc, encoding="UTF-8")

# 2nd attempt
tc <- getURL(u, ssl.verifypeer=FALSE, .encoding="UTF-8")
tcc <- textConnection(tc, encoding="UTF-8")
read.csv(tcc, header=T, sep=",", fileEncoding="UTF-8")
read.csv(tcc, header=F, sep=",", fileEncoding="UTF-8")

I have a problem using read.csv on a CSV published from a Google Docs spreadsheet. The 1st attempt (readLines) succeeds, but the 2nd attempt (read.csv) fails:

read.csv(tcc, header=T, sep=",", fileEncoding="UTF-8")
Error in make.names(col.names, unique = TRUE) : invalid multibyte string at '<ed><83><80>?꾩뒪<ed>꺃<ed>봽'

read.csv(tcc, header=F, sep=",", fileEncoding="UTF-8")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : invalid multibyte string at '<82>?<90>'

I don't know why. I am using Windows Vista, and my sessionInfo() is:

R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949
[3] LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C
[5] LC_TIME=Korean_Korea.949

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RCurl_1.95-4.1 bitops_1.0-6

I guess it's an encoding issue, but why does readLines work while read.csv doesn't?

And is there a way to get around this problem?

It would be much better if I could use read.csv.

1 Answer

I'd try:

library(data.table)
?fread

fread is pretty smart at determining data types for you, so it might solve the problem.
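
For what it's worth, here is a rough sketch of both routes, untested on the Windows/CP949 setup from the question. As far as I can tell from the read.table source, fileEncoding is only applied when the first argument is a file name, not an already open connection such as a textConnection, which may be why the 2nd attempt fails. The sketch therefore writes the text to a temporary file first. The csv_text sample data and the tmp file are stand-ins I made up; in the real code csv_text would be the string returned by getURL(u, ...).

```r
# Hypothetical stand-in for the text downloaded with getURL(u, ...).
csv_text <- "name,value\n\uD0C0\uC2A4,1\n\uD504,2"

# Write the raw UTF-8 bytes to a temporary file, so no re-encoding happens on write.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw(enc2utf8(csv_text)), tmp)

# Base R route: `file` is a path here, so fileEncoding is applied and
# the UTF-8 input is converted to the native encoding while reading.
df_base <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8",
                    stringsAsFactors = FALSE)

# fread route (only runs if data.table is installed): read the same file
# and declare the strings as UTF-8.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(tmp, encoding = "UTF-8")
}

unlink(tmp)  # clean up the temporary file
```

If the base R call works in your locale, you can keep using read.csv and only add the temporary-file step after getURL.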