Reading PISA data into R - read.table error

Question

I am trying to read data from the PISA 2012 study (http://pisa2012.acer.edu.au/downloads.php) into R using the read.table function. This is the code I tried:

pisa  <- read.table("pisa2012.txt", sep = "")

Unfortunately, I keep getting the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 2 did not have 184 elements

I have tried setting header = TRUE, but then I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 184 elements
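Both messages come from scan(), which read.table() calls after working out the number of columns from the first five lines of input. With sep = "", every run of spaces or tabs counts as a separator, so a line that splits into fewer fields than expected makes scan() stop with exactly this kind of error. A minimal sketch for diagnosing it with base R's count.fields(), assuming a made-up two-line file in place of pisa2012.txt (the sample lines are invented, not real PISA data):

```r
# Made-up sample (NOT real PISA data): spaces fall at irregular places,
# so splitting on whitespace yields a different field count on each line.
sample_lines <- c(
  "ALB0001 1 2 97 A",
  "ALB000212 97A"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# sep = "" (the read.table default) splits on any run of whitespace
fields <- count.fields(tmp, sep = "")
print(fields)  # one count per line

# Lines whose count differs from the first line are the ones scan() rejects
print(which(fields != fields[1]))

# read.table() fails in the same way as in the question;
# catch the error so its message can be inspected
res <- tryCatch(read.table(tmp, sep = ""),
                error = function(e) conditionMessage(e))
print(res)
```

On the real file, table(count.fields("pisa2012.txt", sep = "")) would show whether the number of whitespace-separated fields varies from line to line.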

Lastly, this is what the .txt file looks like:

http://postimg.org/image/4u9lqtxqd/

Thanks for your help!

With sep = "", read.table splits on any run of whitespace (spaces or tabs) rather than on one specific character. Is there a regular separating value? It appears as though it could be tabs, denoted by "\t" in the sep argument. — Badger
If I use "\t" in the sep argument, R loads the data but detects only a single variable in the data.frame. =/ — sascha91
Looking at the image, it's hard to tell what the first line is. Try using readLines to read the lines of the file into R and check what the first line actually is (opening a large text file in a text editor can be misleading, with so many line wraps and so much blank space). Also, if you could post the first few lines of data, people would have more to work with. — Heisenberg
Thanks Heisenberg. I tried readLines("pisa2012.txt", n = 3). This is what I get: postimg.org/image/hhfsxxei1 — sascha91
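Following the readLines suggestion, one quick way to tell a fixed-width file from a delimited one is to compare line lengths and look for separator characters. A minimal sketch, assuming three made-up lines written to a temporary file (they are not real PISA records):

```r
# Made-up stand-in for pisa2012.txt (NOT real PISA records)
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0001120971", "ALB0002219972", "AUS0001110981"), tmp)

# Read only the first few raw lines, as suggested in the comments
first_lines <- readLines(tmp, n = 3)
print(first_lines)

# Do all lines have the same number of characters?
line_lengths <- nchar(first_lines)
print(line_lengths)
same_length <- length(unique(line_lengths)) == 1

# Do any tabs or spaces occur that could act as separators?
has_tab   <- any(grepl("\t", first_lines, fixed = TRUE))
has_space <- any(grepl(" ", first_lines, fixed = TRUE))

cat("equal lengths:", same_length, " tabs:", has_tab, " spaces:", has_space, "\n")
```

Equal lengths with no consistent separator would suggest a fixed-width layout, which read.table() has no way to split on its own.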

Leslie · Accepted Answer · 2015-10-07T09:40:16

You can see from the first line that the values are not separated by delimiters, so you'll need some sort of control file that tells you where each individual variable starts and ends. From working with PISA in other environments, I know the first three columns correspond to the three-letter ISO country code (e.g., ALB). What follows are numbers and letters that only make sense once they are split into separate variables. You could use the codebook for this (https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf), but that is a real bear for every single variable. Why not download the data in SPSS or SAS format and import that instead? It's not a 'slick' solution, but without a control file you'd have a lot of manual work to do.
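If you do get the column positions (from the codebook or a control file), base R's read.fwf() can split the lines for you. A minimal sketch under a HYPOTHETICAL layout: only the three-character country code in columns 1-3 comes from the answer; the other variable names and positions are invented for illustration, and the helper positions_to_widths() is hypothetical, not part of any package. Gaps between the listed positions become negative widths, which read.fwf() skips.

```r
# Hypothetical helper: turn start/end column positions (as listed in a
# codebook or control file) into a widths vector for read.fwf().
# Gaps between variables become negative widths, which read.fwf() skips.
positions_to_widths <- function(start, end) {
  widths <- numeric(0)
  pos <- 1  # first column not yet accounted for
  for (i in seq_along(start)) {
    gap <- start[i] - pos
    if (gap > 0) widths <- c(widths, -gap)
    widths <- c(widths, end[i] - start[i] + 1)
    pos <- end[i] + 1
  }
  widths
}

# HYPOTHETICAL layout for illustration only: apart from the country code in
# columns 1-3, these names and positions are invented, not the real PISA layout.
layout <- data.frame(
  name  = c("country", "student_id", "item_b"),
  start = c(1, 4, 9),
  end   = c(3, 7, 10),
  class = c("character", "character", "integer"),
  stringsAsFactors = FALSE
)

# Made-up fixed-width lines standing in for pisa2012.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0001120971", "ALB0002219972", "AUS0001110981"), tmp)

# widths is c(3, 4, -1, 2): column 8 is skipped, columns 11-13 are not read
widths <- positions_to_widths(layout$start, layout$end)

# colClasses keeps leading zeros in the ID by reading it as character
pisa <- read.fwf(tmp, widths = widths,
                 col.names = layout$name,
                 colClasses = layout$class)
str(pisa)
```

For a file the size of the PISA student data, read.fwf() can be slow; the readr package's read_fwf() combined with fwf_positions(), which takes start and end positions directly, may be worth trying instead.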
