2
votes

I am trying to read data from the PISA 2012 study (http://pisa2012.acer.edu.au/downloads.php) into R using the read.table function. This is the code I tried:

pisa  <- read.table("pisa2012.txt", sep = "")    

Unfortunately, I keep getting the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
: line 2 did not have 184 elements    

I have tried to set

header = T

but then I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
 :line 1 did not have 184 elements

Lastly, this is what the .txt file looks like ...

http://postimg.org/image/4u9lqtxqd/

Thanks for your help!

2
Your sep value here describes no separation. Is there a regular separating value? It looks as though it could be tabs, denoted by "\t" in the sep argument. - Badger
If I use "\t" in the sep argument, R loads the data but only detects one single variable in the data.frame =/ - sascha91
Looking at the image, it's hard to tell what the first line is. Try using readLines to read each line of the data into R to check what the first line actually is (opening a large text file in a text editor can be misleading with so many line wraps and so much blank space). Also, if you could post the first few lines of data, people would have more to work with. - Heisenberg
Thanks Heisenberg. I tried "readLines("pisa2012.txt", n = 3)" This is what I get: postimg.org/image/hhfsxxei1 - sascha91
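
The readLines check suggested in the comments can be taken one step further by measuring the lines rather than eyeballing them. The following is only a sketch on a made-up two-line file (the records are invented for illustration and are not real PISA data): equal line lengths with no tab characters hint at a fixed-width layout, and differing whitespace-separated field counts are the kind of thing that makes read.table complain that a line "did not have N elements".

```r
# Toy stand-in for the real file: two fixed-width records, no delimiter.
# (Contents are invented for illustration; they are not real PISA data.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000001 12 3", "ALB0000002  7  "), tmp)

first_lines <- readLines(tmp, n = 2)

# Same number of characters on every line and no tabs: likely fixed-width.
line_lengths <- nchar(first_lines)
has_tab <- grepl("\t", first_lines, fixed = TRUE)

# How many fields read.table(sep = "") would see on each line.
# Blank fixed-width fields vanish, so the counts differ between lines.
fields_per_line <- count.fields(tmp, sep = "")

print(line_lengths)
print(has_tab)
print(fields_per_line)
```

On the real file the same three checks should run on the first few lines only (n = 5 or so), since the full file is large.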

2 Answers

0
votes

You can see from the first line that you'll need some sort of control file to delimit the individual variables. From working with PISA in other environments, I know the first three columns correspond to the ISO 3-letter country code (e.g., ALB). What follows are numbers and letters that have to be split into separate variables before they make sense. You could use the codebook for this (https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf), but that is a real bear to do for every single variable. Why not download in SPSS or SAS and import? Not a 'slick' solution, but without a control file, you'd have a lot of manual work to do.
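
To make the "control file" idea concrete, here is a minimal sketch using base R's read.fwf on a toy file. The layout table (variable names and widths) is hypothetical and typed by hand; for the real data those widths would have to come from the codebook or from a SAS/SPSS control file.

```r
# Toy fixed-width file. The layout below is hypothetical, NOT the PISA layout.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000001 12", "ALB0000002  7"), tmp)

# A hand-written "control file": one row per variable with its width
layout <- data.frame(
  varname = c("country", "student_id", "score"),
  width   = c(3, 7, 3),
  stringsAsFactors = FALSE
)

toy <- read.fwf(
  tmp,
  widths      = layout$width,
  col.names   = layout$varname,
  # keep IDs as character so leading zeros are not lost
  colClasses  = c("character", "character", "numeric"),
  strip.white = TRUE
)

str(toy)
```

With hundreds of variables, typing the layout by hand is exactly the manual work described above, which is why generating it from a control file is attractive.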

0
votes

I just read the files using the readr package. What you will need: the readr package, the TXT file, the SAScii package and the associated SAS file.

So, let's say you want to read the student files. Then you will need the following files: INT_STU12_DEC03.txt and INT_STU12_DEC03.sas.

##################### READING STUDENT DATA  ###################
library(SAScii)  # provides parse.SAScii()
library(readr)   # provides read_fwf() and fwf_widths()

## Loading the dictionary (point this at your local copy of the SAS control file)
dic_student <- parse.SAScii(sas_ri = 'INT_STU12_SAS.sas')

## Creating the positions to read_fwf
student <- read_fwf(file = 'INT_STU12_DEC03.txt', col_positions = fwf_widths(dic_student$width), progress = TRUE)
colnames(student) <- dic_student$varname

OBS 1: As I'm using Linux, I needed to delete the first lines from the SAS file and change its encoding to UTF-8.

OBS 2: The deleted lines were:

libname  M_DEC03 "C:\XXX"; 
filename STU "C:\XXX\INT_STU12_DEC03.txt"; 
options nofmterr;
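
If you would rather not edit the SAS file by hand, the same clean-up can be sketched in base R. This is only an illustration on a made-up, heavily shortened control file; the assumption that the original encoding is latin1 is mine, so check what your copy actually uses before converting.

```r
# Made-up, shortened stand-in for the SAS control file (not the real one)
sas_in <- tempfile(fileext = ".sas")
writeLines(c(
  'libname  M_DEC03 "C:\\XXX";',
  'filename STU "C:\\XXX\\INT_STU12_DEC03.txt";',
  'options nofmterr;',
  'data M_DEC03.STU;',
  'infile STU;',
  'input',
  'CNT $ 1-3',
  ';',
  'run;'
), sas_in)

sas_lines <- readLines(sas_in, warn = FALSE)

# Flag the Windows-specific lines: libname, filename and options statements
drop <- grepl("^\\s*(libname|filename|options)\\b", sas_lines,
              ignore.case = TRUE, perl = TRUE)

# Convert the remaining lines to UTF-8 (source encoding latin1 is an assumption)
kept <- iconv(sas_lines[!drop], from = "latin1", to = "UTF-8")

sas_out <- tempfile(fileext = ".sas")
writeLines(kept, sas_out, useBytes = TRUE)
```

The cleaned file at sas_out is what you would then pass to parse.SAScii.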

OBS 3: The dataset takes about 1 GB, so you will need enough RAM.
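
If RAM is tight, one option is to read only the variables you need. The sketch below uses base R's read.fwf, where a negative width means "skip this many characters"; the toy file, widths and column names are hypothetical. Note that read.fwf can be slow on a file this size, so treat this as an illustration of the idea rather than a tuned solution.

```r
# Toy fixed-width file; widths and names are hypothetical, not the PISA layout
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000001 12", "ALB0000002  7"), tmp)

widths <- c(3, 7, 3)             # full layout: country, student_id, score
keep   <- c(TRUE, FALSE, TRUE)   # drop the middle variable

# Negative widths tell read.fwf to skip those characters entirely
signed_widths <- ifelse(keep, widths, -widths)

subset_df <- read.fwf(
  tmp,
  widths     = signed_widths,
  col.names  = c("country", "score"),
  colClasses = c("character", "numeric")
)

# Rough check of how much memory the result occupies
print(object.size(subset_df), units = "Kb")
```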