0
votes

I am trying to import a SAS data set to R (I cannot share the data set). SAS sees columns as number or character. However, some of the number columns have coded character values. I've used the sas7bdat package to bring in the data set but those character values in number columns return NaN. I would like the actual character value. I have tried exporting the data set to csv and tab delimited files. However, I end up with observations that take 2 lines (a problem with SAS that I haven't been able to figure out). Since there are over 9000 observations I cannot go back and look for those observations that take 2 lines manually. Any ideas how I can fix this?

3
I understand that you cannot share your production data, but you could share some fake sample data that reproduces your problem.. without that, it will be hard to get working answers...Wimpel
While I don't know how well they will help (since you can't share the data), searching SO for [r] [sas] import produces several promising links, including stackoverflow.com/questions/33421854/…. Lacking that, even though you can't share that data (with perhaps 72M rows and 36k columns), you could mimic the data you have with fake data, perhaps 7 rows and 4 columns ... and see what we can do. It's just numbers and letters, just mimic it closely, please.r2evans

3 Answers

0
votes

SAS does NOT store character values in numeric columns. But there are some ways that numeric values will be printed using characters.

First is if you are using BEST format (which is the defualt for numeric variables). If the value cannot be represented exactly in the number of characters then it will use scientific notation.

Second is special missing values. SAS has 28 missing values. Regular missing is represented by a period. The others by single letter or underscore.

Third would be a custom format that displays the numbers using letters.

The first should not cause any trouble when importing into R. The last two can be handled by Haven. See the semantics Vignette in the documentation.

As to your multiple line CSV file there are two possible issues. The first is just that you did not tell SAS to use long enough lines for your data. Just make sure to use a longer LRECL setting on the file you are writing to.

filename csv 'myfile.csv' lrecl=1000000 ;
proc export data=mydata file=csv dbms=csv ; run;

The second possible issue is that some of your character variables include end of line characters in them. It is best to just remove or replace those characters. You could always add them back if they are really wanted. For example these steps will export the same file as above. It will first replace the carriage returns and line feeds in the character variables with pipe characters instead.

data for_export ;
   set mydata;
   array _c _character_;
   do over _c;
     _c = translate(_c,'||','0A0D'x);
   end;
run;
proc export data=for_export file=csv dbms=csv ; run;   
0
votes

partial answer for dealing with data across multiple rows

library( data.table )
#first, read the whole lines into a single colunm, for example with
DT <- data.table::fread( myfile, sep = "")

#sample data for this example: a data.table with ten rows containing the numbers 1 to 10
DT <- data.table( 1:10 )

#column-bind wo subsets of the data, using a logical vector to select the evenery first 
#and every second row. then paste the colums together and collapse using a 
#comma-separator (of whatever separator you like)
ans <- as.data.table(
  cbind ( DT[ rep( c(TRUE, FALSE), length = .N), 1], 
          DT[ rep( c(FALSE, TRUE), length = .N), 1] )[, do.call( paste, c(.SD, sep = ","))] )

#      V1
# 1:  1,2
# 2:  3,4
# 3:  5,6
# 4:  7,8
# 5: 9,10
0
votes

I prefer read_sas function from 'haven' package for reading sas data

library(haven)
data <- read_sas("data.sas7bdat")