I'm experiencing some very strange behaviour. I have a CSV data set with lots of columns (3199) and 148 rows/observations. Only some of the variables are of interest, but I've parsed out and manipulated what I need only to find a many of the variables have values that are all missing. Looking in the CSV directly (where I can) and looking at the data into R confirms the data are NOT missing.
My DATA step looks like this:
DATA WORK.MYFILE;
INFILE '\\[SERVER NAME]\[FOLDERS]\RawData.csv'
lrecl=32760
DSD
TRUNCOVER
FIRSTOBS=3 ;
INPUT
F1 : $ CHAR6.
ACAAAE31RR : ?? BEST32.
ACAAAE32RR : ?? BEST32.
ACAAAE33RR : ?? BEST32.
ACAAAE3BRR : ?? BEST32.
/* lots and lots of lines like this */
SHAW5564TT3R : ?? BEST32.
SHAW6599TT3R : ?? BEST32.
SHAX0099TTAR : ?? BEST32. ;
RUN;
Everything seemed fine until I pulled out the data I wanted and realized this forced some missing variables. My guess is that after some column (I don't know which) there's only missing values all the way to the end of the observation row, and this maybe turns later columns to missing?? (Clearly I'm new to SAS).
BACKGROUND
In case it's relevant.
1) The data has many missing observations... most of them in the first several dozen rows. (The data are time series starting in 1980Q1 and the variables I want are missing before 2000Q1.)
2) I'm on SAS EnterpriseGuide. I got the above code by using the IMPORT wizard, copying the code into a program, and then removing the lines of formatting code above INFILE
and changing the path to the actual CSV rather than the temporary one SAS creates with the wizard. When I did this, I did a find and replace of all the $CHAR1.
and other misspecified informats
and changed everything to BEST32.
3) My hunch about forced-to-missing-ness after some column is because all the variables are listed alphabetically. The variables I want look like:ACAINDIRR
, AEDINDIRR
, BVAINDIRR
, BVIINDIRR
, MWIINDIRR
, OHAINDIRR
, OOHINDIRR
. The first three variables each have 104 non-missing observations as expected. The fourth (BVI...
) has 43 non-missing observations and the rest have zero non-missing observations.
4) I've tried a bunch of trouble-shoot things to fixed this. The only thing that got close was converting the CSV to at *.txt file. The fourth variable, BVIINDIRR
, then had 104 non-missing observations, oddly. But, the alphabetically later ones all still have 0 non-missing obs.
Any help is greatly appreciated!!
EDIT
Removing the ??
from the INPUT
commands doesn't result in any errors or warnings. I do not get any warnings about characters being converted to numeric. The only notes in the Log are:
NOTE: The infile '\\[SERVER NAME]\[FOLDERS]\RawData.csv'is:
Filename= \\[SERVER NAME]\[FOLDERS]\RawData.csv,
RECFM=V,LRECL=32760,File Size (bytes)=23765791,
Last Modified=03Aug2016:08:50:21,
Create Time=03Aug2016:10:52:33
NOTE: 148 records were read from the infile '\\[SERVER NAME]\[FOLDERS]\RawData.csv'.
The minimum record length was 32294.
The maximum record length was 32760.
One or more lines were truncated.
NOTE: The data set WORK.PARETO has 148 observations and 31998 variables.
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2016-08-05T08:47:01,746-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 148| _DISARM| 28418048| _DISARM| 10| _DISARM| 11| _DISARM| 66861615| _DISARM| 569154881| _DISARM|
0.296875| _DISARM| 0.672000| _DISARM| 1786020421.075000| _DISARM| 1786020421.747000| _DISARM| 0.281250| _DISARM| | _ENDDISARM
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2016-08-05T08:47:01,746-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 682061824| _DISARM| 26718208| _DISARM| 10| _DISARM| 11| _DISARM| 72237092| _DISARM| 569155157| _DISARM| 4.750000|
_DISARM| 7.937000| _DISARM| 1786020413.810000| _DISARM| 1786020421.747000| _DISARM| 2.453125| _DISARM| | _ENDDISARM
NOTE: DATA statement used (Total process time):
real time 7.93 seconds
08:45 Friday, August 5, 2016
cpu time 4.75 seconds
lrecl
is whatever your OS allows. 32767 is the maximum if you want to use_infile_
(as that is a character variable subject to character variable limits). – Joe