0
votes

I have a large JSON file (250 MB) that shows no line breaks when I open it in Notepad or SAS, but in WordPad the line breaks appear correctly. From what I have read, this could mean the file uses Unix line breaks, which Notepad cannot read but WordPad can.

I need to import the file into SAS. One way of doing this might be to open the file in WordPad and save it as a text file, which will hopefully retain the correct line breaks so that SAS can read it. I have tried reading the file as it is, but without line breaks I only get the first observation, and I can't get the program to find the next one.

I have tried getting WordPad to save the file, but it crashes each time, probably because of the file size. I also tried doing this through PowerShell, but I can't figure out how to save the file once it is opened, and I see no reason why it should work, seeing as WordPad crashes when I try it through point and click.

Is there another way to fix this JSON file? Is there a way to view the Unix code for line breaks and replace it with Windows line breaks, or something to that effect?
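One way to check which line-break bytes the file actually contains, without opening it in an editor, is to count them in binary mode. The following is a minimal sketch that assumes Python is available on the machine; the function name `count_line_endings` and the chunk size are illustrative choices, not part of any SAS or Windows tooling.

```python
def count_line_endings(path, chunk_size=1 << 20):
    """Count CRLF, bare LF and bare CR sequences in a file.

    The file is read in binary chunks so a 250 MB file never has to
    fit in an editor. Hypothetical helper, for illustration only.
    """
    cr = lf = crlf = 0
    prev_ended_with_cr = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            cr += chunk.count(b"\r")
            lf += chunk.count(b"\n")
            crlf += chunk.count(b"\r\n")
            # A CRLF pair may be split across two chunks.
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                crlf += 1
            prev_ended_with_cr = chunk.endswith(b"\r")
    return {"CRLF": crlf, "LF only": lf - crlf, "CR only": cr - crlf}
```

If all three counts come back as zero, the file has no line breaks at all, and changing the line terminator will not help.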

EDIT: I have tried adding the TERMSTR=LF option both in filename and infile, without any luck:

filename test "C:\path";
data datatest;
  infile test lrecl = 32000 truncover scanover TERMSTR=LF;
  input @'"Id":' ID $9.;
run;

However, if I manually edit a small portion of the file to have line breaks, it works. The TERMSTR option doesn't seem to do much for me.

EDIT 2: Solved using RECFM=F

data datatest;
  infile test lrecl = 42000 truncover scanover  RECFM=F ;
  input @'"Id":' ID $9.;
run; 

EDIT 3: It turns out this didn't solve the problem after all. RECFM=F means all records have a fixed length, which mine don't, so my data gets mixed up and a lot of information is skipped. I tried RECFM=V (variable), but that does not work either.

2
Look at the documentation for the FILENAME statement, specifically TERMSTR=LF. – data _null_
You can post your solution as an answer to the question and accept it if you're happy with it. – user667489
Turns out I didn't solve the issue after all, see edit 3. – Ullsokk
JSON files are not required to have any line breaks. Why not try reading it using RECFM=N? – Tom
I tried using RECFM=N, but got the following warning: The '@"STRING"' INPUT/PUT statement option is inconsistent with binary mode I/O. The execution of the DATA STEP is being terminated. – Ullsokk

2 Answers

0
votes

I guess you're using Windows, so try:

TYPE input_filename | MORE /P > output_filename

This should convert a Unix-style text file to a Windows/DOS one.
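If piping a 250 MB file through MORE turns out to be impractical, the same conversion can be sketched as a small script. This assumes Python is installed; the function name `lf_to_crlf` and both path arguments are placeholders of mine, not an existing tool.

```python
def lf_to_crlf(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, writing every LF or CRLF as CRLF.

    Works in binary chunks, so the whole file is never held in memory.
    Hypothetical helper, for illustration only.
    """
    carry = b""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            data = carry + chunk
            # Hold back a trailing CR: its LF may arrive in the next chunk.
            if data.endswith(b"\r"):
                data, carry = data[:-1], b"\r"
            else:
                carry = b""
            # Normalise existing CRLF to LF first so it is not doubled.
            data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            dst.write(data)
        dst.write(carry)  # a lone CR at the very end is kept as-is
```

Called as `lf_to_crlf("input_filename", "output_filename")`, it writes a new file and leaves the original untouched. It only helps if the file really contains LF bytes; a JSON file with no line breaks at all comes out unchanged.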

0
votes

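250 MB is not too long to treat as a single record. The trailing @@ holds the record across iterations of the data step, so each pass picks up the next "Id": in the same record.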

data want ;
  infile json lrecl=250000000; *250 Mb ;
  input @'"Id":' ID :$9. @@;
run;