0 votes

So the idea is that I have this .json file that I need to read. It is so big that I can't even open it in Notepad or Visual Studio Code.

I tried this:

BufferedReader in = new BufferedReader(new FileReader("path to the file"));
String line = in.readLine();

and I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3536)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:735)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:227)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:372)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:133)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.Main.main(Main.java:18)

I searched on the internet and some solutions were to change the memory settings, but that doesn't work; I get the same error. Another problem is that the entire content of the file is written on a SINGLE LINE. I think I have to break the reading of the line at a certain point so it doesn't exceed the maximum allocated memory, store the value, and start reading again from where I left off, doing this over and over until the end of the line.
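Something like this is what I had in mind, reading a fixed number of characters at a time instead of the whole line (just a sketch; handleChunk is a placeholder for whatever I would do with each piece):

char[] chunk = new char[8192];
int read;
try (BufferedReader in = new BufferedReader(new FileReader("path to the file"))) {
    while ((read = in.read(chunk, 0, chunk.length)) != -1) {
        handleChunk(chunk, read); // process only the first 'read' characters
    }
}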

Any suggestions on how I should read this file? Should I try a different way to read it, or is there a trick to break up the readLine()?

Thanks!

3
Increasing the memory might be acceptable for small files, but not for big ones. For big files, you will need to use a stream and read the file chunk by chunk. – sgtcortez

3 Answers

3 votes

For such a huge JSON file you should not read the entire JSON DOM (document object model) into memory; use a streaming parser instead.

BufferedReader with readLine would be wrong anyway if there is only one huge line. Also, JSON files are in general UTF-8 encoded, and FileReader is an old utility class that uses the platform default character encoding: not portable, and wrong here.
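If you do need a Reader, open it with an explicit charset instead; a minimal sketch using the standard java.nio APIs (Files, Paths, StandardCharsets), with the path as a placeholder:

BufferedReader in = Files.newBufferedReader(Paths.get("path to the file"), StandardCharsets.UTF_8);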

There is the Jackson Streaming API. For a project using Maven:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.11.3</version>
</dependency>

The code would be something like:

JsonFactory factory = new JsonFactory();
try (JsonParser parser = factory.createParser(...)) {
    while (parser.nextToken() != JsonToken.END_OBJECT) {
        String field = parser.getCurrentName();
        switch (field) {
        case "...":
            ...
            ... parser.getText();
            ... parser.getIntValue();
            break;
        }
    }
}

This is the way to go for extracting part of the data, or for storing the data in a database as you parse.
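For instance, here is a minimal self-contained sketch along those lines, assuming (hypothetically) that the file is one large JSON array of objects and that a field named "name" is what you want to extract:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;

public class StreamExtract {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        // Assumes the top-level structure is a JSON array of objects.
        try (JsonParser parser = factory.createParser(new File("path to the file"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                // Walk the fields of one object without loading the whole file.
                while (parser.nextToken() != JsonToken.END_OBJECT) {
                    String field = parser.getCurrentName();
                    parser.nextToken(); // move from the field name to its value
                    if ("name".equals(field)) {   // "name" is a made-up field
                        System.out.println(parser.getText());
                    } else {
                        parser.skipChildren();    // skip nested values we don't need
                    }
                }
            }
        }
    }
}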

1 vote

Even though you can increase the JVM memory limit, doing so is needless here; allocating a huge amount of memory, like 1 GB, just to process a file is overkill and resource intensive.

InputStream inFileReader = channelSFtp.get(path); // file read over SSH (SFTP)
byte[] localBuffer = new byte[2048];

int bytesRead = 0;
while (-1 != (bytesRead = inFileReader.read(localBuffer))) {
    // Deal with the current chunk here: localBuffer[0 .. bytesRead)
}

inFileReader.close();

This way you can read it piece by piece.
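The same chunk-by-chunk pattern works for a local file as well; a minimal sketch using the standard library (java.io.InputStream, java.nio.file.Files, java.nio.file.Paths), with try-with-resources so the stream is closed automatically and the path as a placeholder:

try (InputStream in = Files.newInputStream(Paths.get("path to the file"))) {
    byte[] localBuffer = new byte[2048];
    int bytesRead;
    while ((bytesRead = in.read(localBuffer)) != -1) {
        // Deal with the current chunk: localBuffer[0 .. bytesRead)
    }
}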

1 vote

You can check the DSM streaming library. It lets you process a JSON document while parsing it: you define a mapping in YAML for the data you want to process, and DSM processes the JSON document based on that mapping file. DSM uses the Jackson Streaming API.

You can check an example in this question:

JAVA - Best approach to parse huge (extra large) JSON file