0
votes

I have a large text file with some content written in Hindi and German. I want to convert every special character to UTF-8 encoding, line by line.

I was using this code, but it gives me the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:566)
    at java.lang.StringBuilder.append(StringBuilder.java:181)
    at ConvertUTF.main(ConvertUTF.java:47)

This is the code:


import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class ConvertUTF {

    public static void main(String[] args) {
        try {
            InputStream is = new FileInputStream("file.txt");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(is, "UTF8"));

            int str;
            char[] cbuf = new char[is.available()];
            int i = 1;
            StringBuilder sb1 = new StringBuilder("");

            while ((str = in.read(cbuf, 0, 8)) != 0 && i < 7) {
                sb1.append(cbuf);
            }

            System.out.print(sb1);
            in.close();
        } catch (UnsupportedEncodingException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

2
The exception is being thrown when you append to your StringBuilder. Per your description, the file is large. Consider writing the converted text to a separate file rather than trying to build the entire text in memory. - vsfDawg
I did that already; it ended up creating a large file and the loop didn't stop. - SoleBird
I would fully expect a large output file, because the input file is large and you are changing the encoding of various characters. You reported an OutOfMemoryError; the loop not terminating is a different problem. I assume your stop condition is off: end-of-stream is indicated by -1, not 0. - vsfDawg
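A minimal sketch of the streaming approach the comments describe: read one line at a time, write it straight to a UTF-8 output file, and rely on readLine() returning null at end of stream so the loop terminates. The source charset and the sample content are assumptions; substitute the real encoding of your input file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ConvertToUtf8 {

    // Re-encode src into dst line by line, so only one line is held in
    // memory at a time. srcCharset is an assumption: replace it with the
    // actual encoding of your input file.
    static void convert(File src, File dst, String srcCharset) throws IOException {
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(new FileInputStream(src), srcCharset));
             BufferedWriter out = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(dst), StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null at end of stream, so this loop terminates.
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary Latin-1 file (hypothetical sample content).
        File src = File.createTempFile("in", ".txt");
        File dst = File.createTempFile("out", ".txt");
        try (Writer w = new OutputStreamWriter(new FileOutputStream(src), StandardCharsets.ISO_8859_1)) {
            w.write("Grüße\n");
        }
        convert(src, dst, "ISO-8859-1");
        String result = new String(Files.readAllBytes(dst.toPath()), StandardCharsets.UTF_8);
        System.out.println(result.trim());
    }
}
```

Because the writer is opened with StandardCharsets.UTF_8, the output bytes are UTF-8 regardless of what encoding the input used.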

2 Answers

0
votes

BufferedReader br = new BufferedReader(new InputStreamReader( new FileInputStream("file.txt"), "UTF-8"));

Try "UTF-8" instead of "UTF8".
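Since Java 7 you can also pass a Charset object instead of a name string, which removes the typo risk entirely and the checked UnsupportedEncodingException with it. A small self-contained sketch (the temp file and sample text are illustrative):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    public static void main(String[] args) throws IOException {
        // Write a throwaway UTF-8 file to read back (hypothetical content).
        File f = File.createTempFile("demo", ".txt");
        try (Writer w = new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)) {
            w.write("नमस्ते Grüße");
        }
        // Charset overload: no name string to misspell, and no checked
        // UnsupportedEncodingException to catch.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(f), StandardCharsets.UTF_8))) {
            String line = br.readLine();
            System.out.println(line);
        }
    }
}
```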

0
votes

There is a mistake in the encoding name: change UTF8 to UTF-8.