0
votes

I am getting a memory issue when I run the following query in my Java program with Progress DataDirect MongoDB JDBC driver with a large table in a database (40 million records):

String query = "SELECT * FROM (SELECT tablename.*, ROW_NUMBER() OVER() AS rowid FROM tablename)";

The query above works perfectly fine if I have a small table in the database.

If I just run "SELECT ROW_NUMBER() OVER() AS rowid FROM tablename", then the name of the column will become null. The AS seems not effect in the statement:

Column Name: null class java.lang.Integer

Exception in thread "main" java.lang.NullPointerException

If I just run "SELECT * FROM tablename", it is very fast to get the ResultSet object back and I can easily read the 40 million records in less than few minutes.

So what am I missing? I don't know how to use the ROW_NUMBER function correctly. Any idea? Thanks.

2
What is the objective after reading 40 million records ? - developer
The requirement is for my library to pass the ResultSet object with the rowid added to another application so they can do whatever they want with the records. - Cao Felix
And the name should be "rowid" because the library also support other different databases. In stead of using ResultSet.getInt(index) to get the number, we would like to call it by name ResultSet.getString("rowid"). I guess I could force to either put it in the beginning or the end, but I thought that was not a good idea. - Cao Felix
Did you read my answer, why can't you read them in batches ? - developer
Your column name being null problem is not a problem. That column doesn't have a name, only a label (the as clause), so you should get the column label instead. And if you can't retrieve it by label, then it is a bug in the driver. - Mark Rotteveel

2 Answers

2
votes

In your query, the row_number() has no column name, it only has a label (the as clause).

Luckily, the JDBC specification (section 15.2.3) requires you to retrieve columns by label and not by name. The column label in JDBC is either the value of the as clause - if specified - or otherwise the original column name.

You can obtain the column label by using ResultSetMetaData.getColumnLabel().

0
votes

You will get OutOfMemoryError when try to load all of those (40 million) records at the same time into the JVM (as Heap size is limited which is according to -Xmx setting), so the best practice is NOT to load all of them at a time, rather fetch them as batches and process the each batch of records separately.