I am trying to query data from Google BigQuery via a JDBC driver, but some values seem to be incorrect.

Steps:

  1. Download the Simba JDBC driver for Google BigQuery, freely available from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip

  2. Load the driver into your Java app

  3. Connect to Google BigQuery via JDBC. The sample code is below:

    public static void main(String[] args) {

        Connection conn = null;
        String url ="jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;" +
                  "ProjectId=fedjdbc;OAuthType=0;DefaultDataset=jdbc;" +
                  "[email protected];" +
                    "OAuthPvtKeyPath=/Users/laptop/workspace/bigquery/accounts/fedjdbc-5cb29b8a02fb.json";

        try {
            Class.forName("com.simba.googlebigquery.jdbc42.Driver");
            conn = DriverManager.getConnection(url);

            query(conn);
            conn.close();


        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void query(Connection conn) {
        // Year-1 timestamp literal that reproduces the problem.
        String sql = "select timestamp(\"0001-01-01 00:00:00 UTC\") from jdbc.test_date";
        // try-with-resources closes the statement and result set even if the query fails.
        try (PreparedStatement stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            System.out.println("query begin");
            System.out.println(System.currentTimeMillis());

            try (ResultSet rs = stmt.executeQuery()) {
                System.out.println(System.currentTimeMillis());
                System.out.println("query end");

                while (rs.next()) {
                    System.out.println(rs.getString(1));
                    //System.out.println(rs.getTimestamp(1));
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

The result is not "0001-01-01 00:00:00.000000" as expected, but "0001-01-03 08:00:00.000000".

Do you know why?

Is the value being converted to the local time zone?

I tested it on several hosts in different time zones and saw the same issue.

I tested it with the DataDirect driver too, and it has the same issue. So maybe it works as designed, but what is the root cause? If I query the value with an equality filter on the timezone column, the result is incorrect. - Junhui Liu
I ran another test in the Google BigQuery UI and it works fine, i.e. the statement result is correct ("0001-01-01 00:00:00 UTC"). - Junhui Liu
Don't test with dates at 0001-01-01; you'll run into all kinds of oddities surrounding historic calendar changes, etc. And possibly getString(..) does getTimestamp(..).toString(), which, given the requirements and implementations of java.sql.Timestamp, can have additional problems. - Mark Rotteveel
Thanks Mark. When I use Timestamp ts = rs.getTimestamp(1); to check the details of the value, it seems incorrect too, i.e. the value is 0001-01-03 08:00:00.0. Maybe it's a historic problem, as you said. Thanks. - Junhui Liu
@MarkRotteveel could you post your comment as an answer so that the community may benefit from it? You could also attach this link as an example of historical changes in time zones: dba.stackexchange.com/questions/127965/… - Joaquim

1 Answer

There are a few things to consider in your code. As @MarkRotteveel mentioned, testing with 0001-01-01 leads to confusion because of the multiple historic calendar and time zone changes since then.

BigQuery and most widely used software systems rely on the tz database, which aims to record historical time zone rules, so for old dates you will find occasional odd offsets relative to UTC. As a test, query timestamp '0001-01-01 America/Los_Angeles' in BigQuery and you will get the odd result '0001-01-01 07:52:58 UTC'. There is a further explanation in this thread.
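The same effect can be reproduced outside BigQuery. Below is a minimal sketch, assuming the tz data bundled with a typical JDK; the class and method names are made up for illustration. It asks java.time which offset the tz database records for America/Los_Angeles in year 1, before standard time zones existed.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class, not part of the question's code.
public class LocalMeanTimeDemo {

    // Offset the tz database records for the given zone at 0001-01-01 00:00 local time.
    static ZoneOffset offsetAtYearOne(String zoneName) {
        LocalDateTime yearOne = LocalDateTime.of(1, 1, 1, 0, 0);
        return ZoneId.of(zoneName).getRules().getOffset(yearOne);
    }

    // The same local date-time converted to a UTC instant.
    static Instant yearOneInUtc(String zoneName) {
        return LocalDateTime.of(1, 1, 1, 0, 0).atZone(ZoneId.of(zoneName)).toInstant();
    }

    public static void main(String[] args) {
        // Local mean time for Los Angeles, not the modern -08:00.
        System.out.println(offsetAtYearOne("America/Los_Angeles")); // -07:52:58
        System.out.println(yearOneInUtc("America/Los_Angeles"));    // 0001-01-01T07:52:58Z
    }
}
```

The 07:52:58 matches the BigQuery result quoted above: before standard time was adopted, the tz database falls back to the local mean time of the zone's reference city.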

Regarding your code, you should also take a look at this thread on how to properly extract the timestamp, as the way you call getString on the ResultSet could lead to errors.
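As a rough illustration of why the string form can mislead, here is a sketch that does not touch the driver at all. It assumes the driver hands back the correct epoch value, which is not verified here, and the class and method names are hypothetical. The legacy java.util classes (which java.sql.Timestamp extends) render dates before 1582 with the Julian calendar, while BigQuery and java.time use the proleptic Gregorian calendar. In year 1 the two calendars are two days apart.

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of the question's code.
public class CalendarShiftDemo {

    // Formats an instant the legacy way: hybrid Julian/Gregorian calendar, fixed to UTC.
    static String legacyFormat(Instant instant) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(Date.from(instant));
    }

    // Reads a Timestamp back as an Instant, which prints in the proleptic Gregorian calendar.
    static Instant asInstant(Timestamp ts) {
        return ts.toInstant();
    }

    public static void main(String[] args) {
        Instant expected = Instant.parse("0001-01-01T00:00:00Z");
        System.out.println(expected);               // 0001-01-01T00:00:00Z
        System.out.println(legacyFormat(expected)); // 0001-01-03 00:00:00

        // Same moment wrapped in a java.sql.Timestamp, as a driver might return it.
        Timestamp ts = new Timestamp(expected.toEpochMilli());
        System.out.println(asInstant(ts));          // 0001-01-01T00:00:00Z
    }
}
```

The two-day shift is consistent with the "0001-01-03" in the question. The extra eight hours would be consistent with a JVM default time zone around UTC+8, but that is a guess. Comparing rs.getTimestamp(1).toInstant() against the expected instant is one way to check whether the underlying value is actually wrong or only its rendering.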