
After following the beginner Java tutorials for Apache Flink in its documentation, I wanted to try some transformations on my own data. However, I'm having trouble reading input from my Microsoft SQL Server database, which runs on a server in the network.

The documentation section on possible data sources for DataSets includes an example that looked like what I need: a DataSet is built using env.createInput(...) with a JDBCInputFormat. So I added the Maven dependency for Flink JDBC:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-jdbc_2.11</artifactId>
    <version>0.10.2</version>
</dependency>

and adapted the example code to my own database like this:

// create and configure input format
JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
    .setDBUrl(sqlserver)
    .setUsername(username)
    .setPassword(password)
    .setQuery(query)
    .finish();

// create and configure type information for DataSet
TupleTypeInfo typeInformation = new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO);

// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData = environment.createInput(inputFormat, typeInformation);

The server address, user name and password are the same ones that work in another Java program of mine that uses plain JDBC only. The query is a simple SELECT on two columns, one containing String values, the other Integers.

When I run the program, I get a ClassNotFoundException referring to the selected driver:

JDBC-Class not found. - org.apache.derby.jdbc.EmbeddedDriver at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.open

It seems I am missing some imports here, but I can't figure out which ones (or where to get them), since I expected Flink JDBC to support this minimal example. The same driver name is also given in the JDBCInputFormat Javadoc. I tried adding JDBC 4.2 manually, which did not work.

What do I need to add or change so that the driver is found? Also, is there any official material about Flink JDBC and its usage apart from the Javadoc? I'm even having trouble finding tutorials about Flink with SQL sources in general.

I was only allowed two links in one question, so here is the JDBCInputFormat Javadoc: ci.apache.org/projects/flink/flink-docs-release-1.1/api/java/… - danny

1 Answer

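  1. If you want to read data from a Microsoft SQL Server database, you need the JDBC driver for SQL Server, not the one for Apache Derby (org.apache.derby.jdbc.EmbeddedDriver is only the driver used in the documentation example). JDBC drivers are often included in the DBMS distribution or installation. Microsoft also offers its driver, the Microsoft JDBC Driver for SQL Server (driver class com.microsoft.sqlserver.jdbc.SQLServerDriver), as a JAR download.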

  2. The driver must be added to your classpath. There are two options: 1) bundle it in your application JAR, i.e., include it in the fat jar, or 2) add it to Apache Flink's ./lib folder (note that it must be added to all Flink installations of the cluster).
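Both points can be checked outside Flink with a small sketch. Judging by the error message, JDBCInputFormat.open loads the configured driver class by name, so a plain Class.forName call on the same classpath should fail or succeed the same way. The class SqlServerDriverCheck, its helper methods, the host name and the database name below are hypothetical; the sketch assumes Microsoft's driver class and SQL Server's default port 1433.

```java
/**
 * Hypothetical helper (not part of Flink): builds a SQL Server JDBC URL
 * and checks whether the SQL Server driver class is on the classpath.
 */
public class SqlServerDriverCheck {

    // Driver class of the Microsoft JDBC Driver for SQL Server
    // (use this instead of org.apache.derby.jdbc.EmbeddedDriver).
    static final String SQL_SERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    /** Builds a URL of the form jdbc:sqlserver://host:port;databaseName=db. */
    static String sqlServerUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    /**
     * Returns true if the named class can be loaded by the current class loader.
     * A missing driver JAR shows up here as ClassNotFoundException.
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical server and database names; 1433 is SQL Server's default port.
        String url = sqlServerUrl("dbserver.example.local", 1433, "mydb");
        System.out.println("JDBC URL: " + url);

        if (isOnClasspath(SQL_SERVER_DRIVER)) {
            System.out.println("Driver found: " + SQL_SERVER_DRIVER);
        } else {
            // Same situation as the Flink job: bundle the driver JAR in the fat jar
            // or copy it to ./lib on every Flink installation of the cluster.
            System.out.println("Driver NOT on classpath: " + SQL_SERVER_DRIVER);
        }
    }
}
```

Once the driver is found, the builder from the question would use .setDrivername("com.microsoft.sqlserver.jdbc.SQLServerDriver") and pass a URL of the form above to .setDBUrl(sqlserver). This sketch only approximates the lookup; inside a Flink job the class is resolved by Flink's class loaders, so the result is indicative rather than a guarantee.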