2
votes

I don't have much experience with Java, especially with multi-module projects, and I haven't been able to create a Dataflow template from a multi-module project.

To generate a Dataflow template from a pipeline, you have to run something like this:

mvn compile exec:java \
     -Dexec.mainClass=com.example.myclass \
     -Dexec.args="--runner=DataflowRunner \
                  --project=YOUR_PROJECT_ID \
                  --stagingLocation=gs://YOUR_BUCKET_NAME/staging \
                  --templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME"

This works well for me in a simple Java project, but now I need to do the same in a project with the following simplified structure:

C:.
|   pom.xml
|
+---configuration
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       \---java
|   |           \---com
|   |               \---xxx
|   |                   \---gcp
|   |                       \---dataflow
|   |                           \---yyy
|   |                               +---package
|   |                               |   |   java files
|   |
+---pipeline
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       \---java
|   |           \---com
|   |               \---xxx
|   |                   \---gcp
|   |                       \---dataflow
|   |                           \---yyy
|   |                               \---package
|   |                                       MAINJAVACLASS.java
|   |
\---transform
|   |   dependency-reduced-pom.xml
|   |   pom.xml
|   |
|   +---src
|   |   \---main
|   |       +---java
|   |       |   +---com
|   |       |   |   \---xxx
|   |       |   |       \---gcp
|   |       |   |           \---dataflow
|   |       |   |               \---yyy
|   |       |   |                   +---package
|   |       |   |                   |       java files

I have executed mvn package without any error with the following output:

[INFO] Reactor Build Order:
[INFO]
[INFO] pipeline-framework                                                 [pom]
[INFO] configuration                                                      [jar]
[INFO] transform                                                          [jar]
[INFO] pipeline                                                           [jar]

<...>

[INFO] Reactor Summary for pipeline-framework 0.1:
[INFO]
[INFO] pipeline-framework ................................. SUCCESS [ 19.076 s]
[INFO] configuration ...................................... SUCCESS [ 25.070 s]
[INFO] transform .......................................... SUCCESS [ 21.625 s]
[INFO] pipeline ........................................... SUCCESS [ 19.365 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

But when I try to execute:

mvn compile exec:java -Dexec.mainClass=com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS -Dexec.args=...

I get the following errors.

If I execute it from the root directory:

[INFO] Reactor Summary for pipeline-framework 0.1:
[INFO]
[INFO] pipeline-framework ................................. FAILURE [  5.287 s]
[INFO] configuration ...................................... SKIPPED
[INFO] transform .......................................... SKIPPED
[INFO] pipeline ........................................... SKIPPED

<...>

Caused by: java.lang.ClassNotFoundException: com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS

I also tried with:

mvn compile exec:java -pl pipeline <...>

If I execute it inside the pipeline directory:

Could not resolve dependencies for project com.xxx.gcp.dataflow:pipeline:jar:0.1: The following artifacts could not be resolved: com.xxx.gcp.dataflow:transform:jar:0.1, com.xxx.gcp.dataflow:configuration:jar:0.1: Failure to find com.xxx.gcp.dataflow:transform:jar:0.1 in https://repo.maven.apache.org/maven2

Which command should I execute to build the template?


The main pom.xml file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.xxx.gcp.dataflow</groupId>
  <artifactId>pipeline-framework</artifactId>
  <version>0.1</version>
  <packaging>pom</packaging>

  <modules>
    <module>configuration</module>
    <module>transform</module>
    <module>pipeline</module>
  </modules>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>

    <beam.version>2.16.0</beam.version>

    <maven-compiler-plugin.version>3.7.0</maven-compiler-plugin.version>
    <maven-exec-plugin.version>1.6.0</maven-exec-plugin.version>
    <maven-jar-plugin.version>3.1.2</maven-jar-plugin.version>
    <slf4j.version>1.7.25</slf4j.version>

    <autovalue.annotations.version>1.6</autovalue.annotations.version>
    <autovalue.version>1.6.2</autovalue.version>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>configuration</artifactId>
        <version>${project.version}</version>
      </dependency>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>transform</artifactId>
        <version>${project.version}</version>
      </dependency>
      <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>pipeline</artifactId>
        <version>${project.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven-compiler-plugin.version}</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.1</version>
        <configuration>
          <useSystemClassLoader>false</useSystemClassLoader>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>${maven-jar-plugin.version}</version>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.xxx.gcp.dataflow.yyy.pipeline.TerraformPipeline</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <execution>
            <id>bundle-and-repackage</id>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>

              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>

              <transformers>
                <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              </transformers>

            </configuration>
          </execution>
        </executions>
      </plugin>

    </plugins>

    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>exec-maven-plugin</artifactId>
          <version>${maven-exec-plugin.version}</version>
          <configuration>
            <cleanupDaemonThreads>false</cleanupDaemonThreads>
          </configuration>
        </plugin>


      </plugins>


    </pluginManagement>
  </build>

  <dependencies>
    <...>
  </dependencies>
</project>

Can you post the full command you are using to launch your pipeline? - Paddy Popeye
Caused by: java.lang.ClassNotFoundException: com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS - can't find this in the CLASSPATH. - duffymo

3 Answers

2
votes

I believe this is a common problem with multi-module Maven projects and isn't specific to Dataflow. This other thread may help: Maven exec:java goal on a multi-module project

That thread covers the issue you're having with MAINJAVACLASS not being found. I'm less sure about the other half, but I think the jars are missing because the package lifecycle phase hasn't been run on the modules whose .jar files you need. From what I can tell, the exec plugin isn't bound to any specific phase of the build lifecycle, so based on your information I would guess that it runs right after the compile phase, which doesn't produce any jars (that happens in package).

Info on the build lifecycle: http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
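
A sketch of how the "main class not found at the root" half is often handled, assuming the exec plugin's `skip` and `mainClass` parameters behave as documented for the version in use: set `<skip>true</skip>` in the parent's pluginManagement configuration for exec-maven-plugin, so that the pom-packaged parent and the library modules do not try to run a main class, and re-enable the plugin only in pipeline/pom.xml. The fragment below is illustrative and not taken from the question; the class name is the placeholder used there.

```xml
<!-- pipeline/pom.xml (hypothetical fragment) -->
<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <!-- version is inherited from the parent's pluginManagement -->
      <configuration>
        <!-- assumes the parent sets <skip>true</skip>; this override re-enables the plugin for this module only -->
        <skip>false</skip>
        <mainClass>com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS</mainClass>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With a setup like this, exec:java invoked from the root directory should be skipped for pipeline-framework, configuration and transform, and run only in pipeline. I haven't verified this against this exact project.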

1
votes

In the end, I solved this by running mvn clean install before compiling. That installed all the module dependencies in my local Maven repository. I then ran a command like this:

mvn compile exec:java \
     -Dexec.mainClass=com.example.myclass \
     -Dexec.args="--runner=DataflowRunner \
                  --project=YOUR_PROJECT_ID \
                  --stagingLocation=gs://YOUR_BUCKET_NAME/staging \
                  --templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME"

The template is then created and uploaded to GCS.
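
As an optional variation (a sketch, not what was actually run above), the same arguments can be kept in the pipeline module's POM through the exec plugin's `mainClass` and `arguments` parameters instead of being passed with -Dexec.mainClass and -Dexec.args. The class name and the YOUR_* values are the placeholders from this thread, and the placement in pipeline/pom.xml is an assumption.

```xml
<!-- pipeline/pom.xml, inside <build><plugins> (hypothetical fragment) -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <!-- version is inherited from the parent's pluginManagement -->
  <configuration>
    <mainClass>com.xxx.gcp.dataflow.yyy.pipeline.MAINJAVACLASS</mainClass>
    <arguments>
      <!-- same pipeline options as in the command above; replace the placeholders -->
      <argument>--runner=DataflowRunner</argument>
      <argument>--project=YOUR_PROJECT_ID</argument>
      <argument>--stagingLocation=gs://YOUR_BUCKET_NAME/staging</argument>
      <argument>--templateLocation=gs://YOUR_BUCKET_NAME/templates/YOUR_TEMPLATE_NAME</argument>
    </arguments>
  </configuration>
</plugin>
```

After mvn clean install at the root, something like mvn compile exec:java -pl pipeline should then pick these values up.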

If you want to create the template using Cloud Build, you can follow these steps

0
votes

Hi IoT user, I am trying to design a similar solution using DataflowTemplatedJobStartOperator and a classic Dataflow template. Would you be able to provide a sample of the parameters used for creating the template, and the JSON type for the parameter argument in the operator? Thanks, Pratul