2 votes

I'm looking to set up a proper deployment pipeline for our Dataflow jobs, one that allows continuous delivery and QA testing of specific versions of our jobs.

To do this, I'm looking to "build" the jobs into artifacts that can be referenced and executed in different places. I've been looking into Dataflow Templates for this, but it seems like a template has to be built for a specific GCP project, meaning I can't share the artifacts between my staging and production projects.

Is there a better way to accomplish what I'm trying to do? What do people generally do in order to enforce a predictable deployment pipeline?


1 Answer

3 votes

When you create a job from a template, you specify the target project at launch time, so the same template file can be shared between projects. Here is an example in Go.

package main

import (
    "context"
    "fmt"
    "log"

    "golang.org/x/oauth2/google"
    "google.golang.org/api/dataflow/v1b3"
)

func main() {
    ctx := context.Background()
    projectID := "PROJECT"
    bucket := "gs://BUCKET/"
    input := "gs://dataflow-samples/shakespeare/kinglear.txt"
    output := bucket + "shakespeare"
    temp := bucket + "temp"
    template := "gs://dataflow-templates/wordcount/template_file"

    // Authenticate using Application Default Credentials.
    client, err := google.DefaultClient(ctx, "https://www.googleapis.com/auth/cloud-platform")

    if err != nil {
        log.Fatal(err)
    }

    // Build the Dataflow API client and its templates service.
    dataflowService, err := dataflow.New(client)
    if err != nil {
        log.Fatal(err)
    }
    templateService := dataflow.NewProjectsTemplatesService(dataflowService)

    // Runtime parameters expected by the word-count template.
    mapP := map[string]string{"inputFile": input, "output": output}

    // Launch the template as a job in the chosen project.
    env := dataflow.RuntimeEnvironment{TempLocation: temp}
    r := dataflow.CreateJobFromTemplateRequest{GcsPath: template, Parameters: mapP, Environment: &env}

    resp, err := templateService.Create(projectID, &r).Do()
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Job URL: https://console.cloud.google.com/dataflow/job/%s?project=%s\n", resp.Id, resp.ProjectId)
}
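
To fold this into a deployment pipeline, one option is to wrap the launch in a helper that takes the project as a parameter, so your CD tooling can promote the exact same template to staging and then production. This is just a sketch of one way to structure it; the function name and signature are my own, not part of the Dataflow API.

// launchTemplate creates a job from the given template in the given project.
// The same template GCS path can be launched into the staging project during
// QA and into the production project on release; only projectID differs.
func launchTemplate(ctx context.Context, projectID, templatePath, tempLocation string, params map[string]string) (*dataflow.Job, error) {
    client, err := google.DefaultClient(ctx, "https://www.googleapis.com/auth/cloud-platform")
    if err != nil {
        return nil, err
    }
    svc, err := dataflow.New(client)
    if err != nil {
        return nil, err
    }
    req := dataflow.CreateJobFromTemplateRequest{
        GcsPath:     templatePath,
        Parameters:  params,
        Environment: &dataflow.RuntimeEnvironment{TempLocation: tempLocation},
    }
    return dataflow.NewProjectsTemplatesService(svc).Create(projectID, &req).Do()
}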