2 votes

The easiest way to describe what I'm doing is to point to this tutorial: Import a CSV file into a Cloud Bigtable table. In the section where they start the Dataflow job, however, they use Java:

mvn package exec:exec \
    -DCsvImport \
    -Dbigtable.projectID=YOUR_PROJECT_ID \
    -Dbigtable.instanceID=YOUR_INSTANCE_ID \
    -Dbigtable.table="YOUR_TABLE_ID" \
    -DinputFile="YOUR_FILE" \
    -Dheaders="YOUR_HEADERS"

Is there a way to do this particular step in Python? The closest I could find was the apache_beam.examples.wordcount example here, but ultimately I'd like to see some code where I can add my own customization to the Dataflow job using Python.
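
For context, that wordcount example is launched against Dataflow along these lines (project, bucket, and output paths are placeholders):

    python -m apache_beam.examples.wordcount \
        --runner DataflowRunner \
        --project YOUR_PROJECT_ID \
        --temp_location gs://YOUR_BUCKET/tmp/ \
        --input gs://dataflow-samples/shakespeare/kinglear.txt \
        --output gs://YOUR_BUCKET/output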


3 Answers

3 votes

There is a connector for writing to Cloud Bigtable, which you can use as a starting point for importing CSV files.
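
A minimal sketch of that approach, assuming the Beam Python SDK's apache_beam.io.gcp.bigtableio.WriteToBigTable transform is available and the destination table's column family already exists (all project, instance, table, bucket, header, and column-family names below are placeholders):

    import datetime

    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import WriteToBigTable
    from apache_beam.options.pipeline_options import PipelineOptions
    from google.cloud.bigtable.row import DirectRow

    # Placeholders -- substitute your own values.
    PROJECT_ID = 'YOUR_PROJECT_ID'
    INSTANCE_ID = 'YOUR_INSTANCE_ID'
    TABLE_ID = 'YOUR_TABLE_ID'
    HEADERS = ['id', 'name', 'score']  # assumed CSV columns; the first is the row key
    COLUMN_FAMILY = 'csv'              # assumed column family; must already exist

    def csv_line_to_row(line):
        """Turn one CSV line into a Bigtable DirectRow (naive split, no quoting)."""
        values = line.split(',')
        row = DirectRow(row_key=values[0].encode('utf-8'))
        for header, value in zip(HEADERS[1:], values[1:]):
            row.set_cell(COLUMN_FAMILY,
                         header.encode('utf-8'),
                         value.encode('utf-8'),
                         timestamp=datetime.datetime.utcnow())
        return row

    options = PipelineOptions()  # pass --runner=DataflowRunner etc. on the command line
    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadCsv' >> beam.io.ReadFromText('gs://YOUR_BUCKET/YOUR_FILE.csv',
                                             skip_header_lines=1)
         | 'ToDirectRow' >> beam.Map(csv_line_to_row)
         | 'WriteToBigtable' >> WriteToBigTable(project_id=PROJECT_ID,
                                                instance_id=INSTANCE_ID,
                                                table_id=TABLE_ID))

This is also where the customization hook is: anything you can express as a Map or ParDo between the read and the write runs inside the Dataflow job.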

0 votes

Google Dataflow does not have a built-in Python connector for Cloud Bigtable.

Here is a link to the Apache Beam connectors for both Java and Python:

Built-in I/O Transforms

-3 votes

I'd suggest doing something like this.

DataFrame.to_gbq(destination_table, project_id, chunksize=10000, verbose=True, reauth=False, if_exists='fail', private_key=None)

You will find all parameters, and explanations of each, in the link below.

https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq
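
For example, a minimal call could look like this (note that to_gbq writes to BigQuery rather than Bigtable; the dataset and table names are placeholders, and the pandas-gbq package must be installed):

    import pandas as pd

    # A small DataFrame to upload; 'my_dataset.my_table' is a placeholder.
    df = pd.DataFrame({'name': ['a', 'b'], 'score': [1, 2]})
    df.to_gbq('my_dataset.my_table',
              project_id='YOUR_PROJECT_ID',
              if_exists='fail')  # fail rather than overwrite if the table exists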