I have a set of text (.txt) files in Cloud Storage (new files are uploaded every 5 minutes). I want to load them into BigQuery, but BigQuery can't accept text files, so I have to convert them to a BigQuery-acceptable format first. What is the best possible way to do this?
2 Answers
1
vote
As per this document, BigQuery only supports loading data in the following file formats: CSV, JSON, Avro, and Google Cloud Datastore backups.
Hence, if you upload a text file to BigQuery, BigQuery will read it as a CSV file and will most likely run into an error.
You would have to manually convert your text file into a CSV file before uploading it to BigQuery.
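As a minimal sketch of such a conversion, assuming the text files are tab-delimited (the `txt_to_csv` helper name and the delimiter are my own choices; adjust the delimiter to match your actual files), Python's csv module can rewrite the content as proper CSV:

```python
import csv
import io

def txt_to_csv(txt_content, delimiter='\t'):
    # Parse delimiter-separated text and re-serialize it as standard CSV,
    # quoting fields where needed so BigQuery's CSV loader can ingest them.
    out = io.StringIO()
    writer = csv.writer(out)
    reader = csv.reader(io.StringIO(txt_content), delimiter=delimiter)
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

print(txt_to_csv('id\tname\n1\tAlice, Jr.\n2\tBob\n'))
```

Note that a field containing a comma (like "Alice, Jr.") gets quoted automatically, which a plain string replace of the delimiter would get wrong.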
Alternatively, you may use Cloud Dataprep, as it supports text files as inputs. You can apply transformations to your text file in Dataprep and then export the results to BigQuery.
Here is an Overview of Dataprep and a Quickstart Documentation to learn how to use it.
0
votes
Here is a code snippet that copies the text file's content to a new .csv object in the same bucket (note that this only changes the object's name; the content itself must already be comma-separated for BigQuery to load it):

from google.cloud import storage

def getBlobAsString(bucketName, blobName):
    # Download the blob's content from Cloud Storage as bytes
    storageClient = storage.Client()
    bucket = storageClient.get_bucket(bucketName)
    blobFile = bucket.get_blob(blobName)
    blobStr = blobFile.download_as_string()
    return blobStr

def getBlobAsFile(bucketName, blobName, txtStr):
    # Upload the string back to the bucket under a .csv name
    storageClient = storage.Client()
    # Replace only the .txt extension, not every 'txt' in the name
    csvFileName = blobName.rsplit('.txt', 1)[0] + '.csv'
    bucket = storageClient.get_bucket(bucketName)
    blob = bucket.blob(csvFileName)
    blob.upload_from_string(txtStr)
    return csvFileName
txtBucket = "bucket-name"
txtBlob = "blob-name"
# Read text file content as string
txtBlobAsStr = getBlobAsString(txtBucket, txtBlob)
txtStr = str(txtBlobAsStr, 'utf-8')
# Write text file content to CSV file
csvBlob = getBlobAsFile(txtBucket, txtBlob, txtStr)
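After the CSV object is in Cloud Storage, it still has to be loaded into BigQuery. One way is the bq command-line tool; the dataset, table, and object names below are placeholders for your own:

```shell
# Load the converted CSV from Cloud Storage into a BigQuery table,
# letting BigQuery auto-detect the schema.
bq load --source_format=CSV --autodetect my_dataset.my_table gs://bucket-name/blob-name.csv
```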