2 votes

I have a CSV file in a Cloud Storage bucket (bd_storage) and need to import it into an already created table (matriculas) in a Cloud SQL database (test). The problem is that the UI import option uses comma (',') as the field separator by default, while my CSV file is semicolon (';') separated.

I know I could use a text editor to change all the semicolons to commas, but the CSV file is too big for my PC to handle (that's the reason I'm using Google Cloud Platform). How can I use the Cloud Shell to do that?

Here are some examples of what I've tried with no success:

NOTE: I log in using gcloud sql connect mschiaff --user=root, and then use test to select the database, where 'mschiaff' is the Cloud SQL instance.

1. LOAD DATA LOCAL INFILE 'gs://bd_storage/Matrícula_Ed_Superior_2016_UPLOAD.csv' INTO TABLE `matriculas` CHARACTER SET 'utf8' FIELDS TERMINATED BY ';';

   Error:

   2 (HY000): File 'gs:/bd_storage/Matrícula_Ed_Superior_2016_UPLOAD.csv' not found (Errcode: 2 "No such file or directory")

2. LOAD DATA INFILE 'gs://bd_storage/Matrícula_Ed_Superior_2016_UPLOAD.csv' INTO TABLE `matriculas` CHARACTER SET 'utf8' FIELDS TERMINATED BY ';';

   Error:

   ERROR 1045 (28000): Access denied for user 'root'@'%' (using password: YES)

This is for my thesis.


2 Answers

3 votes

I have a couple of suggestions and you could use either to achieve this.

LOAD DATA LOCAL INFILE can't be used to point to a bucket because it expects the file to be stored locally. Therefore, you need a copy of the CSV on the filesystem of the machine you are using to connect to the Cloud SQL instance. For this example, let's assume you're going to do this from the Cloud Shell.

1) Copy the CSV from your storage bucket to your home directory in the Cloud Shell:

cd ~

gsutil cp gs://YOUR_BUCKET_NAME/file.csv .
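If you want to sanity-check the copy before importing (the file name here is a placeholder), you can inspect the first line to confirm it really is semicolon-delimited:

# Confirm the file arrived and check the delimiter on the first line
ls -lh file.csv
head -n 1 file.csv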

2) Connect to the Cloud SQL instance from the Cloud Shell:

gcloud sql connect CLOUDSQL_INSTANCE_NAME --user root

3) Connect to the correct database:

use DATABASE_NAME;

4) Import the CSV into a table within the database, specifying a semicolon delimiter (the file path in this command refers to the Cloud Shell filesystem, as that is where you connected from):

LOAD DATA LOCAL INFILE '/home/USERNAME/file.csv' INTO TABLE table_name FIELDS TERMINATED BY ';';
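If your CSV has a header row or quoted fields, LOAD DATA LOCAL INFILE can handle those too. A minimal sketch, assuming a header line, optionally double-quoted fields, and Unix line endings (the path and table name are placeholders; adjust everything to match your file):

-- IGNORE 1 LINES skips a header row; ENCLOSED BY handles quoted fields
LOAD DATA LOCAL INFILE '/home/USERNAME/file.csv'
INTO TABLE table_name
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;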

You should now have successfully imported the CSV into the database table.

Alternatively, since you can't open the file locally to change the delimiter from semicolons to commas, you could mount the bucket containing the CSV on a Compute Engine instance and run a sed command from the instance to replace all of the semicolons with commas in the file.

You can mount the bucket on an instance using Cloud Storage FUSE (gcsfuse); see the Cloud Storage FUSE documentation for instructions.
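A minimal sketch of the mount, assuming gcsfuse is already installed on the instance (the bucket name and mount point are placeholders):

# Create a mount point and mount the bucket with gcsfuse
mkdir -p /path/to/mount/point
gcsfuse YOUR_BUCKET_NAME /path/to/mount/point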

Once the bucket is mounted, you could run the following command on the CSV to replace the semicolons with commas.

$ sed -i "s/;/,/g" /path/to/mount/point/file.csv

Once you have replaced the semicolons with commas, you would then be able to import the CSV via the Console.
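As an aside, if you'd rather not mount the bucket at all, the same rewrite can be streamed through the Cloud Shell. This is a sketch with placeholder bucket and file names, and it assumes no field values themselves contain semicolons or commas:

# Stream the CSV out of the bucket, swap delimiters, and upload a new object
gsutil cat gs://YOUR_BUCKET_NAME/file.csv | sed 's/;/,/g' > file_commas.csv
gsutil cp file_commas.csv gs://YOUR_BUCKET_NAME/file_commas.csv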

0 votes

If it makes sense for you, Amazon RDS supports LOAD DATA LOCAL INFILE when the following parameter is included in the connection string:

ENABLE_LOCAL_INFILE=1
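For the MySQL command-line client, the equivalent switch is --local-infile. A sketch with placeholder endpoint, user, and database names:

# Enable LOCAL INFILE on the client side, then run LOAD DATA LOCAL INFILE as above
mysql --local-infile=1 -h YOUR_RDS_ENDPOINT -u YOUR_USER -p DATABASE_NAME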