2 votes

I need to push 1.18 MB, or approximately 10,000 rows, of data from a CSV file residing on my server to a Google Sheet for Tableau to read.

The data is coming in from Google DFP into a CSV document. I have been using the gspread library to update Google Sheets with CSV data before, but with 10,000 records and ~30 seconds to post each one, this approach is not viable.

Is there a faster way to copy the contents of a CSV/TXT file to a Google spreadsheet than by using the gspread library cell by cell? Preferably in Python.

Update: I'm trying this approach of bulk updating the cells.

import pandas as pd

raw_dfp = pd.read_csv('live_dfp_UTF.csv', error_bad_lines=False)
sample5 = raw_dfp.iloc[:3, :]
rows, col = sample5.shape

doc.resize(1, 7)
doc.clear()
doc.resize(rows + 1, col)

column_names = ['', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

cell_range = 'A1:' + column_names[col] + str(rows)

cells = doc.range(cell_range)

# Flatten the 2D DataFrame values into a 1D array so they line up
# with the flat, row-major list of Cell objects from doc.range()
flattened_data = sample5.values.flatten()

# One value per cell, not one per row
for x in range(rows * col):
    cells[x].value = flattened_data[x]

doc.update_cells(cells)
In case the suggested answer doesn't work: 30 seconds seems a long time for a single record. However, it's the API call that adds a lot of overhead, not so much the data you pass. Depending on the number of columns, pushing it in chunks of ~200 rows at a time used to take almost negligible additional time over pushing a single row for me. There is an upper limit on how much you can push in a single update, however. – roganjosh
I wrote an answer here for doing just that, which might help. – roganjosh
@roganjosh could you take a look at the update I put in the body of the question? I'm having trouble making sense of what to do after the flatten line. – Artem Yevtushenko
Which part specifically is causing problems? cells is a 1D structure, while the spreadsheet is 2D. So you first need to flatten the spreadsheet data to 1D and then run the for x in range(rows). It looks like cells[x].value = sample5[x].decode('utf-8') should be cells[x].value = flattened_data[x].decode('utf-8'). – roganjosh
Also, it would be flattened_data = sample5.flatten(), I missed that. You need to get a 2D numpy array out of your DataFrame first. My initial answer on the other question assumed Python lists. It might be that you need to call as_matrix on the DataFrame, then flatten it. Sorry, not hugely familiar with numpy/pandas. – roganjosh
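
For reference, a minimal sketch of the chunked approach roganjosh describes, reusing rows, col, and the populated cells list from the update above (the 200-row batch size is illustrative, not a documented limit):

# Send the populated cells in batches of ~200 rows per API call
# to stay under the per-request payload limit.
CHUNK_ROWS = 200

for start in range(0, rows, CHUNK_ROWS):
    end = min(start + CHUNK_ROWS, rows)
    # cells is row-major, so this slice covers rows start..end-1
    doc.update_cells(cells[start * col:end * col])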

3 Answers

3 votes

Why don't you load all the data from the CSV file into a pandas DataFrame and then push it to Google Sheets as a whole?

This library may be helpful: https://github.com/maybelinot/df2gspread
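
A minimal sketch of that flow using df2gspread's upload function (the spreadsheet ID and worksheet name are placeholders):

import pandas as pd
from df2gspread import df2gspread as d2g

# Read the whole CSV into a DataFrame, then push it in a single call.
df = pd.read_csv('live_dfp_UTF.csv')
d2g.upload(df, gfile='your_spreadsheet_ID', wks_name='Sheet1')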

1 vote

You could try pygsheets. It uses the Sheets API v4, which is faster than v3, and it also supports pushing data from a pandas DataFrame.
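
A minimal sketch, assuming a service-account credentials file with access to the target spreadsheet (the file and spreadsheet names are placeholders, and the authorize keyword has changed across pygsheets versions):

import pandas as pd
import pygsheets

# Authorize with a service account and open the target worksheet.
gc = pygsheets.authorize(service_file='service_creds.json')
wks = gc.open('your_spreadsheet_name').sheet1

# Write the whole DataFrame in one call, starting at cell A1.
df = pd.read_csv('live_dfp_UTF.csv')
wks.set_dataframe(df, (1, 1))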

0 votes

I agree with Shivam: the best way to transfer a Google spreadsheet to pandas and vice versa is df2gspread: http://df2gspread.readthedocs.io/en/latest/overview.html#usage1

It takes 10 minutes to set up and only 3 lines of code to do the trick:

from df2gspread import gspread2df as g2d
df = g2d.download(gfile="your_spreadsheet_ID", col_names=True, row_names=True)
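
Pushing in the other direction, which is what the question asks for, is just as short; a sketch assuming the same spreadsheet ID and a placeholder worksheet name:

from df2gspread import df2gspread as d2g
d2g.upload(df, gfile="your_spreadsheet_ID", wks_name="Sheet1")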

I just set this up so if you have any questions, feel free to ask.