3
votes

Going through an itertuples, I need to export two data, located in two rows of an excel column ('G7', 'G8') to two columns of google sheets. How can I do this?

import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from oauth2client.service_account import ServiceAccountCredentials
import pyperclip
import pyautogui as p
import rpa as r
import pandas as pd
import tabula
import openpyxl

r.init()
r.url('https://www.meudetran.ms.gov.br/veiculo.php#')
p.sleep(2)
janela = p.getActiveWindow()
janela.maximize()
p.sleep(2)

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)
gc = gspread.authorize(credentials)
wks = gc.open_by_key('1AGYhinoPiE9xUABnrNEfVGjLf5s_bAJGjpz9hatfIQU')
worksheet = wks.get_worksheet(0)
dados = get_as_dataframe(worksheet)
df = pd.DataFrame.from_records(dados, columns=["Placa", "Renavam"])
set_with_dataframe(worksheet, df)
df2 = get_as_dataframe(worksheet)

for row in df2.itertuples():
    df = tabula.read_pdf(text, pages=1)[1]
    df.to_excel('dados.xlsx')
    wb = openpyxl.load_workbook('dados.xlsx')
    sheet = wb.active
    venc = sheet['G8'].value
    valor = sheet['G7'].value
    worksheet.update(row[3], venc)

This last line does not update column 3 of the google sheet

1
What is df = tabula.read_pdf(text, pages=1)[1] in your script?Tanaike
I am converting a web pdf to table in xlsx. To get the data and feed it in the googlesheet, in the active tupleLino Costa
Thank you for replying. You want to retrieve the values from the cells "G7" and "G8" from 1st tab in the XLSX data converted from a PDF data. Is my understanding correct? If my understanding is correct, where do you want to put the retrieved values of "G7" and "G8" to Google Spreadsheet? It seems that in your script, only the value of "G8" is used. So I cannot understand about the detail of your goal. I apologize for this.Tanaike
That's it. I need to import this data in 'G8' into cell 'C2' of google sheets. However, I need this to be done automatically according to the line being read in the tuple. In this case I could use worksheet.update ('C2', venc), but in all tuples, I would update the same cell in google sheets. In the next tuple, need to feed the Google 'C3' cell, then the 'C4' ....Lino Costa
Thank you for replying. I have to apologize for my poor English skill. Unfortunately, from your replying, I cannot still understand about your goal. For example, you want to append the values to the column "C" in the Spreadsheet every run of script? For example, at 1st run, you want to put the retrieved values of "G7" and "G8" to the cells "C2" and "C3" of Spreadsheet. And, at 2nd run, you want to put the retrieved values of "G7" and "G8" to the cells "C4" and "C5" of Spreadsheet. You want to do this cycle. Is my understanding correct?Tanaike

1 Answers

2
votes

I believe your goal and your current situation as follows.

  • You want to retrieve the values from the cells "G7" and "G8" from 1st tab in the XLSX data converted from a PDF data.
    • You have already achieved this.
  • You want to append the values to the columns "C" and "D" in the Spreadsheet every run of script.
    • For example, at 1st run, you want to put the retrieved values of "G7" and "G8" to the cells "C2" and "D2" of Spreadsheet. And, at 2nd run, you want to put the retrieved values of "G7" and "G8" to the cells "C3" and "D3" of Spreadsheet. You want to do this cycle.
  • You have already been able to get and put values for Google Spreadsheet using Sheets API.

Modification points:

  • In your script, the values retrieved from Spreadsheet are converted to the dataframe. I thought that in your situation, this might not be required.
  • In this modification, I would like to propose the following flow.
    1. Retrieve the values from "G7" and "G8" from the XLSX data converted from PDF data.
    2. Retrieve the values from the columns "C" and "D" and retrieve the last row of the columns "C" and "D".
    3. Append the retrieved values to the columns "C" and "D" in Google Spreadsheet.

When above points are reflected to your script, it becomes as follows.

Modified script:

In this modified script, I modified below gc = gspread.authorize(credentials) in your script.

gc = gspread.authorize(credentials)
wks = gc.open_by_key('###') # Please set your Spreadsheet ID.
worksheet = wks.get_worksheet(0)

# 1. Retrieve the values from "G7" and "G8" from the XLSX data converted from PDF data.
df = tabula.read_pdf(text, pages=1)[1]
df.to_excel('dados.xlsx')
wb = openpyxl.load_workbook('dados.xlsx')
sheet = wb.active
venc = sheet['G8'].value
valor = sheet['G7'].value

# 2. Retrieve the values from the column "C" and retrieve the last row of the columns "C" and "D".
lastRow = max([len(worksheet.col_values(3)), len(worksheet.col_values(4))])

# 3. Append the retrieved values to the columns "C" and "D" in Google Spreadsheet.
worksheet.update('C' + str(lastRow + 1), [[valor, venc]])
  • In this modified script, it supposes that df = tabula.read_pdf(text, pages=1)[1] works fine. Please be careful this.
  • By above modification, the retrieved values valor, venc are appended to the columns "C" and "D" every run.

References: