0
votes

Our team has a Google Form we use for intake. Once an intake is submitted I have a GAS script that will process it (create some files in Google Drive, send some e-mails, etc...).

Instead of triggering on the on form submit event of the Spreadsheet linked to the Form, I am using an hourly trigger. There is a specific reason for this that is out-of-scope for this question but I can explain if needed.

The Spreadsheet linked to the Form will eventually have thousands of rows.

My script will go through each unprocessed row. I added a column on the Spreadsheet to track which rows have been processed. When a row is processed some string is added to column A. For unprocessed rows, column A will be empty.

I am trying to find the fastest and most efficient way to get all the unprocessed rows -- rows where column A is blank.

Initial thoughts:

  • Use Sheet.getDataRange().getValues() to get all of the rows, iterate through each, and process the unprocessed ones. Then use the index of the row to mark the row processed on the Spreadsheet. This way would work but after the Spreadsheet has thousands of rows this way will get really slow.
  • Apply a filter to the Spreadsheet and filter on rows where column A is blank. I can apply the filter but then there doesn't seem to be an easy way to get all the visible rows and their row number so I can mark them processed.

So then I figured out I could use the query function on a second sheet to show all rows where column A is empty and then I could use that sheet in my script for what needs to be processed.

=query('Form Responses'!A:V, "SELECT * WHERE A IS NULL")

But this does not include the row number from the Form Responses sheet so I wouldn't know which row to go back to on Form Responses to mark processed.

I was able to add the row number using arrayformula like this:

=query({arrayformula(row('Form Responses'!A:V)), 'Form Responses'!A:V}, "SELECT * WHERE Col2 IS NULL")

But this adds a bunch of rows with numbers, even if the source table doesn't have data in that row.

And if I don't include the row number it still wont work cause if I call Sheet.getDataRange() on the sheet that has the query function, it returns more rows than there is data.

I made an example sheet - https://docs.google.com/spreadsheets/d/1LYpqfj3clZiaGCPgp_eRfrxhQKWambRyQn3uNCFAIpU/edit?usp=sharing.

So is there a fast/efficient way to get all rows with data in a sheet where a specific column is empty? Either using query or some other way?

1
generally, I'd do getValues() and a loop, and it's fast enough. This might vary with the content of your sheet and your organization's quotas though. If you're going to use =query anyway, why not simply add a column of =arrayformula(row(A2:A)) to your main sheet so you have row numbers to reference?sinaraheneba
@Altigraph You're brilliant! I added the arrayformula to the Form Responses sheet and then in the second sheet I basically did =query('Form Responses!A:Z, "SELECT * WHERE B IS NULL AND C IS NOT NULL")`. This did the trick. Thank you! If you put your idea as an answer I'll mark it as accepted.IMTheNachoMan

1 Answers

1
votes

For a purely Apps Script answer to finding the position of a value within your sheet, either TextFinder (I'm not sure if this would be appropriate for empty fields) or getValues() and a loop ought to be fast enough. You might consider sorting the sheet prior to using getValues(), or sorting the array after, depending on the value you're seeking, or e.g. starting your loop from the most recent entries if you expect to be looking for newer items.

Alternatively, however, if you are using Sheets' formulas, you can easily add =row() to a column (or e.g. =arrayformula(row(A2:A))) such that the results of =query return a row number. If it's not feasible to add this column to the original sheet for whatever reason, it may also work to use match with the query result as the search key and the original sheet as the range, depending on how your spreadsheet is structured.