1
votes

I can't believe this hasn't been brought up, but I've looked at about 30 questions and none mention this specific scenario.

The Backstory

There is a data set on a web page that grows over time. I am attempting to maintain a persistent copy of this data in a sheet where the user can add information in additional columns. I have a script that the user runs in the browser console to scrape the entirety of the data set. (I do the entire set every time because it makes the process uniform for new and repeat users, and because I have no way to incorporate the existing data to act as a stop point/filter) This scraped data is then pasted into an input area of my spreadsheet to be processed by an App Script.

The import process is intended to filter out the existing data from the input, and append the new unique data to the end of the existing data. This allows the additional info added by the user to remain undisturbed.

My Approach

I'm trying to minimize the amount of script work on this sheet, and am relying on formulas to do as much heavy lifting as possible. To that end I have settled on the FILTER() formula for removing the duplicate data from the input. I hate it, but I haven't found another solution that works as well.

There is an intermediary area of the sheet that holds the output of this formula. So user pastes data in to an input range > the FILTER() formula builds the range of new data on the fly > script is meant to copy this data to the end of the existing data.

The Problem

In the UI the Ctrl + Down shortcut works as expected, and will move the cursor to the end of the data in the FILTER() output. When recording a macro it creates a script with the getNextDataCell(SpreadsheetApp.Direction.DOWN).activate(); call as I would expect. The issue is that executing this script results in the cursor moving all the way to the end of where the FILTER() formula extends its influence.

The phantom data is only detectable by the scripts. I can copy paste blank rows from this range and they act as blank in the destination for all user input, but scripts will detect the phantom data in the new location as well. As the title suggests, this applies to the UNIQUE() formula as well, and I would assume QUERY and any other formulas that generate a range of data from a single cell.

I could just accept that these cells exist, let them get copied all over, and then sort my various ranges to get all the data together, but this poses another problem. The copying of all these blank rows results in a rapid increase in rows in my sheet. And with all the copying happening the blank cells will be compounded in some ranges, leading into tens of thousands of new rows being added to a sheet within a few imports.

The Real Question

Is there a way to make getNextDataCell() ignore blank rows from formulas like FILTER and QUERY?

Edit Note: I've removed the secondary questions to focus solely on making App Script ignore blank cells because apparently people think this is too broad as written.

1
Focus on one question at a timeCooper
Welcome to StackOverFlow please take this opportunity to take the tour and learn how to How to Ask, format code, minimal reproducible example and Tag InfoCooper
@Cooper I'm definitely not new to SO. The questions asked are all tightly coupled and would not make much sense to separate into 3 questions. I'd be hit with X Y problem complaints if I did that.Logarr

1 Answers

2
votes

The solution to this problem has been provided by a Gold Product Expert on the Google Support community. The disconnect between the App Script behavior and the UI is not expected and they're going to notify the product team.

The problem is somewhat of an X/Y problem. You can't change the behavior of the getNextDataCell function, but you can make the FILTER formula not spit out blank cells by adding another criteria to it.

For a single column named range it would be:

=FILTER(InputRange, LEN(InputRange), <whatever other criteria you have>)

For a multi-column named range it would be:

=FILTER(InputRange, LEN(INDEX(InputRange,,<column index to check for content>)), <other filter criteria>)

You cannot do this trick with the UNIQUE formula, but you can simply send that data through a FILTER formula with the LEN criteria to get rid of blanks.

It's stupidly obvious to me now, but a result of 0 from LEN is the same as FALSE, and causes the row to be skipped over.