0
votes

I'm setting up a fairly complex Google sheet and trying to automate some routine interpolation with a script. I now have a script that works, but I want to optimise it.

Let me briefly describe the set up with some (simple) example data:

          A             B         C           D              E
1    Lookup date       Result           Recorded date     Value
2    17/8/2018           -               31/12/2018         210
3    31/12/2018         210              31/3/2019          273
4    14/2/2019          241.5            12/6/2019          411
5    31/3/2019          273
6    12/6/2019          411
7    1/7/2019           411

In this example, I have a small number of recorded values (columns D and E) and I want to compute the value for any date (column A). Column B is the output of the script. The problem is that my script is very slow, taking quite a while on my laptop (sometimes I must refresh the page), and never fully executing on my iPad.

I think part of this may be the volume of requests: I run this script for about 200 cells in my sheet.

I will briefly explain the script (full javascript code below).

It creates a custom function getvalue(x, y, lookupdate) which, for a given x-range (col. D) y-range (col. E) and "lookup date" (eg A4) will return the correct result (eg B4). This result is either:

  • blank if the lookup date occurs before the first recorded date
  • the exact value if the lookup date equals a recorded date
  • an interpolated value if the lookup date is in between two recorded dates
  • the final recorded value if the lookup date is beyond the range of the recorded dates

Now I have optimised this somewhat. In my implementation, I actually run it as an array for 100 cells in column A (only some of which actually need to run the script). I have another simple system that basically auto-populates the date in column A as a binary flag to say the script needs to run. So using ISBLANK() as a switch, my array formula for cells B3:B103 is:

=ArrayFormula(IF(ISBLANK(A3:A103),"",getvalue(D:D,E:E,A3:A103)))

Even though the array covers 100 cells, only about 50 of them are "activated" with a date in the A column, so only about 50 of them actually need to run the getvalue function. However, as a final complication, I am actually doing this to calculate four different values for each "lookup date", running four different arrays in four columns, so that's what I say the script runs approx. 200 times.

Here is my actual script:

function getvalue(x, y, lookupdate) {

/// LOOKUP AN ARRAY  
  if (lookupdate.map) {
    return lookupdate.map(function(v) {
      return getvalue(x, y, v);
    });
  }


/// GET RID OF EMPTY CELLS IN COLUMN
  var xf = x.filter(function(el) {
    return el != ""; 
  });

  var yf = y.filter(function(el) {
    return el != "";
  });

/// GET RID OF HEADER ROW
  xf.shift()
  yf.shift()

/// SAVE THE FIRST AND LAST VALUES
  var firstx = xf[0][0]
  var firsty = yf[0][0]
  var lastx = xf[xf.length - 1][0]
  var lasty = yf[yf.length - 1][0]

/// FIGURE OUT WHAT TO RETURN
  if (lookupdate < firstx) {
      return "";
  } else if (lookupdate.valueOf() == firstx.valueOf()) {
      return firsty;
  } else if (lookupdate > lastx) {
      return lasty;
  } else {
      var check = 0, index;
      for(var i = 0, iLen = xf.length; i < iLen; i++) {
          if(xf[i][0] == lookupdate) {
              return yf[i][0];
              } else {      
              if(xf[i][0] < lookupdate && ((xf[i][0] - check) < (lookupdate - check))) {
              check = xf[i][0];
              index = i;
               }
           }
       }
       var xValue, yValue, xDiff, yDiff, xInt;
       yValue = yf[index][0];
       xDiff = xf[index+1][0] - check;
       yDiff = yf[index+1][0] - yValue;
       xInt = lookupdate - check; 

       return (xInt * (yDiff / xDiff)) + yValue;
    }
}

The error message on the iPad is simply the cells never move past "Loading...", and on the laptop it takes much longer than expected.

The most confusing thing is that I think it has gotten worse since I set it up as an array. I previously had it where all 400 cells would run the ISBLANK() check, and then for the approx 200 triggered cells, they would individually run the script. This at least would load on the iPad. I read on here and on general Google support that scripts will run a lot faster if you batch operations, but it seems to have gotten slower since moving from 200 single cells to 4 arrays.

Does this need to be optimised further, or is there some other reason it might be stalling on my iPad? Is it even possible to optimise it down and do this in a single call, instead of in 4 arrays?

1
Show execution transcriptTheMaster
Also, Don't use ARRAYFORMULA/ISBLANK. If you must, try https://stackoverflow.com/a/46884012/TheMaster
Thanks @TheMaster I will try that suggestion you linked to. For the execution transcript, it simply says "No execution transcripts found. Please run your script and try again." Perhaps because this is not a self-contained script, but more like a custom Sheets formula?Tobes
Show view> Executions. If that doesn't log time, custom log it using console.time('test1') and console.timeEnd('test1'). Custom log different parts of the script to isolate the part of the script taking the longest time. Also, Just try plain =getvalue(D:D,E:E,A3:A103)TheMaster

1 Answers

0
votes
  1. Accounting for the case else if (lookupdate.valueOf() == firstx.valueOf()) return firsty; is superfluous because it is covered already by if(xf[i][0] == lookupdate)
  2. (xf[i][0] - check) < (lookupdate - check) can be simplified to xf[i][0] < lookupdate
  3. You are using pure javascript code, but keep in mind that App Script has many additional functions which are handy when working with Spreadsheet. https://developers.google.com/apps-script/reference/spreadsheet/ So, e.g. for running your function only for the populated range functions like getDataRange() or getRange() in combination with getNextDataCell() and getLastRow() will be very useful for you.
  4. The main important point - the functionality of your script. Are you assuming that there is an approximately linear relationship between Recorded date and value, and thus interpolate the value for not recorded dates? In this case the statistically most precise way (and the programmatically simplest one) would be to calculate your slope between the first and last x and y respectively. That is: Result=first_y+((y_last-y_first)/(x_last-x_first)*(Lookup_Date-first_x))

If this approach is suitable for you, your code would simplify and would look in App Script something like:

function myFunction() {
  var ss=SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var Result_Range=ss.getRange("A2:B")
  var limit=Result_Range.getNextDataCell(SpreadsheetApp.Direction.DOWN).getLastRow()
  var Result_values=Result_Range.getValues();
  var valueRange=ss.getRange("D1:E");
  var values=valueRange.getValues();
  var last_Index=valueRange.getNextDataCell(SpreadsheetApp.Direction.DOWN).getLastRow()
  var last_y=values[last_Index-1][1];
  var last_x=values[last_Index-1][0].valueOf();
  var first_y=values[1][1];
  var first_x=values[1][0].valueOf();
  var slope=(last_y-first_y)/(last_x-first_x);

  for(var i=1;i<limit;i++)
  {  
    Result_Range.getCell(i,2).setValue(first_y+(slope*(Result_values[i-1][0].valueOf()-first_x)))
    Logger.log(i)
    Logger.log(Result_values[i][0].valueOf()-first_x)
  }
}