0
votes

I am new to Google App scripting, and I have no prior knowledge of scripting of any type other than basic HTML. However, Google App script didn't pose much of a challenge thanks to this forum.

I am a data analyst, and has been researching on low-cost/open source ways to emulate some of basic Big Data advantages for website publishers. My quest brought me to Google scripts. I have been able to write a few since I got to know about it a week ago.

The objective challenge is:

  1. I have a spreadsheet that dynamically pulls about 1000 rows using IMPORTHTML function. The range automatically refreshes as the source refreshes everyday, so previous day's data is lost. That calls for backing up the data in an archive sheet, so that I can analyze the historical data on time-ranges of choice.

  2. I want to automatically copy the rows and paste them on the top of the archive sheet, just below the range header, so that I don't have to sort the archive by dates, which may be required for data analysis. I also need to check for duplicate rows and remove them--just in case.

First I wrote a script that appended the copied rows below the last row of the archived range. However, sorting by date became necessary, as I had to filter the data by specific date ranges, say 14 days or 7 days, for advanced analysis. So I added a snippet for sorting and another for removing duplicates. It works well, however, sorting takes a long time. And considering thousands of new rows being added everyday, it will continue to take longer. I needed a smarter solution.

So I started writing a script that will (1) detect the number of rows in the source range (2) Insert as many rows below the header of the archive sheet and (3) paste copied range into the newly inserted rows.

I finished writing it, and it works very fast; apparently no sorting is required. However, I was wondering, if there is a way to make it even quicker and smarter and future-proof. Please find the code below. Any suggestion will be highly appreciated.

function myFunction() {


  //1. Get data from source sheet of a spreadsheet whose id is known, we will also need the data range's last row number

  var firstStep = SpreadsheetApp.openById("ID of Source Spreadsheet");
  var ss = firstStep.getSheetByName("Sheet1");
  ss.activate();
  var myRange = ss.getRange(4, 2, ss.getLastRow() - 3, ss.getLastColumn());
  var myData = myRange.getValues();


  //'3' subtracted from last row data collector above as first three rows contain static data or blank row in my source sheet. Applied same technique at line 17 below as well. This totally depends on how you position the source range in the source sheet. For exaple, for a range starting at 1,1 on any sheet, no such subtraction woud be required.

  var lastRow = myRange.getLastRow() - 3;


  //2. Open archive spreadsheet, select the destination sheet, insert exact number of rows of source range and then paste copied range. 

  var secondStep = SpreadsheetApp.openById("ID of archive spreadsheet");
  var newSS = secondStep.getSheetByName("dump1");
  newSS.activate();


  //2.a  Insert Rows as in #lastrow in the new sheet, just below the header at Row 1

  newSS.insertRowsBefore(2, lastRow)


  //2.b Paste values

  newSS.getRange(2, 1, myData.length, myData[0].length).setValues(myData);


  //2.c Paste last row number of the copied range in another cell of the same sheet, optional step, just to be sure that last row determination process is right. You may remove this step if you like.  

  newSS.getRange(1, 15).setValue(lastRow);

  /*   
   //3.a Optional: Script to remove duplicate rows in archive sheet. Will increase the script-run duration considerably. 
   
   var data = newSS.getDataRange().getValues();
   var newData = new Array();
   for(i in data){
     var row = data[i];
     var duplicate = false;
     for(j in newData){
       if(row.join() == newData[j].join()){
         duplicate = true;
       }
     }
    if(!duplicate){
       newData.push(row);
     } 
   }
   newSS.clearContents();
   newSS.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
   
   */

}
1
unless you plan to have only a few thousand rows, you should seriously consider google bigquery instead. there are samples for using it from apps script.Zig Mandel
Hi Zig Mandel, thanks for your comment. Can you please share the links/resources for using bigquery?AliveToLearn
what have you googled so far?Zig Mandel
Thanks Zig for writing back. I found a third party plugin to run Bigquery analysis and show results on Google Spreadsheet. It's OWOX. However, it will involve learning Bigquery. I have one question. I have been using Google Visyalization Query Language for advanced analysis for a while, it's amazing. Are the Bigquery commands similar? Where do I get the basics of Bigquery functions? Thanks again!AliveToLearn
you really need to start reading the bigquery official docs before making all these questions.Zig Mandel

1 Answers

0
votes

Anything you can accomplish within Google apps Script itself will be much faster than making calls that need to fetch data from Google's servers or an external server such as requests to Spreadsheets, Docs, sites and so on. Your scripts will run faster if you can find ways to minimize the calls the scripts make to those services.

To speed up a script, read all data into an array with one command, perform any operations on the data in the array and write the data out with one command.

Here's an example:

var cell = sheet.getRange('a1');
var colors = new Array(100);
for (var y = 0; y < 100; y++) {
xcoord = xmin;
colors[y] = new Array(100);
for (var x = 0; x < 100; x++) {
colors[y][x] = getColor_(xcoord, ycoord);
xcoord += xincrement;
}
ycoord -= yincrement;
}
sheet.getRange(1, 1, 100, 100).setBackgroundColors(colors);

You must use the Google's Best Practice, The highlight from Google's List are:

  • Reduce the number of API calls
  • When making API calls, batch the requests
  • Use the Apps Script built in cache service
  • Do not use UIApp; use HTMLService

Here's a document list best practices that will help you improve the performance of your scripts: https://developers.google.com/apps-script/best_practices#minimize-calls-to-other-services