I am new to Google App scripting, and I have no prior knowledge of scripting of any type other than basic HTML. However, Google App script didn't pose much of a challenge thanks to this forum.
I am a data analyst, and has been researching on low-cost/open source ways to emulate some of basic Big Data advantages for website publishers. My quest brought me to Google scripts. I have been able to write a few since I got to know about it a week ago.
The objective challenge is:
I have a spreadsheet that dynamically pulls about 1000 rows using IMPORTHTML function. The range automatically refreshes as the source refreshes everyday, so previous day's data is lost. That calls for backing up the data in an archive sheet, so that I can analyze the historical data on time-ranges of choice.
I want to automatically copy the rows and paste them on the top of the archive sheet, just below the range header, so that I don't have to sort the archive by dates, which may be required for data analysis. I also need to check for duplicate rows and remove them--just in case.
First I wrote a script that appended the copied rows below the last row of the archived range. However, sorting by date became necessary, as I had to filter the data by specific date ranges, say 14 days or 7 days, for advanced analysis. So I added a snippet for sorting and another for removing duplicates. It works well, however, sorting takes a long time. And considering thousands of new rows being added everyday, it will continue to take longer. I needed a smarter solution.
So I started writing a script that will (1) detect the number of rows in the source range (2) Insert as many rows below the header of the archive sheet and (3) paste copied range into the newly inserted rows.
I finished writing it, and it works very fast; apparently no sorting is required. However, I was wondering, if there is a way to make it even quicker and smarter and future-proof. Please find the code below. Any suggestion will be highly appreciated.
function myFunction() {
//1. Get data from source sheet of a spreadsheet whose id is known, we will also need the data range's last row number
var firstStep = SpreadsheetApp.openById("ID of Source Spreadsheet");
var ss = firstStep.getSheetByName("Sheet1");
ss.activate();
var myRange = ss.getRange(4, 2, ss.getLastRow() - 3, ss.getLastColumn());
var myData = myRange.getValues();
//'3' subtracted from last row data collector above as first three rows contain static data or blank row in my source sheet. Applied same technique at line 17 below as well. This totally depends on how you position the source range in the source sheet. For exaple, for a range starting at 1,1 on any sheet, no such subtraction woud be required.
var lastRow = myRange.getLastRow() - 3;
//2. Open archive spreadsheet, select the destination sheet, insert exact number of rows of source range and then paste copied range.
var secondStep = SpreadsheetApp.openById("ID of archive spreadsheet");
var newSS = secondStep.getSheetByName("dump1");
newSS.activate();
//2.a Insert Rows as in #lastrow in the new sheet, just below the header at Row 1
newSS.insertRowsBefore(2, lastRow)
//2.b Paste values
newSS.getRange(2, 1, myData.length, myData[0].length).setValues(myData);
//2.c Paste last row number of the copied range in another cell of the same sheet, optional step, just to be sure that last row determination process is right. You may remove this step if you like.
newSS.getRange(1, 15).setValue(lastRow);
/*
//3.a Optional: Script to remove duplicate rows in archive sheet. Will increase the script-run duration considerably.
var data = newSS.getDataRange().getValues();
var newData = new Array();
for(i in data){
var row = data[i];
var duplicate = false;
for(j in newData){
if(row.join() == newData[j].join()){
duplicate = true;
}
}
if(!duplicate){
newData.push(row);
}
}
newSS.clearContents();
newSS.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
*/
}