I have searched, copied and modified code, and tried to break down what others have done and I still can't get this right.
I have email receipts for an ecommerce webiste, where I am trying to harvest particular details from each email and save to a spreadsheet with a script.
Here is the entire script as I have now.
function menu(e) {
var ui = SpreadsheetApp.getUi();
ui.createMenu('programs')
.addItem('parse mail', 'grabreceipt')
.addToUi();
}
function grabreceipt() {
var ss = SpreadsheetApp.getActiveSheet();
var ss = SpreadsheetApp.getActiveSpreadsheet();
var s = ss.getSheetByName("Sheet1");
var threads = GmailApp.search("(subject:order receipt) and (after:2016/12/01)");
var a=[];
for (var i = 0; i<threads.length; i++)
{
var messages = threads[i].getMessages();
for (var j=0; j<messages.length; j++)
{
var messages = GmailApp.getMessagesForThread(threads[i]);
for (var j = 0; j < messages.length; j++) {
a[j]=parseMail(messages[j].getPlainBody());
}
}
var nextRow=s.getDataRange().getLastRow()+1;
var numRows=a.length;
var numCols=a[0].length;
s.getRange(nextRow,1,numRows,numCols).setValues(a);
}
function parseMail(body) {
var a=[];
var keystr="Order #,Subtotal:,Shipping:,Total:";
var keys=keystr.split(",");
var i,p,r;
for (i in keys) {
//p=keys[i]+(/-?\d+(,\d+)*(\.\d+(e\d+)?)?/);
p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";
//p=keys[i]+"[\$]?[\d]+[\.]?[\d]+$";
r=new RegExp(p,"m");
try {a[i]=body.match(p)[1];}
catch (err) {a[i]="no match";}
}
return a;
}
}
So the email data to pluck from comes as text only like this:
Order #89076
(body content, omitted)
Subtotal: $528.31
Shipping: $42.66 via Priority Mail®
Payment Method: Check Payment- Money order
Total: $570.97
Note: mywebsite order 456. Customer asked about this and that... etc.
The original code regex was designed to grab content, following the keystr values which were easily found on their own line. So this made sense:
p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";
This works okay, but results where the lines include more data that follows as in line Shipping: $42.66 via Priority Mail®.
My data is more blended, where I only wish to take numbers, or numbers and decimals. So I have this instead which validates on regex101.com
p=keys[i]+"[\$]?[\d]+[\.]?\d+$";
The expression only, [\$]?[\d]+[.]?\d+$ works great but I still get "no match" for each row.
Additionally, within this search there are 22 threads returned, and it populates 39 rows in the spreadsheet. I can not figure out why 39?