1
votes

I have searched, copied and modified code, and tried to break down what others have done and I still can't get this right.

I have email receipts for an ecommerce webiste, where I am trying to harvest particular details from each email and save to a spreadsheet with a script.

Here is the entire script as I have now.

function menu(e) {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu('programs')
      .addItem('parse mail', 'grabreceipt')
      .addToUi();
}

function grabreceipt() {

  var ss = SpreadsheetApp.getActiveSheet();
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s = ss.getSheetByName("Sheet1");
  var threads = GmailApp.search("(subject:order receipt) and (after:2016/12/01)");
  var a=[];
  for (var i = 0; i<threads.length; i++)
  {
    var messages = threads[i].getMessages();

    for (var j=0; j<messages.length; j++)
    {
    var messages = GmailApp.getMessagesForThread(threads[i]);
    for (var j = 0; j < messages.length; j++) {
      a[j]=parseMail(messages[j].getPlainBody());
    }
  }
  var nextRow=s.getDataRange().getLastRow()+1;
  var numRows=a.length;
  var numCols=a[0].length;
  s.getRange(nextRow,1,numRows,numCols).setValues(a);
}

function parseMail(body) {
  var a=[];
  var keystr="Order #,Subtotal:,Shipping:,Total:";
  var keys=keystr.split(",");
  var i,p,r;
  for (i in keys)  {
    //p=keys[i]+(/-?\d+(,\d+)*(\.\d+(e\d+)?)?/);
    p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";
    //p=keys[i]+"[\$]?[\d]+[\.]?[\d]+$";
    r=new RegExp(p,"m");
    try {a[i]=body.match(p)[1];}
    catch (err) {a[i]="no match";}
  }
  return a;
}
}

So the email data to pluck from comes as text only like this:

Order #89076
(body content, omitted)
Subtotal: $528.31
Shipping: $42.66 via Priority Mail®
Payment Method: Check Payment- Money order
Total: $570.97

Note: mywebsite order 456. Customer asked about this and that... etc.

The original code regex was designed to grab content, following the keystr values which were easily found on their own line. So this made sense:

p=keys[i]+"[\r\n]*([^\r^\n]*)[\r\n]";

This works okay, but results where the lines include more data that follows as in line Shipping: $42.66 via Priority Mail®.

My data is more blended, where I only wish to take numbers, or numbers and decimals. So I have this instead which validates on regex101.com

p=keys[i]+"[\$]?[\d]+[\.]?\d+$";

The expression only, [\$]?[\d]+[.]?\d+$ works great but I still get "no match" for each row.

Additionally, within this search there are 22 threads returned, and it populates 39 rows in the spreadsheet. I can not figure out why 39?

1
The question is too verbose, I would shorten it to what is required. In this case, it would be the example strings(atleast 3-4 examples) you have in the email. The regex you use to match it and the result you get. Finally, mention what you actually want to match and ask for input to modify the regexp. Also, a Regex tag to the question will help you get the appropriate audience.Jack Brown
Also, I would use services like regex101.com to help perfect the regex, if you have not already done so.Jack Brown
Thanks Jack. I took your advice and modified the question, plus I ran some of my own trials per your suggestion on regex101.com. Mentioned in the question, I have a successful match for what I want with the new expression, and I end up with worse results than in the original code.Jasun Thomas
This becomes part of the issue, when I work at it making changes to the script, getting different (yet incorrect) results, then changing it back and getting different results than I had on the first round of errors. How does one weed out specific issues or variables, when the results in themselves become variables even when reverting back to a known point?Jasun Thomas

1 Answers

0
votes

The reason for your regex not working like it should is because you are not escaping the "\" in the string you use to create the regex

So a regex like this

"\s?\$?(\d+\.?\d+)"

needs to be escaped like so:

"\\s?\\$?(\\d+\\.?\\d+)"

The below code is just modified from your parseEmail() to work as a snippet here. If you copy this to your app script code delete document.getElementById() lines.

Your can try your example in the snippet below it will only give you the numbers.

function parseMail(body) {
 if(body == "" || body == undefined){
  var body = document.getElementById("input").value
  }
  var a=[];
  var keystr="Order #,Subtotal:,Shipping:,Total:";
  var keys=keystr.split(",");
  var i,p,r;
  for (i in keys)  {
    p=keys[i]+"\\s?\\$?(\\d+\\.?\\d+)";
    r=new RegExp(p,"m");
    try {a[i]=body.match(p)[1];}
    catch (err) {a[i]="no match";}
  }
  document.getElementById("output").innerHTML = a.join(";")
  return a;
}
<textarea id ="input"></textarea>
<div id= "output"></div>
<input type = "button" value = "Parse" onclick = "parseMail()">

Hope that helps