
I'm parsing double-quoted string literals from a Visual Basic 6 source file. Some statements end with a comment, which starts with a single quote. The string literals themselves may also contain single quotes, which I need to keep. The line below is an example of a statement with a trailing comment.

Example Line: MsgBox "Must enter at least 2 'characters' before doing a Healthcare Data Dictionary Search.", vbInformation, "Search HDD" 'This is a "comment".

The regular expression below currently returns these three matches:

Must enter at least 2 'characters' before doing a Healthcare Data Dictionary Search.
Search HDD
comment

This regular expression captures multiple double-quoted string literals, but it does not skip double-quoted strings that appear after a single quote (that is, inside a comment), which is why the last match above is "comment".

Regular Expression: "([^""]*)(?:\.[^""\\])*"
C#-Style: @"""([^""""]*)(?:\.[^""""\\])*"""

I'd like to strip off the comment first. However, if I simply search for a single quote, I may find one inside a double-quoted string I want to keep and end up cutting that string in half.

Please let me know if this is not clear and I'll try to clarify.

Any suggestions?
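Since the goal is a per-line regex, here is a minimal C# sketch (top-level statements, C# 9 or later) of one way around the problem. It first matches only the part of the line before the comment, consuming each string literal as a whole so that a single quote inside a literal is never mistaken for the start of a comment, and then extracts the literals from that part alone. It assumes the VB6 rule that a double quote inside a string literal is written as two double quotes (""); it does not handle Rem comments or _ line continuations. ExtractLiterals is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Code before any comment: runs of characters that are neither quote type,
// or complete string literals ("" is an escaped double quote in VB6).
var codePart = new Regex("^(?:[^\"']|\"(?:[^\"]|\"\")*\")*");

// One VB6 string literal; group 1 holds its raw contents.
var literal = new Regex("\"((?:[^\"]|\"\")*)\"");

// Hypothetical helper: returns the literals that appear before a trailing comment.
List<string> ExtractLiterals(string line)
{
    // The pattern can match the empty string, so Match always succeeds;
    // Value stops at the first single quote outside a string literal.
    string code = codePart.Match(line).Value;
    var results = new List<string>();
    foreach (Match m in literal.Matches(code))
    {
        // Turn the VB6 escape "" back into a single double-quote character.
        results.Add(m.Groups[1].Value.Replace("\"\"", "\""));
    }
    return results;
}

var line = "MsgBox \"Must enter at least 2 'characters' before doing a Healthcare Data Dictionary Search.\", vbInformation, \"Search HDD\" 'This is a \"comment\".";

foreach (var s in ExtractLiterals(line))
    Console.WriteLine(s);
```

Because the first pattern stops at the first single quote that lies outside a string literal, the "comment" literal never reaches the second pattern, so only the two wanted literals are printed.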

Wonko the Sane: If you are doing this for an entire source file, you are unlikely to find a regular expression that works for all cases; you'd probably be better off writing a parser. It may (or may not) be slightly slower, but you'll likely run it only a few times, so speed won't matter much. Regular expressions work well only when the patterns are predictable and consistent.
GhostHunterJim: I am writing a parser; this is just one piece of it. I've been trying to narrow down the string with a non-regex approach to the point where I can apply the regex above, but I haven't had much luck. I was hoping to find something here that would help.
GhostHunterJim: To clarify, I'm looking for a regex that works line by line, not on the entire file.
tsacodes: So the example line is the one you're trying to solve, correct?
GhostHunterJim: @tsacodes Yes, that's correct.

Answer


I saw that you tagged this with C#. Why not use C# and LINQ to your advantage? Would something like the following work for you?

   // Requires "using System.Linq;" for the Count extension method on string
   var text = "MsgBox \"Must enter at least 2 'characters' before doing a Healthcare Data Dictionary Search.\", vbInformation, \"Search HDD\" 'This is a \"comment\".";

   // Use LINQ to count single quotes
   var singleQuoteOccurrences = text.Count(c => c == '\'');

   // An odd count means a comment is at the end of the line, so strip off
   // the last single quote and everything after it. This assumes the string
   // literals contain an even number of single quotes and the comment has none.
   if (singleQuoteOccurrences % 2 == 1)
       text = text.Substring(0, text.LastIndexOf('\''));

Yields:

MsgBox "Must enter at least 2 'characters' before doing a Healthcare Data Dictionary Search.", vbInformation, "Search HDD"

You could easily encapsulate this in a method such as "StripVBTrailingComment(string line)".
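A minimal sketch of such a helper follows (top-level statements, C# 9 or later); the name StripVbTrailingComment is hypothetical. Rather than counting single quotes, it scans the line once and tracks whether it is currently inside a string literal, so a line like MsgBox "Don't forget" 'note, whose single-quote count is even, still has its comment removed. It assumes the VB6 "" escape for embedded double quotes and, unlike the code above, trims trailing whitespace from the result.

```csharp
using System;

// Hypothetical helper: removes a trailing ' comment from one line of VB6 code.
static string StripVbTrailingComment(string line)
{
    bool inString = false;
    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c == '"')
        {
            // Each double quote toggles the state. An escaped quote ("")
            // toggles twice, leaving the scanner inside the string again.
            inString = !inString;
        }
        else if (c == '\'' && !inString)
        {
            // The first single quote outside a string literal starts the comment.
            return line.Substring(0, i).TrimEnd();
        }
    }
    return line; // no comment found
}

// The quote-counting version would leave this line unchanged (two single quotes).
Console.WriteLine(StripVbTrailingComment(
    "MsgBox \"Don't forget\", vbInformation, \"Search HDD\" 'This is a \"comment\"."));
```

The single pass is linear in the line length and makes no assumption about how many single quotes appear in literals or in the comment.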