I'm writing a Ruby script that uses regex to find all comments of a specific format in Objective-C source code files.
The format is
/* <Headline_in_caps> <#>:
<Comment body>
**/
I want to capture the headline in caps, the number and the body of the comment.
With the regex below I can find one comment in this format within a larger body of text.
My problem is that if there is more than one comment in the file, I end up with all the text, including code, between the first /* and the last **/. I don't want it to capture everything in between, only what is within each /* and **/ pair.
The body of the comment can include any characters except **/ and */, both of which mark the end of a comment. Am I correct in assuming that a regex can find multiple complete matches while processing the text only once?
/\/\*\s*([A-Z]+). (\d)\:([\w\d\D\W]+)\*{2}\//x
Broken apart the regex does this:
\/\* - finds the start of a comment
\s* - finds whitespace
([A-Z]+) - captures the caps word
.<space> - finds the space between the caps word and the digit
(\d) - captures the digit
\: - finds the colon
([\w\d\D\W]+) - captures the body of the comment, which can include all valid characters except **/ or */
\*{2}\/ - finds the end of a comment
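To check what the body group can actually match, here is a minimal sketch. The names body_class and terminator are mine and purely illustrative. It suggests that the character class on its own also accepts the closing **/, so the "except" in my breakdown may describe what I want rather than what the class does.

```ruby
# The body character class from my regex, anchored to the whole string.
body_class = /\A[\w\d\D\W]+\z/

# Hypothetical test input: nothing but the comment terminator.
terminator = "**/"

# Prints true when the class is able to consume the terminator itself.
consumed = body_class.match(terminator) ? true : false
puts consumed
```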
Here is a sample; everything from the first /* to the second **/ is captured:
/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// This text and method declaration are captured
// The regex captures from HEADLINE to the end of the comment "meddled in." inclusively.
/*
HEADLINE 2:
Should be captured separately and without Objective-C code meddled in.
**/
}
Here is the sample on Rubular: http://rubular.com/r/4EoXXotzX0
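A minimal reproduction of the problem, assuming String#scan and a shortened stand-in for the sample (the variable names and the abbreviated method line are mine, not from my real script):

```ruby
# Shortened stand-in for the sample; the method signature is abbreviated.
source = <<-EOS
/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application
{
/*
HEADLINE 2:
Should be captured separately.
**/
}
EOS

# The regex from this question, written as a Ruby literal.
pattern = /\/\*\s*([A-Z]+). (\d)\:([\w\d\D\W]+)\*{2}\//x

# scan returns one [headline, number, body] array per match.
matches = source.scan(pattern)

# Number of matches found.
puts matches.size
# Whether the first body has swallowed the second comment.
puts matches.first[2].include?("HEADLINE 2")
```

I expected two matches, but this reports a single match whose body runs all the way to the second comment.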
I'm running Ruby 1.9.3 and using gsub to apply the regex to a string containing the whole file. Another issue is that gsub gives me text that Rubular ignores. Is this a regression, or is Rubular using a different method that gives what I want?
In the question "Regex matching multiple occurrences per file and per line", the answer is to use the g (global) option, which is not valid in a Ruby regex.
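As far as I can tell, Ruby has no g flag because String#scan and String#gsub already visit every non-overlapping match in a single left-to-right pass. A minimal sketch with a made-up pattern and string (both are illustrative only, not from my script):

```ruby
# Made-up input and pattern, only to show how matches are iterated.
text = "A1 B2 C3"
pair = /([A-Z])(\d)/

# scan collects the capture groups of every non-overlapping match.
pairs = text.scan(pair)
p pairs

# gsub also visits every match; the block sees each one through $1 and $2.
swapped = text.gsub(pair) { "#{$2}#{$1}" }
puts swapped
```

So the iteration over multiple matches seems to happen by default; what differs between the two is what each call hands back.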