0
votes

This might be very simple. I just want to match all the text between two marker strings, including line breaks. Example:

textfile:

MESSAGE BEGIN

mary had a little lamb.

little lamb

MESSAGE END

output expectation:

mary had a little lamb.

little lamb

Here is what I currently have. It works okay, except that everything comes out on one line.

Code (I currently have):

$pattern = "MESSAGE BEGIN(.*?)MESSAGE END"

[regex]::Match($text,$pattern).Groups[1].Value

result:

mary had a little lamb.little lamb

I would like it to respect line breaks, so that they are not all crammed together.

4
Are you sure that the line breaks are not there? I suggest that maybe they are there, but you just can't see them in the tool you are using. - Tim Biegeleisen
@wp78de But it appears that dot is already matching across newlines. - Tim Biegeleisen
Content comes from a text file, where there is a return, I guess. It matches exactly what I want it to match, but it doesn't respect the newline\break. I am sorry if I am using the wrong term. - j. doe
If the line breaks are in your file, then they should be retained. I guess the problem is the way you read the file. - wp78de
([\s\S]*?) Not quite, but it worked better than the others. Same output as my original (.*?) - j. doe

4 Answers

1
votes

Use lookarounds:

(?<=MESSAGE BEGIN)[\s\S]+(?=MESSAGE END)

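This matches any text between (but not including) MESSAGE BEGIN and MESSAGE END. Because + is greedy, a file with several message blocks yields one match running from the first MESSAGE BEGIN to the last MESSAGE END.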

For a discussion of the regular expressions supported in PowerShell, visit: https://blogs.technet.microsoft.com/heyscriptingguy/2016/10/21/powershell-regex-crash-course-part-4-of-5/
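
To illustrate the lookaround approach, here is a minimal sketch in JavaScript (the language used in another answer in this thread). The sample string and the variable names are assumptions made for illustration. The lookbehind and lookahead syntax shown here is also accepted by the .NET regex engine that PowerShell uses.

```javascript
// Sample input mirroring the question's text file (assumed content).
const sample = "MESSAGE BEGIN\n\nmary had a little lamb.\n\nlittle lamb\n\nMESSAGE END";

// The lookbehind and lookahead assert the delimiters without consuming them,
// so the match itself holds only the text in between.
const between = /(?<=MESSAGE BEGIN)[\s\S]+(?=MESSAGE END)/;

const inner = sample.match(between)[0];

// The match still carries the blank lines around the body;
// trim() drops only the leading and trailing whitespace.
console.log(inner.trim());
```

The line break between the two sentences is part of the matched string, so it survives as long as the result is printed as a string rather than joined or re-split.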

1
votes

The first part is to use a pattern like [\s\S]* instead of . so that newlines are matched too. You also want a lazy quantifier (+? or *?) to avoid matching too much, e.g. from the first MESSAGE BEGIN to the last MESSAGE END when there are multiple message blocks.

Pattern:

MESSAGE BEGIN([\s\S]*?)MESSAGE END

or, if you just want the inner part, use look-arounds (still with the lazy *?):

(?<=MESSAGE BEGIN)[\s\S]*?(?=MESSAGE END)

End-to-end code sample:

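# Read the whole file as a single string so the line breaks are preserved
$text = [IO.File]::ReadAllText(".\a.txt")

# $matches is an automatic variable in PowerShell, so use a different name
$found = [regex]::Matches($text, "MESSAGE BEGIN([\s\S]*?)MESSAGE END")
foreach ($match in $found) {
  # Write-Output $match.Value.Trim()  # if you use look-arounds
  Write-Output $match.Groups[1].Value.Trim()
}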
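
The difference between the greedy and the lazy quantifier only shows up when there is more than one block. A small sketch in JavaScript, assuming a made-up input with two message blocks (the quantifier semantics are the same in .NET):

```javascript
// Hypothetical input with two message blocks and some text in between.
const twoBlocks = [
  "MESSAGE BEGIN", "first", "MESSAGE END",
  "noise",
  "MESSAGE BEGIN", "second", "MESSAGE END",
].join("\n");

// Greedy: [\s\S]* grabs as much as possible, up to the LAST "MESSAGE END".
const greedy = /MESSAGE BEGIN([\s\S]*)MESSAGE END/g;
// Lazy: [\s\S]*? stops at the FIRST "MESSAGE END" after each "MESSAGE BEGIN".
const lazy = /MESSAGE BEGIN([\s\S]*?)MESSAGE END/g;

const greedyBodies = [...twoBlocks.matchAll(greedy)].map(m => m[1].trim());
const lazyBodies = [...twoBlocks.matchAll(lazy)].map(m => m[1].trim());

console.log(greedyBodies.length); // 1 (one oversized match)
console.log(lazyBodies);          // [ 'first', 'second' ]
```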

0
votes

MESSAGE BEGIN(\s|\S)*MESSAGE END

(.*?) matches all characters, except for line terminators.

\s matches any whitespace character (equal to [\r\n\t\f\v ])

\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])

Put a bar | inside the group to match either \s or \S, i.e. any character at all

Then put a star * after the group to match zero or more characters
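
As a rough check of the points above, the following JavaScript sketch compares a plain dot, the dot with the s (dotAll) flag, and the [\s\S] class on an assumed two-line sample. In .NET the counterpart of the s flag is the inline (?s) modifier or RegexOptions.Singleline. The sketch also shows a side effect of repeating a capture group: the group keeps only its last iteration.

```javascript
// Assumed sample text with line breaks between the markers.
const text = "MESSAGE BEGIN\nline one\nline two\nMESSAGE END";

const dotOnly = /MESSAGE BEGIN(.*?)MESSAGE END/;        // . stops at line terminators
const dotAll = /MESSAGE BEGIN(.*?)MESSAGE END/s;        // s flag: . matches them too
const charClass = /MESSAGE BEGIN([\s\S]*?)MESSAGE END/; // class matches any character

console.log(dotOnly.test(text)); // false
console.log(dotAll.exec(text)[1] === charClass.exec(text)[1]); // true

// With (\s|\S)* the group is repeated once per character, so group 1
// ends up holding only the final character before "MESSAGE END".
const lastIteration = /MESSAGE BEGIN(\s|\S)*MESSAGE END/.exec(text)[1];
console.log(JSON.stringify(lastIteration)); // "\n"
```

If the captured body is needed, wrapping the repetition as ([\s\S]*?) keeps the whole text in group 1.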

0
votes

I've created an example in JavaScript.

const texto = `
MESSAGE BEGIN

mary had a little lamb.

little lamb

MESSAGE END
`

const regex = /MESSAGE\sBEGIN[\s\S]*MESSAGE\sEND/gi

console.log(texto.match(regex))

The output is:
[ 'MESSAGE BEGIN\n\nmary had a little lamb.\n\nlittle lamb\n\nMESSAGE END' ]

The line breaks were kept.
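
If only the inner text is wanted, a capture group with a lazy quantifier can be used instead. The sketch below is one possible variation, not part of the example above. Its variable names are made up, and it prints the escaped form of the result so the line breaks are visible regardless of how the console renders them.

```javascript
// Same content as the template string above, written with explicit \n.
const input = "\nMESSAGE BEGIN\n\nmary had a little lamb.\n\nlittle lamb\n\nMESSAGE END\n";

// Group 1 captures only the text between the two markers.
const found = input.match(/MESSAGE\sBEGIN([\s\S]*?)MESSAGE\sEND/i);
const bodyText = found[1].trim();

// JSON.stringify shows line breaks as \n, which makes them easy to verify.
console.log(JSON.stringify(bodyText)); // "mary had a little lamb.\n\nlittle lamb"

// Splitting on line breaks gives one entry per line, including the blank one.
const bodyLines = bodyText.split(/\r?\n/);
console.log(bodyLines.length); // 3
```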