4
votes

I am working with a tokenized string; each section always begins with a token such as ~~Example~~ and ends with ~~end~~. I am trying to work out a regular expression that will grab both tokens. I currently have /~~([^])\w+~~/, but this only grabs the end token, ~~end~~. The following example should clarify my question.

Current regex: /~~([^])\w+~~/

Example text:

~~/Document Heading 1~~ [Paragraph 1 /Document Heading 1]Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, commodo vitae, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis tempus lacus enim ac dui. Donec non enim in turpis pulvinar facilisis. Ut felis. Praesent dapibus, neque id cursus faucibus, tortor neque egestas augue, eu vulputate magna eros eu erat. Aliquam erat volutpat. Nam dui mi, tincidunt quis, accumsan porttitor, facilisis luctus, metus ~~end~~

Current result:
I am currently only grabbing the last token ~~end~~ with the current expression.

Desired result:
I would like to match both ~~/Document Heading 1~~ and ~~end~~. Note that the opening token, "~~/Document Heading 1~~", can contain anything between the ~~ delimiters, whereas the closing token, "~~end~~", will always be the same.

5
If the end token is always the same, why do you still want to grab it? - trincot
the idea is I want to segment the incoming context that may have different headers in columns, which I can then use to insert it into a Word doc - user6446203

5 Answers

3
votes

You may use 2 regexes to match anything between 2 multicharacter delimiters.

A lazy matching solution:

/~~([^]*?)~~/g

See the regex demo. This can be written as /~~([\s\S]*?)~~/g, too, and captures any 0+ characters between leading ~~ and trailing ~~ as few as possible.

Another way is by using negated character classes (to unroll the lazy matching pattern):

/~~([^~]*(?:~(?!~)[^~]*)*)~~/g

See another regex demo. This alternative is good to use if the strings you have are very long. The [^~]* matches 0+ chars other than ~, and (?:~(?!~)[^~]*)* matches 0+ sequences of a ~ that is not followed with another ~ and then 0+ chars other than ~.

var re = /~~([^]*?)~~/g; 
var str = '~~/Document Heading 1~~\n[Paragraph 1 /Document Heading 1]Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, commodo vitae, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis tempus lacus enim ac dui. Donec non enim in turpis pulvinar facilisis. Ut felis. Praesent dapibus, neque id cursus faucibus, tortor neque egestas augue, eu vulputate magna eros eu erat. Aliquam erat volutpat. Nam dui mi, tincidunt quis, accumsan porttitor, facilisis luctus, metus\n~~end~~ \n';
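var res = [];
var m; // declared explicitly so it does not leak as an implicit global
while ((m = re.exec(str)) !== null) {
    res.push(m[1]); // group 1 holds the text between the ~~ delimiters
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";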
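For comparison, here is a minimal sketch of the unrolled pattern run against a shorter input. The sample string and variable names are made up for illustration; the lone ~ inside the first token is there to show that a single tilde does not end the match.

```javascript
// Hypothetical sample input; a single "~" inside a token is allowed.
var sample = '~~/Document ~ Heading 1~~\nSome body text\n~~end~~';

// Unrolled pattern: runs of non-tilde chars, optionally separated by
// a single "~" that is not followed by another "~".
var unrolled = /~~([^~]*(?:~(?!~)[^~]*)*)~~/g;

var tokens = [];
var match;
while ((match = unrolled.exec(sample)) !== null) {
    tokens.push(match[1]); // text between the ~~ delimiters
}
console.log(tokens);
```

With this input, tokens should hold "/Document ~ Heading 1" and "end".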
3
votes

This might work if you use it globally:

(~~.*?~~)

~~ matches the characters ~~ literally

.*? matches any character (except newline)

Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]

~~ matches the characters ~~ literally

g modifier: global. All matches (don't return on first match)

If you haven't already checked it out, https://regex101.com/ is a great resource for testing these expressions.
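A small sketch of what the g modifier changes in JavaScript, assuming String.prototype.match is used and that each token sits on a single line (the . does not cross newlines). The input string is a shortened, made-up version of the example text.

```javascript
// Shortened, hypothetical version of the example text.
var text = '~~/Document Heading 1~~\nParagraph text.\n~~end~~';

// With g, match() returns every match as a plain array of strings.
var allTokens = text.match(/~~.*?~~/g);

// Without g, match() stops at the first match; index 0 is the full match.
var firstOnly = text.match(/~~.*?~~/)[0];

console.log(allTokens);
console.log(firstOnly);
```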

3
votes

/~~(.|[\r\n])*?~~/

should work for you (assuming you set the global flag, g, of course)

2
votes

In your regexp you are missing the tilde character inside the square brackets:

/~~([^~]+)~~/mg

You could test your expressions in:

https://regex101.com/
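A quick sketch of how this pattern behaves, using made-up inputs. Two things it is meant to show: the m flag only changes how ^ and $ behave, so it makes no difference here, and a token containing a single ~ is not matched because [^~]+ stops at the first tilde.

```javascript
// Hypothetical inputs for illustration.
var input = '~~/Document Heading 1~~\nbody text\n~~end~~';

var withM = input.match(/~~([^~]+)~~/mg);
var withoutM = input.match(/~~([^~]+)~~/g);

// A lone "~" inside the token defeats the negated class.
var loneTilde = '~~a~b~~'.match(/~~([^~]+)~~/g);

console.log(withM, withoutM, loneTilde);
```

Here withM and withoutM should be identical, and loneTilde should be null.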

1
votes

Try this:

(~~.*?~~)(?:.|\n|\r)*?(~~end~~)

Output:

Match 1
1.  ~~/Document Heading 1~~
2.  ~~end~~
Match 2
1.  ~~/Document Heading 1~~
2.  ~~end~~

The groups will hold your start and end values.
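A rough sketch of reading those groups in JavaScript, assuming the g flag is added and the input holds more than one section. The two-section string and the names doc, pairRe and pairs are invented for the example.

```javascript
// Hypothetical input with two sections, each closed by ~~end~~.
var doc = '~~/Heading A~~\nfirst body\n~~end~~\n' +
          '~~/Heading B~~\nsecond body\n~~end~~';

// Same pattern as above, with g so exec() can walk through every section.
var pairRe = /(~~.*?~~)(?:.|\n|\r)*?(~~end~~)/g;

var pairs = [];
var m;
while ((m = pairRe.exec(doc)) !== null) {
    // m[1] is the opening token, m[2] is the closing ~~end~~ token.
    pairs.push({ start: m[1], end: m[2] });
}
console.log(pairs);
```

Each entry in pairs should carry one heading token together with its ~~end~~.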