Segmenting a tokenized string to include parts which do not contain tokens

Question

Background Information:

I am currently working on a Word add-in that needs to apply different styles depending on designated start and end tokens, in the form ~~randomTextandChar~~...........~~end~~. I am currently splitting on ~~end~~; however, this ignores a paragraph that has no token and combines it with the next paragraph that does contain one.

Current Problem:

When splitting paragraphs according to styles I am using contentToInsert.split("~~end~~"); however, when a paragraph does not contain the designated token, it is combined with the next paragraph that does have a token, so both paragraphs acquire the same styling.
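To make the problem concrete, here is a minimal reproduction. The shortened `sample` string is an assumption standing in for the full example text further down; only the `split("~~end~~")` call comes from the code in question.

```javascript
// Shortened, hypothetical stand-in for the real document text.
var sample =
    "~~/Document Heading 1~~ First paragraph~~end~~\n\n" +
    "Second paragraph, no tokens\n\n" +
    "~~/Document Heading 2~~ Third paragraph~~end~~";

var parts = sample.split("~~end~~");

// parts[0] is the first styled paragraph.
// parts[1] holds the untokenized paragraph AND the third paragraph together.
// parts[2] is the empty string left over after the final ~~end~~.
console.log(parts);
```

Because the split only looks at the end token, nothing marks where the untokenized paragraph stops and the next styled paragraph starts.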

Desired results

I would like to split according to the paragraph tokens, but I would also like to separate out segments that do not have tokens. This way I would know that the paragraphs without tokens do not need any styling. Referencing the text below, I would like to have an array of three elements, one for each paragraph.

Example Text

~~/Document Heading 1~~ [Paragraph 1 /Document Heading 1]Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, commodo vitae, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis tempus lacus enim ac dui. Donec non enim in turpis pulvinar facilisis. Ut felis. Praesent dapibus, neque id cursus faucibus, tortor neque egestas augue, eu vulputate magna eros eu erat. Aliquam erat volutpat. Nam dui mi, tincidunt quis, accumsan porttitor, facilisis luctus, metus~~end~~

[Paragraph 2 Normal]Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo.

~~/Document Heading 2~~ [Paragraph 3 /Document Heading 2]Morbi in sem quis dui placerat ornare. Pellentesque odio nisi, euismod in, pharetra a, ultricies in, diam. Sed arcu. Cras consequat. Praesent dapibus, neque id cursus faucibus, tortor neque egestas augue, eu vulputate magna eros eu erat. Aliquam erat volutpat. Nam dui mi, tincidunt quis, accumsan porttitor, facilisis luctus, metus. Phasellus ultrices nulla quis nibh. Quisque a lectus. Donec consectetuer ligula vulputate sem tristique cursus. Nam nulla quam, gravida non, commodo a, sodales sit amet, nisi. Pellentesque fermentum dolor. Aliquam quam lectus, facilisis auctor, ultrices ut, elementum vulputate, nunc. ~~end~~

Current Code

var contentToInsert = selectedContent.toString();

if (selectedContent.toString().search("~~") <= 0) {

    contentToInsertWithStyles = contentToInsert.split("~~end~~");
    var elementToInspect;

    for (var x = 0; x < contentToInsertWithStyles.length; x++) {

        elementToInspect = contentToInsertWithStyles[x].toString().search("~~");
        // -1 is given if the string does not contain designated char
        // [Not working as desired]
        if (elementToInspect === -1) {
            segmentedStyles.push({
                ContentStyle: "Normal",
                ContentText: contentToInsertWithStyles[x]
            });
        }

        else {
            var styleType = contentToInsertWithStyles[x].match(/~~([^]+)~~/);
            segmentedStyles.push({
                ContentStyle: styleType[1],
                ContentText: contentToInsertWithStyles[x].replace(styleType[0], '').trim()
            });
        }
    }
}

Appendix

This code is not working: it only splits on ~~end~~, so the result combines paragraph 2 with paragraph 3, giving just two array elements, which I do not want. I am looking to have three array elements.

Mårten Wikström · Accepted Answer · 2016-06-14T21:15:17

This function solves your problem:

function getSegmentedStyles(text) {
    var pattern = /^~~((?:(?!~~).)+)~~((?:(?!~~end~~).)+)~~end~~/gm;
    var pos = 0;
    var match;
    var result = [];

    // Strip leading and trailing whitespace; the g flag is needed so both ends are removed.
    function trim(str) {
        return str.replace(/^\s+|\s+$/g, "");
    }

    function add(style, content) {
        var trimmed = trim(content);
        if (trimmed) {
            result.push({
                ContentStyle: style,
                ContentText: trimmed
            });
        }
    }

    while ((match = pattern.exec(text)) !== null) {
        if (match.index > pos) {
            // substring takes (start, end); substr would treat match.index as a length.
            add("Normal", text.substring(pos, match.index));
        }

        add(match[1], match[2]);

        pos = match.index + match[0].length;
    }

    if (pos < text.length) {
        add("Normal", text.substring(pos));
    }

    return result;
}

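The pattern matches styled blocks. Segments of text between matches are added as "Normal" segments. In addition, leading and trailing whitespace is removed from blocks, and empty blocks are ignored. Note that `.` does not match line breaks, so each styled block must start at the beginning of a line and sit on a single line.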
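As a rough check of that behaviour, here is a condensed sketch of the same approach run on a shortened input. The name `segment` and the `shortText` sample are assumptions made for illustration; the pattern is the one from the function above, and the built-in `String.prototype.trim` is swapped in for the hand-written helper.

```javascript
// Condensed variant: styled blocks are matched by the pattern, and any text
// between matches becomes a "Normal" segment.
function segment(text) {
    var pattern = /^~~((?:(?!~~).)+)~~((?:(?!~~end~~).)+)~~end~~/gm;
    var result = [];
    var pos = 0;
    var match;

    // Push a segment unless it is empty after trimming.
    function add(style, content) {
        var trimmed = content.trim();
        if (trimmed) {
            result.push({ ContentStyle: style, ContentText: trimmed });
        }
    }

    while ((match = pattern.exec(text)) !== null) {
        // Text between the previous match and this one has no tokens.
        add("Normal", text.substring(pos, match.index));
        add(match[1], match[2]);
        pos = match.index + match[0].length;
    }
    // Anything after the last styled block is also untokenized.
    add("Normal", text.substring(pos));
    return result;
}

// Hypothetical shortened input with one untokenized paragraph in the middle.
var shortText =
    "~~/Document Heading 1~~ First paragraph~~end~~\n\n" +
    "Second paragraph, no tokens\n\n" +
    "~~/Document Heading 2~~ Third paragraph~~end~~";

var segments = segment(shortText);
console.log(segments);
```

For this input the result has three entries, styled "/Document Heading 1", "Normal" and "/Document Heading 2" in that order, which is the one-element-per-paragraph layout asked for in the question.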

See this JSFiddle for a working example:

https://jsfiddle.net/o17Lq11x/

Or have a look at this regex101 snippet, which shows how styled blocks are captured:

https://regex101.com/r/xM9bD0/1
