0
votes

skip counting words in parentheses when splitting string into rows so the rows have same number of words that are not inside parentheses but still include the parentheses in the split. For my cells setup please see attached image:

skip counting words in parentheses

the following UDF function works in splitting a long string into rows of predefined number of words in cell "D3".

Public Function SplitOnNth(ByVal inputStr$, ByVal StartPos&, ByVal NumWords&) As String

    Dim arr() As String, i As Long, newArr() As String
    arr = Split(inputStr)
    ReDim newArr(NumWords - 1)

    'Arrays are zero-based, but your string isn't. Subtract 1
    For i = StartPos - 1 To StartPos + NumWords - 2
        If i > UBound(arr) Then Exit For    'Exit if you loop past the last word in string

       'ANYTHING IN PARENTHESES SHOULD BE SKIPPED IN WORDS COUNT BUT INCLUDED IN THE WORDS SPLIT

        newArr(i - StartPos + 1) = arr(i)
    Next

    SplitOnNth = Join(newArr, " ")

End Function

This is what I have tried to solve the issue. It is terribly hard coded and only works for one row. YOUR HELP IS MUCH APPRECIATED AND THANKS IN ADVANCE.


              'ANYTHING IN PARENTHESES SHOULD BE SKIPPED IN WORDS COUNT BUT INCLUDED IN THE WORDS SPLIT

        If arr(i) Like "*(*" & "*)*" Then
            newArr(i - StartPos + 1) = arr(i) + " " + _
                                    arr(i + 1) + " " + _
                                    arr(i + 2) + " " + _
                                    arr(i + 3) + " " + _
                                    arr(i + 4) + " "
            Exit For
        Else
        newArr(i - StartPos + 1) = arr(i)
        End If

        'result in the first two rows is ok but not for the rest of the rows
        'One two three four
        '(1) five six seven eight

1
FYI the parentheses could be anywhere in the string.morad medo
What rule to use to make your first line One two three four vs One two three four (1)? From what you write, they would both count as four words.Ron Rosenfeld
Thanks for answering. I used "=splitOnNth($B$3,$E4*$D$3-$D$3+1,$D$3)" and yes both cases will be regarded as four words.morad medo

1 Answers

1
votes

I would use Regular Expressions to do the splitting, and I would return the result as an array, from which you can either use the INDEX function to return the relevant row, or enter the formula as an array-formula over a range of cells.

You will note, in the code, two different Regex's depending on whether you want to keep the parenthesized substrings with the subsequent or preceding string.

Also note that the regex will handle multiple sequential parenthesized substrings:

Edit: (bugs corrected)

Option Explicit
Function splitOnNth(inputString As String, numWords As Long) As String()
    Dim RE As Object, MC As Object
    Dim I As Long, J As Long
    Dim strArray() As String

Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .Pattern = "(?:(?:\([^)]+\)\s+)+)?\S+" 'in parentheses combined with following word
    '.Pattern = "\S+(?:(?:\s+\([^)]+\))+)?" 'in_parentheses combined with preceding word

    If .test(inputString) = True Then
        Set MC = .Execute(inputString)
        ReDim strArray(1 To Int(MC.Count / numWords + 1))

        J = 1
        For I = 1 To UBound(strArray)
            For J = J To numWords * I
                If J > MC.Count Then Exit For
                strArray(I) = strArray(I) & " " & MC(J - 1)
            Next J
            strArray(I) = Mid(strArray(I), 2)
        Next I
    End If
End With

splitOnNth = strArray

End Function

enter image description here

And here is an explanation of the Regex:

Split words except keep parenthesized words with following word

(?:(?:\([^)]+\)\s+)+)?\S+

Created with RegexBuddy