I have data that I need to split into blocks so that each block is stored in a separate row. The entire text looks like this:
م
مطروح
الحمام
school
الصف
:
الصف الأول
1
458316219
30709101600371
ابراهيم وليد ابراهيم ابوالحمد
منافذ فورى
2
458361688
30702263300318
احمد ابوالريش فرج عبدالله
منافذ فورى
3
458312720
30703143300418
اسلام فتحى محمد ناجى
منافذ فورى
4
458790904
30606101802299
اسلام نصار حسين نصار حسين عبد الونيس
منافذ فورى
5
458312908
30612013300259
ايمن راضى صالح سلومه
منافذ فورى
6
458884564
30802203300186
بسمه محمد ابراهيم ظدم
منافذ فورى
7
477625786
30708263300235
بشار نصر الله مصوف السايب
منافذ فورى
Using https://regex101.com/, I was able to match the start of each block with this pattern:
\d{1,3}\n
This highlights the start of each block.
How can I split the text into separate blocks, so that each block ends up in one row?
Here's the HTML for the whole page: https://pastebin.com/nu0dLvch
Here's a link of the full data: https://pastebin.com/dWcu97Wt
I would highlight the needed parts (these are the groups to match). Starting with...
ending with...
There are 22 blocks of data (groups) in total.
Looking at the regex provided by @Wiktor Stribiżew in the comments (https://regex101.com/r/dmCNuH/1), match 11 is the first block of actually needed data (match group), though it truncates the final line.
With the amazing pattern I got from Wiktor, I tried to get all the matches:
Sub Test()
    Dim a(), s As String, i As Long, j As Long
    Dim bot As New ChromeDriver
    With bot
        .AddArgument "--headless"
        .Get "file:///C:\Sample.html"
        s = .FindElementByCss("table[id='all']").Text
    End With
    a = GetMatches(s, "^\s*\d{1,3}(?:(?:\r\n|[\r\n])(?!\s*\d{1,3}\n).*)+")
    For i = LBound(a) To UBound(a)
        Debug.Print a(i)
    Next i
End Sub

Function GetMatches(ByVal inputString As String, ByVal sPattern As String) As Variant
    Dim arrMatches(), matches As Object, iMatch As Object, s As String, i As Long
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        If .Test(inputString) Then
            Set matches = .Execute(inputString)
            ReDim arrMatches(0 To matches.Count - 1)
            For Each iMatch In matches
                arrMatches(i) = iMatch.SubMatches.Item(0)
                i = i + 1
            Next iMatch
        Else
            ReDim arrMatches(0)
            arrMatches(0) = vbNullString
        End If
    End With
    GetMatches = arrMatches
End Function
But this doesn't work for me and throws an error.
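A possible cause, assuming the error is raised on the SubMatches line: the pattern contains only non-capturing groups (?:...) and a lookahead, so every match has an empty SubMatches collection and Item(0) fails. Below is a minimal sketch that returns the whole match text through the Value property instead. The name GetWholeMatches is hypothetical, and the rest mirrors GetMatches above.

```vba
' Hypothetical variant of GetMatches: returns the full text of every match
' rather than the first capture group (the pattern above has none).
Function GetWholeMatches(ByVal inputString As String, ByVal sPattern As String) As Variant
    Dim arrMatches() As Variant, matches As Object, iMatch As Object, i As Long

    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        Set matches = .Execute(inputString)
    End With

    If matches.Count = 0 Then
        ' No match: return a one-element array holding an empty string
        ReDim arrMatches(0 To 0)
        arrMatches(0) = vbNullString
    Else
        ReDim arrMatches(0 To matches.Count - 1)
        For Each iMatch In matches
            arrMatches(i) = iMatch.Value   ' whole match, not SubMatches
            i = i + 1
        Next iMatch
    End If

    GetWholeMatches = arrMatches
End Function
```

It would be called the same way as in Test, e.g. a = GetWholeMatches(s, "..."), with the same pattern string.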
^\s*\d{1,3}(?:(?:\r\n|[\r\n])(?!\s*\d{1,3}\n).*)* with regExp.Multiline = True. – Wiktor Stribiżew
^\s*\d{1,3}(?:\n(?!\s*\d{1,3}\n|\d{4}/\d{2}/\d{2}\n).*)+. If you have line breaks inside a cell in Excel, those are probably CRs, so you need \r instead of \n. – Wiktor Stribiżew
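The line-break point in the last comment can be sidestepped by normalizing the text before matching, so that a single \n-based pattern applies whatever the source used. The sketch below does that and then writes one block per row, one line per column. Both helper names are hypothetical, and it assumes the blocks arrive as an array of strings and that output should start at A1 of the active sheet.

```vba
' Hypothetical helper: converts CRLF and lone CR line breaks to LF.
Function NormalizeLineBreaks(ByVal s As String) As String
    s = Replace(s, vbCrLf, vbLf)   ' CRLF first, so it does not become two LFs
    s = Replace(s, vbCr, vbLf)
    NormalizeLineBreaks = s
End Function

' Hypothetical helper: one block per row, one non-empty line per column.
' Assumes blocks is an array of strings whose lines are separated by LF.
Sub WriteBlocksToRows(ByVal blocks As Variant)
    Dim parts() As String, textLine As String
    Dim r As Long, col As Long, i As Long, k As Long

    r = 1
    For i = LBound(blocks) To UBound(blocks)
        If Len(blocks(i)) > 0 Then
            parts = Split(CStr(blocks(i)), vbLf)
            col = 1
            For k = LBound(parts) To UBound(parts)
                textLine = Trim$(parts(k))
                If Len(textLine) > 0 Then
                    With ActiveSheet.Cells(r, col)
                        .NumberFormat = "@"   ' keep long digit strings as text
                        .Value = textLine
                    End With
                    col = col + 1
                End If
            Next k
            If col > 1 Then r = r + 1   ' advance only if something was written
        End If
    Next i
End Sub
```

In use, s = NormalizeLineBreaks(s) would run before the regex call, and the resulting array of block strings would then be passed to WriteBlocksToRows.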