0
votes

There is a document like this one. I process 20 documents like this every day and they all look the same (structure, I mean, is very consistent).

enter image description here

The goal of this macro is to extract ONLY THE FIRST match of the RegEx pattern from the .ActiveDocument.Content. In the whole doc there is many more matches, but I need only the first one. The document being processed will be manually opened before the macro would run.

I'm just a VBA beginner so if there is a possibility to write it without using arrays, collections or some dictionaries I'd much appreciate. There is just one item to extract, so it's best to load it inside repNmbr string variable and from there just ws.Range("G30").Value = repNmbr. The simpler the better.

I used these resources Excel Regex Tutorial (Regular Expressions) which is very helpful but I still don't know how to load the FIRST MATCH alone into my repNmbr string variable. I'd like to do this without using any loop, because I just want to load a single string into this repNmbr variable.

Currently I have code like this:

Sub ExtractRepertor03()
    'Application.ScreenUpdating = False
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application
    Dim rng As Word.Range
    Dim ws As Worksheet
    Dim regEx As Object
    Dim matches As MatchCollection
    Dim match As String
    Dim repNmbr As String

    'Assigning object variables
    Set WordApp = GetObject(, "Word.Application")      'ActiveX can't create object is when
    Set ExcelApp = GetObject(, "Excel.Application")    'there is no Word document open;
    Set regEx = CreateObject("VBScript.RegExp")
    Set WordDoc = WordApp.ActiveDocument
    Set rng = WordApp.ActiveDocument.Content

    'Create the regular expression object
    regEx.Global = False    'because I need only the first match instead of all occurences;
    regEx.IgnoreCase = True
    regEx.Pattern = "([0-9]{1,5})([ ]{0,4})([/])([0-9]{4})"
    'regEx.Pattern = "([0-9]{1,5})([\s]{0,4})(/[0-9]{4})"

    repNmbr = regEx.Execute(rng.text)      'here is something wrong but I don't know what;
                            'I'm trying to assign the first RegEx match to repNmbr variable;
    Debug.Print repNmbr
    repNmbr = Replace(repNmbr, " ", "") 
'    Set matches = regEx.Execute(rng.text)
'    Debug.Print regEx.Test(rng)
'    'Debug.Print regEx.Value
'        For Each match In matches    'I just want this macro run without the loop
'           Debug.Print match.Value   'Result: 9042 /2019
'           repNmbr = match.Value
'        Next match

    ExcelApp.Application.Visible = True
    ws.Range("G30").Value = repNmbr
End Sub

And an error like this:

enter image description here

Can someone explain to me why Set matches = regEx.Execute(rng.text) works fine but repNmbr = regEx.Execute(rng.text) returns the error: "Wrong number of arguments or invalid property assignment"??

After regEx.Global = False is set, the RegEx finds only a single value, so why VBA refuses to assign this string into the repNmbr string variable??

1
Try using of Set repNmbr = regEx.Execute(rng.text) but after declaring it in this way: Dim repNmbr As MatchCollection. The first occurrence would be repNmbr(1)...FaneDuru
For your luck, I am Polish, and I warn you: this contains very sensitive personal data and you could be prosecuted for publishing it like that. Try figuring out just a sample for presenting what is your problem.Michał Turczyn
First: do not try to use RegEx to extract information from Word. Use Word's built-in Find functionality. To get the basic syntax, first experiment with Find as an end-user (Ctrl+H, then select the Find tab, click "More" to see all possible options). Take a close look at "wildcards", which is Word's built-in "reg ex". If you have trouble working out the Find term, ask in an end-user venue such as Super User. Then record a macro to get the basic syntax needed for the VBA code. To get the first match, set the Forward property to True (it's the default).Cindy Meister
It must be dimed as As MatchCollection because this variable type is returned by the function in discussion. Even if there is only one occurrence. Or you can use it in the second way ai recommended: repNmbr = regEx.Execute(rng.text) (0). Which means the first element of the returned collection... And if you address to someone here, it is recommended to use @FaneDuru (in my case). You can see our comments because you are the one asking the question...FaneDuru
I wanted to refer at notifications receiving...FaneDuru

1 Answers

1
votes

As I said in your other question, you don't need the RegEx library for this. Stick to Word's wildcards! Try:

Sub Demo()
Application.ScreenUpdating = False
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
With WordApp.ActiveDocument.Range
  With .Find
    .Text = "<[0-9 ]{1,7}/[0-9]{4}>"
    .MatchWildcards = True
    .Wrap = wdFindStop
    .Forward = True
    .Execute
  End With
  If .Find.Found = True Then ActiveSheet.Range("G30").Value = Replace(.Text, " ", "")
End With
Application.ScreenUpdating = True
End Sub

Note: I haven't bothered with any of:

Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim ws As Worksheet
Dim regEx As Object
Dim matches As MatchCollection
Dim match As String
Dim repNmbr As String

as it's all superfluous - even your own code never assigns anything to ws.