0
votes

Overarching Question:how do I access entire text of an RTF file

Ok, so I have a bit of a problem here, I'm hoping what I want isn't totally crazy, but here it goes.

I work with cars, and at the end of every day we compile an RTF of the vehicles we found with damages and send them off to someone. We must also keep an Excel file with these VIN numbers and the corresponding damages. I've done some work on the VIN Log part, using VBA to format certain values in certain ways. The RTF file is akin to the below formatting (these are not real VIN, but match the regex for them)

1FTEX8EEG12356789 //Other random Information I do not need
    004121 2
    012051 3
    005091
1FTFW7D78KF123567 //Other Random Information I do not need
    042071
    010341 4
    010341 9
//ETC

Here's my question: I've figured out how to open the RTF file, but how do I gain access to the entirety of the document text all at once, not just paragraph by paragraph, and does the RegExp object have available a way to capture the offset the string was found at?

The reason I am trying to use RegEx is because there's this header, which takes up roughly 10 "Paragraphs" worth of space every page (these documents can be 1 page or sometimes 10 or more). If anyone could point me into a quicker way to accomplish this, I would appreciate it.

What I was thinking I would end up having to do, once I figure out how to RegEx search the whole document was this

  1. Gather all RegExp matches for ([A-Z0-9]{17})
  2. Use the matches from Step 1 to find out the location in the document via InStr
  3. Use the value from Step 2 to loop through each set of matches from step 1, and indexes from step 2 in order to form something akin to the below code.

Code:

For i=1 To RegMatches.Count 
  start_pos = InStr(WordDocumentText,RegMatches.Item(i))
  For j=start_pos To InStr(WordDocumentText,RegMatches.Item(i+1))
    //Code to gather damages on VIN 'i'
  Next
Next

But these seems... redundant and just a sort of messy way to do so.

All I would really need to know are how to get access to the entirety of the text in the RTF file I am opening with VBA, and I can kind of go from there, but if anyone has an better idea on how to go from here with this, I'd appreciate it.

1
You rather do it in wrong way- you should rather keep the information in Excel file and then export it to RTF. Have you tried with importing your document to excel file and processing next?Kazimierz Jawor
The program we use does allow us to export it to CSV, however the problem I run into there is how the damages are stored - Assume there's more than 7 or so damages on a vehicle (it's rare but it does happen), then the row that has the VIN information I use will just add columns in some weird pattern. I would then need to use figure out the last column in every row and then have to calculate backwards where each damage code would end, as the last column would be akin to the 2 in the first damage on the first "VIN" I provided or the 1 in the first damage, second "VIN".Jhecht
in my opinion it would be much better to search for information in CSV file instead of RTF file. For CSV you could use some simple VBA functions like Open, Read, Like, InStr, Split and others. For RTF I would suggest Excel-Word-Interop and then using Word.Find object to get what you need. However, it's a big much harder work. So, how does your CSV file looks like?Kazimierz Jawor
The first couple of columns are named or headered, so I could find the first listed damage on a vehicle, but the problem is that each damaged after that original one is just appended 3 or so columns over from the first damage. So if the first damage is located in columns W,X,Y,Z, the second would be in columns AA,AB,AC,AD, the third AH,AI,AJ,AK, the fourth would start 3 from there, and then a curveball to this all would be if a damage doesn't have a grid number (the last number) then only 3 of the columns would have data so I would have to read backwards and it just seems convoluted to me.Jhecht

1 Answers

1
votes

I like to use MSWord behind the scenes to read an RTF file into Excel. Here is how to get access to the entire text of an RTF document.

Sub readRTF()
    Dim wrdApp As Word.Application
    Dim wrdDoc As Word.Document
    Dim FileName As String
    Dim strFolder As String
    Dim strInput As String

    strFolder = Application.ActiveWorkbook.Path & "\"
    FileName = "VINreport.rtf"

    'open a Word instance
    Set wrdApp = CreateObject("Word.Application")
    wrdApp.Visible = False

    Set wrdDoc = wrdApp.Documents.Open(strFolder & FileName)

    'Read RTF file text into variable
    strInput = wrdDoc.Range.Text

    'Print All Text into Immediate Window
    Debug.Print strInput

    'Clean Up
    wrdDoc.Close 0
    Set wrdDoc = Nothing

    wrdApp.Quit
    Set wrdApp = Nothing
End Sub

My example RTF file was located in the same folder as the excel file and was a straight cut & paste from your example code above.

Results:

enter image description here


Now you can run whatever Regex you need against the text in strInput. If you need help with the Regex part, check out this link for some useful tips using Regex with Excel.