11
votes

I have about 50 or so Excel workbooks that I need to pull data from. I need to take data from specific cells, specific worksheets and compile into one dataset (preferably into another excel workbook).

I am looking for some VBA so that I can compile the results into the workbook I am using to run the code.

So, one of the xls or xlsx files I need to pull the data from, worksheet("DataSource"), I need to evaluate cell(D4), and if its not null, then pull data from cell(F4), and put into a new row into the compiled data set. Looping through all the Excel files in that folder as mentioned above.

And if possible, I would like the first data field in the first column the name of the file the data is being pulled from in the resulting dataset.

Can someone help me with this? I am looking for VBA because I am more familiar with that, but also interested in VBScript (as I am trying to get into that and learn the differences).

5
Sounds like you're most of the way there with the planning. Do you have specific questions on syntax or methods?jonsca
yeah...just basically the syntax used to accomplish this overall. How would I loop through all excel files (xls or xlsx) files within a specified folder?Justin
also how would I specify new row for data being pulled from 'the next' excel file?Justin
What version of Excel does it need to work in? For example the Application.FileSearch approach won't work in 2007/10Tim Williams

5 Answers

11
votes

First start with this google query and click the first link that comes up, which takes you to an article showing how to iterate through a group of Excel files in a folder.

Sub RunCodeOnAllXLSFiles()
Dim lCount As Long
Dim wbResults As Workbook
Dim wbCodeBook As Workbook


Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False

On Error Resume Next
    Set wbCodeBook = ThisWorkbook
        With Application.FileSearch
            .NewSearch
            'Change path to suit
            .LookIn = "C:\MyDocuments\TestResults"
            .FileType = msoFileTypeExcelWorkbooks
            'Optional filter with wildcard
            '.Filename = "Book*.xls"
                If .Execute > 0 Then 'Workbooks in folder
                    For lCount = 1 To .FoundFiles.Count 'Loop through all
                        'Open Workbook x and Set a Workbook variable to it
                        Set wbResults = Workbooks.Open(Filename:=.FoundFiles(lCount), UpdateLinks:=0)

                        'DO YOUR CODE HERE

                        wbResults.Close SaveChanges:=False
                    Next lCount
                End If
        End With
On Error GoTo 0
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
End Sub

To get the name of the workbook, you'll want to adapt the code at "DO YOUR CODE HERE" to include wbResults.Name. If it's the filename you want, use wbResults.FullName, which returns the name of the workbook including its path on disk as a string.

A search for a VBScript variation on the same thing yields a number of results that are useful, including this script:

strPath = "C:\PATH_TO_YOUR_FOLDER"

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
objExcel.DisplayAlerts = False

Set objFso = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFso.GetFolder (strPath)

For Each objFile In objFolder.Files

If objFso.GetExtensionName (objFile.Path) = "xls" Then
   Set objWorkbook = objExcel.Workbooks.Open(objFile.Path)
   ' Include your code to work with the Excel object here
   objWorkbook.Close True 'Save changes
End If

Next

objExcel.Quit
5
votes

I would do it in VBScript or even, VB.NET or Powershell if you feel so inclined.

Using VB.NET, you can access Excel spreadsheets as if they were databases, via the OLEDB provider. The code to select a range of values might look like this :

 Try
        Dim MyConnection As System.Data.OleDb.OleDbConnection
        Dim DtSet As System.Data.DataSet
        Dim MyCommand As System.Data.OleDb.OleDbDataAdapter
        MyConnection = New System.Data.OleDb.OleDbConnection _
        ("provider=Microsoft.Jet.OLEDB.4.0;"  _
        " Data Source='testfile.xls'; " _
         "Extended Properties=Excel 8.0;")
        MyCommand = New System.Data.OleDb.OleDbDataAdapter _
            ("select * from [Sheet1$]", MyConnection)
        MyCommand.TableMappings.Add("Table", "TestTable")
        DtSet = New System.Data.DataSet
        MyCommand.Fill(DtSet)
        MyConnection.Close()
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try

Once you get the data you can elaborate on it, then insert the result into another Excel spreadsheet, using the same API.

Getting the list of files is easy in .NET with a call to System.IO.Directory.GetFiles(); just specify the "*.xls" wildcard. Once you have the list, just use a for loop to iterate through it, opening each file in turn, then doing the query on that file, and so on.

If you use VBScript, then the preferred way to get the list of Excel files is to use the Scripting.FileSystemObject, specifically the GetFolder method. It works basically the same way but the syntax is slightly different.


If it's VBScript or VB.NET it will probably run outside of Excel itself. You'd run it by double-clicking or from a batch file or something like that. The advantage to using VB.NET is you could put up a graphical form for interaction - it could show a progress bar, tracking how many files you've gone through, status updates, that kind of thing.

1
votes

Whenever you are accessing that many Excel files in succession, you can generally get better performance using ADODB rather than Excel's automation object.

0
votes

I agree with using that accessing the Excel object is not the quickest and if the workbooks and sheets that you're trying to retrieve data from are all consistent (i.e have the same column names, etc... or at least the column names you're looking for) it would be better to use ODBC. This does have some issues and if you can't get around them or need to actually do something more complex based on the contents then there may be no way around it. If that's the case then I would suggest creating one Excel object and then opening and closing the files as needed to try to increase the efficiency.

0
votes

It could be done with the following code

Sub LoopThroughFiles()

Dim StrFile As String
StrFile = Dir("V:\XX\XXX\*.xlsx")
 Do While Len(StrFile) > 0
    Debug.Print StrFile
       Set wbResults = Workbooks.Open("V:\XX\XXX\" & StrFile)   

                    'DO YOUR CODE HERE


       wbResults.Close SaveChanges:=True
    StrFile = Dir
 Loop
End Sub