1
votes

I hope you can help me with this issue.

I have an Excel file with 146,459 rows and I need to delete blank cells to unify my data. Here is an image of what I mean:

Click here to see the image

When I select all blanks, my laptop takes around 2 minutes, but then when I try to delete the cells from one or more columns and shift up, Excel freezes and nothing happen. I already left my laptop for over 1 hours like that and I didn't have any results.

Do you know if there's a way to do it or if any alternatives can be implemented?

Thanks in advance!

2
Is that image an excellent and accurate representation of your data structure or just a made-up approximation?user4039065
@Jeeped can an advanced filter remove blanks (with unique)?urdearboy
In other words, in 146,459 rows is every field populated but with blank cells/rows offsetting the data?user4039065
@urdearboy - I'd have to play with it; something I'm unwilling to do since I'd have to retype the data and the op is not answering inquiries.user4039065

2 Answers

2
votes

Looping through cells takes a very long time, even with the Union optimisation. The code below was tested on an imitated data set, 5 columns x 200,000 records, and finished in 5.5 seconds.

Setup: Let's say your source data is in a range "A1:E200000" on a sheet named "Source", and you want clean data in a similar range on a sheet named "Target".

Code:

Option Explicit

Sub Remove_Empty_Cells()
Dim Source        As Range
Dim Target        As Range
Dim i             As Integer

    Set Source = ThisWorkbook.Sheets("Source").Range("A1:E200000")
    Set Target = ThisWorkbook.Sheets("Target").Range("A1:E200000")

    For i = 1 To Source.Columns.Count
        Clean_Column Source.Columns(i), Target.Columns(i)
    Next i

End Sub


Sub Clean_Column(Source As Range, Target As Range)
Dim rs           As Object
Dim XML          As Object

    Set XML = CreateObject("MSXML2.DOMDocument")
    XML.LoadXML Source.Value(xlRangeValueMSPersistXML)

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open XML

    rs.Filter = rs.Fields(0).Name & "<>null"
    Target.CopyFromRecordset rs

End Sub

How it works: Sub Remove_Empty_Cells loops though the source range by columns, and calls sub "Clean_Column" that removes empty cells from the provided column.

Clean_Column loads all column cells into an ADO recordset using MSXML2.DOMDocument object. The recordset is then filtered for non-empty rows, and the result is copied to the target column. All these operations are very fast in VBA.

Ideally, I would love to load the entire range into a recordset at once, but unfortunately VBA function CopyFromRecordset does not alow to paste recordset field by field. So we have to load the data column by column (if somebody knows a more optimal way, I'd love to see it).

A couple of caveats:

  1. For some reason (?), the first column copies without a header, while all consecutive columns copy with their headers. The first column then must have it's header inserted (either manually or with VBA);
  2. I assume that the number of non-empty cells in each column is the same, otherwise the cleaned records won't lineup (if that's the case, you have a much bigger problem).

[EDIT]: An alternative solution, implemented using arrays. The same data set 5x 200,000 with 40,000 valid records is cleaned in less than 1 second. It can be further optimized, I just prototyped a quick demo.

Sub Remove_Empty_Cells()
Dim Source_Data()   As Variant
Dim Clean_Data()    As Variant
Dim Source_Range    As Range
Dim Target_Range    As Range
Dim Column_Count    As Long
Dim Row_Count       As Long
Dim i               As Long
Dim j               As Long
Dim k               As Long

    Set Source_Range = ThisWorkbook.Sheets("Source").Range("A1:E200000")

    Column_Count = Source_Range.Columns.Count
    Row_Count = Source_Range.Rows.Count

    ReDim Source_Data (1 To Row_Count, 1 To Column_Count)
    ReDim Clean_Data (1 To Row_Count, 1 To Column_Count)

    Source_Data = Source_Range

    For j = 1 To Column_Count
        k = 1
        For i = 1 To Row_Count
            If Source_Data(i, j) <> "" Then
                Clean_Data(k, j) = Source_Data(i, j)
                k = k + 1
            End If
        Next i
    Next j

    Set Target_Range = ThisWorkbook.Sheets("Target").Range("A1").Resize(Row_Count, Column_Count)
    Target_Range = Clean_Data

End Sub
2
votes

Working with arrays is either one of the fastest or the fastest method of dealing with large ranges of cells.

Start with:

enter image description here

Run code:

Option Explicit

Sub delBlanks()
    Dim i As Long, j As Long, k As Long, arr As Variant, vals As Variant
    Dim s As Double, e As Double, c As Long

    s = Timer

    With Worksheets("sheet6")
        If .AutoFilterMode Then .AutoFilterMode = False

        'data validity check
        c = Application.CountA(.Columns(1))
        For j = 2 To 5
            If c <> Application.CountA(.Columns(j)) Then Exit For
        Next j
        If j <= 5 Then
            Debug.Print "GIGO, waste of time to continue"
            Exit Sub
        End If

        'collect offset values
        vals = .Range(.Cells(2, "A"), .Cells(.Rows.Count, "E").End(xlUp)).Value2
        ReDim arr(LBound(vals, 1) To UBound(vals, 1), _
                  LBound(vals, 2) To UBound(vals, 2))

        'loop through array coolating A"E to a single row
        i = LBound(vals, 1)
        k = LBound(arr, 1)
        Do
            For j = LBound(vals, 2) To UBound(vals, 2)
                Do While vals(i, j) = vbNullString: i = i + 1: Loop
                arr(k, j) = vals(i, j)
            Next j
            i = i + 1: k = k + 1
        Loop Until i > UBound(vals, 1)

        'put data back on worksheet
        .Cells(2, "A").Resize(UBound(arr, 1), UBound(arr, 2)) = arr
        .Cells(2, "C").Resize(UBound(arr, 1), 1).NumberFormat = "dd/mm/yyyy"
    End With

    e = Timer

    Debug.Print c - 1 & " records in " & UBound(vals, 1) & _
                " rows collated in " & Format((e - s), "0.000") & " seconds"
End Sub

Results:

enter image description here

30000 records in 157500 rows collated in 0.984 seconds

Seeded data:

The following was used to replicate the OP 'sample-data-in-an-image'.

Sub fillBlanks()
    Dim i As Long, j As Long, k As Long, arr As Variant, vals As Variant

    vals = Array("to: ""someone"" <[email protected]", "from: ""no one"" <[email protected]", _
                 Date, "\i\m\p\o\r\t\a\n\c\e\: 0", "subject: something nothing")

    ReDim arr(1 To 6, 1 To 5)

    With Worksheets("sheet6")
        .Cells(1, 1).CurrentRegion.Offset(1, 0).Clear
        For k = 1 To 30000
            j = 0
            For i = LBound(arr, 2) To UBound(arr, 2)
                If i = 2 And Not CBool(k Mod 4) Then j = j + 1
                If i = 4 Then
                    arr(i + j, i) = Format(k, vals(i - 1))
                Else
                    arr(i + j, i) = vals(i - 1)
                End If
            Next i
            .Cells(.Rows.Count, 5).End(xlUp).Offset(1, -4).Resize(UBound(arr, 1), UBound(arr, 2)) = arr
            ReDim arr(1 To 6, 1 To 5)
        Next k
    End With
End Sub