I am trying to find a way to filter large data and remove rows in a worksheet, in less than one minute
The goal:
- Find all records containing specific text in column 1, and delete the entire row
- Keep all cell formatting (colors, font, borders, column widths) and formulas as they are
.
Test Data:
:
.
How the code works:
- It starts by turning all Excel features Off
If the workbook is not empty and the text value to be removed exists in column 1
- Copies the used range of column 1 to an array
- Iterates over every value in array backwards
When it finds a match:
- Appends the cell address to a tmp string in the format
"A11,A275,A3900,..."
- If the tmp variable length is close to 255 characters
- Deletes rows using
.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
- Resets tmp to empty and moves on to the next set of rows
- Appends the cell address to a tmp string in the format
- At the end, it turns all Excel features back On
.
The main issue is the Delete operation, and total duration time should be under one minute. Any code-based solution is acceptable as long as it performs under 1 minute.
This narrows the scope to very few acceptable answers. The answers already provided are also very short and easy to implement. One performs the operation in about 30 seconds, so there is at least one answer that provides an acceptable solution, and other may find it useful as well
.
My main initial function:
Sub DeleteRowsWithValuesStrings()
Const MAX_SZ As Byte = 240
Dim i As Long, j As Long, t As Double, ws As Worksheet
Dim memArr As Variant, max As Long, tmp As String
Set ws = Worksheets(1)
max = GetMaxCell(ws.UsedRange).Row
FastWB True: t = Timer
With ws
If max > 1 Then
If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
For i = max To 1 Step -1
If memArr(i, 1) = "Test String" Then
tmp = tmp & "A" & i & ","
If Len(tmp) > MAX_SZ Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
tmp = vbNullString
End If
End If
Next
If Len(tmp) > 0 Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
End If
.Calculate
End If
End If
End With
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
Helper functions (turn Excel features off and on):
Public Sub FastWB(Optional ByVal opt As Boolean = True)
With Application
.Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
.DisplayAlerts = Not opt
.DisplayStatusBar = Not opt
.EnableAnimations = Not opt
.EnableEvents = Not opt
.ScreenUpdating = Not opt
End With
FastWS , opt
End Sub
Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
Optional ByVal opt As Boolean = True)
If ws Is Nothing Then
For Each ws In Application.ActiveWorkbook.Sheets
EnableWS ws, opt
Next
Else
EnableWS ws, opt
End If
End Sub
Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
With ws
.DisplayPageBreaks = False
.EnableCalculation = Not opt
.EnableFormatConditionsCalculation = Not opt
.EnablePivotTable = Not opt
End With
End Sub
Finds last cell with data (thanks @ZygD - now I tested it in several scenarios):
Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
'Returns the last cell containing a value, or A1 if Worksheet is empty
Const NONEMPTY As String = "*"
Dim lRow As Range, lCol As Range
If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
If WorksheetFunction.CountA(rng) = 0 Then
Set GetMaxCell = rng.Parent.Cells(1, 1)
Else
With rng
Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByRows)
If Not lRow Is Nothing Then
Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByColumns)
Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
End If
End With
End If
End Function
Returns the index of a match in the array, or 0 if a match is not found:
Public Function IndexOfValInRowOrCol( _
ByVal searchVal As String, _
Optional ByRef ws As Worksheet = Nothing, _
Optional ByRef rng As Range = Nothing, _
Optional ByRef vertical As Boolean = True, _
Optional ByRef rowOrColNum As Long = 1 _
) As Long
'Returns position in Row or Column, or 0 if no matches found
Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long
result = CVErr(9999) '- generate custom error
Set usedRng = GetUsedRng(ws, rng)
If Not usedRng Is Nothing Then
If rowOrColNum < 1 Then rowOrColNum = 1
With Application
If vertical Then
result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
Else
result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
End If
End With
End If
If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
End Function
.
Update:
Tested 6 solutions (3 tests each): Excel Hero's solution is the fastest so far (removes formulas)
.
Here are the results, fastest to the slowest:
.
Test 1. Total of 100,000 records, 10,000 to be deleted:
1. ExcelHero() - 1.5 seconds
2. DeleteRowsWithValuesNewSheet() - 2.4 seconds
3. DeleteRowsWithValuesStrings() - 2.45 minutes
4. DeleteRowsWithValuesArray() - 2.45 minutes
5. QuickAndEasy() - 3.25 minutes
6. DeleteRowsWithValuesUnion() - Stopped after 5 minutes
.
Test 2. Total of 1 million records, 100,000 to be deleted:
1. ExcelHero() - 16 seconds (average)
2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)
3. DeleteRowsWithValuesStrings() - 4 hrs 38 min (16701.375 sec)
4. DeleteRowsWithValuesArray() - 4 hrs 37 min (16626.3051757813 sec)
5. QuickAndEasy() - 5 hrs 40 min (20434.2104492188 sec)
6. DeleteRowsWithValuesUnion() - N/A
.
Notes:
- ExcelHero method: easy to implement, reliable, extremely fast, but removes formulas
- NewSheet method: easy to implement, reliable, and meets the target
- Strings method: more effort to implement, reliable, but doesn't meet requirement
- Array method: similar to Strings, but ReDims an array (faster version of Union)
- QuickAndEasy: easy to implement (short, reliable and elegant), but doesn't meet requirement
- Range Union: implementation complexity similar to 2 and 3, but too slow
I also made the test data more realistic by introducing unusual values:
- empty cells, ranges, rows, and columns
- special characters, like =[`~!@#$%^&*()_-+{}[]\|;:'",.<>/?, separate and multiple combinations
- blank spaces, tabs, empty formulas, border, font, and other cell formatting
- large and small numbers with decimals (=12.9999999999999 + 0.00000000000000001)
- hyperlinks, conditional formatting rules
- empty formatting inside and outside data ranges
- anything else that might cause data issues
GetMaxCell
function. One thing to mention - you should remove the dot:Set GetMaxCell = .Cells(lRow.row, lCol.Column)
should becomeSet GetMaxCell = Cells(lRow.row, lCol.Column)
, because you would have inaccurate result if e.g. the whole columnA:A
was empty. However, now I see, that the wholeGetMaxCell
function is useless here? It's not related todeleteRowsWithValuesStrings
! – ZygDGetMaxCell
in another project, you should have the version without dot. In my test sheet I had data ranging from B2 to E4 (A:A was empty, 1:1 was empty). The result of the function with the dot was the cell F5, which is obviously incorrect. After removing the dot the result was correct - E4. – ZygDSet
beforews =
andrng =
. – ZygD