1
votes

I have 18000 rows and 26 columns.

Sample data:

A(Name)     B(Mat_Num)  C(Items) D(group)   E(Summon)   F(Plant) G(Batch_num)
1.Ram       1235         HA1      Micro      545.5      1327      893A1
2.ram       12354        rt2      Senf       5678       0001      1063F
3.Joseph    12354        cf1      Macro      9844       0001      1063F
4.andreas   12354        dw1      HR         6633.95    0001      1063F
5.John      1235         ff1      Finance    22555.09   1327      893A1
6.Russel     987         ad1      Sales      6423       0001      jjg67
7.Holger      00         dd1      purchase   3333       1327      dd567
8.Gottfried   234        fa1      rot        663        345       45678

I have to find duplicate rows based on columns (B, F, G). If the rows of these three columns are the same then sum the value of cells of column E to one row and delete duplicate rows to keep only one of the rows.

Result:

 A(Name)     B(Mat_Num)  C(Items) D(group)   E(Summon)   F(Plant) G(Batch_num)
1.Ram       1235         HA1      Micro      23101      1327      893A1
2.ram       12354        rt2      Senf       22155.95   0001      1063F

I have gone through some websites and blogs to come up with code posted below.

Sub Sample()
    Dim LastRowcheck As Long, n1 As Long
    Dim DelRange As Range

    With Worksheets("Sheet1")
        LastRowcheck = .Range("A" & .Rows.Count).End(xlUp).Row

        For n1 = 1 To LastRowcheck
            If .Cells(n1, 1).Value = Cells(n1 + 1, 1).Value Then
                If DelRange Is Nothing Then
                    Set DelRange = .Rows(n1)
                Else
                    Set DelRange = Union(DelRange, .Rows(n1))
                End If
            End If
        Next n1

        If Not DelRange Is Nothing Then DelRange.Delete
    End With
End Sub
3

3 Answers

1
votes

This should quickly take care of it. As quickly as 18K rows of data can be processed to an aggregate sum.

Sub Sum_and_Dedupe()
    With Worksheets("sheet1")
        'deal with the block of data radiating out from A1
        With .Cells(1, 1).CurrentRegion
            'step off the header and make one column wider
            With .Resize(.Rows.Count - 1, .Columns.Count + 1).Offset(1, 0)
                .Columns(.Columns.Count).Formula = "=sumifs(e:e, b:b, b2, f:f, f2, g:g, g2)"
                .Columns(5) = .Columns(.Columns.Count).Value
                .Columns(.Columns.Count).Delete
            End With

            'remove duplicates
            .RemoveDuplicates Columns:=Array(2, 6, 7), Header:=xlYes
        End With
        .UsedRange
    End With
End Sub

This took ~18 seconds for 18K rows of random data. Your own results will vary according to hardware and software but that should be the ballpark.

        sum_and_dedupe_before
                Sample data before Sum_and_Dedupe()

        sum_and_dedupe_after
                Sample data after Sum_and_Dedupe()

1
votes

here's "ballpark" #2

Sub main()
Dim helperRng As Range

With Worksheets("Sheet01")
    With .UsedRange
        Set helperRng = .Offset(, .Columns.Count + 1).Resize(, 1)
        With helperRng
            .FormulaR1C1 = "=concatenate(RC2, RC6, RC7)"
            .Offset(, 1).FormulaR1C1 = "=if(countif(R1C[-1]:RC[-1], RC[-1])=1,1,"""")"
            With .Offset(, 2)
                .FormulaR1C1 = "=sumif(C[-2], RC[-2],C5)"
                .Value = .Value
            End With
            .Offset(, 1).SpecialCells(xlCellTypeFormulas, xlTextValues).EntireRow.Delete
            Worksheets("Sheet01").Columns(5).Resize(.Rows.Count - 1).Offset(1).Value = .Offset(1, 2).Resize(.Rows.Count - 1).Value
            helperRng.Resize(, 3).Clear
        End With
    End With
End With

End Sub

only curious which is faster!

0
votes

This could be done in a 18 milliseconds (slight exaggeration) using arrays and dictionary object. I simplify the function by knowing my value to sum is in column 4. You can adjust code for multiple values in other columns. I'm writing from 1 array to another (InAy to OutAy), the dictionary determines if row already existed. The magic happens in the dictionary's Item property. I assign the item property value to the row (r) when a new OutAy row is written. Then when it exists already, I retrieve the row (r) where it was written to OutAy using the item property value: d.item(KeyIn) I can then update that value in OutAy(r, c) with a sum of existing value and new value 'KeyVal'.

This solves the same as sql query aggregate: "Select a, b, c, sum(d) from data group by a, b, c"

Note: add a tools->reference to Microsoft Scripting runtime

    sub some()
     ...
     data = Range("WhereYourDataIs") 'create data array
     Range("WhereYourDataIs").clear 'assumes you'll output to same location
     data = RemoveDupes(data) 'removedupes and sum values
     Range("A2").Resize(UBound(data), UBound(data, 2)) = data 'A2 assumes your data headers begin in row 1, column 1
     ...
    End Sub

Function RemoveDupes(InAy As Variant) As Variant
    Dim d As Scripting.Dictionary
    Set d = New Scripting.Dictionary
    ReDim OutAy(1 To UBound(InAy), 1 To 4)
    r = 1

    For i = 1 To UBound(InAy)
        KeyIn = ""
        KeyVal = InAy(i, 4) 'the value field to sum/aggregate if exists
        For c = 1 To 3 'a, b, c metadata to roll up
            KeyIn = KeyIn & InAy(i, c)
        Next c
        If d.Exists(KeyIn) Then
            OutAy(d.item(KeyIn), 4) = OutAy(d.item(KeyIn), 4) + KeyVal 'the summation of value field for existing row in OutAy
            Else:
            d.Add KeyIn, r 'r is set as the item value referencing the row of the OutAy when it was first added. The reference is used when .Exists is true
            For c = 1 To 4
                OutAy(r, c) = InAy(i, c)
            Next c
            r = r + 1
        End If
    Next
    RemoveDupes = OutAy
End Function