3
votes

I have similar question to [combine Rows with Duplicate Values][1] Excel VBA - Combine rows with duplicate values in one cell and merge values in other cell

I have data in this format (rows are sorted)


Pub     ID      CH      Ref
no      15      1      t2
no      15      1      t88
yes     15      2      t3
yes     15      2      t3
yes     15      2      t6

compare adjacent rows (say row 4 and 5) , if col 2 and 3 match then if col 4 different merge col4, delete row. if col 2,3,4 match then delete row, don't merge col 4


Desired Output

key     ID      CH      Text  
no      15      1       t2   t88
yes     15      2       t3   t6

This first code section doesn't work right

Sub mergeCategoryValues()
    Dim lngRow As Long

    With ActiveSheet
        Dim columnToMatch1 As Integer: columnToMatch1 = 2
        Dim columnToMatch2 As Integer: columnToMatch2 = 3
        Dim columnToConcatenate As Integer: columnToConcatenate = 4


        lngRow = .Cells(65536, columnToMatch1).End(xlUp).row
        .Cells(columnToMatch1).CurrentRegion.Sort key1:=.Cells(columnToMatch1), Header:=xlYes
        .Cells(columnToMatch2).CurrentRegion.Sort key1:=.Cells(columnToMatch2), Header:=xlYes

        Do
            If .Cells(lngRow, columnToMatch1) = .Cells(lngRow - 1, columnToMatch1) Then 'check col 2 row lngRow, lngRow-1
              If .Cells(lngRow, columnToMatch2) = .Cells(lngRow - 1, columnToMatch2) Then 'check col 3 row lngRow, lngRow-1
                 If .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow, columnToConcatenate) Then
                    Else
                    .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow - 1, columnToConcatenate) & "; " & .Cells(lngRow, columnToConcatenate)
                 End If
                .Rows(lngRow).Delete
              End If
            End If
            lngRow = lngRow - 1
        Loop Until lngRow = 1
    End With

Actual Output incorrect because when cells merge t3 will not match t3;t6, my comparison on col 4 will only work in very simple case only.

Actual Output

key ID  CH  Text
no  15  1   t2; t88
yes 15  2   t3; t3; t6

Therefore, I had to add these two sections to split the Concatenate cells and then remove duplicates

'split cell in Col d to col e+ delimited by ;
        With Range("D2:D6", Range("D" & Rows.Count).End(xlUp))
            .Replace ";", " ", xlPart
            .TextToColumns other:=True
        End With

 'remove duplicates in each row

    Dim x, y(), i&, j&, k&, s$
    With ActiveSheet.UsedRange
        x = .Value: ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
        For i = 1 To UBound(x)
            For j = 1 To UBound(x, 2)
                If Len(x(i, j)) Then
                    If InStr(s & "|", "|" & x(i, j) & "|") = 0 Then _
                       s = s & "|" & x(i, j): k = k + 1: y(i, k) = x(i, j)
                End If
            Next j: s = vbNullString: k = 0
        Next i
        .Value = y()
    End With
    End Sub

With additional code output is

Pub ID  CH  Ref 
no  15  1   t2  t88
yes 15  2   t3  t6

Question: There must be much easier way to do this right than use three different methods? How about inserting new columns 5+ if col 4 items don't match?

Note: Remove duplicates code was found from user nilem at excelforum.

Edit: Col 1 will always be same if Col 2 and 3 match. If solution is much easier we can assume Col 1 is blank and ignore data.

I have printed book lookup table and need to convert to a simple format that will be used in equipment that use a 1960's language which has very limited commands. I am trying to preformat this data so I only need to search for one row that has all info.

Col D final output can be in col D with delimiter or into col D-K (only 8 max Ref) because I will parse to use on other machine. Whatever method is easier.

3
I don't really understand your rules, or why you have to sort your data, but, in general, I would use a user defined class and a collection object to process each row of data, and then combine the results to get the output. Perhaps a modification of Destacking ColumnsRon Rosenfeld
Your text indicates you want to compare columns 2 and 3, combining duplicates; but your example indicates that you also want to combine duplicates of column 1. Perhaps you could state your rules more clearly, and also provide a more comprehensive example.Ron Rosenfeld
I clarified at bottom of post.equalizer
Thanks. I posted a solution as an answer.Ron Rosenfeld

3 Answers

1
votes

The canonical practise for deleting rows is to start at the bottom and work toward the top. In this manner, rows are not skipped. The trick here is to find rows above the current position that match columns B and C and concatenate the strings from column D before removing the row. There are several good worksheet formulas that can acquire the row number of a two-column-match. Putting one of them into practise with application.Evaluate would seem to be the most expedient method of collecting the values from column D.

Sub dedupe_and_collect()
    Dim rw As Long, mr As Long, wsn As String

    With ActiveSheet   '<- set this worksheet reference properly!
        wsn = .Name
        With .Cells(1, 1).CurrentRegion
            .RemoveDuplicates Columns:=Array(2, 3, 4), Header:=xlYes
        End With
        With .Cells(1, 1).CurrentRegion  'redefinition after duplicate removal
            For rw = .Rows.Count To 2 Step -1 'walk backwards when deleting rows
                If Application.CountIfs(.Columns(2), .Cells(rw, 2).Value, .Columns(3), .Cells(rw, 3).Value) > 1 Then
                    mr = Application.Evaluate("MIN(INDEX(ROW(1:" & rw & ")+(('" & wsn & "'!B1:B" & rw & "<>'" & wsn & "'!B" & rw & ")+('" & wsn & "'!C1:C" & rw & "<>'" & wsn & "'!C" & rw & "))*1E+99, , ))")
                    'concatenate column D
                    '.Cells(mr, 4) = .Cells(mr, 4).Value & "; " & .Cells(rw, 4).Value
                    'next free column from column D
                    .Cells(mr, Columns.Count).End(xlToLeft).Offset(0, 1) = .Cells(rw, 4).Value
                    .Rows(rw).EntireRow.Delete
                End If
            Next rw
        End With
    End With
End Sub

The removal of records on a three-column-match is done with the VBA equivalent of the Date ► Data Tools ► Remove Duplicates command. This only considers columns B, C and D and deletes the lower duplicates (keeping the ones closest to row 1). If Column A is important in this respect, additional coding would have to be added.

It's unclear to me whether you wanted column D as delimited string or separate cells as an end result. Could you clarify?

1
votes

As I wrote above, I would iterate through the data and collect things into the User Defined Object. There is no need for the data to be sorted in this method; and duplicate REF's will be omitted.

One advantage of a User Defined Object is that it makes debugging easier as you can see more clearly what you have done.

We combine every line where ID and CH are the same, by using the property of the Collection object to raise an error if identical keys are used.

So far as combining the Ref's in a single cell with a delimiter, vs individual cells in columns D:K, either can be done simply. I chose to separate into columns, but changing it to combine into a single column would be trivial.

After Inserting the Class Module, you must rename it: cID_CH

You will note I placed the results on a separate worksheets. You could overwrite the original data, but I would advise against that.

Class Module


Option Explicit
Private pID As Long
Private pCH As Long
Private pPUB As String
Private pREF As String
Private pcolREF As Collection

Public Property Get ID() As Long
    ID = pID
End Property
Public Property Let ID(Value As Long)
    pID = Value
End Property

Public Property Get CH() As Long
    CH = pCH
End Property
Public Property Let CH(Value As Long)
    pCH = Value
End Property

Public Property Get PUB() As String
    PUB = pPUB
End Property
Public Property Let PUB(Value As String)
    pPUB = Value
End Property

Public Property Get REF() As String
    REF = pREF
End Property
Public Property Let REF(Value As String)
    pREF = Value
End Property

Public Property Get colREF() As Collection
    Set colREF = pcolREF
End Property

Public Sub ADD(refVAL As String)
    On Error Resume Next
        pcolREF.ADD refVAL, refVAL
    On Error GoTo 0
End Sub

Private Sub Class_Initialize()
    Set pcolREF = New Collection
End Sub

Regular Module


Option Explicit
Sub CombineDUPS()
    Dim wsSRC As Worksheet, wsRES As Worksheet
    Dim vSRC As Variant, vRES() As Variant, rRES As Range
    Dim cI As cID_CH, colI As Collection
    Dim I As Long, J As Long
    Dim S As String

'Set source and results worksheets and results range
Set wsSRC = Worksheets("sheet1")
Set wsRES = Worksheets("sheet2")
Set rRES = wsRES.Cells(1, 1)

'Get Source data
With wsSRC
    vSRC = .Range("A2", .Cells(.Rows.Count, "D").End(xlUp))
End With

'Collect and combine data
Set colI = New Collection
On Error Resume Next
For I = 1 To UBound(vSRC, 1)
    Set cI = New cID_CH
    With cI
        .PUB = vSRC(I, 1)
        .ID = vSRC(I, 2)
        .CH = vSRC(I, 3)
        .REF = vSRC(I, 4)
        .ADD .REF
        S = CStr(.ID & "|" & .CH)
        colI.ADD cI, S
        If Err.Number = 457 Then
            Err.Clear
            colI(S).ADD .REF
        ElseIf Err.Number <> 0 Then
            Debug.Print Err.Number, Err.Description
            Stop
        End If
    End With
Next I
On Error GoTo 0

'Create and populate Results Array
ReDim vRES(0 To colI.Count, 1 To 11)

'Header row
vRES(0, 1) = "Pub"
vRES(0, 2) = "ID"
vRES(0, 3) = "CH"
vRES(0, 4) = "Ref"

'populate array
For I = 1 To colI.Count
    With colI(I)
        vRES(I, 1) = .PUB
        vRES(I, 2) = .ID
        vRES(I, 3) = .CH
        For J = 1 To .colREF.Count
            vRES(I, J + 3) = .colREF(J)
        Next J
    End With
Next I

'Write the results to the worksheet
Set rRES = rRES.Resize(UBound(vRES, 1) + 1, UBound(vRES, 2))
With rRES
    .EntireColumn.Clear
    .Value = vRES
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
        Range(.Cells(4), .Cells(11)).HorizontalAlignment = xlCenterAcrossSelection
    End With
    .EntireColumn.AutoFit
End With

End Sub

Original

Original Data

Processed Results

Results

1
votes

variant using dictionary below

Sub test()
    Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
    Dic.Comparemode = vbTextCompare
    Dim Cl As Range, x$, y$, i&, Key As Variant
    For Each Cl In Range("B2:B" & Cells(Rows.Count, "B").End(xlUp).Row)
        x = Cl.Value & "|" & Cl.Offset(, 1).Value
        y = Cl.Offset(, 2).Value
        If Not Dic.exists(x) Then
            Dic.Add x, Cl.Offset(, -1).Value & "|" & y & "|"
        ElseIf Dic.exists(x) And Not LCase(Dic(x)) Like "*|" & LCase(y) & "|*" Then
            Dic(x) = Dic(x) & "|" & y & "|"
        End If
    Next Cl
    Range("A2:D" & Cells(Rows.Count, "B").End(xlUp).Row).ClearContents
    i = 2
    For Each Key In Dic
        Cells(i, "A") = Split(Dic(Key), "|")(0)
        Range(Cells(i, "B"), Cells(i, "C")) = Split(Key, "|")
        Cells(i, "D") = Replace(Split(Replace(Dic(Key), "||", ";"), "|")(1), ":", ";")
        i = i + 1
    Next Key
    Set Dic = Nothing
End Sub

before

enter image description here

after

enter image description here