1
votes

Using Julia, is there a way to compare two DataFrames cell by cell and output the differences?

The expected result would be a DataFrame of true/false values, one per cell.

Thanks in advance for help

2 Answers

3
votes

AbstractDataFrame objects support broadcasting so you can just write:

julia> df1 .== df2
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│     │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1   │ 1    │ 1    │
│ 2   │ 1    │ 1    │
│ 3   │ 0    │ 1    │

or

julia> isequal.(df1, df2)
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│     │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1   │ 1    │ 1    │
│ 2   │ 1    │ 1    │
│ 3   │ 0    │ 1    │

The difference between == and isequal is how they handle missing values: if a cell contains missing, == produces missing for that cell, whereas isequal always produces true or false.
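A minimal sketch of that difference, using plain vectors in place of data frame columns (the names `a`, `b`, `eq`, `iseq` and `strict` are made up for illustration):

```julia
# Two columns that both contain a missing value in the second position.
a = [1, missing, 3]
b = [1, missing, 4]

# Broadcasting == uses three-valued logic, so missing propagates.
eq = a .== b

# Broadcasting isequal always yields Bool; missing is treated as equal to missing.
iseq = isequal.(a, b)

# If you want == semantics but a plain Bool result, one option is to
# replace missing with false via coalesce.
strict = coalesce.(a .== b, false)
```

Here `eq` holds `true`, `missing`, `false`, while `iseq` holds `true`, `true`, `false`. Which one you want depends on whether two missing cells should count as a match.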

The Matrix approach that Przemyslaw proposes ignores column names (and is generally expensive, since it copies the data). The second option he proposes matches columns by name, so it ignores column order (which in some cases you might actually want), and it does not check that both data frames have the same set of column names.
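If you do want to match columns by name regardless of order, but still guard against mismatched column sets, something along these lines might work. This is only a sketch: `compare_by_name` is a hypothetical helper, not part of DataFrames.jl, and it assumes a reasonably recent DataFrames.jl in which `names` returns strings and `select` accepts a vector of them.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): compare two data frames
# cell by cell, matching columns by name rather than by position.
function compare_by_name(df1::AbstractDataFrame, df2::AbstractDataFrame)
    # Fail early if the two frames do not hold the same set of column names.
    Set(names(df1)) == Set(names(df2)) ||
        throw(ArgumentError("data frames have different column names"))
    # Reorder df2's columns to follow df1, then broadcast the comparison.
    return isequal.(df1, select(df2, names(df1)))
end

# Same data as in the other answer, but with df2's columns swapped.
df1 = DataFrame(Col1=["A", "B", "C"], Col2=["X", "Y", "Z"])
df2 = DataFrame(Col2=["X", "Y", "Z"], Col1=["A", "B", "D"])

result = compare_by_name(df1, df2)
```

The result follows the column order of `df1`. Swap `isequal.` for `.==` if you prefer missing to propagate.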

3
votes

Basically you need to use .== in one of many ways.

using DataFrames
df1 = DataFrame(Col1=["A","B","C"],Col2=["X","Y","Z"])
df2 = DataFrame(Col1=["A","B","D"],Col2=["X","Y","Z"])

This is the shortest version:

julia> Matrix(df1) .== Matrix(df2)
3×2 BitArray{2}:
 1  1
 1  1
 0  1

With this approach you can flatten the matrices with [:] to get the list of values in df2 that do not match df1:

julia> Matrix(df2)[:][(.!(Matrix(df1) .== Matrix(df2))[:])]
1-element Array{String,1}:
 "D"
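If you also need to know where the mismatches are, not just their values, `findall` on the Boolean mask gives the positions. A small sketch, assuming `m1` and `m2` stand in for `Matrix(df1)` and `Matrix(df2)` (the variable names are made up for illustration):

```julia
# Stand-ins for Matrix(df1) and Matrix(df2) from the example above.
m1 = ["A" "X"; "B" "Y"; "C" "Z"]
m2 = ["A" "X"; "B" "Y"; "D" "Z"]

# Boolean mask that is true wherever the two matrices differ.
mask = m1 .!= m2

# Positions of the differing cells as CartesianIndex values.
positions = findall(mask)

# Values from m2 at the differing cells (logical indexing, no [:] needed).
changed = m2[mask]

# (row, column), old value, new value for every mismatch.
diffs = [(Tuple(i), m1[i], m2[i]) for i in positions]
```

Indexing with the mask directly avoids the double `[:]`, and `diffs` pairs each position with the value on each side.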

If you want a DataFrame:

julia> DataFrame((n => df1[!,n] .== df2[!,n] for n in names(df2))...)
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│     │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1   │ 1    │ 1    │
│ 2   │ 1    │ 1    │
│ 3   │ 0    │ 1    │