Using Julia, is there a way to compare two DataFrames cell by cell and output the differences?
The expected result would be a DataFrame of true/false values.
Thanks in advance for any help.
AbstractDataFrame objects support broadcasting, so you can just write:
julia> df1 .== df2
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│ │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 1 │
│ 3 │ 0 │ 1 │
or
julia> isequal.(df1, df2)
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│ │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 1 │
│ 3 │ 0 │ 1 │
The difference between == and isequal is how they handle a missing value in a cell: == produces missing in such a case, while isequal always produces true/false.
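To make the missing-value behaviour concrete, here is a minimal sketch using plain vectors as stand-ins for two data frame columns (the values are made up for illustration). It only relies on Base functions, and the same element-wise rules should apply when the operands are data frames.

```julia
# Plain vectors stand in for two data frame columns (illustrative values).
a = [1, missing, 3]
b = [1, missing, 4]

# == follows three-valued logic: any comparison involving missing yields missing.
eq = a .== b

# isequal always returns a Bool and treats missing as equal to missing.
ieq = isequal.(a, b)

# If == semantics are preferred but a pure Bool result is needed,
# coalesce can replace each missing with a chosen default (here: false).
strict = coalesce.(eq, false)

println(eq)
println(ieq)
println(strict)
```

Which variant fits depends on whether two missing cells should count as a match (isequal) or as unknown (==).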
Using the Matrix approach that Przemyslaw proposes will ignore column names (and in general will be expensive, as it copies the data). The second option proposed by Przemyslaw ignores column order in the data frames (in some cases you might actually want this) and does not check whether both data frames have the same set of column names.
Basically you need to use .== in one of many ways.
using DataFrames
df1 = DataFrame(Col1=["A","B","C"],Col2=["X","Y","Z"])
df2 = DataFrame(Col1=["A","B","D"],Col2=["X","Y","Z"])
This is the shortest version:
julia> Matrix(df1) .== Matrix(df2)
3×2 BitArray{2}:
1 1
1 1
0 1
In this approach you can flatten the matrices with [:] to get the list of unmatched values:
julia> Matrix(df2)[:][(.!(Matrix(df1) .== Matrix(df2))[:])]
1-element Array{String,1}:
"D"
If you want a DataFrame:
julia> DataFrame((n => df1[!,n] .== df2[!,n] for n in names(df2))...)
3×2 DataFrame
│ Row │ Col1 │ Col2 │
│ │ Bool │ Bool │
├─────┼──────┼──────┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 1 │
│ 3 │ 0 │ 1 │