Question
What is the most elegant & performant way to define DAX formula calculating the values of a confusion matrix (TP, FP, FN, TN) so that another measure (e.g. Precision, Recall, F1) can make use of them?
Background
We are serving a classification model to users. Users have expressed desire to see how model evaluation metrics for a given subset of historical predictions.
My approach of one measure for each of the four values strikes me as inefficient...
Originally, I had a "What-If?" parameter for threshold so the "predicted" column was a measure and calculated columns/tables were ruled out.
I'm thinking there's a potential use case for SUMMARIZE()
...
Current approach
True Positive := CALCULATE ( DISTINCTCOUNT ( Eval[id] ), Eval[actual] = 1, Eval[predicted] = 0 )
False Negative := CALCULATE ( DISTINCTCOUNT ( Eval[id] ), Eval[actual] = 1, Eval[predicted] = 0 )
False Negative := CALCULATE ( DISTINCTCOUNT ( Eval[id] ), Eval[actual] = 1, Eval[predicted] = 0 )
False Negative := CALCULATE ( DISTINCTCOUNT ( Eval[id] ), Eval[actual] = 1, Eval[predicted] = 0 )
Precision := DIVIDE([True Positive],[True Positive]+[False Positive])
Recall := DIVIDE([True Positive],[True Positive]+[False Negative])
F1 Score := 2*DIVIDE([Precision]*[Recall],[Precision]+[Recall])
Example source data
| id | predicted | actual |
|----|-----------|--------|
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 1 | 1 |
| 5 | 1 | 1 |
| 6 | 1 | 0 |
| 7 | 1 | 0 |
Source query
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIA41idaCUjFJ4xlGcI5pmAWTCeKQrPDMqD6DNH8GIB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [id = _t, predicted = _t, actual = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"predicted", Int64.Type}, {"actual", Int64.Type}})
in
#"Changed Type"
F1
I think (though ultimately dependent on what users want) (2) i've plotted theROC
already and0.5
is best, so no need forROC
in report – Anders Swanson