Consider the following PySpark dataframe:

| Col1 | Col2 | Col3 |
|---|---|---|
| A, B | D, G | A, G |
| C, F | C, D | A, G |
| C, F | C, D | A, G |

I'd like to create a new dataframe with two columns. The first column lists every distinct combination. The second column is the ratio (frequency of the combination) / (total number of combinations), where the total counts every cell across all three columns (9 in this example). For example:

| Combination | Ratio |
|---|---|
| A, B | 0.111 (1/9) |
| C, F | 0.222 (2/9) |
| D, G | 0.111 (1/9) |
| C, D | 0.222 (2/9) |
| A, G | 0.333 (3/9) |
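
To pin down the arithmetic behind the expected output, here is a minimal plain-Python sketch of the calculation. It is not PySpark and not a proposed solution: the `rows` list just mirrors the example table, and the helper name `combination_ratios` is made up for illustration. It assumes each cell is treated as one opaque combination string and that the denominator is the total number of cells.

```python
from collections import Counter

# Rows mirror the example dataframe above (Col1, Col2, Col3).
# Assumption: each cell is treated as a single opaque "combination" string.
rows = [
    ("A, B", "D, G", "A, G"),
    ("C, F", "C, D", "A, G"),
    ("C, F", "C, D", "A, G"),
]


def combination_ratios(rows):
    """Return {combination: frequency / total number of cells}."""
    # Flatten all columns into one list of cells.
    cells = [cell for row in rows for cell in row]
    counts = Counter(cells)
    total = len(cells)  # 9 for the example data
    return {combo: n / total for combo, n in counts.items()}


ratios = combination_ratios(rows)
for combo, ratio in ratios.items():
    print(f"{combo}: {ratio:.3f}")
```

In PySpark, the same shape of computation would presumably mean unpivoting the three columns into a single column, then using `groupBy(...).count()` and dividing each count by the total number of cells.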