I'd like to calculate the Mahalanobis distance among groups of species where:
- i) there are more than two groups (more than two species).
- ii) there are multiple variables (features of such species) to be taken into account.
- iii) there are multiple observations per group (in the data frame, there is more than one row per species).
I am trying to understand how to run the mahalanobis function in R under these conditions. This question is similar to:
Mahalanobis distance on R for more than 2 groups
but there, only one variable was used. How can it be done with more than one variable?
Below is an example that I believe reflects the structure of my actual data.
Sp. X1 X2 X3
A 0.7 11 215
B 0.8 7 214
B 0.8 6.5 187
C 0.3 4 456
D 0.4 3 111
A 0.1 7 205
A 0.2 7 196
C 0.1 9.3 77
D 0.6 8 135
D 0.8 4 167
B 0.4 6 228
C 0.1 5 214
A 0.4 7 156
C 0.5 2 344
Sp. = species; X1, X2 and X3 are the observed variables.
In the real dataset, there are more than 50 species, and the number of observations per species varies from 100 to 1000 rows.
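
To make the example reproducible, here is the table above as a data frame, together with a rough sketch of the kind of result I am after: a species-by-species matrix of distances between group centroids. The sketch assumes a pooled within-group covariance matrix is an appropriate choice, which I am not sure of; the names df, vars, pooled and D2 are only placeholders I made up for illustration.

```r
# Example data, copied from the table above
df <- data.frame(
  Sp = c("A", "B", "B", "C", "D", "A", "A", "C", "D", "D", "B", "C", "A", "C"),
  X1 = c(0.7, 0.8, 0.8, 0.3, 0.4, 0.1, 0.2, 0.1, 0.6, 0.8, 0.4, 0.1, 0.4, 0.5),
  X2 = c(11, 7, 6.5, 4, 3, 7, 7, 9.3, 8, 4, 6, 5, 7, 2),
  X3 = c(215, 214, 187, 456, 111, 205, 196, 77, 135, 167, 228, 214, 156, 344)
)
vars <- c("X1", "X2", "X3")

# One data frame of observations per species
groups <- split(df[vars], df$Sp)

# Group centroids: rows = species, columns = variables
centroids <- t(sapply(groups, colMeans))

# Pooled within-group covariance (assumption: the groups share one covariance)
n <- sapply(groups, nrow)
pooled <- Reduce(`+`, Map(function(g, k) (k - 1) * cov(g), groups, n)) /
  (sum(n) - length(groups))

# Squared Mahalanobis distance between every pair of centroids;
# stats::mahalanobis() returns squared distances
D2 <- sapply(rownames(centroids), function(i) {
  mahalanobis(centroids, center = centroids[i, ], cov = pooled)
})

# Distances on the original (non-squared) scale
D <- sqrt(D2)
```

Here D2 has one row and one column per species, so with the real data it would be a matrix of more than 50 by 50 entries. Whether pooling the covariance is justified when group sizes range from 100 to 1000 is part of what I am unsure about.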