I have this data.frame
DATA
df <- data.frame(id=c(rep("site1", 3), rep("site2", 8), rep("site3", 9), rep("site4", 15)),
major_rock = c("greywacke", "mudstone", "gravel", "greywacke", "gravel", "mudstone", "gravel", "mudstone", "mudstone",
"conglomerate", "gravel", "mudstone", "greywacke","conglomerate", "gravel", "gravel", "greywacke","gravel",
"greywacke", "gravel", "mudstone", "greywacke", "gravel", "gravel", "gravel", "conglomerate", "greywacke",
"coquina", "gravel", "gravel", "greywacke", "gravel", "mudstone","mudstone", "gravel"),
minor_rock = c("sandstone mudstone basalt chert limestone", "limestone", "sand silt clay", "sandstone mudstone basalt chert limestone",
"sand silt clay", "sandstone conglomerate coquina tephra", NA, "limestone", "mudstone sandstone coquina limestone",
"sandstone mudstone limestone", "sand loess silt", "sandstone conglomerate coquina tephra", "sandstone mudstone basalt chert limestone",
"sandstone mudstone limestone", "sand loess silt", "loess silt sand", "sandstone mudstone conglomerate chert limestone basalt",
"sand silt clay", "sandstone mudstone conglomerate", "loess sand silt", "sandstone conglomerate coquina tephra", "sandstone mudstone basalt chert limestone",
"sand loess silt", "sand silt clay", "loess silt sand", "sandstone mudstone limestone", "sandstone mudstone conglomerate chert limestone basalt",
"limestone", "loess sand silt", NA, "sandstone mudstone conglomerate", "sandstone siltstone mudstone limestone silt lignite", "limestone",
"mudstone sandstone coquina limestone", "mudstone tephra loess"),
area_ha = c(1066.68, 7.59, 3.41, 4434.76, 393.16, 361.69, 306.75, 124.93, 95.84, 9.3, 8.45, 4565.89, 2600.44, 2198.52,
2131.71, 2050.09, 1640.47, 657.09, 296.73, 178.12, 10403.53, 8389.2, 8304.08, 3853.36, 2476.36, 2451.25,
1640.47, 1023.02, 532.94, 385.68, 296.73, 132.45, 124.93, 109.12, 4.87))
In it, there are 4 sites, 2 of them are independent (site1
and site3
; they don't include any site upstream) and 2 are dependent (site2
and site4
; they include upstream site(s))
I want to create a new data.frame, let's call it df_indep
. In which, I want all sites to be independent which means subtracting any upstream site(s) from the dependent sites as below
site1 and site3 will remain the same as they are independent
site2 (independent) = site2 - site1
site4 (independent) = site4 -(site2+site3)
Below is df
only for major_rock
and minor_rock
combinations that have area_percent
larger than 15% (before subtracting upstream sites; site2
and site3
)
library(dplyr)
head(df %>% group_by(id) %>%
mutate(area_percent = area_ha/sum(area_ha)*100) %>%
filter(area_percent>5),15)
# id major_rock minor_rock area_ha area_percent
# <fctr> <fctr> <fctr> <dbl> <dbl>
#1 site1 greywacke sandstone mudstone basalt chert limestone 1066.68 98.979289
#2 site2 greywacke sandstone mudstone basalt chert limestone 4434.76 77.329604
#3 site2 gravel sand silt clay 393.16 6.855592
#4 site2 mudstone sandstone conglomerate coquina tephra 361.69 6.306845
#5 site2 gravel NA 306.75 5.348848
#6 site3 mudstone sandstone conglomerate coquina tephra 4565.89 27.978879
#7 site3 greywacke sandstone mudstone basalt chert limestone 2600.44 15.934986
#8 site3 conglomerate sandstone mudstone limestone 2198.52 13.472099
#9 site3 gravel sand loess silt 2131.71 13.062701
#10 site3 gravel loess silt sand 2050.09 12.562550
#11 site3 greywacke sandstone mudstone conglomerate chert limestone basalt 1640.47 10.052479
#12 site4 mudstone sandstone conglomerate coquina tephra 10403.53 25.925869
#13 site4 greywacke sandstone mudstone basalt chert limestone 8389.20 20.906106
#14 site4 gravel sand loess silt 8304.08 20.693984
#15 site4 gravel sand silt clay 3853.36 9.602674
and here is the
FINAL RESULT
I want after subtracting upstream sites
# id major_rock minor_rock area_ha area_percent
#1 site1 greywacke sandstone mudstone basalt chert limestone 1066.68 98.979289
#2 site2 greywacke sandstone mudstone basalt chert limestone 3368.08 72.319849
#3 site2 gravel sand silt clay 389.75 8.368762
#4 site2 mudstone sandstone conglomerate coquina tephra 361.69 7.766254
#5 site2 gravel NA 306.75 6.586576
#6 site3 mudstone sandstone conglomerate coquina tephra 4565.89 27.978879
#7 site3 greywacke sandstone mudstone basalt chert limestone 2600.44 15.934986
#8 site3 conglomerate sandstone mudstone limestone 2198.52 13.472099
#9 site3 gravel sand loess silt 2131.71 13.062701
#10 site3 gravel loess silt sand 2050.09 12.562550
#11 site3 greywacke sandstone mudstone conglomerate chert limestone basalt 1640.47 10.052479
#12 site4 mudstone sandstone conglomerate coquina tephra 5475.95 30.297305
#13 site4 greywacke sandstone mudstone basalt chert limestone 1354.00 7.491403
#14 site4 gravel sand loess silt 6163.92 34.103701
#15 site4 gravel sand silt clay 2803.11 15.509031
I will appreciate any suggestions on how to do this in R.
UPDATE
This is a map showing all 4 sites
The figure below shows site4
(accumulative as in df
) and site1
(independent) as I want in the final output after subtracting site2
and site3
The figure below shows the same for site2 (accumulative) and indepenendent
Regarding @rbierman's question of how sites dependcies are encoded, please check below.
# id dependent dep_site1 dep_site2 dep_site3
#1 site1 no no no no
#2 site1 no no no no
#3 site1 no no no no
#4 site2 yes yes no no
#5 site2 yes yes no no
#6 site2 yes yes no no
#7 site2 yes yes no no
#8 site2 yes yes no no
#9 site2 yes yes no no
#10 site2 yes yes no no
#11 site2 yes yes no no
#12 site3 no no no no
#13 site3 no no no no
#14 site3 no no no no
#15 site3 no no no no
#16 site3 no no no no
#17 site3 no no no no
#18 site3 no no no no
#19 site3 no no no no
#20 site3 no no no no
#21 site4 yes yes yes yes
#22 site4 yes yes yes yes
#23 site4 yes yes yes yes
#24 site4 yes yes yes yes
#25 site4 yes yes yes yes
#26 site4 yes yes yes yes
#27 site4 yes yes yes yes
#28 site4 yes yes yes yes
#29 site4 yes yes yes yes
#30 site4 yes yes yes yes
#31 site4 yes yes yes yes
#32 site4 yes yes yes yes
#33 site4 yes yes yes yes
#34 site4 yes yes yes yes
#35 site4 yes yes yes yes