1
votes

I'm an utter beginner in R, fumbling my way through it for my degree :)

I need to summarize a very large data set by site. There are currently multiple rows per site and around 70 columns of variables, both numeric and categorical. I'm looking at seedling regeneration at each site.

I have 45 study sites and am trying to summarize all my variables per site. Currently each study site has between 5 and 30+ plant species, so I can have 30 or more rows for one site: each species at a site has its own row, with #trees, #saplings, #seedlings and other variables as columns.

I've tried this code:

i <- sapply(data.df, is.factor)  ### convert "factor" variables to "character" for dplyr analysis
data.df[i] <- lapply(data.df[i], as.character)

select(data.df,site,total_seedlings_m2,age,age_category,landuse_history, exotic_landcover_types,native_landcover_types,prcnt_light_transmittance,avg_canopy_height,prcnt_total_herb_cover,annual_rainfall_mm,annual_sunshine_hours,annual_temp_mean,annual_ground_frost_days,annual_rel_humidity,daily_air_rh_range,daily_air_temp_range,daily_soil_temp_range,total_trees_m2,total_basal_area_m2)
group_by_(site)

summarise_all(data.df)  

I want to summarise all columns (although I need a mixture of sum and mean for different variables).

I'm just trialling this method. When I try to group the data by site, which should give me 45 rows, I get an error:

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "character"

It says I'm using "group_by_" when I'm actually using "group_by".

Is there an easy fix? And is there a way to summarise all columns, either adding or averaging each column depending on the variable? (I would sum the seedling counts and take the mean of the micro-climate data.)

This is my first time asking for help online, so hopefully this makes a little bit of sense :)

1
"im actually using group_by" - not according to the code in your question. It contains group_by_. Please make this question reproducible by including some or all of data.df. - neilfws
Try checking whether your site column is a character or a factor. Maybe you need to turn it into a factor first. Also, maybe check out the ddply function in the plyr package. I find it a lot nicer. - morgan121
Also, you're not passing a data frame to group_by. You either need to pass the data frame in as the first argument, or use a pipe %>%. The specific error is because you're trying to group some character vector called "site", not data.df as expected. - divibisan
Also, dplyr functions don't edit the data in place; they return the modified version. You need to assign that somewhere to accomplish anything, i.e. dat <- select(dat, ... - divibisan
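
To make the last two comments concrete, here is a rough sketch of piping the data frame through select() and group_by() and assigning the result. The small data.df below is a made-up stand-in for the real data (only the column names come from the question), and site.df is just an illustrative name.

```r
library(dplyr)

# Made-up stand-in for data.df: one row per species per site
data.df <- data.frame(
  site = c("A", "A", "B"),
  total_seedlings_m2 = c(1.5, 2.0, 0.5),
  annual_rainfall_mm = c(900, 900, 1200),
  stringsAsFactors = FALSE
)

# %>% passes the data frame in as the first argument of each call,
# and the result is assigned because dplyr does not modify data.df in place
site.df <- data.df %>%
  select(site, total_seedlings_m2, annual_rainfall_mm) %>%
  group_by(site)

# Number of groups, i.e. distinct sites
n_groups(site.df)
```

With the real data, the group count should come out at 45 if every site is present.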

1 Answer

-1
votes

Try this, it should work:

i <- sapply(data.df, is.factor)  
data.df[i] <- lapply(data.df[i], as.character)

library(dplyr)
data.df %>% group_by(site) %>% summarise(count = n())
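
Note that the line above only counts rows per site. For the mix of sums and means the question asks about, something along these lines might work. It is only a sketch: it assumes dplyr 1.0.0 or later (for across()), the toy data.df is invented for illustration, and it assumes site-level values such as rainfall and age category are the same on every row of a site.

```r
library(dplyr)

# Invented example data: several species rows per site
data.df <- data.frame(
  site = c("A", "A", "B", "B", "B"),
  total_seedlings_m2 = c(1, 2, 3, 4, 5),
  total_trees_m2 = c(0.5, 0.5, 1, 1, 1),
  annual_rainfall_mm = c(900, 900, 1200, 1200, 1200),
  age_category = c("young", "young", "old", "old", "old"),
  stringsAsFactors = FALSE
)

site.summary <- data.df %>%
  group_by(site) %>%
  summarise(
    # add up the per-species counts
    across(c(total_seedlings_m2, total_trees_m2), ~ sum(.x, na.rm = TRUE)),
    # average the climate variables
    across(annual_rainfall_mm, ~ mean(.x, na.rm = TRUE)),
    # categorical columns: keep the first value (assumed constant within a site)
    across(age_category, first),
    # number of species rows per site
    n_species = n()
  )

site.summary
```

This gives one row per site. More columns can be added to each across() call, depending on whether they should be summed, averaged, or just carried over.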