
I know there are plenty of packages and functions, such as "tabyl" (from janitor) and "stat.desc" (from pastecs), for getting descriptive statistics of variables, but I don't know how to apply them to only certain columns.

For example

library(pastecs)
stat.desc(iris)

will return the mean, SD, etc. for all the variables, but I want to apply it only to the numeric variables. I don't want to subset by hand, because my data set has over 20 columns and the numeric columns are interspersed among the others in no fixed order.

Something else I tried is:

library(janitor) 
lapply(iris,tabyl)

That is great, except that I don't want tabyl applied to every column (a column with 14,000 IDs makes for ugly output), and my ultimate aim is to put this into a neat-looking Excel file.

Any ideas for how I can apply these functions to 'numeric' columns and to 'character'/'factor' columns separately? Or to specific columns named in a vector?

Subset the data you lapply to. Something like nums = sapply(iris, is.numeric); lapply(iris[nums], tabyl). Or write yourself a wrapper function that looks at the column type and picks the right function to use. - Gregor Thomas

The DescTools package has a Desc function that produces different summary stats for different variable types. If you have Microsoft Word, it will pass tables and plots to an open Word document. - dcarlson
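
A rough sketch of the wrapper idea from the first comment, in base R only. The name describe_column is hypothetical, and mean/sd and table are just stand-ins for whatever you would really call (stat.desc, tabyl, etc.):

```r
# Hypothetical helper: pick a summary based on the column's type.
describe_column <- function(x) {
  if (is.numeric(x)) {
    # Numeric columns: a couple of basic statistics (stand-in for stat.desc)
    c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  } else if (is.factor(x) || is.character(x)) {
    # Categorical columns: a frequency table (stand-in for tabyl)
    table(x, useNA = "ifany")
  } else {
    # Anything else (dates, lists, ...) is skipped in this sketch
    NULL
  }
}

# Apply the helper to every column; each one gets the summary for its type
summaries <- lapply(iris, describe_column)

# Or restrict to numeric columns first, as in the comment above
numeric_cols <- vapply(iris, is.numeric, logical(1))
numeric_summary <- lapply(iris[numeric_cols], describe_column)
```

Swapping the two branches for stat.desc and tabyl should work the same way, though I have only tried the base R version here.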

1 Answer


The {dplyr} package has some neat ways to select columns by type (numeric, character, and so on).

For example:

library(pastecs)
library(dplyr)

stat.desc(select_if(iris, is.numeric))
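
If it helps, here is a sketch of the same idea using where() and all_of(), assuming dplyr 1.0.0 or later. The vector name wanted and the columns in it are made up for illustration:

```r
library(dplyr)

# Keep only the numeric columns, wherever they sit in the data frame
numeric_part <- select(iris, where(is.numeric))

# Keep only the factor columns
factor_part <- select(iris, where(is.factor))

# Hypothetical vector of column names picked by hand
wanted <- c("Sepal.Length", "Petal.Width")
chosen_part <- select(iris, all_of(wanted))
```

Any of these data frames can then be passed on, e.g. stat.desc(numeric_part) or lapply(factor_part, tabyl).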

Good luck!