
I have a big dataset with 100+ observations and 68 variables. Is there a way to generate boxplots and histograms for all of those variables at once, without writing the code for each boxplot/histogram one by one, and save them in a folder as PNGs or in a PDF?
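A minimal base-R sketch of the "one file per variable" approach, assuming the data sit in a data frame called `df` (the built-in `iris` stands in for it here) and that only numeric columns should be plotted. The folder name `plots` is hypothetical. Each PNG holds the histogram and the boxplot of one variable side by side:

```r
# Stand-in for the real 68-variable data frame (assumption: built-in iris)
df <- iris

# Keep only numeric columns; hist() and boxplot() need numeric input
num_vars <- names(df)[vapply(df, is.numeric, logical(1))]

# Hypothetical output folder, created if it does not exist yet
out_dir <- "plots"
dir.create(out_dir, showWarnings = FALSE)

for (v in num_vars) {
  # One PNG per variable, e.g. plots/Sepal.Length.png
  png(file.path(out_dir, paste0(v, ".png")), width = 900, height = 450)
  par(mfrow = c(1, 2))  # histogram and boxplot side by side
  hist(df[[v]], main = paste("Histogram of", v), xlab = v)
  boxplot(df[[v]], main = paste("Boxplot of", v), ylab = v)
  dev.off()             # close the file before moving to the next variable
}
```

Because the loop runs over column names rather than hard-coded variables, the same code works unchanged for 4 or 68 numeric columns.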

Ideally, I'd like to have more than one plot on the same page (I know you can do that using "par").
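A sketch of the single-PDF variant, again assuming a data frame `df` (here `iris`) and a hypothetical file name `all_variables.pdf`. With `par(mfrow = c(2, 2))` each page is a 2 x 2 grid, and base graphics starts a new page automatically once the grid is full:

```r
df <- iris  # stand-in for the real data (assumption)
num_vars <- names(df)[vapply(df, is.numeric, logical(1))]

# Hypothetical file name; each page holds a 2 x 2 grid of plots
pdf("all_variables.pdf", width = 8.5, height = 11)
par(mfrow = c(2, 2))

for (v in num_vars) {
  # Two panels per variable, so each page shows two variables;
  # a new page starts automatically when the grid is full
  hist(df[[v]], main = paste("Histogram of", v), xlab = v)
  boxplot(df[[v]], main = paste("Boxplot of", v), ylab = v)
}

invisible(dev.off())  # close the PDF so it is written to disk
```

With 68 variables this gives 136 panels, i.e. 34 pages; `par(mfrow = c(4, 2))` would fit four variables per page instead.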

I know this is probably a simple piece of code, but it would be really helpful for me. Thank you.

OK, I think the iris dataset could serve as an example:

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa

But instead of just "Sepal.Length", "Sepal.Width", "Petal.Length" and "Petal.Width" as observed variables, I have 68 of them. I want to check whether each of my 68 variables is normally distributed in the sample, and look at a boxplot of each one. I know how to create boxplots and histograms one variable at a time, but that would take a lot of time, and I imagine there must be a way to do it all at once, probably using a loop or %>%?
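Neither a pipe nor an explicit loop over hand-written variable names is strictly needed; `lapply()` over the numeric columns does it in one call. A sketch, assuming a data frame `df` (here `iris`). The Shapiro-Wilk test is one common choice for a formal normality check, swapped in here as an assumption; the Q-Q plots give the visual check, written to a hypothetical `qq_plots.pdf`:

```r
df <- iris  # stand-in for the real data (assumption)
num_vars <- names(df)[vapply(df, is.numeric, logical(1))]

# Run a Shapiro-Wilk test on every numeric column in one go.
# shapiro.test() needs between 3 and 5000 non-missing values.
tests <- lapply(df[num_vars], shapiro.test)

# Collect the statistic and p-value of each test in one table
normality <- data.frame(
  variable  = num_vars,
  W         = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  p_value   = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
print(normality)

# Normal Q-Q plots for all variables, four per page, in one PDF
pdf("qq_plots.pdf")
par(mfrow = c(2, 2))
for (v in num_vars) {
  qqnorm(df[[v]], main = paste("Normal Q-Q plot of", v))
  qqline(df[[v]])  # reference line through the first and third quartiles
}
invisible(dev.off())
```

With 68 tests, some small p-values are expected by chance alone, so the table is probably best read alongside the Q-Q plots rather than on its own.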

Please create a short reproducible example including some data and an example output if possible: stackoverflow.com/questions/5963269/… - emilliman5
It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. We don't need your actual data, just a representative example or a built-in data set would be fine. Exactly what is the desired output? A PDF? A single image? - MrFlick
@WannabeGandalf: this might help too stackoverflow.com/a/59791424/786542 - Tung

1 Answer


Take a look at the DataExplorer, skimr and inspectdf packages. Each of them summarizes all the variables of a data frame at once, which is the kind of output you want. These articles give an overview:
https://www.littlemissdata.com/blog/simple-eda
https://www.littlemissdata.com/blog/inspectdf
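A sketch of what calling these packages on `iris` might look like, assuming they are installed from CRAN. The guards only skip a package that is missing, so the snippet still runs without them. Function choice here is illustrative; the articles above cover more options:

```r
# Assumption: the packages are installed from CRAN, e.g.
# install.packages(c("DataExplorer", "skimr", "inspectdf"))
pkgs <- c("DataExplorer", "skimr", "inspectdf")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (available[["DataExplorer"]]) {
  DataExplorer::plot_histogram(iris)                # histograms of all continuous columns
  DataExplorer::plot_qq(iris)                       # normal Q-Q plots of all continuous columns
  DataExplorer::plot_boxplot(iris, by = "Species")  # boxplots of each column, grouped by Species
}

if (available[["skimr"]]) {
  print(skimr::skim(iris))                          # summary table, incl. small inline histograms
}

if (available[["inspectdf"]]) {
  print(inspectdf::show_plot(inspectdf::inspect_num(iris)))  # histograms of numeric columns
}
```

Wrapping the calls in `pdf("eda.pdf")` ... `dev.off()` (hypothetical file name) would send all of these plots to a single file.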