I have a question about pandas dataframes in Python. I have a large dataframe df that I split into two subsets, df1 and df2. Together, df1 and df2 do not make up all of df; they are just two mutually exclusive subsets of it. I want to plot this in ggplot with rpy2 and display the variables in the plot based on whether they come from df1 or df2. ggplot2 requires a melted dataframe, so I have to create a new dataframe that has a column saying whether each entry was from df1 or df2, so that this column can be passed to ggplot. I tried doing it like this:
# add labels to df1, df2
df1["label"] = len(df1.index) * ["df1"]
df2["label"] = len(df2.index) * ["df2"]
# combine the dfs together
melted_df = pandas.concat([df1, df2])
Now it can be plotted as in:
# plot parameters from melted_df and colour them by df1 or df2
ggplot2.ggplot(melted_df) + ggplot2.aes_string(..., colour="label")
My question is whether there's an easier, shorthand way of doing this. ggplot requires constantly melting and unmelting dataframes, and it seems cumbersome to always manually add the melted form to distinct subsets of df. Thanks.
You can replace df1["label"] = len(df1.index) * ["df1"] with df1["label"] = "df1". – beardc
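To make the scalar-assignment suggestion concrete, here is a minimal sketch of two shorter ways to build the labelled frame. The sample data and the name melted_alt are made up for illustration, since the real df is not shown in the question; only the pandas calls (assign, concat with keys, reset_index) are the point.

```python
import pandas as pd

# Hypothetical stand-in data; the real df is not shown in the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]})
df1 = df[df["x"] <= 2]   # two mutually exclusive subsets
df2 = df[df["x"] >= 4]   # that do not cover all of df

# Option 1: assign() returns a labelled copy and broadcasts the scalar
# string to every row, so df1 and df2 themselves are left unmodified.
melted_df = pd.concat([df1.assign(label="df1"), df2.assign(label="df2")])

# Option 2: let concat build the label from keys as an outer index level,
# then turn that level into an ordinary column.
melted_alt = (
    pd.concat([df1, df2], keys=["df1", "df2"], names=["label"])
    .reset_index(level="label")
)
```

Both options give one row per original row plus a label column that can be passed as the colour aesthetic. Option 1 is probably the closest to the original code, while option 2 may be handier when there are many subsets to combine.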