I have a multi-dimensional NumPy array of shape (200, 1500) and want to visualise summary statistics for it. Because the number of columns is too high, I can't plot all of them. My questions are:

  1. Which summary statistics should I visualise?
  2. Should I visualise all columns?
  3. I thought of randomly choosing N columns from the data and showing a distribution plot and a box plot for each. The example below is for the second column of the array X. However, I can't figure out how to show both plots for N columns in a single figure. Can someone help me with this?

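    import matplotlib.pyplot as plt
    import seaborn as sns

    # Distribution plot for the second column
    # (distplot is deprecated since seaborn 0.11; histplot(..., kde=True) replaces it)
    plt.figure(figsize=(20, 4))
    plt.subplot(121)
    ax = sns.distplot(X[:, 1])

    # Box plot for the second column
    plt.subplot(122)
    plt.xlim(X[:, 1].min() * 1.1, X[:, 1].max() * 1.1)
    sns.boxplot(x=X[:, 1])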

Some issues with your question: 1) Is this a multi-dimensional (as in more than two dimensions) or a two-dimensional array? 2) The relevant summary statistics really depend on what exactly the data is and what you're looking to get out of it. 3) What do you mean by "I can't figure out how to show both plots for N columns in a single figure"? Do you want 2*N plots in a single figure? That's possible, of course, but if N is large, the figure will have too many plots, be hard to read, and be very large in file size. – Shiva
You can try a dimensionality reduction technique such as PCA or t-SNE and then visualise the data in lower dimensions. – czr

1 Answer

As @Shiva mentioned, the appropriate summary statistics and visualisation approach depend on your problem. The problem formulation determines whether you need mean or median values, standard deviations, eigenvalues, frequency distributions, etc. If you provide more details, the community could offer more specific advice.
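
As a starting point that does not depend on the problem, one option is to compute a few statistics per column and then look at how those statistics vary across all 1500 columns. The sketch below assumes `X` is a 2-D array of shape (200, 1500); the random data and the helper name `column_summary` are placeholders for illustration, not something from the question.

```python
import numpy as np

def column_summary(X):
    """Return per-column summary statistics of a 2-D array as a dict of 1-D arrays."""
    q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
    return {
        "mean": X.mean(axis=0),
        "std": X.std(axis=0, ddof=1),  # sample standard deviation
        "min": X.min(axis=0),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": X.max(axis=0),
    }

# Placeholder data with the shape from the question; substitute your own X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1500))

stats = column_summary(X)
for name, values in stats.items():
    # Each entry holds one value per column, i.e. shape (1500,)
    print(f"{name:>6}: range across columns [{values.min():.3f}, {values.max():.3f}]")
```

Each of these arrays has one entry per column, so a single histogram of, say, `stats["mean"]` or `stats["std"]` summarises all columns at once without plotting them individually.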

Nevertheless, there are general-purpose analytical techniques that you could consider. See e.g. this blog post demonstrating various dimensionality reduction techniques, applied to the MNIST data set. Also check out this blog post discussing the application of an autoencoder for this purpose (scroll down). More specific to visualisation, you could browse through the Seaborn examples gallery to see if there are any examples you could apply to your own dataset.
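
For the third point in the question, a grid of subplots can hold both plots for N randomly chosen columns in a single figure: one row per column, with a histogram on the left and a box plot on the right. This is a minimal sketch using plain matplotlib rather than seaborn; the function name `plot_random_columns`, the default `n_cols=5`, and the random placeholder data are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_random_columns(X, n_cols=5, seed=0):
    """Plot a histogram and a box plot for n_cols randomly chosen columns of X."""
    rng = np.random.default_rng(seed)
    # Sample distinct column indices and sort them for a readable ordering
    cols = np.sort(rng.choice(X.shape[1], size=n_cols, replace=False))

    # One row per column: histogram on the left, box plot on the right.
    # squeeze=False keeps `axes` two-dimensional even when n_cols == 1.
    fig, axes = plt.subplots(n_cols, 2, figsize=(12, 2.5 * n_cols), squeeze=False)
    for row, col in enumerate(cols):
        data = X[:, col]
        axes[row, 0].hist(data, bins=30)
        axes[row, 0].set_ylabel(f"column {col}")
        axes[row, 1].boxplot(data)
        axes[row, 1].set_xticks([])  # hide the default "1" tick label
    axes[0, 0].set_title("Distribution")
    axes[0, 1].set_title("Box plot")
    fig.tight_layout()
    return fig, cols

# Placeholder data with the shape from the question; substitute your own X.
X = np.random.default_rng(1).normal(size=(200, 1500))
fig, cols = plot_random_columns(X, n_cols=5)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or store the figure. Keeping N small (roughly 5 to 10) avoids the readability and file-size problems mentioned in the comments.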