Plotting new documents to scatter plot

Question

I am trying to gain some insight into my data. I convert my documents into a vector space model (VSM), reduce it with sklearn's PCA, and plot the result on a matplotlib graph. This involves:

Casting the documents to a numeric matrix using a pipeline
```
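# .toarray() gives a plain ndarray; recent scikit-learn releases reject the np.matrix returned by .todense()
test = pipeline.fit_transform(docs).toarray()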
```
Fitting the PCA model to it
```
pca = PCA().fit(test)
```
Then projecting it using transform
```
data = pca.transform(test)
```

Finally I am plotting the results using Matplotlib

```
plt.scatter(data[:,0], data[:,1], c = categories)
```

My question is this: how do I take new sentences and determine where they would lie in relation to the other plotted documents, using an X to mark their relative positions?

Thanks

WhoIsJack · Accepted Answer · 2017-08-11T21:58:20

Also cast the new documents to a numeric array
```
# .toarray() rather than .todense(): recent scikit-learn releases reject np.matrix input
new = pipeline.transform(new_docs).toarray()
```
Note that this uses the pipeline with the previously fitted parameters, hence it's pipeline.transform, not pipeline.fit_transform.

Transform the new data using the previously fitted pca.
```
new_data = pca.transform(new)
```
This will transform the new data to the same PC-space as the original data.
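
To illustrate why transform rather than fit_transform matters here, the following is a minimal sketch. It assumes the pipeline is just a TfidfVectorizer (the question does not say what the pipeline contains), and the example sentences are made-up placeholders. Refitting on the new documents learns a different vocabulary, so the columns would no longer line up with what the PCA was fitted on.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Placeholder documents standing in for the question's docs / new_docs.
docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply today",
    "markets rallied after the news",
]
new_docs = ["a cat chased the dog", "the markets fell"]

# Assumption: the pipeline ends in a vectorizer that returns a sparse matrix.
pipeline = make_pipeline(TfidfVectorizer())

# Fit the vectorizer and the PCA on the original documents only.
test = pipeline.fit_transform(docs).toarray()
pca = PCA(n_components=2).fit(test)

# Reuse both fitted objects for the new documents.
new = pipeline.transform(new_docs).toarray()
new_data = pca.transform(new)

# Same feature columns as the training matrix, so the projection is valid.
print(new.shape[1] == test.shape[1])  # True
print(new_data.shape)                 # (2, 2)

# Refitting on the new documents alone learns a different vocabulary,
# so its columns do not correspond to the ones the PCA was fitted on.
refit = TfidfVectorizer().fit_transform(new_docs).toarray()
print(refit.shape[1] == test.shape[1])  # False
```

Words in the new documents that were not seen during fitting are simply ignored by the fitted vectorizer, so the new points are placed using only the shared vocabulary.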

Add the new data to the plot using a second scatter.

```
plt.scatter(data[:,0], data[:,1], c = categories)
plt.scatter(new_data[:,0], new_data[:,1], marker = 'x')
plt.show()
```
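
If the X markers are hard to tell apart from the original points, a few optional styling tweaks can help. The sketch below is only an illustration: data, categories and new_data are random placeholders standing in for the PCA output, and new_labels is a hypothetical list of names for the new documents.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random placeholders standing in for the output of pca.transform.
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 2))
categories = rng.integers(0, 3, size=30)
new_data = rng.normal(size=(3, 2))
new_labels = ["new 0", "new 1", "new 2"]  # hypothetical names

fig, ax = plt.subplots()

# Original documents, coloured by category.
ax.scatter(data[:, 0], data[:, 1], c=categories, label="original documents")

# New documents: larger red X markers drawn on top of the other points.
ax.scatter(new_data[:, 0], new_data[:, 1], marker="x", c="red", s=80,
           zorder=3, label="new documents")

# Label each new point with a small offset so the text does not cover the X.
for (x, y), text in zip(new_data, new_labels):
    ax.annotate(text, (x, y), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()

# Saved to a file here; use plt.show() instead in an interactive session.
fig.savefig("new_docs_scatter.png")
```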
