1
votes

In my dataset, I have a Price column for house prices and 5 dummy columns for different locations in the city. I want to show the data points on a scatter plot in different colors, one color per location.

For instance, on a scatter plot including all the prices of the houses, I want to have:

  • Red for all price points where dummy1, which indicates that the house is in Area1, is equal to 1.
  • Blue for all price points where dummy2, which indicates that the house is in Area2, is equal to 1.

and so on until the last column. How can I create that plot? I can create the scatter plot without colors using plt.scatter(), but I don't know how to add the color coding.

1
Can we see at least the header of your data, i.e. the first few lines of the columns? - Khalil Al Hooti

1 Answer

0
votes

Have a look at the docs for matplotlib.pyplot.scatter, which describe a parameter c that can be, among other options:

A sequence of color specifications of length n.

Here is an example that creates 100 data points, with x running from 0 to 99 and y a random integer between 0 and 10. If a point's y value is over 5, the point is drawn in blue, otherwise in red, as specified in the list c.

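import matplotlib.pyplot as plt
import random

# x runs from 0 to 99; y holds one random integer in [0, 10] per x value
x = list(range(100))
y = [random.randint(0, 10) for _ in range(len(x))]

# One color per point: blue ("b") if y is over 5, red ("r") otherwise
c = ["b" if value > 5 else "r" for value in y]

plt.scatter(x, y, c=c)
plt.show()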

The output is a scatter plot in which the points with a y value over 5 are blue and all other points are red.
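For the dummy-column case in the question, the same idea could be sketched roughly as follows. The column names dummy1 to dummy5, the color choices, the helper names colors_from_dummies and plot_prices, and the sample data are all assumptions made for illustration, since the question does not show the actual data. Plotting against the row index is also an assumption.

```python
# Hypothetical color per dummy column; the names dummy1..dummy5 are assumed.
AREA_COLORS = {
    "dummy1": "red",
    "dummy2": "blue",
    "dummy3": "green",
    "dummy4": "orange",
    "dummy5": "purple",
}


def colors_from_dummies(dummies, default="gray"):
    """Return one color per row, taken from the dummy column that equals 1.

    `dummies` maps a column name to its list of 0/1 values.
    Rows where no dummy is set keep the `default` color.
    """
    n_rows = len(next(iter(dummies.values())))
    colors = [default] * n_rows
    for column, flags in dummies.items():
        for i, flag in enumerate(flags):
            if flag == 1:
                colors[i] = AREA_COLORS[column]
    return colors


def plot_prices(prices, dummies):
    """Scatter the prices against their row index, colored by area."""
    # Imported here so that the helper above has no third-party dependency.
    import matplotlib.pyplot as plt

    plt.scatter(range(len(prices)), prices, c=colors_from_dummies(dummies))
    plt.xlabel("Row index")
    plt.ylabel("Price")
    plt.show()


# Made-up sample data, for illustration only.
prices = [250_000, 310_000, 180_000, 420_000]
dummies = {
    "dummy1": [1, 0, 0, 0],
    "dummy2": [0, 1, 0, 1],
    "dummy3": [0, 0, 0, 0],
}
print(colors_from_dummies(dummies))
```

Calling plot_prices(prices, dummies) would then draw the plot. If the data lives in a pandas DataFrame named df, something like {col: df[col].tolist() for col in AREA_COLORS} should build the dummies mapping, provided the columns really have those names.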