
How do I create a word cloud with Altair? Vega and Vega-Lite provide word cloud functionality, which I have used successfully in the past. If I understand correctly, it should therefore be possible to access it from Altair, and I would prefer to express the visualizations in Python rather than embedded JSON. All the Altair examples I have seen involve standard chart types like scatter plots and bar graphs; I have not seen any involving word clouds, networks, treemaps, etc.

More specifically, how would I express, or at least approximate, the following Vega visualization in Altair?

from vega import Vega  # assumed: the Jupyter renderer from the `vega` package

# `pages` is assumed to be an iterable of objects with .title, .defn and .count attributes
def wc(pages, width=2**10.5, height=2**9.5):
    return {
        "$schema": "https://vega.github.io/schema/vega/v3.json",
        "name": "wordcloud",
        "width": width,
        "height": height,
        "padding": 0,
        "data": [
            {
                "name": "table",
                "values": [{"text": pg.title, "definition": pg.defn, "count": pg.count} for pg in pages]
            }
        ],
        "scales": [
            {
                "name": "color",
                "type": "ordinal",
                "range": ["#d5a928", "#652c90", "#939597"]
            }
        ],
        "marks": [
            {
                "type": "text",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "text": {"field": "text"},
                        "align": {"value": "center"},
                        "baseline": {"value": "alphabetic"},
                        "fill": {"scale": "color", "field": "text"},
                        "tooltip": {"field": "definition"}  # show the definition on hover
                    },
                    "update": {
                        "fillOpacity": {"value": 1}
                    }
                },
                "transform": [
                    {
                        "type": "wordcloud",
                        "size": [width, height],
                        "text": {"field": "text"},
                        # "rotate": {"field": "datum.angle"},
                        "font": "Helvetica Neue, Arial",
                        "fontSize": {"field": "datum.count"},
                        # "fontWeight": {"field": "datum.weight"},
                        "fontSizeRange": [2**4, 2**6],
                        "padding": 2**4
                    }
                ]
            }
        ]
    }

Vega(wc(pages))
A nice non-Vega-based alternative for generating word clouds in Python is word_cloud. - James Draper
@JamesDraper Thanks. I use word_cloud for creating static images. I am trying to create interactive word clouds, where the words are clickable or have tooltips associated with them. AFAIK word_cloud does not support that, but Vega does. - Daniel Mahler
I figured as much, but I just thought I'd throw the link up there just in case. - James Draper

2 Answers


Altair's API is built on the Vega-Lite grammar, which includes only a subset of the plot types available in Vega. Word clouds cannot be created in Vega-Lite, so they cannot be created in Altair.


With mad respect to @jakevdp, you can construct a word cloud (or something word-cloud-like) in Altair by recognizing that the elements of a word cloud chart are:

  1. a dataset of words and their respective quantities
  2. text marks encoded with each word, and optionally size and/or color based on quantity
  3. "randomly" distributing the text marks in 2D space.
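For element 1, if you start from raw text rather than a ready-made table, one way to build the dataset is with `collections.Counter`. This is a sketch: the helper name `word_counts`, the tokenizing regex, and the `min_length`/`top` cut-offs are illustrative choices, not part of any library; the `word` and `count` column names are chosen to match the chart encodings in this answer.

```python
import re
from collections import Counter

import pandas as pd


def word_counts(text, min_length=3, top=50):
    """Hypothetical helper: a DataFrame of 'word'/'count', most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())  # crude lowercase tokenizer
    counts = Counter(w for w in words if len(w) >= min_length)  # drop short words
    return pd.DataFrame(counts.most_common(top), columns=["word", "count"])


words_and_counts = word_counts("the cat and the hat and the bat")
print(words_and_counts)
```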

One simple way to distribute the marks is to add 'x' and 'y' columns to the data, each element being a random sample (without replacement) from your chosen x and y range:

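import random
import pandas as pd

def shuffled_range(n):  # random permutation of 0..n-1, so every word gets a distinct x (and y)
    return random.sample(range(n), k=n)

# words_and_counts: a pandas DataFrame with 'word' and 'count' columns (hypothetical example values)
words_and_counts = pd.DataFrame({'word': ['vega', 'altair', 'python'], 'count': [10, 25, 40]})
n = len(words_and_counts)
x = shuffled_range(n)
y = shuffled_range(n)

data = words_and_counts.assign(x=x, y=y)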

This isn't perfect, as it doesn't explicitly prevent words from overlapping, but you can play with the range size and do a few runs of random number generation until you find a layout that's pleasing.
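To make "a few runs until you find a pleasing layout" repeatable, you could seed the random generator and spread the words over a grid larger than the number of words. A sketch with a hypothetical helper `random_grid_positions`. Note that with the ordinal (`'x:O'`, `'y:O'`) encodings used in this answer, unused grid positions take up no space, so a larger grid only spreads the words out if `x` and `y` are encoded as quantitative (`'x:Q'`, `'y:Q'`) instead.

```python
import random


def random_grid_positions(n_words, grid_size, seed=None):
    """Hypothetical helper: a distinct x and a distinct y in range(grid_size) per word.

    Reusing the same seed reproduces the same layout, so a layout you like can be kept.
    """
    if grid_size < n_words:
        raise ValueError("grid_size must be at least n_words")
    rng = random.Random(seed)  # private generator; the global random state is untouched
    xs = rng.sample(range(grid_size), k=n_words)
    ys = rng.sample(range(grid_size), k=n_words)
    return xs, ys


# Try a few seeds and keep whichever layout looks best.
xs, ys = random_grid_positions(n_words=5, grid_size=20, seed=3)
print(xs, ys)
```

The resulting lists can then be attached as before, e.g. `words_and_counts.assign(x=xs, y=ys)`, provided the frame has as many rows as `n_words`.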

Having thus prepared your data, you can specify the word cloud elements like so:

import altair as alt

base = alt.Chart(data).encode(
    x=alt.X('x:O', axis=None),  # axes hidden: only the positions matter
    y=alt.Y('y:O', axis=None)
)

word_cloud = base.mark_text(baseline='middle').encode(
    text='word:N',
    color=alt.Color('count:Q', scale=alt.Scale(scheme='goldred')),
    size=alt.Size('count:Q', legend=None)
).configure_view(strokeWidth=0)  # remove border

Here's the result applied to the same dataset used in the Vega docs:

[Figure: Altair word cloud]