6
votes

The question is basically "how do I generate a good grid for the game 'Boggle' with lots of words" where good is defined as having lots of words of 5 or more letters.

Boggle is a game where you roll dice with letters on them, they are placed in a 4x4 grid. Example:

H S A V
E N I S
K R G I
S O L A

Words can be made by connecting letters horizontally, vertically or diagonally. In the good example grid above you can make the words "VANISHERS", "VANISHER", "KNAVISH", "ALIGNERS", "SAVINGS", "SINKERS" and around 271 other words depending on the dictionary used, for example "AS", "I", "AIR", "SIN", "IS", etc...

As a bad example this grid

O V W C
T K Z O
Y N J H
D E I E

only has ~44 words only 2 of which are > 4 letters long. "TYNED" and "HINKY".

There's lots of similar questions but AFAICT not this exact question. This is obviously a reference to the game "Scramble with Friends".

The first solution, picking letters at random, has the problem that if you accidently pick all consonants there will be no words. Adding a few random vowels is not enough to guarantee a good set of words. You might only get 1 to 4 letter words whereas a good algorithm will choose a set of letters that has > 200 words with many words > 7 letters.

I'm open to any algorithm. Obviously I could write code to brute force solutions finding every possible grid and then sorting them by grids with the most words but that simple solution would take forever to run.

I can imagine various heuristics like choosing a long word (8-16 letters), putting those letters in the grid at random but in a way that can actually still make the word and then filling in the left over spaces. I suspect that's also not enough to guarantee a good set of words though I haven't tried it yet.

It's possible the solution requires pre-processing a dictionary to know common parts of words. For example all words that end in "ing" or "ers" or "ght" or "tion" or "land". Or somehow organizing them into a graph of shared letters. Maybe weighting certain sets of letters so "ing" or "ers" are inserted often.

Ideas?

3
I suggest finding a letter frequency chart and using a weighted randomization based on that. Perhaps generate a grid, check it, and if it's not "good enough" try a new one.Kevin

3 Answers

1
votes

Short of the brute-force search proposal there is probably no way to guarantee that you have a good grid. If you use the letter frequency as found on the Boggle dice, then you will get 'average' grids (exactly as if you roll the dice). You could improve this by adding extra heuristics or filters, for example:

 ensure that (almost) every consonant is 'in-reach-of' a vowel
 ensure 'Q' is 'in-reach-of' a 'U'
 ensure the ratio of vowels to consonants is within a set range
 ensure the number of rare consonants is not too large
 etc

Then you could

 set letters using weighted letter frequency
 change (swap/replace) letters not meeting your heuristics

It would still be possible for a bad grid to get through unless you checked via brute-force, but you may be able to reduce the number of bad grids substantially from those returned by a simple randomly generated grid.

Alternately, generate random grids and do the brute force work as required to pick good grids. But do this in the background (days or weeks before needed). Then store a bunch of good grids and choose one randomly as required when needed (and cross it off your list so you don't see it again).

0
votes

The way Boggle works is that there are six-sided die with certain letters on the side. Those die are randomly assigned to the 16 squares and then rolled. Common letters occur on more faces of the dice. Search around - you may be able to get the exact set of dice.

0
votes
  1. Calculate statistical letter frequency and letter-pair frequencies from the dictionary.

  2. Start from randomly choosing one of the four central squares

  3. Randomly choose a letter for that square weighted by single letter frequency.

  4. Recursively:

    4.1. Randomly choose one of all the empty connected squares.

    4.2. Randomly choose a letter for that square weighted by the combination, (average), of the dual letter frequencies of any connected filled square and the single letter frequencies of any connected empty square.

Et voila!

P.S. You might also want to experiment with adding a global letter derating based on its current count of appearances in the grid to 4.2.