11
votes

In Python, when I render a Unicode character (e.g. a Chinese character) with a selected font, the font sometimes does not cover that character and cannot render it. In those cases, if I call the "print" function, the output usually just looks like a square box, regardless of what the underlying Unicode character should look like.

Of course, once I print the Unicode character, I can look at the output and determine that the chosen font is missing that character. But is there a way to tell automatically, before I print, whether a character is included in the font, without relying on my own eyes?

I'd also clarify that I know of fonts that are more complete than others. My question is NOT which font I can use so that if I call "print" I'd generally have a reasonable output. Please also ignore the question of how I print the character or if I actually want to print a character. My question is simply, for any given font, how do I tell if a unicode character is missing from the font, without using any manual process relying on human judgement of the output.

The OS probably makes a difference. Which one are you using? - Mark Ransom
How do you know what font is even being used when calling print? Text on stdout could be going to a terminal, a file, some other application... In short, this question is not answerable without more constraints. - gz.
I think you are both missing my point. Regardless of whether or how I print the character, I just want to know if a character is included in a font. - MichM
You ask about rendering, but reject rendering, so isn't your question actually just "How to test font data for undefined characters in Python?". Which font data? - handle
@gz. "Which font is used by console" or "determine if print is going to console" would be two additional questions that could be (or perhaps have been) asked. I think this question as worded stands well on its own, if only the detail of which OS were included. If you're leaving an answer, perhaps those other considerations could be addressed to make the answer more complete. - Mark Ransom

1 Answer

14
votes

See https://unix.stackexchange.com/questions/247108/how-to-find-out-which-unicode-codepoints-are-defined-in-a-ttf-file

In short, one can install the fonttools package, give it the path to any .ttf font file of interest, and check whether the code point of the character in question (its ord() value) appears in one of the Unicode character-to-glyph mapping ('cmap') tables in the font file.

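from fontTools.ttLib import TTFont

# fontpath is the path to the font file in question (e.g. a .ttf file)
font = TTFont(fontpath)


def char_in_font(unicode_char, font):
    """Return True if any Unicode cmap subtable of the font maps the character."""
    for cmap in font['cmap'].tables:
        # Skip non-Unicode subtables (e.g. legacy platform-specific encodings)
        if cmap.isUnicode() and ord(unicode_char) in cmap.cmap:
            return True
    return False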

Then just call the char_in_font function to check if the unicode character is included in the font.
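
As a rough usage sketch, the same check can be applied to a whole string at once. This variant goes through fontTools' getBestCmap() method, which returns a dict keyed by code point for the Unicode subtable that fontTools prefers (or None if the font has none). The helper names load_best_cmap and missing_chars and the font path in the comments are made up for illustration; they are not part of fontTools.

```python
def load_best_cmap(fontpath):
    """Load a font file and return its preferred Unicode cmap as a dict.

    Requires the third-party fonttools package (pip install fonttools).
    The import is local so that missing_chars can be used without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(fontpath)
    try:
        # getBestCmap() returns {code point: glyph name}, or None if the
        # font has no Unicode cmap subtable; copy it before closing the file.
        return dict(font.getBestCmap() or {})
    finally:
        font.close()


def missing_chars(text, cmap):
    """Return the characters of text whose code points are not keys of cmap."""
    return [ch for ch in text if ord(ch) not in cmap]


# Hypothetical usage (the path is a placeholder, not a real file):
# cmap = load_best_cmap("/path/to/font.ttf")
# print(missing_chars("A中文", cmap))
```

An empty list means every character has an entry in that cmap. Note that this only reports whether the font maps the code point to a glyph, which is the same criterion char_in_font uses.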