I've been playing around with Gettext MO file hash tables in PHP. Although I'm unsure how important this optional table is, I'd like to ensure I'm implementing the full spec if possible when generating MO files.

I compiled a simple PO file with entries "","a","b","c" using Gettext msgfmt on my Mac and also on Linux. The hash table is 5 bytes long, but oddly contains largely null bytes, as follows: 01 00 00 00 00

Running the algorithm pulled from Gettext source code I produce the table 01 00 02 03 04 instead.

Here is my test code:
https://gist.github.com/timwhitlock/8255619 (including example PO file)
I don't write C, but muddled my way through the GNU Gettext source code to port the functions shown.

My own hash table compilation may well be wrong, but before debugging it I'd like to understand why the hash table in the msgfmt-generated MO file is mostly zeros.

I'm pretty sure I'm pulling the hash table from the MO file correctly. I get the size and position of the table from the sixth and seventh 32-bit words of the header (byte offsets 20 and 24), as outlined in the spec.

In my 'abc' example, no double hashing is used, so I don't understand how that table is correct, regardless of whether my table is correct.

What is the correct hash table for this 'abc' example?

1 Answer

I've solved this.

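I was stupidly using a single byte to hold each integer in the hash table. This was due to seeing things like hash_tab[idx] in C, which my PHP brain translated to $hash_tab{$idx}, which of course is wrong: that reads one byte of a string. Each entry is a 32-bit integer, so entry $idx is the four bytes returned by substr($hash_tab, $idx * 4, 4), which then need unpacking into an integer.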

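I was also failing to see that the hash table "size" in the header is the number of 32-bit entries in the table and not its byte length. A size of 5 therefore means 20 bytes, and the 01 00 00 00 00 I quoted was just the first entry plus one byte of the second.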
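
To make the two fixes above concrete, here is a minimal sketch of pulling the hash table out of an MO file. The function name read_mo_hash_table is made up for this example, and the code assumes a little-endian MO file and a 64-bit PHP build; a byte-swapped file would need the 'N' format code instead of 'V'.

```php
<?php
// Hypothetical helper: returns the MO hash table as a list of integers.
// Assumes a little-endian file (magic 0x950412de) and 64-bit PHP.
function read_mo_hash_table(string $mo): array
{
    if (strlen($mo) < 28) {
        throw new InvalidArgumentException('MO header is truncated');
    }
    // Seven 32-bit words: magic, revision, string count, two table
    // offsets, then hash table size (offset 20) and position (offset 24).
    $header = unpack(
        'Vmagic/Vrevision/Vcount/Vorig/Vtrans/Vhsize/Vhoffset',
        substr($mo, 0, 28)
    );
    if ($header['magic'] !== 0x950412de) {
        throw new InvalidArgumentException('Not a little-endian MO file');
    }
    if ($header['hsize'] === 0) {
        return []; // the hash table is optional
    }
    // hsize counts entries, so the byte length is hsize * 4.
    $bytes = substr($mo, $header['hoffset'], $header['hsize'] * 4);
    if (strlen($bytes) !== $header['hsize'] * 4) {
        throw new InvalidArgumentException('Hash table is truncated');
    }
    // unpack() keys start at 1, so reindex from 0.
    return array_values(unpack('V*', $bytes));
}
```

Called as read_mo_hash_table(file_get_contents($path)), this returns one integer per table slot rather than one per byte.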

My sample code works now. My generated table matches that pulled from the MO file.
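
For anyone landing here later, this is roughly how the table gets built, as a sketch based on my reading of the Gettext source rather than a definitive port. The function names hashpjw and build_hash_table are my own, the table size is passed in rather than computed (msgfmt derives it from the string count; 5 is what it chose for this example), and the msgids are assumed to be in the same sorted order as the MO file's string table. It also assumes 64-bit PHP.

```php
<?php
// PJW-style string hash, kept to 32 bits (assumes 64-bit PHP integers).
function hashpjw(string $str): int
{
    $hval = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $hval = (($hval << 4) + ord($str[$i])) & 0xFFFFFFFF;
        $g = $hval & 0xF0000000;
        if ($g !== 0) {
            // Fold the top four bits back in, then clear them.
            $hval ^= $g >> 24;
            $hval ^= $g;
        }
    }
    return $hval;
}

// Hypothetical builder: each slot holds 0 (empty) or string index + 1.
// $size should be a prime larger than count($msgids).
function build_hash_table(array $msgids, int $size): array
{
    if ($size <= 2 || $size <= count($msgids)) {
        throw new InvalidArgumentException('Table size is too small');
    }
    $table = array_fill(0, $size, 0);
    foreach (array_values($msgids) as $j => $msgid) {
        $hash = hashpjw($msgid);
        $idx = $hash % $size;
        if ($table[$idx] !== 0) {
            // Collision: step by a second hash until a free slot turns up.
            $incr = 1 + ($hash % ($size - 2));
            do {
                if ($idx >= $size - $incr) {
                    $idx -= $size - $incr;
                } else {
                    $idx += $incr;
                }
            } while ($table[$idx] !== 0);
        }
        $table[$idx] = $j + 1;
    }
    return $table;
}
```

For the 'abc' example, build_hash_table(['', 'a', 'b', 'c'], 5) gives [1, 0, 2, 3, 4] with no collisions. Packed with pack('V*', ...$table), that is 20 bytes beginning 01 00 00 00 00, which is exactly what msgfmt wrote.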