3
votes

I'm trying to send a message over TCP from a C client (Obj-C actually, but object orientation is out of the equation here) to a Python server. Right now I send an unsigned short with the message size first, and then the message, which is a C struct. I wanted to append a dynamic string at the end of the packet, so I decided to use the struct size to split the packet in two, but then the problems began.
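
For reference, the receiving side of a length-prefixed message like this could look roughly as follows. This is only a sketch: the helper names `recv_exact` and `recv_message` are hypothetical, and the `!H` prefix (big-endian unsigned short) is an assumption that only holds if the client writes the length with `htons()`.

```python
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read a 2-byte length prefix, then that many payload bytes."""
    # Assumption: the sender writes the length in network byte order.
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)
```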

The thing is, either I'm doing something wrong, or there's a bug in the size and padding calculation of Python's struct library.

Python's struct module seems to resolve padding correctly. For example, for this format:

struct.Struct("H I").size == 8

which matches the value sizeof returns for this struct:

#include <stdio.h>

typedef struct {
        unsigned short a;
        unsigned int b;
} test;

int main() {
        printf("%zu\n", sizeof(test));
        return 0;
}

$ gcc test.c 
$ ./a.out
8

But I don't always get matching results for other struct definitions. For example, in this case:

struct.Struct("H 5s").size == 7

typedef struct {
    unsigned short a;
    char b[5];
} test;
sizeof(test) == 8

I've read somewhere that the compiler may pad a struct to ensure proper memory access when structs are used in an array. I'm not sure if this is the case (it seems to be), but if it is, I can't understand why this struct doesn't get padded to 8 bytes (assuming 4-byte packing):

struct.Struct("H 4s").size == 6

typedef struct {
    unsigned short a;
    char b[4];
} test;
sizeof(test) == 6

So, to clarify, my question is: how can I get the precise size of a given C struct in Python, given that the struct module does not apply the final padding?


What I've tried:

Add final padding size manually:

real_struct_size = self._struct.size + self._struct.size % 4

This, of course, didn't work, because a single-member struct does not add padding, and as you can see in the last case it doesn't work for small structs either (unsigned short + char[4]). (Maybe I'm over-simplifying the problem here. Maybe it's not about small structs but related to another factor that I can't identify.)

Then I opened the source of Python's struct library to see how I could find out how many parameters are expected, so that I could skip the final padding when there is only one. But there's no way to access the s_len property of PyStructObject (see Python-2.7.5/Modules/_struct.c:48), which is where the number of packed parameters is stored.
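
For what it's worth, `size % 4` is the remainder, not the distance to the next multiple, and the multiple itself depends on the strictest member rather than on a fixed 4. A minimal sketch of the rounding, where `round_up` is a hypothetical helper and the alignment of 2 for `unsigned short` is an assumption that holds on common ABIs:

```python
def round_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return size + (-size) % alignment

# Assumption: the struct's alignment is that of its strictest member,
# which is 2 for `unsigned short` on common ABIs.
print(round_up(7, 2))  # unsigned short + char[5] -> 8
print(round_up(6, 2))  # unsigned short + char[4] -> 6
```

With an alignment of 2, both results agree with the sizeof values shown earlier.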

So, as a workaround I put an offset value at the beginning of the data packet to know where the extra/dynamic string starts.
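
The offset workaround could be sketched like this. The layout is hypothetical: `HEADER`, `build_packet` and `split_packet` are illustrative names, the offset is assumed to be a big-endian unsigned short, and it is assumed to be counted from the end of the header.

```python
import struct

# Hypothetical header: offset of the trailing string, counted from the
# end of this header (assumed big-endian unsigned short).
HEADER = struct.Struct("!H")

def build_packet(fixed_part, text):
    """Prefix the packet with the length of the fixed (struct) part."""
    return HEADER.pack(len(fixed_part)) + fixed_part + text

def split_packet(packet):
    """Return (fixed_part, trailing_string) using the leading offset."""
    (offset,) = HEADER.unpack_from(packet, 0)
    body = packet[HEADER.size:]
    return body[:offset], body[offset:]
```

Because the sender states where the string starts, the receiver never needs to know how the C compiler padded the struct.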

But I think there's a bug here (mine or from Python's struct library). Either way, if it's me I really need to know what I'm doing wrong, or if it's Python's library I want to report the issue. If someone could help me to get to the bottom of this, I'll be very thankful.

So, thanks in advance! Sorry for the long post :)

2
Have you tried using the struct.calcsize() function? – martineau
@martineau It gives the same results as Struct.size (on my machine). In fact the docstring for Struct.size seems to be a copy of the docstring for calcsize. I'm not a C expert, but I believe the standard doesn't give any guarantee on the size of structs, so Python cannot compute it reliably in any case. Different compilers will add different padding (I believe mostly depending on the architecture). – Bakuriu
The documentation also says "To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero", so maybe you could do something like that. Also make sure you prefix the format string with the native prefix @ (native size and alignment), otherwise no padding will ever be added. – martineau
mmm, this is not automatic but it is a better workaround, I'll give it a try, thanks! Regarding the prefix, the documentation says "If the first character is not one of these, '@' is assumed." I'll try and update here :) – Gonzalo Larralde
@martineau sadly adding 0s doesn't affect the size calculation. Thanks anyway! – Gonzalo Larralde
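
The zero-repeat-count suggestion from the comments can be tried directly. This is a small sketch, and the sizes in the comments assume a platform where `unsigned short` is 2 bytes with 2-byte alignment. Note that the zero-count code has to be a type that actually has an alignment requirement: a zero-count `s` changes nothing.

```python
import struct

print(struct.calcsize("@H5s"))    # 7: no trailing padding
print(struct.calcsize("@H5s0H"))  # 8: end aligned for unsigned short
print(struct.calcsize("@H4s0H"))  # 6: already a multiple of 2
print(struct.calcsize("@H5s0s"))  # 7: a char array needs no alignment
```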

2 Answers

1
votes

Short answer: You can't. The struct module only relates to C types by reusing some symbols for basic types for the programmer's convenience. All your padding-related fixes will break the second you move your code to a different platform or build the C side with another compiler.

The only way to get the size of a struct (a c-struct) is to refer to it from C and compile that code using a compiler. You may use a one-liner like

return PyInt_FromLong(sizeof(mystruct));

Long answer: Implement some wrapper code that #includes the appropriate types, writes them to memory and passes them around (as opaque objects). You may implement the buffer protocol so you can pass them directly to socket.send().
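
As a lighter-weight alternative to a hand-written C extension (a different technique from the one described above), the standard library's ctypes module lays out `Structure` subclasses using the platform's native C rules, including trailing padding. A minimal sketch, where the class name `Test` is hypothetical and the field list is assumed to mirror the C definition exactly:

```python
import ctypes

class Test(ctypes.Structure):
    # Mirrors: typedef struct { unsigned short a; char b[5]; } test;
    _fields_ = [
        ("a", ctypes.c_ushort),
        ("b", ctypes.c_char * 5),
    ]

msg = Test(a=1, b=b"hi")
print(ctypes.sizeof(Test))  # native size, trailing padding included
payload = bytes(msg)        # raw bytes (Python 3), usable with socket.send()
```

This still depends on both sides agreeing on the ABI, so it only avoids the manual padding arithmetic, not the portability concern.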

1
votes

To align the end of a structure to its alignment requirement, we just need to find the largest basic type used in the format and end the format with a zero-count field of that type. Something like this:

import struct

def c_sizeof(s):
    # Type codes that carry a native alignment requirement
    align_codes = "cbB?hHiIlLqQfd"
    # Keep only the type codes present in s; fall back to "c" if none are
    chars = [x for x in s if x in align_codes] or ["c"]
    # The largest native type is taken as the strictest alignment
    align_char = max(chars, key=struct.calcsize)
    # Native prefix pads between fields; the zero-count field pads the end
    return struct.calcsize("@{0}0{1}".format(s, align_char))

And running some tests

print("%d %d %d" % (c_sizeof("cci"), c_sizeof("cic"), c_sizeof("H5s")))

produces

8 12 8