0
votes

I have a type of string that knows its own size, called pstring (a nod to Pascal strings). Instead of a struct {size, p_bytes}, the size is simply stored in memory immediately before the byte array. (The reason is to avoid, both conceptually and practically, a double indirection, and also a double heap allocation for the struct and the byte array.) I could do that with a helper function that allocates sizeof(size_t) extra bytes and returns a pointer to the actual address of the byte array, plus a macro that reads the size stored a few bytes before it.
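For reference, the helper-plus-macro variant described above might look roughly like the sketch below. The names prefixed_new, prefixed_free and PREFIXED_SIZE are made up for illustration, and the trailing NUL byte is an extra assumption so the result can also be passed to ordinary C string functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, for illustration only. */

/* Read the size stored immediately before the bytes. */
#define PREFIXED_SIZE(p) (((const size_t *)(const void *)(p))[-1])

/* Allocate one block laid out as [size_t size][size bytes][NUL]
   and return a pointer to the first byte of the string. */
char *prefixed_new(const char *src, size_t size) {
    assert(src != NULL || size == 0);
    if (size > SIZE_MAX - sizeof(size_t) - 1) return NULL;  /* overflow guard */
    size_t *block = malloc(sizeof(size_t) + size + 1);
    if (block == NULL) return NULL;
    *block = size;
    char *bytes = (char *)(block + 1);
    if (size > 0) memcpy(bytes, src, size);
    bytes[size] = '\0';
    return bytes;
}

/* Free through the real start of the block, not the bytes pointer. */
void prefixed_free(char *bytes) {
    if (bytes != NULL) free((size_t *)(void *)bytes - 1);
}
```

The drawback of this variant is that the result is typed as a plain char *, so nothing stops a caller from passing it to free() directly or applying PREFIXED_SIZE to an ordinary string.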

Instead I wanted to do that at a slightly higher level with a "flexible struct" using a flexible array member:

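#include <stddef.h>  /* size_t */

typedef struct {
    size_t       size;     /* number of bytes in the string */
    unsigned int hash;     /* cached hash, used by the intern pool */
    char         bytes[];  /* flexible array member: bytes stored inline */
} PString;
// constructor:
PString * pstring (char * bytes, size_t size) {...}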

(Those pstrings also have a hash because they are interned in a pool.) But now I face a somewhat philosophical question about the right way to expose pstrings. Pstrings (just like ordinary strings) are only ever manipulated via pointers. So should the interface hide the pointer inside the type, or not?

  • A string is most commonly an implicit pointer, with a type such as const char * string;, so it seems natural to build the pointer into the pstring type as well, since everyone knows it is a pointer type.
  • However, with structs, common practice is (I guess) not to hide the pointer inside the type, but to declare variables explicitly with a * (and sometimes to prefix their names with e.g. 'p_').
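To make the two options concrete, here is a small sketch of both declaration styles. The typedef name PStringRef and the two function names are invented for the comparison, and unsigned int stands in for uint. One practical difference shows up with const: with the hidden-pointer typedef, const qualifies the pointer rather than the struct it points to.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t       size;
    unsigned int hash;
    char         bytes[];
} PString;

/* Option 1 (hypothetical name): the pointer is hidden inside the typedef. */
typedef PString *PStringRef;

/* Option 2: the pointer stays visible at every use site. */
size_t pstring_size_explicit(const PString *ps) {
    assert(ps != NULL);
    return ps->size;  /* ps points to const data, so writes are rejected */
}

size_t pstring_size_hidden(const PStringRef ps) {
    assert(ps != NULL);
    /* Here const applies to the pointer itself, not to the PString,
       so the compiler would still accept ps->size = 0 in this function. */
    return ps->size;
}
```

Both functions are called the same way and both use ->, so the typedef saves one character per declaration at the cost of making read-only parameters harder to express.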

So, what should I do here, so that the semantics are clear to users of pstrings and they feel comfortable using them?

In any case, a variable is a pointer and field access goes through ->. As of now, as you can see, I have chosen the second option: people declare a pstring as PString * ps;. But I'm ready to change my mind because of the following arguments:

  • A pstring variable simply cannot be a non-pointer (even less so than an ordinary string, since a struct copy loses the bytes --> hello segfault).
  • The 'p' in "pstring" can serve as a reminder of "pointed".
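The first argument can be illustrated with a sketch. The constructor below is a stand-in named pstring_sketch, not the real pstring constructor: it skips interning, leaves the hash at 0, uses unsigned int for uint, and adds a trailing NUL as an extra assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t       size;
    unsigned int hash;
    char         bytes[];  /* not counted in sizeof(PString) */
} PString;

/* Hypothetical stand-in for the constructor: one allocation holds
   the header and the bytes. No interning, hash left at 0. */
PString *pstring_sketch(const char *bytes, size_t size) {
    assert(bytes != NULL || size == 0);
    PString *ps = malloc(sizeof *ps + size + 1);
    if (ps == NULL) return NULL;
    ps->size = size;
    ps->hash = 0;
    if (size > 0) memcpy(ps->bytes, bytes, size);
    ps->bytes[size] = '\0';
    return ps;
}

/* Number of bytes that a by-value copy such as
       PString copy = *ps;
   transfers: only the header. The copy has no storage for bytes,
   so reading copy.bytes[0] is undefined behavior. */
size_t pstring_by_value_copy_size(void) {
    return sizeof(PString);
}
```

Because sizeof(PString) covers only size and hash (plus padding), a struct copy compiles without complaint yet carries none of the string data, which is why the object is only usable through the pointer returned by the allocator.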

What do you think?

3 Answers

1
votes

In my opinion, it is better for you to go with the second approach.

C character strings are implicitly pointers: they behave like byte arrays whose contents you can access directly, starting from the first byte. Your type, however, is a constructed (user-defined) type, and to make that distinction I would prefer an explicit pointer (i.e. the second approach).

By the way, will you be providing functions for operations such as comparison, character search, etc. on these strings? I suggest following the C++ std::string model and providing a c_str function that returns a pointer to "bytes".
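A rough sketch of what such accessor functions could look like in C is given below. All three function names are invented here, and pstring_c_str is only safe to hand to C string functions if the constructor stores a NUL after the bytes, which the question does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t       size;
    unsigned int hash;
    char         bytes[];
} PString;

/* Hypothetical accessor in the spirit of std::string::c_str.
   Assumes the bytes are followed by a NUL terminator. */
const char *pstring_c_str(const PString *ps) {
    assert(ps != NULL);
    return ps->bytes;
}

/* Lexicographic comparison with memcmp-style results:
   negative, zero, or positive. */
int pstring_compare(const PString *a, const PString *b) {
    assert(a != NULL && b != NULL);
    size_t n = a->size < b->size ? a->size : b->size;
    int r = (n > 0) ? memcmp(a->bytes, b->bytes, n) : 0;
    if (r != 0) return r;
    return (a->size > b->size) - (a->size < b->size);
}

/* Pointer to the first occurrence of c, or NULL if absent.
   Uses the stored size, so embedded NUL bytes are handled. */
const char *pstring_find_byte(const PString *ps, char c) {
    assert(ps != NULL);
    if (ps->size == 0) return NULL;
    return memchr(ps->bytes, (unsigned char)c, ps->size);
}
```

Every function takes a const PString *, which keeps the pointer visible in the interface and fits the second approach.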

1
votes

I'm not a fan of flexible array members, but since you've chosen that approach, I'd go with keeping its pointer nature visible, just on the grounds of it being more honest/clear to the person who uses it.

0
votes

I would advise sticking with your current option (making the pointer explicit). The user of your library has to know that it's a pointer anyway (e.g. to use -> on it, and to know that assignment creates only a shallow copy), so trying to hide it only confuses matters.

The standard library also provides an example of this approach - the FILE type must always be used as FILE *.