3
votes

I'm trying to write a hash set in C, and I've found a hash function that hashes according to the bits in the data. I have the following structure:

struct triple
{
    int a;
    int b;
    int c;
};

The question is: how do I get the bit representation of an object of type struct triple? Let's say I want to XOR its bits with an 8-bit integer. How would I do that?

1
Whilst it may seem desirable to have a generic hashing algorithm that can be used for just about everything (in this case, a triple, as you call it), there's something quite wrong with this idea. Consider that some implementations might employ padding bits in the form of parity checks. If a value byte is followed by a parity byte formed simply by flipping all of the bits, then what use will this hashing algorithm serve? To clarify: in this scenario, all values will yield the same hash, which consists solely of 1 bits. - autistic
@undefinedbehaviour I've never seen a C compiler produce code to maintain checksums. But nearly every one uses pad bytes for memory alignment. These bytes don't have deterministic values, so, as you said, a hash that includes them is completely unreliable. - Gene
@Gene Ahh, an even better example! - autistic
I've yet to see a compiler that will introduce any padding into a structure with multiple elements of a single type, as in this question. Technically, a compiler is allowed to add padding; in practice, no compiler will. However, in structures with diverse element types, padding will be added where necessary, and then the indeterminacy of the padding is a problem. - Jonathan Leffler

1 Answer

5
votes

Iterate over all the bytes of the struct and XOR each one individually, e.g.,

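#include <stddef.h>  /* size_t */

/* XOR every byte of the object at data with xor_byte, in place. */
void bytexor(unsigned char xor_byte, void *data, size_t size) {
    unsigned char *p = data;
    while (size--) {
        *p++ ^= xor_byte;
    }
}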

Usage would be:

struct triple my_struct;
// ...
bytexor(0xFF, &my_struct, sizeof my_struct);
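
To see what the call above does to the object representation, it can help to dump the bytes before and after. The following is a minimal sketch, not part of the original answer: `bytes_to_hex` is a hypothetical helper introduced here for illustration, and it assumes 8-bit bytes so that each byte fits in two hex digits. `bytexor` is repeated so the block compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

struct triple {
    int a;
    int b;
    int c;
};

/* XOR every byte of the object at data with xor_byte, in place. */
static void bytexor(unsigned char xor_byte, void *data, size_t size) {
    unsigned char *p = data;
    while (size--) {
        *p++ ^= xor_byte;
    }
}

/* Hypothetical helper: write the bytes of an object as lowercase hex.
 * out must have room for 2 * size + 1 characters.
 * Assumes 8-bit bytes (CHAR_BIT == 8). */
static void bytes_to_hex(const void *data, size_t size, char *out) {
    const unsigned char *p = data;
    for (size_t i = 0; i < size; i++) {
        sprintf(out + 2 * i, "%02x", (unsigned)p[i]);
    }
    out[2 * size] = '\0';
}
```

Calling `bytes_to_hex(&my_struct, sizeof my_struct, buf)` before and after `bytexor` shows every byte flipped when the XOR byte is 0xFF. Applying `bytexor` a second time with the same byte restores the original bytes, since XOR is its own inverse.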

(Note: This answers the question of how to XOR the struct with a byte. As for implementing a general hash function based on this, it may not be a particularly good idea since a struct may have padding, i.e., extra bytes with potentially non-deterministic values unrelated to the values of the actual payload fields.)
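
One way to sidestep the padding concern is to feed the hash only the bytes of the payload fields, one field at a time, rather than the whole struct. The sketch below is an illustration under assumptions: the hash function from the question is not specified, so 32-bit FNV-1a is used here purely as a stand-in, and the names `fnv1a_update` and `triple_hash` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct triple {
    int a;
    int b;
    int c;
};

/* Mix size bytes at data into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a_update(uint32_t hash, const void *data, size_t size) {
    const unsigned char *p = data;
    while (size--) {
        hash ^= *p++;          /* fold in the next byte */
        hash *= 16777619u;     /* FNV-1a 32-bit prime */
    }
    return hash;
}

/* Hash only the fields, so any padding bytes never reach the hash. */
static uint32_t triple_hash(const struct triple *t) {
    uint32_t hash = 2166136261u;  /* FNV-1a 32-bit offset basis */
    hash = fnv1a_update(hash, &t->a, sizeof t->a);
    hash = fnv1a_update(hash, &t->b, sizeof t->b);
    hash = fnv1a_update(hash, &t->c, sizeof t->c);
    return hash;
}
```

With this approach, two triples whose fields compare equal hash to the same value regardless of what any padding bytes happen to contain. The result still depends on the platform's int size and byte order, so it is suitable for an in-memory hash set but not for values that are stored or exchanged between machines.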