Function for unaligned memory access on ARM

Question

I am working on a project where data is read from memory. Some of this data are integers, and there was a problem accessing them at unaligned addresses. My idea would be to use memcpy for that, i.e.

uint32_t readU32(const void* ptr)
{
    uint32_t n;
    memcpy(&n, ptr, sizeof(n));
    return n;
}

The solution from the project source I found is similar to this code:

uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp=(const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}

So my questions:

Isn't the second version undefined behaviour? The C standard says in 6.2.3.2 Pointers, at 7:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned 57) for the pointed-to type, the behavior is undefined.

As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not corrently aligned? And if it is, there is no point for the function as you could write *(uint32_t*) to fetch the memory. Additionally, I think I read somewhere that the compiler may expect an int* to be aligned correctly and any unaligned int* would mean undefined behaviour as well, so the generated code for this function might make some shortcuts because it may expect the function argument to be aligned properly.

The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer (no registers) inside a driver). Maybe that's why it does not use memcpy since it won't work on volatile data. But, in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such a synchronization, why would you need volatile?
Is there a canonical/accepted/better solution to this memory access problem? After some searching I come to the conclusion that you need a mutex and do not need volatile and can use memcpy.

P.S.:

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 1581.05
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

The union insures sufficient alignment for all types contained within it. memcpy is the usual method these days however it will not work with volatile accesses and sometimes you have to feed the optimizer the version it is most comfortable with. As for volatile it can in itself serve as a synchronization primitive, such as reading back the volatile results after writing a volatile I/O register, but other mechanisms such as compiler barriers are often used. In addition the volatile qualifier forces byte-wise and consecutive access, which may be important for I/O. — doynax
For the second version, it's the source pointer that isn't correctly aligned, rather than the resulting pointer, but the act of calling the function is UB. — Ian Abbott
Your own (first) version looks good, especially if the compiler does the memcpy inline. — Ian Abbott
In particular what you need to keep in mind here is that I/O registers are not normal memory buffers and tend to have side-effects and limitations for which you need to carefully force particular desired access patterns. For instance reading a status register may acknowledge an event, only 32-bit accesses may be supported, and careless read-modify-write cycles to flip a bit may induces races against hardware as well as software. So take care and read the datasheet carefully. — doynax
Another thing to consider for portability is byte-ordering. For example, if the 32-bit unsigned integer value is embedded in a message that has a prescribed byte order for multi-byte integers, then you should extract the value in an endian-portable fashion. — Ian Abbott

Andrew Henle Andrew Henle · Accepted Answer · 2017-08-04T12:12:03

This code

uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp=(const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}

passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.

The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard (emphasis mine):

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

The use of volatile with respect to the correct way to access memory/registers directly on your particular hardware would have no canonical/accepted/best solution. Any solution for that would be specific to your system and beyond the scope of standard C.

Function for unaligned memory access on ARM

2 Answers