Detecting endianness programmatically in a C++ program

217

votes

Is there a programmatic way to detect whether or not you are on a big-endian or little-endian architecture? I need to be able to write code that will execute on an Intel or PPC system and use exactly the same code (i.e. no conditional compilation).

c++ algorithm endianness

For the sake of completeness, here is a link to someone else's question about trying to gauge endianness (at compile time): stackoverflow.com/questions/280162/… - Faisal Vali

Why not determine endianness at compile-time? It can't possibly change at runtime. - ephemient

AFAIK, there's no reliable and universal way to do that. gcc.gnu.org/ml/gcc-help/2007-07/msg00342.html - user48956

176

votes

I don't like the method based on type punning through a pointer cast; compilers will often warn about it. That's exactly what unions are for!

bool is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1; 
}

The principle is equivalent to the type cast suggested by others, but this is clearer and, according to C99, is guaranteed to be correct. GCC prefers this to the direct pointer cast.

This is also much better than fixing the endianness at compile time: for an OS that supports multiple architectures (fat binaries on Mac OS X, for example), this will work for both ppc and i386, whereas it is very easy to mess things up otherwise.
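In C++ (as opposed to C99), reading a union member other than the one last written is not guaranteed to work, so a variant that is well-defined in both languages copies the bytes out with memcpy instead. The following is a minimal sketch, not the answer's own code; the function name is_big_endian_memcpy is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: inspects the object representation of a known
// 32-bit constant by copying it into a byte array.
bool is_big_endian_memcpy()
{
    const std::uint32_t value = 0x01020304u;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // plain byte copy, no aliasing issues
    return bytes[0] == 0x01;                   // true if the most significant byte is stored first
}
```

Like the union version, an optimizing compiler is free to fold this into a constant.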

84

votes

You can do it by setting an int and masking off bits, but probably the easiest way is just to use the built in network byte conversion ops (since network byte order is always big endian).

if ( htonl(47) == 47 ) {
  // Big endian
} else {
  // Little endian.
}

Bit fiddling could be faster, but this way is simple, straightforward and pretty impossible to mess up.
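For completeness, a self-contained sketch of the htonl check follows. It assumes a POSIX system, where htonl is declared in <arpa/inet.h>; the wrapper name host_is_big_endian is hypothetical.

```cpp
#include <arpa/inet.h>  // htonl on POSIX systems (Windows declares it in <winsock2.h>)
#include <cstdint>

// Hypothetical wrapper. htonl converts from host order to network
// (big-endian) order, so it leaves the value unchanged only on a
// big-endian host.
bool host_is_big_endian()
{
    const std::uint32_t probe = 47;
    return htonl(probe) == probe;
}
```

The probe value must not be a byte-wise palindrome (0 or 0x01000001, for example), otherwise the comparison succeeds on every host.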

77

votes

You can use std::endian if you have access to a C++20 compiler such as GCC 8+ or Clang 7+.

Note: std::endian began in <type_traits> but was moved to <bit> at the 2019 Cologne meeting. GCC 8 and Clang 7, 8 and 9 have it in <type_traits>, while GCC 9+ and Clang 10+ have it in <bit>.

#include <bit>

if constexpr (std::endian::native == std::endian::big)
{
    // Big endian system
}
else if constexpr (std::endian::native == std::endian::little)
{
    // Little endian system
}
else
{
    // Something else
}
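Because the header that provides std::endian depends on the compiler version, one way to stay portable is to test the __cpp_lib_endian feature macro and fall back to compiler macros otherwise. This is only a sketch: the names kHostIsLittleEndian and to_big_endian32 are hypothetical, the fallback assumes the GCC/Clang predefined __BYTE_ORDER__ macros, and mixed-endian targets are treated as big-endian for simplicity.

```cpp
#include <cstdint>

#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // defines __cpp_lib_endian when std::endian is available
#  endif
#endif

#if defined(__cpp_lib_endian)
#  include <bit>
constexpr bool kHostIsLittleEndian = (std::endian::native == std::endian::little);
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
// Fallback: GCC/Clang predefined macros (an assumption about the toolchain).
constexpr bool kHostIsLittleEndian = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
#else
#  error "This sketch has no compile-time endianness source for this compiler"
#endif

// Converts a host-order value to big-endian order at compile time.
constexpr std::uint32_t to_big_endian32(std::uint32_t v)
{
    return kHostIsLittleEndian
        ? ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
          ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24)
        : v;
}
```

Since everything is constexpr, the result can be used in a static_assert or as a template argument.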

66

votes

Please see this article:

Here is some code to determine the type of your machine:
int num = 1;
if(*(char *)&num == 1)
{
    printf("\nLittle-Endian\n");
}
else
{
    printf("Big-Endian\n");
}

41

votes

This is normally done at compile time (especially for performance reasons) by using the header files available from the compiler, or by creating your own. On Linux you have the header file "/usr/include/endian.h".

17

votes

I'm surprised no one has mentioned the macros which the preprocessor defines by default. While these vary depending on your platform, they are much cleaner than having to write your own endian check.

For example, if we look at the built-in macros which GCC defines (on an x86-64 machine):

:| gcc -dM -E -x c - |grep -i endian
#define __LITTLE_ENDIAN__ 1

On a PPC machine I get:

:| gcc -dM -E -x c - |grep -i endian
#define __BIG_ENDIAN__ 1
#define _BIG_ENDIAN 1

(The :| gcc -dM -E -x c - magic prints out all built-in macros).
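A sketch of how such predefined macros might be consumed is shown below. Which macros exist is toolchain-specific: __BYTE_ORDER__ and the __ORDER_*__ constants are defined by GCC and Clang, and the other two are the ones listed in the output above. The macro TARGET_BYTE_ORDER_NAME and the function target_byte_order are hypothetical names.

```cpp
// Classify the target byte order using only predefined macros.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#  define TARGET_BYTE_ORDER_NAME "little"
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define TARGET_BYTE_ORDER_NAME "big"
#elif defined(__LITTLE_ENDIAN__)
#  define TARGET_BYTE_ORDER_NAME "little"
#elif defined(__BIG_ENDIAN__)
#  define TARGET_BYTE_ORDER_NAME "big"
#else
#  define TARGET_BYTE_ORDER_NAME "unknown"  // no macro this sketch knows about
#endif

// Returns "little", "big", or "unknown"; decided entirely by the preprocessor.
const char* target_byte_order()
{
    return TARGET_BYTE_ORDER_NAME;
}
```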

15

votes

Declare an int variable:

int variable = 0xFF;

Now use char* pointers to various parts of it and check what is in those parts.

char* startPart = reinterpret_cast<char*>( &variable );
char* endPart = reinterpret_cast<char*>( &variable ) + sizeof( int ) - 1;

Depending on which one points to 0xFF byte now you can detect endianness. This requires sizeof( int ) > sizeof( char ), but it's definitely true for the discussed platforms.
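One detail worth spelling out: plain char may be signed, in which case a 0xFF byte read through a char* compares as -1 rather than 255. The sketch below therefore reads through unsigned char pointers. The function name classify_by_end_bytes and its 'L'/'B'/'?' return codes are hypothetical, and it assumes sizeof(int) > 1 as stated above.

```cpp
// Returns 'L' for little-endian, 'B' for big-endian, '?' otherwise.
char classify_by_end_bytes()
{
    int variable = 0xFF;
    // unsigned char avoids the sign problem when comparing against 0xFF.
    const unsigned char* startPart = reinterpret_cast<const unsigned char*>(&variable);
    const unsigned char* endPart = startPart + sizeof(int) - 1;

    if (*startPart == 0xFF) return 'L';  // low-order byte at the lowest address
    if (*endPart == 0xFF)   return 'B';  // low-order byte at the highest address
    return '?';                          // neither end: some mixed byte order
}
```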

15

votes

Ehm... It surprises me that no one has realized that the compiler will simply optimize the test out and put a fixed result in as the return value. This renders all the code examples above effectively useless. The only thing that would be returned is the endianness at compile time! And yes, I tested all of the above examples. Here's an example with MSVC 9.0 (Visual Studio 2008).

Pure C code

int32 DNA_GetEndianness(void)
{
    union 
    {
        uint8  c[4];
        uint32 i;
    } u;

    u.i = 0x01020304;

    if (0x04 == u.c[0])
        return DNA_ENDIAN_LITTLE;
    else if (0x01 == u.c[0])
        return DNA_ENDIAN_BIG;
    else
        return DNA_ENDIAN_UNKNOWN;
}

Disassembly

PUBLIC  _DNA_GetEndianness
; Function compile flags: /Ogtpy
; File c:\development\dna\source\libraries\dna\endian.c
;   COMDAT _DNA_GetEndianness
_TEXT   SEGMENT
_DNA_GetEndianness PROC                 ; COMDAT

; 11   :     union 
; 12   :     {
; 13   :         uint8  c[4];
; 14   :         uint32 i;
; 15   :     } u;
; 16   : 
; 17   :     u.i = 1;
; 18   : 
; 19   :     if (1 == u.c[0])
; 20   :         return DNA_ENDIAN_LITTLE;

    mov eax, 1

; 21   :     else if (1 == u.c[3])
; 22   :         return DNA_ENDIAN_BIG;
; 23   :     else
; 24   :        return DNA_ENDIAN_UNKNOWN;
; 25   : }

    ret
_DNA_GetEndianness ENDP
END

Perhaps it is possible to turn off ANY compile-time optimization for just this function, but I don't know. Otherwise it's maybe possible to hardcode it in assembly, although that's not portable. And even then even that might get optimized out. It makes me think I need some really crappy assembler, implement the same code for all existing CPUs/instruction sets, and well.... never mind.

Also, someone here said that endianness does not change during run-time. WRONG. There are bi-endian machines out there. Their endianness can vary during execution. ALSO, there's not only Little Endian and Big Endian, but also other endiannesses (what a word).
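To illustrate the point about other byte orders, a check can look at all four bytes instead of just the first one. The sketch below distinguishes little-endian, big-endian and the PDP-11 style middle-endian layout; the enum ByteOrder and the function detect_byte_order are hypothetical names, and the code assumes 8-bit bytes.

```cpp
#include <cstdint>
#include <cstring>

enum class ByteOrder { little, big, pdp, unknown };

ByteOrder detect_byte_order()
{
    const std::uint32_t value = 0x01020304u;
    unsigned char b[4];
    static_assert(sizeof value == sizeof b, "this sketch expects 8-bit bytes");
    std::memcpy(b, &value, sizeof b);

    // Expected in-memory layouts of 0x01020304 for each byte order.
    const unsigned char little[4] = {0x04, 0x03, 0x02, 0x01};
    const unsigned char big[4]    = {0x01, 0x02, 0x03, 0x04};
    const unsigned char pdp[4]    = {0x02, 0x01, 0x04, 0x03};

    if (std::memcmp(b, little, 4) == 0) return ByteOrder::little;
    if (std::memcmp(b, big, 4) == 0)    return ByteOrder::big;
    if (std::memcmp(b, pdp, 4) == 0)    return ByteOrder::pdp;
    return ByteOrder::unknown;
}
```

As with the other examples, the compiler may still evaluate this at compile time.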

I hate and love coding at the same time...

8

votes

For further details, you may want to check out this codeproject article Basic concepts on Endianness:

How to dynamically test for the Endian type at run time?

As explained in Computer Animation FAQ, you can use the following function to see if your code is running on a Little- or Big-Endian system:
#define BIG_ENDIAN      0
#define LITTLE_ENDIAN   1

int TestByteOrder()
{
   short int word = 0x0001;
   char *byte = (char *) &word;
   return(byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}

This code assigns the value 0001h to a 16-bit integer. A char pointer is then assigned to point at the first (lowest-address) byte of the integer value. If the first byte of the integer is 01h, then the system is Little-Endian (the least-significant byte, 01h, is at the lowest address). If it is 00h then the system is Big-Endian.

7

votes

Do not use a union!

C++ does not permit type punning via unions!
Reading from a union field that was not the last field written to is undefined behaviour!
Many compilers support doing so as an extension, but the language makes no guarantee.

See this answer for more details:

https://stackoverflow.com/a/11996970

There are only two valid answers that are guaranteed to be portable.

The first answer, if you have access to a system that supports C++20,
is to use std::endian from the <bit> header (early drafts placed it in <type_traits>).

(At the time of writing, C++20 has not yet been released, but unless something happens to affect std::endian's inclusion, this shall be the preferred way to test the endianness at compile time from C++20 onwards.)

C++20 Onwards

constexpr bool is_little_endian = (std::endian::native == std::endian::little);

Prior to C++20, the only valid answer is to store an integer and then inspect its first byte through type punning.
Unlike the use of unions, this is expressly allowed by C++'s type system.

It's also important to remember that for optimum portability static_cast should be used,
because reinterpret_cast is implementation defined.

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: ... a char or unsigned char type.

C++11 Onwards

enum class endianness
{
    little = 0,
    big = 1,
};

inline endianness get_system_endianness()
{
    const int value { 0x01 };
    const void * address = static_cast<const void *>(&value);
    const unsigned char * least_significant_address = static_cast<const unsigned char *>(address);
    return (*least_significant_address == 0x01) ? endianness::little : endianness::big;
}

C++11 Onwards (without enum)

inline bool is_system_little_endian()
{
    const int value { 0x01 };
    const void * address = static_cast<const void *>(&value);
    const unsigned char * least_significant_address = static_cast<const unsigned char *>(address);
    return (*least_significant_address == 0x01);
}

C++98/C++03

inline bool is_system_little_endian()
{
    const int value = 0x01;
    const void * address = static_cast<const void *>(&value);
    const unsigned char * least_significant_address = static_cast<const unsigned char *>(address);
    return (*least_significant_address == 0x01);
}

6

votes

Unless you're using a framework that has been ported to PPC and Intel processors, you will have to do conditional compiles, since PPC and Intel platforms have completely different hardware architectures, pipelines, busses, etc. This renders the assembly code completely different between the two.

As for finding endianness, do the following:

short temp = 0x1234;
char* tempChar = (char*)&temp;

*tempChar will be either 0x12 (big-endian) or 0x34 (little-endian), from which you will know the endianness.

6

votes

As stated above, use union tricks.

There are a few problems with the ones advised above, though, most notably that unaligned memory access is notoriously slow on most architectures, and some compilers won't recognize such constant predicates at all unless they are word aligned.

Because a mere endian test is boring, here is a (template) function which will flip the input/output of an arbitrary integer according to your spec, regardless of host architecture.

#include <stdint.h>

#define BIG_ENDIAN 1
#define LITTLE_ENDIAN 0

template <typename T>
T endian(T w, uint32_t endian)
{
    // this gets optimized out into if (endian == host_endian) return w;
    union { uint64_t quad; uint32_t islittle; } t;
    t.quad = 1;
    if (t.islittle ^ endian) return w;
    T r = 0;

    // decent compilers will unroll this (gcc)
    // or even convert straight into single bswap (clang)
    for (int i = 0; i < sizeof(r); i++) {
        r <<= 8;
        r |= w & 0xff;
        w >>= 8;
    }
    return r;
};

Usage:

To convert from given endian to host, use:

host = endian(source, endian_of_source)

To convert from host endian to given endian, use:

output = endian(hostsource, endian_you_want_to_output)

The resulting code is as fast as hand-written assembly on clang; on gcc it's a tad slower (unrolled &,<<,>>,| for every byte) but still decent.
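If only the byte reversal itself is needed, it can also be written without a union or shifts by reversing the object's bytes in a buffer. This is an alternative sketch, not the function above; the name reverse_bytes is hypothetical, and it is restricted to integral types.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reverses the byte order of any integral value.
template <typename T>
T reverse_bytes(T value)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // copy out the object representation
    std::reverse(bytes, bytes + sizeof(T));  // flip the byte order
    std::memcpy(&value, bytes, sizeof(T));   // copy the reversed bytes back
    return value;
}
```

For example, reverse_bytes<std::uint32_t>(0x01020304u) yields 0x04030201 on any host, because the reversal does not depend on the host's byte order.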

6

votes

The C++ way has been to use boost, where preprocessor checks and casts are compartmentalized away inside very thoroughly-tested libraries.

The Predef Library (boost/predef.h) recognizes four different kinds of endianness.

The Endian Library was planned to be submitted to the C++ standard, and supports a wide variety of operations on endian-sensitive data.

As stated in answers above, endianness detection will be part of C++20.

5

votes

bool isBigEndian()
{
    static const uint16_t m_endianCheck(0x00ff);
    return ( *((uint8_t*)&m_endianCheck) == 0x0); 
}

4

votes

I would do something like this:

bool isBigEndian() {
    static unsigned long x(1);
    static bool result(reinterpret_cast<unsigned char*>(&x)[0] == 0);
    return result;
}

Along these lines, you would get a time efficient function that only does the calculation once.

4

votes

untested, but in my mind, this should work? cause it'll be 0x01 on little endian, and 0x00 on big endian?

bool runtimeIsLittleEndian(void)
{
 volatile uint16_t i=1;
 return  ((uint8_t*)&i)[0]==0x01;//0x01=little, 0x00=big
}

3

votes

union {
    int i;
    char c[sizeof(int)];
} x;
x.i = 1;
if(x.c[0] == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");

This is another solution. Similar to Andrew Hare's solution.

3

votes

Declare:

My initial post incorrectly described this as "compile time". It is not; that is even impossible in the current C++ standard. constexpr does NOT mean the function is always computed at compile time. Thanks to Richard Hodges for the correction.

compile time, non-macro, C++11 constexpr solution:

union {
  uint16_t s;
  unsigned char c[2];
} constexpr static  d {1};

constexpr bool is_little_endian() {
  return d.c[0] == 1;
}

2

votes

You can also do this via the preprocessor, using something like the Boost endian header file.

2

votes

Unless the endian header is GCC-only, it provides macros you can use.

#include "endian.h"
...
if (__BYTE_ORDER == __LITTLE_ENDIAN) { ... }
else if (__BYTE_ORDER == __BIG_ENDIAN) { ... }
else { throw std::runtime_error("Sorry, this version does not support PDP Endian!"); }
...

1

votes

int i=1;
char *c=(char*)&i;
bool littleendian = (*c == 1);  // dereference: the pointer itself is never null

1

votes

The way C compilers (at least every one I know of) work, the endianness has to be decided at compile time. Even for bi-endian processors (like ARM and MIPS) you have to choose endianness at compile time. Furthermore, the endianness is defined in all common file formats for executables (such as ELF). Although it is possible to craft a binary blob of bi-endian code (for some ARM server exploit maybe?), it probably has to be done in assembly.

1

votes

If you don't want conditional compilation you can just write endian independent code. Here is an example (taken from Rob Pike):

Reading an integer stored in little-endian on disk, in an endian independent manner:

i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);

The same code, trying to take into account the machine endianness:

i = *((int*)data);
#ifdef BIG_ENDIAN
/* swap the bytes */
i = ((i&0xFF)<<24) | (((i>>8)&0xFF)<<16) | (((i>>16)&0xFF)<<8) | (((i>>24)&0xFF)<<0);
#endif
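A slightly more defensive sketch of the endian-independent approach is shown below. It reads through unsigned char and widens each byte to std::uint32_t before shifting, so a byte of 0x80 or above is never shifted into the sign bit of an int. The helper names load_le32, load_be32 and store_le32 are hypothetical.

```cpp
#include <cstdint>

// Reads a 32-bit little-endian value from a byte buffer on any host.
std::uint32_t load_le32(const unsigned char* data)
{
    return  static_cast<std::uint32_t>(data[0])
         | (static_cast<std::uint32_t>(data[1]) << 8)
         | (static_cast<std::uint32_t>(data[2]) << 16)
         | (static_cast<std::uint32_t>(data[3]) << 24);
}

// Reads a 32-bit big-endian value from a byte buffer on any host.
std::uint32_t load_be32(const unsigned char* data)
{
    return (static_cast<std::uint32_t>(data[0]) << 24)
         | (static_cast<std::uint32_t>(data[1]) << 16)
         | (static_cast<std::uint32_t>(data[2]) << 8)
         |  static_cast<std::uint32_t>(data[3]);
}

// Writes a 32-bit value to a byte buffer in little-endian order.
void store_le32(unsigned char* out, std::uint32_t v)
{
    out[0] = static_cast<unsigned char>(v & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((v >> 24) & 0xFFu);
}
```

None of these functions needs to know the host's byte order, which is the point of the technique.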

0

votes

How about this?

#include <cstdio>

int main()
{
    unsigned int n = 1;
    char *p = 0;

    p = (char*)&n;
    if (*p == 1)
        std::printf("Little Endian\n");
    else 
        if (*(p + sizeof(int) - 1) == 1)
            std::printf("Big Endian\n");
        else
            std::printf("What the crap?\n");
    return 0;
}

0

votes

Here's another C version. It defines a macro called wicked_cast() for inline type punning via C99 union literals and the non-standard __typeof__ operator.

#include <limits.h>

#if UCHAR_MAX == UINT_MAX
#error endianness irrelevant as sizeof(int) == 1
#endif

#define wicked_cast(TYPE, VALUE) \
    (((union { __typeof__(VALUE) src; TYPE dest; }){ .src = VALUE }).dest)

_Bool is_little_endian(void)
{
    return wicked_cast(unsigned char, 1u);
}

If integers are single-byte values, endianness makes no sense and a compile-time error will be generated.

0

votes

While there is no quick and standard way to determine it, this will output it:

#include <stdio.h> 
int main()  
{ 
   unsigned int i = 1; 
   char *c = (char*)&i; 
   if (*c)     
       printf("Little endian"); 
   else
       printf("Big endian"); 
   getchar(); 
   return 0; 
}

0

votes

See Endianness - C-Level Code illustration.

// assuming target architecture is 32-bit = 4-Bytes
enum ENDIANNESS{ LITTLEENDIAN , BIGENDIAN , UNHANDLE };


ENDIANNESS CheckArchEndianalityV1( void )
{
    int Endian = 0x00000001; // assuming target architecture is 32-bit    

    // as Endian = 0x00000001 so MSB (Most Significant Byte) = 0x00 and LSB (Least Significant Byte) = 0x01
    // casting down to a single byte value LSB discarding higher bytes    

    return (*(char *) &Endian == 0x01) ? LITTLEENDIAN : BIGENDIAN;
}

-1

votes

As pointed out by Coriiander, most (if not all) of the code samples here will be optimized away at compilation time, so the generated binaries won't check "endianness" at run time.

It has been observed that a given executable shouldn't run in two different byte orders, but I have no idea if that is always the case, and it seems like a hack to me checking at compilation time. So I coded this function:

#include <stdint.h>
#include <stdlib.h>  /* malloc, free */

int* _BE = 0;

int is_big_endian() {
    if (_BE == 0) {
        uint16_t* teste = (uint16_t*)malloc(4);
        *teste = (*teste & 0x01FE) | 0x0100;
        uint8_t teste2 = ((uint8_t*) teste)[0];
        free(teste);
        _BE = (int*)malloc(sizeof(int));
        *_BE = (0x01 == teste2);
    }
    return *_BE;
}

MinGW wasn't able to optimize this code, even though it does optimize the other code samples here away. I believe that is because I leave the "random" value that was allocated in the lower byte of memory as it was (at least 7 of its bits), so the compiler can't know what that random value is and it doesn't optimize the function away.

I've also coded the function so that the check is only performed once, and the return value is stored for next tests.

-2

votes

I was going through the textbook Computer Systems: A Programmer's Perspective, and there is a problem that asks you to determine the machine's endianness with a C program.

I used a pointer to do that, as follows:

#include <stdio.h>

int main(void){
    int i=1;
    unsigned char* ii = (unsigned char*)&i;

    printf("This computer is %s endian.\n", ((ii[0]==1) ? "little" : "big"));
    return 0;
}

An int takes up 4 bytes and a char takes up only 1 byte, so we can use a char pointer to point at an int with value 1. If the computer is little endian, the char that the pointer points to has value 1; otherwise, its value is 0.
