9
votes

So I'm writing a program to test the endianness of a machine and print it. I understand the difference between little- and big-endian; however, from what I've found online, I don't understand why these tests reveal a machine's endianness.

This is what I've found online. What does *(char *)&x mean, and how does it being equal to 1 prove that a machine is little-endian?

int x = 1;
if (*(char *)&x == 1) {
    printf("Little-Endian\n");
} else {
    printf("Big-Endian\n");
}
6
If you "already understand" the difference between big and little endian, why then is a basic test of endianness not understandable? - wallyk
I know what it does and am not entirely sure why checking if a dereferenced version of the variable, cast as a pointer to a char will tell you that. Unless the endianness of a char is standard, in which case it'll tell you if it changes during that bitwise-cast. - zebediah49

6 Answers

14
votes

If we split the expression into its parts:

  1. &x: This gets the address of the location where the variable x is, i.e. &x is a pointer to x. The type is int *.

  2. (char *)&x: This takes the address of x (which is an int *) and converts it to a char *.

  3. *(char *)&x: This dereferences the char *, i.e. reads the single byte stored at the lowest address of x (not the whole value of x).

Now if we go back to x and how the data is stored. On most machines, x is four bytes. Storing 1 in x sets the least significant bit to 1 and the rest to 0. On a little-endian machine this is stored in memory as 0x01 0x00 0x00 0x00, while on a big-endian machine it's stored as 0x00 0x00 0x00 0x01.

What the expression does is get the first of those bytes and check if it's 1 or not.
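
As a rough sketch, the same test can be wrapped in a function. The names is_little_endian and report_endianness are made up for illustration, and unsigned char is used instead of char so the byte can never be read as a negative value.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when the byte at the lowest address
   of an int holding the value 1 is itself 1 (little-endian), else 0. */
int is_little_endian(void)
{
    int x = 1;
    const unsigned char *first = (const unsigned char *)&x;
    return *first == 1;
}

/* Prints the result of the test in the same wording as the question. */
void report_endianness(void)
{
    printf("%s\n", is_little_endian() ? "Little-Endian" : "Big-Endian");
}
```

Calling report_endianness() from main prints one of the two strings, depending on the machine the program runs on.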

4
votes

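Here's what the memory will look like, assuming a 32-bit integer (bytes listed from the lowest address to the highest):

Little-endian
01 00 00 00 = 00000001 00000000 00000000 00000000

Big-endian
00 00 00 01 = 00000000 00000000 00000000 00000001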

Dereferencing a char * gives you one byte. Your test fetches the first byte at that memory location by interpreting the address as a char * and then dereferencing it.

1
votes

Breaking down *(char *)&x:

&x is the address of integer x

(char *) causes address of integer x to be treated as an address of a character (aka byte)

* dereferences that address, yielding the value of the byte

1
votes
int x;

x is a variable which, on most current platforms, holds a 32-bit value.

int x = 1;

A given piece of hardware can store the value 1 as a 32-bit value in one of the following formats.

Little Endian
0x100    0x101    0x102    0x103
00000001 00000000 00000000 00000000 

(or) 

Big Endian
0x100    0x101    0x102    0x103
00000000 00000000 00000000 00000001

Now let's try to break down the expression:

&x 

Get the address of variable x. Say the address of x is 0x100.

(char *)&x 

&x is an address of an integer variable. (char *)&x converts the address 0x100 from (int *) to (char *).

*(char *)&x 

dereferences the char *, which yields the first byte (from left to right in the diagrams above, i.e. the byte at address 0x100) of the 4-byte (32-bit) integer x.

(*(char *)&x == 1)

If the first byte from left to right stores the value 00000001, then it is little endian. If the 4th byte from left to right stores value 00000001, then it is big endian.
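
To see a layout like the diagrams above on a real machine, something along these lines should work. It is only a sketch: byte_at and dump_layout are hypothetical helper names, and the addresses printed will of course not be 0x100.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte located `offset` bytes past
   the start of an object. */
unsigned char byte_at(const void *obj, size_t offset)
{
    return ((const unsigned char *)obj)[offset];
}

/* Prints one line per byte, lowest address first: address, then value in hex. */
void dump_layout(const void *obj, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        printf("%p  %02X\n",
               (void *)((const unsigned char *)obj + i),
               (unsigned)byte_at(obj, i));
    }
}
```

With int x = 1; calling dump_layout(&x, sizeof x) shows the 01 byte on the first line on a little-endian machine and on the last line on a big-endian one.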

0
votes

Yes, that answers the question. Here's a more general answer:

#include <iostream>
#include <cstdlib>

using namespace std;

int main()
{
    cout<<"sizeof(char) = "<<sizeof(char)<<endl;
    cout<<"sizeof(unsigned int) = "<<sizeof(unsigned int)<<endl;
    //NOTE: endianness is a property of the CPU, not of the operating system.
    //x86 and x86-64 machines running Windows, Mac OS, or Linux are little-endian.
    //Little-endian means the memory address increases as the byte significance increases.
    //Proof, assuming a 4-byte unsigned int:

    unsigned int x = 0x01020408; //each hexadecimal digit is 4 bits, meaning there are 2
                                 //digits for every byte
    unsigned char *c = (unsigned char *)&x; //unsigned char, so a byte is never read as negative
    //0x100 = 16^2, 0x10000 = 16^4, 0x1000000 = 16^6; integer constants are used
    //instead of pow(), which works in floating point
    unsigned int y = *c + 0x100u * *(c+1) + 0x10000u * *(c+2) + 0x1000000u * *(c+3);
    //Here's the test: construct the sum y such that we select subsequent bytes of 0x01020408
    //in increasing address order and then we multiply each by its corresponding significance
    //in increasing order.  The convention for hexadecimal number definitions is that
    //the least significant digit is at the right of the number.
    //Finally, if (y==x), the lowest address holds the least significant byte.
    if (y==x) cout<<"Little Endian"<<endl;
    else cout<<"Big Endian"<<endl;

    cout<<(int) *c<<endl;
    cout<<(int) *(c+1)<<endl;
    cout<<(int) *(c+2)<<endl;
    cout<<(int) *(c+3)<<endl;
    cout<<"x is "<<x<<endl;
    cout<<(int)*c<<"*1 + "<<(int)*(c+1)<<"*16^2 + "<<(int)*(c+2)<<"*16^4 + "<<(int)*(c+3)<<" *16^6 = "<<y<<endl;
    system("PAUSE"); //Windows-only; only include this in a console program
    return 0;
}

On a little-endian machine this displays 8 4 2 1 for the dereferenced values at c, c+1, c+2, and c+3 respectively. The sum y is 16909320, which is equal to x. Even though the significance of the digits grows from right to left in the written number, this is still little-endian because the least significant byte sits at the lowest memory address. The left-shift binary operator << works on a variable's value rather than on its bytes in memory, so it increases the value (until non-zero digits are shifted off the variable altogether) regardless of endianness. Don't confuse this operator with std::cout's << operator. On a big-endian machine, the display for c, c+1, c+2, and c+3 respectively would be: 1 2 4 8
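
The weighted sum can also be written with shifts. Here is a minimal sketch in plain C, assuming the four bytes have already been read from memory in increasing address order. The names assemble_le and assemble_be are only illustrative.

```c
#include <stdint.h>

/* Treats b[0] as the least significant byte (little-endian order). */
uint32_t assemble_le(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Treats b[0] as the most significant byte (big-endian order). */
uint32_t assemble_be(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         |  (uint32_t)b[3];
}
```

For the bytes 8 4 2 1, assemble_le gives 0x01020408 (16909320) and assemble_be gives 0x08040201. Whichever of the two matches the original x tells you the byte order of the machine that stored it.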

0
votes

If a 4-byte unsigned integer is stored in memory as the bytes AA BB CC DD on a big-endian processor, where it represents 0xAABBCCDD = 2864434397, then that same integer is stored as DD CC BB AA on a little-endian processor. Both hold the value 2864434397; only the order of the bytes in memory differs.

If a 2-byte unsigned short is stored as the bytes AA BB on a big-endian processor, where it represents 0xAABB = 43707, then that same unsigned short is stored as BB AA on a little-endian processor, where it is also equal to 43707.
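
To make the two layouts concrete, here is a small sketch that writes a 32-bit value into a byte buffer in each order, independent of the machine it runs on. The names store_be32 and store_le32 are hypothetical.

```c
#include <stdint.h>

/* Writes value into out[0..3], most significant byte first (big-endian layout). */
void store_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Writes value into out[0..3], least significant byte first (little-endian layout). */
void store_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)value;
    out[1] = (unsigned char)(value >> 8);
    out[2] = (unsigned char)(value >> 16);
    out[3] = (unsigned char)(value >> 24);
}
```

For 0xAABBCCDD, store_be32 fills the buffer with AA BB CC DD and store_le32 fills it with DD CC BB AA.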

Here are a couple of handy macros to swap bytes from little-endian to big-endian and vice versa:

// can be used for 2-byte integer types (short, unsigned short); n is evaluated more than once
#define BYTESWAP16(n) ((((n)&0xFF00u)>>8)|(((n)&0x00FFu)<<8))

// can be used for 4-byte integer types (int, unsigned int); for a float, copy its bytes into an unsigned int first
#define BYTESWAP32(n) ((BYTESWAP16(((n)&0xFFFF0000u)>>16))|((BYTESWAP16((n)&0x0000FFFFu))<<16))

// can be used for 8-byte integer types (unsigned long long); for a double, copy its bytes into an unsigned long long first
#define BYTESWAP64(n) ((BYTESWAP32(((n)&0xFFFFFFFF00000000ull)>>32))|((BYTESWAP32((n)&0x00000000FFFFFFFFull))<<32))
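
If you'd rather avoid macros, the same swaps can be sketched as functions on the fixed-width types from <stdint.h>. The names below are made up for illustration. The float helper assumes sizeof(float) == 4 and returns the swapped bytes as an integer, since a byte-swapped float is not necessarily a valid float value.

```c
#include <stdint.h>
#include <string.h>

/* Swaps the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t n)
{
    return (uint16_t)((n >> 8) | (n << 8));
}

/* Reverses the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t n)
{
    return ((n & 0x000000FFu) << 24)
         | ((n & 0x0000FF00u) << 8)
         | ((n & 0x00FF0000u) >> 8)
         | ((n & 0xFF000000u) >> 24);
}

/* Reverses the eight bytes of a 64-bit value using two 32-bit swaps. */
uint64_t swap64(uint64_t n)
{
    return ((uint64_t)swap32((uint32_t)n) << 32)
         | swap32((uint32_t)(n >> 32));
}

/* Copies the bytes of a float into an integer, then swaps them.
   Assumes sizeof(float) == 4. */
uint32_t float_to_swapped_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return swap32(bits);
}
```

For example, swap32(0xAABBCCDD) gives 0xDDCCBBAA and swap16(0xAABB) gives 0xBBAA, and swapping twice returns the original value.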