Converting an array of characters to an array of uint32_t in C: is this the proper way?

Question

I am trying to convert an array of characters into an array of uint32_t so that I can use it in a CRC calculation. Is this the correct way to do it, or is it dangerous? I have a habit of doing dangerous conversions, and I am trying to learn safer ways to convert things :). I know that each char in the array is 8 bits. Should I combine 4 of the characters into one element of the uint32_t array, or is it OK to place each character in its own element? Would summing four 8-bit characters change their values in the array? I have read something about shifting characters, but I am not sure exactly how to shift four characters into one element of the uint32_t array.

text is my array of characters.

uint32_t inputText[512];
int i;
for( i = 0; i < 504; i++)
{
  inputText[i] = (uint32_t)text[i];
}

Spencer D · Accepted Answer · 2017-04-15T02:25:47

The cast seems fine, although I'm not sure why you use i < 504 when your uint32_t array has 512 elements. (If you do want to convert only 504 values into a 512-element array, declare it as uint32_t inputText[512] = {0}; so the last 8 elements are zero instead of whatever was previously in that memory, which is what a local array without an initializer would contain.) Otherwise, it is safe to write SomeArrayOfLargerType[i] = (largerType_t)SomeArrayOfSmallerType[i], with one caveat: if char is signed on your platform, a byte of 0x80 or above is negative and converts to a large value such as 0xFFFFFF80, so cast through unsigned char first: (uint32_t)(unsigned char)text[i]. Also bear in mind that, as your code stands, the binary will end up looking something like:

 0100 0001 -> 0000 0000 0000 0000 0000 0000 0100 0001

So, those 24 leading 0s might be an undesired result.
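A minimal sketch of the element-wise conversion (one byte per uint32_t) is below. The helper names widen_byte and widen_all are hypothetical, and the cast through unsigned char assumes you want bytes 0x80 to 0xFF kept as 128 to 255 rather than sign-extended:

```c
#include <stdint.h>

/* Widen one byte into a uint32_t, as the question's loop does.
 * Converting through unsigned char first keeps bytes >= 0x80 in the
 * range 0x00..0xFF; a negative (signed) char converted straight to
 * uint32_t would instead wrap to a value like 0xFFFFFFE9. */
static uint32_t widen_byte(char c)
{
    return (uint32_t)(unsigned char)c;
}

/* Element-wise copy: one input byte per output element, so every
 * result has 24 leading zero bits. out must hold len elements. */
static void widen_all(const char *text, uint32_t *out, int len)
{
    for (int i = 0; i < len; i++) {
        out[i] = widen_byte(text[i]);
    }
}
```

With ASCII input, widen_byte('A') yields 0x00000041, the bit pattern shown above.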

As for summing four characters, that will almost certainly not work out how you want, unless you literally want the arithmetic sum, like 0000 0001 (one) + 0000 0010 (two) = 0000 0011 (three). If you instead want the previous example to produce 0000 0010 0000 0001 (both bytes kept side by side in one value), then yes, you need to apply shifts.
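To see the difference, here is a small sketch (the function names are hypothetical) that contrasts adding two bytes with packing them using a shift and a bitwise OR:

```c
#include <stdint.h>

/* Adding two bytes mixes their bits: 1 + 2 gives 3 (0000 0011). */
static uint32_t sum_bytes(unsigned char a, unsigned char b)
{
    return (uint32_t)a + (uint32_t)b;
}

/* Shifting the second byte left by 8 and OR-ing keeps both bytes intact:
 * a lands in bits 0-7 and b in bits 8-15. */
static uint32_t pack_two_bytes(unsigned char a, unsigned char b)
{
    return (uint32_t)a | ((uint32_t)b << 8);
}
```

sum_bytes(1, 2) and sum_bytes(2, 1) are both 3, so a sum cannot tell you which byte came first, whereas the packed values 0x0201 and 0x0102 differ and each byte can be recovered.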

UPDATE - Some information about shifting via example:

The following would be an example of shifting:

// Assumes <stdint.h> and <stdio.h> are included, and that TEXT_LENGTH,
// FINAL_LENGTH (TEXT_LENGTH / 4, rounded up) and the char array text are defined elsewhere.
uint32_t valueArray[FINAL_LENGTH] = {0};
int i;
for(i=0; i < TEXT_LENGTH; i++){ // TEXT_LENGTH is the initial message/text length (512 bytes or something)
    uint32_t byte = (unsigned char)text[i]; // no sign extension, and shifting a uint32_t by 24 cannot overflow an int
    int mode = i % 4; // 4-to-1 value storage ratio (4 uint8s being stored as 1 uint32)
    int writeLocation = i / 4; // integer division truncates, so something like 3/4 = 0 (which is desired)
    switch(mode){
        case 0:
            // store in the bottom 8 bits of the element
            valueArray[writeLocation] = byte;
            break;
        case 1:
            valueArray[writeLocation] |= (byte << 8); // shift left by 8 bits to insert into the second byte
            break;
        case 2:
            valueArray[writeLocation] |= (byte << 16); // shift left by 16 bits to insert into the third byte
            break;
        case 3:
            valueArray[writeLocation] |= (byte << 24); // shift left by 24 bits to insert into the fourth byte
            break;
        default:
            printf("Some error occurred here... If source has been modified, please check to make sure the number of case handlers == the possible values for mode.\n");
    }
}

You can see an example of that running here: https://ideone.com/OcDMoM (Note: IDEOne reports a runtime error when executing it. I haven't looked into that closely, as the output still seems accurate and the code is only meant to serve as an example.)

Essentially, because each byte is 8 bits and you want to store the bytes in 4-byte chunks (32 bits each), you need four different cases for how far you shift. In the first case, the lowest 8 bits are filled in by a byte from the message. In the second case, the next 8 bits are filled in by the following byte of the message (which is left-shifted by 8 bits, because that is its offset within the 32-bit value). That continues for the remaining 2 bytes, and then the process repeats with the next element of the uint32_t array, starting at the next byte of the message.
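The same technique can be written more compactly by computing the shift from i % 4 instead of using a switch. This is a sketch; the pack_bytes name and the (len + 3) / 4 output size are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack len bytes into 32-bit words: byte i goes to word i / 4 at bit
 * offset 8 * (i % 4), so the first byte of each group ends up in the
 * least significant bits. out must hold (len + 3) / 4 elements. */
static void pack_bytes(const char *text, size_t len, uint32_t *out)
{
    size_t words = (len + 3) / 4;
    for (size_t w = 0; w < words; w++) {
        out[w] = 0; /* start each word at zero so |= only adds bits */
    }
    for (size_t i = 0; i < len; i++) {
        uint32_t byte = (unsigned char)text[i]; /* avoid sign extension */
        out[i / 4] |= byte << (8 * (i % 4));    /* shift done in uint32_t */
    }
}
```

For "ABCDEFGH" this produces 0x44434241 and 0x48474645. If your CRC routine expects the first byte in the most significant bits instead, the shift would be 8 * (3 - i % 4); check which order your CRC implementation assumes.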

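When combining the bytes, |= is used because it performs a bitwise OR of the shifted byte with what is already stored in the uint32_t, so the bytes accumulate into one single value. Since each byte occupies its own 8-bit field, the OR never disturbs bits that were already set, whereas a plain = would overwrite the earlier bytes.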

So, to break down a simple example like the one in my initial post, let's say I have 0000 0001 (one) and 0000 0010 (two), with a 16-bit integer to hold them that starts as 0000 0000 0000 0000. The first byte is assigned to the 16-bit integer, making it 0000 0000 0000 0001. Then the second byte is left-shifted by 8, making it 0000 0010 0000 0000. Finally, the two are combined via a bitwise OR, so the 16-bit integer becomes 0000 0010 0000 0001.
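The same walkthrough expressed in C, as a sketch with a hypothetical combine_16 helper whose comments mirror each step:

```c
#include <stdint.h>

/* Follows the 16-bit walkthrough: start at zero, assign the first byte,
 * shift the second byte left by 8, then OR the two together. */
static uint16_t combine_16(uint8_t first, uint8_t second)
{
    uint16_t value = 0;                                    /* 0000 0000 0000 0000 */
    value = first;                                         /* 0000 0000 0000 0001 when first = 1 */
    uint16_t shifted = (uint16_t)((unsigned)second << 8);  /* 0000 0010 0000 0000 when second = 2 */
    value = (uint16_t)(value | shifted);                   /* 0000 0010 0000 0001 */
    return value;
}
```

combine_16(1, 2) returns 0x0201, i.e. 0000 0010 0000 0001.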

For a 32-bit integer holding 4 bytes, the process repeats 2 more times, with the shift growing by 8 bits each time (16, then 24), and then it moves on to the next uint32_t to repeat the process.

Hopefully that all makes sense. If not, I can try to clarify further.
