14
votes

As far as I know, the only difference between variable types such as char and int is the amount of memory they occupy. I assume they play no role in determining what the value they hold represents. With that in mind, I have seen the following description of strcmp:

The strcmp function compares the string s1 against s2, returning a value that has the same sign as the difference between the first differing pair of characters (interpreted as unsigned char objects, then promoted to int).

I want to ask why the result is promoted to int. Since chars are being compared, their difference fits into a char in all cases. So isn't promoting the result to int simply appending a bunch of 0s to the result? Why is this done?

4
Why would it return char? It returns the difference between the strings. – Iharob Al Asimi
Yes, but doesn't it stop upon seeing the first non-matching character? Then the result must fit into a char. Is that wrong? – Utku
The difference between two 8-bit chars will be between -255 and 255, and that range doesn't fit in a char. – interjay
@wonkorealtime I know that. I was only pointing out that the OP is wrong in assuming that the difference between two chars can be stored in a char. – interjay

4 Answers

28
votes

char may or may not be signed. strcmp must return a signed type, so that it can be negative if the difference is negative.

More generally, int is preferred for passing and returning simple numerical values, since it's defined as the "natural" size for such values and, on some platforms, is more efficient to deal with than smaller types.
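To make the range argument concrete, here is a minimal sketch, assuming 8-bit chars. The helper names char_diff and fits_in_signed_char are hypothetical and exist only for illustration. It computes the difference the way the strcmp description states it (each character interpreted as unsigned char, then promoted to int) and checks whether the result would survive being stored in a signed char.

```c
#include <limits.h>

/* Hypothetical helper: difference of two characters compared the way
   strcmp is described to compare them, i.e. as unsigned char values
   promoted to int. */
int char_diff(char a, char b)
{
    return (int)(unsigned char)a - (int)(unsigned char)b;
}

/* Hypothetical helper: returns 1 if value can be stored in a
   signed char without changing, 0 otherwise. */
int fits_in_signed_char(int value)
{
    return value >= SCHAR_MIN && value <= SCHAR_MAX;
}
```

With 8-bit chars, char_diff('\0', (char)0xFF) is 0 - 255, which lies outside the signed char range, so a char return type could not represent it, while int can.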

11
votes

Of course, despite the overflow possibility others have mentioned, it only needs to be able to return, e.g., -1, 0, or 1, which easily fit in a signed char. The real historical reason for this is that in the original version of C in the 1970s, functions couldn't return a char, and any attempt to do so resulted in returning an int.

In these early compilers, int was also the default type (many situations, including function return values as seen in main below, allowed you to declare something as int without actually using the int keyword), so it made sense to define any function that didn't specifically need to return a different type as returning an int.

Even now, a char return simply sign-extends the value into the int return register (r0 on PDP-11, eax on x86) anyway. Treating it as a char would not have any performance benefit, whereas allowing it to be the actual difference rather than forcing it to be -1 or 1 did have a small performance benefit. And axiac's answer also makes the good point that it would have had to be promoted back to an int anyway, for the comparison operator. The reason for these promotions is also historical, incidentally: the compiler then did not have to implement separate operators for every possible combination of char and int, especially since the comparison instructions on many processors only work with an int anyway.
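As an illustration of "returning the actual difference", here is a minimal sketch of a strcmp-like function. The name my_strcmp is hypothetical, and real library implementations differ (many are hand-optimized). It only shows that returning the raw difference needs no extra normalization step to -1 or 1.

```c
/* my_strcmp is a hypothetical name used for illustration only;
   it is not how any particular C library implements strcmp. */
int my_strcmp(const char *s1, const char *s2)
{
    /* Compare as unsigned char, as the strcmp description requires. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while both strings agree and neither has ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Return the difference of the first differing pair (or 0). */
    return (int)*p1 - (int)*p2;
}
```

Callers should rely only on the sign of the result, since that is all strcmp guarantees.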


Proof: If I make a test program on Unix V6 for PDP-11, the char type is silently ignored and an integer value outside the range is returned:

char foo() {
    return 257;
}

main() {
    printf("%d\n", foo());
    return 0;
}

# cc foo.c
# a.out
257

3
votes

AFAIK, the standard C library doesn't have a single function that takes or returns values of type char. It has arguments and return types of type char* or const char* but not plain char.

Look for example at int isalpha(int c); for a more shocking instance.

I don't know why, but I can guess. Maybe it is due to the ABI. In any ABI I know, any argument or return value of type char is internally promoted to int anyway, so there is no point in declaring it as char. It would actually make the code less efficient, as the value would need to be truncated each time the function is used.
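One practical consequence of the int parameter, beyond the guess above: the ctype functions expect a value representable as unsigned char (or EOF), so a plain char is usually cast before the call. The sketch below shows that common idiom; count_alpha is a hypothetical helper name, not a library function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphabetic characters in s. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast to unsigned char first so that a negative char value
           is never passed to isalpha, which takes an int. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```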

3
votes

One possible reason why strcmp() promotes the values it returns to int is to spare a processor instruction in the calling code.

Usually (always?) the value returned by strcmp() is used with a comparison operator.

Let's see what happens with the operands of comparison operators.

Usual arithmetic conversions

The arguments of the following arithmetic operators undergo implicit conversions for the purpose of obtaining the common real type, which is the type in which the calculation is performed:

  • binary arithmetic *, /, %, +, -
  • relational operators <, >, <=, >=, ==, !=
  • binary bitwise arithmetic &, ^, |
  • the conditional operator ?:

...

4) Otherwise, both operands are integers. In that case,

First of all, both operands undergo integer promotions.
...

(source: http://en.cppreference.com/w/c/language/conversion#Usual_arithmetic_conversions)

Integer promotions

Integer promotion is the implicit conversion of a value of any integer type with rank less or equal to rank of int or of a bit field of type _Bool, int, signed int, unsigned int, to the value of type int or unsigned int.

(source: http://en.cppreference.com/w/c/language/conversion#Integer_promotions)

Back to strcmp()

As you can see from the quotes above, a possible char value returned by strcmp() is promoted to int anyway.
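The promotion described in the quotes can be checked directly. The following is a small sketch, assuming a C11 compiler (for _Generic) and a platform where int is wider than char, which is the usual case. The function names are hypothetical.

```c
/* Returns 1 if subtracting two char values yields an expression
   of type int, 0 otherwise. */
int char_difference_is_int(void)
{
    char a = 'a', b = 'b';

    /* The operand of _Generic is not evaluated; only its type matters. */
    return _Generic(a - b, int: 1, default: 0);
}

/* Returns 1 if a char operand of an arithmetic operator (here unary +)
   is promoted to int, 0 otherwise. */
int promoted_char_is_int(void)
{
    char c = 'x';

    return _Generic(+c, int: 1, default: 0);
}
```

On such a platform both functions return 1, so a char result from strcmp would be widened to int as soon as it was compared with 0.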

Why did the creators of C choose to return an int?

For a very simple reason: because the promotion is going to happen anyway and because (at least) one processor instruction is needed to perform the promotion, it's more convenient to add that instruction to the code of strcmp() (i.e. in a single place) than everywhere the strcmp() function is called.

Back in the 70s both the memory and the CPU were very valuable resources. An optimization that now seems insignificant (a couple of bytes of memory saved here and there, maybe in several dozen places in the code) had much more importance back then.

Update:

On second thought, I think the historical reasons provided by this answer and this answer are more accurate than mine.