0
votes

I am attempting to convert a string into a number by summing the int value of each letter in C++ (WinAPI). So in ASCII, the std::string "AA" would equal 130 (65 + 65).

The string can be either a std::string or a std::wstring.

Why does the following function always return the value of zero no matter what letter I put in it? Shouldn't it return either the ASCII or Unicode integer value of the letter?

printf("TEST a: %d \n", _tstoi(_T("a")));
printf("TEST A: %d \n", _tstoi(_T("A")));
printf("TEST b: %d \n", _tstoi(_T("b")));

My VC++ application is currently built as Unicode, and the code above prints zero for each letter. I remember hearing that Unicode strings are very different from ASCII strings. Can you clear up what exactly is different, other than that Unicode has a library of characters something like 30,000 long whilst ASCII has 256 (I think)?

3
Perhaps of interest, Joel's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" joelonsoftware.com/articles/Unicode.html - HostileFork says dont trust SE

3 Answers

3
votes

The MSDN article says:

"The input string is a sequence of characters that can be interpreted as a numerical value of the specified type. The function stops reading the input string at the first character that it cannot recognize as part of a number."

If you test the code with Unicode strings that contain actual numbers, you'll see the correct output:

printf("TEST 1: %d \n", _tstoi(_T("1")));

output:

TEST 1: 1

As @Ylisar said, the *toi functions convert the text of a number into an integer value; they do not return character codes.

The following code prints the numeric value of a character instead, but be careful not to pass the pointer to a string literal when you mean the character it points to. I've left both versions so you can see the difference:
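To try the stopping rule from the quoted documentation without a Windows toolchain, here is a minimal sketch that uses the standard std::atoi and std::wcstol as stand-ins for the narrow and wide forms that _tstoi expands to. The helper names parseNarrow and parseWide are made up for this illustration.

```cpp
#include <cstdlib>
#include <cwchar>

// Hypothetical helper: parses the leading decimal digits of a narrow string.
// std::atoi returns 0 when no conversion can be performed.
int parseNarrow(const char* text) {
    return std::atoi(text);
}

// Hypothetical helper: the wide-character counterpart, using std::wcstol.
// Parsing stops at the first character that is not part of a number.
long parseWide(const wchar_t* text) {
    return std::wcstol(text, nullptr, 10);
}
```

Calling parseNarrow("a") or parseWide(L"A") gives 0 because there are no digits to read, which matches the zeros in the question, while parseNarrow("12abc") stops after the digits and gives 12.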

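  printf("TEST 1: %d \n", _tstoi(_T("1")));
  printf("TEST a: %d \n", _tstoi(_T("a")));
  const TCHAR* b = _T("b");
  printf("TEST A: %d \n", _T("A")); // wrong: passes the literal's address, not 'A'
  printf("TEST b: %d \n", *b);      // right: dereferences to the first character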

Output:

TEST 1: 1
TEST a: 0
TEST A: 13457492
TEST b: 98

Check out more at http://msdn.microsoft.com/en-us/library/yd5xkb5c%28v=vs.80%29.aspx

If you want to sum up (accumulate) the values, I recommend checking out the STL range algorithms, which do wonders for this sort of thing. For example:

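#include <cstdio>
#include <numeric>
#include <string>
#include <tchar.h>

printf("TEST a: %d \n", *_T("a")); // 97
printf("TEST b: %d \n", *_T("b")); // 98

// std::wstring always needs a wide literal, so use L"..." rather than _T("...")
std::wstring uString(L"ba");
int result = std::accumulate(uString.begin(), uString.end(), 0);
printf("TEST accumulated: %d \n", result);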

Results:

TEST a: 97
TEST b: 98
TEST accumulated: 195

This way you don't need a for-loop to go through all the values. The range functions really are nice for stuff like this.

Check out more at: http://www.sgi.com/tech/stl/accumulate.html
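Since the question asks for something that accepts either a std::string or a std::wstring, the accumulate idea can be wrapped in a small template. This is only a sketch: the name sumCodeUnits is hypothetical, and it assumes you want each element treated as an unsigned value (plain char may be signed, which would make bytes above 127 count as negative).

```cpp
#include <numeric>
#include <string>
#include <type_traits>

// Hypothetical helper: adds up the integer value of every element of a
// std::string, std::wstring, or any other std::basic_string.
template <typename CharT>
unsigned long sumCodeUnits(const std::basic_string<CharT>& text) {
    // Convert each element to its unsigned counterpart before adding, so a
    // signed char above 127 does not contribute a negative value.
    using Unsigned = typename std::make_unsigned<CharT>::type;
    return std::accumulate(text.begin(), text.end(), 0UL,
        [](unsigned long total, CharT unit) {
            return total + static_cast<Unsigned>(unit);
        });
}
```

With this, sumCodeUnits(std::string("AA")) gives 130 and sumCodeUnits(std::wstring(L"ba")) gives 195, matching the numbers used above.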

1
votes

The *toi family of functions converts a string representation into an integer representation; that is, "10" becomes 10. What you actually want is no conversion at all. Change it to:

printf("TEST a: %d \n", _T('a'));
printf("TEST A: %d \n", _T('A'));
printf("TEST b: %d \n", _T('b'));

As for Unicode, the underlying representation depends on the encoding (for example, UTF-8, which is very popular, encodes code points 0 to 127 as single bytes identical to the ASCII table).
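To make the encoding point concrete, here is a small sketch that lists the byte values of a UTF-8 string. The helper name utf8ByteValues is hypothetical, and the non-ASCII input is written with explicit byte escapes so the result does not depend on how the source file is saved.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns each byte of a UTF-8 encoded string as an
// unsigned value, in order.
std::vector<unsigned> utf8ByteValues(const std::string& utf8) {
    std::vector<unsigned> values;
    for (unsigned char byte : utf8) {
        values.push_back(byte);
    }
    return values;
}
```

For "A" this yields the single value 65, the same as ASCII. For "\xC3\xA9" (the UTF-8 encoding of U+00E9, "é") it yields two values, 195 and 169, so a sum over bytes is not the same thing as a sum over characters once the text leaves the ASCII range.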

0
votes

The first question, why printf does not work as intended, has already been answered by Ylisar. The other question, about summing the numeric values of the characters, is a little more complex. The conversion from strings to numbers with the _tstoi() function only works if the given string represents a number, so "123" is converted to 123. What you want is the sum of the characters' numeric representations.

For Unicode code points up to 0x7F (0...127), this is simply the sum of the 1-byte UTF-8 representations. However, on Windows, code compiled with the UNICODE flag uses 2-byte wchar_t units (UTF-16) instead. Running the following code in the debugger will reveal this.

// ASCII: 1 byte per character
const char* letterA = "A";
int sumOfLetterA = letterA[0] + letterA[0]; // gives 130 (65 + 65)

// wchar_t: 2 bytes per character on Windows
const wchar_t* letterB = L"B"; // TEXT("B") is only wide when UNICODE is defined
int sumOfLetterB = letterB[0] + letterB[0]; // gives 132 (66 + 66)
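One caveat on the 2-byte representation: it is UTF-16, so a character outside the Basic Multilingual Plane occupies two 16-bit units (a surrogate pair). The sketch below uses char16_t and std::u16string rather than wchar_t so that it behaves the same on platforms where wchar_t is wider than 16 bits; the helper name sumUtf16Units is hypothetical.

```cpp
#include <string>

// Hypothetical helper: sums the 16-bit code units of a UTF-16 string.
// Note that this adds code units, not code points.
unsigned long sumUtf16Units(const std::u16string& text) {
    unsigned long total = 0;
    for (char16_t unit : text) {
        total += unit;
    }
    return total;
}
```

For u"BB" the result is 132, as in the snippet above. For u"\U0001F600" the string holds two units, 0xD83D and 0xDE00, so the sum is their total rather than the code point value 0x1F600. Whether that matters depends on what the sum is going to be used for.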