2
votes

I spent about 4 hours yesterday trying to fix this issue in my code. I simplified the problem to the example below.

The idea is to store a string in a stringstream ending with std::ends, then retrieve it later and compare it to the original string.

#include <sstream>
#include <iostream>
#include <string>

int main( int argc, char** argv )
{
    const std::string HELLO( "hello" );

    std::stringstream testStream;

    testStream << HELLO << std::ends;

    std::string hi = testStream.str();

    if( HELLO == hi )
    {
        std::cout << HELLO << "==" << hi << std::endl;
    }

    return 0;
}

As you can probably guess, the above code when executed will not print anything out.

Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.

My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.

The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.

5
Okay, I understand the mechanics behind this, but I don't like the semantics. It seems like I have two choices: don't use std::ends and risk having garbage data, or do use it and add custom code to get rid of extra NULL characters. – Andrew Potapov
You should try to engineer your code so that you know what the expectations of the strings are. For example, if you are reading strings from a network device they are probably not null terminated (though that depends on the API you are using), whereas if you are passing strings around inside your app they probably are. Don't get into a situation where you have no idea what is in your data. – 1800 INFORMATION
Why are you using ends anyway? That's only used when you are building up a null-terminated C-style string from raw data. Here, in your example, it's clearly not appropriate. You already have a C++ string. – Brian Neal

5 Answers

13
votes

std::ends inserts a null character into the stream. Getting the content as a std::string will retain that null character and create a string with that null character at the respective positions.

So indeed a std::string can contain embedded null characters. The following std::string contents are different:

ABC
ABC\0

A binary zero is not whitespace. But it's also not printable, so you won't see it (unless your terminal displays it specially).
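
As a minimal sketch of the point above (the helper names are made up for illustration and are not from the question), the following shows that a null character appended to a std::string is stored and counted like any other character:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns `text` with one extra '\0' appended.
// push_back treats '\0' like any other character.
std::string makeWithTrailingNull(const std::string& text)
{
    std::string result(text);
    result.push_back('\0');
    return result;
}

// Small self-check of the claims made in the text.
void demoEmbeddedNull()
{
    const std::string plain("ABC");
    const std::string withNull = makeWithTrailingNull(plain);

    assert(plain.size() == 3);
    assert(withNull.size() == 4);   // the null counts towards the length
    assert(plain != withNull);      // so the two strings compare unequal
    assert(withNull[3] == '\0');    // and the null is a real element
}
```

Calling demoEmbeddedNull() from main should pass all four checks in a build with assertions enabled.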

Comparing using strcmp will interpret the content of a std::string as a C string when you pass .c_str(). It will say

Hmm, characters before the first \0 (terminating null character) are ABC, so I take it the string is ABC

And thus, it will not see any difference between the two above. You are probably having this issue:

std::stringstream s;
s << "hello";
s.seekp(0);
s << "b";
assert(s.str() == "b"); // will fail!

The assert will fail, because the sequence that the stringstream uses is still the old one that contains "hello". What you did is just overwriting the first character. You want to do this:

std::stringstream s;
s << "hello";
s.str(""); // reset the sequence
s << "b";
assert(s.str() == "b"); // will succeed!

Also read this answer: How to reuse an ostringstream
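
If the stream has also been read from, it may additionally be carrying error flags such as eofbit. A rough sketch of a reset helper, assuming you want both the contents and the state cleared (resetStream and writeTwice are hypothetical names, not part of the standard library):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: empties the buffer and clears the state flags
// so that the stream can be reused from scratch.
void resetStream(std::stringstream& stream)
{
    stream.str("");   // replace the underlying sequence with an empty one
    stream.clear();   // reset eofbit/failbit left over from earlier use
}

// Writes "hello", resets the stream, then writes "b".
std::string writeTwice()
{
    std::stringstream s;
    s << "hello";
    resetStream(s);
    s << "b";
    return s.str();
}

void demoReuse()
{
    assert(writeTwice() == "b");  // no leftover "ello" and no std::ends needed
}
```

Note that clear() on its own only resets the flags; it is str("") that discards the old characters.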

4
votes

std::ends is simply a manipulator that inserts a null character into the stream. Traditionally, strings in C and C++ are terminated with a null (ASCII 0) character; however, it turns out that std::string doesn't really require this. Anyway, stepping through your code point by point, we see a few interesting things going on:

int main( int argc, char** argv )
{

The string literal "hello" is a traditional zero-terminated string constant. The constructor copies its five characters, without the terminating null, into the std::string HELLO.

    const std::string HELLO( "hello" );

    std::stringstream testStream;

We now put the five characters of HELLO into the stream (operator<< does not write a terminating null), followed by one null character, which is put there by std::ends.

    testStream << HELLO << std::ends;

We extract a copy of everything we put into the stream: the characters of "hello" plus the null character.

    std::string hi = testStream.str();

We then compare the two strings using operator== on the std::string class. This operator compares the lengths and contents of the string objects, including any trailing null characters. Note that the std::string class does not treat a null as a terminator; put another way, it allows the string to contain null characters, so the trailing null is treated as part of the string hi.

Since hi has a trailing null that HELLO lacks, the comparison fails.

    if( HELLO == hi )
    {
        std::cout << HELLO << "==" << hi << std::endl;
    }

    return 0;
}

Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.

Reason being, the length is different by one trailing null character.

My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.

strcmp is different from std::string: it dates from the early days of C, when strings were always terminated with a null, so when it reaches the first null in hi it stops looking.
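
To make the difference concrete, here is a small sketch that rebuilds the string from the question and compares it both ways. The function names buildWithEnds and equalAsCStrings are invented for this illustration:

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Builds the string from the question: `text` followed by the
// null character that std::ends inserts.
std::string buildWithEnds(const std::string& text)
{
    std::stringstream stream;
    stream << text << std::ends;
    return stream.str();
}

// Compares only up to the first null, as strcmp does.
bool equalAsCStrings(const std::string& a, const std::string& b)
{
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}

void demoCompare()
{
    const std::string hello("hello");
    const std::string hi = buildWithEnds(hello);

    assert(hi.size() == hello.size() + 1);  // one extra character: the null
    assert(!(hello == hi));                 // operator== sees all six characters
    assert(equalAsCStrings(hello, hi));     // strcmp stops at the first null
}
```

Which comparison is "right" depends on whether the data is meant to be a C-style string or an arbitrary sequence of characters.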

The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.

Sometimes it is a good idea to understand the underlying representation.

0
votes

You're adding a null character to the stream with std::ends. When you initialize hi with str(), that null character is kept, so hi is one character longer than HELLO. The strings are different. strcmp doesn't compare std::strings; it compares char* (it's a C function).

0
votes

std::ends adds a null terminator, (char)'\0'. You'd use it with the deprecated strstream classes, to add the null terminator.

You don't need it with stringstream, and in fact it screws things up, because to stringstream the null terminator isn't "the special null terminator that ends a string"; it's just another character, the one with value zero. stringstream just adds it, and that increases the character count (in your case) to six, and makes the comparison to "hello" fail.
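
If nulls have already found their way into your data, one option is to strip them before comparing. The following is only a sketch under that assumption; stripTrailingNulls is a hypothetical helper, not a standard or Boost function:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: removes any '\0' characters from the end of `text`.
std::string stripTrailingNulls(std::string text)
{
    const std::string::size_type last = text.find_last_not_of('\0');
    if (last == std::string::npos)
        text.clear();            // empty, or nothing but nulls
    else
        text.erase(last + 1);    // drop everything after the last non-null
    return text;
}

void demoStrip()
{
    // Without std::ends the stream holds exactly what was written.
    std::stringstream plain;
    plain << "hello";
    assert(plain.str() == "hello");

    // With a trailing null, stripping restores equality.
    const std::string withNull("hello\0", 6);
    assert(withNull != "hello");
    assert(stripTrailingNulls(withNull) == "hello");
}
```

Simply leaving std::ends out, as in the first half of demoStrip, is usually the simpler fix.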

0
votes

I think a good way to compare strings is to use the std::find method. Do not mix C functions and std::string ones!