3

The Case of string_view and the Magic String

 2 years ago
source link: https://blogs.msmvps.com/gdicanio/2021/03/26/the-case-of-string_view-and-the-magic-string/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The Case of string_view and the Magic String

Someone was working on modernizing some legacy C++ code. The code base contained a function like this:

void DoSomething(const char* name)
void DoSomething(const char* name)

The DoSomething function takes a read-only string expressed using a C-style string pointer: remember, this is a legacy C++ code base.

For the sake of this discussion, suppose that DoSomething contains some simple C++ code that invokes a C-interface API, like this (of course, the code would be more complex in a real code base):

void DoSomething(const char* name)
SomeCApi(name);
void DoSomething(const char* name) 
{
    SomeCApi(name);
}

SomeCApi also expects a “const char*” that represents a read-only string parameter.

However, note that the SomeCApi cannot be modified (think of it like a system C-interface API, for example: a Windows C API like MessageBox).

For the sake of this discussion, suppose that SomeCApi just prints out its string parameter, like this:

void SomeCApi(const char* name)
printf(“Hello, %s!\n”, name);
void SomeCApi(const char* name) 
{
    printf(“Hello, %s!\n”, name);
}

In the spirit of modernizing the legacy C++ code base, the maintainer decides to change the prototype of DoSomething, stepping up from “const char*” to std::string_view:

// Was: void DoSomething(const char* name)
void DoSomething(std::string_view name)
// Was: void DoSomething(const char* name)
void DoSomething(std::string_view name)

The SomeCApi still expects a const char*. Remember that you cannot change the SomeCApi interface.

So, the maintainer needs to update the body of DoSomething accordingly, invoking string_view::data to access the underlying character array:

void DoSomething(std::string_view name)
// Was: SomeCApi(name);
SomeCApi(name.data());
void DoSomething(std::string_view name) 
{
    // Was: SomeCApi(name);
    SomeCApi(name.data());
}

In fact, std::string_view::data returns a pointer to the underlying character array.

The code compiles fine. And the maintainer is very happy about this string_view modernization!

Then, the code is executed for testing, with a string_view name containing “Connie”. The expected output would be:

“Hello, Connie!”

But, instead, the following string is printed out:

“Hello, Connie is learning C++!”

Wow! Where does the “ is learning C++” part come from??

Is there some magic string hidden inside string_view?

As a sanity check, the maintainer simply prints out the string_view variable:

// name is a std::string_view
std::cout << “Name: “ << name;
// name is a std::string_view
std::cout << “Name: “ << name;
 

And the output is as expected: “Name: Connie”.

So, it seems that cout does print the correct string_view name.

But, somehow, when the string_view is passed deep down to a legacy C API, some string “magic” happens, showing some additional characters after “Connie”.

What’s going on here??

Figuring Out the Bug

Well, the key here are two words: Null Terminator.

In fact, the C API that takes a const char* expected the string to be null-terminated.

On the other hand, std::string_view does not guarantee null-termination!

So, consider a string_view that “views” only a portion of a string, like this:

std::string str = “Connie is learning C++”;
auto untilFirstSpace = str.find(‘ ‘);
std::string_view name{str.data(), untilFirstSpace}; // “Connie”
std::string str = “Connie is learning C++”;
auto untilFirstSpace = str.find(‘ ‘);
std::string_view name{str.data(), untilFirstSpace}; // “Connie”

The string_view certainly “views” the “Connie” part. But, if you consider the memory layout, after these “Connie” characters in memory there is no null terminator, which was expected by the C API. So, the C API views the whole initial string, until it finds the null terminator.

So, the whole string is printed out by the C API, not just the part observed by the string_view.

Memory layout: string_view vs. C-style null-terminated strings

This is a very subtle bug, that can be hard to spot in more complex code bases.

So, remember: std::string_views are not guaranteed to be null terminated! Take that into consideration when calling C-interface APIs, that expect C-style null-terminated strings.

P.S. As a side note, std::string::c_str guarantees that the returned pointer points to a null-terminated character array.

Repro Code

// The Case of string_view and the Magic String
// -- by Giovanni Dicanio
#include <stdio.h>
#include <iostream>
#include <string>
#include <string_view>
void SomeCApi(const char* name)
printf("Hello, %s!\n", name);
void DoSomething(std::string_view name)
SomeCApi(name.data());
int main()
std::string msg = "Connie is learning C++";
auto untilFirstSpace = msg.find(' ');
std::string_view v{ msg.data(), untilFirstSpace };
std::cout << "String view: " << v << '\n';
DoSomething(v);
// The Case of string_view and the Magic String 
// -- by Giovanni Dicanio

#include <stdio.h>

#include <iostream>
#include <string>
#include <string_view>

void SomeCApi(const char* name)
{
    printf("Hello, %s!\n", name);
}

void DoSomething(std::string_view name)
{
    SomeCApi(name.data());
}

int main()
{
    std::string msg = "Connie is learning C++";
    auto untilFirstSpace = msg.find(' ');

    std::string_view v{ msg.data(), untilFirstSpace };

    std::cout << "String view: " << v << '\n';

    DoSomething(v);
}



About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK