std::string is not a Container for Raw Data
source link: https://www.tuicool.com/articles/hit/Ev6re2y
Sometimes we need unformatted data, simple byte sequences. At first glance, std::string might be a fitting data structure for that, but it is not.
Think about data we get from networks, a CAN bus, or another process: serialized binary data that has to be interpreted before it can be used in our business logic. The natural way to manage this kind of data is to use sequence containers like std::vector<std::byte> or, lacking C++17 support, std::vector<unsigned char>. Sometimes we also see std::vector<uint8_t>, where uint8_t on many platforms is an alias for unsigned char.
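As a small illustration of interpreting such a byte sequence, the following sketch reads a 16-bit value out of a std::vector<unsigned char>. The helper name read_u16_be and the big-endian layout are assumptions made for this example only; a real protocol defines its own layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interprets two consecutive bytes as a big-endian
// 16-bit value. at() throws std::out_of_range if the buffer is too short.
std::uint16_t read_u16_be(const std::vector<unsigned char>& buffer, std::size_t offset) {
    return static_cast<std::uint16_t>((buffer.at(offset) << 8) | buffer.at(offset + 1));
}
```

The parameter type already tells the reader that the function expects raw bytes, not text.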
However, there is another contiguous container for 8-bit values that seems tempting as a means to transport byte sequences: std::string. I am not sure about the reasons to do this apart from std::string being slightly less to type than std::vector<unsigned char>, meaning that I cannot see any reason at all. On the contrary, it is a bad idea for several reasons.
Many string operations rely on having zero-terminated character sequences. That means that there is exactly one null character, and that it is at the end. Plain byte sequences, on the other hand, can contain an arbitrary number of null bytes anywhere. While std::string can store sequences with null characters, we have to be very careful not to use functions that take const char*, because those would truncate at the first null character.
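The truncation can be sketched with the two std::string constructors that accept a character pointer. The helper names below are hypothetical and exist only to make the difference visible.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the string through the const char* constructor,
// which scans for a terminating '\0' to determine the length.
std::size_t size_via_c_string(const char* raw) {
    return std::string(raw).size();
}

// Hypothetical helper: passes the length explicitly, so embedded null bytes
// are copied like any other byte.
std::size_t size_via_pointer_and_length(const char* raw, std::size_t length) {
    return std::string(raw, length).size();
}
```

For a four-byte frame 01 00 02 03, the first helper reports a size of 1, while the second reports 4. Every call site has to remember which variant is safe, and that is exactly the kind of care a dedicated byte container does not demand.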
The major reason not to use std::string is semantics: when we see that type in our code, we naturally expect a series of readable characters. We expect some text. When it is misused as a series of raw bytes, it confuses the maintainers of our codebase. It gets even worse if we expose the use of std::string as a raw data container via an API that has to be used by someone else.
Especially in locations where we convert text to serialized raw data or vice versa, it will be very confusing to determine which std::string is text and which is raw data.
Apart from confusing the developer, having the same type for two nontrivial uses can be error-prone, as it neglects the safety mechanisms the strong typing of C++ gives us. Imagine, for example, a function that takes some text and some serialized raw data: both parameters would be std::string and could easily switch places by accident.
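A minimal sketch of that hazard follows. Both function names and their output format are hypothetical; the point is only that the first signature accepts swapped arguments silently, while the second one rejects them at compile time.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical API: text and raw data share one type, so the compiler
// cannot tell a swapped call from a correct one.
std::string describe_ambiguous(const std::string& label, const std::string& payload) {
    return label + ": " + std::to_string(payload.size()) + " bytes";
}

// The same hypothetical API with a distinct type for the raw data.
std::string describe_typed(const std::string& label,
                           const std::vector<unsigned char>& payload) {
    return label + ": " + std::to_string(payload.size()) + " bytes";
}

// Neither type converts implicitly to the other, so passing the arguments
// to describe_typed in the wrong order does not compile.
static_assert(!std::is_convertible<std::string, std::vector<unsigned char>>::value,
              "text must not silently become raw data");
static_assert(!std::is_convertible<std::vector<unsigned char>, std::string>::value,
              "raw data must not silently become text");
```

Calling describe_ambiguous with the payload first and the label second still compiles and simply produces a nonsensical result.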
Instead of std::string, use std::vector<std::byte> or std::vector<unsigned char>. While this already nicely says “sequence of bytes”, consider using a typedef. For even stronger typing, use a wrapper structure with a meaningful name.
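Both suggestions could look roughly like the sketch below. The names ByteSequence, SerializedFrame and frame_size are made up for illustration, and unsigned char is used so that the code also builds without C++17; with C++17 available, std::byte would take its place.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical alias: documents intent, but is still the very same type
// as std::vector<unsigned char>.
using ByteSequence = std::vector<unsigned char>;

// Hypothetical wrapper: a distinct type with a meaningful name. The explicit
// constructor prevents arbitrary byte vectors from converting silently.
struct SerializedFrame {
    explicit SerializedFrame(ByteSequence data) : bytes(std::move(data)) {}
    ByteSequence bytes;
};

// A function taking the wrapper states precisely what kind of data it expects.
std::size_t frame_size(const SerializedFrame& frame) {
    return frame.bytes.size();
}

static_assert(std::is_same<ByteSequence, std::vector<unsigned char>>::value,
              "an alias does not create a new type");
static_assert(!std::is_convertible<ByteSequence, SerializedFrame>::value,
              "the wrapper has to be constructed explicitly");
```

The alias only improves readability, whereas the wrapper also lets the compiler distinguish a serialized frame from any other byte vector.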