The Subtle Dangers of the Comma Operator (C++)

 3 years ago
source link: https://humanreadablemag.com/issues/3/articles/the-subtle-dangers-of-the-comma-operator

With its powerful features, the C++ language allows us to do many things.

But, as a philosopher who was also the uncle of a superhero once said, with great power comes great responsibility.

Translated into C++, this means that if you're not careful, some C++ features that let you write expressive code can turn around and produce buggy code that doesn't do what it's supposed to.

One beautiful example (for some definition of beautiful) is overloading the comma operator. As we're going to see, a very subtle change in working code can make it go horribly wrong.

A big thanks goes to Fluent C++ reader Nope for showing me this example.

Overloading the Comma Operator Is Powerful

First, the comma operator is a thing. Its built-in version, available for all types, does this: a,b evaluates a and discards its value, then evaluates b and returns it. For example, 1,2 evaluates to 2.

It's generally not recommended, but C++ allows for overloading the comma operator. Here is a detailed article on overloading the comma operator, which you can read to get more familiar with the topic.

Overloading the comma operator allows us to do nice things. For example, this is what Boost Assign uses to allow us to append data to an existing vector:

v += 1,2,3,4,5;

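Without overloading the comma operator, we can't write this expression in standard C++, even in C++20. Once the vector is constructed, we have to call the push_back member function for each element, or pass all the elements to insert as an initializer list (since C++11), neither of which reads as directly as the line above.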

The preceding code allows us to add elements to an existing vector with very expressive code.

Here is a very simplified implementation of Boost Assign, which allows us to write the preceding line of code (thanks to Nope for this implementation):

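#include <iostream>
#include <vector>

// Holds a reference to the vector it appends to.
template <typename Vector>
struct appender
{
    Vector& vec;

    // Member comma operator: appends e and returns the same appender,
    // so that the next comma can append again.
    template<typename T>
    appender<Vector>& operator,(const T& e)
    {
        vec.push_back(e);
        return *this;
    }
};

// v += e appends e and returns an appender referring to v.
template <typename T>
appender<std::vector<T>> operator+=(std::vector<T>& v, const T& e)
{
    v.push_back(e);
    return {v};
}

int main()
{
    auto data = std::vector<int>{};
    data += 1,2,3,4,5;

    for (auto&& e: data)
        std::cout << ' ' << e;
    std::cout << '\n';
}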

Here is how this code works:

  • Since the comma operator has the lowest precedence of all operators, the expression groups as ((((data += 1), 2), 3), 4), 5, so data += 1 is evaluated first.
  • This adds 1 to the vector and returns an appender referencing that vector.
  • The appender overloads the comma operator. When the comma is applied to this appender and 2, it adds 2 to the vector and returns a reference to the same appender.
  • That appender is then combined with 3, which is added as well, and then with 4, and then with 5.

The output of this program is this (run the code yourself here):

 1 2 3 4 5

All good.

At least, so far.

Overloading the Comma Operator Is Dangerous

Now let's make a small change in our code. Instead of defining the comma operator as a member function, let's define it as a free function. This could be desirable, for example, because a free function allows implicit conversions on all of its parameters, as explained in Item 24 of Effective C++.

#include <iostream>
#include <vector>

template <typename Vector>
struct appender
{
    Vector& vec;
};

template <typename Vector, typename T>
appender<Vector>& operator,(appender<Vector>& v, const T& e)
{
    v.vec.push_back(e);
    return v;
}

template <typename T>
appender<std::vector<T>> operator+=(std::vector<T>& v, const T& e)
{
    v.push_back(e);
    return {v};
}

int main()
{
    auto data = std::vector<int>{};
    data += 1,2,3,4,5;

    for (auto&& e: data)
        std::cout << ' ' << e;
    std::cout << '\n';
}

This shouldn't change anything, right?

Let's run the program (run it yourself here). It outputs this:

 1

If you're like me, you're staring at the screen in disbelief. Run the program that works and the one that doesn't work if you'd like to see it with your own eyes.

Maybe the worst thing is not that it doesn't behave the way we would naturally expect, but that it compiles in the first place.

Can you see why this is happening?

I recommend you search on your own. This is highly instructive!

...

Seriously, try to find what's wrong on your own. I'll tell you in a bit, but it's more fun and rewarding to find it yourself.

...

You're on mobile and it's not convenient? No worries, bookmark this page or send it to yourself by email so you can come back to it later on your computer.

...

Found it yet?

...

Ok, I'll show you now.

What the C*mm* Is Happening?

The problem has to do with lvalues and rvalues. If we look again at the free function operator, its first parameter is a non-const lvalue reference:

template <typename Vector, typename T>
appender<Vector>& operator,(appender<Vector>& v, const T& e)
{
    v.vec.push_back(e);
    return v;
}

The calling code is this:

data += 1,2,3,4,5;

operator+= returns the appender by value, so data += 1 is an rvalue (a temporary). A non-const lvalue reference cannot bind to an rvalue. Therefore, this overload of the comma operator is not viable, and it is never called.

With almost any other operator, the code would then fail to compile. But as we saw at the beginning of this post, the comma operator has a built-in version that works for all types. So that built-in version is used instead: it discards the appender and returns the second operand, here 2. The next comma then returns 3, then 4, and then 5. None of these values ever reaches the vector, which ends up containing only the 1 added by operator+=.

Incorrectly overloading the comma operator results in a silent failure. The code compiles and runs, but doesn't do what you want.

To make this implementation work, we need to provide an overload of the comma operator that can accept rvalues:

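template <typename Vector, typename T>
appender<Vector>& operator,(appender<Vector>& v, const T& e)
{
    v.vec.push_back(e);
    return v;
}

template <typename Vector, typename T>
appender<Vector>& operator,(appender<Vector>&& v, const T& e)
{
    v.vec.push_back(e);
    // v is an lvalue here, but since C++23 `return v;` treats it as an xvalue,
    // which cannot bind to appender<Vector>&; the cast keeps the code valid.
    return static_cast<appender<Vector>&>(v);
}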

Note that a const lvalue reference binds to both lvalues and rvalues, so a single overload taking one would also work.

To use it in our case, the operator can no longer return a non-const reference to its parameter, so the simplest option is to return a copy of the appender. This still appends to the vector, because every copy of the appender holds a reference to the same vector (thanks to Patrice Dalesme for showing me this solution):

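template <typename Vector, typename T>
appender<Vector> operator,(appender<Vector> const& v, const T& e)
{
    // v is const, but v.vec is a reference: constness does not propagate
    // through it, so push_back is allowed.
    v.vec.push_back(e);
    return v;   // a cheap copy that refers to the same vector
}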

Lessons Learned

There are at least two lessons we can learn from that example.

The first one is that if we overload the comma operator, we need to be extra careful to cover all cases and think about lvalues and rvalues. Otherwise we end up with buggy code.

The second one is independent of the comma operator. We see that in C++, member functions are easier to get right than free functions when it comes to value categories. A member function without a ref-qualifier can, by default, be called on both lvalues and rvalues of its class, whereas a free function taking a reference may accept only one of the two.

The opposite also exists (member functions with ref-qualifiers that restrict them to lvalues or to rvalues, and free functions taking their parameter by value or by const reference, which accept both), but the most common prototypes have the properties we discussed above.

