0

The infamous bug of range-based for loops

 7 months ago
source link: https://www.sandordargo.com/blog/2022/04/20/range-base-p2012
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The infamous bug of range-based for loops

Sandor Dargo 1 day ago2022-04-20T00:00:00+02:00

Imagine that you used a feature of your favourite language without fear. Imagine that you used that feature endlessly, without having a second thought. You even promoted it in conversations, in code reviews, in talks, just anywhere.

Then it turns out that it’s error-prone and it might lead to undefined behaviour. Not only in barely imaginable corner cases, but in completely normal scenarios.

Has it ever happened to you?

It certainly happened to me.

I learnt just a few weeks ago that the range-based for loop is broken.

What is the problem?

In brief, iterating over a reference to a temporary value is undefined behaviour.

Let’s see a concrete example.

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> createStrings() {
    return {"This", "is", "a", "vector", "of", "strings"};
}

int main()
{
  for (auto w: createStrings()) {
      std::cout << w << " "; // this works fine
  }
  std::cout << std::endl;
  for (auto c: createStrings()[0]) {
      std::cout << c << " "; // this is UB
  }
  std::cout << std::endl;
}

If you run the following piece of code, you’ll see that the first for loop works fine, while the second one prints some garbage.

In this above example, we played with getting an element of a vector of string, but we would run into the same issue if we tried to get an element of a tuple, or if we wanted to iterate over the elements of an optional vector.

#include <iostream>
#include <optional>
#include <string>
#include <vector>

std::vector<std::string> createStrings() {
    return {"This", "is", "a", "vector", "of", "strings"};
}

std::optional<std::vector<int>> createOptionalInts() {
    return std::optional<std::vector<int>>1;
}


int main()
{
  for (auto i: createOptionalInts().value()) {
      std::cout << i << " "; // UB
  }
  std::cout << std::endl;
}
/*
In my environment, the output happened to be
0 0 3 4
*/

This is a pretty serious issue and we can run into this problem in practice.

In order to understand the root cause of this behaviour, we have to understand how range-based for loops are implemented.

According to the standard, such loops are expanded into several statements. Essentially, they are transformed into a good old for loop where both the begin and end iterators are declared externally:

#include <iostream>
#include <optional>
#include <string>
#include <vector>

std::optional<std::vector<int>> createOptionalInts() {
    return std::optional<std::vector<int>>1;
}

int main()
{  
  auto&& range = createOptionalInts().value();
  auto position = range.begin();
  auto end = range.end();
  for(; position != end; ++position) {
      std::cout << *(position) << " "; // UB
  }
  std::cout << std::endl; 
}

You can play around with it on C++Insights

According to the rules of the language, the temporary values created during the creation of the range, that are not directly bound to it are destroyed before the for loop starts.

What can you do?

First of all, you have to learn about the problem and share it with others too. For beginners, you can mention that there are constraints and risks in certain situations and describe those situations at a high level.

Unless we are aware of all the rules, this is far from an evident problem.

Therefore, for more experienced programmers you should also tell the details, in particular how a range-based for loop is expanded. That’s something we briefly saw in the previous section and P2012R0 clearly helps with its precise details.

Education is the best you can do at this moment. There are books and style guides mentioning this problem, such as Embracing Modern C++ Safely and Abseil Tip #107, but we cannot expect that based on a few sources everyone knows about the problem.

We should pass the message that “the range-based for loop does not work when iterating over references to temporary objects”.

Will it be fixed?

Will this problem be ever fixed? - you should ask at this point. Maybe yes, definitely not for the time being. I learnt about the problem because of this. I’ve read a tweet by Victor Ciura mentioning that a fix was just rejected by the C++ Evolution Working Group (EWG).

The proposal P2012R0 was written by Nico Jusuttis, Victor Zverovich, Filipe Molunde and Arthur O’Dwyer was progressing well in the committee, but finally it didn’t make it to the language because the proposed solution was not judged to be generic enough.

They proposed to fix the problem by how the loop is extended. The end goal was to extend the lifetime of the universal references so that it doesn’t end before entering the for loop. The idea was to achieve this without introducing new lifetime rules.

As the proposal was rejected, we cannot expect to have this fixed in the next version, but hopefully, the community will find a solution maybe for C++26.

Conclusion

In this article, I shared with you something that clearly surprised me and probably many of you. Range-based for loops are broken. They are the hotbed of undefined behaviour as they cannot handle references to temporary values as one would expect.

This is a real problem, we saw different realistic use-cases, and it’s been known for a long time. There was a proposal to fix it written by prominent personalities from our community, but it was not accepted - so far as the solution is not generic enough.

Have you ever run into this problem?

Connect deeper

If you liked this article, please


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK