A case study in not-quite-move-semantic library design
source link: https://quuxplusone.github.io/blog/2024/04/18/topkable/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
A case study in not-quite-move-semantic library design
Building on yesterday’s post “Name lookup in multiple base classes” (2024-04-17), Seth Bromberger and I recently talked about the following library-design problem involving (or at least seeming to involve) exactly that issue. The shape of the problem is:
-
We have a class
Graph
whose constructor takes a collection of data and puts the topk
values into a nice graph. Its constructor will be templated to work with any collectionc
, as long asc.topk(10)
yields a result of typevector<Datum>
. -
We have, let’s say, four different collection types that are usable with
Graph
; let’s call themA
,B
,C
, andD
. Let’s say thatA
andB
provide not only.topk(int) const
but also a “consuming” or “pilfering” version oftopk
, which erases the returned elements from the collection.C
provides only the non-mutatingtopk
.D
provides only the pilfering version! -
We would certainly like
Graph
to be able to use the “pilfering” version oftopk
in certain circumstances; but equally surely, constructing aGraph
from some collectionx
shouldn’t mutatex
by default — not unless we opt in somehow. -
We want to write the bulk of
topk
’s implementation just once, e.g. via a base class, instead of having to implement it in each ofA
,B
,C
,D
. And likewise for the pilfering version.
Approach 1: Move semantics
Notice that these requirements have almost exactly the same shape as C++11 move semantics! So one way to implement this would be:
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k) &&; // pilfering
};
struct BothTopkable {
std::vector<Datum> topk(int k) const&; // non-mutating
std::vector<Datum> topk(int k) &&; // pilfering
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
explicit Graph(int k, Coll&& c) {
data_ = std::forward<Coll>(c).topk(k);
}
private:
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, std::move(a)); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, std::move(c)); // non-mutating
However, this has two potential downsides. One: The way g2
opts into pilfering the top k
elements
of a
is by doing std::move(a)
, which looks like you’re “giving up” the entire object a
. In fact
you’re merely pilfering its top 10 elements; the others remain available; and the resulting object a
is still in a usable state. This feels like a misuse of std::move
terminology.
I have the same ambivalence about the STL’s
set::merge
, which is callable as eithera.merge(b)
ora.merge(std::move(b))
. Both calls have the same predictable effect onb
, but connote different things to the reader, and neither of them is quite what the programmer means. The right signature would have treatedb
as an inout parameter (“Pass out-parameters by pointer”) and been called asa.merge(&b)
; but the C++ STL consistently eschews that rule, so it’s forced into awkwardnesses like this.
Two: g4
“opted in” to pilfering from c
, but in fact c
doesn’t provide the pilfering version of topk
,
so it quietly fell back to the non-mutating version. That’s the right thing for move semantics, but it’s not
what we want here. (Suppose the caller assumed that c
would now be ten elements smaller, and wrote code
based on that assumption!) We want a noisy error if Graph
asks to pilfer from a non-pilferable collection type.
Approach 2: pilfering_topk
The answer to the title question of my C++Now 2021 talk “When Should You Give Two Things the Same Name?” is “only when it’s needed for a technical reason,” i.e., basically never. So let’s apply that guideline here.
struct Datum {};
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> pilfering_topk(int k); // pilfering
};
struct A : PlainTopkable, PilferingTopkable {};
struct B : PlainTopkable, PilferingTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Coll>
static Graph from_collection(int k, const Coll& c) {
return Graph(c.topk(k));
}
template<class Coll>
static Graph pilfered_from_collection(int k, Coll& c) {
return Graph(c.pilfering_topk(k));
}
private:
explicit Graph(std::vector<Datum> v) : data_(std::move(v)) {}
std::vector<Datum> data_;
};
A a;
C c;
Graph g1 = Graph::from_collection(10, a);
Graph g2 = Graph::pilfered_from_collection(10, a);
Graph g3 = Graph::from_collection(10, c);
Graph g4 = Graph::pilfered_from_collection(10, c); // error
I think it’s “more convenient than you think” to provide named factory functions like
pilfered_from_collection
instead of a gigantic constructor overload set. However, I admit
it can be unwieldy sometimes — or impossible, if your Graph
needs to be immovable for some
reason.
Re factory functions, see “Is your constructor an object-factory or a type-conversion?” (2018-06-21). Re dealing with immovable types, see “The Superconstructing Super Elider” (2018-05-17).
So we might say that we have a “good technical reason” to give the two ways
of constructing Graph
s the same name (that is, to put them both into the constructor
overload set). Then we’d have to use a tag type to differentiate them (cf. the STL’s
use of std::allocator_arg
and std::in_place
tags):
struct PilferTag { explicit PilferTag() = default; };
struct Graph {
template<class Coll>
explicit Graph(int k, const Coll& c) {
data_ = c.topk(k);
}
template<class Coll>
explicit Graph(int k, const Coll& c, PilferTag) {
data_ = c.pilfering_topk(k);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Approach 3: Tags all the way down?
We might look at our last snippet and claim that we have a “good technical reason” to use
the same name all the way down our stack! We’d be wrong, but let’s suppose we claim it
anyway. Then we might try for a solution like the following. Notice the simpler Graph
class. But we also run into exactly the problem from
“Name lookup in multiple base classes” (2024-04-17),
and have to solve it with using
-declarations.
struct Datum {};
struct PilferTag { explicit PilferTag() = default; };
struct PlainTopkable {
std::vector<Datum> topk(int k) const; // non-mutating
};
struct PilferingTopkable {
std::vector<Datum> topk(int k, PilferTag); // pilfering
};
struct BothTopkable : PlainTopkable, PilferingTopkable {
using PlainTopkable::topk;
using PilferingTopkable::topk;
};
struct A : BothTopkable {};
struct B : BothTopkable {};
struct C : PlainTopkable {};
struct D : PilferingTopkable {};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
Here I’m using Fwd&&
to bind to either A&
or const A&
, whatever the
caller happens to pass to me. Here const Coll&
would be fine if I never wanted
to modify the collection; but in fact I might want to modify it, depending on
what Tags...
are. Arguably this is a lazy and unstylish use of a forwarding
reference.
See “Universal reference or forwarding reference?” (2022-02-02) and “Don’t
forward
things that aren’t forwarding references” (2023-05-07).
Approach 4: Preprocessor tricks
I don’t recommend this approach personally, but it can be attractive in real codebases.
We could dispense with inheritance and the using
-declarations that come with it,
and simply reimplement topk
in each of A
, B
, C
, D
.
“But we wanted to write topk
only once!” Sure — we’ll just let the compiler cut-and-paste
our implementation into each class’s body, using either a macro (as here) or an
#include
directive.
#define DEFINE_TOPK \
std::vector<Datum> topk(int k) const { ~~~ }
#define DEFINE_TOPK_PILFERING \
std::vector<Datum> topk(int k, PilferTag) { ~~~ }
struct A {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct B {
DEFINE_TOPK;
DEFINE_TOPK_PILFERING;
};
struct C {
DEFINE_TOPK;
};
struct D {
DEFINE_TOPK_PILFERING;
};
struct Graph {
template<class Fwd, class... Tags>
explicit Graph(int k, Fwd&& c, Tags... tags) {
data_ = c.topk(k, tags...);
}
private:
std::vector<Datum> data_;
};
Graph g1 = Graph(10, a); // non-mutating
Graph g2 = Graph(10, a, PilferTag()); // pilfer the top 10
Graph g3 = Graph(10, c); // non-mutating
Graph g4 = Graph(10, c, PilferTag()); // error
My personal feeling is that preprocessor tricks are fine in moderation
(see the #preprocessor category on this blog),
but designing a single “thing” (topk
) to rely on preprocessor macros
and templates and perfect-forwarding is asking too much of the reader.
Obviously I prefer Approach 2, that being the only one that never gives two things the same name.
See also:
- “Techniques for post-facto multiple inheritance” (2022-12-01)
Recommend
-
5
Most of the design case studies in our blog are devoted to creating interfaces, illustrations, and branding for digital products. Today’s case is different: it’s about the design of the branding for a real physical product whose main goal is...
-
18
George Bernard Shaw once said: “There is no sincerer love than the love of food”. Perhaps, that’s one of the reasons why food and cooking is the endless source of inspiration for UX designers. Today’s case study from the Tubik team presents t...
-
9
Mobile UX Design 101: Finding personas In a short time frame like this, we needed to think about the multiple functionalities of every basic exploration method. A hypothetical persona workshop started...
-
9
-
9
Man’s fascination with mountains has been unbroken since time immemorial. Today’s case study tells you the creative story about Lumen Museum, the charming place that gives this fascination a photographic home with breathtaking views and inter...
-
6
The Good Action — A Design Sprint Case StudyAwareness actions on climate change to retain users
-
7
-
7
Semantic blind spot in Ruby case statementHi, weʼre arkency 👋 Some time ago I’ve stumbled upon an art...
-
5
Ryan Smith on Twitter: "RT @andreif7: Tenstorrent's custom RISC-V core actually seems huge as a first design, quite impressive specifications. https://t.co/AcRjbCd…"Don’t miss what’s happeningPeople on Twitter are the first to kn...
-
5
Open-Source 3dicons Library: Case Study And Free DownloadsQuick summary ↬ In this article, Vijay describes his learning experiences during the design stages of creating his 3dicon...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK