Leaky abstraction by omission

Sometimes, an abstraction can be leaky because it leaves something out.

Consider the following interface definition. What's wrong with it?

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

Perhaps you think that the name is incorrect; that this really isn't an example of the Repository design pattern, as it's described in Patterns of Enterprise Application Architecture. Ironically, of all patterns, it may be the one most affected by semantic diffusion.

That's not what I have in mind, though. There's something else with that interface.

It's not its CRUD design, either. You could consider that a leaky abstraction, since it strongly implies a sort of persistent data store. That's a worthwhile discussion, but not what I have in mind today. There's something else wrong with the interface.

Consistency #

Look closer at the parameters for the various methods. The Create and ReadReservations methods take a restaurantId parameter, but the other three don't.

Why does the ReadReservations method take a restaurantId while ReadReservation doesn't? Why is that parameter required for Create, but not for Update? That doesn't seem consistent.

The reason is that each reservation has an ID (a GUID). Once the reservation exists, you can uniquely identify it to read, update, or delete it.

As the restaurantId parameter suggests, however, this interface is part of a multi-tenant code base. This code base implements an online restaurant reservation system as a REST API. It's an online service where each restaurant is a separate tenant.

While each reservation has a unique ID, the system still needs to associate it with a restaurant. Thus, the Create method must take a restaurantId parameter in order to associate the reservation with a restaurant.

Once the reservation is stored, however, it's possible to uniquely identify it with the ID. The ReadReservation, Update, and Delete methods need only the id to work.

On the other hand, when you're not querying on reservation ID, you'll need to identify the restaurant, as with the ReadReservations methods. If you didn't identify the restaurant in that method, you'd get all reservations in the requested range, from all tenants. That's not what you want. Therefore, you must supply the restaurantId to limit the query.

The interface is inconsistent, but also allows the underlying implementation to leak through.

Implied implementation detail #

If I told you that the implementation of IReservationsRepository is based on a relational database, can you imagine the design? You may want to stop reading and see if you can predict what the database looks like.

The interface strongly implies a design like this:

CREATE TABLE [dbo].[Reservations] (
    [Id]           INT              IDENTITY (1, 1) NOT NULL,
    [At]           DATETIME2 (7)    NOT NULL,
    [Name]         NVARCHAR (50)    NOT NULL,
    [Email]        NVARCHAR (50)    NOT NULL,
    [Quantity]     INT              NOT NULL,
    [PublicId]     UNIQUEIDENTIFIER NOT NULL,
    [RestaurantId] INT              NOT NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [AK_PublicId] UNIQUE NONCLUSTERED ([PublicId] ASC)
);

What I wrote above is even clearer now. You can't create a row in that table without supplying a RestaurantId, since that column has a NOT NULL constraint.

The PublicId column has a UNIQUE constraint, which means that you can uniquely read and manipulate a single row when you have an ID.

Since all reservations are in a single table, any query not based on PublicId should also filter on RestaurantId. If it doesn't, the result set could include reservations from all restaurants.

Other interpretations #

Is the above relational database design the only possible implementation? Perhaps not. You could implement the interface based on a document database as well. It'd be natural to store each reservation as a separate document with a unique ID. Again, once you have the ID, you can directly retrieve and manipulate the document.

Other implementations become harder, though. Imagine, for example, that you want to shard the database design: Each restaurant gets a separate database. Or perhaps, more realistically, you distribute tenants over a handful of databases, perhaps partitioned on physical location, or some other criterion.

With such a design, the ReadReservation, Update, and Delete methods become more inefficient. While you should be able to identify the correct shard if you have a restaurant ID, you don't have that information. Instead, you'll have to attempt the operation on all databases, thereby eliminating most sharding benefits.

In other words, the absence of the restaurantId parameter from some of the methods suggests certain implementation details.

Leak by omission #

I admit that I rarely run into this sort of problem. Usually, a leaky abstraction manifests by a language construct that contains too much information. This is typically an interface or base class that exposes implementation details by either requiring too specific inputs, or by returning data that reveals implementation details.

For a data access abstraction like the above 'repository', this most frequently happens when people design such an interface around an object-relational mapper (ORM). A class like Reservation would then typically carry ORM details around. Perhaps it inherits from an ORM base class, or perhaps (this is very common) it has a parameterless constructor or getters and setters that model the relationships of the database (these are often called navigation properties).

Another common examples of a leaky abstraction might be the presence of Connect and Disconnect methods. The Connect method may even take a connectionString parameter, clearly leaking that some sort of database is involved.

Yet another example is CQS-violating designs where a Create method returns a database ID.

All such leaky abstractions are leaky because they expose or require too much information.

The example in this article, on the contrary, is leaky because of a lack of detail.

Dependency Inversion Principle #

Ironically, I originally arrived at the above design because I followed the Dependency Inversion Principle (DIP). The clients of IReservationsRepository are ASP.NET Controller actions, like this Delete method:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
    {
        var r = await Repository.ReadReservation(rid)
            .ConfigureAwait(false);
        await Repository.Delete(rid).ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(restaurantId, r)
                .ConfigureAwait(false);
    }
}

As Robert C. Martin explains about the Dependency Inversion Principle:

"clients [...] own the abstract interfaces"

Robert C. Martin, APPP, chapter 11

From that principle, it follows that the Delete method decides what IReservationsRepository.Delete looks like. It seems that the Controller action doesn't need to tell the Repository about the restaurantId when calling its Delete method. Supplying the reservation ID (rid) is enough.

There are, however, various problems with the above code. If the DIP suggests that the restaurantId is redundant when calling Repository.Delete, then why is it required when calling PostOffice.EmailReservationDeleted? This seems inconsistent.

Indeed it is.

As I often do, I arrived at the above Delete method via outside-in TDD, but as I observed a decade ago, TDD alone doesn't guarantee good design. Even when following the red-green-refactor checklist, I often fail to spot problems right away.

That's okay. TDD doesn't guarantee perfection, but done well it should set you up so that you can easily make changes.

Possible remedies #

I can think of two ways to address the problem. The simplest solution is to make the interface consistent by adding a restaurantId parameter to all methods:

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
 
    Task Update(int restaurantId, Reservation reservation);
 
    Task Delete(int restaurantId, Guid id);
}

This is the simplest solution, and the one that I prefer. In a future article, I'll show how it enabled me to significantly simplify the code base.

For good measure, though, I should also mention the opposite solution. Completely drain the interface of restaurantId parameters:

public interface IReservationsRepository
{
    Task Create(Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

How can that work in practice? After all, an implementation must have a restaurant ID in order to create a new row in the database.

It's possible to solve that problem by making the restaurantId an implementation detail. You could make it a constructor parameter for the concrete class, but this gives you another problem. Your Composition Root doesn't know the restaurant ID - after all, it's a run-time argument.

In a method like the above Delete Controller action, you'd have to translate the restaurantId run-time argument to an IReservationsRepository instance. There are various ways around that kind of problem, but they typically involve some kind of factory. That'd be yet another interface:

public interface IReservationsRepositoryFactory
{
    IReservationsRepository Create(int restaurantId);
}

That just makes the API more complicated. Factories give Dependency Injection a bad reputation. For that reason, I don't like this second alternative.

Conclusion #

Leaky abstractions usually express themselves as APIs that expose too many details; the implementation details leak through.

In this example, however, a leaky abstraction manifested as a lack of consistency. Some methods require a restaurantId argument, while others don't - because one particular implementation doesn't need that information.

It turned out, though, that when I was trying to simplify the overall code, this API design held me back. Consistently adding restaurantId parameters to all repository methods solved the problem. A future article tells that tale.

Consistency #

Implied implementation detail #

Other interpretations #

Leak by omission #

Dependency Inversion Principle #

Possible remedies #

Conclusion #

Recommend

The Law of Leaky Abstractions

A Leaky Cryptocurrency Faucet

Closing Iterables is a Leaky Abstraction

Popular posts from leaky bug-tracking systems

Leaky entities, reduced productivity, and a filesystem

Ruby 3.1 allows value omission in hash literals

New Fitbit smartwatches reportedly in the works with one surprising omission

Tailwind is a Leaky Abstraction - Jake Lazaroff

Open and Closed, Omission and Collapse

Apache APISIX plugin priority, a leaky abstraction?

About Joyk