Discerning and maintaining purity

source link: https://blog.ploeh.dk/2020/02/24/discerning-and-maintaining-purity/

Functional programming depends on referential transparency, but identifying and keeping functions pure requires deliberate attention.

Referential transparency is the essence of functional programming. Most other traits that people associate with functional programming emerge from it: immutability, recursion, higher-order functions, functors and monads, etcetera.

To summarise, a pure function has to obey two rules:

  • The same input always produces the same output.
  • Calling it causes no side effects.

While those rules are easy to understand and remember, in practice they're harder to follow than most people realise.

Lack of abstraction #

Mainstream programming languages don't distinguish between pure functions and impure actions. I'll use C# for examples, but you can draw the same conclusions for Java, C, C++, Visual Basic .NET and so on - even for F# and Clojure.

Consider this line of code:

string validationMsg = Validator.Validate(dto);

Is Validate a pure function?

You might want to look at the method signature before you answer:

public static string Validate(ReservationDto dto)

This is, unfortunately, not helpful. Will Validate always return the same string for the same dto? Can we guarantee that there are no side effects?

You can't answer these questions only by examining the method signature. You'll have to go and read the code.

This breaks encapsulation. It ruins abstraction. It makes code harder to maintain.

I can't stress this enough. This is what I've attempted to describe in my Humane Code video. We waste significant time reading existing code, mostly because it's difficult to understand. It doesn't fit in our brains.

Agile Principles, Patterns, and Practices defines an abstraction as

"the amplification of the essential and the elimination of the irrelevant"

Robert C. Martin

This fits with the definition of encapsulation from Object-Oriented Software Construction. You should be able to interact with an object without knowledge of its implementation details.

When you have to read the code of a method, it indicates a lack of abstraction and encapsulation. Unfortunately, that's the state of affairs when it comes to referential transparency in mainstream programming languages.

Manual analysis #

If you read the source code of the Validate method, however, it's easy to figure out whether it's pure:

public static string Validate(ReservationDto dto)
{
    if (!DateTime.TryParse(dto.Date, out var _))
        return $"Invalid date: {dto.Date}.";
    return "";
}

Is the method deterministic? It seems so. To answer that question with confidence, though, you need to know whether DateTime.TryParse is deterministic. Assume that it is. Apart from the TryParse call, you can easily reason about the rest of this method. There's no randomness or other source of non-deterministic behaviour in it, so it seems reasonable to conclude that it's deterministic.

Does the method produce side effects? Again, you have to know about the behaviour of DateTime.TryParse, but I think it's safe to conclude that there are no side effects.

In other words, Validate is a pure function.
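The assumption about DateTime.TryParse deserves a second look. The two-argument overload used above parses according to the current culture's formatting rules, which is ambient state rather than input. The following is only a sketch, assuming you'd prefer to make that dependency explicit by passing a culture to the parser. ReservationDto is a stand-in with just the two properties the examples in this article use, and the choice of InvariantCulture is an assumption.

```csharp
using System;
using System.Globalization;

// Stand-in for the article's DTO; only the properties the examples use.
public class ReservationDto
{
    public string Date { get; set; } = "";
    public int Quantity { get; set; }
}

public static class Validator
{
    public static string Validate(ReservationDto dto)
    {
        // The culture is passed explicitly, so the outcome no longer
        // depends on the thread's current culture.
        if (!DateTime.TryParse(
                dto.Date,
                CultureInfo.InvariantCulture,
                DateTimeStyles.None,
                out var _))
            return $"Invalid date: {dto.Date}.";
        return "";
    }
}
```

Even a method that looks this harmless can hide a dependency on its environment, which is part of what makes manual analysis laborious.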

Testability #

Pure functions are intrinsically testable because they depend exclusively on their input.

[Fact]
public void ValidDate()
{
    var dto = new ReservationDto { Date = "2021-12-21 19:00", Quantity = 2 };
    var actual = Validator.Validate(dto);
    Assert.Empty(actual);
}

This unit test creates a reservation Data Transfer Object (DTO) with a valid date string and a positive quantity. There's no error message to produce for a valid DTO. The test asserts that the error message is empty. It passes.

You can just as easily write a test that verifies what happens when you supply an invalid Date string.
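Such a test might look like the following sketch. It assumes the xUnit.net package is referenced, repeats the original Validate method so that the example compiles on its own, and uses a stand-in ReservationDto. The test class name and the two sample strings are made up for illustration.

```csharp
using System;
using Xunit;

// Stand-in for the article's DTO; only the properties the examples use.
public class ReservationDto
{
    public string Date { get; set; } = "";
    public int Quantity { get; set; }
}

public static class Validator
{
    // The original, pure version of Validate.
    public static string Validate(ReservationDto dto)
    {
        if (!DateTime.TryParse(dto.Date, out var _))
            return $"Invalid date: {dto.Date}.";
        return "";
    }
}

public class ValidatorTests
{
    [Theory]
    [InlineData("not a date")]
    [InlineData("2021-13-45 19:00")] // no month 13, no day 45
    public void InvalidDate(string date)
    {
        var dto = new ReservationDto { Date = date, Quantity = 2 };
        var actual = Validator.Validate(dto);
        Assert.Equal($"Invalid date: {date}.", actual);
    }
}
```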

Maintaining purity #

The problem with manual analysis of purity is that any conclusion you reach only lasts until someone edits the code. Every time the code changes, you must re-evaluate.

Imagine that you need to add a new validation rule. The system shouldn't accept reservations in the past, so you edit the Validate method:

public static string Validate(ReservationDto dto)
{
    if (!DateTime.TryParse(dto.Date, out var date))
        return $"Invalid date: {dto.Date}.";

    if (date < DateTime.Now)
        return $"Invalid date: {dto.Date}.";

    return "";
}

Is the method still pure? No, it's not. It's now non-deterministic. One way to observe this is to let time pass. Assume that you wrote the above unit test well before December 21, 2021. That test still passes when you make the change, but months go by. One day (on December 21, 2021 at 19:00) the test starts failing. No code changed, but now you have a failing test.

I've made sure that the examples in this article are simple, so that they're easy to follow. This could mislead you to think that the shift from referential transparency to impurity isn't such a big deal. After all, the test is easy to read, and it's clear why it starts failing.

Imagine, however, that the code is as complex as the code base you work with professionally. A subtle change to a method deep in the bowels of a system can have profound impact on the entire architecture. You thought that you had a functional architecture, but you probably don't.

Notice that no types changed. The method signature remains the same. It's surprisingly difficult to maintain purity in a code base, even if you explicitly set out to do so. There's no poka-yoke here; constant vigilance is required.

Automation attempts #

When I explain these issues, people typically suggest some sort of annotation mechanism. Couldn't we use attributes to identify pure functions? Perhaps like this:

[Pure]
public static string Validate(ReservationDto dto)

This doesn't solve the problem, though, because this still compiles:

[Pure]
public static string Validate(ReservationDto dto)
{
    if (!DateTime.TryParse(dto.Date, out var date))
        return $"Invalid date: {dto.Date}.";
            
    if (date < DateTime.Now)
        return $"Invalid date: {dto.Date}.";
            
    return "";
}

That's an impure action annotated with the [Pure] attribute. It still compiles and passes all tests (if you run them before December 21, 2021). The annotation is a lie.

As I've already implied, you also have the compound problem that you need to know the purity (or lack thereof) of all APIs from the base library or third-party libraries. Can you be sure that no pure function becomes impure when you update a library from version 2.3.1 to 2.3.2?

I'm not aware of any robust automated way to verify referential transparency in mainstream programming languages.

Language support #

While no mainstream languages distinguish between pure functions and impure actions, there are languages that do. The most famous of these is Haskell, but other examples include PureScript and Idris.

I find Haskell useful for exactly that reason. The compiler enforces the functional interaction law. You can't call impure actions from pure functions. Thus, you wouldn't be able to make a change to a function like Validate without changing its type. That would break most consuming code, which is a good thing.

You could write an equivalent to the original, pure version of Validate in Haskell like this:

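-- Needs the ScopedTypeVariables extension for the pattern type annotation,
-- readMaybe from Text.Read, and LocalTime from Data.Time.
validateReservation :: ReservationDTO -> Either String ReservationDTO
validateReservation r@(ReservationDTO _ d _ _ _) =
  case readMaybe d of
    Nothing -> Left $ "Invalid date: " ++ d ++ "."
    Just (_ :: LocalTime) -> Right r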

This is a pure function, because all Haskell functions are pure by default.

You can change it to also check for reservations in the past, but only if you also change the type:

validateReservation :: ReservationDTO -> IO (Either String ReservationDTO)
validateReservation r@(ReservationDTO _ d _ _ _) =
  case readMaybe d of
    Nothing -> return $ Left $ "Invalid date: " ++ d ++ "."
    Just date -> do
      utcNow <- getCurrentTime
      tz <- getCurrentTimeZone
      let now = utcToLocalTime tz utcNow
      if date < now
        then return $ Left $ "Invalid date: " ++ d ++ "."
        else return $ Right r

Notice that I had to change the return type from Either String ReservationDTO to IO (Either String ReservationDTO). The presence of IO marks the 'function' as impure. If I hadn't changed the type, the code simply wouldn't have compiled, because getCurrentTime and getCurrentTimeZone are impure actions. These types ripple through entire code bases, enforcing the functional interaction law at every level.

Pure date validation #

How would you validate, then, that a reservation is in the future? In Haskell, like this:

validateReservation :: LocalTime -> ReservationDTO -> Either String ReservationDTO
validateReservation now r@(ReservationDTO _ d _ _ _) =
  case readMaybe d of
    Nothing -> Left $ "Invalid date: " ++ d ++ "."
    Just date ->
      if date < now
        then Left $ "Invalid date: " ++ d ++ "."
        else Right r

This function remains pure, although it still changes type. It now takes an additional now argument that represents the current time. You can retrieve the current time as an impure action before you call validateReservation. Impure actions can always call pure functions. This enables you to keep your complex domain model pure, which makes it simpler, and easier to test.

Translated to C#, that corresponds to this version of Validate:

public static string Validate(DateTime now, ReservationDto dto)
{
    if (!DateTime.TryParse(dto.Date, out var date))
        return $"Invalid date: {dto.Date}.";
 
    if (date < now)
        return $"Invalid date: {dto.Date}.";
 
    return "";
}

This version takes an additional now input parameter, but remains deterministic and free of side effects. Since it's pure, it's trivial to unit test.

[Theory]
[InlineData("2010-01-01 00:01", "2011-09-11 18:30", 3)]
[InlineData("2019-11-26 13:59", "2019-11-26 19:00", 2)]
[InlineData("2030-10-02 23:33", "2030-10-03 00:00", 2)]
public void ValidDate(string now, string reservationDate, int quantity)
{
    var dto = new ReservationDto { Date = reservationDate, Quantity = quantity };
    var actual = Validator.Validate(DateTime.Parse(now), dto);
    Assert.Empty(actual);
}

Notice that while the now parameter plays the role of the current time, the fact that it's just a value makes it trivial to run simulations of what would have happened if you ran this function in 2010, or what will happen when you run it in 2030. A test is really just a simulation by another name.
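The same kind of simulation covers the rejection path. The following is a sketch only, assuming the xUnit.net package and a stand-in ReservationDto. The test name PastDate and the sample values are made up for illustration. Each case places now after the reservation date, so the pure Validate should return the error message whenever the test happens to run.

```csharp
using System;
using Xunit;

// Stand-in for the article's DTO; only the properties the examples use.
public class ReservationDto
{
    public string Date { get; set; } = "";
    public int Quantity { get; set; }
}

public static class Validator
{
    // The pure version: the current time is an ordinary argument.
    public static string Validate(DateTime now, ReservationDto dto)
    {
        if (!DateTime.TryParse(dto.Date, out var date))
            return $"Invalid date: {dto.Date}.";

        if (date < now)
            return $"Invalid date: {dto.Date}.";

        return "";
    }
}

public class ValidatorTests
{
    [Theory]
    [InlineData("2011-09-11 18:31", "2011-09-11 18:30", 3)]
    [InlineData("2019-11-27 00:00", "2019-11-26 19:00", 2)]
    [InlineData("2031-01-01 12:00", "2030-10-03 00:00", 2)]
    public void PastDate(string now, string reservationDate, int quantity)
    {
        var dto = new ReservationDto { Date = reservationDate, Quantity = quantity };
        var actual = Validator.Validate(DateTime.Parse(now), dto);
        Assert.Equal($"Invalid date: {reservationDate}.", actual);
    }
}
```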

Summary #

Most programming languages don't explicitly distinguish between pure and impure code. This doesn't make it impossible to do functional programming, but it makes it arduous. Since the language doesn't help you, you must constantly review changes to the code and its dependencies to evaluate whether code that's supposed to be pure remains pure.

Tests can help, particularly if you employ property-based testing, but vigilance is still required.
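As a rough illustration of the property-based idea, here is a hand-rolled sketch that uses a seeded System.Random rather than a property-based testing library. The ValidatorProperties class, its method name, and the value ranges are assumptions made for illustration. It checks one property of the Validate version that takes now as an argument: a well-formed reservation is rejected exactly when it lies before now.

```csharp
using System;
using System.Globalization;

// Stand-in for the article's DTO; only the properties the examples use.
public class ReservationDto
{
    public string Date { get; set; } = "";
    public int Quantity { get; set; }
}

public static class Validator
{
    public static string Validate(DateTime now, ReservationDto dto)
    {
        if (!DateTime.TryParse(dto.Date, out var date))
            return $"Invalid date: {dto.Date}.";

        if (date < now)
            return $"Invalid date: {dto.Date}.";

        return "";
    }
}

// Hypothetical helper, not from the article.
public static class ValidatorProperties
{
    // Returns true if, for every generated case, Validate accepts the
    // reservation exactly when it is not before 'now'.
    public static bool RejectsExactlyPastReservations(int seed, int cases)
    {
        var rnd = new Random(seed); // seeded, so every run generates the same cases
        var origin = new DateTime(2020, 1, 1);

        for (var i = 0; i < cases; i++)
        {
            // Whole minutes only, so formatting to text loses no precision.
            var now = origin.AddMinutes(rnd.Next(0, 5_000_000));
            var offset = rnd.Next(-100_000, 100_000);
            var date = now.AddMinutes(offset);

            var dto = new ReservationDto
            {
                Date = date.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture),
                Quantity = 1
            };

            var accepted = Validator.Validate(now, dto) == "";
            if (accepted != (offset >= 0))
                return false;
        }

        return true;
    }
}
```

A check like this is only feasible because now is a plain value. With DateTime.Now inside Validate, there would be no way to generate the time as part of each case.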

While Haskell isn't a mainstream programming language, I find that it helps me flush out my wrong assumptions about functional programming. I write many prototypes and proofs of concept in Haskell for that reason.

Once you get the hang of it, it becomes easier to spot sources of impurity in other languages as well.

  • Anything with the void return type must be assumed to induce side effects.
  • Everything that involves random numbers is non-deterministic.
  • Everything that relies on the system clock is non-deterministic.
  • Generating a GUID is non-deterministic.
  • Everything that involves input/output is non-deterministic. That includes the file system and everything that involves network communication. In C# this implies that all asynchronous APIs should be considered highly suspect.

If you want to harvest the benefits of functional programming in a mainstream language, you must look out for such pitfalls. There's no tooling to assist you.
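The system clock item on that list can be handled the way the now parameter was handled earlier: read the impure value at the edge of the system and pass it inward as plain data. The following is a minimal sketch under that assumption. ReservationEndpoint and its Post method are hypothetical names that don't appear in the article, and ReservationDto is a stand-in.

```csharp
using System;

// Stand-in for the article's DTO; only the properties the examples use.
public class ReservationDto
{
    public string Date { get; set; } = "";
    public int Quantity { get; set; }
}

public static class Validator
{
    // Pure: the current time arrives as an ordinary argument.
    public static string Validate(DateTime now, ReservationDto dto)
    {
        if (!DateTime.TryParse(dto.Date, out var date))
            return $"Invalid date: {dto.Date}.";

        if (date < now)
            return $"Invalid date: {dto.Date}.";

        return "";
    }
}

// Hypothetical entry point, not from the article.
public static class ReservationEndpoint
{
    // Impure: reads the system clock once, then delegates to the pure function.
    public static string Post(ReservationDto dto)
    {
        var now = DateTime.Now;
        return Validator.Validate(now, dto);
    }
}
```

Only Post touches the clock, so it's the only member that needs the extra scrutiny the list above calls for. Validate can still be tested with fixed values.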
