source link: https://blog.ploeh.dk/2020/12/21/a-haskell-proof-of-concept-of-validation-with-partial-data-round-trip/
A Haskell proof of concept of validation with partial data round trip by Mark Seemann
Which Semigroup best addresses the twist in the previous article?
This article is part of a short article series on applicative validation with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that was valid.
In this article, I'll show how I did a quick proof of concept in Haskell.
Data definitions #
You can't use the regular Either instance of Applicative for validation because it short-circuits on the first error. In other words, you can't collect multiple error messages, even if the input has multiple issues. Instead, you need a custom Applicative instance. You can easily write such an instance yourself, but there are a couple of libraries that already do this. For this prototype, I chose the validation package.
    import Data.Bifunctor
    import Data.Time
    import Data.Semigroup
    import Data.Validation
Apart from importing Data.Validation, I also need a few other imports for the proof of concept. All of them are well-known. I used no language extensions.
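To make the difference from Either concrete, here is a minimal sketch of what a hand-written accumulating instance could look like. It is not the code of the validation package; the Validation type below is a stand-in defined only for this illustration, and nonEmpty and pairUp are hypothetical helpers.

```haskell
-- Hypothetical stand-in for the Validation type from the validation package.
data Validation err a = Failure err | Success a
  deriving (Eq, Show)

instance Functor (Validation err) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, two failures are combined with (<>) instead of
-- keeping only the first one.
instance Semigroup err => Applicative (Validation err) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical check, used only to exercise the instance.
nonEmpty :: String -> String -> Validation [String] String
nonEmpty label "" = Failure [label ++ " is required"]
nonEmpty _     s  = Success s

-- Both checks run, and both error messages survive.
pairUp :: String -> String -> Validation [String] (String, String)
pairUp a b = (,) <$> nonEmpty "first" a <*> nonEmpty "second" b
```

In GHCi, pairUp "" "" evaluates to Failure ["first is required","second is required"], whereas the analogous Either expression (,) <$> Left ["a"] <*> Left ["b"] stops at Left ["a"].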
For the proof of concept, the input is a triple of a name, a date of birth, and an address:
    data Input = Input {
      inputName :: Maybe String,
      inputDoB :: Maybe Day,
      inputAddress :: Maybe String }
      deriving (Eq, Show)
The goal is actually to parse (not validate) Input into a safer data type:
    data ValidInput = ValidInput {
      validName :: String,
      validDoB :: Day,
      validAddress :: String }
      deriving (Eq, Show)
If parsing/validation fails, the output should report a collection of error messages and return the Input value with any valid data retained.
Looking for a Semigroup #
My hypothesis was that validation, even with that twist, can be implemented elegantly with an Applicative instance. The validation package defines its Validation data type such that it's an Applicative instance as long as its error type is a Semigroup instance:
Semigroup err => Applicative (Validation err)
The question is: which Semigroup can we use?
Since we need to return both a list of error messages and a modified Input value, it sounds like we'll need a product type of some sort. A tuple will do; something like (Input, [String]). Is that a Semigroup instance, though?
Tuples only form semigroups if both elements give rise to a semigroup:
(Semigroup a, Semigroup b) => Semigroup (a, b)
The second element of my candidate is [String], which is fine. Lists are Semigroup instances. But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.
What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.
So the Semigroup instance we need is (Endo Input, [String]), and the validation output should be of the type Validation (Endo Input, [String]) a.
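As a small illustration of why this pair works as the error type, the sketch below combines two values of type (Endo Input, [String]) and then runs the composed function against an original input. The Input record here is a simplified stand-in (the date of birth is a String, only to keep the sketch free of the time package), and resetName, resetDoB, bothErrors and retained are hypothetical names.

```haskell
import Data.Monoid (Endo (..))

-- Simplified stand-in for the article's Input (String instead of Day).
data Input = Input
  { inputName    :: Maybe String
  , inputDoB     :: Maybe String
  , inputAddress :: Maybe String
  } deriving (Eq, Show)

-- Each error carries a function that blanks the offending field.
resetName, resetDoB :: (Endo Input, [String])
resetName = (Endo $ \x -> x { inputName = Nothing }, ["bad name"])
resetDoB  = (Endo $ \x -> x { inputDoB = Nothing }, ["bad dob"])

-- (<>) on the pair composes the functions and concatenates the messages.
bothErrors :: (Endo Input, [String])
bothErrors = resetName <> resetDoB

-- Apply the composed function to the original input.
retained :: Input -> (Input, [String])
retained original = (appEndo (fst bothErrors) original, snd bothErrors)
```

Applied to Input (Just "Bob") (Just "2002-10-12") (Just "x"), retained blanks the name and the date of birth, keeps the address, and returns both messages in order.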
Validators #
Cool, we can now implement the validation logic; a function for each field, starting with the name:
    validateName :: Input -> Validation (Endo Input, [String]) String
    validateName (Input (Just name) _ _) | length name > 3 = Success name
    validateName (Input (Just _) _ _) =
      Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"])
    validateName _ = Failure (mempty, ["name is required"])
This function reproduces the validation logic implied by the forum question that started it all. Notice, particularly, that when the name is too short, the endomorphism resets inputName to Nothing.
The date-of-birth validation function works the same way:
    validateDoB :: Day -> Input -> Validation (Endo Input, [String]) Day
    validateDoB now (Input _ (Just dob) _)
      | addGregorianYearsRollOver (-12) now < dob = Success dob
    validateDoB _ (Input _ (Just _) _) =
      Failure (Endo $ \x -> x { inputDoB = Nothing }, ["get off my lawn"])
    validateDoB _ _ = Failure (mempty, ["dob is required"])
Again, the validation logic is inferred from the forum question, although I found it better to keep the function pure by requiring a now argument.
The address validation is the simplest of the three validators:
    validateAddress :: Monoid a => Input -> Validation (a, [String]) String
    validateAddress (Input _ _ (Just a)) = Success a
    validateAddress _ = Failure (mempty, ["add1 is required"])
This one's return type is actually more general than required, since I used mempty instead of Endo id. This means that it actually works for any Monoid a, which also includes Endo Input.
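The generality of mempty can be seen in isolation with a short sketch. The helper requiredError is hypothetical and only mirrors the shape of the failure case above; Int stands in for Input to keep the example small.

```haskell
import Data.Monoid (Endo (..))

-- Hypothetical helper mirroring the failure case of validateAddress.
requiredError :: Monoid a => String -> (a, [String])
requiredError field = (mempty, [field ++ " is required"])

-- At Endo Int, mempty is the endomorphism that changes nothing.
unchanged :: Int
unchanged = appEndo (fst (requiredError "add1" :: (Endo Int, [String]))) 42

-- The same helper also works at any other Monoid, such as a list.
asList :: ([Int], [String])
asList = requiredError "add1"
```

Here unchanged is 42 and asList is ([], ["add1 is required"]), so the same definition serves both monoids.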
Composition #
All three functions return Validation (Endo Input, [String]), which has an Applicative instance. This means that we should be able to compose them together to get the behaviour we're looking for:
    validateInput :: Day -> Input -> Either (Input, [String]) ValidInput
    validateInput now args =
      toEither $
      first (first (`appEndo` args)) $
      ValidInput <$> validateName args <*> validateDoB now args <*> validateAddress args
That compiles, so it probably works.
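The nested first (first (`appEndo` args)) step is perhaps the least obvious part of the composition. The sketch below isolates it, using Either in place of Validation (both are Bifunctor instances) and Int in place of Input. The names applyToOriginal and example are hypothetical.

```haskell
import Data.Bifunctor (first)
import Data.Monoid (Endo (..))

-- The outer first maps over the failure side; the inner first maps over
-- the first element of the pair, running the endomorphism on the original.
applyToOriginal :: a -> Either (Endo a, [String]) b -> Either (a, [String]) b
applyToOriginal original = first (first (`appEndo` original))

-- Endo f <> Endo g composes as f . g, so (* 2) runs before (+ 1).
example :: Either (Int, [String]) ()
example = applyToOriginal 10 (Left (Endo (+ 1) <> Endo (* 2), ["e1", "e2"]))
```

Here example evaluates to Left (21,["e1","e2"]), while a Right value passes through applyToOriginal untouched.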
Sanity check #
Still, it'd be prudent to check. Since this is only a proof of concept, I'm not going to set up a test suite. Instead, I'll just start GHCi for some ad-hoc testing:
    λ> now <- localDay <$> zonedTimeToLocalTime <$> getZonedTime
    λ> validateInput now $ Input Nothing Nothing Nothing
    Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
          ["name is required","dob is required","add1 is required"])
    λ> validateInput now $ Input (Just "Bob") Nothing Nothing
    Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
          ["no bob and toms allowed","dob is required","add1 is required"])
    λ> validateInput now $ Input (Just "Alice") Nothing Nothing
    Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
          ["dob is required","add1 is required"])
    λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2002 10 12) Nothing
    Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
          ["get off my lawn","add1 is required"])
    λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2012 4 21) Nothing
    Left (Input {inputName = Just "Alice", inputDoB = Just 2012-04-21, inputAddress = Nothing},
          ["add1 is required"])
    λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2012 4 21) (Just "x")
    Right (ValidInput {validName = "Alice", validDoB = 2012-04-21, validAddress = "x"})
In order to make the output more readable, I've manually edited the GHCi session by adding line breaks to the output.
It looks like it's working like it's supposed to. Only the last line successfully parses the input and returns a Right value.
Conclusion #
Before I started this proof of concept, I had an inkling of the way this would go. Instead of making the prototype in F#, I found it more productive to do it in Haskell, since Haskell enables me to compose things together. I particularly appreciate how a composition of types like (Endo Input, [String]) is automatically a Semigroup instance. I don't have to do anything. That makes the language great for prototyping things like this.
Now that I've found the appropriate semigroup, I know how to convert the code to F#. That's in the next article.
Next: An F# demo of validation with partial data round-trip.
Comments
Great work and excellent post. I just had a few clarification questions.
...But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.
What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.
How rhetorical are those questions? Whatever the case, I will take the bait.
Any product type forms a semigroup if all of its elements do. You explicitly stated this for tuples of length 2; it also holds for records such as Input. Each field on that record has type Maybe a for some a, so it suffices to select a semigroup involving Maybe a. There are a few different semigroups involving Maybe that have different functions.
I think the most common semigroup for Maybe a has the function that returns the first Just _ if one exists or else returns Nothing. Combining that with Nothing as the identity element gives the monoid that is typically associated with Maybe a (and which I know by the name monoidal plus). Another monoid, and therefore a semigroup, is to return the last Just _ instead of the first.
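For reference, in Haskell's base library these two behaviours correspond to the First and Last wrappers from Data.Monoid rather than to the unwrapped Semigroup instance of Maybe a, which requires a Semigroup on a and combines the contents. A small sketch, with firstJust and lastJust as hypothetical names:

```haskell
import Data.Monoid (First (..), Last (..))

-- Keeps the leftmost Just, skipping over Nothing.
firstJust :: Maybe Int
firstJust = getFirst (First (Just 1) <> First Nothing <> First (Just 3))

-- Keeps the rightmost Just, skipping over Nothing.
lastJust :: Maybe Int
lastJust = getLast (Last (Just 1) <> Last Nothing <> Last (Just 3))
```

Here firstJust is Just 1 and lastJust is Just 3; in both cases the identity element is the wrapped Nothing.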
Instead of having a preference for Just _, the function could have a preference for Nothing. As before, when both inputs are Just _, the output could be either of the inputs.
I think either of those last two semigroups will achieve the desired behavior in the problem at hand. Your code never replaces an instance of Just a with a different instance, so we don't need a preference for some input when they are both Just _.
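One possible reading of this suggestion, sketched under assumptions: Input itself becomes a Semigroup by combining fields pairwise with a function that prefers Nothing. The record is a simplified stand-in (String instead of Day), preferNothing is a hypothetical helper, and the choice of the left value when both are Just is arbitrary.

```haskell
-- Simplified stand-in for the article's Input (String instead of Day).
data Input = Input
  { inputName    :: Maybe String
  , inputDoB     :: Maybe String
  , inputAddress :: Maybe String
  } deriving (Eq, Show)

-- Hypothetical helper: Nothing wins; if both are Just, keep the left one.
preferNothing :: Maybe a -> Maybe a -> Maybe a
preferNothing (Just x) (Just _) = Just x
preferNothing _        _        = Nothing

-- Field-wise combination, as the comment describes for product types.
instance Semigroup Input where
  Input n1 d1 a1 <> Input n2 d2 a2 =
    Input (preferNothing n1 n2) (preferNothing d1 d2) (preferNothing a1 a2)
```

With this instance, (Input, [String]) would itself be a Semigroup, and each failing validator could return a copy of the input with the offending field set to Nothing. Combining such copies blanks every field that any validator rejected.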
In the end though, I think the semigroup you derived from Endo leads to simpler code.
At the end of the type signature for validateName / validateDoB / validateAddress, what does String / Day / String mean?
Why did you pass all three arguments into every parsing/validation function? I think it is a bit simpler to only pass in the needed argument. Maybe you thought this was good enough for prototype code.
Why did you use add1 in your error message instead of address? Was it only for prototype code to make the message a bit shorter?