5

Pure Print-Style Debugging in Haskell | by Rebecca Skinner | The Pragmatic Progr...

 2 years ago
source link: https://medium.com/pragmatic-programmers/pure-print-style-debugging-in-haskell-c4c5d4f39afa
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Photo by David Clode on Unsplash

Pure Print-Style Debugging in Haskell

Print-Style Debugging with Trace

https://pragprog.com/newsletter/
https://pragprog.com/newsletter/

Print debugging, logging, tracing, this debugging technique goes by many names. Whatever you call it, the technique is the same: add a bunch of code to print messages to the screen, often including the values of some inputs, outputs, and intermediate variables. Run the program a bunch of times, read the debugging output, and hope you can spot what’s going wrong. It’s not the most elegant debugging technique, but print debugging can be incredibly effective.

For developers working in impure languages, print debugging is often the first stop for figuring out where and why something is going wrong. In Haskell, where printing something to the screen is a side effect to avoid, it might seem like you need to give up print debugging for the sake of purity. Thankfully, you can have your cake and eat it too… if you're okay with eating buggy cake.

The Debug.Trace module, which is part of base, provides tools for print-style debugging for our pure functional programs. In this article, we’ll walk through a hands-on example using print-style debugging to narrow down the problem with a small example program. You’ll learn how to apply tracing to finding bugs in a Haskell program, and you’ll learn a little bit about Unicode along the way.

Counting Human-Readable Characters

Photo by Glen Carrie on Unsplash

For this example, we’re going to write a program to count the number of characters in a document. We would like our program to print the same number of characters to the screen that we’d get if we counted the characters by hand. We’ll include things like spaces and punctuation, but we’ll exclude any non-printable characters, like new lines or control characters.

Let’s start with a straightforward implementation and see how it works:

This program looks like a good first attempt, and it’ll work for ASCII text, but if we try to count some text that includes emoji we might get the wrong result. Let’s create a file that includes emojis:

This file contains four Unicode emoji symbols:

  1. Female Technologist: 👩‍💻
  2. Pirate Flag: 🏴‍☠️
  3. Pride Flag:🏳️‍🌈
  4. Female Astronaut: 👩‍🚀

If we save this file and then try to count the characters in it, we’d expect our program to return 4, since there are four symbols in the file. Unfortunately, that’s not the case. Let’s run it and see what happens:

Somehow our four characters have turned into ten! Clearly something has gone wrong. In theory, we could change our program so that charCounts ran in IO and then debug by printing the characters, or the entire string, to the screen. For a small example like this, that approach would work, but in a larger program, refactoring a pure function into an impure one can be a lot of work. We’d rather not undertake a significant refactor just to debug our application.

Our First Trace

As an alternative to calling print inside of an IO action, we can use the tracing functions that are available in Debug.Trace. The trace function, and its variations, let us print text to the screen without having to make our function run in IO. Let’s try it out:

In this example, we’ve imported two new modules. Debug.Trace gives us access to trace — responsible for printing a message to the screen. We’re also using printffrom the Text.Printf module. Don’t let the name mislead you, printf is only formatting the message. It doesn’t print anything to the screen itself. You can read more about printf and format strings in the documentation on hackage.

If we run this version of the program, we’ll start to get some useful information:

There are a couple of important things to see here, and we’ll work through them one at a time. The first thing you’ll notice is that we’re getting what looks like an empty character, but its hex code is 0xfe0f. If you look up this particular code, you’ll see that it’s a Unicode Variant Selector. Specifically, this is a control character that’s telling whatever is rendering the text that we want the full color emoji version of the symbol. Unfortunately, it looks like this control character is classified as a printing character, so our original code will need to account for this character in particular. Let’s refactor to account for the variant selector:

Our code is getting a bit longer now. We’ve added variantSelectors to list all of the different variant selectors we might encounter. We’ve also replaced our call to isPrint with a call to our new isActualPrint function. This function will not only filter out the variant selectors when we have a printable character, it will also print out a debug message so that we can make sure it’s behaving as we expect.

If we run this code, we’ll see that we’re not filtering out the variant selectors when we count characters. As we’d hope, each time we would have previously seen the variant selector in our debugging output, we now see the message that we’ve found a variant selector:

Our variant selectors are getting filtered out now, and we’re getting closer to a right answer. Our program thinks there are eight symbols, which is wrong, but it’s at least a result that’s an even multiple of the right answer, so it seems like we’re on the right track.

Zero-Width Joiners

Photo by Markus Winkler on Unsplash

If you’re familiar with Unicode emoji, you might have spotted the problem already. If not, let’s add a little bit of extra debugging to see if we can spot the problem:

In this example, we’ve added a new trace statement to isActualPrint that will show us the value of any space characters that we were otherwise ignoring. If we run this code, we’ll see something curious:

It appears that a character is getting inserted between several of our individual emoji symbols. If you look closely, you’ll see a pattern:

  • 👩‍💻 has become 👩 0x200d💻
  • 🏴‍☠️ is now 🏴 0x200d ‍☠
  • 🏳️‍🌈 becomes 🏳 0x200d 🌈, plus a variant selector
  • 👩‍🚀 completes the pattern as 👩 0x200d ‍🚀

It turns out that many of the emoji that we see every day are not defined by a particular character. Instead, they are combinations of other emoji. We join emoji like this using the zero-width joiner — not a printing character, but it definitely affects how we should count characters if we want to reflect the way a person reading the screen might intuitively count them.

Working with Zero-Width Joiners

Let’s refactor our program to handle the fact that zero-width joiners are telling us that some symbols should be visually combined, even if they are distinct symbols when we are looking at individual characters. We’ll remove our old isActualPrint function, and replace it with classifyCharacter. This function will tell us what kind of character we’re dealing with, so we can handle it appropriately. We’ll also add in some additional debugging messages so that we can check the output against our expectations.

This example introduces quite a few new changes over our previous version. First, we’ve added a new type, CharacterClass, that will tell us what kind of character we’re dealing with. We’re classifying characters now, rather than just asking if they are printable. Finally, in charCounts we’re subtracting from the overall character count if we find a zero-width joiner.

The way that we’re approaching debugging in charCounts has changed slightly as well. We’re no longer adding debugging messages directly into the logic we want to test. Instead, we’ve defined a wrapper function called debugCharacterCount that will handle the debugging for us. Adding instrumentation functions that enhance a function call with debugging is a useful technique when you are using print-style debugging, because it gives you a single set of functions that you can remove to clean up all of your debugging. Debugging can be tricky in an impure language. If your code can have arbitrary side effects, and your instrumentation functions call the code multiple times when generating debugging output, then debugging might change the results of your program. In most cases, such changes aren’t a risk in Haskell. Since we can call pure functions as many times as we like without any side effects, we don’t have to worry as much about instrumentation changing the behavior of our program. Just keep in mind that trace is a living example of the fact that Haskell does have some escape hatches that let you write impure code. If you instrument code that also does its own tracing, for example, the results might be unpredictable.

Now that we have some new code, and new debugging messages, let’s give our program another run and see what we get:

It looks like we have a bug. Our program is now claiming that the input has no characters. Maybe we’re not adding or subtracting like we think? Let’s add some more debugging to find out:

Immediately after warning that we should be careful about having an instrumentation function call another function that does tracing, we’ve gone and done just that. Since we’re calling countCharacter exactly once per call to our instrumentation function, we should be fine. No duplicate side effects to worry about. Let’s run it and see what happens:

Well, this is concerning. Not only do we still have a bug, it seems like our new debugging output isn’t being called either. What’s going on?

Debugging in the Land of Laziness

Photo by Meg Jerrard on Unsplash

When you are trying to use print-style debugging in Haskell and find that you’re suddenly not seeing the debugging output you expected, there are two probably causes:

  1. Instrumentation. If you are using an instrumentation function to add debugging, you might have forgot to change the calls. You might still be calling the non-instrumented versions of the function.
  2. Laziness. The code might not be running because of laziness.

We’re certainly calling our instrumented code, because we see all of the extra output that it’s generating. So we might be missing our debugging output because of laziness.

When debugging Haskell code, especially using a relatively impure approach to debugging like the print-style debugging we’re using in these examples, it’s easy to get stuck in the mindset of a traditional strict imperative language. We read the code and see that it’s being called, and we expect to see our output make it to the screen. Calls to trace in our Haskell code are still expressions though, and they are still expressions that are evaluated only if their results are needed. We can see this behavior in action with a short example:

In this example, we call trace several times with various intermediate values, but if we run unusedTraces nothing is printed to the screen. Nothing prints because none of those intermediate values are ever evaluated.

Perhaps the reason we’re not seeing any debugging messages is that our code isn’t being evaluated. Debugging is nothing if not testing hypotheses, so let’s try to force our call to countCharacter to be evaluated, so that we can see if that fixes the problem. One way to ensure our code is being evaluated is to tell the compiler to make that particular expression strict. We can do that with a bang pattern.

To create a bang pattern, we need to add an exclamation point before the name of the binding that we want to make strict. Let’s update our example to make newCount strict:

This example is largely similar to the previous version, but on line 35 we’ve added a bang pattern to ensure that newCount is now strictly evaluated. We’ve also added the BangPatterns language extension. You won’t need this extension if you are using version 9.0 or later of GHC, but at the time of this writing, older versions of GHC are still in wide use, and you’ll need the extension there.

If we run our program with the bang pattern added, we’ll see that we’re still getting our buggy output, but the new debugging messages that we added are showing up as we’d hoped:

Not only are we seeing debugging output, but the output is what we’d expect. What’s going on here?

It turns out the fact that our debugging output wasn’t being displayed was a good clue. We weren’t using our updated value. If you look at line 39 of our example, you’ll see that we’re actually returning our original count, rather than the new one we’ve just calculated. It turns out we were never updating the count at all!

When you’re debugging Haskell programs, the absence of useful information is often a powerful clue that you might have a problem caused by some code not being evaluated when you expect it to be. In our case, a compiler warnings or a linter can tell us that we have unused values, but in more complicated programs it might not be immediately obvious that some code is never being evaluated. Adding bang patterns and observing changes to the output can be a tip-off that your code isn’t being run when you expect. Frequently, a single trace message at the start of a function call to tell you when it’s been evaluated can put you on a path toward identifying bugs related to to laziness.

Now that we thing we know the problem, let’s add a fix and run our application with debugging one more time. In this example, we’ll remove our bang pattern and fix the return value of debugCharacterCount so that it uses our newly updated value:

If we run this code, we get exactly what we’d hoped: we’re adding and subtracting when we should be, and we correctly count 4 characters in the file:

Cleaning Up

The last thing to do is remove all of our extraneous debugging. Since we’ve used an instrumentation function for some of our debugging, we can remove that easily. We’ve also mixed some debugging messages into our program logic. That’s harder to remove, but thankfully we have only a couple of instances of messages in our small examples. Once we’ve cleaned everything up, we have a program that’s somewhat longer than our first attempt, but much shorter than our intermediate debugging attempts. Most importantly, it actually works:

Let’s run it, just to be sure:

Perfect, we counted our four emoji correctly 👩‍💻 🏴‍☠️ 🏳️‍🌈👩‍🚀.

Summary

Debugging code is always difficult and often rewarding. Not only will you create programs that work better, you’ll also gain a deeper understanding of your code, the language, and runtime. Debugging Haskell is similar in many ways to debugging any other language, but from time to time unique aspects of the language, like it’s laziness, mean that you’ll need to experiment with different approaches and learn new ways to interpret the debugging output.

Print-style debugging is, of course, not the only way to debug Haskell programs. You can also use tools like the GHC Event Log, step through your programs and set breakpoints interactively with GHCi, generate a call stack to trace through your program, and even use GDB to compiled Haskell applications. All of these techniques offer their own benefits and drawbacks.

With the tools that Haskell gives you to write programs that don’t fail, and the tools it gives you to debug your programs when they fail anyway, you’ll be better prepared to deliver high quality code, whether you’re using Haskell for your own learning, side projects, or at work.

Be sure to check out Rebecca Skinner’s book from The Pragmatic Bookshelf, now in beta. You’ll find a code that saves you 35% on the book’s forum page on DevTalk, where you can also contribute to the conversation about the beta. Promo codes are not valid on prior purchases.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK