

Nibble Stew: Unit testing PDF generation
source link: https://nibblestew.blogspot.com/2023/02/unit-testing-pdf-generation.html

A gathering of development thoughts of Jussi Pakkanen. Some of you may know him as the creator of the Meson build system.
Monday, February 27, 2023
Unit testing PDF generation
How would you test PDF generation?
This turns out to be unexpectedly difficult because you need to check that the files are both syntactically and semantically valid. The former could be tested with existing PDF validators and converters but the latter is more difficult.
If you, say, try to render a red square, the end result should be that the PDF command stream has two commands, a re command and an f command. That could be verified simply by grepping the command stream with some regexps. It would not be very useful, though, as there is no guarantee that those commands actually produce a red square in the output document. There are dozens of ways to make the output stream not produce a red square in the intended location without breaking file "validity" in any way.
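To make the weakness concrete, here is a minimal sketch of such a grep-style check. The function name and the sample streams are made up for illustration. The only PDF facts it relies on are the standard operators rg (set fill color), re (append rectangle) and f (fill).

```python
import re

# Hypothetical helper: a purely syntactic check on a PDF content stream.
RECT_CMD = re.compile(r"^\s*(-?\d+(?:\.\d+)?\s+){4}re\s*$", re.MULTILINE)
FILL_CMD = re.compile(r"^\s*f\s*$", re.MULTILINE)


def looks_like_filled_rect(stream: str) -> bool:
    """Return True if the stream contains an 're' line and an 'f' line."""
    return bool(RECT_CMD.search(stream)) and bool(FILL_CMD.search(stream))


# The intended output: a red 100x100 square at (10, 10).
GOOD = "1 0 0 rg\n10 10 100 100 re\nf\n"
# Same commands, but the fill color is blue.
WRONG_COLOR = "0 0 1 rg\n10 10 100 100 re\nf\n"
# Same commands, but the square is drawn far outside an A4-sized page.
OFF_PAGE = "1 0 0 rg\n5000 5000 100 100 re\nf\n"

for name, stream in [("good", GOOD), ("wrong color", WRONG_COLOR), ("off page", OFF_PAGE)]:
    print(name, looks_like_filled_rect(stream))
```

All three streams pass the check, yet only the first one would show a red square where we wanted it.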
What even is the truth?
The only reliable way is to render the PDF file into an image and compare it to a ground truth image. If the output is "close enough", the generator can be said to have worked correctly. The problem, as is often the case, lies inside those quote marks. Fuzzy image equality is a difficult problem. Those interested in the details can look it up online. For our case we'll just ignore it and require pixel-perfect reproduction. This means that tests can fail if we change the PDF rendering backend, run it on a different operating system or even just upgrade it to a new version.
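The difference between the two approaches can be shown with a toy example. This is only a sketch using the standard library: the Image class and the helper names are invented here, and a real test would load actual rendered images instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for a rendered page: packed 8-bit RGB, row-major."""
    width: int
    height: int
    pixels: bytes


def solid(width: int, height: int, rgb: tuple) -> Image:
    """Create an image filled with a single color."""
    return Image(width, height, bytes(rgb) * (width * height))


def pixel_perfect(a: Image, b: Image) -> bool:
    """Strict comparison: size and every byte must match."""
    return a == b


def max_channel_diff(a: Image, b: Image) -> int:
    """Largest per-channel difference, the basis of a naive fuzzy comparison."""
    if (a.width, a.height) != (b.width, b.height):
        raise ValueError("image sizes differ")
    return max((abs(x - y) for x, y in zip(a.pixels, b.pixels)), default=0)


truth = solid(4, 4, (255, 0, 0))

# Simulate a renderer that rounds one red channel value slightly differently.
data = bytearray(truth.pixels)
data[0] = 254
nearly = Image(4, 4, bytes(data))

print(pixel_perfect(truth, nearly))     # strict check rejects it
print(max_channel_diff(truth, nearly))  # a tolerance of 1 would accept it
```

Picking the tolerance (and deciding whether a per-channel maximum is even the right metric) is exactly the part hiding inside the quote marks, which is why the strict check is the simple option.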
The other problem comes from the desire to have a plain C API. Writing unit tests in C is cumbersome, to say the least. Fortunately there is a simpler solution. Since the project ships its own Python bindings, we can write all of these tests in Python. This gives us all the niceties that exist in Python, such as an extensive unit testing framework, the ability to easily spawn external processes and image difference operators (via PIL). After some boilerplate, writing a unit test reduces to a few lines of Python.
Behind the scenes this will generate the PDF, render it with Ghostscript and compare the result to an existing image. If the output is not bitwise identical the test fails.
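The render-and-compare step could look roughly like the sketch below. This is not the project's actual test code: the function names are invented, and comparing the PNG files byte for byte is an assumption (comparing decoded pixels with PIL would work just as well). The Ghostscript options used are the standard ones for batch rendering to a 24-bit PNG.

```python
import filecmp
import subprocess


def ghostscript_command(pdf_path, png_path, dpi=72):
    """Build a Ghostscript command line that renders a PDF to a 24-bit PNG."""
    return [
        "gs",
        "-q",                # suppress startup messages
        "-dNOPAUSE",         # do not prompt between pages
        "-dBATCH",           # exit when done
        "-sDEVICE=png16m",   # 24-bit RGB PNG output
        f"-r{dpi}",          # rendering resolution
        f"-sOutputFile={png_path}",
        str(pdf_path),
    ]


def render_matches(pdf_path, truth_png, out_png, run=subprocess.check_call):
    """Render pdf_path to out_png and compare it bytewise with truth_png.

    The 'run' parameter exists only so the renderer can be swapped out
    in environments without Ghostscript.
    """
    run(ghostscript_command(pdf_path, out_png))
    return filecmp.cmp(out_png, truth_png, shallow=False)
```

A test case would then generate its PDF into a temporary file and assert that render_matches returns True for the stored ground truth image.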
Get the code