
Using Watermarks in iText 7

 2 years ago
source link: https://dzone.com/articles/using-watermarks-in-itext-7

Preface

iText 7 is a powerful library for PDF manipulation. This article is the third in a series dedicated to the iText library. The previous articles in this series are:

Several articles already explain how to add a watermark to a PDF (e.g. https://kb.itextpdf.com/home/it7kb/faq/how-to-watermark-pdfs-using-text-or-images). Unfortunately, none of them served me well, so I summarize my own investigation here.

In This Article, You Will Learn:

  • What a digital watermark is
  • How to generate a PDF with a watermark
  • How to add a watermark to an existing PDF
  • What pitfalls are related to watermarks

Introduction

Wikipedia defines a "digital watermark" as follows:

A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio, video or image data. It is typically used to identify ownership of the copyright of such signal.

In a PDF, a watermark is one or more elements (usually a text or an image) added over the regular content. The typical use cases for a watermark are:

  • A transparent label (e.g. preview/confidential/internal/restricted) or page information (e.g. current and total page numbers) or
  • an image (e.g. logo of the company or product).

Preview Watermark

In this article, we add a PREVIEW watermark to a PDF document. This feature improves the user experience by letting users see the PDF output before the document is finalized. The clearly visible PREVIEW label signals that the provided PDF must not be used as a final document. A PDF with the PREVIEW watermark looks like this:

[Figure: a PDF page with the PREVIEW watermark across the whole page]

Note: Of course, the PREVIEW label can be formatted differently. The label can have a different size, color, font family, or even transparency.

iText Solution for Watermarks

Usually, we add the watermark in post-processing, when the page content is already known. There are two solutions for it:

  • a high-level object layout or
  • a low-level content stream manipulation.

Let's start with the first option.

Generate a PDF With Watermark

Before adding a watermark, we need to generate a PDF first. We do it in the standard way (see lines 3-23), as demonstrated in my previous article Introduction to iText 7. The watermark is added at the end, in post-processing, once the PDF content is complete (lines 25-28). Finally, we verify the existence of the watermark on every PDF page (line 30). Each part is described in more detail below.

@Test
void generateWatermarkToPdf() throws IOException {
	String targetPdf = "target/example-watermark-generated.pdf";
	var text1 = """
			Duisnetus ceteros proin ...  Constituamneque constituam.
			""";
	var text2 = """
			Mattisne dui imperdiet ... ut evertitur.
			""";
	var watermark = "PREVIEW";
	var textStyle = TextStyle.builder()
			.fontFamily(HELVETICA)
			.fontSize(100f)
			.rotationInDegrees(45f)
			.opacity(0.5f)
			.build();

	try (var writer = new PdfWriter(targetPdf);
			var pdfDocument = new PdfDocument(writer);
			var document = new Document(pdfDocument)) {

		document.add(new Paragraph(text1));
		document.add(new Paragraph(text2));

		var paragraph = createWatermarkParagraph(watermark, textStyle);
		for (int i = 1; i <= document.getPdfDocument().getNumberOfPages(); i++) {
			addWatermarkToPage(document, i, paragraph, textStyle, 0f);
		}
	}
	verifyPreviewWatermark(targetPdf, watermark);
}

Note: the PDF post-processing is used because we need to know these things:

  • the page size and rotation (in order to place the watermark correctly) and
  • the number of pages (to add the watermark to every page).

The output of this code (a PDF with the watermark) can be found in the figure in the "Preview Watermark" section. This screenshot represents a PDF page with the PREVIEW watermark across the whole page (over regular text).

Generate PDF Content

For the PDF content, we just use two String instances (text1 and text2 variables) with long texts. The rest is simple PDF generation as it was already demonstrated in my previous article Introduction to iText 7.

Note: The values of text1 and text2 variables were reduced for the clarity and readability of the code example.

Creating Watermark Paragraph

For the watermark, we start by creating a paragraph with the "PREVIEW" text, which is later used as the watermark template for every page. The createWatermarkParagraph and createFont methods use only basic features (covered in the Introduction to iText 7 article).

Line 6 is worth mentioning: it sets the opacity (transparency) of the text. We can omit it, but the watermark looks better with it. Note: Unfortunately, I found the opacity a little bit tricky. More details can be found at the end of this article.

Paragraph createWatermarkParagraph(String watermark, TextStyle textStyle) {
	var text = new Text(watermark);
	text.setFont(createFont(textStyle.getFontFamily()));
	text.setFontSize(textStyle.getFontSize());
	text.setFontColor(textStyle.getColor());
	text.setOpacity(textStyle.getOpacity());
	return new Paragraph(text);
}

PdfFont createFont(String fontFamily) {
	try {
		return PdfFontFactory.createFont(fontFamily);
	} catch (IOException e) {
		throw new PdfException("Font creation failed", e);
	}
}
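
The TextStyle type read by createWatermarkParagraph is not an iText class, and its definition is not shown in this article. As an illustration only, such a settings holder could look roughly like the sketch below. Everything here is an assumption: the real class in the repository may differ (it may, for example, be generated by a builder library), and the color field is left out to keep the sketch free of iText types.

```java
// Hypothetical sketch of a TextStyle-like settings holder; not the author's actual class.
final class TextStyle {
    private final String fontFamily;
    private final float fontSize;
    private final float rotationInDegrees;
    private final Float opacity; // null means "opacity not set"

    private TextStyle(Builder builder) {
        this.fontFamily = builder.fontFamily;
        this.fontSize = builder.fontSize;
        this.rotationInDegrees = builder.rotationInDegrees;
        this.opacity = builder.opacity;
    }

    static Builder builder() {
        return new Builder();
    }

    String getFontFamily() { return fontFamily; }
    float getFontSize() { return fontSize; }
    float getRotationInDegrees() { return rotationInDegrees; }
    Float getOpacity() { return opacity; }

    // Fluent builder so the call sites read like the ones in the tests.
    static final class Builder {
        private String fontFamily;
        private float fontSize;
        private float rotationInDegrees;
        private Float opacity;

        Builder fontFamily(String value) { this.fontFamily = value; return this; }
        Builder fontSize(float value) { this.fontSize = value; return this; }
        Builder rotationInDegrees(float value) { this.rotationInDegrees = value; return this; }
        Builder opacity(float value) { this.opacity = value; return this; }

        TextStyle build() { return new TextStyle(this); }
    }

    public static void main(String[] args) {
        // "Helvetica" stands in for the HELVETICA constant used in the tests.
        TextStyle style = TextStyle.builder()
                .fontFamily("Helvetica")
                .fontSize(100f)
                .rotationInDegrees(45f)
                .opacity(0.5f)
                .build();
        System.out.println(style.getFontFamily() + ", size " + style.getFontSize()
                + ", opacity " + style.getOpacity());
    }
}
```

Keeping the opacity as a nullable Float lets a style without an explicit opacity (as in the CONFIDENTIAL example later on) be distinguished from a fully transparent one.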

Adding Watermark

The main part happens in the addWatermarkToPage method where we add the watermark to the document. This action is done in the following steps:

  1. Collect the page information mentioned earlier (lines 3-4).
  2. Calculate the position and the rotation for placing the watermark paragraph on the pdfPage (lines 6-9).
  3. Add the watermark paragraph to the document by calling the showTextAligned method with several arguments (line 10). All these arguments are important for the final appearance of the watermark in the PDF document. Note: the pageIndex argument specifies the PDF page where the watermark is placed (see point 1).

void addWatermarkToPage(Document document, int pageIndex, 
                        Paragraph paragraph, TextStyle textStyle, float verticalOffset) {
	var pdfPage = document.getPdfDocument().getPage(pageIndex);
	var pageSize = pdfPage.getPageSizeWithRotation();

	float x = (pageSize.getLeft() + pageSize.getRight()) / 2;
	float y = (pageSize.getTop() + pageSize.getBottom()) / 2;
	float xOffset = textStyle.getFontSize() / 2;
	float rotationInRadians = (float) (PI / 180 * textStyle.getRotationInDegrees());
	document.showTextAligned(paragraph, x - xOffset, y + verticalOffset, pageIndex, CENTER, TOP, rotationInRadians);
}
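
To make the arithmetic in addWatermarkToPage concrete, here is a small worked sketch in plain Java (no iText involved). The WatermarkGeometry class and its Placement type are hypothetical names introduced only for this illustration. The page bounds are passed in as plain numbers, using an A4 portrait page of 595 x 842 points as sample input.

```java
// Hypothetical helper, not part of iText: mirrors the placement arithmetic of addWatermarkToPage.
final class WatermarkGeometry {

    /** Anchor point (in PDF points) and rotation (in radians) for the watermark. */
    static final class Placement {
        final float x;
        final float y;
        final float rotationInRadians;

        Placement(float x, float y, float rotationInRadians) {
            this.x = x;
            this.y = y;
            this.rotationInRadians = rotationInRadians;
        }
    }

    static Placement place(float left, float bottom, float right, float top,
                           float fontSize, float rotationInDegrees, float verticalOffset) {
        // The center of the page is the midpoint of its bounds.
        float centerX = (left + right) / 2;
        float centerY = (top + bottom) / 2;
        // Shift left by half the font size, as the article's method does.
        float xOffset = fontSize / 2;
        // Math.toRadians performs the same degrees-to-radians conversion as PI / 180 * degrees.
        float radians = (float) Math.toRadians(rotationInDegrees);
        return new Placement(centerX - xOffset, centerY + verticalOffset, radians);
    }

    public static void main(String[] args) {
        // Sample input: A4 portrait, 100 pt font, 45 degree rotation, no vertical offset.
        Placement p = place(0f, 0f, 595f, 842f, 100f, 45f, 0f);
        System.out.println("x=" + p.x + ", y=" + p.y + ", rotation=" + p.rotationInRadians);
    }
}
```

For this input, the anchor point is x = 247.5 and y = 421.0, and the rotation is a quarter of pi in radians.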

Verify Watermark

At the end, we just verify the existence of the "PREVIEW" text on every page. This is a very dumb test, but it serves our purpose to verify the existence of the PREVIEW watermark in the generated PDF.

Note: please check the "Test Generated PDF Content" section in the Introduction to iText 7 if you need more information about this part.

void verifyPreviewWatermark(String targetPdf, String watermark) throws IOException {
	var extStrategy = new LocationTextExtractionStrategy();
	try (var pdfDocument = new PdfDocument(new PdfReader(targetPdf))) {
		for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
			var textFromPage = getTextFromPage(pdfDocument.getPage(i), extStrategy);
			assertThat(textFromPage).contains(watermark);
		}
	}
}

Adding Watermark to an Existing PDF

Sometimes we just need to add the watermark to an already existing PDF. In order to do that, we need to follow these steps:

  • Load an already existing PDF (lines 10-13).
  • Prepare the watermark paragraph - same as above (lines 2-9 & line 15).
  • Add a watermark (line 18) and store it into the desired PDF (line 12).
  • Verify the watermark existence - same as before (line 22).

@Test
void addWatermarkToExistingPdf() throws IOException {
	var watermark = "CONFIDENTIAL";
	var textStyle = TextStyle.builder()
			.color(BLUE)
			.fontFamily(HELVETICA)
			.fontSize(50f)
			.rotationInDegrees(20f)
			.build();
	String targetPdf = RESULT_PATH + "/example-watermark-modified.pdf";

	try (var pdfDoc = new PdfDocument(new PdfReader(SOURCE_PDF), new PdfWriter(targetPdf))) {
		var document = new Document(pdfDoc);

		var paragraph = createWatermarkParagraph(watermark, textStyle);
		var transparentGraphicState = new PdfExtGState().setFillOpacity(0.5f);
		for (int i = 1; i <= document.getPdfDocument().getNumberOfPages(); i++) {
			addWatermarkToPage(document, i, paragraph, transparentGraphicState, textStyle, 330f);
		}
	}

	verifyPreviewWatermark(targetPdf, watermark);
}

You can see the output of the watermark based on the code above in the next screenshot.

[Figure: a PDF page with the CONFIDENTIAL watermark]

Modify Existing PDF

To load the PDF, we can use almost the same approach as for the PDF verification (see above). However, this time we need to create the PdfDocument instance with both a PdfReader and a PdfWriter (see line 12). Basically, iText reads the existing PDF through the PdfReader and writes the modified content to the PdfWriter.

Adding Watermark

We also use the same approach as in the previous example. However, this time we use a low-level content stream manipulation (instead of the high-level object layout) for setting the opacity to the watermark. Note: this approach is used in all watermark examples I've found on the internet. 

This solution uses a graphics state wrapper: the desired opacity is set on an instance of the PdfExtGState class instead of on the Paragraph. Note: we can reuse this wrapper (the PdfExtGState instance) on every page or create a new one for each page.

The addWatermarkToPage method works almost the same as the one before, but there are some significant changes:

  1. Create a PdfCanvas instance from the PdfPage in order to write data to the PDF content stream (line 9).
  2. Save the current state (line 10) and switch the canvas state to the graphicState passed from the method's argument (line 11).
  3. Add the watermark paragraph to the page - same as before (lines 12-14).
  4. Restore saved state from point 2 (line 16).
  5. Optionally: flush the document changes (line 15 - see the reason for it at the end) and release the canvas data (line 17). Note: these steps are useful for stability and performance reasons.

void addWatermarkToPage(Document document, int pageIndex, Paragraph paragraph, PdfExtGState graphicState, TextStyle textStyle,
		float verticalOffset) {
	var pdfDoc = document.getPdfDocument();
	var pdfPage = pdfDoc.getPage(pageIndex);
	var pageSize = pdfPage.getPageSizeWithRotation();

	float x = (pageSize.getLeft() + pageSize.getRight()) / 2;
	float y = (pageSize.getTop() + pageSize.getBottom()) / 2;
	var over = new PdfCanvas(pdfDoc.getPage(pageIndex));
	over.saveState();
	over.setExtGState(graphicState);
	float xOffset = textStyle.getFontSize() / 2;
	float rotationInRadians = (float) (PI / 180 * textStyle.getRotationInDegrees());
	document.showTextAligned(paragraph, x - xOffset, y + verticalOffset, pageIndex, CENTER, TOP, rotationInRadians);
	document.flush();
	over.restoreState();
	over.release();
}

Known Pitfalls

While playing with watermarks in iText 7, I struggled with two main issues. Both were discussed in PR #80, and the important parts are summarized here.

Automatic Flushing of PDF Content

We should be very careful when post-processing PDF content immediately after generating it, that is, while the PDF content is still in memory (before the PdfDocument is closed). This approach was used in our first example in the "Generate a PDF With Watermark" section.

To limit memory usage, the iText library flushes the PDF content every few pages (after just two pages in my experience). When we try to modify PDF content that has already been flushed, we get an ugly NullPointerException (NPE). See the stack trace here:

java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because "this.map" is null
	at com.itextpdf.kernel.pdf.PdfDictionary.get(PdfDictionary.java:462)
	at com.itextpdf.kernel.pdf.PdfDictionary.getAsArray(PdfDictionary.java:160)
	at com.itextpdf.kernel.pdf.PdfPage.getMediaBox(PdfPage.java:569)
	at com.itextpdf.kernel.pdf.PdfPage.getPageSize(PdfPage.java:135)
	at com.itextpdf.kernel.pdf.PdfPage.getPageSizeWithRotation(PdfPage.java:144)
	at com.github.aha.poc.itext.WatermarkTests.addWatermarkToPage(WatermarkTests.java:79)
	at com.github.aha.poc.itext.WatermarkTests.generateWatermarkToPdf(WatermarkTests.java:70)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	...

A possible solution is to disable the immediate flush in the Document settings (see the last argument on line 3). We just use another constructor of the Document class.

try (PdfWriter writer = new PdfWriter(targetPdf);
		PdfDocument pdfDocument = new PdfDocument(writer);
		Document document = new Document(pdfDocument, PageSize.A4, false)) {

	document.add(new Paragraph(text1));
	document.add(new Paragraph(text2));
	document.add(new Paragraph(text1));
	document.add(new Paragraph(text2));

	var paragraph = createWatermarkParagraph(watermark, textStyle);
	for (int i = 1; i <= document.getPdfDocument().getNumberOfPages(); i++) {
		addWatermarkToPage(document, i, paragraph, textStyle, 0f);
	}
}

Opacity

Another tricky part is the opacity of the watermark. There are basically two ways (high-level and low-level) to specify the opacity of the watermark. Both have pitfalls that can cause the opacity not to be applied.
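
To get a feel for what an opacity of 0.5 means visually, here is a minimal sketch in plain Java (no iText). It assumes the default Normal blend mode and a fully opaque backdrop, in which case each color channel of the result is a weighted average of the watermark color and the backdrop color. The class and method names are illustrative only.

```java
// Illustrative only: blends a watermark color over an opaque backdrop for a given opacity.
final class OpacityBlend {

    /** Blends one 0-255 channel: alpha * source + (1 - alpha) * backdrop. */
    static int blendChannel(int source, int backdrop, float alpha) {
        return Math.round(alpha * source + (1 - alpha) * backdrop);
    }

    /** Blends an RGB triple channel by channel. */
    static int[] blend(int[] sourceRgb, int[] backdropRgb, float alpha) {
        int[] result = new int[3];
        for (int i = 0; i < 3; i++) {
            result[i] = blendChannel(sourceRgb[i], backdropRgb[i], alpha);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] blue = {0, 0, 255};
        int[] white = {255, 255, 255};
        // A blue watermark at 50 % opacity over a white page.
        int[] mixed = blend(blue, white, 0.5f);
        System.out.println(mixed[0] + ", " + mixed[1] + ", " + mixed[2]);
    }
}
```

With an opacity of 1.0 the watermark color fully replaces the backdrop, and with 0.0 the watermark is invisible. The value 0.5 used in this article gives a pale version of the watermark color on a white page, so the text underneath stays readable.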

Object Layout Approach

At the high level, we just add a Paragraph whose Text instance has the opacity set via text.setOpacity(0.5f). This approach avoids the external graphics state, requires less code, and also seems more stable. Some additional comments:

  • We can omit the Text instance and apply all the settings directly to the Paragraph instance:

Paragraph createWatermarkParagraph(String watermark, TextStyle textStyle) {
	return new Paragraph(watermark)
			.setFont(createFont(textStyle.getFontFamily()))
			.setFontSize(textStyle.getFontSize())
			.setFontColor(textStyle.getColor())
			.setOpacity(textStyle.getOpacity());
}

  • PR #80 recommends using text.setFontColor(textStyle.getColor(), textStyle.getOpacity()), but it just didn't work for me.

Content Stream Manipulation

At the low level, we add an external graphics state as an instance of the PdfExtGState class. This approach was demonstrated above. The document.flush() call is worth mentioning.

When we use Document(pdfDocument, PageSize.A4, false) in order to avoid the NPE described above, we need to flush the document explicitly right after adding the watermark. Otherwise, the added watermark is not there at all.

Conclusion

This article has covered two ways to add a watermark to a PDF document. We started by adding the watermark to a completely new PDF document. Next, the article described the PDF modification and added the watermark to an already existing PDF document. We used a different solution for each of these cases. In the final sections, the article covered some known issues and limitations related to watermarks.

The complete source code demonstrated above is available in my GitHub repository.

