PDFs are one of the most common and most significant document formats on the internet. Typically, developers must use expensive tools from Adobe or cumbersome APIs to generate PDFs. In this article, you will learn how to programmatically generate PDFs easily with plain XHTML and CSS using two open source Java libraries: Flying Saucer and iText.
The Problem with PDFs
PDFs are a great technology. When Adobe created the PDF format, they had a vision for a portable document format (hence the name) that could be viewed on any computer and printed to any printer. Unlike web pages, PDFs will look exactly the same on every device, thanks to the rigorous PDF specification. And the best thing about PDFs is that the specification is open so you can generate them on the fly, using readily available open source libraries.
There is one big problem with PDFs, however: the spec is complicated and the APIs for generating PDFs tend to be cumbersome, requiring a lot of low-level coding of paragraphs and headers. More importantly, you have to use code to generate PDFs. But to make good-looking PDFs, you need a graphic designer to create the layout. Even if graphic designers are up to the task of programming, they still must convert their layout from some other format to code, which can be cumbersome, buggy, and time-consuming. Fortunately, there is a better way.
The way to make good-looking PDFs is to let the programmers do what they are good at (writing code that manipulates data) and let the graphic designers do what they are good at (making attractive graphic designs). Flying Saucer and iText are tools that make this split possible. They let you render CSS stylesheets and XHTML, either static or generated, directly to PDFs.
An Introduction to Flying Saucer and iText
Flying Saucer, the common name for the xhtmlrenderer project on java.net, is an LGPL-licensed Java library originally created by me and continually developed by the java.net community. Download it from the project page, or use the copy included with this article's sample code (see Resources). Flying Saucer's primary purpose is to render spec-compliant XHTML and CSS 2.1 to the screen as a Swing component. Though it was originally intended for embedding markup into desktop applications (things like the iTunes Music Store), Flying Saucer has been extended to work with iText as well. This makes it very easy to render XHTML to PDFs, as well as to images and to the screen. Flying Saucer requires Java 1.4 or higher.
iText is a PDF generation library created by Bruno Lowagie and Paulo Soares, licensed under the LGPL and the Mozilla Public License. You can download iText from its home page or use the copy in the download bundle at the end of this article (see Resources). Using the iText API, you can produce paragraphs, headers, or any other PDF feature. Since the PDF imaging model is fairly similar to Java2D's model, Flying Saucer and iText can easily work together to produce PDFs. In fact, the PDF version of the Flying Saucer user manual was itself produced using Flying Saucer and iText.
Generating a Simple PDF
To get started, I'm going to show you how to render a very simple HTML document as a PDF file. You can see in the samples/firstdoc.xhtml file below that it's a plain XHTML document (note the XHTML DTD in the header) and contains only a single formatting rule: b { color: green; }. This means the default HTML formatting for paragraphs and text will apply, with the exception that all b elements will be green.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My First Document</title>
<style type="text/css"> b { color: green; } </style>
</head>
<body>
<p>
<b>Greetings Earthlings!</b>
We've come for your Java.
</p>
</body>
</html>
Now that we have a document, we need code to produce the PDF. The FirstDoc.java file below is the simplest possible way to render a PDF document.
package flyingsaucerpdf;

import java.io.*;
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class FirstDoc {
    public static void main(String[] args)
            throws IOException, DocumentException {
        // Flying Saucer takes its input as a URL, so convert the local path
        String inputFile = "samples/firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);

        // load the document, lay it out, then write the PDF to the stream
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);

        os.close();
    }
}
There are two main parts to the code. First, it prepares the input and output files. Since Flying Saucer deals with input URLs, the code above converts a local file path into a file:// URL using the File class. The output is just a FileOutputStream that writes to the firstdoc.pdf file in the current working directory.
The second part of the code creates a new ITextRenderer object. This is the Flying Saucer class that knows how to render PDFs using iText. You must first set the document property of the renderer using the setDocument(String) method. There are other methods for setting the document using URLs and W3C DOM objects. Once the document is set, you must call layout() to perform the actual layout of the document and then createPDF() to draw the document into a PDF file on disk.
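The path-to-URL conversion in the first part is plain JDK code, so it can be tried on its own before Flying Saucer is on the classpath. The following is a minimal sketch of that one step; the class name FileUrlSketch and the helper toFileUrl are made up for this illustration and are not part of Flying Saucer.

```java
import java.io.File;
import java.net.MalformedURLException;

class FileUrlSketch {
    /** Converts a local path into a file: URL string, as FirstDoc does. */
    static String toFileUrl(String path) {
        try {
            // File resolves relative paths against the current working directory
            return new File(path).toURI().toURL().toString();
        } catch (MalformedURLException ex) {
            // not expected for ordinary file paths; rethrown unchecked for brevity
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("samples/firstdoc.xhtml"));
    }
}
```

The printed value depends on your working directory, but it always begins with file: and is the kind of string that FirstDoc hands to setDocument(String).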
To compile and run this code you need the Flying Saucer .jar, core-renderer.jar. For this article I am using a recent development build (R7 HEAD). R7 final should be out in a few weeks, perhaps by the time you read this. I chose to use a recent R7 build instead of the year-old R6 because R7 has a rewritten CSS parser, better table support, and of course, many, many bugfixes. You will also need the iText .jar, itext_paulo-155.jar (this is actually an early access copy of iText from its SourceForge project page). All of these .jars are included in the standard Flying Saucer R6 download, and also in the examples.zip file in this article's Resources section. Once you put these .jars in your classpath everything will compile and run. The finished PDF looks like Figure 1:
Figure 1. Screenshot of firstdoc.pdf
Generating Content on the Fly
Producing a PDF from static documents is useful, but it would be more interesting if you could generate the markup programmatically. Then you could produce documents that contain more interesting content than simple static text.
Below is the code for a simple program that generates the lyrics to the song "99 Bottles of Beer on the Wall." This song has a repeated structure, so we can easily produce the lyrics with a simple loop. This document also uses some extra CSS styles like color, text transformation, and modified padding.
In the first part of the OneHundredBottles.java code, all of the style and markup is appended to a StringBuffer. Note that the style rule for h3 includes the text-transform property. This will capitalize the first letter of every word in the title. The body of the document is produced by a loop that counts down from 99 to 1. Notice that there is an image, 100bottles.jpg, included at the top of the document. iText will embed the image in the resulting PDF, meaning the user will not need to load any other images once they receive the PDF. This is an advantage of PDFs over HTML, where images must be stored separately.
import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class OneHundredBottles {
    public static void main(String[] args) throws Exception {
        StringBuffer buf = new StringBuffer();
        buf.append("<html>");
        // put in some style
        buf.append("<head><style type='text/css'>");
        buf.append("h3 { border: 1px solid #aaaaff; background: #ccccff; ");
        buf.append("padding: 1em; text-transform: capitalize; font-family: sans-serif; font-weight: normal;}");
        buf.append("p { margin: 1em 1em 4em 3em; } p:first-letter { color: red; font-size: 150%; }");
        buf.append("h2 { background: #5555ff; color: white; border: 10px solid black; padding: 3em; font-size: 200%; }");
        buf.append("</style></head>");
        // generate the body
        buf.append("<body>");
        buf.append("<p><img src='100bottles.jpg'/></p>");
        for(int i=99; i>0; i--) {
            buf.append("<h3>"+i+" bottles of beer on the wall, "
                    + i + " bottles of beer!</h3>");
            buf.append("<p>Take one down and pass it around, "
                    + (i-1) + " bottles of beer on the wall</p>\n");
        }
        buf.append("<h2>No more bottles of beer on the wall, no more bottles of beer. ");
        buf.append("Go to the store and buy some more, 99 bottles of beer on the wall.</h2>");
        buf.append("</body>");
        buf.append("</html>");
The second part of the code parses the StringBuffer into a DOM document using the standard Java XML APIs and then sets that as the document on the ITextRenderer object. The renderer needs a base URL to load resources like images and external CSS files. If you pass a URL for the document to the renderer, then it will infer the base URL. For example, the document URL http://myserver.com/pdf/mydoc.xhtml would result in a base URL of http://myserver.com/pdf/. However, if you pass in a pre-parsed Document object instead of a URL, then the renderer will have no idea what the base URL is. You can manually set the base URL using the second argument to the setDocument() method. In this case I have passed null rather than an explicit base URL; if your markup depends on relative references such as images or external stylesheets, pass the URL they should be resolved against.
        // parse the markup into an XML Document; ByteArrayInputStream is used here
        // because StringBufferInputStream is deprecated in the JDK
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(doc, null);
        String outputFile = "100bottles.pdf";
        OutputStream os = new FileOutputStream(outputFile);
        renderer.layout();
        renderer.createPDF(os);
        os.close();
    }
}
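The base URL rule described above (the document URL with its last path segment removed) can be reproduced with the JDK's URI class. This is only a sketch of the rule as stated in this article, not Flying Saucer's own code; BaseUrlSketch and baseOf are hypothetical names.

```java
import java.net.URI;

class BaseUrlSketch {
    /** Returns the "directory" part of a document URL, which serves as the base URL. */
    static String baseOf(String documentUrl) {
        // resolving "." against a URL drops the final path segment
        return URI.create(documentUrl).resolve(".").toString();
    }

    public static void main(String[] args) {
        System.out.println(baseOf("http://myserver.com/pdf/mydoc.xhtml"));
    }
}
```

A string computed this way is the sort of value you could pass as the second argument of setDocument() when the document itself is handed over as a pre-parsed DOM.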
The final document looks like Figure 2:
Figure 2. Screenshot of 100bottles.pdf
Page-Specific Features
So far the documents we have rendered are basically just web pages in PDF form. They don't have any features that take advantage of pages. Paged media like printed documents or slideshows have certain features specific to pages. In particular, pages have specific sizes and margins. Text laid out for an 8 1/2 by 11 inch piece of paper will look very different than text for a paperback book, or a CD cover. In short, pages matter, and Flying Saucer gives you some control over pages using page-specific features in CSS.
This next example will print the first chapter of Lewis Carroll's Alice in Wonderland in a paperback format. The markup is pretty straightforward, just a bunch of headers and paragraphs. Below are the first few paragraphs of the document (see the download for the entire chapter). There are two things to notice in this document. First, all of the style is included in the alice.css file linked in the header with a link element. The media="print" attribute must be included, or the style will not be loaded. The other important thing to notice is the pair of divs at the top: header and footer. The footer contains two special span elements, with the ids pagenumber and pagecount, which are used to generate the page numbers. These divs and the page number elements will not be rendered at the top of the first page, where they appear in the markup. Instead, we will use some special CSS to make these divs repeat on every page and generate the proper page numbers at runtime.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Alice's Adventures in Wonderland -- Chapter I</title>
<link rel="stylesheet" type="text/css" href="alice.css" media="print"/>
</head>
<body>
<div id="header" style="">Alice's Adventures in Wonderland</div>
<div id="footer" style=""> Page <span id="pagenumber"/> of <span id="pagecount"/> </div>
<h1>CHAPTER I</h1>
<h2>Down the Rabbit-Hole</h2>
<p class="dropcap-holder">
<div class="dropcap">A</div>
lice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do: once or twice she had
peeped into the book her sister was reading, but it had no pictures
or conversations in it, `and what is the use of a book,' thought
Alice `without pictures or conversation?'
</p>
<p>So she was considering in her own mind (as well as she could,
for the hot day made her feel very sleepy and stupid), whether the
pleasure of making a daisy-chain would be worth the trouble of
getting up and picking the daisies, when suddenly a White Rabbit
with pink eyes ran close by her. </p>
<p class="figure">
<img src="alice2.gif" width="200px" height="300px"/>
<br/>
<b>White Rabbit checking watch</b>
</p>
... the rest of the chapter
Most of the alice.css file contains normal CSS rules that can apply to any kind of XHTML document, printed or not. There are a few, however, that are page-specific extensions:
@page {
size: 4.18in 6.88in;
margin: 0.25in;
-fs-flow-top: "header";
-fs-flow-bottom: "footer";
-fs-flow-left: "left";
-fs-flow-right: "right";
border: thin solid black;
padding: 1em;
}
#header {
font-weight: bold; font-family: serif;
position: absolute; top: 0; left: 0;
-fs-move-to-flow: "header";
}
#footer {
font-size: 90%; font-style: italic;
position: absolute; top: 0; left: 0;
-fs-move-to-flow: "footer";
}
#pagenumber:before {
content: counter(page);
}
#pagecount:before {
content: counter(pages);
}
The first thing you'll notice in the CSS above is the @page rule. This is a rule that is attached to the page itself rather than to any particular elements within the document. Within this @page rule, you can set the size of the page as well as page margins using the size and margin properties. Note that I have set the size to 4.18in 6.88in, which is the size of a standard mass-market paperback book in the U.S. (according to CafePress). Also in the @page rule are four special properties beginning with -fs-flow-. These are Flying Saucer-specific properties that tell the renderer to move content marked with the specified names (header, footer, left, and right) to the top, bottom, left, and right positions of every page.
In the rules for the header and footer divs, you can see another Flying Saucer-specific property called -fs-move-to-flow, which will take the div out of the normal document flow and put it in the special place marked by "footer" or "header". This property works in conjunction with the -fs-flow-* properties in the @page rule to make repeated content work. These custom properties are needed because CSS 2.1 does not define any way to have repeated headers and footers. CSS 3 does define a way to have repeated content, and Flying Saucer will support the new standard mechanism in the future.
After the @page and header rules, you'll find two more rules for the pagenumber and pagecount elements. These are the two empty spans in the footer, selected by id, and the :before rules add the page and pages counters to their content. Since those two elements are empty, you will only see the counters themselves. Since the pagenumber and pagecount elements were defined in the footer, the final page numbers will also appear in the footer. Again, these page number elements will be replaced with their proper CSS 3 equivalents in the future.
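If the page size has to vary at run time (say, the same report in paperback and letter sizes), one option is to generate the @page rule as a string and append it inside a style element of generated markup, in the manner of the OneHundredBottles example. The sketch below covers only the size and margin properties from alice.css and leaves out the -fs-flow- properties; PageRuleSketch and pageRule are hypothetical names, not part of Flying Saucer.

```java
class PageRuleSketch {
    /** Builds an @page rule for a page measured in inches. */
    static String pageRule(double widthIn, double heightIn, double marginIn) {
        return "@page { size: " + widthIn + "in " + heightIn + "in; "
                + "margin: " + marginIn + "in; }";
    }

    public static void main(String[] args) {
        // the mass-market paperback size used in alice.css
        System.out.println(pageRule(4.18, 6.88, 0.25));
    }
}
```

Keeping the rule in one helper means the designer's stylesheet can stay static while the application supplies only the dimensions.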
The final rendered alice.xhtml is shown in Figure 3:
Figure 3. Screenshot of two pages of pagination.pdf
A quick note on debugging: CSS can be tricky sometimes, and it is very easy to misspell a keyword or forget some punctuation. Flying Saucer R7 has a brand new CSS parser with very robust error reporting. When developing your application, I recommend turning on the built-in logging. The in-depth details of Flying Saucer configuration are available in the FAQ. I have found the most useful setting is to set the logging level to INFO by adding this to your Java command line:
-Dxr.util-logging.java.util.logging.ConsoleHandler.level=INFO
This setting will print lots of debugging information, including places where the CSS or markup may be broken.
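When you do not control the launch script, the same property can presumably be set from code. This sketch rests on an assumption that has not been checked against the Flying Saucer source: that the library reads its xr.* system properties when its classes are first used, so the call has to run before any renderer is created. The class and method names are invented for the example; only the property key comes from the command line above.

```java
class LoggingSetupSketch {
    // the same key that the -D flag above sets
    static final String LEVEL_KEY =
            "xr.util-logging.java.util.logging.ConsoleHandler.level";

    /** Sets the logging level property; call this before touching Flying Saucer. */
    static void enableInfoLogging() {
        System.setProperty(LEVEL_KEY, "INFO");
    }

    public static void main(String[] args) {
        enableInfoLogging();
        System.out.println(LEVEL_KEY + "=" + System.getProperty(LEVEL_KEY));
    }
}
```

If the extra output does not appear, fall back to the -D flag, which is the method this article has actually used.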
Rendering Generic XML Instead of XHTML
Every example so far has used XHTML, meaning the XHTML dialect of XML defined by the W3C. Many documents rendered into PDF are in fact XHTML documents, but Flying Saucer can actually handle any well-formed XML file. In fact, Flying Saucer does very little that is XHTML-specific. XHTML documents are just XML documents with a default stylesheet. If you define your own stylesheet, then you can render any XML document you want. This could be particularly useful when working with the output of databases or web services, since that output is probably in XML already.
Below is a very simple custom XML document, weather.xml, that describes the weather at multiple locations. It does not use standard XHTML elements at all; every element is custom. Notice the second line contains a reference to the stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href='weather.css' type='text/css'?>
<weather>
<station>
<location>Springfield, NT</location>
<description>Sunny</description>
<tempf>85</tempf>
</station>
<station>
<location>Arlen, TX</location>
<description>Super Sunny</description>
<tempf>99</tempf>
</station>
<station>
<location>South Park, CO</location>
<description>Snowing</description>
<tempf>18</tempf>
</station>
</weather>
Here is the DirectXML.java code that renders the document. Notice that the code does nothing special. As far as Flying Saucer is concerned, the only difference between XHTML and XML is the file extension.
import java.io.*;
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class DirectXML {
    public static void main(String[] args) throws IOException, DocumentException {
        String inputFile = "samples/weather.xml";
        String outputFile = "weather.pdf";
        OutputStream os = new FileOutputStream(outputFile);
        ITextRenderer renderer = new ITextRenderer();
        // load the XML file directly; the stylesheet is named inside the document
        renderer.setDocument(new File(inputFile));
        renderer.layout();
        renderer.createPDF(os);
        os.close();
    }
}
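The stylesheet reference on the second line of weather.xml is an ordinary XML processing instruction, and it survives parsing as a node of the document. The sketch below uses only the JDK's DOM API to read it back, which can be a handy check when a generated XML file renders unstyled. StylesheetPiSketch and stylesheetData are hypothetical names, and the sample is a shortened copy of weather.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

class StylesheetPiSketch {
    static final String WEATHER_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<?xml-stylesheet href='weather.css' type='text/css'?>\n"
            + "<weather><station><location>Arlen, TX</location></station></weather>";

    /** Returns the data of the first xml-stylesheet instruction, or null if none. */
    static String stylesheetData(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // processing instructions before the root element are children of the Document
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                        && "xml-stylesheet".equals(n.getNodeName())) {
                    return ((ProcessingInstruction) n).getData();
                }
            }
            return null;
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(stylesheetData(WEATHER_XML));
    }
}
```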
Here's the weather.css CSS that will style the XML.
* { display: block; margin: 0; padding: 0; border: 0;}
station {
clear: both;
width: 3in; height: 3in;
padding: 0.5em; margin: 1em;
border: 3px solid black; background-color: green;
font-size: 30pt;
page-break-inside: avoid;
}
tempf {
border: 1px solid white;
background-color: blue; color: white;
width: 1.5in; height: 1.5in;
margin: 5pt;
padding: 8pt;
font: bold 300% sans-serif;
}
location { color: white; }
description { color: yellow; }
The CSS stylesheet contains all of the magic in this example. Since this is all XML, there are no default rules to show how any element is drawn. That's why the first rule is a * rule to affect all elements: they should all be blocks with no border, margins, or padding. Then I have defined a rule for each of the four content elements. The elements take the standard CSS properties that you could apply to HTML elements. Note that the station element has a page-break-inside: avoid property. This is a paged-media property, defined in CSS 2.1, that tells the renderer that you don't want the station element split by a page break. This is useful when you have content sections that must be printed whole. For example, you might be printing to label paper for stickers on a map display. In that case, you definitely would not want any boxes to be split across pages.
Note that I've set the size of the station block using inches. When coding for the Web you usually want to avoid absolute units like inches, points, or centimeters. Instead, you should use relative units like ems or percentages, since these work well when a user resizes the document or changes their font size. But then again, PDFs aren't for the Web. They are paged media for printing. That means absolute units are perfectly fine, and in fact encouraged, since their use ensures the user will get a document that looks exactly like you wanted.
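Absolute units also make the arithmetic predictable. CSS defines one inch as 72 points, and the default unit of PDF page coordinates is likewise 1/72 of an inch, so a CSS length in inches maps directly onto the PDF page. A small sketch of that conversion for the sizes used in this article (the class and method names are for illustration only):

```java
class PageUnitsSketch {
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in inches to points. */
    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // the 3in station box from weather.css
        System.out.println("station box: " + inchesToPoints(3.0) + "pt");
        // the paperback page from alice.css (about 301pt by 495pt)
        System.out.println("page width:  " + inchesToPoints(4.18) + "pt");
        System.out.println("page height: " + inchesToPoints(6.88) + "pt");
    }
}
```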
The final document looks like Figure 4:
Figure 4. Screenshot of weather.pdf
Generating PDFs in a Server-Side Application
All of the examples in this article have been small command-line programs that write PDF files. However, you can easily use this technology to produce PDFs in a web application using a servlet. The only difference is that you will be writing to a ServletOutputStream instead of a FileOutputStream. Below is a portion of the code for a PDF generation servlet that produces a tabular report of sales for a particular user:
public class PDFServlet extends HttpServlet {
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/pdf");
        StringBuffer buf = new StringBuffer();
        buf.append("<html>");
        // resolve the stylesheet on disk and express it as a file: URL for the link tag
        String css = new File(getServletContext().getRealPath("/PDFservlet.css")).toURI().toString();
        // put in some style
        buf.append("<head><link rel='stylesheet' type='text/css' "+
                "href='"+css+"' media='print'/></head>");
        ... //generate the rest of the HTML
        // parse our markup into an xml Document
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // ByteArrayInputStream is used because StringBufferInputStream is deprecated
            Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(doc, null);
            renderer.layout();
            OutputStream os = response.getOutputStream();
            renderer.createPDF(os);
            os.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
The code above looks pretty much like the previous examples. There are two special things to notice, though. First, you must set the content type to application/pdf. This will make the user's web browser pass the PDF on to their PDF reader or plugin instead of showing it as garbled text. Second, the CSS is stored in a separate file in the main webapp directory (where the JSPs and HTML would go). In order for Flying Saucer to find it, you must use the getServletContext().getRealPath() method to resolve PDFservlet.css to an absolute location on disk and put it in the link tag at the top of the generated markup. Once you have your HTML properly generated, you can just parse it into a Document and render the PDF to the output stream returned by response.getOutputStream().
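One practical caution when the markup comes from live data such as customer names: the generated string is parsed as XML, so a stray & or < in the data will make the parse step fail. A minimal sketch of escaping text before appending it to the buffer follows; XmlEscapeSketch, escapeXml, and isWellFormed are hypothetical helpers written for this illustration, and the sample customer name is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class XmlEscapeSketch {
    /** Replaces the characters that have special meaning in XML markup. */
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns true if the markup parses as XML, as the servlet's parse step requires. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        String customer = "Smith & Sons <Ltd>"; // made-up report data
        System.out.println(isWellFormed("<td>" + customer + "</td>"));
        System.out.println(isWellFormed("<td>" + escapeXml(customer) + "</td>"));
    }
}
```

The first check fails and the second succeeds, which is the difference between an exception in the servlet's try block and a finished report.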
The final document looks like Figure 5:
Figure 5. Screenshot of servlet.pdf
Conclusion
PDFs are a great format for maps, receipts, reports, and printable labels. Flying Saucer and iText let you produce PDF files programmatically without having to use expensive tools or cumbersome APIs. By using plain XHTML and CSS, your graphic designer can use their existing web tools like Dreamweaver to produce great-looking CSS templates that you or your developers plug into your applications. By splitting the work, you can save both time and money.
If you use Flying Saucer to produce PDFs for your company or project, please post a link in the comments of this article or email me. The Flying Saucer team would love to have more examples of cool things people are doing with Flying Saucer and iText.
Resources
Joshua Marinacci first tried Java in 1995 at the request of his favorite TA and has never looked back.