When people ask me what I do for a living, I often hesitate. The easiest answer would be to say I work for Ghent University. That's true, of course, but in reality, I'd like to answer that I'm a Free/Open Source Software Developer. I've tried this answer many times and it immediately raises more questions. Free Software?
About the author
Bruno Lowagie is the founder of the iText library.
Contact with the author: bruno@lowagie.com
Then how do you make a living? And what software are you developing? I also tried to be more specific, saying 'I am the founder of iText, a free Java-PDF library'. It doesn't help; people get even more confused. What is a library? Is it something I can download and run on my computer? PDF, is that the stuff that opens Adobe Reader in my browser? There are just too many questions to answer in a couple of sentences, so let me write an article about my work instead.
PDF
First of all, let's look at PDF, the Portable Document Format. Yes, if you encounter a PDF (or PDF-related) file on the net, you need a viewer application such as Adobe Reader, Ghostview, Preview or Foxit. PDF is called portable because a PDF document can be viewed and printed on any platform: UNIX, Mac, Windows, Linux, Palm OS... In analogy with Java's Write Once, Run Anywhere, you could say PDF is Write Once, Read Anywhere (but in a more reliable way than the catchy Java advertising phrase promises).
When I talk about PDF documents, you are probably thinking of a 'traditional' PDF file: a read-only, paginated document with a print-ready layout (the way it looks on the screen is the way it will look when it is printed). But that's only what's on the surface. There is a lot more functionality inside a PDF. A PDF can contain multimedia, bookmarks and all kinds of links, actions and annotations. It can be encrypted and password-protected, or you can add a digital signature.
A PDF file can also contain an interactive form, sometimes referred to as an AcroForm. An AcroForm is a collection of fields. These fields can be used to gather information interactively from the user. They also act as placeholders with fixed coordinates that can be filled with variable content. Form data can be stored separately in the Forms Data Format (FDF) or in XFDF, the XML version of FDF. FDF and XFDF files contain a collection of keys and values, as well as a reference to a PDF file. If you open an FDF or XFDF file, the PDF that contains the form is opened and filled in automatically. Lots of tools (including iText) are able to merge a PDF form and its FDF data.
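To make the idea of 'keys and values, plus a reference to a PDF file' a bit more concrete, here is a minimal sketch in plain Java that assembles an XFDF document as a string. It does not use iText. The class name XfdfSketch, the file name form.pdf and the field names are made up for illustration, and the element layout (an f element pointing at the PDF, a fields element holding the name/value pairs) follows the commonly seen XFDF structure; check it against the XFDF specification before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: XfdfSketch, "form.pdf" and the field names are hypothetical.
class XfdfSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an XFDF document: a reference to the PDF form plus its key/value pairs.
    static String toXfdf(String pdfFile, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xfdf xmlns=\"http://ns.adobe.com/xfdf/\" xml:space=\"preserve\">\n");
        sb.append("  <f href=\"").append(escape(pdfFile)).append("\"/>\n");
        sb.append("  <fields>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("    <field name=\"").append(escape(e.getKey())).append("\">")
              .append("<value>").append(escape(e.getValue())).append("</value>")
              .append("</field>\n");
        }
        sb.append("  </fields>\n");
        sb.append("</xfdf>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Bruno Lowagie");
        fields.put("library", "iText");
        System.out.print(toXfdf("form.pdf", fields));
    }
}
```

A viewer or a library that understands XFDF would look up the PDF named in the f element and fill each form field whose name matches a key.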
PDF/X - PDF/A
There are many different ways to create a perfectly valid PDF file. This freedom is an advantage, but it can be a disadvantage too. Not all valid PDF files are usable in every context. The prepress sector in particular felt the need to restrict the format. A consortium of prepress companies got together and released specifications for PDF/X ('X' for eXchange). PDF/X is a set of ISO standards describing well-defined subsets of the PDF specification that promise predictable and consistent PDF files.
More recently (September 2005) yet another ISO specification was published: PDF/A ('A' for Archiving). There's a wide variety of electronic formats (ASCII, TIFF, PDF, XML...) and technologies (databases, repositories...) to choose from for archiving. The proprietary nature of most of these formats, however, is one of their biggest disadvantages: there is no guarantee that they will remain readable in the long term. For instance: if you try to open an MS Word file that is ten years old in the most recent version of Word, you can't expect it to look the way it looked ten years ago. As opposed to most word processing formats, a PDF file can be viewed without the originating application. Also, all the revisions of the PDF specification are backwards compatible. For instance: if your viewer can read and print a PDF with version 1.6, it can also read a PDF with version 1.2. Moreover, the information about the file format will always be in the public domain. Anyone, at any time, using any hardware or software, can create programs to access PDF documents. This makes PDF a very interesting candidate for archiving.
PDF/A is a subset of PDF 1.4, intended for long-term preservation of page-oriented documents. The constraints imposed by PDF/A and PDF/X are very similar. I won't go into details, but the constraints include that all fonts have to be embedded, encryption is not allowed, audio and video content are forbidden, and so on.
For PDF/A, self-documentation of every archived file is also very important. You have to be able to search and find documents in a reliable way. That's why PDF/A files use XMP. XMP is Adobe's Extensible Metadata Platform. It is a standard format for the creation, processing and interchange of metadata. (Note that XMP is not limited to the PDF format.)
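To give a rough idea of what such metadata looks like, the sketch below builds a small XMP packet as a string in plain Java. It does not use iText. The class name XmpSketch and the sample title are hypothetical; the packet wrapper, the Dublin Core title and the PDF/A identification properties follow the usual XMP and PDF/A-1 conventions, but a real PDF/A file needs more than this, so treat it as an illustration of the format rather than a conforming metadata stream.

```java
// Illustrative sketch only: XmpSketch and the sample title are hypothetical.
class XmpSketch {

    // Escape the characters that are special in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build an XMP packet holding a title and a PDF/A-1b identification.
    static String toXmp(String title) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
             + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
             + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
             + "  <rdf:Description rdf:about=\"\"\n"
             + "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n"
             + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
             + "   <dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">"
             + escape(title) + "</rdf:li></rdf:Alt></dc:title>\n"
             + "   <pdfaid:part>1</pdfaid:part>\n"
             + "   <pdfaid:conformance>B</pdfaid:conformance>\n"
             + "  </rdf:Description>\n"
             + " </rdf:RDF>\n"
             + "</x:xmpmeta>\n"
             + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        System.out.println(toXmp("Transcript of records"));
    }
}
```

Because the packet is plain XML with a recognizable wrapper, a search tool can locate and read it without understanding the rest of the PDF file.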
Internals of a PDF file
PDF is not a simple text stream (like an RTF file), nor is it a program (like a PostScript file). It is a collection of objects, and you need to know the exact byte offset of each object in order to write an index at the end of the file. Below you see the output of a PDF produced by iText and written to System.out. If you want to reproduce it, just change the line that gets the PdfWriter in one of the Hello World examples:
PdfWriter.getInstance(document, System.out);
The actual PDF file begins with %PDF-1.4 and ends with %%EOF. The parts between obj and endobj are PDF objects as described in the PDF Reference (by Adobe Systems Inc.). The part after xref and before trailer is the cross-reference table. It contains references to all the objects in the PDF.

Figure 1. A PDF file written to the System.out
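The bookkeeping described above (objects between obj and endobj, followed by a cross-reference table that lists where each object starts) can be sketched in plain Java without iText. The class MinimalPdf below is hypothetical and only writes an empty one-page document; it is meant to show how the byte offsets end up in the xref table, not how iText is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: MinimalPdf is a hypothetical class, not part of iText.
class MinimalPdf {

    // Append ASCII text to the buffer.
    static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    // Build an empty one-page PDF and return its bytes.
    static byte[] build() {
        String[] bodies = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Resources << >> >>"
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < bodies.length; i++) {
            offsets.add(out.size()); // byte offset at which object i + 1 starts
            write(out, (i + 1) + " 0 obj\n" + bodies[i] + "\nendobj\n");
        }
        int xrefStart = out.size(); // byte offset of the cross-reference table
        write(out, "xref\n0 " + (bodies.length + 1) + "\n");
        write(out, "0000000000 65535 f \n"); // entry 0 is the head of the free list
        for (int offset : offsets) {
            // Every entry is exactly 20 bytes: 10-digit offset, generation, 'n' (in use).
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(out, "trailer\n<< /Size " + (bodies.length + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefStart + "\n%%EOF\n");
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.print(new String(build(), StandardCharsets.US_ASCII));
    }
}
```

A reader starts at the end of the file, follows startxref to the table, and can then jump straight to any object. That is why the offsets have to be exact.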
These are some of the lesser known capabilities of PDF. I hope you understand that PDF is a really flexible format with a wide range of possibilities. Was it merely a coincidence that I got involved in PDF? I don't know. This is how it all started.
The Origin of iText
In 1998, Ghent University was starting up a migration project with the intention to redesign a series of standalone programs used by the Student Administration. I was hired to turn part of the existing (fat client) software into a Java Servlet based web application (thin client). In this project, a lot of documents were involved: grading lists with courses and students, deliberation reports, transcripts of records... At that time, you couldn't depend on HTML to print out neat documents, especially not if you wanted them to look 'official'. So without fully realizing the consequences, I promised my employer: I'll produce PDF! I assumed there would be ample free or open source software available on the net. Unfortunately, none of the libraries I tested was able to meet our needs. If I wanted to keep my promise, I'd have to write my own library. That's how iText was born.
I wrote the first version of iText in about six weeks. This rudimentary library was published on the net in March 1999, and although there weren't many users back then, the people who did use iText helped a lot with the development. For instance: I started to work on the library at 9 AM in Ghent (Belgium) and I published my 'work of the day' at 5 PM. At that time, the working day of one of the iText users in Canada started, and by the time my next working day began, my Canadian colleague had a complete bug report for me. In other words: by making it Open Source, the library was being improved day and night, around the clock.
Little by little iText got more powerful. Since 1999, hundreds of people have contributed code. Paulo Soares is the main contributor (and official co-developer of iText). Mark Hall is responsible for the RTF package. For the moment, I am trying to keep up with the library as far as the documentation is concerned. As a matter of fact, I am writing a book on iText for Manning Publications Co. That's the publisher of the 'in Action' series. I hope I will be able to present 'iText in Action' in the summer of 2006. In the book I try to cover as much functionality as possible. This isn't an easy job as iText is a very extensive library and people are using it in many different ways.
High-Level Objects
Creating a PDF document from scratch with iText is always done in five elementary steps: (1) create a document, (2) get an instance of a writer that will write all the document specific syntax to an output stream, (3) open the document, (4) add content to the document and (5) close the document.
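import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

Document document = new Document();                  // step 1
try {
    PdfWriter.getInstance(document,                  // step 2
        new FileOutputStream("HelloWorld1.pdf"));    // step 2 (continued)
    document.open();                                 // step 3
    document.add(new Paragraph("Hello World"));      // step 4
} catch (Exception e) {
    System.err.println(e.getMessage());
}
document.close();                                    // step 5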
There are three different ways to implement step 4. The easiest way is to make use of iText's high-level objects. In the example we used a Paragraph object, but a document can be composed of all kinds of structures and objects: lists (List and ListItem), hyperlinks (Anchor), tables (PdfPTable, Table or SimpleTable), images (Image), columns (ColumnText or MultiColumnText)... In the book, you'll see all these objects 'in Action'.

Figure 2. Using iText to produce an article