C# PDF SDK Library
Introduction to the basic SDK concepts of XDoc.PDF for .NET


Introduction to the Classes and APIs Included in .NET XDoc.PDF for C# Programming











Introduction





Portable Document Format (PDF) is a document format created by Adobe in the 1990s. It can contain a variety of resources (such as text, images, embedded fonts, and forms) and is independent of application software and operating system. As a standardized format, it is commonly used for documents such as user manuals, application forms, and scanned documents.

A PDF file contains a collection of Indirect Objects, which are the basic units for all kinds of content and resources in the document. The XDoc.PDF library provides an easy way to retrieve and manipulate a PDF file's content without dealing with those Indirect Objects directly.
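To make the idea of an Indirect Object concrete, the sketch below counts object headers of the form "N G obj" in a small hand-written fragment of raw PDF syntax. It uses only standard .NET classes, not XDoc.PDF. The fragment and the helper name CountIndirectObjects are illustrative assumptions, and a real file may keep objects inside compressed object streams where such a text scan would not see them.

```csharp
using System;
using System.Text.RegularExpressions;

// Counts indirect object headers ("<number> <generation> obj") that start a line.
// Hypothetical helper for illustration only; it is not part of XDoc.PDF.
static int CountIndirectObjects(string pdfText) =>
    Regex.Matches(pdfText, @"(?m)^\d+\s+\d+\s+obj\b").Count;

// A tiny hand-written fragment in raw PDF syntax (not a complete, valid file).
string fragment =
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n" +
    "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n" +
    "3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n";

// References such as "2 0 R" are not counted, only the object definitions.
Console.WriteLine("Indirect objects found: {0}", CountIndirectObjects(fragment));
```

This is the level of detail that a document-level API hides: the catalog, page tree, and page above are three linked objects, which a library can expose as a single document with one page.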

The core component of the library is PDFDocument, which represents a PDF document. All file contents, such as pages, bookmarks, annotations, and interactive fields, are accessed through this class. PDFDocument provides APIs for loading a document from a file, rendering page contents, making basic modifications, and saving the changes back to a file.

More complex functionality is placed in separate handler classes. The main handler classes include:

  • PDFAnnotHandler: to add and edit PDF Annotations.
  • PDFFormHandler: to add, edit and fill Interactive Fields.
  • PDFPageFieldHandler: to set pagination artifacts, such as Watermarks.
  • PDFOptionalContentHandler: to manage Optional Contents in the page.


Sample Code: load an existing PDF file into a PDFDocument object

//  Load a PDF file into a PDFDocument object.
//  inputFilePath is the absolute path of the PDF file.
PDFDocument doc = new PDFDocument(inputFilePath);
...




Document



A PDFDocument object can be constructed from a PDF file (by its absolute path), a file stream, or a byte array. It is a container for all content loaded from the PDF file, and all document content is accessed through it. Below is a sample that reads the document information of a file.



PDFDocument doc = new PDFDocument(inputFilePath);
Console.WriteLine("Page Count:   {0}", doc.GetPageCount());
PDFMetadata metadata = doc.GetDescription();
Console.WriteLine("Author:       {0}", metadata.Author);
Console.WriteLine("Creator:      {0}", metadata.Creator);
Console.WriteLine("Created Date: {0}", metadata.CreatedDate.ToString("yyyy-MM-dd"));


In addition to providing access to document content, PDFDocument supports page organization, such as inserting a new page, deleting an existing page, or reordering pages. After the document object has been changed, the Save method writes the modified document to a target, either a file or a byte array.

PDFDocument doc = new PDFDocument(inputFilePath);
//  Insert an empty page with A4 size before the first page.
doc.InsertPage(0, PaperSize.A4);
//  Save the changed document to file.
doc.Save(outputFilePath);




Page



Each page in a PDF document is represented by a PDFPage object, which can be obtained with the method GetPage(int pageIdx).



PDFDocument doc = new PDFDocument(inputFilePath);
for (int pageIndex = 0; pageIndex < doc.GetPageCount(); pageIndex++)
{
    PDFPage page = (PDFPage)doc.GetPage(pageIndex);
    Console.WriteLine("Page Index: {0}", pageIndex);
    Console.WriteLine("Width:  {0} inches", page.GetWidth());
    Console.WriteLine("Height: {0} inches", page.GetHeight());
}


PDFPage mainly provides APIs for page rendering, which draw all page contents to a .NET Bitmap object with different settings.



PDFPage page = (PDFPage)doc.GetPage(pageIndex);
int resolution = 192;   //  192 dpi
Bitmap bitmap = page.GetBitmap(resolution);
...
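When choosing a resolution, it can help to estimate the bitmap size before rendering. The sketch below is plain arithmetic with standard .NET only. It assumes that the page size is reported in inches (as the labels in the sample above suggest) and that a render at N dpi yields about inches × N pixels per side. The helper name ExpectedPixels is hypothetical, and the library's actual rounding may differ.

```csharp
using System;

// Assumption: pixels per side = page size in inches * resolution in dpi.
// Hypothetical helper for illustration only; it is not part of XDoc.PDF.
static int ExpectedPixels(double inches, int dpi) =>
    (int)Math.Round(inches * dpi);

// Example: a US Letter page (8.5 x 11 inches) rendered at 192 dpi.
int widthPx = ExpectedPixels(8.5, 192);
int heightPx = ExpectedPixels(11.0, 192);
Console.WriteLine("Expected bitmap size: {0} x {1} pixels", widthPx, heightPx);

// Rough uncompressed size, assuming 4 bytes per pixel (32-bit ARGB).
long bytes = (long)widthPx * heightPx * 4;
Console.WriteLine("Approximate memory: {0:F1} MB", bytes / (1024.0 * 1024.0));
```

Doubling the resolution quadruples the pixel count, so an estimate like this is a quick way to check that a batch render will fit in memory.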




Annotation



PDF supports many standard annotation types, such as Line, Polygon, Text, and Highlight. The annotation handler can load these annotations from a PDF document into a list of objects. Each item in the list is an instance of an annotation class (with the prefix "PDFAnnot") that derives from the abstract class IPDFAnnot.



PDFDocument doc = new PDFDocument(inputFilePath);
List<IPDFAnnot> annots = PDFAnnotHandler.GetAllAnnotations(doc);
// ...


The annotation handler also provides methods to insert new annotations into, or remove existing annotations from, the document.



//  Create a Rectangle annotation.
PDFAnnotRectangle annot = new PDFAnnotRectangle();
annot.Location = new PointF(180, 180);
annot.Width = 180;
annot.Height = 180;
PDFAnnotHandler.AddAnnotation(inputFilePath, annot, outputFile);




Redaction



This feature permanently removes sensitive information from a page, whether text, images, or any other page content. The library lets the user redact an entire page or a list of specified regions in the page; all page items inside the redacted regions are erased.



PDFDocument doc = new PDFDocument(inputFilePath);
PDFPage page = (PDFPage)doc.GetPage(0);
//  Set page regions to redact (unit: in 96 dpi)
RectangleF[] redactRegions = new RectangleF[] {
    new RectangleF(280, 180, 100, 300),
    new RectangleF(50, 400, 500, 200),
};
RedactionOptions ops = new RedactionOptions();
//  Fill color of the redacted areas; use Transparent for no fill.
ops.AreaFillColor = System.Drawing.Color.Black;
ops.EnableOverlayText = false;
ops.EnableModifyImage = true;

page.Redact(redactRegions, ops);
doc.Save(outputFilePath);
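The redact regions in the sample above are given in 96 dpi units, while page measurements often come in inches or in PDF points (1/72 inch). The sketch below shows one way to rescale such measurements into a RectangleF, using only standard .NET types. The helper names FromInches and FromPoints are hypothetical. The code only rescales units; it makes no assumption about where the library places the coordinate origin.

```csharp
using System;
using System.Drawing;

// Rescale a rectangle measured in inches to 96 dpi units.
// Hypothetical helper for illustration only; it is not part of XDoc.PDF.
static RectangleF FromInches(float x, float y, float width, float height) =>
    new RectangleF(x * 96f, y * 96f, width * 96f, height * 96f);

// Rescale a rectangle measured in PDF points (72 per inch) to 96 dpi units.
static RectangleF FromPoints(float x, float y, float width, float height) =>
    new RectangleF(x * 96f / 72f, y * 96f / 72f, width * 96f / 72f, height * 96f / 72f);

// The same region expressed in both source units.
RectangleF fromInches = FromInches(1f, 2f, 3f, 0.5f);
RectangleF fromPoints = FromPoints(72f, 144f, 216f, 36f);

Console.WriteLine(fromInches);
Console.WriteLine(fromPoints);
```

Rectangles produced this way could be placed in the redactRegions array shown above, provided the region was measured from the same origin the library uses.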




Bookmark



In PDF, a bookmark (also called a Document Outline Item) allows the user to navigate interactively from one part of the document to another. All bookmarks are kept together in an REOutline object, which can be retrieved with the GetOutline method of a PDFDocument object.



PDFDocument doc = new PDFDocument(inputFilePath);
REOutline outline = doc.GetOutline();
foreach (REEntry entry in outline.Entry)
{
    if (entry is Outline)
    {
        Outline obj = (Outline)entry;
        Console.WriteLine("Entry:      {0}", obj.GetText());
        Console.WriteLine("Level:      {0}", obj.GetLevel());
        //Console.WriteLine("Has Action: {0}", obj.ContainsAction());
        if (obj.ContainsAction())
        {
            Console.WriteLine("Target Page Area");
            Console.WriteLine("Page Index: {0}", obj.GetPageIndex());
            Console.WriteLine("Location:   {0}", obj.GetLocation().ToString());
        }
    }
}


The method SetOutline(REOutline outline) adds or updates the Document Outline in the PDF document object, given a valid REOutline object.



PDFDocument doc = new PDFDocument(inputFilePath);
//  Create a new outline object.
REOutline outline = new REOutline();
//  ... add items to the document outline here ...

//  update the new outline
doc.SetOutline(outline);
doc.Save(outputFilePath);




Security



A plain PDF file can be encrypted with passwords to protect its contents from unauthorized access. This is achieved with the static method PDFDocument.AddPassword.



//  Encrypt a PDF file with user password "user123", owner password "owner123"
String userPassword = "user123";
String ownerPassword = "owner123";
PasswordSetting passwordSetting = new PasswordSetting(userPassword, ownerPassword); 
passwordSetting.Level = EncryptionLevel.AES_128bit;

int errCode = PDFDocument.AddPassword(inputFilePath, outputFile, passwordSetting);
if (errCode != 0)
    Console.WriteLine("[ERROR]: fail to add password.");


Given a correct password (either the user's or the owner's), the static method PDFDocument.RemovePassword removes the passwords from a secured PDF file.



if (PDFDocument.IsEncrypted(inputFilePath))
    PDFDocument.RemovePassword(inputFilePath, outputFile, "user123");
else
    Console.WriteLine("This file has not been encrypted.");




Form



An interactive form (also called an AcroForm in PDF) is a collection of fields for gathering information interactively from the user. These fields can be read from the source PDF file into a list of BaseFormField objects.



PDFDocument doc = new PDFDocument(inputFilePath);
List<BaseFormField> fields = PDFFormHandler.GetFormFields(doc);
foreach (BaseFormField field in fields)
{
    Console.WriteLine("Field:      {0}", field.Name);
    Console.WriteLine("Page Index: {0}", field.PageIndex);
    Console.WriteLine("Position:   {0}", field.Position.ToString());
    Console.WriteLine("Class:      {0}", field.GetType().Name);
}


Besides adding and deleting fields in the document, the form handler also supports filling fields programmatically. The fill data must be passed to the method through a user input class derived from BaseUserInput. Below is a sample that fills a text box field with the text "Test".



if (field is AFTextBox)
{
    BaseUserInput input = new AFTextBoxInput("Test");
    PDFFormHandler.FillFormField(inputFilePath, field.Name, input, outputFilePath);
    File.Copy(outputFilePath, tmpFilePath, true);
}




Watermark



A watermark is text or an image superimposed onto pages of the document. It is implemented with PDF pagination artifacts, which are distinguished from real page content by being enclosed in an Artifact tag. The library lets the user apply a watermark (a Pagination Artifact with subtype Watermark) to all pages, or to all even or all odd pages, in the document.



PDFDocument doc = new PDFDocument(inputFilePath);

//  define a watermark with text content
PDFWatermarkTextRes resWatermark = new PDFWatermarkTextRes("Confidential", new Font("Arial", 72F, FontStyle.Regular), Color.Black, WatermarkTextAlignment.Center);
PDFPageWatermark watermark1 = new PDFPageWatermark(resWatermark);
watermark1.IsAbovePage = false;
watermark1.Rotate = -45F;
watermark1.Opacity = 0.2F;

//  define page range: all pages
PageRangeOptions pageRange1 = new PageRangeOptions();
pageRange1.AllPages = true;
pageRange1.Subset = PageRangeSubset.All;

//  apply watermark settings to the selected page range (all pages here)
PDFPageFieldHandler.ApplyWatermark(doc, watermark1, pageRange1);
doc.Save(outputFilePath);




Layer



A page layer refers to sections of content in a PDF page that can be selectively shown or hidden by the viewer. This feature is mostly used for CAD drawings, layered artwork, and multi-language documents. Each layer in the document can be distinguished from other page content by its unique layer ID.



//  To get all layer IDs in the document
List<int> layerIDs = PDFOptionalContentHandler.ExtractLayerIDs(inputFilePath);
foreach (int id in layerIDs)
{
    Bitmap img = PDFOptionalContentHandler.RenderPageLayer(inputFilePath, 0, id);
    //  ...
}


The optional content handler lets the user insert layers into a specified page of a document. The source of a layer can be an existing single-page PDF file or a .NET Bitmap object.



//  Create a PDF layer in the first page.
String inputFilePath = "***.pdf";
String resourceFilePath = "***.pdf";
String outputFilePath = "***.pdf";
//  Set the new page layer's name to 'Layer 1'
ImportPageLayerArgs importArgs = ImportPageLayerArgs.Create("Layer 1");
//  Create an item from the resource file.
ImportPageLayerArgs.Item item = importArgs.AddItem(resourceFilePath);
//  Apply item to the 1st page with default settings.
item.TargetPageIndex = 0;
//  Import a new page layer to the input PDF file.
PDFOptionalContentHandler.ImportPageLayer(inputFilePath, outputFilePath, importArgs);