C# TIFF Image Library
Extract Text from TIFF File in C#.NET


Complete C# .NET Tutorial on How to Extract Text from a TIFF File

C# Extract Text from TIFF File Overview

Using RasterEdge XDoc.Tiff for .NET together with the .NET OCR SDK, C# programmers can extract text from TIFF image files. Both SDKs expose .NET APIs that can be called directly from a Visual C# .NET project. The text content, style, and format of the original TIFF image can also be retained during extraction.

By integrating these .NET SDKs, C# developers can add text extraction to a .NET TIFF image processing application. If you have already added the required DLL assemblies to your C# project as references, you can test the feature directly with the C# sample code below.

C# Code to Extract Certain Page Text from Multi-page TIFF

The following C# example extracts the text of the first page of a multi-page TIFF file and saves the result as a text file. You can also save the result in other formats, such as PDF, Word, or SVG.

// Set the training data path. Put eng.traineddata (for English) in the folder specified.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

// Set the supported language. You can also set this attribute on OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);

// Load the TIFF document.
TIFFDocument doc = new TIFFDocument(@"C:\demo1.tif");

// Get the first page (index 0) to recognize.
TIFFPage page = (TIFFPage)doc.GetPage(0);

// Import the page into the OCR engine and run recognition.
OCRPage oPage = OCRHandler.Import(page);
oPage.Recognize();
string outputTxt = @"C:\tiffpage0.txt";

// Save the OCR result as plain text; other formats, such as PDF and SVG, are also supported.
oPage.SaveTo(MIMEType.TXT, outputTxt);
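
The first comment in the sample requires eng.traineddata to be present in the folder passed to OCRHandler.SetTrainResourcePath, and a wrong path is easy to miss. The sketch below shows a pre-flight check that uses only System.IO. The class TrainedDataCheck and its method are hypothetical helpers, not part of the RasterEdge API, and the check assumes the "<language code>.traineddata" file naming mentioned in the sample.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of RasterEdge): checks that OCR training data
// for a language exists before the folder is passed to
// OCRHandler.SetTrainResourcePath.
public static class TrainedDataCheck
{
    // Returns true when "<languageCode>.traineddata" exists in the folder,
    // for example "eng.traineddata" for English.
    public static bool HasTrainedData(string folder, string languageCode)
    {
        if (string.IsNullOrEmpty(folder) || !Directory.Exists(folder))
        {
            return false;
        }

        string fileName = languageCode + ".traineddata";
        return File.Exists(Path.Combine(folder, fileName));
    }
}
```

For example, calling TrainedDataCheck.HasTrainedData(@"D:\Alice\DLL\Source\", "eng") before OCRHandler.SetTrainResourcePath, and throwing a FileNotFoundException when it returns false, reports a misconfigured training data folder before any recognition work starts.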

C# Code to Extract Certain Page Text from Multi-page TIFF and Save to PDF

The following C# example extracts the text of the first page of a multi-page TIFF file and saves the result as a PDF file. Other output formats, such as plain text, Word, or SVG, are also available.

// Set the training data path. Put eng.traineddata (for English) in the folder specified.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

// Set the supported language. You can also set this attribute on OCRPage or OCRZone.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);

// Load the TIFF document.
TIFFDocument doc = new TIFFDocument(@"C:\demo1.tif");

// Get the first page (index 0) to recognize.
TIFFPage page = (TIFFPage)doc.GetPage(0);

// Import the page into the OCR engine and run recognition.
OCRPage oPage = OCRHandler.Import(page);
oPage.Recognize();
string outputPdf = @"C:\tiffpage0.pdf";

// Save the OCR result as a PDF; other formats, such as TXT and SVG, are also supported.
oPage.SaveTo(MIMEType.PDF, outputPdf);
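
Because the output path and the MIMEType passed to SaveTo must agree, it can be worth confirming that the saved file really is a PDF. PDF files begin with the ASCII header "%PDF-", so a cheap sanity check is to read those first bytes. The sketch below does that with System.IO only; PdfHeaderCheck is a hypothetical helper, not part of the RasterEdge API, and it validates only the header, not the rest of the document.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of RasterEdge): a cheap sanity check on a file
// written by oPage.SaveTo(MIMEType.PDF, ...).
public static class PdfHeaderCheck
{
    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the file exists and begins with "%PDF-".
    // Only the header is checked, not the rest of the document.
    public static bool LooksLikePdf(string path)
    {
        if (!File.Exists(path))
        {
            return false;
        }

        byte[] header = new byte[Magic.Length];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0)
                {
                    return false; // file is shorter than the header
                }
                read += n;
            }
        }

        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

After the save call returns, PdfHeaderCheck.LooksLikePdf(@"C:\tiffpage0.pdf") would be expected to return true; a false result suggests the write failed or a non-PDF format was requested.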

C# Code to Extract Text from Multi-page TIFF Document and Save to PDF

The following C# example extracts the text from every page of a multi-page TIFF file and combines the results into a single PDF file. Each page is recognized separately and saved as a PDF to an in-memory stream, and the streams are then merged with PDFDocument.CombineDocument.

// The folder that contains the '.traineddata' files.
OCRHandler.SetTrainResourcePath(@"D:\Alice\DLL\Source\");

// Set the supported language, as in the single-page examples.
OCRHandler.Settings.LanguagesEnabled.Add(Language.Eng);

// Set the input file path.
string inputFilePath = @"C:\input.tif";

// Set the output file path.
string outputFilePath = @"C:\Output.pdf";

TIFFDocument tiff = new TIFFDocument(inputFilePath);
int pageCount = tiff.GetPageCount();

// Recognize each page and save it as a one-page PDF in memory.
MemoryStream[] streams = new MemoryStream[pageCount];
for (int i = 0; i < pageCount; i++)
{
    BasePage page = tiff.GetPage(i);
    OCRPage ocrPage = OCRHandler.Import(page);
    ocrPage.Recognize();
    streams[i] = new MemoryStream();
    ocrPage.SaveTo(MIMEType.PDF, streams[i]);
    // Rewind so each stream is read from the start when combining.
    streams[i].Seek(0, SeekOrigin.Begin);
}

// Merge the per-page PDFs into one output file, then release the streams.
PDFDocument.CombineDocument(streams, outputFilePath);
foreach (MemoryStream ms in streams)
{
    ms.Dispose();
}
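
The multi-page sample follows a common pattern: write one in-memory stream per page, rewind each stream, hand the array to a combiner, and release the streams afterwards. The sketch below factors that pattern into a small helper using only System.IO. PageStreams, Render, and DisposeAll are hypothetical names, not part of the RasterEdge API; the per-page work is passed in as a delegate so that no library calls are assumed.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of RasterEdge): collects one in-memory stream
// per page, each rewound to position 0 so a combiner can read it from the start.
public static class PageStreams
{
    // renderPage(index, stream) is expected to write page 'index' into 'stream',
    // for example by recognizing the page and saving it as a PDF.
    public static MemoryStream[] Render(int pageCount, Action<int, Stream> renderPage)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (renderPage == null) throw new ArgumentNullException(nameof(renderPage));

        var streams = new MemoryStream[pageCount];
        try
        {
            for (int i = 0; i < pageCount; i++)
            {
                streams[i] = new MemoryStream();
                renderPage(i, streams[i]);
                streams[i].Seek(0, SeekOrigin.Begin);
            }
            return streams;
        }
        catch
        {
            // Release the streams created so far if any page fails, then rethrow.
            DisposeAll(streams);
            throw;
        }
    }

    // Disposes every non-null stream; call this after the streams have been combined.
    public static void DisposeAll(Stream[] streams)
    {
        foreach (Stream s in streams)
        {
            s?.Dispose();
        }
    }
}
```

In the sample above, the body of the for loop would become the renderPage delegate, and PageStreams.DisposeAll would be called once PDFDocument.CombineDocument returns. Disposing inside the catch block keeps a failed page from leaking the streams that were already filled.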