
PDF OCR VB.NET Library
How to read and extract text from a scanned PDF file using the OCR VB.NET library


VB.NET Tutorial for Using OCR Library to Extract Text from Adobe PDF Document in Visual Basic Class





In this VB.NET tutorial, you will learn how to use the XDoc.PDF and OCR VB.NET libraries to read and extract text content from a scanned PDF file or from images inside a PDF document.

  • Extract text from scanned PDF file
  • Use OCR to convert a scanned PDF file to an editable PDF document or a Microsoft Word document
  • Quickly enable PDF OCR features in VB.NET Windows Forms, WPF, and ASP.NET applications

How to OCR PDF using Visual Basic .NET

  1. Download the XDoc.PDF OCR VB.NET library
  2. Install the VB.NET library to extract PDF text content
  3. Step by Step Tutorial










  • Best VB.NET OCR SDK for Visual Studio .NET
  • Scan text content from an Adobe PDF document in a Visual Basic .NET application
  • For C# developers, view the tutorial at How to read, extract text from scanned PDF file using C#
  • Able to run OCR on any specified area of a PDF in .NET WinForms and ASP.NET web pages
  • .NET library for batch OCR of PDF text content in VB.NET
  • Support .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), SharePoint
  • Recognize the whole PDF document and get all text content in VB.NET
  • Recognize a page of PDF document and extract its text content in Visual Basic .NET class
  • Recognize a scanned PDF file and output the OCR result to an Adobe PDF file
  • Recognize a scanned PDF document and output the OCR result to an MS Word file
  • Online VB.NET class source code for evaluation
  • Free VB.NET components and controls for downloading and using in .NET framework







How to OCR, read text from a scanned PDF file using VB.NET?


The steps and sample VB.NET code below show how to read text content from a scanned PDF file using Visual Basic.

  1. Set the path to the OCR resource files with the OCRHandler.SetTrainResourcePath() method.
  2. Create a new PDFDocument object with the scanned PDF file loaded.
  3. Get the first page of the PDF document and convert it to a Bitmap image object.
  4. Create a new OCRPage object from the PDF page image.
  5. Call the OCRPage.Recognize() method to scan and extract text from the PDF page.
  6. Save the extracted text content to a TXT file.


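' Folder that contains the OCR resource files.
Dim ocrSource As String = "D:\Alice\DLL\Source\"
OCRHandler.SetTrainResourcePath(ocrSource)

' Load the scanned PDF and render its first page (index 0) to a bitmap.
Dim pdf As New PDFDocument("C:\input.pdf")
Dim page As BasePage = pdf.GetPage(0)
Dim bmp As Bitmap = page.ConvertToImage()

' Run OCR on the page image and save the recognized text.
Dim ocrPage As OCRPage = OCRHandler.Import(bmp)
ocrPage.Recognize()
ocrPage.SaveTo(MIMEType.TXT, "C:\output.txt")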
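
The sample above reads only the first page. To recognize the whole PDF document, the same steps can be repeated for each page. The following is a minimal sketch, not a definitive implementation: it uses only the calls already shown in this tutorial, the per-page output file names (C:\output_page_N.txt) are a hypothetical naming scheme, and the RasterEdge namespace names in the Imports lines are assumptions that should be checked against your installed assemblies.

```vb
Imports System.Drawing
' Assumed namespace names; verify them against the referenced RasterEdge DLLs.
Imports RasterEdge.Imaging.Basic
Imports RasterEdge.XDoc.PDF
Imports RasterEdge.XImage.OCR

Module OcrAllPages
    Sub Main()
        ' Folder that contains the OCR resource files (same path as the sample above).
        OCRHandler.SetTrainResourcePath("D:\Alice\DLL\Source\")

        Dim pdf As New PDFDocument("C:\input.pdf")

        ' Repeat the single-page steps for every page of the document.
        For i As Integer = 0 To pdf.GetPageCount() - 1
            Dim page As BasePage = pdf.GetPage(i)

            ' Dispose each page bitmap once its text has been saved.
            Using bmp As Bitmap = page.ConvertToImage()
                Dim ocrPage As OCRPage = OCRHandler.Import(bmp)
                ocrPage.Recognize()

                ' Hypothetical naming scheme: one text file per page, numbered from 1.
                Dim outputPath As String = String.Format("C:\output_page_{0}.txt", i + 1)
                ocrPage.SaveTo(MIMEType.TXT, outputPath)
            End Using
        Next
    End Sub
End Module
```

Writing one file per page keeps the sketch within the SaveTo(MIMEType, path) form shown above; joining the files into a single text file can then be done with standard System.IO calls.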




Convert scanned PDF file to editable PDF document using VB.NET


The following VB.NET example source code shows how to convert a scanned PDF document into an editable PDF file.



Dim inputFilePath As String = "C:\demo_1.pdf"
Dim outputFilePath As String = "C:\output.pdf"

' The folder that contains '.traineddata' files.
OCRHandler.SetTrainResourcePath("C:\Source")

Dim doc As New PDFDocument(inputFilePath)
Dim pageCount As Integer = doc.GetPageCount()

' OCR each page and keep the resulting single-page PDF in memory.
Dim streams(pageCount - 1) As MemoryStream
For i As Integer = 0 To pageCount - 1
    streams(i) = New MemoryStream()
    Dim page As OCRPage = OCRHandler.Import(doc.GetPage(i))
    page.Recognize()
    page.SaveTo(MIMEType.PDF, streams(i))
Next

' Merge the per-page PDFs into one editable output file.
PDFDocument.CombineDocument(streams, outputFilePath)
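
The same conversion can be applied to a batch of scanned files. The sketch below is one possible approach, not the library's own batch feature: it wraps the per-document steps above in a loop over a folder using standard System.IO calls. The folder paths C:\scans and C:\searchable are hypothetical, and the RasterEdge namespace names in the Imports lines are assumptions to check against your installed assemblies.

```vb
Imports System.IO
' Assumed namespace names; verify them against the referenced RasterEdge DLLs.
Imports RasterEdge.Imaging.Basic
Imports RasterEdge.XDoc.PDF
Imports RasterEdge.XImage.OCR

Module BatchOcr
    Sub Main()
        Dim inputFolder As String = "C:\scans"        ' hypothetical input folder
        Dim outputFolder As String = "C:\searchable"  ' hypothetical output folder

        ' The folder that contains '.traineddata' files.
        OCRHandler.SetTrainResourcePath("C:\Source")
        Directory.CreateDirectory(outputFolder)

        For Each inputFilePath As String In Directory.GetFiles(inputFolder, "*.pdf")
            ' Write each result under the same file name in the output folder.
            Dim outputFilePath As String = Path.Combine(outputFolder, Path.GetFileName(inputFilePath))

            Dim doc As New PDFDocument(inputFilePath)
            Dim pageCount As Integer = doc.GetPageCount()
            If pageCount = 0 Then Continue For

            ' OCR each page into its own in-memory PDF, as in the sample above.
            Dim streams(pageCount - 1) As MemoryStream
            For i As Integer = 0 To pageCount - 1
                streams(i) = New MemoryStream()
                Dim page As OCRPage = OCRHandler.Import(doc.GetPage(i))
                page.Recognize()
                page.SaveTo(MIMEType.PDF, streams(i))
            Next
            PDFDocument.CombineDocument(streams, outputFilePath)

            ' Release the per-page buffers before moving to the next file.
            For Each s As MemoryStream In streams
                s.Dispose()
            Next
        Next
    End Sub
End Module
```

Processing files one at a time and disposing the page streams after each document keeps memory use bounded by the largest single PDF rather than by the whole batch.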




Convert scanned PDF file to Word document (.docx) using VB.NET


The following VB.NET example source code shows how to convert a scanned PDF document into a Microsoft Word document (.docx).



Dim inputFilePath As String = "C:\demo_1.pdf"
Dim tempFilePath As String = "C:\output.pdf"
Dim outputFilePath As String = "C:\output.docx"

' The folder that contains '.traineddata' files.
OCRHandler.SetTrainResourcePath("C:\Source")

Dim doc As New PDFDocument(inputFilePath)
Dim pageCount As Integer = doc.GetPageCount()

' OCR each page and keep the resulting single-page PDF in memory.
Dim streams(pageCount - 1) As MemoryStream
For i As Integer = 0 To pageCount - 1
    streams(i) = New MemoryStream()
    Dim page As OCRPage = OCRHandler.Import(doc.GetPage(i))
    page.Recognize()
    page.SaveTo(MIMEType.PDF, streams(i))
Next

' Merge the per-page PDFs into an intermediate editable PDF file.
PDFDocument.CombineDocument(streams, tempFilePath)

' Convert the intermediate PDF to a Word document.
Dim doc1 As New PDFDocument(tempFilePath)
doc1.ConvertToDocument(DocumentType.DOCX, outputFilePath)