XDoc.PDF

VB.NET PDF - Extract Text from Scanned PDF Using OCR SDK for VB.NET


VB.NET Tutorial for Using OCR Library to Extract Text from Adobe PDF Document in Visual Basic Class










  • OCR SDK for VB.NET projects in Visual Studio .NET
  • Recognize text content in scanned Adobe PDF documents from a Visual Basic .NET application
  • Run OCR on any specified area of a PDF page in .NET WinForms and ASP.NET web pages
  • Batch OCR of PDF text content in VB.NET
  • Supports .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke) and SharePoint
  • Recognize a whole PDF document and get all of its text content in VB.NET
  • Recognize a single PDF page and extract its text content in a Visual Basic .NET class
  • Recognize a scanned PDF file and save the OCR result as an editable Adobe PDF file
  • Recognize a scanned PDF document and save the OCR result as an MS Word (.docx) file
  • VB.NET class source code available online for evaluation
  • Free VB.NET components and controls to download and use in .NET Framework applications




Extract text from a PDF page with OCR using VB.NET


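Add the following VB.NET demo code to your project. It loads a PDF, renders the first page to a bitmap, runs OCR on that bitmap and saves the recognized text to a TXT file.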



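Imports System.Drawing
' Also import the RasterEdge XDoc.PDF and OCR namespaces used by your SDK version.
' The folder that contains the '.traineddata' files.
Dim ocrSource As String = "D:\Alice\DLL\Source\"
OCRHandler.SetTrainResourcePath(ocrSource)
' Load the PDF and render its first page (index 0) to a bitmap.
Dim pdf As New PDFDocument("C:\input.pdf")
Dim page As BasePage = pdf.GetPage(0)
Dim bmp As Bitmap = page.ConvertToImage()
' Run OCR on the bitmap and save the recognized text.
Dim recognized As OCRPage = OCRHandler.Import(bmp)
recognized.Recognize()
recognized.SaveTo(MIMEType.TXT, "C:\output.txt")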
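The example above recognizes only the first page (`GetPage(0)`). To get all text content from a multi-page document, one approach is to repeat the same steps for every page and then join the per-page text files. The sketch below assumes it can reuse only the SDK calls shown on this page, including passing a page directly to `OCRHandler.Import`, as the later examples do. The helper names `OcrWholeDocument` and `MergeTextFiles`, the per-page file naming and the blank-line separator are illustrative assumptions, not part of the SDK.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
' Also import the RasterEdge XDoc.PDF and OCR namespaces used by your SDK version.

Module WholeDocumentOcr
    ' Hypothetical helper (not an SDK method): OCR every page of inputPdf
    ' and write the recognized text of all pages, in page order, to outputTxt.
    Sub OcrWholeDocument(inputPdf As String, outputTxt As String, trainDataFolder As String)
        ' Folder that contains the '.traineddata' files.
        OCRHandler.SetTrainResourcePath(trainDataFolder)

        Dim pdf As New PDFDocument(inputPdf)
        Dim pageFiles As New List(Of String)()

        For i As Integer = 0 To pdf.GetPageCount() - 1
            ' e.g. C:\output.txt -> C:\output.page1.txt, C:\output.page2.txt, ...
            Dim pageTxt As String = Path.ChangeExtension(outputTxt, "page" & (i + 1).ToString() & ".txt")
            Dim recognized As OCRPage = OCRHandler.Import(pdf.GetPage(i))
            recognized.Recognize()
            recognized.SaveTo(MIMEType.TXT, pageTxt)
            pageFiles.Add(pageTxt)
        Next

        MergeTextFiles(pageFiles, outputTxt)
    End Sub

    ' Plain .NET: concatenate text files in order, separated by a blank line.
    Sub MergeTextFiles(files As IEnumerable(Of String), outputPath As String)
        Dim parts As New List(Of String)()
        For Each f As String In files
            parts.Add(File.ReadAllText(f))
        Next
        File.WriteAllText(outputPath, String.Join(Environment.NewLine & Environment.NewLine, parts.ToArray()))
    End Sub
End Module
```

The page-merging step is ordinary .NET file handling, so you could swap in a different separator (for example, a page marker line) without touching the OCR calls.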




Convert a scanned PDF file to an editable PDF document using VB.NET


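The following VB.NET example shows how to convert a scanned PDF document into an editable PDF file. Each page is recognized with OCR and saved as a one-page PDF in memory, and the results are then combined into a single output file.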



Imports System.IO
' Also import the RasterEdge XDoc.PDF and OCR namespaces used by your SDK version.

Dim inputFilePath As String = "C:\demo_1.pdf"
Dim outputFilePath As String = "C:\output.pdf"

' The folder that contains the '.traineddata' files.
OCRHandler.SetTrainResourcePath("C:\Source")

Dim doc As New PDFDocument(inputFilePath)
Dim pageCount As Integer = doc.GetPageCount()

' Recognize each page and save it as a one-page PDF in memory.
Dim streams(pageCount - 1) As MemoryStream
For i As Integer = 0 To pageCount - 1
    streams(i) = New MemoryStream()
    Dim recognized As OCRPage = OCRHandler.Import(doc.GetPage(i))
    recognized.Recognize()
    recognized.SaveTo(MIMEType.PDF, streams(i))
Next

' Merge the per-page PDFs into the output file.
PDFDocument.CombineDocument(streams, outputFilePath)
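For batch processing, the per-page loop above can be applied to every PDF in a folder. This is a minimal sketch under stated assumptions. The module and helper names (`BatchOcr`, `ConvertFolder`, `GetBatchJobs`) are hypothetical. Output files keep their input file names. Only the SDK calls already used in this example are assumed to exist.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
' Also import the RasterEdge XDoc.PDF and OCR namespaces used by your SDK version.

Module BatchOcr
    ' Hypothetical helper (not an SDK method): convert every *.pdf file in
    ' inputFolder into an editable PDF with the same name in outputFolder.
    Sub ConvertFolder(inputFolder As String, outputFolder As String, trainDataFolder As String)
        ' Folder that contains the '.traineddata' files.
        OCRHandler.SetTrainResourcePath(trainDataFolder)
        Directory.CreateDirectory(outputFolder)

        For Each job As KeyValuePair(Of String, String) In GetBatchJobs(inputFolder, outputFolder)
            Dim doc As New PDFDocument(job.Key)
            Dim streams(doc.GetPageCount() - 1) As MemoryStream
            For i As Integer = 0 To streams.Length - 1
                streams(i) = New MemoryStream()
                Dim recognized As OCRPage = OCRHandler.Import(doc.GetPage(i))
                recognized.Recognize()
                recognized.SaveTo(MIMEType.PDF, streams(i))
            Next
            PDFDocument.CombineDocument(streams, job.Value)
        Next
    End Sub

    ' Plain .NET: pair each input PDF (sorted by name) with its output path.
    Function GetBatchJobs(inputFolder As String, outputFolder As String) As List(Of KeyValuePair(Of String, String))
        Dim inputs As String() = Directory.GetFiles(inputFolder, "*.pdf")
        Array.Sort(inputs, StringComparer.OrdinalIgnoreCase)
        Dim jobs As New List(Of KeyValuePair(Of String, String))()
        For Each inputPath As String In inputs
            Dim outputPath As String = Path.Combine(outputFolder, Path.GetFileName(inputPath))
            jobs.Add(New KeyValuePair(Of String, String)(inputPath, outputPath))
        Next
        Return jobs
    End Function
End Module
```

For example, `BatchOcr.ConvertFolder("C:\scans", "C:\ocr", "C:\Source")` would write `C:\ocr\report.pdf` for an input file `C:\scans\report.pdf`. Keeping the file-listing logic in its own function lets you check which files will be processed before any OCR runs.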




Convert a scanned PDF file to a Word document (.docx) using VB.NET


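The following VB.NET example shows how to convert a scanned PDF document into a Microsoft Word document (.docx). It first produces an editable PDF in the same way as the previous example, then converts that intermediate PDF to DOCX.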



Imports System.IO
' Also import the RasterEdge XDoc.PDF and OCR namespaces used by your SDK version.

Dim inputFilePath As String = "C:\demo_1.pdf"
' Intermediate editable PDF produced by OCR.
Dim tempFilePath As String = "C:\output.pdf"
Dim outputFilePath As String = "C:\output.docx"

' The folder that contains the '.traineddata' files.
OCRHandler.SetTrainResourcePath("C:\Source")

Dim doc As New PDFDocument(inputFilePath)
Dim pageCount As Integer = doc.GetPageCount()

' Step 1: recognize each page and combine the results into an editable PDF.
Dim streams(pageCount - 1) As MemoryStream
For i As Integer = 0 To pageCount - 1
    streams(i) = New MemoryStream()
    Dim recognized As OCRPage = OCRHandler.Import(doc.GetPage(i))
    recognized.Recognize()
    recognized.SaveTo(MIMEType.PDF, streams(i))
Next
PDFDocument.CombineDocument(streams, tempFilePath)

' Step 2: convert the editable PDF to a Word document.
Dim ocrDoc As New PDFDocument(tempFilePath)
ocrDoc.ConvertToDocument(DocumentType.DOCX, outputFilePath)