|
| The PDF to Text Converter is a component of the
HiQPdf Next Library for .NET
that enables text extraction from PDF documents in the original layout or optimized for reading,
as well as text search in PDF that returns the exact positions of the matches.
You can see the list of
all HiQPdf Next components
on the library page.
|
|
| The PDF to Text Converter is distributed as part of the
HiQPdf Next PDF Processor for .NET,
which also includes functionality for converting PDF pages to images and extracting images from PDF documents.
|
|
Compatible Platforms
|
|
| HiQPdf Next for .NET can run on a variety of Windows and Linux platforms in web, console and
desktop applications across all modern .NET platforms.
The library components can be used in Azure App Service and Azure Functions environments
on both Windows and Linux. Deployment to Docker Windows and Linux containers is also supported.
|
|
| The .NET library targets .NET Standard 2.0, which makes it compatible with a wide range of .NET Core and .NET Framework
applications.
|
|
Getting Started
|
|
| The online documentation
contains Getting Started guides for Windows, Linux, Azure App Service and Azure Functions,
with detailed instructions for integrating the library into your application and complete C# examples
for each important feature of the library.
|
|
| You can see the current capabilities of the library by checking the
online demo application for this library
and the API reference in the online documentation.
|
|
Download Demo Application
|
|
|
You can also download a free trial package for .NET, which includes
an ASP.NET demo application project with complete C# source code as a starting point for experimenting with your own usage scenarios.
|
|
|
Running the demo application samples that use HTML to PDF conversion features on Linux platforms
might require installing additional dependency packages.
The documentation includes an entire section dedicated to building, publishing and running the demo application on multiple platforms.
|
|
NuGet Packages
|
|
| The PDF to Text Converter is distributed as part of the
HiQPdf.Next.PdfProcessor.Windows
NuGet package when targeting Windows and as part of the
HiQPdf.Next.PdfProcessor.Linux
NuGet package when targeting Linux.
|
|
| The Windows package is referenced by the
HiQPdf.Next.Windows
metapackage for all components and the Linux package is referenced by the
HiQPdf.Next.Linux
metapackage for all components.
|
|
| There are also multiplatform metapackages that reference both the Windows and Linux packages:
HiQPdf.Next.PdfProcessor
for the PDF Processor functionality and
HiQPdf.Next
for the entire HiQPdf Next library.
|
|
Installation
|
|
| The PDF Processor component generally does not require the installation of additional dependencies,
either on Windows or on Linux.
|
|
HiQPdf.Next Namespace
|
|
| All components of the HiQPdf Next for .NET library share the same
HiQPdf.Next
namespace and can be used together in the same application.
To use the library in your own code, add the using directive at the top of your C# source file, as shown below.
|
|
// Include the HiQPdf.Next namespace at the top of your C# file
using HiQPdf.Next;
|
|
|
|
Sample Code
|
|
| After adding a reference to the NuGet package in your project, use the sample code below to extract the text of a PDF document into a string and to search for text in the PDF.
|
|
// Create the PDF to Text converter instance with default options
PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();
// Extract text from the specified PDF file
string extractedText = pdfToTextConverter.ConvertToText(pdfFilePath);
// Search text in PDF
bool caseSensitive = false;
bool wholeWord = false;
FindTextLocation[] findTextLocations = pdfToTextConverter.FindText(pdfFilePath, textToFind, caseSensitive, wholeWord);
|
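As a minimal sketch of how the search results might be consumed, the loop below prints where each match was found. It relies only on the `FindTextLocation` members that appear in the ASP.NET Core sample on this page (`PageNumber`, `X`, `Y`, `Width`, `Height`). The file name and the search term are placeholders, and the unit of the coordinates is not stated here, so check the API reference before using the values for drawing.

```csharp
using System;
using HiQPdf.Next;

// Placeholder inputs, for illustration only
string pdfFilePath = "document.pdf";
string textToFind = "invoice";

PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

// Case-insensitive search that is not restricted to whole words
FindTextLocation[] matches = pdfToTextConverter.FindText(pdfFilePath, textToFind, false, false);

Console.WriteLine($"Found {matches.Length} match(es)");
foreach (FindTextLocation match in matches)
{
    // Each location carries the page number and the bounding rectangle of the match
    Console.WriteLine($"Page {match.PageNumber}: X={match.X}, Y={match.Y}, " +
        $"Width={match.Width}, Height={match.Height}");
}
```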
|
|
|
Features List
|
|
| HiQPdf Next PDF to Text Converter for .NET offers advanced options for extracting text from PDF documents and searching for text in them.
You can specify the user and owner passwords to open password-protected PDF documents, as well as the PDF page range to process.
For text extraction, you can choose the output text layout and optionally mark page breaks with a special character in the output string.
The text search operation offers options such as performing a case-sensitive search or matching whole words only.
|
|
|
Convert or Search PDF Documents From Memory, Stream or File
|
| The PDF document to convert or search can be loaded from a memory buffer, a stream or a file.
The PDF to Text conversion produces a .NET string that contains the extracted text.
The text search operation returns the positions of the searched text within the PDF pages.
|
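The sketch below shows one way to feed an arbitrary stream to the converter: it buffers the stream into a byte array and calls the byte-array overload of `FindText` used in the ASP.NET Core sample on this page. The `PdfInput.ReadAllBytes` helper is hypothetical and not part of the library, the file name is a placeholder, and a start page of 1 assumes that page numbers are 1-based. The library may also offer direct stream overloads, which are not shown here.

```csharp
using System;
using System.IO;
using HiQPdf.Next;

// Placeholder file name; any readable stream can be buffered the same way
using FileStream pdfStream = File.OpenRead("document.pdf");
byte[] pdfBytes = PdfInput.ReadAllBytes(pdfStream);

PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

// Argument order as in the ASP.NET Core sample:
// (pdfBytes, textToFind, startPageNumber, endPageNumber, caseSensitive, wholeWord)
// ASSUMPTION: page numbers are 1-based; an end page of 0 searches to the end of the document
FindTextLocation[] matches = pdfToTextConverter.FindText(pdfBytes, "invoice", 1, 0, false, false);
Console.WriteLine($"Found {matches.Length} match(es)");

static class PdfInput
{
    // Hypothetical helper (not part of HiQPdf Next): copies a readable stream into a byte array
    public static byte[] ReadAllBytes(Stream input)
    {
        using var buffer = new MemoryStream();
        input.CopyTo(buffer);
        return buffer.ToArray();
    }
}
```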
|
|
|
Convert or Search Password-Protected PDF Documents
|
| If the PDF document you process is password-protected, you must specify the user or owner password used to decrypt the document
before performing a text extraction or text search operation.
|
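A minimal sketch of opening a protected document, using the `UserPassword` and `OwnerPassword` properties that appear in the ASP.NET Core sample on this page. The password values and the file name are placeholders.

```csharp
using System;
using HiQPdf.Next;

PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

// Placeholder password; set whichever of the two passwords you have for the document
pdfToTextConverter.UserPassword = "user-password";
// pdfToTextConverter.OwnerPassword = "owner-password";

// Extract the text only after the password has been set
string extractedText = pdfToTextConverter.ConvertToText("protected.pdf");
Console.WriteLine(extractedText);
```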
|
|
|
Convert or Search the Entire Document or a Range of PDF Pages
|
| The convert and search functions allow you to specify the range of PDF pages you want to process,
either from a given page number to the end of the document or between specified start and end page numbers.
|
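The two forms of the page range can be sketched with the `FindText` overload from the ASP.NET Core sample on this page, where an end page of 0 means that the search continues to the end of the document. The file name, the search term and the page numbers are placeholders, and the example assumes 1-based page numbers.

```csharp
using System;
using System.IO;
using System.Linq;
using HiQPdf.Next;

byte[] pdfBytes = File.ReadAllBytes("document.pdf");
PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

// Search only between a start page and an end page (pages 2 to 5 here)
FindTextLocation[] inRange = pdfToTextConverter.FindText(pdfBytes, "invoice", 2, 5, false, false);

// Search from page 2 to the end of the document: the end page is 0
FindTextLocation[] toEnd = pdfToTextConverter.FindText(pdfBytes, "invoice", 2, 0, false, false);

Console.WriteLine($"Pages 2 to 5: {inRange.Length} match(es)");

// Count the matches on each page for the open-ended search
foreach (var page in toEnd.GroupBy(match => match.PageNumber))
{
    Console.WriteLine($"Page {page.Key}: {page.Count()} match(es)");
}
```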
|
|
|
Specify the Extracted Text Layout
|
| You can choose whether the text is extracted from the PDF while preserving the original layout
or optimized for reading.
|
|
|
|
Mark Page Breaks in the Extracted Text
|
| You can choose whether page breaks are marked with a special character in the extracted text.
|
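Once page breaks are marked, the extracted string can be split back into per-page strings. The sketch below is plain .NET code and does not call the library. It assumes that the marker is the form feed character (`'\f'`), which this page does not confirm, so replace it with the character documented in the API reference. The sample string stands in for the converter output.

```csharp
using System;

// ASSUMPTION: the page break marker is the form feed character; adjust if the library uses another one
const char PageBreak = '\f';

// Stand-in for the string returned by the converter
string extractedText = "First page text\fSecond page text";

// One array entry per page; a marker at the very end would produce an empty last entry
string[] pages = extractedText.Split(PageBreak);

for (int i = 0; i < pages.Length; i++)
{
    Console.WriteLine($"Page {i + 1}: {pages[i].Length} characters");
}
```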
|
|
|
Case-Sensitive Text Search Option
|
| The text search functions allow you to specify whether the search should be case-sensitive.
|
|
|
|
Whole-Word Text Search Option
|
| The text search functions allow you to specify whether the search should match whole words only.
|
|
|
|
Asynchronous Convert and Search Methods
|
| The convert and search methods have asynchronous variants, named with the Async suffix,
that can be awaited with async and await so that several operations can run in parallel.
|
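A hedged sketch of running an extraction and a search concurrently. The method names `ConvertToTextAsync` and `FindTextAsync` follow from the Async suffix mentioned above, but their parameters and return types are assumptions here: the example supposes they mirror the synchronous methods and return `Task<string>` and `Task<FindTextLocation[]>`. A separate converter instance is used for each operation because this page does not say whether one instance can be shared between concurrent calls.

```csharp
using System;
using System.Threading.Tasks;
using HiQPdf.Next;

// Placeholder inputs, for illustration only
string pdfFilePath = "document.pdf";
string textToFind = "invoice";

// ASSUMPTION: the Async variants take the same arguments as the synchronous methods
// and return Task<string> and Task<FindTextLocation[]>; verify against the API reference
Task<string> textTask = new PdfToTextConverter().ConvertToTextAsync(pdfFilePath);
Task<FindTextLocation[]> findTask =
    new PdfToTextConverter().FindTextAsync(pdfFilePath, textToFind, false, false);

// Both operations are in flight at the same time; wait for both to finish
await Task.WhenAll(textTask, findTask);

string extractedText = await textTask;
FindTextLocation[] matches = await findTask;

Console.WriteLine($"Extracted {extractedText.Length} characters, found {matches.Length} match(es)");
```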
|
|
|
Available on Both Windows and Linux Platforms
|
| HiQPdf Next for .NET can run on both Windows 64-bit and Linux 64-bit platforms.
Separate NuGet packages are provided for Windows and Linux. They contain the same .NET library but different
native runtimes. For Windows, the minimum required version is Windows 10 or Windows Server 2016.
|
|
|
|
Built for .NET Standard 2.0 for Maximum Compatibility
|
| The .NET library targets .NET Standard 2.0, making it compatible
with a wide range of .NET Core and .NET Framework applications. It is compatible with
.NET 10.0, 9.0, 8.0, 7.0, 6.0, .NET Standard 2.0 and .NET Framework 4.6.2 to 4.8.1.
|
|
|
|
Fully Compatible with Azure App Service and Azure Functions on Both Windows and Linux
|
| The converter can run without restrictions in your Azure App Service and Azure Functions
.NET Core applications targeting both Windows and Linux platforms. Web fonts and other features are fully supported
by HiQPdf Next for .NET.
The online documentation offers detailed usage instructions for Azure applications targeting both Windows and Linux.
|
|
|
|
NuGet Packages for Windows and Linux
|
| HiQPdf Next for .NET is delivered as NuGet packages for Windows and Linux.
The packages include the .NET Standard 2.0 library, the same for both platforms,
and the specific native runtime for each platform.
|
|
|
|
ASP.NET Core Demo Application with C# Code for All Features
|
| The zip package that can be downloaded from the website contains
the project for the ASP.NET Core demo application with C# sample code
for all major library features.
|
|
|
|
Simple and Flexible Licensing with a Single License for All Libraries
|
| The license for HiQPdf Next for .NET works with both the
classic HiQPdf Library for .NET and the multi-platform client-server solution.
There are no additional runtime or deployment costs charged for using our software
component in your applications.
|
|
|
HiQPdf Next for .NET - Search Text in PDF C# Code Sample for ASP.NET Core
|
|
using System;
using System.IO;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using HiQPdf_Next_AspNetDemo.Models;
using HiQPdf.Next;

namespace HiQPdf_Next_AspNetDemo.Controllers
{
    public class FindPdfTextController : Controller
    {
        private readonly IWebHostEnvironment m_hostingEnvironment;

        public FindPdfTextController(IWebHostEnvironment hostingEnvironment)
        {
            m_hostingEnvironment = hostingEnvironment;
        }

        public IActionResult Index()
        {
            var model = SetViewModel();
            return View(model);
        }

        [HttpPost]
        public async Task<IActionResult> FindPdfText(FindPdfTextViewModel model)
        {
            if (!ModelState.IsValid)
            {
                var errorMessage = ModelStateHelper.GetModelErrors(ModelState);
                throw new ValidationException(errorMessage);
            }

            // Replace the demo serial number with the serial number received upon purchase
            // to run the converter in licensed mode
            Licensing.SerialNumber = "YCgJMTAE-BiwJAhIB-EhlWTlBA-UEBRQFBA-U1FOUVJO-WVlZWQ==";

            // Create the PDF to Text converter instance with default options
            PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

            // Optionally set the user password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.UserPassword))
                pdfToTextConverter.UserPassword = model.UserPassword;

            // Optionally set the owner password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.OwnerPassword))
                pdfToTextConverter.OwnerPassword = model.OwnerPassword;

            // PDF page number to start text search from
            int startPageNumber = model.StartPageNumber;

            // PDF page number to end text search at
            // If 0, search continues to the end of the document
            int endPageNumber = 0;
            if (model.EndPageNumber.HasValue)
                endPageNumber = model.EndPageNumber.Value;

            byte[] inputPdfBytes = null;
            string outputFileName = null;

            // If an uploaded file exists, use it with priority
            if (model.PdfFile != null && model.PdfFile.Length > 0)
            {
                try
                {
                    using var ms = new MemoryStream();
                    await model.PdfFile.CopyToAsync(ms);
                    inputPdfBytes = ms.ToArray();
                }
                catch (Exception ex)
                {
                    throw new Exception("Failed to read the uploaded PDF file", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFile.FileName) + "_Highlighted.pdf";
            }
            else
            {
                // Otherwise, fall back to the URL
                string pdfUrl = model.PdfFileUrl?.Trim();
                if (string.IsNullOrWhiteSpace(pdfUrl))
                    throw new Exception("No PDF file provided: upload a file or specify a URL");

                try
                {
                    if (pdfUrl.StartsWith("file://", StringComparison.OrdinalIgnoreCase))
                    {
                        string localPath = new Uri(pdfUrl).LocalPath;
                        inputPdfBytes = await System.IO.File.ReadAllBytesAsync(localPath);
                    }
                    else
                    {
                        using var httpClient = new System.Net.Http.HttpClient();
                        inputPdfBytes = await httpClient.GetByteArrayAsync(pdfUrl);
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Could not download the PDF file from URL", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFileUrl) + "_Highlighted.pdf";
            }

            // Search text in PDF
            FindTextLocation[] findTextLocations = pdfToTextConverter.FindText(inputPdfBytes, model.TextToFind,
                startPageNumber, endPageNumber, model.CaseSensitive, model.WholeWord);

            // Open the PDF in editor
            string password = string.IsNullOrEmpty(model.OwnerPassword) ? model.UserPassword : model.OwnerPassword;
            using PdfEditor pdfEditor = new PdfEditor(inputPdfBytes, password);

            // Highlight the found text in PDF
            foreach (FindTextLocation findTextLocation in findTextLocations)
            {
                PdfRectangleElement highlightRectangle = new PdfRectangleElement(findTextLocation.X, findTextLocation.Y,
                    findTextLocation.Width, findTextLocation.Height);
                highlightRectangle.BorderColor = PdfColor.Yellow;
                pdfEditor.AddRectangle(findTextLocation.PageNumber, highlightRectangle);
            }

            // Save the highlighted PDF in a memory buffer
            byte[] outPdfBuffer = pdfEditor.Save();

            // Return the highlighted PDF as a downloadable file
            return File(outPdfBuffer, "application/pdf", outputFileName);
        }

        private FindPdfTextViewModel SetViewModel()
        {
            var model = new FindPdfTextViewModel();

            HttpRequest request = ControllerContext.HttpContext.Request;
            UriBuilder uriBuilder = new UriBuilder();
            uriBuilder.Scheme = request.Scheme;
            uriBuilder.Host = request.Host.Host;
            if (request.Host.Port != null)
                uriBuilder.Port = (int)request.Host.Port;
            uriBuilder.Path = request.PathBase.ToString() + request.Path.ToString();
            uriBuilder.Query = request.QueryString.ToString();

            string currentPageUrl = uriBuilder.Uri.AbsoluteUri;
            string rootUrl = currentPageUrl.Substring(0, currentPageUrl.Length - "FindPdfText".Length);

            model.PdfFileUrl = rootUrl + "/DemoFiles/PdfProcessor/PDF_Document.pdf";

            return model;
        }
    }
}
|
|