HiQPdf Next PDF to Text Converter for .NET
The PDF to Text Converter is a component of the HiQPdf Next Library for .NET that enables text extraction from PDF documents in the original layout or optimized for reading, as well as text search in PDF that returns the exact positions of the matches. You can deploy the library on a variety of Windows and Linux platforms including Azure App Service and Functions or Docker.

You can see the list of all HiQPdf Next components on the library page.

The PDF to Text Converter is distributed as part of the HiQPdf Next PDF Processor for .NET, which also includes functionality for converting PDF pages to images and extracting images from PDF documents.

Supported Platforms

HiQPdf Next for .NET can run on a variety of Windows and Linux platforms in web, console and desktop applications across all modern .NET platforms. The library components can be used in Azure App Service and Azure Functions environments on both Windows and Linux. Deployment to Docker Windows and Linux containers is also supported.

The .NET library targets .NET Standard 2.0, which makes it compatible with a wide range of .NET Core and .NET Framework applications.

Getting Started

The online documentation contains Getting Started guides for Windows, Linux, Azure App Service and Azure Functions, with detailed instructions for integrating the library into your application and complete C# examples for each important feature of the library.

You can see the current capabilities of the library by checking the online demo application for this library and the API reference in the online documentation.

Download Demo Application

You can also download a free trial package for .NET, which includes an ASP.NET demo application project with complete C# source code as a starting point for experimenting with your own usage scenarios.

On Linux, running the demo application samples that use the HTML to PDF conversion features might require installing additional dependency packages. The documentation includes an entire section dedicated to building, publishing and running the demo application on multiple platforms.

NuGet Packages

The PDF to Text Converter is distributed as part of the HiQPdf.Next.PdfProcessor.Windows NuGet package when targeting Windows and as part of the HiQPdf.Next.PdfProcessor.Linux NuGet package when targeting Linux.

The Windows package is referenced by the HiQPdf.Next.Windows metapackage for all components and the Linux package is referenced by the HiQPdf.Next.Linux metapackage for all components.

There are also multiplatform metapackages that reference both the Windows and Linux packages: HiQPdf.Next.PdfProcessor for the PDF Processor functionality and HiQPdf.Next for the entire HiQPdf Next library.

Installation

The PDF Processor component generally does not require the installation of additional dependencies, either on Windows or on Linux.

HiQPdf.Next Namespace

All components of the HiQPdf Next for .NET library share the same HiQPdf.Next namespace and can be used together in the same application. To use the library in your own code, add the using directive at the top of your C# source file, as shown below.

// Include the HiQPdf.Next namespace at the top of your C# file
using HiQPdf.Next;

Sample Code

After you add the reference to the NuGet package to your project, use the sample code below to convert a PDF document to a string and search for text in the PDF.

// Path of the PDF file to process and the text to search for (placeholder values)
string pdfFilePath = "document.pdf";
string textToFind = "text";

// Create the PDF to Text converter instance with default options
PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

// Extract text from the specified PDF file
string extractedText = pdfToTextConverter.ConvertToText(pdfFilePath);

// Search text in PDF
bool caseSensitive = false;
bool wholeWord = false;
FindTextLocation[] findTextLocations = pdfToTextConverter.FindText(pdfFilePath, textToFind, caseSensitive, wholeWord);
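
The search result can be inspected by iterating over the returned array. The sketch below uses only members that appear in the demo code on this page (PageNumber, X, Y, Width and Height of FindTextLocation); the class and method names are illustrative, and the units of the coordinates are not stated here, so check the API reference before relying on them.

```csharp
using System;
using HiQPdf.Next;

public static class FindTextReport
{
    // Prints one line per match found in the PDF file.
    // FindTextReport and PrintMatches are illustrative names, not part of the library.
    public static void PrintMatches(string pdfFilePath, string textToFind)
    {
        PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

        // Case-insensitive search that also accepts matches inside longer words
        FindTextLocation[] findTextLocations =
            pdfToTextConverter.FindText(pdfFilePath, textToFind, false, false);

        foreach (FindTextLocation location in findTextLocations)
        {
            Console.WriteLine(
                $"Page {location.PageNumber}: x={location.X}, y={location.Y}, " +
                $"width={location.Width}, height={location.Height}");
        }
    }
}
```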

Features List

HiQPdf Next PDF to Text Converter for .NET offers advanced options for extracting text from PDF and searching text in PDF. You can specify the user and owner passwords to open password-protected PDF documents, as well as the PDF page range to process. For text extraction, you can choose the output text layout and optionally mark page breaks with a special character in the output string. The text search operation offers options such as performing a case-sensitive search or matching whole words only.

	Convert or Search PDF Documents From Memory, Stream or File
	You can load the PDF document to convert or search from a memory buffer, a stream or a file. The PDF to Text conversion produces a .NET string that contains the extracted text. The text search operation returns the positions of the searched text within the PDF pages.

	Convert or Search Password-Protected PDF Documents
	If the PDF document you process is password-protected, you must specify the user or owner password used to decrypt the document before performing a text extraction or text search operation.

	Convert or Search the Entire Document or a Range of PDF Pages
	The convert and search functions allow you to specify the range of PDF pages you want to process, either starting from a given page number to the end of the document or between specified start and end page numbers.
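
As a minimal sketch of the two range forms, the helpers below wrap the ConvertToText(byte[], int, int) call used in the demo controller on this page, where an end page number of 0 means the extraction continues to the end of the document. The class and method names are illustrative.

```csharp
using HiQPdf.Next;

public static class PageRangeSketch
{
    // Extracts text from startPageNumber to the end of the document.
    public static string ExtractFromPage(byte[] pdfBytes, int startPageNumber)
    {
        var converter = new PdfToTextConverter();

        // End page number 0: continue to the end of the document
        return converter.ConvertToText(pdfBytes, startPageNumber, 0);
    }

    // Extracts text between the given start and end page numbers.
    public static string ExtractRange(byte[] pdfBytes, int startPageNumber, int endPageNumber)
    {
        var converter = new PdfToTextConverter();
        return converter.ConvertToText(pdfBytes, startPageNumber, endPageNumber);
    }
}
```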

	Specify the Extracted Text Layout
	You can choose whether the text is extracted from the PDF while preserving the original layout or optimized for reading.

	Mark Page Breaks in the Extracted Text
	You can choose whether page breaks are marked with a special character in the extracted text.
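
Once page breaks are marked, the extracted string can be split back into one string per page. The helper below is a sketch in plain C# with illustrative names; the mark is passed in as a parameter. The demo code refers to the mark as PdfToTextConverter.PAGE_BREAK_MARK; that it is a single char is an assumption here.

```csharp
using System;

public static class ExtractedTextPages
{
    // Splits extracted text into per-page strings at each page-break mark.
    // If the text ends with a mark, the last entry is an empty string.
    public static string[] SplitPages(string extractedText, char pageBreakMark)
    {
        if (string.IsNullOrEmpty(extractedText))
            return Array.Empty<string>();

        return extractedText.Split(pageBreakMark);
    }
}
```

A call would look like `ExtractedTextPages.SplitPages(extractedText, PdfToTextConverter.PAGE_BREAK_MARK)`, assuming the constant has type char.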

	Case-Sensitive Text Search Option
	The text search functions allow you to specify whether the search should be case-sensitive.

	Whole-Word Text Search Option
	The text search functions allow you to specify whether the search should match whole words only.
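
To illustrate what the two search flags mean, the sketch below reproduces them on a plain .NET string with a regular expression. It is not how the library implements its search; it only shows how the case-sensitive and whole-word options change which occurrences count as matches. The class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchOptionsIllustration
{
    // Counts occurrences of textToFind in text under the two search flags.
    public static int CountMatches(string text, string textToFind, bool caseSensitive, bool wholeWord)
    {
        // Escape the search text so it is matched literally
        string pattern = Regex.Escape(textToFind);

        // Whole-word search: require a word boundary on both sides
        if (wholeWord)
            pattern = @"\b" + pattern + @"\b";

        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.Matches(text, pattern, options).Count;
    }
}
```

For the text "Cat concatenate cat" and the search term "cat", this counts 3 matches with both flags off, 2 with only case sensitivity on, 2 with only whole-word matching on, and 1 with both on.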

	Asynchronous Convert and Search Methods
	The convert and search methods have asynchronous variants, named with the Async suffix, that can be awaited with async and await so these operations run without blocking the calling thread and can proceed concurrently with other work.
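
A possible shape for an asynchronous extraction is sketched below. The method name ConvertToTextAsync follows the Async suffix convention described above, but its exact signature and return type are assumptions that mirror the synchronous ConvertToText(string) call shown in the sample code; check the API reference before using it. The class and wrapper method names are illustrative.

```csharp
using System.Threading.Tasks;
using HiQPdf.Next;

public static class AsyncExtractionSketch
{
    // Extracts text from a PDF file without blocking the calling thread.
    public static async Task<string> ExtractAsync(string pdfFilePath)
    {
        var converter = new PdfToTextConverter();

        // Assumption: ConvertToTextAsync takes the same argument as
        // ConvertToText(string) and returns Task<string>
        return await converter.ConvertToTextAsync(pdfFilePath);
    }
}
```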

	Available on Both Windows and Linux Platforms
	HiQPdf Next for .NET can run on both Windows 64-bit and Linux 64-bit platforms. Windows and Linux have separate NuGet packages, which contain the same .NET library but different native runtimes. For Windows, the minimum required version is Windows 10 or Windows Server 2016.

	Built for .NET Standard 2.0 for Maximum Compatibility
	The .NET library targets .NET Standard 2.0, making it compatible with a wide range of .NET Core and .NET Framework applications. It is compatible with .NET 10.0, 9.0, 8.0, 7.0, 6.0, .NET Standard 2.0 and .NET Framework 4.6.2 to 4.8.1.

	Fully Compatible with Azure App Service and Azure Functions on Both Windows and Linux
	The converter can run without restrictions in your Azure App Service and Azure Functions .NET Core applications targeting both Windows and Linux platforms. Web fonts and other features are fully supported by HiQPdf Next for .NET. The online documentation offers detailed usage instructions for Azure applications targeting both Windows and Linux.

	NuGet Packages for Windows and Linux
	HiQPdf Next for .NET is delivered as NuGet packages for Windows and Linux. The packages include the .NET Standard 2.0 library, the same for both platforms, and the specific native runtime for each platform.

	ASP.NET Core Demo Application with C# Code for All Features
	The zip package that can be downloaded from the website contains the project for the ASP.NET Core demo application with C# sample code for all major library features.

	Simple and Flexible Licensing with a Single License for All Libraries
	The license for HiQPdf Next for .NET works with both the classic HiQPdf Library for .NET and the multi-platform client-server solution. There are no additional runtime or deployment costs charged for using our software component in your applications.

HiQPdf Next for .NET - PDF to Text C# Code Sample for ASP.NET Core

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using HiQPdf_Next_AspNetDemo.Models;

using HiQPdf.Next;

namespace HiQPdf_Next_AspNetDemo.Controllers
{
    public class PdfToTextController : Controller
    {
        private readonly IWebHostEnvironment m_hostingEnvironment;
        public PdfToTextController(IWebHostEnvironment hostingEnvironment)
        {
            m_hostingEnvironment = hostingEnvironment;
        }

        public IActionResult Index()
        {
            var model = SetViewModel();

            return View(model);
        }

        [HttpPost]
        public async Task<IActionResult> ConvertPdfToText(PdfToTextViewModel model)
        {
            if (!ModelState.IsValid)
            {
                var errorMessage = ModelStateHelper.GetModelErrors(ModelState);
                throw new ValidationException(errorMessage);
            }

            // Replace the demo serial number with the serial number received upon purchase
            // to run the converter in licensed mode
            Licensing.SerialNumber = "YCgJMTAE-BiwJAhIB-EhlWTlBA-UEBRQFBA-U1FOUVJO-WVlZWQ==";

            // Create the PDF to Text converter instance with default options
            PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

            // Optionally set the user password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.UserPassword))
                pdfToTextConverter.UserPassword = model.UserPassword;

            // Optionally set the owner password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.OwnerPassword))
                pdfToTextConverter.OwnerPassword = model.OwnerPassword;

            // Configure the output text layout
            pdfToTextConverter.Layout = model.TextLayout == "Original" ? PdfToTextLayout.Original : PdfToTextLayout.Reading;

            // Mark PDF page breaks with the PdfToTextConverter.PAGE_BREAK_MARK special character
            pdfToTextConverter.MarkPageBreaks = model.MarkPageBreaks;

            // PDF page number to start text extraction from
            int startPageNumber = model.StartPageNumber;

            // PDF page number to end text extraction at
            // If 0, extraction continues to the end of the document
            int endPageNumber = 0;
            if (model.EndPageNumber.HasValue)
                endPageNumber = model.EndPageNumber.Value;

            byte[] inputPdfBytes = null;
            string outputFileName = null;

            // If an uploaded file exists, use it with priority
            if (model.PdfFile != null && model.PdfFile.Length > 0)
            {
                try
                {
                    using var ms = new MemoryStream();
                    await model.PdfFile.CopyToAsync(ms);
                    inputPdfBytes = ms.ToArray();
                }
                catch (Exception ex)
                {
                    throw new Exception("Failed to read the uploaded PDF file", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFile.FileName) + ".txt";
            }
            else
            {
                // Otherwise, fall back to the URL
                string pdfUrl = model.PdfFileUrl?.Trim();
                if (string.IsNullOrWhiteSpace(pdfUrl))
                    throw new Exception("No PDF file provided: upload a file or specify a URL");

                try
                {
                    if (pdfUrl.StartsWith("file://", StringComparison.OrdinalIgnoreCase))
                    {
                        string localPath = new Uri(pdfUrl).LocalPath;
                        inputPdfBytes = await System.IO.File.ReadAllBytesAsync(localPath);
                    }
                    else
                    {
                        using var httpClient = new System.Net.Http.HttpClient();
                        inputPdfBytes = await httpClient.GetByteArrayAsync(pdfUrl);
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Could not download the PDF file from URL", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFileUrl) + ".txt";
            }

            // Extract text from the specified PDF page range
            string extractedText = pdfToTextConverter.ConvertToText(inputPdfBytes, startPageNumber, endPageNumber);

            // Encode the extracted text as UTF-8 bytes
            byte[] outputTextBytes = Encoding.UTF8.GetBytes(extractedText);

            // Return the text as a downloadable file
            return File(outputTextBytes, "text/plain; charset=utf-8", outputFileName);
        }

        private PdfToTextViewModel SetViewModel()
        {
            var model = new PdfToTextViewModel();

            HttpRequest request = ControllerContext.HttpContext.Request;
            UriBuilder uriBuilder = new UriBuilder();
            uriBuilder.Scheme = request.Scheme;
            uriBuilder.Host = request.Host.Host;
            if (request.Host.Port != null)
                uriBuilder.Port = (int)request.Host.Port;
            uriBuilder.Path = request.PathBase.ToString() + request.Path.ToString();
            uriBuilder.Query = request.QueryString.ToString();

            string currentPageUrl = uriBuilder.Uri.AbsoluteUri;
            string rootUrl = currentPageUrl.Substring(0, currentPageUrl.Length - "PdfToText".Length);

            // Trim the trailing slash of the root URL to avoid a double slash in the path
            model.PdfFileUrl = rootUrl.TrimEnd('/') + "/DemoFiles/PdfProcessor/PDF_Document.pdf";

            return model;
        }
    }
}

HiQPdf Next for .NET - Search Text in PDF C# Code Sample for ASP.NET Core

using System;
using System.IO;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using HiQPdf_Next_AspNetDemo.Models;

using HiQPdf.Next;

namespace HiQPdf_Next_AspNetDemo.Controllers
{
    public class FindPdfTextController : Controller
    {
        private readonly IWebHostEnvironment m_hostingEnvironment;
        public FindPdfTextController(IWebHostEnvironment hostingEnvironment)
        {
            m_hostingEnvironment = hostingEnvironment;
        }

        public IActionResult Index()
        {
            var model = SetViewModel();

            return View(model);
        }

        [HttpPost]
        public async Task<IActionResult> FindPdfText(FindPdfTextViewModel model)
        {
            if (!ModelState.IsValid)
            {
                var errorMessage = ModelStateHelper.GetModelErrors(ModelState);
                throw new ValidationException(errorMessage);
            }

            // Replace the demo serial number with the serial number received upon purchase
            // to run the converter in licensed mode
            Licensing.SerialNumber = "YCgJMTAE-BiwJAhIB-EhlWTlBA-UEBRQFBA-U1FOUVJO-WVlZWQ==";

            // Create the PDF to Text converter instance with default options
            PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

            // Optionally set the user password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.UserPassword))
                pdfToTextConverter.UserPassword = model.UserPassword;

            // Optionally set the owner password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.OwnerPassword))
                pdfToTextConverter.OwnerPassword = model.OwnerPassword;

            // PDF page number to start text search from
            int startPageNumber = model.StartPageNumber;

            // PDF page number to end text search at
            // If 0, search continues to the end of the document
            int endPageNumber = 0;
            if (model.EndPageNumber.HasValue)
                endPageNumber = model.EndPageNumber.Value;

            byte[] inputPdfBytes = null;
            string outputFileName = null;

            // If an uploaded file exists, use it with priority
            if (model.PdfFile != null && model.PdfFile.Length > 0)
            {
                try
                {
                    using var ms = new MemoryStream();
                    await model.PdfFile.CopyToAsync(ms);
                    inputPdfBytes = ms.ToArray();
                }
                catch (Exception ex)
                {
                    throw new Exception("Failed to read the uploaded PDF file", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFile.FileName) + "_Highlighted.pdf";
            }
            else
            {
                // Otherwise, fall back to the URL
                string pdfUrl = model.PdfFileUrl?.Trim();
                if (string.IsNullOrWhiteSpace(pdfUrl))
                    throw new Exception("No PDF file provided: upload a file or specify a URL");

                try
                {
                    if (pdfUrl.StartsWith("file://", StringComparison.OrdinalIgnoreCase))
                    {
                        string localPath = new Uri(pdfUrl).LocalPath;
                        inputPdfBytes = await System.IO.File.ReadAllBytesAsync(localPath);
                    }
                    else
                    {
                        using var httpClient = new System.Net.Http.HttpClient();
                        inputPdfBytes = await httpClient.GetByteArrayAsync(pdfUrl);
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Could not download the PDF file from URL", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFileUrl) + "_Highlighted.pdf";
            }

            // Search text in PDF
            FindTextLocation[] findTextLocations = pdfToTextConverter.FindText(inputPdfBytes, model.TextToFind,
                        startPageNumber, endPageNumber, model.CaseSensitive, model.WholeWord);

            // Open the PDF in editor
            string password = string.IsNullOrEmpty(model.OwnerPassword) ? model.UserPassword : model.OwnerPassword;
            using PdfEditor pdfEditor = new PdfEditor(inputPdfBytes, password);

            // Highlight the found text in PDF
            foreach (FindTextLocation findTextLocation in findTextLocations)
            {
                PdfRectangleElement highlightRectangle = new PdfRectangleElement(findTextLocation.X, findTextLocation.Y,
                    findTextLocation.Width, findTextLocation.Height);
                highlightRectangle.BorderColor = PdfColor.Yellow;

                pdfEditor.AddRectangle(findTextLocation.PageNumber, highlightRectangle);
            }

            // Save the highlighted PDF in a memory buffer
            byte[] outPdfBuffer = pdfEditor.Save();

            // Return the highlighted PDF as a downloadable file
            return File(outPdfBuffer, "application/pdf", outputFileName);
        }

        private FindPdfTextViewModel SetViewModel()
        {
            var model = new FindPdfTextViewModel();

            HttpRequest request = ControllerContext.HttpContext.Request;
            UriBuilder uriBuilder = new UriBuilder();
            uriBuilder.Scheme = request.Scheme;
            uriBuilder.Host = request.Host.Host;
            if (request.Host.Port != null)
                uriBuilder.Port = (int)request.Host.Port;
            uriBuilder.Path = request.PathBase.ToString() + request.Path.ToString();
            uriBuilder.Query = request.QueryString.ToString();

            string currentPageUrl = uriBuilder.Uri.AbsoluteUri;
            string rootUrl = currentPageUrl.Substring(0, currentPageUrl.Length - "FindPdfText".Length);

            // Trim the trailing slash of the root URL to avoid a double slash in the path
            model.PdfFileUrl = rootUrl.TrimEnd('/') + "/DemoFiles/PdfProcessor/PDF_Document.pdf";

            return model;
        }
    }
}
