HiQPdf Next PDF Images Extractor for .NET

The PDF Images Extractor is a component of the HiQPdf Next Library for .NET that enables the extraction of images from PDF files. The extracted images are in PNG format and preserve the transparency information available in the PDF.

You can deploy the library on a variety of Windows and Linux platforms including Azure App Service and Functions or Docker.

The PDF Images Extractor is distributed as part of the HiQPdf Next PDF Processor for .NET, which also includes functionality for converting PDF to text, searching text in PDF and converting PDF pages to images.

Compatible Platforms

HiQPdf Next for .NET can run on a variety of Windows and Linux platforms in web, console and desktop applications across all modern .NET platforms. The library components can be used in Azure App Service and Azure Functions environments on both Windows and Linux. Deployment to Docker Windows and Linux containers is also supported.
The .NET library targets .NET Standard 2.0, which makes it compatible with a wide range of .NET Core and .NET Framework applications.

Getting Started

The online documentation contains Getting Started guides for Windows, Linux, Azure App Service and Azure Functions, with detailed instructions for integrating the library into your application and complete C# examples for each important feature of the library.
You can explore the current capabilities of the library in the online demo application and in the API reference of the online documentation.

Download Demo Application

You can also download a free trial package for .NET, which includes an ASP.NET demo application project with complete C# source code as a starting point for experimenting with your own usage scenarios.
On Linux, running the demo application samples that use the HTML to PDF conversion features might require installing some additional dependency packages. The documentation includes an entire section dedicated to building, publishing and running the demo application on multiple platforms.

NuGet Packages

The PDF Images Extractor is distributed as part of the HiQPdf.Next.PdfProcessor.Windows NuGet package when targeting Windows and as part of the HiQPdf.Next.PdfProcessor.Linux NuGet package when targeting Linux.
The Windows package is referenced by the HiQPdf.Next.Windows metapackage for all components and the Linux package is referenced by the HiQPdf.Next.Linux metapackage for all components.
There are also multiplatform metapackages that reference both the Windows and Linux packages: HiQPdf.Next.PdfProcessor for the PDF Processor functionality and HiQPdf.Next for the entire HiQPdf Next library.

Installation

The PDF Processor component generally does not require the installation of additional dependencies, either on Windows or on Linux.

HiQPdf.Next Namespace

All components of the HiQPdf Next for .NET library share the same HiQPdf.Next namespace and can be used together in the same application. To use the library in your own code, add the using directive at the top of your C# source file, as shown below.
// Include the HiQPdf.Next namespace at the top of your C# file
using HiQPdf.Next;

Sample Code

After you add a reference to the NuGet package to your project, you can use the sample code below to extract images from PDF documents.
// Path of the PDF file to process (placeholder, replace it with the path of your own file)
string pdfFilePath = "document.pdf";

// Create the PDF Images Extractor instance with default options
PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();

// Extract the images from the specified PDF file, grouped by page
ExtractedImage[][] extractedImages = pdfImagesExtractor.ExtractImages(pdfFilePath);
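ExtractImages returns one array of images per processed page. As a minimal sketch of what you might do with that result, the C# code below writes page-grouped PNG data to files. To stay independent of the library, it takes plain (PageNumber, Data) tuples; in your application you would fill them from the PageNumber and ImageData properties of each ExtractedImage, as the ASP.NET Core sample on this page does. The SaveImagesByPage helper and the output folder are hypothetical, and the file names follow the pattern used by that sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes page-grouped PNG data to files and returns the written paths.
// Each tuple holds a page number and the PNG bytes of one image (assumed input shape; with the
// library you would fill it from ExtractedImage.PageNumber and ExtractedImage.ImageData).
static List<string> SaveImagesByPage((int PageNumber, byte[] Data)[][] imagesByPage, string outputDir)
{
    Directory.CreateDirectory(outputDir);
    var writtenPaths = new List<string>();
    foreach (var pageImages in imagesByPage)
    {
        for (int imgIdx = 0; imgIdx < pageImages.Length; imgIdx++)
        {
            var (pageNumber, data) = pageImages[imgIdx];
            // Same naming pattern as the ASP.NET Core sample, e.g. page-000001-000000.png
            string filePath = Path.Combine(outputDir, $"page-{pageNumber:000000}-{imgIdx:000000}.png");
            File.WriteAllBytes(filePath, data);
            writtenPaths.Add(filePath);
        }
    }
    return writtenPaths;
}

// Example input: page 1 with two dummy images and page 2 with none
var sample = new (int PageNumber, byte[] Data)[][]
{
    new[] { (1, new byte[] { 1, 2 }), (1, new byte[] { 3 }) },
    Array.Empty<(int PageNumber, byte[] Data)>()
};
string demoDir = Path.Combine(Path.GetTempPath(), "extracted-images-demo");
foreach (string savedPath in SaveImagesByPage(sample, demoDir))
    Console.WriteLine(savedPath);
```

Indexing images per page, rather than globally, keeps file names stable when you process only a range of pages.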

Features List

HiQPdf Next PDF Images Extractor for .NET offers advanced options for extracting images from PDF documents in PNG format while preserving the transparency information available in the PDF. You can specify the user and owner passwords to open password-protected PDF documents, as well as the PDF page range to process.

Extract Images from PDF Documents from Memory, Stream or File in PNG Format

You can extract images from a PDF document from a memory buffer, a stream or a file to image objects in memory or to image files. The extracted images are in PNG format and preserve the transparency information available in the PDF.
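Because the extracted images are standard PNG data, you can inspect them with ordinary .NET code. The sketch below, which uses no HiQPdf API, checks the PNG signature and reports whether an image can carry transparency: either its IHDR color type includes an alpha channel (4 for grayscale with alpha, 6 for RGBA), or the file has a tRNS chunk. It is a simplified parser that reads only chunk headers and skips CRC validation, and HasTransparency is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical helper: returns true if the PNG image can contain transparent pixels, either
// because its color type has an alpha channel or because it contains a tRNS chunk.
// Only chunk headers are parsed; CRCs are not validated.
static bool HasTransparency(byte[] png)
{
    ReadOnlySpan<byte> signature = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    // Signature (8) + IHDR length (4) + type (4) + data (13) + CRC (4) = 33 bytes minimum
    if (png.Length < 33 || !png.AsSpan(0, 8).SequenceEqual(signature))
        throw new ArgumentException("Not a PNG image", nameof(png));

    // IHDR data starts at offset 16: width (4), height (4), bit depth (1), color type (1), ...
    byte colorType = png[25];
    if (colorType == 4 || colorType == 6) // grayscale + alpha, RGB + alpha
        return true;

    // Walk the chunks looking for tRNS, which must appear before the first IDAT chunk
    int offset = 8;
    while (offset + 8 <= png.Length)
    {
        uint length = BinaryPrimitives.ReadUInt32BigEndian(png.AsSpan(offset, 4));
        string type = Encoding.ASCII.GetString(png, offset + 4, 4);
        if (type == "tRNS")
            return true;
        if (type == "IDAT" || type == "IEND")
            break;
        long next = offset + 12L + length; // length + type + data + CRC
        if (next > png.Length)
            break;
        offset = (int)next;
    }
    return false;
}

// Usage: pass the path of an extracted PNG file on the command line
if (args.Length > 0)
    Console.WriteLine(HasTransparency(File.ReadAllBytes(args[0])) ? "may contain transparency" : "opaque");
```

An alpha channel or tRNS chunk only means transparency is possible; checking whether any pixel is actually transparent would require decoding the image data.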

Extract Images from Password-Protected PDF Documents

If the PDF document you process is password-protected, you must specify the user or owner password required to decrypt the document before extracting its images.

Extract Images from the Entire PDF or from a Range of PDF Pages

The extract functions allow you to specify the range of PDF pages you want to process, either starting from a given page number to the end of the document or between specified start and end page numbers.
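The page range rules can be illustrated without the library. In the ASP.NET Core sample on this page, an end page number of 0 means that extraction continues to the end of the document. The hypothetical ResolvePageRange helper below applies that convention to compute which pages would be processed, assuming 1-based, inclusive page numbers and a known page count. Rejecting out-of-range values is a choice of this sketch; the library's own validation may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolves a start/end pair into the page numbers to process.
// Assumes 1-based, inclusive page numbers; endPage == 0 means "to the end of the document".
static List<int> ResolvePageRange(int startPage, int endPage, int pageCount)
{
    if (startPage < 1 || startPage > pageCount)
        throw new ArgumentOutOfRangeException(nameof(startPage));
    int lastPage = endPage == 0 ? pageCount : endPage;
    if (lastPage < startPage || lastPage > pageCount)
        throw new ArgumentOutOfRangeException(nameof(endPage));

    var pages = new List<int>();
    for (int page = startPage; page <= lastPage; page++)
        pages.Add(page);
    return pages;
}

// From page 3 to the end of a 5-page document, then pages 2 to 4
Console.WriteLine(string.Join(", ", ResolvePageRange(3, 0, 5))); // prints "3, 4, 5"
Console.WriteLine(string.Join(", ", ResolvePageRange(2, 4, 5))); // prints "2, 3, 4"
```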

Asynchronous Extract Methods

The extract methods also have asynchronous variants with the Async suffix, which can be used with async and await to run multiple extraction operations concurrently.
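As a rough sketch of running several extractions concurrently, the code below starts one asynchronous operation per document and awaits them together with Task.WhenAll, while a SemaphoreSlim caps how many run at the same time. The exact signatures of the Async extract methods are not shown on this page, so the sketch receives the extraction step as a delegate; in an application, each call could create its own PdfImagesExtractor and await one of its Async methods. RunConcurrentlyAsync is a hypothetical helper, and the simulated delegate only stands in for real extraction work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs extractAsync for every input with at most maxConcurrency
// operations in flight and returns the results in the same order as the inputs
static async Task<TResult[]> RunConcurrentlyAsync<TInput, TResult>(
    IEnumerable<TInput> inputs, Func<TInput, Task<TResult>> extractAsync, int maxConcurrency)
{
    using var throttle = new SemaphoreSlim(maxConcurrency);
    Task<TResult>[] tasks = inputs.Select(async input =>
    {
        await throttle.WaitAsync();
        try
        {
            return await extractAsync(input);
        }
        finally
        {
            throttle.Release();
        }
    }).ToArray();
    // Task.WhenAll returns the results in the order of the tasks
    return await Task.WhenAll(tasks);
}

// Stand-in for real work: in an application each call could create its own
// PdfImagesExtractor and await one of its Async extract methods (signature assumed)
string[] pdfPaths = { "first.pdf", "second.pdf", "third.pdf" };
int[] simulatedResults = await RunConcurrentlyAsync(pdfPaths, async path =>
{
    await Task.Delay(50); // simulates extraction time
    return path.Length;   // simulated result
}, maxConcurrency: 2);
Console.WriteLine(string.Join(", ", simulatedResults));
```

Using a separate extractor for each document avoids relying on the thread safety of a shared instance, which this page does not document.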

Available on Both Windows and Linux Platforms

HiQPdf Next for .NET can run on both Windows 64-bit and Linux 64-bit platforms. There are different NuGet packages for Windows and Linux, including the same .NET library but with different native runtimes. For Windows, the minimum required version is Windows 10 or Windows Server 2016.

Built for .NET Standard 2.0 for Maximum Compatibility

The .NET library targets .NET Standard 2.0, making it compatible with a wide range of .NET Core and .NET Framework applications. It is compatible with .NET 10.0, 9.0, 8.0, 7.0, 6.0, .NET Standard 2.0 and .NET Framework 4.6.2 to 4.8.1.

Fully Compatible with Azure App Service and Azure Functions on Both Windows and Linux

The PDF Images Extractor can run without restrictions in your Azure App Service and Azure Functions .NET Core applications targeting both Windows and Linux platforms. Web fonts and other features are fully supported by HiQPdf Next for .NET. The online documentation offers detailed usage instructions for Azure applications targeting both Windows and Linux.

NuGet Packages for Windows and Linux

HiQPdf Next for .NET is delivered as NuGet packages for Windows and Linux. The packages include the .NET Standard 2.0 library, the same for both platforms, and the specific native runtime for each platform.

ASP.NET Core Demo Application with C# Code for All Features

The zip package that can be downloaded from the website contains the project for the ASP.NET Core demo application with C# sample code for all major library features.

Simple and Flexible Licensing with a Single License for All Libraries

The license for HiQPdf Next for .NET works with both the classic HiQPdf Library for .NET and the multi-platform client-server solution. There are no additional runtime or deployment costs charged for using our software component in your applications.

HiQPdf Next for .NET - PDF Images Extractor C# Code Sample for ASP.NET Core

using System;
using System.IO;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using HiQPdf_Next_AspNetDemo.Models;

using HiQPdf.Next;

namespace HiQPdf_Next_AspNetDemo.Controllers
{
    public class ExtractPdfImagesController : Controller
    {
        private readonly IWebHostEnvironment m_hostingEnvironment;
        public ExtractPdfImagesController(IWebHostEnvironment hostingEnvironment)
        {
            m_hostingEnvironment = hostingEnvironment;
        }

        public IActionResult Index()
        {
            var model = SetViewModel();

            return View(model);
        }

        [HttpPost]
        public async Task<IActionResult> ExtractPdfImages(ExtractPdfImagesViewModel model)
        {
            if (!ModelState.IsValid)
            {
                var errorMessage = ModelStateHelper.GetModelErrors(ModelState);
                throw new ValidationException(errorMessage);
            }

            // Replace the demo serial number with the serial number received upon purchase
            // to run the extractor in licensed mode
            Licensing.SerialNumber = "YCgJMTAE-BiwJAhIB-EhlWTlBA-UEBRQFBA-U1FOUVJO-WVlZWQ==";

            // Create the PDF Images Extractor instance with default options
            PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();

            // Optionally set the user password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.UserPassword))
                pdfImagesExtractor.UserPassword = model.UserPassword;

            // Optionally set the owner password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.OwnerPassword))
                pdfImagesExtractor.OwnerPassword = model.OwnerPassword;

            // PDF page number to start extraction from
            int startPageNumber = model.StartPageNumber;

            // PDF page number to end extraction at
            // If 0, extraction continues to the end of the document
            int endPageNumber = 0;
            if (model.EndPageNumber.HasValue)
                endPageNumber = model.EndPageNumber.Value;

            byte[] inputPdfBytes = null;
            string outputFileName = null;

            // If an uploaded file exists, use it with priority
            if (model.PdfFile != null && model.PdfFile.Length > 0)
            {
                try
                {
                    using var ms = new MemoryStream();
                    await model.PdfFile.CopyToAsync(ms);
                    inputPdfBytes = ms.ToArray();
                }
                catch (Exception ex)
                {
                    throw new Exception("Failed to read the uploaded PDF file", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFile.FileName);
            }
            else
            {
                // Otherwise, fall back to the URL
                string pdfUrl = model.PdfFileUrl?.Trim();
                if (string.IsNullOrWhiteSpace(pdfUrl))
                    throw new Exception("No PDF file provided: upload a file or specify a URL");

                try
                {
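                    if (pdfUrl.StartsWith("file://", StringComparison.OrdinalIgnoreCase))
                    {
                        // Reading file:// URLs lets the caller read local files from the server;
                        // restrict or disable this branch outside of a local demo
                        string localPath = new Uri(pdfUrl).LocalPath;
                        inputPdfBytes = await System.IO.File.ReadAllBytesAsync(localPath);
                    }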
                    else
                    {
                        using var httpClient = new System.Net.Http.HttpClient();
                        inputPdfBytes = await httpClient.GetByteArrayAsync(pdfUrl);
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Could not download the PDF file from URL", ex);
                }

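                // Use only the path part of the trimmed URL so that a query string does not end up in the file name
                outputFileName = Path.GetFileNameWithoutExtension(new Uri(pdfUrl).LocalPath);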
            }

            // Extract the images from the specified PDF page range, grouped by page
            ExtractedImage[][] extractedImages = pdfImagesExtractor.ExtractImages(inputPdfBytes, startPageNumber, endPageNumber);

            int nPdfPages = extractedImages.Length;
            if (nPdfPages == 1 && extractedImages[0].Length > 0 && model.ExtractLargest)
            {
                // Only one page was processed and only the largest image is requested,
                // so return that image directly as a downloadable PNG file
                outputFileName += "-largest.png";
                ExtractedImage largestImage = GetLargestImage(extractedImages[0]);
                return File(largestImage.ImageData, "image/png", outputFileName);
            }
            else
            {
                // Build an in-memory ZIP with all page images and return it
                using var zipMs = new MemoryStream();
                using (var zip = new System.IO.Compression.ZipArchive(zipMs, System.IO.Compression.ZipArchiveMode.Create, leaveOpen: true))
                {
                    for (int pageIdx = 0; pageIdx < extractedImages.Length; pageIdx++)
                    {
                        var pageImages = extractedImages[pageIdx];
                        if (model.ExtractLargest)
                        {
                            // Add only the largest image from the page to the ZIP
                            ExtractedImage largestImage = GetLargestImage(pageImages);
                            if (largestImage != null)
                            {
                                var entry = zip.CreateEntry($"page-{largestImage.PageNumber:000000}-largest.png", System.IO.Compression.CompressionLevel.Fastest);
                                // Write the image bytes into the ZIP entry
                                using var entryStream = entry.Open();
                                entryStream.Write(largestImage.ImageData, 0, largestImage.ImageData.Length);
                            }
                        }
                        else
                        {
                            // Add all images from the PDF page to the ZIP
                            for (int imgIdx = 0; imgIdx < pageImages.Length; imgIdx++)
                            {
                                ExtractedImage extractedImage = pageImages[imgIdx];
                                var entry = zip.CreateEntry($"page-{extractedImage.PageNumber:000000}-{imgIdx:000000}.png", System.IO.Compression.CompressionLevel.Fastest);

                                // Write the image bytes into the ZIP entry
                                using var entryStream = entry.Open();
                                entryStream.Write(extractedImage.ImageData, 0, extractedImage.ImageData.Length);
                            }
                        }
                    }
                }

                outputFileName += ".zip";

                // Copy ZIP memory stream to a byte array
                byte[] outputZipBytes = zipMs.ToArray();

                // Return the ZIP as a downloadable file
                return File(outputZipBytes, "application/zip", outputFileName);
            }
        }

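        // Returns the image with the largest encoded PNG data size in bytes, which is not
        // necessarily the image with the most pixels, or null if the page has no images
        private ExtractedImage GetLargestImage(ExtractedImage[] extractedImages)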
        {
            ExtractedImage largestImage = null;
            int largestSize = 0;
            foreach (var image in extractedImages)
            {
                if (image.ImageData.Length > largestSize)
                {
                    largestImage = image;
                    largestSize = image.ImageData.Length;
                }
            }
            return largestImage;
        }

        private ExtractPdfImagesViewModel SetViewModel()
        {
            var model = new ExtractPdfImagesViewModel();

            // Build the application root URL from the current request, so that the demo file URL
            // does not depend on the action path or on a query string in the current page URL
            HttpRequest request = Request;
            string rootUrl = $"{request.Scheme}://{request.Host}{request.PathBase}";

            model.PdfFileUrl = rootUrl + "/DemoFiles/PdfProcessor/PDF_Document.pdf";

            return model;
        }
    }
}
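GetLargestImage in the sample above picks the image with the most encoded PNG bytes, which reflects how well an image compresses as much as its dimensions, so a large flat-colored image can lose to a smaller photo. If you want the image with the most pixels instead, one option is to read the width and height from the PNG IHDR header. The sketch below does this on raw PNG bytes, such as the ImageData of each ExtractedImage, without relying on any HiQPdf property for image dimensions; ReadPngSize and IndexOfLargestByArea are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: reads the pixel size from the IHDR chunk, which PNG requires to come first:
// signature (8 bytes), chunk length (4), chunk type "IHDR" (4), width (4), height (4), big-endian
static (int Width, int Height) ReadPngSize(byte[] png)
{
    if (png.Length < 24 || png[0] != 0x89 || png[12] != (byte)'I' || png[13] != (byte)'H'
        || png[14] != (byte)'D' || png[15] != (byte)'R')
        throw new ArgumentException("Not a PNG image", nameof(png));
    int width = (int)BinaryPrimitives.ReadUInt32BigEndian(png.AsSpan(16, 4));
    int height = (int)BinaryPrimitives.ReadUInt32BigEndian(png.AsSpan(20, 4));
    return (width, height);
}

// Hypothetical helper: returns the index of the image with the most pixels, or -1 if the list is empty
static int IndexOfLargestByArea(IReadOnlyList<byte[]> pngImages)
{
    int bestIndex = -1;
    long bestArea = -1;
    for (int i = 0; i < pngImages.Count; i++)
    {
        var (width, height) = ReadPngSize(pngImages[i]);
        long area = (long)width * height;
        if (area > bestArea)
        {
            bestArea = area;
            bestIndex = i;
        }
    }
    return bestIndex;
}

// Usage: pass the paths of several extracted PNG files on the command line
if (args.Length > 0)
{
    List<byte[]> images = args.Select(File.ReadAllBytes).ToList();
    Console.WriteLine($"Largest image: {args[IndexOfLargestByArea(images)]}");
}
```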