The PDF Images Extractor is a component of the HiQPdf Next Library for .NET that enables the extraction of images from PDF files in PNG format while preserving the transparency information available in the PDF.

The PDF Images Extractor is distributed as part of the HiQPdf Next PDF Processor for .NET, which also includes functionality for converting PDF to text, searching for text in PDF documents and converting PDF pages to images.

Compatible Platforms

HiQPdf Next for .NET can run on a variety of Windows and Linux platforms in web, console and desktop applications built on any modern .NET platform. The library components can be used in Azure App Service and Azure Functions environments on both Windows and Linux. Deployment to Windows and Linux Docker containers is also supported.

The .NET library targets .NET Standard 2.0, which makes it compatible with a wide range of .NET Core and .NET Framework applications.

Getting Started

The online documentation contains Getting Started guides for Windows, Linux, Azure App Service and Azure Functions, with detailed instructions for integrating the library into your application and complete C# examples for each important feature of the library.

You can explore the current capabilities of the library in the online demo application for this library and in the API reference of the online documentation.

Download Demo Application

You can also download a free trial package for .NET, which includes an ASP.NET Core demo application project with complete C# source code that you can use as a starting point for experimenting with your own usage scenarios.

On Linux platforms, running the demo application samples that involve HTML to PDF conversion might require installing some dependency packages. The documentation includes an entire section dedicated to building, publishing and running the demo application on multiple platforms.

NuGet Packages

The PDF Images Extractor is distributed as part of the HiQPdf.Next.PdfProcessor.Windows NuGet package when targeting Windows and as part of the HiQPdf.Next.PdfProcessor.Linux NuGet package when targeting Linux.

The Windows package is also referenced by the HiQPdf.Next.Windows metapackage, which includes all library components for Windows, and the Linux package is referenced by the HiQPdf.Next.Linux metapackage, which includes all library components for Linux.

There are also multiplatform metapackages that reference both the Windows and Linux packages: HiQPdf.Next.PdfProcessor for the PDF Processor functionality and HiQPdf.Next for the entire HiQPdf Next library.

Installation

The PDF Processor component generally does not require installing additional dependencies on either Windows or Linux.

HiQPdf.Next Namespace

All components of the HiQPdf Next for .NET library share the same HiQPdf.Next namespace and can be used together in the same application. To use the library in your own code, add the using directive at the top of your C# source file, as shown below.

// Include the HiQPdf.Next namespace at the top of your C# file
using HiQPdf.Next;

Sample Code

After you add a reference to the NuGet package to your project, you can use the sample code below to extract images from PDF documents.

// Path of the PDF file to process (example value)
string pdfFilePath = "document.pdf";
// Create the PDF Images Extractor instance with default options
PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();
// Extract the images from the specified PDF file, grouped by page
ExtractedImage[][] extractedImages = pdfImagesExtractor.ExtractImages(pdfFilePath);

Features List

HiQPdf Next PDF Images Extractor for .NET offers advanced options for extracting images from PDF documents in PNG format while preserving the transparency information available in the PDF. You can specify the user and owner passwords to open password-protected PDF documents, as well as the range of PDF pages to process.

Extract Images from PDF Documents from Memory, Stream or File in PNG Format

You can extract images from a PDF document stored in a memory buffer, a stream or a file, and obtain them either as image objects in memory or as image files. The extracted images are in PNG format and preserve the transparency information available in the PDF.
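The following is a minimal sketch of extracting images from a PDF held in memory and saving each one as a PNG file. It relies only on members that appear in the ASP.NET Core sample on this page: the ExtractImages(byte[], int, int) overload, ExtractedImage.ImageData and ExtractedImage.PageNumber. A stream is handled by first copying it into a byte array, as the sample does for uploaded files; a dedicated stream overload may exist but is not assumed here. The PdfImagesToFiles class, its method names and the file naming scheme are hypothetical, and page numbers are assumed to start at 1.

```csharp
using System;
using System.IO;
using HiQPdf.Next;

// Hypothetical helper class; only PdfImagesExtractor, ExtractImages(byte[], int, int),
// ExtractedImage.ImageData and ExtractedImage.PageNumber come from the sample code on this page
public static class PdfImagesToFiles
{
    // Builds a name such as "page-000001-000000.png" (same pattern as the demo ZIP entries)
    public static string BuildImageFileName(int pageNumber, int imageIndex)
    {
        return $"page-{pageNumber:000000}-{imageIndex:000000}.png";
    }

    // Copies a PDF stream into memory, then extracts and saves its images
    public static int SaveImagesFromStream(Stream pdfStream, string outputFolder)
    {
        using var ms = new MemoryStream();
        pdfStream.CopyTo(ms);
        return SaveImagesFromBytes(ms.ToArray(), outputFolder);
    }

    // Extracts all images from a PDF held in memory and writes each one to a PNG file
    public static int SaveImagesFromBytes(byte[] pdfBytes, string outputFolder)
    {
        Directory.CreateDirectory(outputFolder);

        PdfImagesExtractor extractor = new PdfImagesExtractor();
        // Assumption: page 1 is the first page; end page 0 continues to the end of the document
        ExtractedImage[][] imagesByPage = extractor.ExtractImages(pdfBytes, 1, 0);

        int savedCount = 0;
        foreach (ExtractedImage[] pageImages in imagesByPage)
        {
            for (int i = 0; i < pageImages.Length; i++)
            {
                ExtractedImage image = pageImages[i];
                string filePath = Path.Combine(outputFolder, BuildImageFileName(image.PageNumber, i));
                // ImageData holds the PNG-encoded bytes of the extracted image
                File.WriteAllBytes(filePath, image.ImageData);
                savedCount++;
            }
        }
        return savedCount;
    }
}
```

For example, calling PdfImagesToFiles.SaveImagesFromBytes(File.ReadAllBytes("input.pdf"), "images") would write one PNG file per extracted image into an images folder and return the number of files written.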

Extract Images from Password-Protected PDF Documents

If the PDF document you process is password-protected, you must specify the user or owner password used to decrypt the document before extracting images from it.
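A minimal sketch of opening an encrypted PDF, based on the UserPassword and OwnerPassword properties used in the ASP.NET Core sample on this page. Setting a password only when it is non-empty mirrors the sample; the ProtectedPdfExample class and its ShouldApply helper are hypothetical names used for illustration.

```csharp
using System;
using HiQPdf.Next;

public static class ProtectedPdfExample
{
    // Hypothetical helper: a password is applied only when a value was provided
    public static bool ShouldApply(string password) => !string.IsNullOrEmpty(password);

    // Extracts the images of a password-protected PDF file, grouped by page
    public static ExtractedImage[][] ExtractFromProtectedPdf(string pdfFilePath, string userPassword, string ownerPassword)
    {
        PdfImagesExtractor extractor = new PdfImagesExtractor();

        // User password used to decrypt the document
        if (ShouldApply(userPassword))
            extractor.UserPassword = userPassword;

        // Owner password, as an alternative way to decrypt the document
        if (ShouldApply(ownerPassword))
            extractor.OwnerPassword = ownerPassword;

        return extractor.ExtractImages(pdfFilePath);
    }
}
```

Either password on its own should be enough according to the description above, so callers can pass null for the one they do not know.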

Extract Images from the Entire PDF or from a Range of PDF Pages

The extraction methods allow you to specify the range of PDF pages to process, either from a given page number to the end of the document or between specified start and end page numbers.
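The sketch below shows both forms of page range. Following the ASP.NET Core sample on this page, it assumes that ExtractImages(byte[], int, int) takes a start page number and an end page number, where an end page number of 0 means the end of the document. Page numbers are assumed to start at 1; the PageRangeExample class and its NormalizeEndPage helper are hypothetical.

```csharp
using System;
using HiQPdf.Next;

public static class PageRangeExample
{
    // Hypothetical helper: maps an optional end page to the value passed to ExtractImages,
    // where 0 means "continue to the end of the document" (as in the sample code)
    public static int NormalizeEndPage(int? endPageNumber) => endPageNumber ?? 0;

    public static void Run(byte[] pdfBytes)
    {
        PdfImagesExtractor extractor = new PdfImagesExtractor();

        // Images from page 3 to the end of the document
        ExtractedImage[][] fromPage3 = extractor.ExtractImages(pdfBytes, 3, NormalizeEndPage(null));

        // Images from pages 2 to 5 only
        ExtractedImage[][] pages2To5 = extractor.ExtractImages(pdfBytes, 2, NormalizeEndPage(5));

        // Each element of the outer array holds the images found on one processed page
        Console.WriteLine($"Pages processed: {fromPage3.Length} and {pages2To5.Length}");
    }
}
```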

Asynchronous Extract Methods

The extraction methods have asynchronous variants with the Async suffix, which can be used with async and await to run these operations in parallel.
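The following sketch extracts images from several PDF files concurrently with Task.WhenAll. It ASSUMES an ExtractImagesAsync(string) method that mirrors ExtractImages(string) and returns Task&lt;ExtractedImage[][]&gt;; the text above only states that Async variants exist, so check the API reference for the exact signatures. One PdfImagesExtractor instance is created per file because the thread safety of a shared instance is not documented here. The AsyncExtractionExample class and the CountImages helper are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using HiQPdf.Next;

public static class AsyncExtractionExample
{
    // Hypothetical helper: total number of images across all processed pages
    public static int CountImages(ExtractedImage[][] imagesByPage) => imagesByPage.Sum(page => page.Length);

    // Starts one extraction per file and awaits all of them together
    public static async Task<int[]> CountImagesInFilesAsync(string[] pdfFilePaths)
    {
        Task<ExtractedImage[][]>[] tasks = pdfFilePaths
            .Select(path =>
            {
                // Separate extractor per file (assumption: instances are not shared across tasks)
                var extractor = new PdfImagesExtractor();
                // ASSUMED signature: ExtractImagesAsync(string) returning Task<ExtractedImage[][]>
                return extractor.ExtractImagesAsync(path);
            })
            .ToArray();

        // Task.WhenAll completes when every extraction has finished
        ExtractedImage[][][] results = await Task.WhenAll(tasks);
        return results.Select(CountImages).ToArray();
    }
}
```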

Available on Both Windows and Linux Platforms

HiQPdf Next for .NET can run on both 64-bit Windows and 64-bit Linux platforms. There are separate NuGet packages for Windows and Linux, which include the same .NET library but different native runtimes. On Windows, the minimum required version is Windows 10 or Windows Server 2016.
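Because the Windows and Linux packages ship different native runtimes, an application may want to confirm at startup that it runs in a supported 64-bit process. This sketch uses the standard System.Runtime.InteropServices.RuntimeInformation API; the PlatformCheck class is hypothetical and not part of HiQPdf Next, and treating "64-bit" as x64 is an assumption. The minimum Windows version is not checked here.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PlatformCheck
{
    // True when the current process runs on Windows or Linux as a 64-bit (x64) process.
    // ProcessArchitecture is used rather than OSArchitecture because a 32-bit process
    // on a 64-bit OS could not load 64-bit native runtimes.
    public static bool IsSupportedPlatform()
    {
        bool supportedOs = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
                        || RuntimeInformation.IsOSPlatform(OSPlatform.Linux);
        return supportedOs && RuntimeInformation.ProcessArchitecture == Architecture.X64;
    }
}
```

An application could call PlatformCheck.IsSupportedPlatform() during startup and log a clear message before any PDF processing is attempted.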

Built for .NET Standard 2.0 for Maximum Compatibility

The .NET library targets .NET Standard 2.0, making it compatible with a wide range of .NET Core and .NET Framework applications. It is compatible with .NET 10.0, 9.0, 8.0, 7.0, 6.0, .NET Standard 2.0 and .NET Framework 4.6.2 to 4.8.1.

Fully Compatible with Azure App Service and Azure Functions on Both Windows and Linux

The library can run without restrictions in your Azure App Service and Azure Functions .NET Core applications targeting both Windows and Linux platforms. Web fonts and other features are fully supported by HiQPdf Next for .NET. The online documentation offers detailed usage instructions for Azure applications targeting both Windows and Linux.

NuGet Packages for Windows and Linux

HiQPdf Next for .NET is delivered as NuGet packages for Windows and Linux. The packages include the .NET Standard 2.0 library, which is the same for both platforms, and the specific native runtime for each platform.

ASP.NET Core Demo Application with C# Code for All Features

The ZIP package that can be downloaded from the website contains the ASP.NET Core demo application project, with C# sample code for all major library features.

Simple and Flexible Licensing with a Single License for All Libraries

The license for HiQPdf Next for .NET works with both the classic HiQPdf Library for .NET and the multi-platform client-server solution. There are no additional runtime or deployment costs for using our software components in your applications.

HiQPdf Next for .NET - PDF Images Extractor C# Code Sample for ASP.NET Core

using System;
using System.IO;
using System.Threading.Tasks;
using System.ComponentModel.DataAnnotations;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using HiQPdf_Next_AspNetDemo.Models;
using HiQPdf.Next;

namespace HiQPdf_Next_AspNetDemo.Controllers
{
    public class ExtractPdfImagesController : Controller
    {
        private readonly IWebHostEnvironment m_hostingEnvironment;

        public ExtractPdfImagesController(IWebHostEnvironment hostingEnvironment)
        {
            m_hostingEnvironment = hostingEnvironment;
        }

        public IActionResult Index()
        {
            var model = SetViewModel();
            return View(model);
        }

        [HttpPost]
        public async Task<IActionResult> ExtractPdfImages(ExtractPdfImagesViewModel model)
        {
            if (!ModelState.IsValid)
            {
                var errorMessage = ModelStateHelper.GetModelErrors(ModelState);
                throw new ValidationException(errorMessage);
            }

            // Replace the demo serial number with the serial number received upon purchase
            // to run the extractor in licensed mode
            Licensing.SerialNumber = "YCgJMTAE-BiwJAhIB-EhlWTlBA-UEBRQFBA-U1FOUVJO-WVlZWQ==";

            // Create the PDF Images Extractor instance with default options
            PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();

            // Optionally set the user password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.UserPassword))
                pdfImagesExtractor.UserPassword = model.UserPassword;

            // Optionally set the owner password to open a password-protected PDF
            if (!string.IsNullOrEmpty(model.OwnerPassword))
                pdfImagesExtractor.OwnerPassword = model.OwnerPassword;

            // PDF page number to start extraction from
            int startPageNumber = model.StartPageNumber;

            // PDF page number to end extraction at
            // If 0, extraction continues to the end of the document
            int endPageNumber = 0;
            if (model.EndPageNumber.HasValue)
                endPageNumber = model.EndPageNumber.Value;

            byte[] inputPdfBytes = null;
            string outputFileName = null;

            // If an uploaded file exists, use it with priority
            if (model.PdfFile != null && model.PdfFile.Length > 0)
            {
                try
                {
                    using var ms = new MemoryStream();
                    await model.PdfFile.CopyToAsync(ms);
                    inputPdfBytes = ms.ToArray();
                }
                catch (Exception ex)
                {
                    throw new Exception("Failed to read the uploaded PDF file", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFile.FileName);
            }
            else
            {
                // Otherwise, fall back to the URL
                string pdfUrl = model.PdfFileUrl?.Trim();
                if (string.IsNullOrWhiteSpace(pdfUrl))
                    throw new Exception("No PDF file provided: upload a file or specify a URL");

                try
                {
                    if (pdfUrl.StartsWith("file://", StringComparison.OrdinalIgnoreCase))
                    {
                        string localPath = new Uri(pdfUrl).LocalPath;
                        inputPdfBytes = await System.IO.File.ReadAllBytesAsync(localPath);
                    }
                    else
                    {
                        using var httpClient = new System.Net.Http.HttpClient();
                        inputPdfBytes = await httpClient.GetByteArrayAsync(pdfUrl);
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Could not download the PDF file from URL", ex);
                }

                outputFileName = Path.GetFileNameWithoutExtension(model.PdfFileUrl);
            }

            // Extract the images from the specified PDF page range, grouped by page
            ExtractedImage[][] extractedImages = pdfImagesExtractor.ExtractImages(inputPdfBytes, startPageNumber, endPageNumber);
            int nPdfPages = extractedImages.Length;

            if (nPdfPages == 1 && extractedImages[0].Length > 0 && model.ExtractLargest)
            {
                // If only one page was processed and only the largest image is requested, return that image directly
                // Return the largest image as a downloadable file
                outputFileName += "-largest.png";
                ExtractedImage largestImage = GetLargestImage(extractedImages[0]);
                return File(largestImage.ImageData, "image/png", outputFileName);
            }
            else
            {
                // Build an in-memory ZIP with all page images and return it
                using var zipMs = new MemoryStream();
                using (var zip = new System.IO.Compression.ZipArchive(zipMs, System.IO.Compression.ZipArchiveMode.Create, leaveOpen: true))
                {
                    for (int pageIdx = 0; pageIdx < extractedImages.Length; pageIdx++)
                    {
                        var pageImages = extractedImages[pageIdx];
                        if (model.ExtractLargest)
                        {
                            // Add only the largest image from the page to the ZIP
                            ExtractedImage largestImage = GetLargestImage(pageImages);
                            if (largestImage != null)
                            {
                                var entry = zip.CreateEntry($"page-{largestImage.PageNumber:000000}-largest.png", System.IO.Compression.CompressionLevel.Fastest);
                                // Write the image bytes into the ZIP entry
                                using var entryStream = entry.Open();
                                entryStream.Write(largestImage.ImageData, 0, largestImage.ImageData.Length);
                            }
                        }
                        else
                        {
                            // Add all images from the PDF page to the ZIP
                            for (int imgIdx = 0; imgIdx < pageImages.Length; imgIdx++)
                            {
                                ExtractedImage extractedImage = pageImages[imgIdx];
                                var entry = zip.CreateEntry($"page-{extractedImage.PageNumber:000000}-{imgIdx:000000}.png", System.IO.Compression.CompressionLevel.Fastest);
                                // Write the image bytes into the ZIP entry
                                using var entryStream = entry.Open();
                                entryStream.Write(extractedImage.ImageData, 0, extractedImage.ImageData.Length);
                            }
                        }
                    }
                }

                outputFileName += ".zip";
                // Copy ZIP memory stream to a byte array
                byte[] outputZipBytes = zipMs.ToArray();
                // Return the ZIP as a downloadable file
                return File(outputZipBytes, "application/zip", outputFileName);
            }
        }

        private ExtractedImage GetLargestImage(ExtractedImage[] extractedImages)
        {
            ExtractedImage largestImage = null;
            int largestSize = 0;
            foreach (var image in extractedImages)
            {
                if (image.ImageData.Length > largestSize)
                {
                    largestImage = image;
                    largestSize = image.ImageData.Length;
                }
            }
            return largestImage;
        }

        private ExtractPdfImagesViewModel SetViewModel()
        {
            var model = new ExtractPdfImagesViewModel();

            HttpRequest request = ControllerContext.HttpContext.Request;
            UriBuilder uriBuilder = new UriBuilder();
            uriBuilder.Scheme = request.Scheme;
            uriBuilder.Host = request.Host.Host;
            if (request.Host.Port != null)
                uriBuilder.Port = (int)request.Host.Port;
            uriBuilder.Path = request.PathBase.ToString() + request.Path.ToString();
            uriBuilder.Query = request.QueryString.ToString();

            string currentPageUrl = uriBuilder.Uri.AbsoluteUri;
            string rootUrl = currentPageUrl.Substring(0, currentPageUrl.Length - "ExtractPdfImages".Length);
            // Trim the trailing slash left after removing the controller name to avoid a double slash in the URL
            model.PdfFileUrl = rootUrl.TrimEnd('/') + "/DemoFiles/PdfProcessor/PDF_Document.pdf";

            return model;
        }
    }
}