Getting the HTML code of a web page can be useful when you need to convert the page to PDF in a certain context or state. For example, you may already be authenticated in an ASP.NET application and want to convert a web page that is accessible only to authenticated users, or you may want to convert an ASP.NET web page after some values were filled in a form. In these situations a possible solution is to get the HTML code being sent to the browser and convert it to PDF, optionally providing a base URL used to resolve images, CSS and script files.
This section presents three practical methods of getting the HTML code of a web page: using the HttpWebRequest class, overriding the Render method of the ASP.NET page, and calling the Server.Execute method from ASP.NET.
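The System.Net.HttpWebRequest class can be used to retrieve the HTML code of a web page. HTTP cookies and headers, authentication credentials, a proxy and other options can be set before the web page is accessed. Because this is a new HTTP request made by the server, the cookies of the current user, such as the forms authentication cookie, are not sent automatically and must be added to the request explicitly if the page requires them. Below there is a simple example of getting the HTML code of a web page and converting it to PDF.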
```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Web;
using HiQPdf;

protected void buttonGetHtmlCode_Click(object sender, EventArgs e)
{
    // the URL of the web page from where to retrieve the HTML code
    string url = textBoxUrl.Text;

    // create the HTTP request
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // set the credentials to use for this request
    request.Credentials = CredentialCache.DefaultCredentials;

    string htmlCode;
    // get the response and read the HTML code from the response stream;
    // the using blocks close the response and the stream even if reading fails
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream receiveStream = response.GetResponseStream())
    // read the stream as UTF-8 text (assumes the page is UTF-8 encoded)
    using (StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8))
    {
        htmlCode = readStream.ReadToEnd();
    }

    // convert the HTML code to PDF
    // create the HTML to PDF converter
    HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
    // the base URL used to resolve images, CSS and script files
    string baseUrl = url;
    // convert HTML code to a PDF memory buffer
    byte[] pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, baseUrl);

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");
    // let the browser know how to open the PDF document, attachment or inline, and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition",
        String.Format("attachment; filename=HtmlToPdf.pdf; size={0}", pdfBuffer.Length));
    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);
    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}
```
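The example above reads the response as UTF-8 regardless of what the server declares. If the pages you retrieve may use other encodings, one possible refinement is to take the charset from the Content-Type response header. The following is a minimal sketch, assuming a helper named GetResponseEncoding (a hypothetical name, not part of HiQPdf or the .NET Framework) that you would call as `new StreamReader(receiveStream, GetResponseEncoding(response.ContentType))`. It parses the header with System.Net.Mime.ContentType and falls back to UTF-8 when no usable charset is found. Note that on .NET Core and .NET 5+ code pages such as windows-1252 are only available after registering CodePagesEncodingProvider, so without it this sketch falls back to UTF-8 for them.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper: picks the text encoding declared in an HTTP Content-Type
// header (e.g. "text/html; charset=iso-8859-1"), falling back to UTF-8 when the
// header is missing, has no charset parameter, is malformed or names an
// encoding the runtime does not know.
static Encoding GetResponseEncoding(string contentType)
{
    if (string.IsNullOrWhiteSpace(contentType))
        return Encoding.UTF8;
    try
    {
        string charset = new ContentType(contentType).CharSet;
        return string.IsNullOrEmpty(charset)
            ? Encoding.UTF8
            : Encoding.GetEncoding(charset.Trim('"'));
    }
    catch (FormatException) { return Encoding.UTF8; }   // malformed header
    catch (ArgumentException) { return Encoding.UTF8; } // unknown charset name
}

Console.WriteLine(GetResponseEncoding("text/html; charset=iso-8859-1").WebName); // iso-8859-1
Console.WriteLine(GetResponseEncoding("text/html").WebName);                     // utf-8
```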
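The Page.Render(HtmlTextWriter) method of the ASP.NET page can be overridden to get the HTML code of the page as it would be sent to the browser. With this approach it is even possible to capture the values entered in a web form and posted back to the ASP.NET page when a button on the page is pressed, because Render is called after the postback events have been handled. Below there is a simple example of getting the HTML code of the current ASP.NET page and converting it to PDF if a 'Convert to PDF' button was pressed: the button click handler only sets a flag, and the overridden Render method checks that flag to decide whether to convert the page or render it normally.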
```csharp
using System;
using System.IO;
using System.Web;
using System.Web.UI;
using HiQPdf;

namespace WebApplication
{
    public partial class GetHtmlCode : System.Web.UI.Page
    {
        // set to true by the 'Convert to PDF' button click handler
        bool convertToPdf = false;

        protected override void Render(HtmlTextWriter writer)
        {
            if (convertToPdf)
            {
                string htmlCode;
                // setup a TextWriter to capture the current page HTML code
                using (StringWriter tw = new StringWriter())
                using (HtmlTextWriter htw = new HtmlTextWriter(tw))
                {
                    // render the HTML markup into the TextWriter
                    base.Render(htw);
                    htw.Flush();
                    // get the current page HTML code
                    htmlCode = tw.ToString();
                }

                // convert the HTML code to PDF
                // create the HTML to PDF converter
                HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
                // the base URL used to resolve images, CSS and script files
                string currentPageUrl = HttpContext.Current.Request.Url.AbsoluteUri;
                // convert HTML code to a PDF memory buffer
                byte[] pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, currentPageUrl);

                // inform the browser about the binary data format
                HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");
                // let the browser know how to open the PDF document, attachment or inline, and the file name
                HttpContext.Current.Response.AddHeader("Content-Disposition",
                    String.Format("attachment; filename=HtmlToPdf.pdf; size={0}", pdfBuffer.Length));
                // write the PDF buffer to HTTP response
                HttpContext.Current.Response.BinaryWrite(pdfBuffer);
                // call End() method of HTTP response to stop ASP.NET page processing
                HttpContext.Current.Response.End();
            }
            else
            {
                // normal rendering of the page to the browser
                base.Render(writer);
            }
        }

        protected void buttonConvertCurrentPageToPdf_Click(object sender, EventArgs e)
        {
            // request the conversion; the actual work is done later in Render
            convertToPdf = true;
        }
    }
}
```
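The core of the Render override is redirecting output into an in-memory writer and reading it back. Stripped of the ASP.NET types, the same capture pattern can be sketched as follows. RenderPage is a hypothetical stand-in for base.Render, and CaptureOutput is likewise an illustrative name; in the page itself the StringWriter is wrapped in an HtmlTextWriter because that is the type Render expects.

```csharp
using System;
using System.IO;

// Hypothetical helper: runs a rendering callback against an in-memory
// StringWriter instead of the real output and returns everything it wrote.
static string CaptureOutput(Action<TextWriter> render)
{
    using var buffer = new StringWriter();
    render(buffer);
    buffer.Flush();
    return buffer.ToString();
}

// Hypothetical stand-in for base.Render(htw): writes the page markup.
static void RenderPage(TextWriter writer)
{
    writer.Write("<html><body>");
    writer.Write("<p>Hello</p>");
    writer.Write("</body></html>");
}

string html = CaptureOutput(RenderPage);
Console.WriteLine(html); // <html><body><p>Hello</p></body></html>
```

Nothing reaches the real output while the markup is being captured, which is why the page example sends the PDF as the response instead of also calling base.Render(writer).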
The HttpServerUtility.Execute(String, TextWriter) method can be called from an ASP.NET page to get the HTML code of another ASP.NET page in the same application. The target page is executed within the current request, so it has access to the session of the calling ASP.NET page. Below there is a simple example of getting the HTML code of an ASP.NET page and converting it to PDF when a 'Convert to PDF' button on the current page is pressed.
```csharp
using System;
using System.IO;
using System.Web;
using HiQPdf;

protected void buttonConvertToPdf_Click(object sender, EventArgs e)
{
    string htmlCode;
    // setup a TextWriter to capture the HTML code of the page to convert
    using (StringWriter tw = new StringWriter())
    {
        // execute the 'AspNetPage.aspx' page in the same application and capture the HTML code
        Server.Execute("AspNetPage.aspx", tw);
        // get the HTML code from the writer
        htmlCode = tw.ToString();
    }

    // convert the HTML code to PDF
    // create the HTML to PDF converter
    HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
    // the base URL used to resolve images, CSS and script files
    string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
    // convert HTML code to a PDF memory buffer
    byte[] pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, baseUrl);

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");
    // let the browser know how to open the PDF document, attachment or inline, and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition",
        String.Format("attachment; filename=HtmlToPdf.pdf; size={0}", pdfBuffer.Length));
    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);
    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}
```
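In the Server.Execute example the base URL is the URL of the calling page, which resolves relative image, CSS and script references correctly as long as AspNetPage.aspx sits in the same folder. If the executed page lives in another folder, a possible adjustment is to resolve its path against the current request URL with System.Uri and pass the result as the base URL. The sketch below uses ResolveBaseUrl as a hypothetical helper name. Application-relative paths such as "~/Reports/Invoice.aspx" are not understood by System.Uri and would first need converting, for example with VirtualPathUtility.ToAbsolute.

```csharp
using System;

// Hypothetical helper: resolves the path passed to Server.Execute against the
// URL of the calling request, so relative resources are resolved from the
// executed page's folder rather than the caller's. Query strings of the
// caller are dropped, as required by standard relative URL resolution.
static string ResolveBaseUrl(string callerUrl, string executedPagePath)
{
    var callerUri = new Uri(callerUrl, UriKind.Absolute);
    return new Uri(callerUri, executedPagePath).AbsoluteUri;
}

Console.WriteLine(ResolveBaseUrl("http://localhost/app/Default.aspx", "AspNetPage.aspx"));
// http://localhost/app/AspNetPage.aspx
Console.WriteLine(ResolveBaseUrl("http://localhost/app/Default.aspx?id=1", "Reports/Invoice.aspx"));
// http://localhost/app/Reports/Invoice.aspx
```

In the page this would replace the baseUrl line, e.g. `ResolveBaseUrl(HttpContext.Current.Request.Url.AbsoluteUri, "AspNetPage.aspx")`.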
