Generating PDF Documents with NodeJS and PHP
Posted on 2012/04/14
Was working on my site - Curriculum Vitae - and needed to generate PDFs and extract data from PDF files so that users can download their CV in PDF format.
Had the following Requirements for the Project:
- Generate a PDF Document from HTML. Had to be able to generate PDF Documents that look just like the HTML Templates I created.
- Generate an Image from HTML. Used for Previews of your CV.
- Extract an Image from a PDF. Need one big image of the entire document, going to use this to add attached documents to all generated CVs.
So how are we going to do this?
Was using a good old Python Script I wrote a couple of years ago, but it just seems clunky:
import sys

from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import *

app = QApplication(sys.argv)
web = QWebView()

# Load either a URL or a local HTML file
if "http://" in sys.argv[1]:
    web.load(QUrl(sys.argv[1]))
else:
    f = open(sys.argv[1], 'rb')
    content = "".join(f.readlines())
    web.setHtml(content)

printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setFullPage(True)
printer.setOutputFileName(sys.argv[2])

def convertIt():
    # Render the loaded page to the PDF file, then quit
    web.print_(printer)
    print(sys.argv[2])
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())
This met the requirements, and PySide is a brilliant and fast binding for Python, but I wanted a solution that did not need a temporary file, and I did not want to keep this old script afloat forever. It needs to retire some time.
Then I found WKHtmlToPDF. Was very excited: finally someone had created a binary that does what I'd been doing with the PySide binding.
Checked the documentation and the binary even allows input and output over stdin and stdout! I could have done that with Python too, but it would have taken more of my time. This let me create a PDF, without a temporary file, that I could easily write out to the client. Bingo!
Required Actions before we can start
- Download WKHtmlToPDF or WKHtmlToImage and remember to get the static build with the patched Qt.
- Create an alias for wkhtmltopdf and/or wkhtmltoimage that points to the binaries.
- Install GhostScript and ImageMagick to generate Images from PDFs. Ensure ImageMagick's convert command is available.
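A quick way to confirm the prerequisites above is to check that all three commands can be found on the PATH. The following is a minimal NodeJS sketch and not part of my original setup; `findOnPath` is a hypothetical helper name. It only detects real binaries or symlinks, since shell aliases are generally not visible to a program that spawns a process.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the full path of `command` if it is found (and executable) in one
// of the directories listed in `searchPath`, otherwise null.
// `searchPath` defaults to the PATH environment variable.
function findOnPath(command, searchPath) {
  const dirs = (searchPath || process.env.PATH || '').split(path.delimiter);
  for (const dir of dirs) {
    if (!dir) continue;
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch (e) {
      // Not in this directory (or not executable); keep looking.
    }
  }
  return null;
}

// The three commands the wrappers in this post shell out to.
for (const tool of ['wkhtmltopdf', 'wkhtmltoimage', 'convert']) {
  const found = findOnPath(tool);
  console.log(tool + ': ' + (found || 'NOT FOUND on PATH'));
}
```

On Windows the binaries carry an `.exe` extension, so the names in the list would need adjusting there.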
How to do this in PHP
The site is mostly written in PHP so did a PHP wrapper first.
Generate a PDF from HTML
/**
 * Returns the Binary Content of the Generated PDF from the HTML
 * @author Johann du Toit
 */
function pdf_from_html($html) {
    $descriptorspec = array(
        0 => array('pipe', 'r'), // stdin
        1 => array('pipe', 'w'), // stdout
        2 => array('pipe', 'w'), // stderr
    );

    // Read the HTML from stdin ('-') and write the PDF to stdout ('-')
    $process = proc_open('wkhtmltopdf -q - -', $descriptorspec, $pipes);

    // Send the HTML on stdin
    fwrite($pipes[0], $html);
    fclose($pipes[0]);

    // Read the outputs
    $contents = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);

    $return_value = proc_close($process);
    return $contents;
}
Generate an Image from HTML
/**
 * Returns the Binary Content of the Image from the HTML
 * @author Johann du Toit
 */
function image_from_html($html) {
    $descriptorspec = array(
        0 => array('pipe', 'r'), // stdin
        1 => array('pipe', 'w'), // stdout
        2 => array('pipe', 'w'), // stderr
    );

    $process = proc_open('wkhtmltoimage -q - -', $descriptorspec, $pipes);

    // Send the HTML on stdin
    fwrite($pipes[0], $html);
    fclose($pipes[0]);

    // Read the outputs
    $contents = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);

    $return_value = proc_close($process);
    return $contents;
}
Generate an Image of a Document from a PDF
/**
 * Returns the Binary Content of an Image Generated from a PDF
 * @author Johann du Toit
 */
function image_from_pdf($pdf_path) {
    $descriptorspec = array(
        0 => array('pipe', 'r'), // stdin
        1 => array('pipe', 'w'), // stdout
        2 => array('pipe', 'w'), // stderr
    );

    $process = proc_open('convert -density 350% -quality 85 -append pdf:- png:-', $descriptorspec, $pipes);

    // Send the PDF on stdin
    fwrite($pipes[0], file_get_contents($pdf_path));
    fclose($pipes[0]);

    // Read the outputs
    $contents = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);

    $return_value = proc_close($process);
    return $contents;
}
How to do this in NodeJS
Generate a PDF from HTML
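var spawn = require('child_process').spawn;

/**
 * Returns the Binary Content of the PDF Generated from the HTML
 * @author Johann du Toit
 */
function html_to_pdf(html, fn, err) {
    // Read the HTML from stdin ('-') and write the PDF to stdout ('-')
    var child_process = spawn('wkhtmltopdf', ['-', '-']);
    var out = [], errors = [];

    // Output arrives in chunks, so collect all of them
    child_process.stdout.on('data', function (data) { out.push(data); });
    child_process.stderr.on('data', function (data) { errors.push(data); });

    child_process.on('close', function (code) {
        if (code === 0) fn(Buffer.concat(out));
        else err(Buffer.concat(errors));
    });

    child_process.stdin.write(html);
    child_process.stdin.end();
}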
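Once the PDF is in memory it can be written straight out to the client. The sketch below shows one way this might look with Node's built-in HTTP response object; `sendPdf` and the file name passed to it are hypothetical names used for illustration, not something from the site's actual code.

```javascript
// Writes an in-memory PDF to an HTTP response as a file download.
// `res` is expected to behave like Node's http.ServerResponse
// (it needs writeHead() and end()).
function sendPdf(res, pdfBuffer, filename) {
  // Strip characters that would break the quoted filename parameter.
  const safeName = String(filename).replace(/["\\\r\n]/g, '');
  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Content-Length': pdfBuffer.length,
    'Content-Disposition': 'attachment; filename="' + safeName + '"'
  });
  res.end(pdfBuffer);
}
```

Inside a request handler this could be used as the success callback of `html_to_pdf`, for example `html_to_pdf(html, function (pdf) { sendPdf(res, pdf, 'cv.pdf'); }, onError)`, where `onError` is whatever error handling the application already has.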
Generate an Image from HTML
var spawn = require('child_process').spawn;

/**
 * Returns an Image created from the HTML given to the method.
 * @author Johann du Toit
 */
function html_to_image(html, fn, err) {
    var child_process = spawn('wkhtmltoimage', ['-', '-']);
    var out = [], errors = [];

    // Output arrives in chunks, so collect all of them
    child_process.stdout.on('data', function (data) { out.push(data); });
    child_process.stderr.on('data', function (data) { errors.push(data); });

    child_process.on('close', function (code) {
        if (code === 0) fn(Buffer.concat(out));
        else err(Buffer.concat(errors));
    });

    child_process.stdin.write(html);
    child_process.stdin.end();
}
Generate an Image of a Document from a PDF
var spawn = require('child_process').spawn;

/**
 * Returns the Binary Content of an Image Generated from a PDF
 * @author Johann du Toit
 */
function pdf_to_image(pdf_content, fn, err) {
    var child_process = spawn('convert', ['-density', '350%', '-quality', '85', '-append', 'pdf:-', 'png:-']);
    var out = [], errors = [];

    // Output arrives in chunks, so collect all of them
    child_process.stdout.on('data', function (data) { out.push(data); });
    child_process.stderr.on('data', function (data) { errors.push(data); });

    child_process.on('close', function (code) {
        if (code === 0) fn(Buffer.concat(out));
        else err(Buffer.concat(errors));
    });

    child_process.stdin.write(pdf_content);
    child_process.stdin.end();
}
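The three NodeJS functions differ only in the command and its arguments, so they could share one helper. Below is a rough sketch of that idea using a Promise instead of the two callbacks; `runWithStdin`, `htmlToPdf`, `htmlToImage` and `pdfToImage` are hypothetical names, and the command lines are the same ones used in the NodeJS functions in this post.

```javascript
const { spawn } = require('child_process');

// Runs `command` with `args`, writes `input` to its stdin and resolves with
// everything it wrote to stdout as a single Buffer. Rejects with an Error
// carrying the stderr output if the exit code is not 0.
function runWithStdin(command, args, input) {
  return new Promise(function (resolve, reject) {
    const child = spawn(command, args);
    const out = [];
    const err = [];

    child.stdout.on('data', function (chunk) { out.push(chunk); });
    child.stderr.on('data', function (chunk) { err.push(chunk); });

    // 'error' fires when the process cannot be started (e.g. not installed).
    child.on('error', reject);

    // 'close' fires once the process has ended and its stdio streams are closed.
    child.on('close', function (code) {
      if (code === 0) {
        resolve(Buffer.concat(out));
      } else {
        reject(new Error(command + ' exited with code ' + code + ': ' +
          Buffer.concat(err).toString()));
      }
    });

    // Ignore write errors if the child exits before reading all of its input.
    child.stdin.on('error', function () {});
    child.stdin.end(input);
  });
}

// The three wrappers from this post, expressed with the shared helper.
function htmlToPdf(html) {
  return runWithStdin('wkhtmltopdf', ['-', '-'], html);
}

function htmlToImage(html) {
  return runWithStdin('wkhtmltoimage', ['-', '-'], html);
}

function pdfToImage(pdf) {
  return runWithStdin('convert',
    ['-density', '350%', '-quality', '85', '-append', 'pdf:-', 'png:-'], pdf);
}
```

A call would then look like `htmlToPdf('<h1>My CV</h1>').then(function (pdf) { /* use the Buffer */ })`. Whether a Promise or the callback style fits better depends on the rest of the application.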
And that's it
There you have it: generating PDF Documents from various inputs in PHP and NodeJS. Nothing advanced, but always good to have in your toolbox.
Have a better way? Let me know!