Wednesday, August 12, 2009

Convert CHM files to HTML/PDF


A few years back, reading a book meant going to the neighborhood book shop, purchasing the book and then finding a cozy place to sit and read it. However, with the advent of the Internet, with laptops getting smaller, less bulky and cooler, and with the easy availability of ebooks online, the scenario has changed. These days you can go to an online book shop, purchase an ebook at any time of day and start reading it immediately, all while sitting in your bed. However, many of these ebooks are in CHM format (Microsoft Compiled HTML Help), which is the native help documentation format of the Windows operating system. A CHM file basically combines HTML pages and their associated images into a single (.chm) file.

By default, Ubuntu and many other Linux distributions do not include support for opening (.chm) files out of the box, since CHM is a proprietary file format of the Windows operating system. There are viewers available on Linux that allow you to open these files, as I highlighted in my previous article (Read Here). Still, if you want to convert (.chm) files to (.html) or (.pdf), maybe to send them to a friend who does not have a CHM viewer installed, you can do so easily.

First open a terminal (Applications -> Accessories -> Terminal) and issue the following command to install the chmlib tools:
sudo apt-get install libchm-bin
chmlib allows extracting the HTML files and images from (.chm) files. If you want to convert the extracted HTML files into PDF, PS etc., you also need htmldoc, which you can install by issuing the following command in the terminal window:
sudo apt-get install htmldoc
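Before going further, it can help to confirm that both tools actually landed on your PATH. The snippet below is only a small sketch: `check_tool` is a helper name I made up for illustration, and it assumes that libchm-bin provides the `extract_chmLib` command used later in this article.

```shell
#!/bin/sh
# Report whether a given command is available on the PATH.
# check_tool is a hypothetical helper, not part of chmlib or htmldoc.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool extract_chmLib   # expected to come from the libchm-bin package
check_tool htmldoc          # expected to come from the htmldoc package
```

If either line reports "missing", re-run the matching apt-get command above.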

Converting CHM files to HTML and eventually PDF

Now suppose you have a file named "Primer.chm" from which you want to extract the HTML files and images into a "Primer" directory. You can do so by issuing the following command in the terminal window:
extract_chmLib Primer.chm Primer
This should quickly extract all the HTML files and associated images from the CHM file and put them into the Primer directory.
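To see what the extraction produced, you can list the HTML pages it wrote. This is a rough sketch rather than anything chmlib itself provides: `list_html` is a hypothetical helper, and it assumes the pages end in .html or .htm and may sit in subdirectories of "Primer".

```shell
#!/bin/sh
# List every HTML page under a directory, sorted by path.
# list_html is a hypothetical helper used only for this example.
list_html() {
    find "$1" -type f \( -iname '*.html' -o -iname '*.htm' \) | sort
}

# Only run against the example directory if it actually exists.
if [ -d Primer ]; then
    list_html Primer
    echo "Pages found: $(list_html Primer | wc -l)"
fi
```

The sorted list is also a handy preview of the order in which a shell wildcard would hand the files to other tools, which may differ from the book's real chapter order.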

Once you have extracted the HTML files, you are ready to convert them and combine them into a single (.pdf) file. Open a terminal window (Applications -> Accessories -> Terminal) and issue the following command to launch "htmldoc":
htmldoc
Once htmldoc finishes loading its interface, click on the "Continuous" radio button, press "Add Files..." and add all the files you would like to combine into a single PDF document, as shown in the image below:

After choosing all the HTML files you would like to combine, click on the "Output" tab and choose PDF as the output file type, then set the name and location of the generated PDF file. If you want, you can also change the compression level, whether the output should be in grayscale, etc.
Finally, press the "Generate" button to start the process of combining the (.html) files with their images into a single (.pdf) file.
Files being combined into single (.pdf) file
The entire process of combining the (.html) files into a (.pdf) file should not take more than a few minutes. In fact, on my Core 2 Duo based laptop, converting a book of about 1000 pages from HTML to PDF took 4 minutes.
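If you would rather skip the graphical interface, htmldoc can also be driven from the command line. The following is a minimal sketch, not the method used for the timing above: `html_to_pdf` is a hypothetical wrapper, it assumes all pages sit directly inside the directory with a .html extension, and it passes them in alphabetical order, which may not match the book's chapter order.

```shell
#!/bin/sh
# Combine the .html files in a directory into one PDF using htmldoc.
# Usage: html_to_pdf <html-directory> <output.pdf>
# html_to_pdf is a hypothetical wrapper written for this example.
html_to_pdf() {
    src=$1
    out=$2
    # Expand the wildcard into the function's own argument list.
    set -- "$src"/*.html
    # If nothing matched, the pattern is left as-is and no such file exists.
    [ -e "$1" ] || { echo "no .html files in $src" >&2; return 1; }
    # --continuous : plain web pages, no title page or table of contents,
    #                and no forced page break between input files
    # -t pdf       : output format
    # -f "$out"    : output file
    htmldoc --continuous -t pdf -f "$out" "$@"
}

# Only run the example when both the directory and htmldoc are present.
if [ -d Primer ] && command -v htmldoc >/dev/null 2>&1; then
    html_to_pdf Primer Primer.pdf
fi
```

The `--continuous` option corresponds to the "Continuous" radio button in the interface. htmldoc also accepts `--gray` and `--compression=<level>` for the grayscale and compression settings mentioned above, if you want to add them to the command.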
