Bulk convert HTML, RTF, etc. documents to PDF using the Mac OS X command line or an AppleScript

OS X builds PDF output into its printing system, so it is available in every application. The same support can be used from the command line, or from an AppleScript script / application, to convert many kinds of documents to PDF.

In the example below there are five HTML files (file1.html, file2.html, …, file5.html) that have to be converted to a single PDF (all.pdf). To accomplish this on the command line, open up a new Terminal window, go to the folder the HTML files are stored in, and optionally execute this command to merge them into one file:

cat file*.html > all.html
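
If the folder contains other files, or the merge is re-run after all.html already exists, a broad wildcard will pull those in too. A slightly more defensive sketch follows; the function name merge_html is made up for this example and is not part of OS X:

```shell
#!/bin/sh
# Hypothetical helper: concatenate the given input files into one output file.
# Usage: merge_html OUTPUT INPUT...
merge_html() {
  out=$1
  shift
  # Create (or empty) the output file before appending to it.
  : > "$out" || return 1
  for f in "$@"; do
    # Skip the output file itself in case a wildcard such as *.html matched it.
    if [ "$f" = "$out" ]; then
      continue
    fi
    cat -- "$f" >> "$out" || return 1
  done
}

# Example: merge the five files from the article.
# merge_html all.html file1.html file2.html file3.html file4.html file5.html
```

Wildcards expand in alphabetical order, so with ten or more files file10.html would sort before file2.html; listing the files explicitly avoids that.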

Next, call Apple’s ‘convert’ utility (not the ImageMagick utility of the same name) and specify the input and output filenames:

/System/Library/Printers/Libraries/convert -f all.html -o all.pdf

The ‘convert’ utility will build the output file and write it to your shell’s current working directory. The command to convert to / from other filetypes is similar, except that you can’t ‘cat’ the input files together if they’re not text-based.
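
When the files should stay separate, or are not text-based, converting each one on its own is an alternative to merging first. A minimal sketch, using only the -f and -o options shown above; the CONVERT variable and the convert_each function are names introduced here for illustration, so the tool's path can be overridden:

```shell
#!/bin/sh
# Path to Apple's convert utility as given in the article. CONVERT is an
# assumption of this sketch (the tool itself does not read it); override it
# if the utility lives elsewhere on your system.
CONVERT=${CONVERT:-/System/Library/Printers/Libraries/convert}

# Hypothetical helper: convert each argument to "<name>.pdf" next to it.
# Returns non-zero if any conversion failed.
convert_each() {
  status=0
  for src in "$@"; do
    if "$CONVERT" -f "$src" -o "$src.pdf"; then
      printf 'converted %s\n' "$src"
    else
      printf 'failed: %s\n' "$src" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (run in the folder that holds the documents):
# convert_each *.rtf
```

Each file is handled independently, so one document the tool cannot read does not stop the rest of the batch.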

This functionality can be automated by writing an AppleScript application that makes use of the command line tool. Enter the following code into Script Editor (found in the Utilities folder in your Applications folder):

on open input_documents
  repeat with this_document in input_documents
    set this_document_path to POSIX path of this_document
    --display dialog this_document_path
    do shell script "/System/Library/Printers/Libraries/convert -f " & quoted form of this_document_path & " -o " & quoted form of (this_document_path & ".pdf")
  end repeat
end open
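
The script above names each PDF by appending .pdf to the full file name, so report.html becomes report.html.pdf. If you would rather get report.pdf, the output path can be worked out in the shell first. A minimal sketch (pdf_path_for is a hypothetical helper name, not an existing command):

```shell
#!/bin/sh
# Hypothetical helper: print the PDF path for a document, replacing the
# last file extension (if any) with ".pdf".
pdf_path_for() {
  dir=$(dirname -- "$1")
  base=$(basename -- "$1")
  case $base in
    # Strip the last extension, but leave dotfiles such as ".profile" alone.
    ?*.*) base=${base%.*} ;;
  esac
  case $dir in
    /) printf '/%s.pdf\n' "$base" ;;
    *) printf '%s/%s.pdf\n' "$dir" "$base" ;;
  esac
}

# Example:
# pdf_path_for "/Users/me/Documents/report.html"
```

Only the file name is inspected, so a dot in a folder name (for example /tmp/a.b/notes) is left untouched.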

Click ‘File->Save’ to save the script’s source code, then ‘File->Save As…’ and change the File Format to Application to save an executable (application) version of the script. If you drop documents on this application, it will automatically launch and create PDF copies of the documents in their source folder.

 


8 Responses to “Bulk convert HTML, RTF, etc. documents to PDF using the Mac OS X command line or an AppleScript”

  1. If I use the above command to convert HTML to PDF, the generated PDF is not formatted properly.
    Please help me generate the PDF in the required format.

  2. Hey Murthy

    If it’s a multi-page HTML document you’re trying to convert, maybe you should try using the “Save to PDF” feature available in all Cocoa apps (in the browser, click File->Print->PDF->Save as PDF).

    Alternatively (if you’re trying to capture a webpage that’s not too image-heavy) you could use an application like LittleSnapper to grab a full-page screenshot, then open that screenshot in Preview and save it to a PDF.

    Let me know if either of those suggestions work!

  3. @willem
    This is really a nice command! Thanks a lot for sharing it. Do you know of any documentation available for this command? I looked briefly but could not find anything quickly. I mean the help you get from the command itself is enough to get you going but still ;)

  4. Hi constantin

    Sorry, the only available documentation seems to be the built-in help:

    Usage: convert [ options ]
     
    Options:
     
      -f filename          Set file to be converted (otherwise stdin)
      -o filename          Set file to be generated (otherwise stdout)
      -i mime/type         Set input MIME type (otherwise auto-typed)
      -j mime/type         Set output MIME type (otherwise application/pdf)
      -P filename.ppd      Set PPD file
      -a 'name=value ...'  Set option(s)
      -U username          Set username for job
      -J title             Set title
      -c copies            Set number of copies
      -u                   Remove the PPD file when finished
      -D                   Remove the input file when finished

  5. Jesse Bachrach, 28 Oct 2009 at 03:18

    First, thanks so much for this very helpful seeming tip. I’ve messed around some with Automator, but I am a scripting newbie and I am scared of terminal so I am trying to do it in Apple Script, but I keep getting this warning, and then it not working:
    convert: Unable to determine MIME type of, and then the path to the file I dropped onto it.
    Can you tell me what I am doing wrong?
    My ultimate goal is to be able to print mail .doc attachments as pdfs so they load faster for me to open them so I don’t need to be working in Word.
    Thanks so much for any assistance.

  6. Hey Jesse

    When you say you want to convert mail .doc attachments, just keep in mind that you shouldn’t convert them directly in the “~/Library/Mail Downloads/” folder, otherwise Mail won’t recognize them as the attachments on the messages.

    First, save the attachments somewhere else (e.g. a folder on your Desktop), then use the above script.

    I think the reason you’re getting that error message might be because it’s a Word document (.doc or .docx) that you’re trying to convert. The Microsoft Document standard is a closed standard and most apps can’t read it (in the case of applications like OpenOffice and Pages, their developers had to reverse-engineer the .DOC and .DOCX formats to be able to read and write them).

    I think unfortunately you’re either going to have to convert the DOC / DOCX files to RTF first, then run the above scripts, or print the files directly to PDF when you have them open. :/

  7. If the html file has an HTML Tag (image import), then the PDF produced simply has an icon for an image, but not the image.

Trackbacks/Pingbacks

  1. How to print emails from Mutt to PDF files on Mac OS X - OnMercury - 29 Jul 2009

    [...] I know about the muttprint package, but just needed a quick fix. I have got the idea for this hack from this article : http://www.geekology.co.za/blog/2009/02/bulk-convert-html-rtf-to-pdf-using-mac-os-x-command-line-or-... [...]
