
File conversion utility



Dear All,

attached is a re-working of the file conversion utility I posted to the
list back in April 2005. This version is substantially changed and I
have taken on some of the suggestions I received. In particular:

    * Now uses emacs custom facility for managing customizable
      variables. 

    * The functionality is now integrated into the view-file function.
      Now if you try to view a file which is a *.doc, *.ps, *.pdf or
      *.ppt, the file will be automatically converted to either plain
      text or HTML (for ppt). 

    * In the case of HTML files, this utility now uses the browse-url
      package to render the buffer contents as a web page. So if you
      are in dired and hit v for view file and the file is either an
      HTML file or a PowerPoint file that will be converted to HTML,
      after view-file has put the contents into the buffer, the
      default browser configured in the standard browse-url package
      will render the contents as a web page. 
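
As a rough illustration of the first point, the converter paths can also
be set from your .emacs instead of going through the customize interface.
The variable names below are the ones defined in the attached file; the
/usr/local/bin paths are only an assumed example of a non-default install
location.

```elisp
;; Assumed example paths; adjust to wherever the converters actually live.
(setq txutils-pdf2text-prog "/usr/local/bin/pdftotext")
(setq txutils-msword2text-prog "/usr/local/bin/wvText")
```

The same values can be set interactively with
M-x customize-group <ret> txutils <ret>.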

Note that I have developed this utility under emacs 22, but it should
work under emacs 21 as well. Only very minimal testing has been done
and I expect there are some bugs. Bottom line, this is probably best
considered a work in progress. 

To use this utility, put it somewhere in your load path and then add
the line 

(require 'txutils)

to your .emacs. 
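
If the directory holding txutils.el is not already in your load path,
something like the following should do. The directory ~/elisp is just an
assumed location, so substitute your own.

```elisp
;; ~/elisp is a hypothetical directory; use wherever you saved txutils.el.
(add-to-list 'load-path (expand-file-name "~/elisp"))
(require 'txutils)
```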

You need to have the external packages for doing the file conversion.
See the commentary at the beginning of the source file. Most of these
utilities are available as part of well-appointed Linux distributions.
For Debian users, all the utilities are only an apt-get away!
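
A quick way to see which converters are missing is sketched below. The
helper my-txutils-missing-programs is hypothetical and not part of
txutils.el; the paths passed to it are the default values from the
attached file.

```elisp
;; Hypothetical helper, not part of txutils.el.
(defun my-txutils-missing-programs (programs)
  "Return the members of PROGRAMS that are not executable files."
  (let (missing)
    (dolist (program programs (nreverse missing))
      (unless (file-executable-p program)
        (push program missing)))))

;; Check the default converter paths; an empty result means all were found.
(my-txutils-missing-programs
 '("/usr/bin/wvText" "/usr/bin/pdftotext"
   "/usr/bin/pstotext" "/usr/bin/ppthtml"))
```

Any path returned in the list needs either the corresponding package
installed or the matching txutils variable pointed somewhere else.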

You will also want to make sure you have an appropriate browser
defined for browse-url. Currently, I'm using w3m, but I have used w3
as well. 
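
Pointing browse-url at w3m usually comes down to two lines like the ones
below. This assumes the emacs-w3m package is installed and provides
w3m-browse-url; it is a sketch of one possible setup, not the only one.

```elisp
;; Assumes emacs-w3m is installed and on the load path.
(autoload 'w3m-browse-url "w3m" "Ask a WWW browser to show a URL." t)
(setq browse-url-browser-function 'w3m-browse-url)
```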

As usual, feedback, bug reports, patches and general suggestions are
always welcome. See the file header for a contact address.

regards,

Tim

;;      Filename: /home/tcross/projects/emacs-convert/txutils.el
;; Creation Date: Wednesday, 20 September 2006 10:13 PM EST
;; Last Modified: Sunday, 24 September 2006 05:35 PM EST
;;        Author: Tim Cross <tcross@une.edu.au>
;;   Description: Convert files from doc, ps, pdf, ppt to a format
;;                which can be viewed within emacs (i.e. text or html)

;;; Copyright (C) 2006. Tim Cross <tcross@une.edu.au>
;;; All Rights Reserved.
;;;
;;; This file is not part of GNU Emacs, but the same permissions apply.
;;;
;;; GNU Emacs is free software; you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 2, or (at your option)
;;; any later version.
;;;
;;; GNU Emacs is distributed in the hope that it will be useful,
;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Emacs; see the file COPYING.  If not, write to
;;; the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
;;;
;;; Commentary
;;; ==========
;;;
;;; The very simple idea behind this basic utility is to make files in
;;; .doc, .pdf, .ps and .ppt format accessible without having to leave
;;; emacs or manually convert the file before being able to view its
;;; contents in emacs.
;;;
;;; There are packages which will enable calls to external viewers
;;; for files of specific formats, such as xpdf for pdf etc. However,
;;; I wanted to have everything within emacs as this makes integration,
;;; cutting/pasting etc a lot easier, plus as a blind user, most
;;; external utilities are of little use because they don't also include
;;; speech support.
;;;
;;; The objective here is to have things setup so that when browsing
;;; a directory with dired, you can just hit 'v' for any file you want to
;;; view and you will be presented with a text or html version without
;;; needing to do any manual conversion - or even caring about what
;;; would need to be done.
;;;
;;; You need the following packages (or at least utilities which will
;;; do the same thing). Most of these are fairly standard with many Linux 
;;; distros these days. 
;;; The wv utilities which contain wvText for converting MS Word docs
;;; The xpdf utilities which include pdftotext for converting PDF to text
;;; The Ghostscript package which contains pstotext for converting PS to text
;;; The ppthtml utility for converting MS Power Point files to html
;;; A configured and working browse-url setup. I use w3m as my browser
;;;
;;; Customizing 
;;; ===========
;;;
;;; The variables used to hold the paths to programs used to convert
;;; various file formats have been defined using the custom package.
;;; These conversion programs are called with two arguments, the input 
;;; file and the output file for the converted text. Use the command
;;; M-x customize-group <ret> txutils <ret> to set these values if the
;;; default values are not sufficient.
;;;
;;; Installation 
;;; ============
;;; 
;;; Pretty straightforward. Place this file somewhere in your load path
;;; and put a (require 'txutils) in your .emacs. You may want to byte
;;; compile this file.
;;;
;;; Reporting Bugs
;;; ==============
;;; 
;;; This is the first bit of elisp I've allowed out into the world and 
;;; while I am really learning to love both elisp and cl lisp, I'm still 
;;; very much a novice. Therefore, there ARE bugs and probably some pretty
;;; poor style within this stuff. Feedback, bug reports and suggestions 
;;; always welcome. Send e-mail to tcross@une.edu.au
;;;
;;; Emacspeak Users Note - I've not attempted to enhance this code to provide 
;;; better spoken or audio icon feedback. When the code matures a bit and
;;; once I get some feedback, I will see if a re-worked version can be 
;;; included in emacspeak. In the meantime, feel free to use 'advice' to 
;;; improve things. 

(require 'custom)
(require 'browse-url)

(defgroup txutils nil
  "Customize group for txutils."
  :prefix "txutils-"
  :group 'external)

(defcustom txutils-msword2text-prog "/usr/bin/wvText"
  "Program to convert MS Word .doc files to text."
  :type 'string
  :group 'txutils)

(defcustom txutils-pdf2text-prog "/usr/bin/pdftotext"
  "Program to convert PDF file to text."
  :type 'string
  :group 'txutils)

(defcustom txutils-ps2text-prog "/usr/bin/pstotext"
  "Program to convert PostScript files to text."
  :type 'string
  :group 'txutils)

(defcustom txutils-ppt2html-prog "/usr/bin/ppthtml"
  "Program to convert MS PowerPoint slides to HTML."
  :type 'string
  :group 'txutils)

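(defun txutils-run-command (cmd arg1 arg2 &optional output-buffer)
  "Run CMD with ARG1 and ARG2 through the shell.
Output goes to OUTPUT-BUFFER if given, with errors sent to
*txutils-output*; otherwise all output goes to *txutils-output*.
Return t if the command exits with status 0, nil otherwise."
  (let ((command (format "%s %s %s" cmd arg1 arg2)))
    ;; `eq' rather than `=' so a non-numeric status (e.g. a signal
    ;; description string) counts as failure instead of raising an error.
    (eq 0 (if output-buffer
              (shell-command command output-buffer "*txutils-output*")
            (shell-command command "*txutils-output*")))))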

(defun txutils-quote-expand-file-name (file-name)
  "Expand file name and quote special chars if required."
  (shell-quote-argument (expand-file-name file-name)))

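(defun txutils-file-type (file-name)
  "Return a symbol for the type of FILE-NAME, based on its extension.
The result is one of doc, pdf, ps, ppt, html or plain."
  (cond
   ;; \\' anchors at the end of the string; $ would also match before
   ;; a newline embedded in the file name.
   ((string-match "\\.\\(?:DOC\\|doc\\)\\'" file-name)
    'doc)
   ((string-match "\\.\\(?:PDF\\|pdf\\)\\'" file-name)
    'pdf)
   ((string-match "\\.\\(?:PS\\|ps\\)\\'" file-name)
    'ps)
   ((string-match "\\.\\(?:PPT\\|ppt\\)\\'" file-name)
    'ppt)
   ((string-match "\\.\\(?:HTML?\\|html?\\)\\'" file-name)
    'html)
   (t 'plain)))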

(defun txutils-make-temp-name (orig-name type)
  "Create a temp file name from original file name."
  (let ((name-prefix (file-name-nondirectory orig-name)))
    (cond
     ((eq 'ppt type)
      (make-temp-file name-prefix nil ".html"))
     (t (make-temp-file name-prefix nil ".txt")))))

(defun txutils-do-file-conversion (file-name)
  "Based on file extension, convert file to text. Return name of text file."
  (interactive "fFile to convert: ")
  (let* ((file-type (txutils-file-type file-name))
         (output-file (txutils-make-temp-name file-name file-type)))
    (message "Performing file conversion for %s." file-name)
    (cond
     ((eq 'doc file-type)
      (if (txutils-run-command txutils-msword2text-prog
                               (txutils-quote-expand-file-name file-name)
                               (txutils-quote-expand-file-name output-file))
          output-file
        file-name))
     ((eq 'pdf file-type)
      (if (txutils-run-command txutils-pdf2text-prog
                               (txutils-quote-expand-file-name file-name)
                               (txutils-quote-expand-file-name output-file))
          output-file
        file-name))
     ((eq 'ps file-type)
      (if (txutils-run-command txutils-ps2text-prog
                               (concat "-output " 
                                       (txutils-quote-expand-file-name 
                                        output-file))
                               (txutils-quote-expand-file-name file-name))
          output-file
        file-name))
     ((eq 'ppt file-type)
      (if (txutils-run-command txutils-ppt2html-prog
                               (txutils-quote-expand-file-name file-name)
                               (concat "> " 
                                       (txutils-quote-expand-file-name 
                                        output-file)))
          output-file
        file-name))
     ((eq 'html file-type)
      file-name)
     (t file-name))))

(defadvice view-file (around txutils pre act comp)
  "Perform file conversion or call web browser to view contents of file."
  (let (ad-new-arg
        file-type)
    (setq file-type (txutils-file-type (ad-get-arg 0)))
    (when (and (not (eq 'plain file-type))
               (not (eq 'html file-type)))
      (setq ad-new-arg (txutils-do-file-conversion (ad-get-arg 0)))
      (ad-set-arg 0 ad-new-arg))
    ad-do-it
    (if (eq 'html (txutils-file-type (ad-get-arg 0)))
        (browse-url-of-buffer nil))))

(provide 'txutils)



-- 
Tim Cross
tcross@rapttech.com.au

There are two types of people in IT - those who do not manage what they 
understand and those who do not understand what they manage.
