
Convert Word Docs to Text Using Mac Automator

This document is specifically focused on the early stages of working on an Avalon Travel Publishing manuscript, which is traditionally delivered in Word format. Why get your manuscript from the publisher in the first place? Because numerous editorial changes happened to your manuscript in the time since you delivered it, and the only way to begin work on a subsequent version is by getting a copy of the latest and greatest from your publisher. But this document is also a useful tool for anyone who wants to convert a large number of Word documents to Text for any reason. And if you haven't got a reason of your own, let me give you one: proprietary document formats are poison whose sole purpose is to lock you into one vendor and their software "solutions" for as long as you are able to pay. Choose freedom.

1. First, get the "Convert Text Files" Action from Armin's Automator Actions and install the .action file into your Library/Automator folder.

2. Now download my Automator Script, which makes use of the text conversion capability you just acquired from Armin. Unzip it, save it wherever you'd like, and then double-click it to start Automator running. Automator will let you know if any necessary components are missing. You would get an error message, for example, if you performed this step without having first performed step one.

3. Convert away! Run the Automator script, select the Word doc files you would like to convert to text files, and save them when prompted. If your ultimate goal is to work with them in Jedit or another text editor, save them as type .avl so your text editor recognizes them as Avalon files. Caveat: the script saves text files with the UTF-8 encoding, which means they will not look right unless you open them using the same encoding in your text editor. For example, if you open the text document using TextEdit, the Mac's default text editor, make sure to select the "Plain text encoding: UTF-8" option from the bottom of the Open File dialog.
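If a converted file looks garbled and you are not sure whether the encoding is to blame, a quick check from the command line can help. This is only a sketch: is_utf8 is a made-up helper name, and it assumes the iconv program is available (it ships with Mac OS X). It asks iconv to read the file as UTF-8 and reports failure if any byte sequence is not valid UTF-8.

```shell
#!/usr/bin/env bash
# Sketch: check whether converted text files are valid UTF-8.
# is_utf8 is a hypothetical helper name, not part of the Automator script.

is_utf8() {
  # iconv exits non-zero if the file contains bytes that are not valid UTF-8.
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report on every file named on the command line, if any.
for f in "$@"; do
  if is_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: not valid UTF-8 (try opening it with another encoding)"
  fi
done
```

Note that a plain ASCII file also passes this check, since ASCII is a subset of UTF-8.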

There's a more technical way, if you are comfortable at the command line prompt. This script makes use of the textutil program standard on Mac OS X. From the command line you can issue a:

textutil -convert txt -encoding mac crap.doc

to change the crap.doc Word document to a text file with the same name and Mac Roman encoding (that is, not UTF-8). Build this into a bash script and crank away.
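Here is one way that bash script might look. Treat it as a rough sketch rather than a finished tool: the function name convert_docs and the ENCODING and EXT variables are my own inventions for illustration, and it assumes textutil's -extension option to control the output file's extension. It loops over the files you name, skips anything that is not a regular file, and writes each result next to its source document.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert Word documents to plain text with textutil (Mac OS X only).
# convert_docs, ENCODING, and EXT are illustrative names, not part of any existing script.

convert_docs() {
  local encoding="${ENCODING:-mac}"   # same encoding name as the one-line command above
  local ext="${EXT:-txt}"             # set EXT=avl to get .avl files instead
  local doc
  for doc in "$@"; do
    if [ ! -f "$doc" ]; then
      echo "skipping $doc: not a file" >&2
      continue
    fi
    # Output lands next to the source: crap.doc becomes crap.txt (or crap.avl).
    textutil -convert txt -encoding "$encoding" -extension "$ext" "$doc" \
      || echo "failed to convert $doc" >&2
  done
}

# Convert whatever was named on the command line, if anything.
if [ "$#" -gt 0 ]; then
  convert_docs "$@"
fi
```

Saved as, say, doc2txt.sh (a hypothetical name), it could be run as: EXT=avl ENCODING=UTF-8 ./doc2txt.sh *.doc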

For more information about working on your Avalon manuscripts using all the power of Jedit's macros, syntax highlighting, and search/replace functionality, continue reading Woodnotes Guide to Using Jedit for Avalon Publishing Documents. If you are more of a command prompt junkie, you might be interested in my Vim page.

Lastly, if this floats your boat, then welcome to the fun world of scripting. I have learned a lot from loafing around Automator World, a site whose motto, "Better living through Macintosh Scripting," pretty much spells it out. It's more fun to spend time once building a system that saves time later than to waste time every time you are faced with the task.
