Creating a dictation buffer

As I mentioned in a post last month, I recently upgraded my Windows dictation setup to Dragon NaturallySpeaking (DNS) 12 and Word 2013.

This upgrade broke the Emacs dictation interface (vr-mode) I had earlier used with DNS 8 and 10. But it also encouraged me to explore new dictation workflows using Natlink directly from my own Python scripts.

Primarily, I have since switched from editing entire documents in Windows, or using the clipboard to transfer text, to using my minimal dictation surface as a buffer while editing documents on the Mac side. I was inspired to do this after I spent a day with a radiologist observing PowerScribe 360 in use.

PowerScribe is a dictation system which uses a dedicated handheld speech controller. Rather than being inserted at the insertion point like typed text, dictated text is buffered and “placed” by buttons on the speech controller or by clicking. You can also choose to discard the dictated text, accompanied by a cute sound effect. Color coding and other affordances distinguish templated from dictated and typed text. (This would be much easier to show than to describe, but I couldn’t find any good examples of the system actually in use on YouTube.)

Thanks to PowerScribe, I realized that it’s actually easier for me to work with shorter fragments of text, a sentence or a paragraph at a time, rather than importing an entire document at once. What I’ve implemented so far is on GitHub, and a video accompanying this post shows it in use and explains some technical details.

There are some disadvantages with this system. If you do want to dictate individual words or something smaller than a sentence into the buffer, you will need to manage the spaces, capitalization and punctuation yourself, since your Word document in the dictation buffer isn’t aware of the surrounding contents. In reality, I seldom find this a problem; saying “no caps” or “lowercase that” from time to time isn’t overly arduous. I could theoretically go even further and implement the Mac side of the solution with an input method rather than services and scripts, which would give me access to the surrounding context, but I think that would be a lot of work for relatively little added benefit.
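
To make the trade-off concrete, here is a rough Python sketch of the cleanup a context-aware insertion would have to do. It is not part of the GitHub code; `fit_fragment` and its rules are hypothetical, and the lowercasing is deliberately naive (it will wrongly lowercase a proper noun that starts a mid-sentence fragment).

```python
def fit_fragment(before: str, fragment: str, after: str = "") -> str:
    """Adjust a dictated fragment to fit between `before` and `after`.

    Hypothetical helper: the rules are a simplification, not Dragon's
    or any input method's actual behavior.
    """
    text = fragment.strip()
    if not text:
        return ""

    # Capitalize at the start of the document or after sentence-ending
    # punctuation; otherwise lowercase the first letter (except for "I").
    preceding = before.rstrip()
    at_sentence_start = not preceding or preceding[-1] in ".!?"
    first_word = text.split()[0]
    if at_sentence_start:
        text = text[0].upper() + text[1:]
    elif first_word != "I" and not first_word.startswith(("I'", "I\u2019")):
        text = text[0].lower() + text[1:]

    # Add a leading space unless the preceding text already ends with
    # whitespace or an opening bracket or quote.
    if before and not before[-1].isspace() and before[-1] not in "([\"'":
        text = " " + text

    # Add a trailing space if the following text starts a new word.
    if after and after[0].isalnum():
        text = text + " "
    return text


print(repr(fit_fragment("I dictated", "The buffer text", " today")))
```

Even this toy version has to guess about proper nouns and quotation marks, which is a fair illustration of why a command like “lowercase that” is often the cheaper fix.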

I’ve still got some more work to do; while writing this post, I realized I need a “discard” command much like the one in PowerScribe. (Done.)

While my setup isn’t yet to the point of being usable “out of the box”, I hope that this brief exploration will help other technically inclined dictation users expand their workflows.

Creating a minimal dictation surface with Word 2013

As I discussed in my previous post, I do my serious dictation in a Windows 7 virtual machine. Having recently upgraded my dictation setup and transferred it to a new Mac, I figured it'd be a good thing to share.

While I don't have any experience with its competitors, VMware Fusion 6 does a good job of making my USB headset plugged into OS X available to Windows for dictation, without interfering with its use in OS X. Dragon NaturallySpeaking calls VMware's audio source “Microphone (Mic-In).” In earlier versions of VMware Fusion, I had mapped my USB headset exclusively to Windows for dictation. This also worked well, but I'm not sure if it was actually necessary.

Most of the time I'm not actually editing documents directly on Windows; the OS simply holds my text on the way to its destination in a Mac application. Dragon NaturallySpeaking (and its Medical derivatives) include a WordPad knockoff called DragonPad. Its stated purpose is to serve as exactly this kind of dictation intermediary, but the user interface looks like it was frozen around Windows 2000 and it only supports single-level undo (not even redo). So, it's a bit of a nonstarter.

My next best bet is Microsoft Word, for which NaturallySpeaking includes a COM-based add-in. Previously, NaturallySpeaking 10 limited me to using the 32-bit version of Office 2010; with version 12, I can use 64-bit Office 2013.

Happily, some of the Office 2013 changes made to support smaller-screened touchscreen tablets have helped my use case of a minimal "dictation surface" sitting in the corner of my Mac's screen. Here's how I set things up:

  • Use VMware Fusion's "Single Window" view, rather than its Unity view. The Word view options I describe only work if the Word window is maximized.  In Unity view this means maximizing to the Mac’s screen, whereas in the Single Window view, you can resize the VM screen as needed.  It's nice to be able to see your other Mac apps as you dictate (or not; I use Shroud’s keyboard shortcuts to alternatively hide everything else but the VM window when I really need to focus).
  • Set the Windows taskbar to auto-hide.
  • Change Dragon NaturallySpeaking’s DragonBar mode (in Tools > Options > View) to “Tray Icon Only”. I find the continual display of audio levels distracting; Apple’s dictation display, despite doing something conceptually similar, is less distracting, probably because it doesn’t change color. I have a keystroke, Ctrl-Alt-`, set to toggle the microphone and don't find it a problem that I sometimes get extraneous dictation in my document if I forget that the VM is listening in the background. (Actually, sometimes it’s pretty humorous.)
  • Turn off Word’s Start screen so you get an untitled document when you start Word.
  • Switch Word to draft view, via Alt, W, E or clicking the Draft button in the View ribbon tab. Those of you who have been using Word since 4.0 or earlier may remember this used to be the default view, but it’s been marginalized in favor of more WYSIWYG alternatives in recent Word versions. However, this heritage explains the corresponding dictation command, “normal view”. (If you accidentally say “draft view”, you’ll probably find everything changes into Courier; say “turn off draft mode” to fix that.) The main advantage of draft view is that you can resize the window without changing your font size; it's also more space efficient, as it doesn’t display your margins.
  • Outline view (Alt, W, U, or say “outline view”) is also a good choice for dictation, though Word is no OmniOutliner. Say “new line” rather than “new paragraph” or you’ll dictate a bunch of empty headings. “Tab” and “press shift-tab” will indent and unindent respectively.
  • Set Word's ribbon to auto-hide, via the button between the help and minimize buttons at the top right of the window. This maximizes the window if it isn’t already; it also hides the status bar, window title bar and most other chrome. A long button across the top of the window labeled with an ellipsis will restore the chrome if you need it, as will a press of the Alt key.
  • Experiment with Office themes (Options > General > Personalize your copy of Microsoft Office). The White theme is more trendy but I prefer a bit more separation between content and chrome, so I picked Dark Gray.
  • Consider disabling the cursor animations (smooth movement of the insertion point as you type) if they're as disconcerting to you as me.

Once you have your view set up, you'll find that Word reverts to Print Layout view for new documents. Unfortunately, to solve this problem you must delve into the crufty world of Office automation with Visual Basic for Applications. From the look of its toolbars, the VBA editor last had serious work done in the Office 2003 timeframe; most of it appears unchanged since VBA’s inception.

If it isn’t there already, add the Developer tab to your Ribbon (Options > Customize Ribbon > Main Tabs). Click Developer > Visual Basic, or press Alt, L, V.  Select the Normal project if you haven’t already to put code in the template, and paste in the following (if you’ve already got code in there, I trust you know what to do):

Sub AutoExec()
    ' Wait until a document opens.
    Application.OnTime Now, "AutoNew"
End Sub

Sub AutoNew()
    ' Ensure that the draft font isn't used
    ' (e.g., if you say "draft view" by accident)
    With Dialogs(wdDialogToolsOptionsView)
        .DraftFont = False
        .Execute
    End With
    ' Draft view is wdNormalView.
    If ActiveWindow.View.Type = wdPrintView Then
        ActiveWindow.View.Type = wdNormalView
    End If
    ' If window isn't maximized, ribbon doesn't collapse fully.
    Application.CommandBars("Ribbon").Visible = False
End Sub

Update: I have posted an updated version of the above macros to GitHub.

(There's some incorrect information on the Internet about scripting draft view in Word, for example here. The issue, as above, is that Draft view used to be Normal view, and still is Normal view both from VBA as well as in Dragon voice commands. View.Draft, despite the name, controls the font.)

Save, quit and restart Word; you should find yourself with a minimal dictation surface ready for your use:

Word 2013 dictation setup

A related note: I experimented with Windows Live Writer for this post, versus my usual process of copying and pasting into MarsEdit. As long as I turn off the “Blog Theme” button (which causes problems unrelated to dictation), dictation into Windows Live Writer works acceptably. The biggest issue is the markup ending up all one line in WordPress, despite looking fine (seriously — a Microsoft tool that generates tolerable markup!) in Windows Live Writer. Smaller issues include the results box appearing in the top left corner of the screen regardless of my cursor location (normally it appears at the insertion point) and dictation inserting unnecessary newlines, particularly in a bulleted list.

OS X dictation alternatives

I dictate to my computer a lot. It helps me write faster and saves my hands for other pursuits.

In the last few years, dictation, both of the local and network-hosted variety, has improved to the point that this choice is no longer an infuriating time sink. Coding, versus writing prose, via dictation is still in its infancy and I continue to anxiously await Tavis Rudd’s release of the dictation system he demoed at several conferences last year (warning: videos may be NSFW thanks to some synthesized expletives).

In a conversation on Twitter earlier this week I noted that, despite considerable enhancements in the past few years, dictation on OS X doesn’t get discussed much — hence this post.

If you’re going to be playing with dictation, make sure you have a decent headset, properly positioned. Wired, noise-canceling USB headsets are not expensive, and even though Apple’s been adding microphones and improving noise canceling on their Macs recently, you still do better with a headset. If the dictation system you’re using doesn’t include an audio setup step, just record and play back some of your own speech to make sure it’s audible and relatively free of background noise.

On OS X, you have 4 choices for dictation:

Networked dictation

Networked dictation was introduced in OS X 10.8 Mountain Lion.  It’s similar to the dictation service on iOS, and benefits from its use by Siri. I appreciate its well executed incorporation into the OS: you can dictate effectively into nearly every text field everywhere; you can easily start and stop dictation from the keyboard; dictation alternatives (blue dotted underline) are part of the Cocoa text system, and dictated text nicely integrates its capitalization and sentence structure with the surrounding material. The software on the other end of the network has a huge vocabulary, including medical terms.

Usability disadvantages with this method of dictation as currently implemented include:

  1. no trainability (though, given it’s designed to be a speaker-independent system, this is less of an issue)
  2. no real-time feedback: dictation happens in 1-minute batches
  3. no editing by voice
  4. no error handling whatsoever. If the server fails to respond or recognize your words, up to a minute of spoken text is lost. This is somewhat understandable on iOS, but given the essentially infinite resources of OS X in comparison, it’s not defensible there. Ideally, I’d expect audio to be saved as a text attachment for deferred recognition, much like the Newton did with ink text.

There are also privacy issues, of course. I’m careful not to use this service to dictate anything sensitive, regardless of the promised or actual handling of my data.

“Enhanced Dictation”

OS X Mavericks (10.9) introduces “Enhanced Dictation”, a locally hosted version of Nuance’s recognizer. It’s neither installed nor turned on by default; you can enable it in System Preferences. Like OS X’s networked dictation, Enhanced Dictation is not trainable and doesn’t let you edit by voice, but it does let you mix keyboard/mouse editing and dictation. While it does provide the feedback expected of a local recognizer and does away with the one-minute dictation limitation, it’s the only one of these options I find unusable in practice.

Enhanced Dictation’s omissions of training and editing likely protect sales of the Dragon Mac products (discussed below). The bigger issue is that this seems fundamentally a speaker-dependent system without a method of training, resulting in frequent dictation errors you can’t fix. The vocabulary seems smaller than the networked alternative, though because of its frustratingly high error rate, I haven’t done a lot of testing. It also uses a lot of memory.

Dragon products

Nuance offers Dragon Dictate for Mac, MacSpeech Scribe and Dragon Dictate Medical. The Mac-specific components of these products and their predecessors have always been buggy and flaky. My experience with the support and sales surrounding them has ranged from incompetence to sleaziness. I have purchased several versions and upgrades of these products going back to the original pre-OS X, Philips recognizer-based versions, but I’m not going to keep supporting software that is this poorly developed, sold and supported.

Windows in a virtual machine

Nuance’s Windows dictation products (Dragon NaturallySpeaking and Medical/Legal) are better than their Mac equivalents, though that’s not saying a lot. The UI is a scattered, slowly-evolving mess; true interaction between keyboard/mouse and voice editing is limited to individual versions of specific applications, and the medical product is expensive (upgrades are $500 on sale).

The main reason I dictate into Windows is the ecosystem surrounding the Dragon products there. There are quite a few abandoned research projects and other near-abandonware to contend with, but it’s possible with some effort to construct a productive system. What I’ve done thus far is nowhere near what Tavis Rudd did, but it works for me. Natlink is a Python framework for building recognition systems, with several macro languages/frameworks built on top including Unimacro, Vocola and Dragonfly (the basis of Tavis’s system).

Microsoft also bundles speech recognition with Windows these days; I’ve used it very little, but it does work with Dragonfly.

My choices

I use OS X’s networked dictation for brief passages, and a Windows 7 VM for anything longer, like this post. I recently upgraded my Windows environment to the current Dragon Medical 2 (equivalent to NaturallySpeaking 12) and Word 2013. More on that setup is coming in my next post.

Apple II to Mac: Copying physical disks

As you may have noted if you follow me on Twitter or Flickr, I’ve recently been trying to preserve my and my family’s Apple II life from binders and boxes full of 5.25ʺ and 3.5ʺ disks.

There’s a remarkably rich ecosystem of Apple II emulation and file transfer software for the Mac. This and the next few posts, while by no means comprehensive, will discuss the hardware and software which are helping me to save this data. If you still have an Apple II legacy to save, hopefully they’ll help you as well.

Copying 5.25ʺ disks

I use a CFFA3000 in an Apple //e (my first computer). The CFFA software images a floppy (DOS, ProDOS, Apple Pascal, etc.) from a Disk II drive to a file on a CF card in a few seconds. It logs and aggressively retries on read errors.

Copying 800K 3.5ʺ disks

You could also use a CFFA3000, but my IIgs and Apple 3.5ʺ Drive are long gone.  Instead, I use a PowerBook G3 (PDQ), Apple’s last computer to support a SuperDrive, with Mac OS 9. Apple’s Disk Copy works for imaging and MacSFTP handles file transfer, as I wasn’t able to coax Mac OS 9 into connecting to the AFP server on current OS X versions.  Once on the Mac, I use Hazel to automatically convert the Disk Copy-generated NDIF (.img) images to data-fork-only UDIF (.dmg), which Sweet16 has no trouble with:

Hazel convert to UDIF
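
For readers without Hazel, the same conversion can be scripted around OS X’s `hdiutil convert`. The sketch below is only an assumption about how one might do it, not the rule shown above: the `UDRO` (read-only UDIF) output format and the helper names are illustrative choices, and it presumes an OS X version whose `hdiutil` can still read NDIF images.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def udif_convert_command(image: Path, image_format: str = "UDRO") -> list[str]:
    """Build the hdiutil command that converts `image` to a UDIF .dmg.

    The output file sits next to the source, with a .dmg extension.
    UDRO is an assumed default; another UDIF variant may suit you better.
    """
    target = image.with_suffix(".dmg")
    return ["hdiutil", "convert", str(image),
            "-format", image_format, "-o", str(target)]


def convert_to_udif(image: Path) -> Path:
    """Run the conversion (OS X only) and return the new image's path."""
    subprocess.run(udif_convert_command(image), check=True)
    return image.with_suffix(".dmg")


# Show the command without running it.
print(" ".join(udif_convert_command(Path("Games.img"))))
```

Keeping the command construction separate from the `subprocess` call makes it easy to check what would be run before pointing it at a folder of irreplaceable images.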

Unfortunately, Disk Copy gives up quickly on read errors, but Mac OS 9 will mount ProDOS disks directly in the Finder, so I have been able to rescue a few individual files when the disk can’t be imaged as a whole.

My original expansion bay floppy drive stopped reading reliably after 10–15 disks. I could probably have cleaned it, but either the floppy drive mechanism (the same Mitsubishi one was used in PowerBook SuperDrives from 1994–1998) or complete floppy expansion bay modules are currently available on eBay for $10–15. I bought one of each; one appears to have been damaged in shipment, and I could always scavenge the mechanism out of my PowerBook 540 if I was desperate.

If you used 1.4 MB MFM disks on your Apple II, I imagine you may be able to get away with an external USB floppy drive, but I don’t have any such disks to test with.

Pester 1.1b8 released

Get it.

Pester 1.1 has been a long time coming. It’s been my own personal battle with the second-system effect. 1.1b8 is a big step towards finishing, though.

-rw-r--r--  1 nriley  users  295049 Oct 14  2002 Pester-1.0.dmg
-rw-r--r--  1 nriley  users  428727 Jan  7  2003 Pester-1.1b2.dmg
-rw-r--r--  1 nriley  users  524911 Apr  9  2003 Pester-1.1b4.dmg
-rw-r--r--  1 nriley  users  749389 Nov 25  2007 Pester-1.1b5.dmg
-rw-r--r--  1 nriley  users  746933 Nov 27  2007 Pester-1.1b6.dmg
-rw-r--r--  1 nriley  users  756428 Nov 28  2007 Pester-1.1b7.dmg
-rw-r--r--  1 nriley  users  881197 Mar 25 23:34 Pester-1.1b8.dmg

The list of new features and bug fixes since 1.1b7 is substantial. Among them:

  • Customizable alert sounds—the most requested feature. It uses QuickTime, so you’re welcome to pick a movie or even a bitmap or PDF to use as well. I’ve used the latter to display stretching exercises on a regular basis, for example.
  • Selectable sound output device and volume for alert sounds. If you have your headphones and speakers accessible as separate output devices, as I do (e.g. if you have a Mac Pro or an external audio device), it’s useful to have the alert sound audible when you’re away from your desk. Change the output device from Preferences. (Note: Pester doesn’t yet respond to audio devices being connected/disconnected while it is running, although you should always get audio output somewhere.)
  • Baseline, ICU-based support for non-natural language dates and times is much more robust (for example, simply “20” or “8p” works to specify 8:00 PM).
  • Support natural-language dates in non-English languages via Date::Manip. I uncovered some bugs here, which the author of Date::Manip is working on fixing, but Catalan, Danish, Dutch, French, German, Polish and Russian should work fine. The date popup is limited to the days of the week, but that’s easily remedied (see below).
  • Optionally wait until you stop typing or moving the mouse to display a message. This is quite helpful so you don’t start typing into the Snooze box when you want to be typing into another document. The feature is disabled by default; enable it in Preferences.
  • Handle time zone changes, many more time zones, and more reliably determine the time zone.
  • Autocomplete common natural-language dates—my favorite new feature.
  • Simplify tab order. I previously tried to do something very complex with the tab order, which confused Cocoa and could cause you to get “stuck” in some area of the window.
  • Better save and restore focus when you’re working as an alarm goes off; will no longer bring unwanted windows to the front.
  • Open the Set Alarm window in the current Space when triggered with a keyboard equivalent or the Dock menu.
  • Switch to tomorrow automatically if necessary when tabbing into “on”. This is a feature I first implemented in Pester for hiptop, which I’m actually happier with than desktop Pester (see the aforementioned second-system effect). The upshot of this is that if you specify a time that’s already passed, while the specified date is today, simply tabbing into the date field will switch the date to tomorrow. It’s easier to use than to describe, really!
  • Allow intervals up to 999 weeks.
  • Display “today” and “tomorrow” in the bottom left corner of the Set Alarm window in case you can’t remember what day it is.
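
Two of the behaviors above, shorthand times such as “20” or “8p” and the automatic switch to tomorrow, are easier to follow in code. This Python sketch only illustrates the logic as described; Pester itself is a Cocoa application using ICU, and the function names and accepted syntax here are hypothetical.

```python
import re
from datetime import date, datetime, time, timedelta
from typing import Optional

# Hours, optional minutes, optional a/p (or am/pm) suffix: "20", "8p", "8:30pm".
_SHORTHAND = re.compile(r"^(\d{1,2})(?::(\d{2}))?\s*(?:([ap])m?)?$", re.IGNORECASE)


def parse_shorthand_time(text: str) -> Optional[time]:
    """Parse a shorthand time of day, returning None if it isn't one."""
    match = _SHORTHAND.match(text.strip())
    if not match:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = (match.group(3) or "").lower()
    if meridiem:
        if not 1 <= hour <= 12:
            return None
        # 12a is midnight and 12p is noon.
        hour = hour % 12 + (12 if meridiem == "p" else 0)
    if hour > 23 or minute > 59:
        return None
    return time(hour, minute)


def default_alarm_date(alarm_time: time, now: datetime) -> date:
    """Pick today, or tomorrow if the requested time has already passed."""
    if alarm_time <= now.time():
        return now.date() + timedelta(days=1)
    return now.date()


now = datetime(2010, 3, 25, 21, 0)  # 9:00 PM
alarm = parse_shorthand_time("8p")
print(alarm, default_alarm_date(alarm, now))
```

With the clock at 9:00 PM, an alarm entered as “8p” lands on the following day, which matches the tabbing behavior described in the list.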

Note that the natural language date’s language is determined by the date format, as specified in System Preferences → Language & Text → Formats.

If you speak Catalan, Danish, Dutch, French, German, Italian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish or Turkish, it’d help if you could translate the date popup contents:

/* Date popup */
"today" = "today";
"tomorrow" = "tomorrow";
"in 2 days" = "in 2 days";
"next «day»" = "next «day»";
"next week" = "next week";
"in 2 weeks" = "in 2 weeks";
"next month" = "next month";
"in 2 months" = "in 2 months";
"in 1 year" = "in 1 year";

Feel free to put the results in a comment or email them to me (pester at sabi.net).

If you’ve got any bug reports, comments or feature requests, please let me know. If you’re looking for things to test, read the release notes in the Read Me (Help menu) which summarize the changes since 1.0.

For now, the main thing that needs finishing for 1.1 is documentation (help wanted!). The latest source is on GitHub.

Enjoy!