
Workaround for Dragon Medical Practice Edition on high-DPI displays

In my last blog post, I pointed to a Nuance support article which indicated that there was no support for high-DPI displays in DMPE. This remains the case in 2.3: out of the box, text is scaled and blurry, and icons and windows are placed incorrectly relative to the insertion point. No useful workarounds are provided in the article.

I was working on something else today when I remembered “hey, doesn’t Windows have compatibility options for this?” Of course it does! Find natspeak.exe and check “Disable display scaling on high DPI settings” in Compatibility Properties:

[Screenshot: Fixing high DPI display.png]

This makes DMPE look better and no longer have issues with window placement. DMPE complains once during launch that it doesn’t want to be in compatibility mode, but it doesn’t stop you.
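
If you’d rather script the checkbox (when rebuilding a VM, say), the same setting can be written to the registry. Here’s a minimal Python sketch, assuming the standard AppCompatFlags\Layers mechanism; the natspeak.exe path is hypothetical, so adjust it for your installation:

import winreg

# Hypothetical install path -- adjust for your Dragon installation.
NATSPEAK = r"C:\Program Files (x86)\Nuance\NaturallySpeaking12\Program\natspeak.exe"

# Per-user compatibility flags live under AppCompatFlags\Layers; the
# HIGHDPIAWARE flag corresponds to the high-DPI checkbox in Compatibility
# Properties.
with winreg.CreateKey(
        winreg.HKEY_CURRENT_USER,
        r"Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers") as key:
    winreg.SetValueEx(key, NATSPEAK, 0, winreg.REG_SZ, "HIGHDPIAWARE")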

Dragon Medical updates

Dragon Medical Practice Edition (DMPE) 2.3 was released as a free update at the end of June — which I didn’t notice until last week, since I had disabled the automatic updater because it was so bad, and had assumed that there would be no further updates before another paid upgrade.

The reality turned out to be good in the short term but potentially concerning in the long term.

Dragon Medical Practice Edition 2.3

DMPE 2.3 (the version in the About box is actually 12.53.350.033) does add partial compatibility with Windows 10, Internet Explorer 11 and supposedly Office 2016 — I am still using 2013 on the Windows side. It does not support current versions of any modern browser (Firefox, Chrome or Edge). It also fixes a very irritating and long-standing issue described in the release notes as “Too many internal messages overwhelm the system and cause Dragon to appear frozen.” These freezes would last for many seconds and happen seemingly at random, but often manifested when trying to turn dictation on and off. Now toggling dictation is a reliably sub-second operation — I have not noticed a single such freeze since upgrading to 2.3, which is incredibly gratifying.

My dictation buffer setup

I continue to iterate on my dictation buffer setup. Since my most recent post on the subject about a year ago, I have fixed some pesky bugs and worked on further virtual machine automation. This has involved such diverse things as performance profiling, which helped me partially work around the above freezes, and editing the Windows registry to prevent the Dragon Word add-in from getting automatically disabled. I currently hard-code my profile path, which is necessary to fully automate dictation startup with a roaming profile; you’ll need to change the path if you want to try it yourself.

Macro performance

I had been automating some basic formatting tasks with the built-in Dragon “Advanced Scripting”, as it’ll work on the hospital’s Dragon 360 environment as well as my home/laptop DMPE setup. Unfortunately, Advanced Scripting simulates keystrokes extremely slowly (and apparently slower still on Windows 10). I recently discovered that the older Dragon NaturallySpeaking macro language is still supported; you just need to import an old command, which you can then duplicate. The old language appears to be at least an order of magnitude faster even on Windows 7, so I’ll be using it in future.

Windows 10 and high-DPI support

In addition to my virtual machine Windows 7 environments, I have recently installed DMPE on Windows 10 in Boot Camp on my 13ʺ Retina MacBook Pro, to see if I can do some of my work without a dictation buffer at all. This has been frustrated by DMPE’s lack of support for high-DPI displays (which persists in 2.3). So instead of using high-DPI, I set the display magnification to 100%, try to use font size adjustments where available to make text readable, and squint or use Magnifier where such adjustments are not available. (I found a better workaround later.) Despite several years and Windows versions, high DPI support in Windows is still inconsistent and buggy. OS X looks amazing by comparison.

Another issue with Windows 10 in Boot Camp, unrelated to DMPE, is Boot Camp’s keyboard and trackpad drivers. Unlike on OS X, remapping Caps Lock to Control does not eliminate the accidental-activation delay, and while a kind soul has reverse engineered the appropriate HID commands to remove the delay, I will need to port the code to Windows before I can make use of it. The trackpad is even worse with no workaround of which I’m aware — simple trackpad activities such as clicking and right clicking are completely reliable when booted into OS X but inconsistent on Windows.

The future

The future contains many potential pitfalls.

If Apple switches to ARM-based Macs, running Windows in virtualization will become untenable and I will likely need to start using a non-Mac as my primary machine, or attempt to make do with a Mac version of Dragon Medical.

EMC, in their great wisdom, laid off the entire team developing VMware Fusion for OS X. There has been a single patch release of VMware Fusion since then, which didn’t seem to break anything too horribly, but I don’t have high hopes for the future of the software. The only other option for high-performance virtualization on the Mac, Parallels, is well known for its shady business practices; it would also be more expensive, would require more frequent upgrades, and, from what I can read, has poorer-quality audio driver support.

Currently, DMPE is still based on Dragon 12, which is two major versions behind the current non-medical version. Nuance has publicly stated that there will be no further major releases of DMPE, and that its future is Internet-connected, subscription software which would end up costing about 5-10× as much ($135/month) in my usage model. In addition, automation choices appear to be substantially reduced or eliminated in this future version, which would likely mean I would have to rewrite or completely abandon my dictation buffer software.

The good news is that with Windows 10 and Office 2016 support, and a relatively new laptop, I’ve got a few years to persevere with my current setup before I am forced to make any changes.

I continue to hope that speech recognition’s entry into the mainstream will eventually penetrate the medical market, finally dragging medical speech recognition out of its expensive, flaky, buggy backwater into the 21st century. In the meantime, I will be thankful for small victories, such as that I didn’t experience any freezing while I dictated this blog post.

Dragon NaturallySpeaking roaming user profiles with Apache

Some editions of Dragon NaturallySpeaking (including Medical) support a Roaming User Profile feature. With this, you can store your voice profile on a server and download it to/upload it from computers on which you dictate. Like most aspects of Dragon NaturallySpeaking, it’s unnecessarily complex and flaky, but I got it to work in my distinctly non-enterprise environment a few weeks ago. For anyone else in a similar situation who wants their training, custom dictionaries and commands to follow them, I hope the following is helpful.

I assume here you have an existing local user profile to migrate. Dragon NaturallySpeaking’s WebDAV client is inefficient and includes many configuration options of dubious utility, but does (eventually) work. For WebDAV on IIS (or SMB), the instructions in the administration manual appear relatively complete. The manual mentions Apache compatibility but includes no setup information, nor could I find any elsewhere on the Internet. So, my server examples use WebDAV with the Apache HTTP server 2.4.x.

Setting up a WebDAV server

It's 2016 and you should be using SSL/TLS by now. Mozilla has a nice SSL configuration generator; this is the configuration I'm using. The newest protocol Dragon NaturallySpeaking 12 claims it supports is TLSv1, so the "modern" configuration likely won't work.

My configuration follows. Authentication is however you want to set it up; I use digest auth behind SSL/TLS. Obviously, replace my file paths as appropriate. The Dragon NaturallySpeaking WebDAV client configuration includes options to follow redirects, but they don't work properly and aren't compatible with connection keep-alive. Thankfully, Apache has a workaround for such brokenness (redirect-carefully). The client expects infinite-depth requests to work, hence DavDepthInfinity on.

DavLockDB /var/www/sabi.net/webdav/dav_lock.db
<Directory /var/www/sabi.net/public/dragon>
        Dav On
        DavDepthInfinity on
        AuthType Digest
        AuthName dragon
        AuthUserFile /var/www/sabi.net/etc/digest.passwd
        Require valid-user
        SSLRequireSSL
        # Redirects don't work. At all.                                         
        BrowserMatch "Nuance component" redirect-carefully
        RewriteEngine off
</Directory>

Make sure the directory is writable by the Web server user; mine looks like this:

drwxrwsr-x 4 nriley www-nriley 4.0K Apr 23 11:49 /var/www/sabi.net/public/dragon/

Setting up the WebDAV client

Documentation is here. Follow the instructions under Enable the Roaming User Profile feature and Set location of Master Roaming User Profiles.

In HTTP Settings, specify your username, password and an Authentication Type as appropriate. Under Connection, click Never for Follow Redirects and check the Keep Connection Alive box. I didn't change the Timeouts from the defaults.

My SSL Settings are as follows:

[Screenshot: SSL settings.png]

I haven't actually tested if my server certificate is verified, but I do know enough not to check Using OpenSSL in an application that hasn't been updated in years.

Click Test Connection. If it fails, check your Apache logs; client-side feedback ranges from unhelpful to misleading. You'll notice that every single request is initially tried unauthenticated — I couldn't figure out a way to stop this from happening. Once I was confident that authentication was working, I filtered out these duplicate requests. Here’s the whole test:

% tail -fn 0 /var/www/sabi.net/logs/ssl.*~*.gz | grep nriley
nriley [23/Apr/2016:19:18:15 +0000] "PROPFIND /dragon HTTP/1.1" 207 1210 "-" "Nuance component"
nriley [23/Apr/2016:19:18:15 +0000] "DELETE /dragon/tst.tmp HTTP/1.1" 404 522 "-" "Nuance component"
nriley [23/Apr/2016:19:18:15 +0000] "PUT /dragon/tst.tmp HTTP/1.1" 201 442 "-" "Nuance component"
nriley [23/Apr/2016:19:18:15 +0000] "DELETE /dragon/TempDir HTTP/1.1" 404 522 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "MKCOL /dragon/TempDir HTTP/1.1" 201 442 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "DELETE /dragon/TempDir/tst1.tmp HTTP/1.1" 404 522 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "PROPFIND /dragon HTTP/1.1" 207 6554 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "PUT /dragon/TempDir/tst1.tmp HTTP/1.1" 201 458 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "DELETE /dragon/TempDir/tst2.tmp HTTP/1.1" 404 522 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "PROPFIND /dragon HTTP/1.1" 207 6554 "-" "Nuance component"
nriley [23/Apr/2016:19:18:16 +0000] "PUT /dragon/TempDir/tst2.tmp HTTP/1.1" 201 458 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "PROPFIND /dragon/TempDir HTTP/1.1" 207 2858 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "GET /dragon/TempDir/tst1.tmp HTTP/1.1" 200 341 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "PROPFIND /dragon/TempDir/ HTTP/1.1" 207 1162 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "MOVE /dragon/TempDir/tst1.tmp HTTP/1.1" 201 458 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "MOVE /dragon/TempDir/ HTTP/1.1" 201 442 "-" "Nuance component"
nriley [23/Apr/2016:19:18:17 +0000] "COPY /dragon/newTempDir HTTP/1.1" 201 442 "-" "Nuance component"
nriley [23/Apr/2016:19:18:18 +0000] "DELETE /dragon/tst.tmp HTTP/1.1" 204 261 "-" "Nuance component"
nriley [23/Apr/2016:19:18:18 +0000] "DELETE /dragon/newTempDir HTTP/1.1" 204 293 "-" "Nuance component"
nriley [23/Apr/2016:19:18:18 +0000] "DELETE /dragon/newTempDir2 HTTP/1.1" 204 293 "-" "Nuance component"

Roaming options

Nuance documentation is here and does a reasonably good job of explaining the options; I recommend you read it prior to my comments below. Here's how I have the roaming Administrative Settings configured:
[Screenshot: Administrative Settings.png]

If you’re going to be the only user, check Display Classic Open User Profiles dialog. This displays a flat versus a hierarchical list of users and dictation sources. Every time you click on anything in this dialog, be prepared for a long synchronous wait for server access. By disabling the hierarchy, you eliminate the wait while expanding your user. (If you only have one user and dictation source, you may not see this dialog at all.)

Allow non-Roaming User Profiles to be opened will need to be checked while you are migrating your user profile to a roaming profile, but can be unchecked afterward.

Merge contents of vocdelta.dat into network User Profile when file is full involves a 500K file; in a WAN environment with reasonably fast links, latency is likely to outweigh any time savings, so I kept this checked.

I unchecked Access network at User Profile open/close only because I keep my profiles open for days at a time and have an Internet connection available at all times. If your usage pattern is different, you may select otherwise.

Despite documentation suggesting that Ask before breaking locks on network User Profiles does not apply to profiles accessed through HTTP, I was asked to break a lock nearly every time I opened my profile until I unchecked it. There might be some server configuration that will let this be checked, but I’m unaware of it.

Always copy acoustic information to network and Conserve archive size on network are somewhat related. How you decide to limit/copy acoustic information really depends on your network performance, patience and desired strategy for propagating corrections and optimizing your profile.

Converting your profile

Again, there's official documentation which I won't repeat. There's no progress bar, just an unresponsive interface during migration; watch the server logs or your favorite network monitoring utility if you get nervous.

If you’ve been using Dragon NaturallySpeaking for some time, you may think of your profile as a large, unwieldy multi-gigabyte entity. Much of this is backups and audio data that aren’t strictly necessary — and you’ll notice that the server profile is much smaller because it omits them. My local profiles (compressed!) on two machines prior to migration were 1.4 and 1.1 GB; corresponding sizes on the server are 437 and 430 MB. ~320 MB of each is (primarily) audio in the voice_container subdirectory.

Once you're comfortable your roaming profile works, don't forget to delete your local profile(s).

Pitfalls

Much of the information here is out of date, but one important and still-relevant sentence is “When using a roaming user profile, backup files cannot be generated in any location”. The downside of backups not being written to the roaming profile is that if your profile becomes corrupted — which happened to me today: I set up Dragon Medical Practice Edition on a new Windows 10 installation, and subsequently DMPE crashed every time I opened the profile from my Windows 7 VMs — you’ll have to rely on your server backups. If you don’t have server backups, go fix that.

The Language and Acoustic Optimizers don't run on a roaming profile; the idea is that you run them server-side. I plan on seeing how well they work on a fast network by remotely mounting the WebDAV share, but haven't had a chance to do this yet.

Dragon NaturallySpeaking startup and shutdown obviously takes longer when the network is involved. You can automate opening a profile with a command-line argument to natspeak.exe, but you can't specify a dictation source (if you have more than one) without relying on AutoHotKey or similar. Thanks to various VMware Fusion and/or OS X bugs I already have to babysit dictation startup, so one more click to select a profile hasn't been a great additional hardship.
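
For illustration, here’s the sort of launcher I mean — a Python sketch assuming the /user switch (documented for some Dragon versions; verify against your edition’s manual) and a hypothetical install path and profile name:

import subprocess

# Hypothetical path and profile name -- adjust for your installation.
NATSPEAK = r"C:\Program Files (x86)\Nuance\NaturallySpeaking12\Program\natspeak.exe"

# /user opens the named profile directly; choosing a dictation source
# still requires UI automation (AutoHotKey or similar).
subprocess.Popen([NATSPEAK, "/user", "My Roaming Profile"])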

For more

My other dictation-related blog posts are in the Dictation category, if you're interested. Right now all my dictation effort is targeted at prose, but at some point I plan to investigate VoiceCode — which is currently in the process of being rewritten.

Dictation buffer updates

It’s been a little over a year since I started using Dragon Medical in Windows as a dictation buffer for my Mac. Please see my previous few posts on the subject for some background.

Since then, I’ve eliminated pain points and added features which have made the dictation experience smoother and less infuriating.  Once again, while this does not represent a turnkey system, in keeping with the DIY nature of such projects, hopefully it may help others out who want to do something similar.  The code remains up to date on GitHub and I plan on maintaining it for my own use until something better comes along.

So, here’s a change log in approximate chronological order:

Better microphone status

As you may recall, I find the DragonBar distracting in its various forms and keep it hidden most of the time. One thing that does need to be visible, however, is the microphone status. I had originally used Growl to display notifications when the microphone was turned on and off, but I have since set up a combination of Python and AutoHotKey which monitors the Dragon microphone status and displays an overlay window on the Windows side when both Word is in front and the microphone is disabled.  (While AutoHotKey is a complete mess, it reminds me a lot of OneClick on classic Mac OS; I do wish something like it were still available for persistent script UIs on the Mac).
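
For the curious, the Python half is conceptually simple. A rough sketch, assuming NatLink’s getMicState() and a hypothetical signal file that the AutoHotKey script watches (the real code, on GitHub, is more involved):

import time
import natlink

SIGNAL_FILE = r"C:\dictation\mic_state.txt"  # hypothetical; AutoHotKey watches this

natlink.natConnect()
try:
    last_state = None
    while True:
        state = natlink.getMicState()  # 'on', 'off', 'sleeping' or 'disabled'
        if state != last_state:
            with open(SIGNAL_FILE, "w") as f:
                f.write(state)  # AutoHotKey shows/hides the overlay accordingly
            last_state = state
        time.sleep(0.5)
finally:
    natlink.natDisconnect()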

Here’s how it looks in practice:

[Screenshot: “Dragon Not Listening” overlay]

An even more minimal dictation surface

With some more Word macro work, I’m able to use Word’s full-screen mode rather than auto-hiding the ribbon. Now the only vestige of an operating system and full-featured word processor behind my dictation buffer is a vertical scrollbar (which I could disable if I really wanted) and a couple of pixels at the bottom of the screen where the auto-hidden taskbar lives.  To get Word’s full UI back, just press Esc; for a return to minimalism, I’ve added “Full Screen” to Word’s Quick Access Toolbar, where it’s accessible via Alt+1.
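
In case it’s useful, the same state is reachable over COM — a sketch assuming pywin32 and an already-running Word instance:

import win32com.client

# Attach to the running Word instance (pywin32).
word = win32com.client.GetActiveObject("Word.Application")

# View.FullScreen is the same mode toggled by Esc and the QAT button.
word.ActiveWindow.View.FullScreen = True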

Working with Citrix Viewer

While dictating into native Mac apps is great, most of the work I do these days is in our electronic medical record, which is accessible either via Citrix Viewer or VMware Horizon. One advantage of Citrix Viewer is that it presents the remote application as individually movable windows. There is still an underlying Windows desktop so many of the same issues exist as in similar solutions such as VMware’s Unity (of course Citrix long predates Unity), but overall it is usable. Since getting a larger (2560×1440) monitor at home, when dictating into the EMR I typically configure my desktop with three side-by-side windows: the main EMR window, my current note and the dictation buffer.  Previously I had an older 1680×1050 display and used my iPad for the dictation buffer, but more about that later.

The next problem is getting text from the dictation buffer into Citrix Viewer. Service support would be great, but there isn’t any, so instead I just use the existing clipboard bridging functionality and synthesize Ctrl+C and Ctrl+V.  Pasting into Citrix Viewer is the easy part; copying into the buffer is harder because the Mac clipboard doesn’t immediately update with the Windows clipboard contents.  I just have to poll the clipboard until something happens — ugly but effective.
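
The Mac-side poll looks roughly like this — a sketch assuming pyobjc; NSPasteboard’s changeCount increments whenever anything (here, the clipboard bridge) writes to the pasteboard:

import time
from AppKit import NSPasteboard, NSPasteboardTypeString

def wait_for_clipboard(timeout=5.0, interval=0.1):
    pasteboard = NSPasteboard.generalPasteboard()
    start_count = pasteboard.changeCount()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if pasteboard.changeCount() != start_count:
            return pasteboard.stringForType_(NSPasteboardTypeString)
        time.sleep(interval)
    return None  # nothing arrived; don't return stale clipboard contents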

[Screenshot: Citrix Viewer]

You may wonder why I go to all this trouble when I’m ultimately dictating into a native Windows app with Windows dictation software. The short answer is that the app and dictation software can’t run on the same computer. As has been the case in several hospitals in which I have worked, our fat-client EMR doesn’t actually get installed directly onto individual client devices either inside the hospital or out, so a local copy of Dragon needs to understand how to get through to the application running on the remote host. Prior versions of Dragon Medical and Citrix would work directly with one another, but this is no longer supported. The currently supported method for dictating into Citrix is vSync, which involves an agent on the Citrix server that talks to the Dragon client. Unfortunately, vSync isn’t supported with non-networked versions of Dragon Medical (such as the one I own). Even with vSync, things aren’t perfect — after our recent Dragon and EMR upgrades, text I’m dictating often fails to display at all until I use the mouse or keyboard to update the screen.

Dictating into Fantastical

My fellow residents and I receive some scheduling information every month in a Word document. It’s nice to look at but not too useful in practice. It’s faster to dictate this information into Fantastical in order to convert it into calendar entries than to try to reformat it manually.

Now that I can easily dictate into Fantastical's natural language parser, I’ve found it useful day-to-day as a simple version of Siri for the Mac.  With AppleScript, you can tell Fantastical to parse a sentence, and it'll open the Fantastical window with your dictation pre-populated.
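
A sketch of the glue, driving the AppleScript from Python via osascript (the “parse sentence” event comes from Fantastical’s scripting dictionary; the sample sentence is, of course, made up):

import subprocess

def fantastical_parse(sentence):
    # Fantastical opens pre-populated so you can confirm or tweak before saving.
    script = 'tell application "Fantastical" to parse sentence "%s"' % (
        sentence.replace('"', '\\"'))
    subprocess.run(["osascript", "-e", script], check=True)

fantastical_parse("Clinic didactics Tuesday at 8am")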

Using RDP rather than the VMware Fusion console

While overall I really love VMware Fusion and appreciate its continued development and refinement, there are a few issues which have made it frustrating to use for this project.

First, there’s a substantial delay associated with relaying audio to the virtual machine, which notably slows dictation response. On my MacBook Pro I map my USB headset's dongle directly to the virtual machine to eliminate this delay; on my Mac mini I can't do this because I like to use the headset for Mac stuff such as listening to music at the same time. The current preview version of VMware Fusion claims to have improved audio for conferencing purposes, which I assumed would address this latency, but unfortunately I don’t notice a difference.

Second, the VMware Fusion window can sometimes lag and hang during dictation. (I'm not using Unity mode, as full-screen Word in Unity would consume the entire Mac screen.) I don't understand why it's happening, but I stumbled on a workaround while getting the dictation buffer onto my iPad. I had originally used VMware Fusion's VNC support to do this, but I eventually realized I could use RDP instead. With Jump Desktop on both Mac and iOS, Word became more responsive than it ever was on the console. So I now launch the VM headless and connect to it via RDP; audio, either via the VMware audio driver or USB redirection, remains independent of where the desktop displays. This has further advantages, including letting me configure multiple RDP settings for varying desktop sizes (smaller or larger to fit with host applications, or big for development) rather than having to resize the VMware Fusion window, as well as letting me quit Jump Desktop and pause the VM to save battery power on my MacBook Pro when I'm not dictating. I have not tried RDP’s own support for audio recording redirection, which Jump Desktop doesn’t support. (My scripts will still work with VMware Fusion directly if Jump Desktop isn’t running, however.)
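
The headless launch is scriptable with VMware’s vmrun tool; a one-liner sketch, with a hypothetical .vmx path:

import subprocess

VMX = "/Users/me/VMs/Dictation.vmx"  # hypothetical path

# "nogui" starts the VM without opening a console window in VMware Fusion;
# Jump Desktop then connects to it over RDP.
subprocess.run(["vmrun", "-T", "fusion", "start", VMX, "nogui"], check=True)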

Supporting rich text

Until a few weeks ago, input and output with the dictation buffer was limited to plain text. However, VMware Fusion, Jump Desktop and Citrix Viewer all support bridging an RTF clipboard between Windows and the Mac. So, I’ve added bidirectional RTF support to the dictation server, Mac services and command line tools. This involves using the Windows clipboard as there's no way via COM to extract RTF (or anything but text or XML) from Word on Windows. Right now, I only return either RTF or plain text, not both, based on whether you have styled your Word document at all; primarily this is so that all your Mac apps don't end up with unwanted 11 point Calibri text. Figuring out whether a Word document is styled was actually quite difficult (doing something logical like enumerating style runs doesn't work, because each paragraph gets its own run), and I end up doing it by rudimentary parsing of the document's XML looking for character and paragraph styles. pbpaste -Prefer rtf is broken at least in 10.10, so I also implement some direct Mac clipboard setting support for RTF only.
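
The Windows-side extraction reduces to a registered clipboard format — a sketch assuming pywin32:

import win32clipboard

def get_clipboard_rtf():
    # RTF isn't a predefined clipboard format; it must be registered by name.
    cf_rtf = win32clipboard.RegisterClipboardFormat("Rich Text Format")
    win32clipboard.OpenClipboard()
    try:
        if win32clipboard.IsClipboardFormatAvailable(cf_rtf):
            return win32clipboard.GetClipboardData(cf_rtf)
        return None  # unstyled document: fall back to plain text
    finally:
        win32clipboard.CloseClipboard()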

Another video

I’m working on another video to demonstrate these changes; I’ll re-record it when I get a chance.

Plans for the future

In no particular order, some further improvements I’ve been pondering...

  1. I often end up with a number of auto-recovered documents which I wish would go away. I try to close my temporary dictation documents without saving, though I do want to preserve their contents in a crash, so I may consider enumerating these documents on Word launch and deleting those which contain no text — assuming I can do this from the COM interface (see the sketch after this list).
  2. Word’s full-screen mode sometimes doesn’t measure the screen correctly with an auto-hiding taskbar, so you end up with a taskbar-sized strip of desktop background at the bottom of the screen.  This appears somewhat related to the way in which I try to launch Word via COM; Windows is far more aggressive about preventing focus stealing than I realized — certainly more than the Mac — to the point that even when acting as a pseudo-user in automating the user interface, it can be difficult to give a newly opened window focus, and the taskbar ends up staying focused. In any case, it should be easy enough to detect when Word’s window is the wrong size, instead of fixing the problem manually (move the pointer out of the taskbar area if necessary, then press Esc and Alt+1) when it happens.
  3. I’d like to be able to use VMware Horizon; while it’d mean some wasted screen space with non-floating EMR windows, it opens up a unique possibility. VMware Horizon supports multiple protocols — PCoIP and RDP.  While the PCoIP client is built into Horizon Client on the Mac, it will launch Microsoft’s now-obsolete Remote Desktop Connection app, if present, to connect via RDP.  RDP supports virtual channels; much like what Citrix uses for vSync, except that since I can run arbitrary Windows apps on the virtual machine rather than just the EMR-in-a-bubble as with Citrix, I could directly manipulate the Windows clipboard on the remote machine.  I’d need to write an app to impersonate Remote Desktop Connection which converted its settings files into Jump Desktop ones, a virtual channel plugin for Jump Desktop for Mac and the virtual channel server on Windows.  If successful, this would both avoid polluting the Mac clipboard and make the whole process more reliable and controllable than its Citrix equivalent. But my current setup has become a lot more reliable recently, so it may be way too much work for too little benefit.
  4. Speaking of reliability, I want to eliminate my reliance on NatLink. It's big, old and crufty; it sometimes hangs when I try to connect to it (though much less frequently since I don’t do so as often); and it forces my server to be single-threaded. If I could figure out how to interface with the Dragon microphone objects directly from Python or even my own C++ code, I could get rid of it completely. I also suspect some of the random-appearing hangs I'm seeing while dictating are NatLink’s fault, as they don’t happen when using Dragon at work.
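
For item 1, the cleanup itself looks straightforward over COM — a sketch assuming pywin32, and assuming “no text” means nothing beyond the trailing paragraph mark:

import win32com.client

word = win32com.client.GetActiveObject("Word.Application")
for doc in list(word.Documents):
    if not doc.Content.Text.strip():   # only the final paragraph mark remains
        doc.Close(SaveChanges=0)       # 0 = wdDoNotSaveChanges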

On the other hand, my open source Mac apps need to be updated for El Capitan…

Creating a dictation buffer

As I mentioned in a post last month, I recently upgraded my Windows dictation setup to Dragon NaturallySpeaking (DNS) 12 and Word 2013.

This upgrade broke the Emacs dictation interface (vr-mode) I had earlier used with DNS 8 and 10. But it also encouraged me to explore new dictation workflows using NatLink directly from my own Python scripts.

Primarily, I have since switched from editing entire documents in Windows, or using the clipboard to transfer text, to using my minimal dictation surface as a buffer while editing documents on the Mac side. I was inspired to do this after I spent a day with a radiologist observing PowerScribe 360 in use.

PowerScribe is a dictation system which uses a dedicated handheld speech controller. Rather than being inserted at the insertion point like typed text, dictated text is buffered and “placed” by buttons on the speech controller or by clicking. You can also choose to discard the dictated text, accompanied by a cute sound effect. Color coding and other affordances distinguish templated from dictated and typed text. (This would be much easier to show than to describe, but I couldn’t find any good examples of the system actually in use on YouTube.)

Thanks to PowerScribe, I realized that it’s actually easier for me to work with shorter fragments of text, a sentence or a paragraph at a time, rather than importing the entire document at a time. What I’ve implemented so far is on GitHub; here’s a video showing it in use and explaining some technical details:

There are some disadvantages with this system. If you do want to dictate individual words or something smaller than a sentence into the buffer, you will need to manage the spaces, capitalization and punctuation yourself, since your Word document in the dictation buffer isn’t aware of the surrounding contents. In reality, I seldom find this a problem; saying “no caps” or “lowercase that” from time to time isn’t overly arduous. I could theoretically go even further and implement the Mac side of the solution with an input method rather than services and scripts, which would give me access to the surrounding context, but I think that would be a lot of work for relatively little added benefit.

I’ve still got some more work to do; while writing this post, I realized I need a “discard” command much like the one in PowerScribe. (Done.)

While my setup isn’t yet to the point of being usable “out of the box”, I hope that this brief exploration will help other technically inclined dictation users expand their workflows.
