Sunday, 13 April 2014
Creating a dictation buffer
As I mentioned in a post last month, I recently upgraded my Windows dictation setup to Dragon NaturallySpeaking (DNS) 12 and Word 2013.
This upgrade broke the Emacs dictation interface (vr-mode) I had earlier used with DNS 8 and 10. But it also encouraged me to explore new dictation workflows using Natlink directly from my own Python scripts.
Primarily, I have since switched from editing entire documents in Windows, or using the clipboard to transfer text, to using my minimal dictation surface as a buffer while editing documents on the Mac side. I was inspired to do this after I spent a day with a radiologist observing PowerScribe 360 in use.
PowerScribe is a dictation system which uses a dedicated handheld speech controller. Rather than being inserted at the insertion point like typed text, dictated text is buffered and “placed” by buttons on the speech controller or by clicking. You can also choose to discard the dictated text, accompanied by a cute sound effect. Color coding and other affordances distinguish templated from dictated and typed text. (This would be much easier to show than to describe, but I couldn’t find any good examples of the system actually in use on YouTube.)
Thanks to PowerScribe, I realized that it’s actually easier for me to work with shorter fragments of text, a sentence or a paragraph at a time, rather than importing the entire document at a time. What I’ve implemented so far is on GitHub; here’s a video showing it in use and explaining some technical details:
There are some disadvantages with this system. If you do want to dictate individual words or something smaller than a sentence into the buffer, you will need to manage the spaces, capitalization and punctuation yourself, since your Word document in the dictation buffer isn’t aware of the surrounding contents. In reality, I seldom find this a problem; saying “no caps” or “lowercase that” from time to time isn’t overly arduous. I could theoretically go even further and implement the Mac side of the solution with an input method rather than services and scripts, which would give me access to the surrounding context, but I think that would be a lot of work for relatively little added benefit.
I’ve still got some more work to do; while writing this post, I realized I need a “discard” command much like the one in PowerScribe. (Done.)
While my setup isn’t yet to the point of being usable “out of the box”, I hope that this brief exploration will help other technically inclined dictation users expand their workflows.