Designing Games for Self Voicing

The actual technology for self voicing is easy, as the Hello World examples below demonstrate.

It's where and how to place that technology in your game that's more challenging.
The areas to address are:
  • Controls
  • Block text
  • Game objects
  • Maintaining context

Self Voicing Hello World

In C++, the Hello World for speaking via SAPI is:
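#include <stdafx.h>
#include <sapi.h>

int main(int argc, char* argv[])
{
 ISpVoice * pVoice = NULL;

 // Initialize COM before creating the voice
 if (FAILED(::CoInitialize(NULL)))
  return FALSE;

 // Create the SAPI voice object
 HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
     IID_ISpVoice, (void **)&pVoice);

 if( SUCCEEDED( hr ) )
 {
  hr = pVoice->Speak(L"Hello world.", 0, NULL);
  pVoice->Release();
  pVoice = NULL;
 }

 ::CoUninitialize();
 return TRUE;
}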

In Java, using the Quadmore JNI SAPI interface, Hello World is:

try{
 QuadmoreTTS quadmoreTTS = new QuadmoreTTS();

 //The method was named by them, not us
 boolean result = quadmoreTTS.speakDarling("Hello world.");

 if( ! result ){
  //Handle failure condition
 }
}catch(Exception ex){
  //Handle exception
}

In Java, using the CScript feature in Windows, Hello World is:
Runtime runtime = Runtime.getRuntime();
runtime.exec("cscript speaktome.vbs Hello World.");
Calling the following speaktome.vbs script:
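' Speaks the command-line arguments through SAPI
dim sapi, objArgs, countArgs, message, i

set sapi = CreateObject("SAPI.SpVoice")

set objArgs = WScript.Arguments

countArgs = objArgs.count

if 0 = countArgs then
  ' Nothing to speak
  WScript.Quit
end if

' cscript splits the message on spaces, so rejoin the arguments
if 1 = countArgs then
  message = objArgs(0)
else
  For i = 0 to countArgs - 1
    message = message & " " & objArgs(i)
  Next
end if

sapi.Speak message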


On the Mac, the built-in say command speaks through the system speech synthesizer (it does not drive VoiceOver itself). Using Java, you could do:
runtime.exec("say Hello World.");

Speaking Non-Displayed Text

The most powerful aspect of self voicing is that it enables your game to speak text without displaying it on screen.

This can be useful:
  • Speaking controls or game artifacts that have images but no text displayed
  • Speaking additional information for blind gamers only
  • Speaking text that is organized specifically for audio presentation


People who are sighted scan a screen for context. Their attention jumps around the display. It's very much a direct-access process.

People who are blind are used to getting their input from a screen reader in a linear fashion. Audio is linear.

Some implications of this for your design are:
  • Put the important stuff first
  • Verboseness is bad
  • Enable the gamer to interrupt an utterance


People who are blind operate their computers by memorization. They memorize:
  • Key assignments
  • Traversal order
  • Audio clues as to context
When you're designing the audio portion of your game, particularly the controls, ask yourself, "How easy is this to memorize?"

Making Controls Speak

In a way, it's easier to get your controls to speak through self voicing than it is through a screen reader. You don't have to display the text and can have image-only button faces.

However, you have to code each and every control to self voice.

An effective solution is to have a speaking button class, a speaking dropdown class, a speaking list class, etc.

Notice how the below example displays the icon, speaks the text, but does not display the text.

With a language that supports multiple inheritance such as C++, you could have MyButton inherit from both the JButton equivalent class and your own SpeakingControl class.
class MyButton extends JButton{

 public MyButton(ImageIcon icon, final String controlTitle){
  //Display the icon only; the title is spoken, never shown
  super(icon);

  addFocusListener(new FocusAdapter(){
    public void focusGained(FocusEvent e){
      try{
        Runtime.getRuntime().exec("cscript speaktome.vbs " + controlTitle);
      }catch(IOException ex){
        //Handle exception
      }
    }
  });
 }
}

Focus Traversal and Speaking

If you're just mousing about, it's easy to totally overlook traversal order, or even leave some controls inaccessible by keystroke. When you first try to navigate by keystroke and sound, it can get embarrassing very fast.

For your self voicing games, it is absolutely vital that:
  • All controls be reachable by keystroke
  • The focus traversal order meet user expectations
Typically a gamer who is blind will expect the Tab key to make the major jumps and the left and right Arrow keys to make the lesser jumps.

You could also use the up and down Arrow keys and the Page up and down keys for traversal.

Western gamers expect focus traversal to be from left to right, top to bottom. Remember, your sighted gamers will see this.

The important design element here is to implement a complete, consistent, and familiar means of traversing your controls.
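
A minimal Java sketch of that traversal scheme, assuming your controls are organized into groups: Tab makes the major jump to the next group, and the Right and Left Arrow keys make the lesser jumps within a group. The TraversalModel class and its method names are hypothetical. It only tracks which control title should be spoken; wiring it to real key events and to your speech call is left out. Having Tab wrap around while the Arrow keys stop at the ends of a group is one possible policy, not the only one.

```java
import java.util.List;

// Hypothetical model of two-level keyboard traversal; all names are illustrative.
final class TraversalModel {
    private final List<List<String>> groups; // each inner list holds one group's control titles
    private int group = 0;
    private int item = 0;

    TraversalModel(List<List<String>> groups) {
        this.groups = groups;
    }

    /** Tab: major jump to the first control of the next group, wrapping at the end. */
    String tab() {
        group = (group + 1) % groups.size();
        item = 0;
        return current();
    }

    /** Right Arrow: lesser jump to the next control in the group, stopping at the last one. */
    String right() {
        item = Math.min(item + 1, groups.get(group).size() - 1);
        return current();
    }

    /** Left Arrow: lesser jump to the previous control, stopping at the first one. */
    String left() {
        item = Math.max(item - 1, 0);
        return current();
    }

    /** Title of the focused control, which is what the game would speak. */
    String current() {
        return groups.get(group).get(item);
    }
}
```

Because every key returns the title to speak, the same model can drive both the visible focus and the self voicing, which keeps the two from drifting apart.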

Making Labels and Group Boxes Speak

Your game probably includes labels. They may be fancier than the plain text used in business apps, but they're still little bits of text that guide your player. Often they are associated with a control, such as a list.

Your game probably includes group boxes, titled or plain.

Gamers who are sighted can visually link your labels and group boxes with the controls or other artifacts as you intend. For self voiced games, you must do this explicitly.

One design solution would be to pass in the string for an associated label or group box title to the associated control that receives focus.

Audio Delimiters

When you're linking two pieces of distinct but related audio text, be sure to include an audio delimiter.

In the group box example above, you might want your gamer to know that a button is part of a group of controls. Perhaps the group box does not have a visible title.

A design solution could be to have the button speak something like, "Battle Control Group - Fire Lasers Button." The next button might speak, "Fire Torpedoes Button."

Note how the dash in the first string denotes that the first part identifies the group and the second part identifies the specific control.

Consider your policy for audio delimiters in your design.
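
One possible policy, sketched in Java: pass the group title to each control, and speak it (followed by the dash delimiter) only when focus enters a different group. The GroupedUtterance class is a hypothetical helper, not part of any library, and it returns the text rather than speaking it so you can hand the result to whichever speech call you use.

```java
// Hypothetical helper that builds the utterance for a control inside a group box.
final class GroupedUtterance {
    private String lastGroup = null; // group of the control that last had focus

    /** Returns the text to speak when a control gains focus. */
    String forControl(String groupTitle, String controlTitle) {
        boolean groupChanged = groupTitle != null && !groupTitle.equals(lastGroup);
        lastGroup = groupTitle;
        // The dash is the audio delimiter between the group and the control.
        return groupChanged ? groupTitle + " - " + controlTitle : controlTitle;
    }
}
```

With this sketch, the first button in the battle group yields "Battle Control Group - Fire Lasers Button" and the next one yields just its own title, matching the example above.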


You can also use hotkeys to trigger actions. Jim Kitchen and Ian Humphreys both use this technique effectively in their games. (See Examples.)

Hotkeys and Screen Readers

If your game is going to be playable via self voicing or alternately via a screen reader, don't steal hotkeys from any of the screen readers you support.

See Screen Readers and Games for more on this.

If your game just self voices, then this isn't an issue.

Hotkeys and Gamer Expectations

Consistency is especially important to the gamer who is blind because they can't visually detect anomalies in your user interface.

In our games, the F1 key always displays Help, the F2 key displays blind-specific help, and all of our other hotkey assignments are consistent across all of our games.

The design element to remember for gamers who are blind is that they are going to memorize your hotkey assignments. You make it easier for them if you use a consistent and memorizable assignment schema.

Hotkeys vs. Traversal

For gamers who are blind, sometimes it's better to trigger a control directly via hotkeys. Sometimes it's better to traverse into that control and trigger it via the Enter key.

For gamers who are blind, good candidates for hotkeys are controls that are used individually including:
  • Status controls, such as Time Elapsed, Score, Level
  • Immediate need controls, such as Fire Lasers
  • Game-level controls, such as Help, Main Menu, Options
Game Objects are often good candidates for traversal, for example:
  • Individual words in a word game
  • Individual planets on a space travel screen
In our Scrambled Sayings screen below, the F4 key speaks the Score, the F5 key speaks the Time, the F6 key speaks the game Level.

The Tab key starts on the first word, then traverses the game controls on the right, then returns to the first word.

Scrambled Sayings game screen
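
A small Java sketch of the status hotkeys described above. The StatusHotkeys class is hypothetical, and the score and time values in the usage are made up for illustration. Each binding is a supplier so that the spoken text reflects the game state at the moment the key is pressed.

```java
import java.awt.event.KeyEvent;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical hotkey table: maps a key code to the text that key should speak.
final class StatusHotkeys {
    private final Map<Integer, Supplier<String>> bindings = new HashMap<>();

    /** Binds a key code such as KeyEvent.VK_F4 to a status utterance. */
    void bind(int keyCode, Supplier<String> utterance) {
        bindings.put(keyCode, utterance);
    }

    /** Returns the text to speak for a key, or null when the key is not a hotkey. */
    String utteranceFor(int keyCode) {
        Supplier<String> utterance = bindings.get(keyCode);
        return utterance == null ? null : utterance.get();
    }

    public static void main(String[] args) {
        StatusHotkeys hotkeys = new StatusHotkeys();
        hotkeys.bind(KeyEvent.VK_F4, () -> "Score 120");          // made-up value
        hotkeys.bind(KeyEvent.VK_F5, () -> "Time 2 minutes");     // made-up value
        System.out.println(hotkeys.utteranceFor(KeyEvent.VK_F4)); // pass this to your speech call
    }
}
```

Keeping all assignments in one table also makes it easier to keep them consistent across games.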

Speaking Content vs. Speaking Position

When your gamer traverses into a complex game object, do you speak the contents of that object or the location of the cursor within that object?

A good design cheat is to make speaking the position within the complex object irrelevant.

In our Scrambled Sayings game above, we use the Right Arrow and Left Arrow keys to traverse the words. The jump is to the first character of each word.

On getting focus, each word speaks something like "Word 3 is A E R." This tells the gamer the position of the word within the saying, plus its current spelling. Thus they get both content and position.
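
As a rough illustration, the utterance could be built like this in Java. The WordUtterance class is a hypothetical helper. It assumes that separating the letters with spaces is enough to make your synthesizer read them one at a time, which is worth checking with the voice you actually use.

```java
// Hypothetical helper that builds the "Word 3 is A E R." style of utterance.
final class WordUtterance {
    /** position is 1-based; letters is the word's current, possibly scrambled, spelling. */
    static String describe(int position, String letters) {
        StringBuilder spelled = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            if (i > 0) {
                spelled.append(' '); // a space between letters so they are heard individually
            }
            spelled.append(Character.toUpperCase(letters.charAt(i)));
        }
        return "Word " + position + " is " + spelled + ".";
    }
}
```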

To rearrange the letters, the sighted gamer normally uses the mouse. This is because sighted gamers are used to using the mouse.

The blind gamer normally types in the letters. The position of a letter within each word does not need to be spoken explicitly.

BTW, this is a good example of how a gamer who is blind can beat a gamer who is sighted. Typing is faster. Both could do it, but the sighted gamer generally doesn't.

Making Block Text Speak

By block text we mean descriptive or dialog paragraphs.

Screen readers enable the user to speak a block of text in its entirety, or by line, word, or character. For your self voicing, you may not want to put the coding, testing, and support effort into that degree of functionality. However, at a minimum you should let the gamer interrupt or repeat block text.

Bolding, italics, and font sizes are not normally heard in self voicing. Nor are headers and other structural elements. You can compensate for this by using the following annotations, familiar to users who are blind:
  • Three asterisks
  • Three dashes
  • Slashes
  • Vertical Bars
These annotations do not have to be visible to sighted users. Remember, what your self voicing speaks does not have to be exactly what is displayed on the screen.

Depending on the self voicing technology you've chosen, you may need to chop your block text into chunks that fit your buffer limits. 255 characters is a not uncommon limit.
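
A minimal Java sketch of such chunking, assuming you want to break at sentence ends where possible so that the pauses between chunks sound natural. The SpeechChunker class is hypothetical, and the limit is a parameter because the actual buffer size depends on your speech technology. When no sentence end fits, it falls back to the last space, and as a last resort to a hard cut.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits block text into chunks no longer than maxChars.
final class SpeechChunker {
    static List<String> chunk(String text, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        String rest = text.trim();
        while (rest.length() > maxChars) {
            int cut = breakPoint(rest, maxChars);
            chunks.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        if (!rest.isEmpty()) {
            chunks.add(rest);
        }
        return chunks;
    }

    /** Finds where to cut s, which is known to be longer than maxChars. */
    private static int breakPoint(String s, int maxChars) {
        // Prefer the last sentence end (punctuation followed by a space) that fits.
        for (int i = maxChars - 1; i > 0; i--) {
            char c = s.charAt(i);
            boolean sentenceEnd = c == '.' || c == '!' || c == '?';
            if (sentenceEnd && s.charAt(i + 1) == ' ') {
                return i + 1;
            }
        }
        // Otherwise break at the last space that fits, or make a hard cut at the limit.
        int space = s.lastIndexOf(' ', maxChars);
        return space > 0 ? space : maxChars;
    }
}
```

You would then speak each chunk in order, for example SpeechChunker.chunk(helpText, 255) for a 255 character limit.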


Punctuation is important to self voicing.
  • End sentences with the proper punctuation
  • Use punctuation to indicate short pauses in an utterance
  • Use punctuation to indicate structure, as in using a colon at the end of a heading
  • Punctuation is not necessary on button faces or other controls

Sound Gaps

Sometimes an utterance is clearer when it contains brief pauses. Think of how you speak to emphasize things or focus attention on something.

You can use the same technique in your game by:
  • Breaking a piece of text into separate sentences
  • Using punctuation
The design point here is that you want your text to sound clear as well as read clearly.

Making Help Text Speak

Help text is a special case of block text. Structure, especially for FAQ type help, is important.

We've found rendering FAQ help in HTML to be effective. It's a model the gamer is familiar with and enables them to use the headers for navigation.

In our case this requires using Java's HTMLEditorKit class and HyperlinkListener interface, and wiring in calls to SAPI.

Making Help Systems Speak

Help systems such as Microsoft's or the JavaHelp system may bring their own challenges to self voicing.

We don't use either technology in our games and so don't address them here.

Making Game Objects Speak

Games often have canvas play areas with game objects located on them. For example, see the Smugglers 4 screen below.

image of Smugglers 4 game

Some canvases are rendered simply as bitmap images, especially if double buffering is used for performance reasons.

Other canvases might have individual game objects on them.

Unless these objects are focusable, and you've coded them to be keyboard traversable, your self voicing isn't going to speak them.

Make each game object:
  • Actually an object, not just a part of the canvas bitmap
  • Keyboard traversable
  • Focusable
  • Speak on receiving focus
That being said, games that emphasize game objects on canvas play areas, especially if those game objects are numerous and moving, generally are not good candidates for blind accessibility.

This is probably your most important design decision with respect to self voicing: can this game be made blind accessible without changing the fundamental nature of the game?

Tables and Headers

If your game includes tables, make the column headers speak.

Even better, make them identify themselves as headers as opposed to cell values.

For a gamer who is blind, not making your headers speak is like giving someone only one half of a sports game score - pretty useless.

Smugglers 4 does not self voice. It uses screen readers. However, if its Trade Screen below were self voicing, a good implementation would be to have:
  • The four Arrow keys traverse the cells, speaking each
  • Traversal of the headers speak them as such
  • The Tab key jump to and traverse the non-table controls
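
A Java sketch of how such table traversal might produce its utterances. The SpeakingTable class is hypothetical and is not how Smugglers 4 works. It treats row -1 as the header row, announces headers as headers, and prefixes each cell value with its column header so the value is never heard without its context; that last choice is an assumption you may want to make optional.

```java
// Hypothetical table model that returns what to speak as the Arrow keys move the cursor.
final class SpeakingTable {
    private final String[] headers;
    private final String[][] cells;
    private int row = -1; // -1 means the cursor is on the header row
    private int col = 0;

    SpeakingTable(String[] headers, String[][] cells) {
        this.headers = headers;
        this.cells = cells;
    }

    /** Moves the cursor by the given deltas, clamped to the table, and returns the utterance. */
    String move(int dRow, int dCol) {
        row = Math.max(-1, Math.min(cells.length - 1, row + dRow));
        col = Math.max(0, Math.min(headers.length - 1, col + dCol));
        return current();
    }

    /** Headers identify themselves as headers; cells are spoken with their column header. */
    String current() {
        if (row < 0) {
            return "Column header: " + headers[col];
        }
        return headers[col] + ": " + cells[row][col];
    }
}
```

For a table with the made-up headers Item and Price, moving down from the Item header would speak "Item: " followed by the first item, and moving right would speak "Price: " followed by its price.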

Linked Controls

Sometimes a game will have controls or game objects linked. That is, an action on one control will have an effect on another part of the display.

An example is the table in the Trade Screen dialog box from Smugglers 4 below.

When a sighted gamer mouses over a cell in the table, they can see values associated with each planet change on the Space Screen in the play area canvas behind that dialog box.

A gamer who is blind can't see those values. Fortunately, this feature is not essential for playing the game.

Trade Screen on top of main play area canvas

The self voicing design element to glean from this example is: make any essential linked objects speak.

Making Keystrokes Speak

People who are blind expect their screen readers to speak each keystroke.

They will prefer that your self voicing game speak each keystroke too, though not all self voicing games do this.

Tooltips and Self Voicing

A game that is self voicing is not likely to trigger tooltips because the gamer is unlikely to be using the mouse. If your game self voices, best not have tooltips that are essential to the game.

If you think tooltips, or the equivalent, are essential for some controls or game objects, you could have those controls or game objects speak the tooltip text when they gain focus.

Speech Speed

Sighted people generally read at a rate of about 150 words per minute. People who are blind read at up to 300 words per minute, or faster.

If you're using SAPI, the Windows Control Panel lets the user set their Voice speed. Typically, they use this same speed for all SAPI enabled apps.

If you're using VoiceOver, the Apple Control Panel lets the user set their Voice speed. Typically, they use this same speed for all apps.

If you're using FreeTTS or some other speech synthesizer, it's essential that you provide a control so the gamer can set the voice speed.

An interesting game phenomenon is that if you have a gamer who is sighted playing with a gamer who is blind, the blind gamer can play considerably faster than the sighted gamer; especially if the game has large blocks of text.

Interrupting Speech

Our colleague, Dark, an expert in audio games says:

"...being able to interrupt long descriptions, ---- especially of controls, is very much recommended, since there's nothing worse than having to sit through a tutorial explanation of what control X is or what the game is about when your playing for the tenth time."

For SAPI and FreeTTS, you'll need to implement some control that interrupts utterances. You might consider using the Ctrl key, as that is what screen readers tend to use for this purpose. Thus it is part of your gamers' expectation.

Apple's VoiceOver has its own intrinsic command for interrupting speech.

Skipping Speeches

When your gamer is traversing quickly through a set of controls or game objects, it is convenient to not hear each item.

In some games, this is just an aesthetic or user experience issue. In twitch games or game situations such as a battle, this can make a difference as to whether the game is playable at all.

Conversely, there may be speeches that you may never want skipped, particularly those that establish and maintain context.

SAPI, FreeTTS, and VoiceOver do not intrinsically know when you want to skip a speech. Therefore, you'll need to make your code know when to skip a speech.

In our Java speech code we have the methods:
speak(String message);
speakAll(String message);
speakSometimes(String message);
These evolved as we learned accessible design. Less brittle code would be to have a single
speak(String message, boolean skippable);
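
One way the single-method version could behave, sketched in Java. This is not our production code; the SpeechQueue class and its drop-on-arrival policy are assumptions made for illustration. A skippable message that is still waiting is dropped as soon as a newer message arrives, while a non-skippable message always stays queued.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical queue that sits between the game and the speech engine.
final class SpeechQueue {
    private static final class Utterance {
        final String message;
        final boolean skippable;

        Utterance(String message, boolean skippable) {
            this.message = message;
            this.skippable = skippable;
        }
    }

    private final Deque<Utterance> pending = new ArrayDeque<>();

    /** Queues a message; pending skippable messages give way to the newer one. */
    void speak(String message, boolean skippable) {
        pending.removeIf(u -> u.skippable);
        pending.addLast(new Utterance(message, skippable));
    }

    /** Returns the next message to hand to the speech engine, or null if none is pending. */
    String next() {
        Utterance u = pending.pollFirst();
        return u == null ? null : u.message;
    }
}
```

During fast traversal, control titles would be passed as skippable so only the latest one is heard, while context-setting speeches would be passed with skippable set to false.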

Verbosity Controls

If your gamer can adjust the speech speed and can interrupt utterances, then it is probably overengineering to include a speech verbosity control.

Essential and Advanced Self Voicing Controls

You'll need the following minimal controls:
  • Interrupt a speech
  • Repeat a speech
Generally those two are sufficient. Additional controls could include:
  • Speech speed
  • Turn self voicing on and off
  • Speak next paragraph
  • Speak next line
  • Speak next word
  • Speak next character
These latter are the kinds of commands your users are familiar with in their screen readers. In implementing these commands, you might consider using hotkeys that are the ones used by screen readers.
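
A Java sketch of the two essential controls, interrupt and repeat. The SpeechEngine interface is a hypothetical wrapper you would write around SAPI, FreeTTS, or a launched speech process; none of those technologies provide it under that name. VoiceControls remembers the last utterance so it can be repeated, and stops any speech in progress before starting a new one.

```java
// Hypothetical controller for the minimal self voicing commands.
final class VoiceControls {
    /** Assumed wrapper around whatever speech technology the game uses. */
    interface SpeechEngine {
        void speak(String text); // start speaking without blocking the game
        void stop();             // cut off any speech in progress
    }

    private final SpeechEngine engine;
    private String last = "";

    VoiceControls(SpeechEngine engine) {
        this.engine = engine;
    }

    /** Speaks text, cutting off whatever was being spoken before. */
    void speak(String text) {
        engine.stop();
        last = text;
        engine.speak(text);
    }

    /** Interrupt a speech, for example when the gamer presses Ctrl. */
    void interrupt() {
        engine.stop();
    }

    /** Repeat a speech: speaks the most recent utterance again, if there was one. */
    void repeat() {
        if (!last.isEmpty()) {
            speak(last);
        }
    }
}
```

Routing every utterance through one class like this also gives you a single place to add the optional controls later, such as a switch that turns self voicing off.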

Third Party Software

If you're using a game engine and do not have access to or the resources to modify its code, then it may not be possible for you to put self voicing in your game.

The same applies to third-party components.

Using Non-Speech Sounds to Maintain Context

Sometimes non-speech sounds can augment self voicing. They're shorter than speech and add variety to your user interface.

For example, in linear collections or in grids, it's useful to tell the gamer who is blind when they've reached the end of that collection or the edge of that grid.

We use the Perkins Bell, the sound a Perkins School for the Blind brailler makes, to indicate the end of a collection or the right edge of a grid.

We use other sounds to indicate the start of a collection when traversing backwards, or the other edges of a grid.

Consider non-speech sounds in your self voicing design.
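
A Java sketch of edge detection for a grid, so the game knows when to play a cue. The GridCursor class and the Cue names are hypothetical, and the sketch assumes the cue is triggered when the gamer tries to move past an edge; you could just as well trigger it on arriving at the last cell. Playing the actual sound file is left to your audio code.

```java
// Hypothetical grid cursor that reports which non-speech cue, if any, a move should trigger.
final class GridCursor {
    enum Cue { NONE, RIGHT_EDGE, OTHER_EDGE }

    private final int rows;
    private final int cols;
    private int row = 0;
    private int col = 0;

    GridCursor(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    /** Tries to move the cursor; a move blocked by an edge leaves it in place and returns a cue. */
    Cue move(int dRow, int dCol) {
        int newRow = row + dRow;
        int newCol = col + dCol;
        if (newCol >= cols) {
            return Cue.RIGHT_EDGE; // for example, play the bell sound here
        }
        if (newCol < 0 || newRow < 0 || newRow >= rows) {
            return Cue.OTHER_EDGE; // for example, play a different edge sound here
        }
        row = newRow;
        col = newCol;
        return Cue.NONE;
    }

    int row() { return row; }

    int col() { return col; }
}
```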

Final Words

Designing for self voicing is very much like designing for internationalization. It's not an add-on.

Either you make your game self voicing from the start, or it's just too much work and too many changes.

Unlike making your game accessible to screen readers, it's unlikely you can just tweak your game to make it self voice.

Still, SAPI, VoiceOver, and sound files make self voicing technically easy to implement by even indie developers. And triple A titles have long used sound files for cutscenes and dialogs.

So making your game playable by the many gamers who are blind, visually impaired, or otherwise needing voice is achievable.

To find more self voicing games, go to our Top 25 Web Sites for Gamers who are Blind page, updated annually.

John Bannick
Chief Technical Officer
7-128 Software