
Self Voicing and Games

Self voicing means that a program speaks for itself.

It does not use a screen reader to communicate to users who are blind. Most blind accessible games self voice.

SAPI - Speech Application Program Interface


SAPI was developed by Microsoft to give Windows applications speech synthesis and speech recognition. It's embedded in Windows itself and in Microsoft Office products.

SAPI is used by the JAWS, Window-Eyes, and NVDA screen readers and by Adobe Reader.

It's also used extensively in the automated telephone voice industry.

There's a good writeup on it at Wikipedia with some good links.

SAPI is the most common technology used by games to self voice. Our games use it. Jim Kitchen and Ian Humphreys use it in their games. (See Examples.) Many of the blind accessible games available at www.audiogames.net use SAPI.

See below for more on SAPI.

VoiceOver


VoiceOver is Apple's built-in screen reader and speech system, used on Mac OS X, the iPad, and the iPhone.

VoiceOver is tightly integrated into Apple's operating systems and Apple's apps.

Its integration with apps from sources other than Apple varies by how those vendors coded those apps.

You can get the latest overview of VoiceOver at www.apple.com/accessibility/voiceover.

An excellent starting place for technical information is here.

We're not going to explore the VoiceOver API here. However, for a quick and dirty way to get speech on a Mac, just send the string you want spoken to the command line, prefixed by the command "say", which uses the Mac's built-in speech synthesizer. For example, in Java:
...
Runtime.getRuntime().exec(new String[] {"say", "Hello World!"});
...
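A slightly fuller sketch of the same idea, assuming the game runs on a Mac where the "say" command is on the path. The class and method names here are illustrative, not part of any Apple API. Passing the text as a single argument means spaces and quotes in the spoken string need no escaping.

```java
import java.io.IOException;
import java.util.List;

/** Sketch: speak a string on a Mac by launching the "say" command. */
final class SaySpeaker {

    /** Builds the argument list; the text stays one argument. */
    static List<String> commandFor(String text) {
        return List.of("say", text);
    }

    /** Starts speaking and returns the process so the caller can wait for it or destroy it. */
    static Process speak(String text) throws IOException {
        return new ProcessBuilder(commandFor(text)).inheritIO().start();
    }
}
```

A game would call SaySpeaker.speak("Hello World!") and, if the gamer presses a key before the speech ends, call destroy() on the returned Process to cut the utterance short.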

FreeTTS - Speech Synthesis


FreeTTS (Free Text To Speech) is a free open source product.

It was built by the Speech Integration Group at Sun Microsystems in collaboration with Carnegie Mellon University and the Language Technology Lab at Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, DE.

You can download it from http://freetts.sourceforge.net.

FreeTTS is a Java-only API. It includes the Kevin Voice, which is unabashedly not very eloquent. FreeTTS does not work with SAPI Voices, though it could be made to do so.

We used FreeTTS in our own games until we ported all of them to SAPI.

FreeTTS does not appear to have been under further development since 2005. However, it is free and it does work.

One major caveat is that FreeTTS is asynchronous. The FreeTTS method that is called in order to play a speech does not have a return value. This can require some tricky timing code in order to avoid conflict with actions occurring after an utterance is initiated.
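One way to tame that kind of timing problem is to track each utterance with a future that completes when the speech ends. The sketch below is not FreeTTS code. It assumes a hypothetical AsyncSpeaker interface whose speak call returns at once and runs a callback when the audio finishes; how you obtain such a callback from a given speech engine is left open.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical asynchronous speech call: returns at once, runs onDone when the audio ends. */
interface AsyncSpeaker {
    void speak(String text, Runnable onDone);
}

/** Sketch: remember whether the most recent utterance is still being spoken. */
final class UtteranceTracker {
    private final AsyncSpeaker speaker;
    private CompletableFuture<Void> last = CompletableFuture.completedFuture(null);

    UtteranceTracker(AsyncSpeaker speaker) {
        this.speaker = speaker;
    }

    /** Starts an utterance and returns a future that completes when it ends. */
    synchronized CompletableFuture<Void> say(String text) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        speaker.speak(text, () -> done.complete(null));
        last = done;
        return done;
    }

    /** True while the most recent utterance has not yet finished. */
    synchronized boolean isSpeaking() {
        return !last.isDone();
    }
}
```

Game code can then chain the next action onto the utterance, for example tracker.say("Your turn").thenRun(...), instead of guessing how long the speech will take.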

Sound Files


Many blind-accessible games use WAV, and sometimes MP3, files to produce speech.

The Super Liam game on our Examples page uses WAV files to good effect.

The down sides of WAV files as a self-voicing solution are:
  • They can take a lot of disk space, especially if your game is localized to many languages
  • They are hard to play at the higher speech rates desired by gamers who are blind
  • They require recording
  • They require re-recording when your text changes
Still, it's been argued that there are more self voicing games that use sound files than those that use SAPI.
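For illustration, here is a minimal Java sketch of playing recorded phrases with the standard javax.sound.sampled API. The folder layout (one folder per locale, one WAV file per phrase) and the class name are assumptions made for this example, not a description of how any particular game stores its recordings.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Sketch: speak by playing pre-recorded WAV files. */
final class RecordedSpeech {
    private final File root;
    private Clip current;

    RecordedSpeech(File root) {
        this.root = root;
    }

    /** Assumed layout: root/locale/phraseId.wav */
    File fileFor(String locale, String phraseId) {
        return new File(new File(root, locale), phraseId + ".wav");
    }

    /** Stops any phrase still playing, then starts the requested one. */
    synchronized void play(String locale, String phraseId)
            throws IOException, UnsupportedAudioFileException, LineUnavailableException {
        stop();
        try (AudioInputStream in = AudioSystem.getAudioInputStream(fileFor(locale, phraseId))) {
            current = AudioSystem.getClip();
            current.open(in);   // loads the whole recording into memory
            current.start();    // returns immediately; playback continues in the background
        }
    }

    /** Interrupts the phrase that is playing, if any. */
    synchronized void stop() {
        if (current != null) {
            current.stop();
            current.close();
            current = null;
        }
    }
}
```

Note that every phrase needs one file per locale, which is where the disk space and re-recording costs listed above come from.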

Self Voicing and Screen Readers


People who are blind and running Windows use a type of app known as a screen reader in order to operate their computers. A screen reader speaks the displayed text. (See Screen Readers and Games for how to make your game screen reader accessible.)

Below are some interactions between self voicing and screen readers.

Screen Reader Artifacts


If a screen reader is in use, it will speak at least the window title bar, and often other display artifacts in your game, even if you do not enable your game to be screen reader accessible.

There's nothing you can do about this. However, gamers who are blind expect this.

Turning Self Voicing Off


If you enable your game for self voicing, and you also believe your gamers might be able to play it using a screen reader, it is vital that you give your gamers a way to turn off your self voicing. Otherwise if they want to use their screen reader, they will hear two voices.

It is possible for users to turn off their screen reader manually, or, in some cases (JAWS), for you to turn the screen reader off and on programmatically from within your game. However, that could make it awkward for them to tab out of the game, either to Windows or to another app.

Our blind accessible games are enabled for both screen readers and self voicing. Our experience is that some gamers use their screen reader for our games, and others turn their screen reader off and use our self voicing.
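A self voicing off switch can be as simple as a flag checked before every utterance. The sketch below assumes all speech in the game goes through one wrapper class. The SelfVoice name and the Consumer used as the speech engine hook are hypothetical.

```java
import java.util.function.Consumer;

/** Sketch: self voicing that the gamer can switch off. */
final class SelfVoice {
    private final Consumer<String> engine;  // hypothetical hook into SAPI, "say", or sound files
    private volatile boolean enabled = true;

    SelfVoice(Consumer<String> engine) {
        this.engine = engine;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Speaks only while self voicing is on; returns whether the text was sent to the engine. */
    boolean speak(String text) {
        if (!enabled) {
            return false;
        }
        engine.accept(text);
        return true;
    }
}
```

Binding setEnabled to a menu item or a key lets gamers who prefer their screen reader silence the game's own voice without leaving the game.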

Stealing Hotkeys


If your game is going to be accessible via screen readers and also self voice, then it's vital that your game not use the hotkeys used by screen readers.

See Screen Readers and Games for details.
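One way to guard against this in code is to check every key binding against a set of reserved keys when the binding is registered. The sketch below is only an illustration: the HotkeyPolicy class is hypothetical, and the reserved set must be filled in from the documentation of the screen readers you support. The Insert key, which JAWS and NVDA use as their default modifier, is an obvious candidate.

```java
import java.awt.event.KeyEvent;
import java.util.Set;

/** Sketch: reject game key bindings that collide with screen reader keys. */
final class HotkeyPolicy {
    private final Set<Integer> reservedKeys;

    /** reservedKeys holds KeyEvent.VK_ codes the game must leave alone. */
    HotkeyPolicy(Set<Integer> reservedKeys) {
        this.reservedKeys = Set.copyOf(reservedKeys);
    }

    /** A binding is safe only if neither its main key nor any modifier key is reserved. */
    boolean isSafe(int keyCode, int... modifierKeyCodes) {
        if (reservedKeys.contains(keyCode)) {
            return false;
        }
        for (int modifier : modifierKeyCodes) {
            if (reservedKeys.contains(modifier)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a policy built with Set.of(KeyEvent.VK_INSERT) accepts a plain F5 binding but rejects Insert+Down Arrow.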

Self Voicing vs. Screen Readers


Self voicing, especially using SAPI, has some advantages over using a screen reader:
  • Children without screen readers can play your game
  • You can speak text without displaying it to the screen
  • For canvas play areas, it's much easier to speak your game objects
  • You don't have to test your game against multiple screen readers

All things SAPI


SAPI requires:
  • Your game include calls to the Windows SAPI interface
  • Your game include events that trigger those calls
  • The gamer's computer have installed at least one SAPI Voice

SAPI 5 and Earlier


SAPI has evolved.

SAPI versions 1 to 4 had similar architectures. The current version, SAPI 5, has a very different architecture.

This does not mean that stuff you code today won't work with earlier versions of Windows, particularly if you are just sending strings to the SAPI interface as opposed to using its more advanced features. However, you should consult the Microsoft Web site before implementing a SAPI interface in your game code.

SAPI and Coding Languages


We know of different games that access SAPI from C++, Visual Basic, and Java. There's no reason that other languages couldn't do likewise.

Microsoft has code examples at http://msdn.microsoft.com. One of their C++ examples is:
...
HRESULT                           hr = S_OK;
CComPtr < ISpVoice >              cpVoice;
CComPtr < ISpObjectToken >        cpToken;
CComPtr < IEnumSpObjectTokens >   cpEnum;

//Create a SAPI voice
hr = cpVoice.CoCreateInstance( CLSID_SpVoice );
	
//Enumerate voice tokens with attribute "Name=Microsoft Sam"
if(SUCCEEDED(hr))
{
 hr = SpEnumTokens(SPCAT_VOICES, L"Name=Microsoft Sam", NULL, &cpEnum);
}

//Get the closest token
if(SUCCEEDED(hr))
{
 hr = cpEnum->Next(1, &cpToken, NULL);
}
	
//set the voice
if(SUCCEEDED(hr))
{
 hr = cpVoice->SetVoice( cpToken);
}

//set the output to the default audio device
if(SUCCEEDED(hr))
{
 hr = cpVoice->SetOutput( NULL, TRUE );
}

//Speak the text file (assumed to exist)
if(SUCCEEDED(hr))
{
 hr = cpVoice->Speak( L"c:\\ttstemp.txt",  SPF_IS_FILENAME, NULL );
}	

//Release objects
cpVoice.Release();
cpEnum.Release();
cpToken.Release();
...
To access SAPI from Java, you can use the Java Native Interface (JNI).

Quadmore Software (www.quadmore.com) offers a good open source implementation of this. They claim their product works with Java 2 on Windows XP, but we've run it successfully using Java 6 on Vista and Windows 7 as well.

To implement the Quadmore JNI solution, put the QuadTTS.dll in your path. To speak in your game, call Quadmore's single exposed method.

The one major caveat with this implementation is that, as coded, it does not include a command to interrupt an utterance. As this feature is essential to blind gaming, you'll have to extend Quadmore's C++ code yourself.

An example of Java calling the Quadmore implementation is below. The method was named by them, not us.
...
QuadmoreTTS quadmoreTTS;

...

try{

  quadmoreTTS = new QuadmoreTTS();

}catch(Exception ex){
  //Handle exception
}

...

boolean result = quadmoreTTS.speakDarling(string);

if( ! result ){
  //Handle failure condition
}

...
Note that this code is synchronous. It does not require the tricky timing code that FreeTTS does.

SAPI for Cheap


The cscript script host that Microsoft ships with Windows XP, Vista, and Windows 7 can easily be used to speak with SAPI Voices.

Cscript.exe is a command-line version of the Windows Script Host. With Cscript.exe, you can run scripts by entering the name of a script file at the command prompt, followed by the text you want spoken.

For example, the following script is contained in a file named speaktome.vbs. To have SAPI speak the string "Hello World!" you would issue the following to the Windows command line: cscript speaktome.vbs Hello World!
...
Dim sapi, objArgs, message, i

Set sapi = CreateObject("SAPI.SpVoice")
Set objArgs = WScript.Arguments

' Nothing to say if no arguments were passed
If objArgs.Count = 0 Then
  WScript.Quit
End If

' Rejoin the command-line words into one message
message = objArgs(0)
For i = 1 To objArgs.Count - 1
  message = message & " " & objArgs(i)
Next

sapi.Speak message

'EOF
...
The Java code to do this would be:
...
Process speech = Runtime.getRuntime().exec(new String[] {"cscript", "speaktome.vbs", "Hello World!"});
...
The nice thing about this implementation is that you can interrupt the speech by programmatically killing the script process you have just launched.

The bad thing about this implementation is that you incur a latency each time the cscript interpreter is launched. Plus you have to ship your script, which is another moving part.
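The interrupt-by-killing idea can be sketched in Java as follows. This is an illustration under stated assumptions: the CscriptSpeaker class is hypothetical, the script is the speaktome.vbs file described above, and the //nologo switch is added only to suppress the Windows Script Host banner.

```java
import java.io.IOException;
import java.util.List;

/** Sketch: speak through cscript, killing the previous script to interrupt it. */
final class CscriptSpeaker {
    private final String scriptPath;
    private Process current;

    CscriptSpeaker(String scriptPath) {
        this.scriptPath = scriptPath;
    }

    /** Builds the command line; the text is passed as one argument. */
    List<String> commandFor(String text) {
        return List.of("cscript", "//nologo", scriptPath, text);
    }

    /** Interrupts any utterance in progress, then launches a new script process. */
    synchronized void speak(String text) throws IOException {
        interrupt();
        current = new ProcessBuilder(commandFor(text)).inheritIO().start();
    }

    /** Kills the script process if it is still running, which cuts the speech off. */
    synchronized void interrupt() {
        if (current != null && current.isAlive()) {
            current.destroy();
        }
        current = null;
    }
}
```

Each call to speak still pays the interpreter start-up latency, so this suits menus and prompts better than rapid-fire game events.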

Getting SAPI Voices


A SAPI Voice is a binary data file that contains the sounds, or phonemes, that SAPI speaks.

SAPI doesn't work without at least one SAPI Voice installed.

SAPI Voices are localized, even unto dialects. There are both male and female voices. We know of Voices for:
  • US English
  • UK English
  • Indian English
  • Australian English
  • American Spanish
  • Castilian Spanish
  • Mexican Spanish
  • German
  • French
  • Quebecois
  • Arabic
  • Dutch
  • Belgian Dutch
  • Italian
  • Norwegian
  • Swedish
  • Mandarin
  • Taiwanese Mandarin
  • Cantonese
  • Korean
  • Iberian Portuguese
  • Brazilian Portuguese
  • Japanese
  • Russian
  • Greek
  • Scottish
  • Irish
  • Danish
  • Finnish
  • Icelandic
  • Faroese
  • Czech
  • Polish
  • Turkish
There are character voices for:
  • English Child
  • Shouty English
  • Whispery English
  • Damien Character
  • Dog Character
  • Duchess Character
SAPI Voice files are large, on the order of 500 MB each. Typically a person purchases them online and downloads them.

Each SAPI Voice has a name. Some of the better known SAPI Voices are:
  • Microsoft Mike
  • Microsoft Mary
  • Microsoft Anna
  • Microsoft Sam - Sounds like a robot
Later versions of Windows XP, and all versions of Vista and Windows 7 (that we know of) come with at least one SAPI Voice installed. This could be any one of the above Microsoft SAPI Voices.

SAPI Voices can be purchased from this non-exhaustive list of vendors: SAPI Voices cost on the order of 30 USD each. Some older SAPI Voices, especially the more robotic ones like Microsoft Sam, can be obtained for free.

This plethora of SAPI voices appears to be driven by the telephone automation industry. It's not clear the extent to which computer users who are blind purchase SAPI Voices or are content to use the one that comes with Windows.

What is relevant to SAPI and games is that a wide variety of SAPI Voices are readily available and that a SAPI Voice comes for free on all but the oldest Windows computers.

Final Words


Not all games can be made self voicing without changing the nature of the game itself. But for the games that can, the technology is readily available: SAPI, VoiceOver, or sound files.

The benefit to you, the developer, is that by implementing self voicing, you expand your market to include gamers who are:
  • Blind
  • Visually impaired
  • Older (think Baby Boomers)
  • Cognitively impaired, such as dyslexic
See our article Designing Games for Self Voicing

John Bannick
Chief Technical Officer
7-128 Software

jbannick@7128.com