Phoneme Tool

From Valve Developer Community
Revision as of 00:26, 29 June 2005 by Tom Edwards (talk | contribs)

Phoneme Editor and Extraction Tool

In order to perform phoneme extraction, you must have the Microsoft Speech API 5.1 (SAPI 5.1) installed. It can be downloaded from Microsoft's web site.

Faceposer phoneme tool.jpg

The FacePoser application contains a tool for editing phoneme/word tags for the .wav files that actors can use with the "SPEAK" event. You can either load a scene that contains a spoken .wav file and then select any of the SPEAK events in the Choreography View, or you can directly load a .wav file by clicking the "Load" button along the bottom of the Phoneme Editor view.

Once you've loaded a .wav file, the display will show the general wave form of the sound file. In addition, along the top, the display shows the previously recognized words of the sentence, while along the bottom the display shows the previously tagged phonemes of the spoken .wav. Useful information about the .wav file is displayed in the bottom section of the view. The full text of the sentence and information about the currently selected phoneme/word are displayed along the right side of the workspace. There is a scroll bar at the top for sliding the wave view left/right. In addition, the mouse wheel can be used to zoom in/out. The zoom factor is shown at the bottom left of the tool window. Finally, there is a tab control that allows changing from manipulation of phonemes to editing of phoneme emphasis or of close captioning/localization information.

Phoneme Editor Tools

The row of buttons along the bottom of the editor view has these functions:

Redo Extraction

Resubmits the sound file to the speech recognizer. If this is successful, a new list of words/phonemes will show up "inset" from the original data. To accept the new data and begin editing it, right-click in the workspace (in the wave form display) and choose "Commit extraction" from the context menu. To remove the inset data, right-click and select "Clear extraction" from the menu. Note that committing the results doesn't clobber the original .wav file. That only occurs when you click the "Save Changes" button, or when you say "Yes" to the "Save file" prompt when changing .wav files or quitting the FacePoser application.


Save Changes

Press the "Save Changes" button to save the working .wav file out to disk.


Load

Loads a new .wav file into the editor for editing.

In addition, there are several less often used commands available from the right mouse context menu:


This option has three sub-options to play the original .wav, the edited wav or just the selected portion, if a selected portion is active. Playing and stopping the .wav can also be accomplished by pressing the Spacebar.


These options either load a new .wav or save the changes made to the current .wav.


Stops all sound playback on the sound engine

Also, there are additional options available from the right-click menu.


If you've marked some portions of the .wav file as selected by dragging the left mouse along the wave form, you can click this button to remove all such markings.

Redo extraction

Same as the "Redo Extraction" button described above.

Redo extraction of selected words

This option requires that you have a portion of the wave form selected as well as a contiguous set of words from the sentence selected. The option will send the subset of the sentence off to the phoneme extraction tool and will display the results when finished. The tool will not change the positions of words, though it will wipe out and re-populate any phonemes belonging to words in the set. Sometimes the phoneme extractor has a hard time with long sentences. In such cases, working on sections of the sentence piecemeal can help with extraction.

Commit extraction

If word/phoneme data has been processed by the extraction system, choosing "Commit" will overwrite the current working data.

Clear extraction

Throws away the "uncommitted" data.
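
The relationship between the working data and the "uncommitted" extraction results can be pictured with a small model. This is only an illustrative sketch, not FacePoser's implementation; the `PhonemeSession` class and its method names are hypothetical.

```python
class PhonemeSession:
    """Toy model of working vs. uncommitted extraction data (hypothetical)."""

    def __init__(self, working):
        self.working = working  # tags currently being edited
        self.pending = None     # "inset" results from the last extraction

    def redo_extraction(self, extracted):
        # New results are shown alongside the working data, not applied yet.
        self.pending = extracted

    def commit_extraction(self):
        # Committing overwrites the working data; nothing is written to disk.
        if self.pending is not None:
            self.working = self.pending
            self.pending = None

    def clear_extraction(self):
        # Clearing throws the uncommitted results away.
        self.pending = None
```

In this model, as in the tool, the file on disk is untouched until a separate save step runs.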

Cleanup words/phonemes

Iterates through all phonemes and words, finds words that are within a couple of pixels of touching (or are overlapping by such an amount), and fixes up the start/end times of the words/phonemes.
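
A cleanup pass of this kind might look roughly like the sketch below. It is an assumption-laden illustration rather than the tool's code: the `Tag` type and `cleanup` function are hypothetical, and the threshold is expressed in seconds instead of screen pixels.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """A word or phoneme tag with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def cleanup(tags, tolerance=0.01):
    """Snap neighbouring tags together when they nearly touch or slightly overlap.

    `tolerance` stands in for the "couple of pixels" threshold; in the real
    tool that threshold is in pixels and so depends on the zoom factor.
    """
    ordered = sorted(tags, key=lambda t: t.start)
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.start - prev.end  # a negative gap means the tags overlap
        if gap != 0 and abs(gap) <= tolerance:
            # Assumption: keep the later tag's start and move the earlier end to it.
            prev.end = cur.start
    return ordered
```

Larger gaps and larger overlaps are left alone, on the assumption that they are intentional.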

Change Speech API

The SDK version of FacePoser supports Microsoft SAPI 5.1 for performing automatic phoneme extraction from .wav files.

Import / export word data to .txt

If you need to work with the .wav file in a sound tool which strips our data chunks, you can save the original data lump into a .txt file and reapply after you edit the .wav externally.

Disable voice duck

The Source engine automatically lowers non-voice volume levels when a spoken .wav is playing back. This behavior can be disabled for a spoken .wav by choosing "Disable voice duck" from the right-click menu.

Other Controls

In addition to these buttons, the mouse and keyboard can be used to perform various actions on the words/phonemes/wave form.

The general interaction UI works as follows:

  • To select, simply left-click on items.
  • To deselect, click outside the item area for the type of item being used.
  • To shift the position of an item left or right, hold down the SHIFT key.
  • To shift a boundary/edge of an item, hold down the CTRL key.

Note that the cursor will reflect the appropriate mode: a four-way cursor means the item can be shifted, and an East-West cursor means the item can be resized.

Waveform Editing

To select a portion of the waveform, simply click and drag with the left mouse button. To move the selection area, hold SHIFT and use the left mouse to drag the area. To resize the selection, hover the mouse over the solid blue lines at either edge while holding the CTRL key. To deselect, click anywhere outside of the current selection, or press the ESCAPE key. You can play the current selection by hitting the SPACE bar, and you can play it or re-extract phonemes using the right mouse context menu.

Word Editing

Use the left mouse to select words. Once selected, one or more words can be moved by holding down the SHIFT key and using the mouse to drag the selection. If a single word is selected, it can be moved by holding down the SHIFT key and using the RIGHT or LEFT arrow on the keyboard to shift it pixel by pixel. The size of a word can be adjusted by holding the CTRL key and hovering the left mouse over the edge of the word, then clicking and dragging the edge left or right. The right boundary (end time) of a word can be adjusted using the keyboard by holding CTRL and using the RIGHT/LEFT arrows.

To deselect words, click anywhere outside of the word area (e.g., just above the words area works just fine)

Right-clicking without words selected brings up a context menu with just a couple of options. First, the "Edit sentence text…" option allows you to specify the entire text of the current sentence. Clicking OK to exit the dialog will cause phoneme extraction to be performed again. Additionally, "Cleanup words/phonemes" is an available option any time a .wav is loaded.

If you have one or more words selected, the right menu shows additional options:

Delete 'word' - You can delete the selected word(s) using this option.

Edit 'word' - If there is just one word selected, you can type in new text for the word by selecting this option. Only one word may be entered.

Insert word before/after 'word' - If you have a single word selected, and there is sufficient time before/after the word, then you can insert a new word by choosing this menu item. A dialog appears in which you can type a single word. Once you click OK, another dialog appears which allows you to pick one or more phonemes for the word just entered. You can type a space-separated list of phonemes, or click one or more phoneme buttons to create the phoneme list for the newly entered word, or just click Cancel to put in a word with no phonemes.

Add phoneme to 'word' - If the selected word doesn't have any phonemes, you can choose this option to allow entry of a string of one or more phonemes to use for the word.

Select all words before/after 'word' - If a single word is selected, you can use this option to select the rest of the row in either direction (so you can shift everything down with the mouse easily)

Deselect all - Deselects all words/phonemes currently selected

Merge words - If two or more contiguous words are selected, choosing "Merge words" will make the start time of each word match the end time of the previous word

Separate words - If two or more contiguous selected words are close together, this option will provide a bit of space between the words.

Clear Undo - Resets undo information, deleting the undo history.
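
The "Merge words" and "Separate words" options above amount to simple adjustments of word start/end times. The sketch below is a rough illustration only: the function names, the list-of-pairs representation, and the gap size are assumptions, and the tool's actual spacing rule is not documented here.

```python
def merge_words(words):
    """Make each word start where the previous word ends.

    `words` is a non-empty list of [start, end] pairs in seconds, ordered by time.
    """
    merged = [list(words[0])]
    for _start, end in words[1:]:
        merged.append([merged[-1][1], end])
    return merged


def separate_words(words, gap=0.02):
    """Open a small gap between neighbouring words that are closer than `gap`.

    Assumption: the gap is made by pulling in the earlier word's end time,
    without letting that word's end pass its own start.
    """
    separated = [list(w) for w in words]
    for prev, cur in zip(separated, separated[1:]):
        if cur[0] - prev[1] < gap:
            prev[1] = max(prev[0], cur[0] - gap)
    return separated
```

Both functions return new lists, so the original timings stay available for an undo step.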

Phoneme Editing

The phoneme area behaves almost identically to the word area as far as mouse and keyboard interaction are concerned.

When using the mouse to drag one or more selected phonemes/words, both the selection rubber band shown while dragging and the move itself are bounded to the valid amount of free space.
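
Bounding a drag in this way is essentially a clamp on the requested shift. The following is a minimal sketch under assumed names and units (times in seconds; `clamp_shift` and its parameters are hypothetical), not the tool's own logic.

```python
def clamp_shift(selection_start, selection_end, left_limit, right_limit, requested):
    """Limit a drag so the selected items stay inside the free space around them.

    left_limit  - end time of the nearest unselected item on the left (or 0.0)
    right_limit - start time of the nearest unselected item on the right
                  (or the length of the sound)
    requested   - how far the user tried to drag, in seconds (negative = left)
    """
    max_left = left_limit - selection_start   # most negative shift allowed
    max_right = right_limit - selection_end   # most positive shift allowed
    return max(max_left, min(max_right, requested))
```

For example, a selection spanning 1.0 to 2.0 seconds with neighbours ending at 0.5 and starting at 3.0 can move at most 0.5 seconds left or 1.0 second right.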

Phoneme Editor Keyboard Shortcuts

ESCAPE - If a .wav is currently being played, stops playback. If not, deselects all words/phonemes/selection areas.

PGUP/PGDN - moves the keyboard focus either to the word area (PGUP) or the phoneme area (PGDN). The current focus area is shown by a light green bar along the top or bottom edge of the word or phoneme display. Clicking/manipulating words or phonemes will set the focus appropriately.

RIGHT/LEFT arrow - The right/left arrows move and select the next or previous word or phoneme. For phonemes, the arrows cycle within a word.

TAB/SHIFT + TAB - You can change words at any time by using the TAB key.

SHIFT + ARROW KEY - Move the selected word/phoneme to right or left

CTRL + ARROW KEY - Resize end position of selected word/phoneme

INSERT / SHIFT + INSERT - Insert a new word to right/left of selected word/phoneme

DELETE - Delete selected word(s) (which deletes all phonemes of the word, too) or delete selected phoneme(s).

UP or CTRL+RETURN - Edit the selected word or phoneme.

CTRL+Z - Undo

CTRL+Y - Redo

SPACE - Play selection or entire wav file.

Phoneme Emphasis Editing

Faceposer phoneme emphasis tool.jpg

By clicking on the "Emphasis" tab with a .wav loaded, you'll see most of the view grayed out but there will now be a work area with a blue line at the center of the screen. You can create an emphasis spline by laying down points using the CTRL key and left-clicking on points in the work area.

Once you have placed points, you can select them (shown in red) by dragging a rectangle around the desired points with the mouse. To move the points, just left-click on one or more selected points and move the mouse. If you right-click in the work area, there are various options for selecting/deselecting all points and for undo/redo of editing changes.

The emphasis track scales the intensity of phonemes during playback. For certain phonemes, you may want to author a "weak" and "strong" version and add these to the "phonemes_weak" and "phonemes_strong" expression class files. Note that Valve did not actually use this feature in shipping HL2 (but in theory, it should work).

The blue center line is normal emphasis of the phonemes in the "phonemes" class. As the line goes to the top, the amount of the phoneme from "phonemes" is faded out and the phoneme from "phonemes_strong" is faded in. If a phoneme doesn't have a strong or weak override, then the absolute scale for emphasis is appropriately clamped.
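
The crossfade can be sketched as a pair of blend weights. The version below is an illustration under several assumptions that the text does not confirm: emphasis is normalized to the range 0.0 (bottom) to 1.0 (top) with 0.5 at the blue center line, the fade is linear, and the lower half fades toward "phonemes_weak" in the same way the upper half fades toward "phonemes_strong". The function name is hypothetical.

```python
def emphasis_weights(emphasis, has_weak=True, has_strong=True):
    """Return (weak, normal, strong) blend weights for an emphasis value.

    Assumption: 0.0 is the bottom of the work area, 0.5 is the center line,
    1.0 is the top, and the crossfade between classes is linear.
    """
    emphasis = max(0.0, min(1.0, emphasis))
    # Clamp to the center line when the matching override is missing.
    if emphasis > 0.5 and not has_strong:
        emphasis = 0.5
    if emphasis < 0.5 and not has_weak:
        emphasis = 0.5
    if emphasis >= 0.5:
        strong = (emphasis - 0.5) * 2.0
        return (0.0, 1.0 - strong, strong)
    weak = (0.5 - emphasis) * 2.0
    return (weak, 1.0 - weak, 0.0)
```

With these assumptions, a point halfway between the center line and the top gives an even mix of the normal and strong versions, and a phoneme with no strong override stays at its normal version however high the line goes.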