Phoneme Tool

From Valve Developer Community
Bug: Automatic phoneme extraction is broken on Windows Vista and later due to a Speech API upgrade. Valve actually uses a different phoneme extraction library that can't be distributed in the public SDK, so don't expect this problem to ever be officially fixed! (You can still create phonemes manually, but the process is very tedious.)

In order to perform phoneme extraction you must have the Microsoft Speech API 5.1 (SAPI 5.1) installed. It can be downloaded from Microsoft's web site.

[Figure: FacePoser phoneme tool]

The FacePoser application contains a tool for editing the phoneme/word tags of .wav files that actors can use with the "SPEAK" event. You can either load a scene that contains a spoken .wav file and then select any of its SPEAK events in the Choreography View, or load a .wav file directly by clicking the "Load" button along the bottom of the Phoneme Editor view.

Once you've loaded a .wav file, the display shows the general waveform of the sound file. Along the top, it shows the previously recognized words of the sentence, while along the bottom it shows the previously tagged phonemes of the spoken .wav. Useful information about the .wav file is displayed in the bottom section of the view. The full text of the sentence and information about the currently selected phoneme/word are displayed along the right side of the workspace. A scroll bar at the top slides the wave view left and right, and the mouse wheel zooms in and out; the current zoom factor is shown at the bottom left of the tool window. Finally, a tab control lets you switch from manipulating phonemes to editing phoneme emphasis or close captioning/localization information.

Phoneme Editor Tools

Buttons

Redo Extraction
Resubmits the sound file to the speech recognizer. If this succeeds, a new list of words/phonemes appears "inset" from the original data. To accept the new data and begin editing it, right-click in the workspace (in the waveform display) and choose "Commit extraction" from the context menu. To discard the inset data, right-click and choose "Clear extraction". Note that committing the results doesn't overwrite the original .wav file; that only happens when you click the "Save Changes" button, or answer "Yes" to the "Save file" prompt when changing .wav files or quitting FacePoser.
Save
Saves the working .wav file to disk (see Phoneme Tool/data format).
Load
Load a new .wav file into the editor for editing.
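The buttons above imply three separate copies of the word data: what the .wav file holds on disk, what you are editing, and the uncommitted "inset" results of Redo Extraction. The following is a minimal, hypothetical Python model of that workflow, written only to make the commit/clear/save rules concrete. The class `PhonemeSession`, its method names, and the `(text, start, end)` tuple layout are all assumptions for illustration, not FacePoser's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical representation of a word tag: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]


@dataclass
class PhonemeSession:
    """Hypothetical model of the three copies of word data described above."""

    on_disk: List[Word]                      # data stored in the .wav file
    working: List[Word] = field(init=False)  # data you are currently editing
    pending: Optional[List[Word]] = None     # "inset" results of Redo Extraction

    def __post_init__(self) -> None:
        # Editing starts from whatever the .wav file already contains.
        self.working = list(self.on_disk)

    def redo_extraction(self, results: List[Word]) -> None:
        # New recognizer output is shown inset; working data is untouched.
        self.pending = list(results)

    def commit_extraction(self) -> None:
        # Commit replaces the working data, but the file on disk is unchanged.
        if self.pending is not None:
            self.working = self.pending
            self.pending = None

    def clear_extraction(self) -> None:
        # Throw away the uncommitted data.
        self.pending = None

    def save_changes(self) -> None:
        # Only saving writes the working data back to the .wav file.
        self.on_disk = list(self.working)
```

In this sketch, committing and saving are deliberately separate steps, mirroring the note that committing never touches the file until you save.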

Context Menu

Play
This option has three sub-options: play the original .wav, play the edited .wav, or play just the selected portion (if a selection is active). Playback can also be started and stopped by pressing the Spacebar.
Load/Save
These options either load a new .wav or save the changes made to the current .wav.
Stop
Stops all sound playback in the sound engine.
Deselect
If you've marked portions of the .wav file as selected by dragging with the left mouse button along the waveform, choose this option to remove all such markings.
Redo extraction
Same as the "Redo Extraction" button (above).
Redo extraction of selected words
This option requires a selected portion of the waveform as well as a contiguous set of selected words from the sentence. It sends that subset of the sentence to the phoneme extraction tool and displays the results when finished. It does not change the positions of words, but it wipes out and re-populates any phonemes belonging to the selected words. The phoneme extractor sometimes has a hard time with long sentences; in such cases, extracting the sentence piecemeal, one section at a time, can help.
Commit extraction
If word/phoneme data has been processed by the extraction system, choosing "Commit" will overwrite the current working data.
Clear extraction
Throws away the "uncommitted" data.
Cleanup words/phonemes
Iterates through all words and phonemes, finds any that are within a couple of pixels of touching (or that overlap by about that amount), and fixes up their start/end times.
Change Speech API
The SDK version of FacePoser supports Microsoft SAPI 5.1 for performing automatic phoneme extraction from .wav files.
Import / export word data to .txt
If you need to work with the .wav file in a sound tool that strips out FacePoser's data chunks, you can save the original data lump to a .txt file and reapply it after editing the .wav externally.
Disable voice duck
The Source engine automatically lowers non-voice volume levels while a spoken .wav is playing. To disable this behavior for a spoken .wav, choose "Disable voice duck" from the right-click menu.
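The "Cleanup words/phonemes" option above is essentially a boundary-snapping pass. Below is a minimal Python sketch of one way such a pass could work, not FacePoser's actual algorithm. Assumptions: items are hypothetical `(text, start, end)` tuples in seconds, "a couple of pixels" is a `tolerance_px` value converted to seconds through a hypothetical `pixels_per_second` zoom factor, and a fixed-up pair meets at the earlier item's end time.

```python
from typing import List, Tuple

# Hypothetical representation: (text, start_seconds, end_seconds).
Item = Tuple[str, float, float]


def cleanup(items: List[Item], pixels_per_second: float,
            tolerance_px: float = 2.0) -> List[Item]:
    """Snap neighbouring words (or phonemes) that almost touch or slightly overlap.

    Assumptions: "a couple of pixels" is tolerance_px at the current zoom
    (pixels_per_second), and a fixed-up pair meets at the earlier item's end.
    """
    tolerance_s = tolerance_px / pixels_per_second
    out: List[Item] = []
    for text, start, end in sorted(items, key=lambda item: item[1]):
        if out:
            prev_end = out[-1][2]
            # Covers a small gap (start > prev_end) or small overlap (start < prev_end).
            if abs(start - prev_end) <= tolerance_s:
                start = prev_end
                end = max(end, start)  # never let an item end before it starts
        out.append((text, start, end))
    return out
```

Because the tolerance is measured in pixels in this sketch, the same cleanup snaps a smaller time gap when you are zoomed in further.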

Mouse actions

The general interaction UI works as follows:

  • To select an item, click it with the left mouse button.
  • To deselect, click outside the item area for the type of item being used.
  • To move an item left or right, hold down Shift.
  • To move a boundary/edge of an item, hold down CTRL.

Note that the cursor reflects the current mode: a four-way cursor means the item can be moved, and an east-west cursor means it can be resized.

Waveform Editing

To select a portion of the waveform, click and drag with the left mouse button. To move the selection area, hold Shift and drag the area with the left mouse button. To resize the selection, hold CTRL and hover the mouse over the solid blue line at either edge, then drag it. To deselect, click anywhere outside of the current selection, or press ESCAPE. You can play the current selection by pressing SPACE or from the right-click context menu, which also offers phoneme re-extraction.

Word Editing

Use the left mouse button to select words. Once selected, one or more words can be moved by holding down Shift and dragging the selection with the mouse. If a single word is selected, it can also be moved pixel by pixel by holding down Shift and pressing the left or right arrow key. To resize a word, hold CTRL, hover the mouse over the edge of the word, then click and drag the edge left or right. The right boundary (end time) of a word can be adjusted from the keyboard by holding CTRL and pressing the left/right arrow keys.

To deselect words, click anywhere outside of the word area (for example, clicking just above the word area works fine).

Right-clicking with no words selected brings up a context menu with just a couple of options. The "Edit sentence text…" option lets you specify the entire text of the current sentence; clicking OK to exit the dialog causes phoneme extraction to be performed again. In addition, "Cleanup words/phonemes" is available any time a .wav is loaded.

If you have one or more words selected, the right-click menu shows additional options:

Delete word
You can delete the selected word(s) using this option.
Edit word
If there is just one word selected, you can type in new text for the word by selecting this option. Only one word may be entered.
Insert word before/after word
If you have a single word selected and there is sufficient time before/after it, choose this menu item to insert a new word. A dialog appears in which you can type a single word. Once you click OK, a second dialog lets you pick one or more phonemes for the new word: type a space-separated list of phonemes, click one or more phoneme buttons to build the list, or just click Cancel to insert the word with no phonemes.
Add phoneme to word
If the selected word doesn't have any phonemes, you can choose this option to allow entry of a string of one or more phonemes to use for the word.
Select all words before/after word
If a single word is selected, this option selects the rest of the row in either direction (so you can easily shift all of those words with the mouse).
Deselect all
Deselects all currently selected words/phonemes.
Merge words
If two or more contiguous words are selected, choosing "Merge words" makes the start time of each word match the end time of the previous word.
Separate words
If two or more contiguous selected words are close together, this option will provide a bit of space between the words.
Clear Undo
Resets undo information, deleting the undo history.
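"Merge words" is defined precisely above: each word's start time becomes the previous word's end time. The following Python sketch illustrates that rule, plus one hypothetical reading of "Separate words". Assumptions: words are `(text, start, end)` tuples in seconds for a contiguous selection already sorted by start time, and the `gap` parameter and the "move only the later word's start" rule in `separate_words` are invented for illustration, since the source only says the option provides "a bit of space".

```python
from typing import List, Tuple

# Hypothetical representation: (text, start_seconds, end_seconds),
# for a contiguous selection already sorted by start time.
Word = Tuple[str, float, float]


def merge_words(words: List[Word]) -> List[Word]:
    """Make each word's start time match the previous word's end time."""
    merged: List[Word] = []
    for text, start, end in words:
        if merged:
            start = merged[-1][2]
        merged.append((text, start, end))
    return merged


def separate_words(words: List[Word], gap: float = 0.05) -> List[Word]:
    """Leave at least `gap` seconds between neighbours (hypothetical rule).

    Only the later word's start moves, and it never passes its own end time.
    """
    spaced: List[Word] = []
    for text, start, end in words:
        if spaced:
            min_start = spaced[-1][2] + gap
            if start < min_start:
                start = min(min_start, end)
        spaced.append((text, start, end))
    return spaced
```

Note that `merge_words` closes gaps and removes overlaps alike, since it always snaps to the previous end time.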

Phoneme Editing

The phoneme area behaves almost identically to the word area as far as mouse and keyboard interaction are concerned.

When you drag one or more selected words/phonemes with the mouse, both the selection rubber band and the resulting move are constrained to the valid space available.
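One way to picture the bounded drag described above is as clamping the drag offset before applying it. Here is a minimal Python sketch under stated assumptions: the selection is a list of hypothetical `(text, start, end)` tuples in seconds, `lower` is the end of the nearest unselected item before the selection (or 0.0), and `upper` is the start of the nearest unselected item after it (or the .wav duration). The function name and these bounds are illustrative, not FacePoser's own.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # hypothetical: (text, start_seconds, end_seconds)


def clamp_move(selected: List[Word], delta: float,
               lower: float, upper: float) -> List[Word]:
    """Shift a selected block by delta seconds without leaving [lower, upper].

    Assumptions: lower is the end of the nearest unselected item before the
    selection (or 0.0), and upper is the start of the nearest unselected item
    after it (or the .wav duration).
    """
    block_start = min(start for _, start, _ in selected)
    block_end = max(end for _, _, end in selected)
    delta = max(delta, lower - block_start)  # can't move past the left bound
    delta = min(delta, upper - block_end)    # can't move past the right bound
    return [(text, start + delta, end + delta) for text, start, end in selected]
```

Clamping the offset once for the whole block keeps the relative spacing of the selected items intact while it stops at the nearest neighbour.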

Phoneme Editor Keyboard Shortcuts

ESCAPE
If a .wav is currently playing, stops playback; otherwise, deselects all words, phonemes, and selection areas.
PGUP/PGDN
Moves the keyboard focus either to the word area (PGUP) or the phoneme area (PGDN). The current focus area is shown by a light green bar along the top or bottom edge of the word or phoneme display. Clicking/manipulating words or phonemes will set the focus appropriately.
Left / Right arrow
The right/left arrows move and select the next or previous word or phoneme. For phonemes, the arrows cycle within a word.
Tab / Shift + Tab
You can switch between words at any time using TAB or Shift+TAB.
Shift + ARROW KEY
Moves the selected word/phoneme right or left.
CTRL + ARROW KEY
Resizes the end position of the selected word/phoneme.
INSERT / Shift + INSERT
Inserts a new word to the right/left of the selected word/phoneme.
DELETE
Delete selected word(s) (which deletes all phonemes of the word, too) or delete selected phoneme(s).
or CTRL+RETURN
Edit the selected word or phoneme.
CTRL+Z
Undo
CTRL+Y
Redo
SPACE
Play selection or entire wav file.

Phoneme Emphasis Editing

[Figure: FacePoser phoneme emphasis tool]

Clicking the "Emphasis" tab with a .wav loaded grays out most of the view and shows a work area with a blue line along its center. To create an emphasis spline, hold CTRL and left-click in the work area to lay down points.

Once you have placed points, you can select them (shown in red) by dragging a rectangle around the desired points with the mouse. To move the points, just left-click on one or more selected points and move the mouse. If you right-click in the work area, there are various options for selecting/deselecting all points and for undo/redo of editing changes.

The emphasis track scales the intensity of phonemes during playback. For certain phonemes, you may want to author "weak" and "strong" versions and add them to the "phonemes_weak" and "phonemes_strong" expression class files. Note that Valve did not actually use this feature in the shipping version of Half-Life 2 (but in theory, it should work).

The blue center line represents normal emphasis, using the phonemes in the "phonemes" class. As the curve rises toward the top, the phoneme from "phonemes" is faded out and the phoneme from "phonemes_strong" is faded in. If a phoneme has no strong or weak override, the emphasis scale is clamped accordingly.
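The crossfade described above can be sketched as a pair of blend weights. The following Python function is a minimal, hypothetical illustration, not the engine's code. Assumptions: emphasis runs from 0.0 (bottom) to 1.0 (top) with 0.5 at the blue center line; the weak side mirrors the strong side (the text above only describes the strong direction); and a missing override is read as "leave the normal phoneme at full weight", which is one possible interpretation of the clamping mentioned above.

```python
from typing import Tuple


def emphasis_weights(emphasis: float, has_strong: bool = True,
                     has_weak: bool = True) -> Tuple[float, float, float]:
    """Return hypothetical (normal, strong, weak) blend weights for one phoneme.

    Assumptions: emphasis runs 0.0 (bottom) .. 1.0 (top), 0.5 is the blue
    centre line, the weak side mirrors the strong side, and a missing
    override leaves the normal phoneme at full weight.
    """
    e = min(max(emphasis, 0.0), 1.0)  # keep emphasis inside the work area
    if e >= 0.5:
        # Above the centre line: fade "phonemes" out, "phonemes_strong" in.
        t = (e - 0.5) / 0.5 if has_strong else 0.0
        return (1.0 - t, t, 0.0)
    # Below the centre line (assumed mirror): fade "phonemes_weak" in.
    t = (0.5 - e) / 0.5 if has_weak else 0.0
    return (1.0 - t, 0.0, t)
```

For example, a point three quarters of the way up the work area (0.75) would blend the normal and strong phonemes equally in this sketch.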