Phoneme Tool

Latest revision as of 07:08, 20 May 2025

Note: The Phoneme Extraction Tool requires additional setup to work in most versions of the Source engine. See the Setup section of this page for more information.

[Image: Faceposer phoneme tool.jpg]

The FacePoser application contains a tool for editing the phoneme/word tags of the .wav files that actors use with the "SPEAK" event. You can either load a scene that contains a spoken .wav file and then select any of the SPEAK events in the Choreography View, or load a .wav file directly by clicking the "Load" button along the bottom of the Phoneme Editor view.

Once you've loaded a .wav file, the display shows the general waveform of the sound file. Along the top, the display shows the previously recognized words of the sentence; along the bottom, it shows the previously tagged phonemes of the spoken .wav. Useful information about the .wav file is displayed in the bottom section of the view. The full text of the sentence and information about the currently selected phoneme/word are displayed along the right side of the workspace. A scroll bar at the top slides the wave view left/right, and the mouse wheel zooms in/out. The zoom factor is shown at the bottom left of the tool window. Finally, a tab control switches between manipulating phonemes and editing phoneme emphasis or closed captioning/localization information.

Setup

Using Lipsinc Speech API

The original Faceposer used a third-party API for phoneme extraction.

I'm certainly no expert in this field but after hours of searching for the correct method to get it working I think I finally found it!

It works for me with Source SDK Base 2013: Multiplayer on Windows 10. Since that is the most "up-to-date" branch, it should work for the other branches as well.


1. Download this: https://drive.google.com/file/d/1aWA2ME2yA5or4yTdUvm0daBg1_kFOt-b/view?usp=sharing

2. Unpack it and put the folder here (in my case): [Source SDK Base 2013 Multiplayer\bin]

(So the .dat files should be in [Source SDK Base 2013 Multiplayer\bin\lipsinc_data])

3. Run FacePoser.

4. Go to the Phoneme Editor.

5. Right-click in the Phoneme Editor window.

6. Under "Change Speech API", select "Lipsinc Speech API".
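
If the "Lipsinc Speech API" option does not seem to work after these steps, it may help to confirm that the data folder ended up where step 2 describes. The following is only a minimal sketch: the function name is hypothetical, and the check (a `lipsinc_data` folder containing at least one .dat file) is an assumption based on the parenthetical note in step 2, not a documented requirement of FacePoser.

```python
from pathlib import Path


def lipsinc_data_installed(bin_dir):
    """Return True if <bin_dir>/lipsinc_data exists and holds at least one .dat file.

    Hypothetical helper: it only mirrors the folder layout described in
    step 2 and knows nothing about which files FacePoser actually reads.
    """
    data_dir = Path(bin_dir) / "lipsinc_data"
    return data_dir.is_dir() and any(data_dir.glob("*.dat"))
```

Call it with the path to your own `bin` folder, for example the `Source SDK Base 2013 Multiplayer\bin` folder mentioned above.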


Note: For games after Left 4 Dead, including Alien Swarm, you will have to copy phonemeextractor_ims.dll and ims_helper.dll from the bin/phonemeextractors folder in Source SDK 2013 (or another game released before Left 4 Dead) to the bin/phonemeextractors folder in your engine installation.
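
The copy described in the note can also be scripted. This is a rough sketch rather than an official procedure: the function name and the two root-folder parameters are hypothetical, and only the two DLL names and the `bin/phonemeextractors` sub-folder come from the note itself.

```python
import shutil
from pathlib import Path

# The two files named in the note above.
EXTRACTOR_DLLS = ("phonemeextractor_ims.dll", "ims_helper.dll")


def copy_phoneme_extractors(sdk_root, engine_root):
    """Copy the extractor DLLs from one installation to another.

    sdk_root and engine_root are hypothetical parameters: the root folders
    of the source installation (e.g. Source SDK 2013) and of the target
    engine installation. Returns the list of destination paths.
    """
    src = Path(sdk_root) / "bin" / "phonemeextractors"
    dst = Path(engine_root) / "bin" / "phonemeextractors"
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    copied = []
    for name in EXTRACTOR_DLLS:
        shutil.copy2(src / name, dst / name)  # raises FileNotFoundError if a DLL is absent
        copied.append(dst / name)
    return copied
```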


Using Microsoft Speech API

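Alternatively, you may use the Microsoft Speech API. To do this, you must have the Microsoft Speech API 5.1 (SAPI 5.1) installed. It can be downloaded from Microsoft's web site: http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&DisplayLang=en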

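Bug: Phoneme extraction with the Microsoft Speech API is broken on Windows Vista or later due to a Speech API upgrade. Valve actually uses a different phoneme extraction library (see https://web.archive.org/web/20110109210627/https://www.gamedev.net/community/forums/topic.asp?topic_id=306087), which can be configured as described in the Using Lipsinc Speech API section above, instead of the Microsoft Speech API.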

Phoneme Editor Tools

Buttons

Redo Extraction
Resubmits the sound file to the speech recognizer. If this is successful, a new list of words/phonemes will show up "inset" from the original data. To accept the new data and begin editing it, right-click in the workspace (in the waveform display) and choose "Commit extraction" from the context menu. To remove the inset data, right-click and select "Clear extraction" from the menu. Note that committing the results doesn't clobber the original .wav file. That only occurs when you click the "Save Changes" button, or when you answer "Yes" to the "Save file" prompt when changing .wav files or quitting the FacePoser application.
Save
Click the "Save Changes" button to save the working .wav file out to disk (see Phoneme Tool/data format).
Load
Load a new .wav file into the editor for editing.

Context Menu

Play
This option has three sub-options to play the original .wav, the edited wav or just the selected portion, if a selected portion is active. Playing and stopping the .wav can also be accomplished by pressing the Spacebar.
Load/Save
These options either load a new .wav or save the changes made to the current .wav.
Stop
Stops all sound playback on the sound engine.
Deselect
If you've marked some portions of the .wav file as selected by dragging the left mouse button along the waveform, you can choose this option to remove all such markings.
Redo extraction
Same as button (above)
Redo extraction of selected words
This option requires that you have a portion of the waveform selected, as well as a contiguous set of words from the sentence. The option sends that subset of the sentence to the phoneme extraction tool and displays the results when finished. The tool will not change the positions of words, though it will wipe out and re-populate any phonemes belonging to words in the set. Sometimes the phoneme extractor has a hard time with long sentences. In such cases, working on sections of the sentence piecemeal can help with extraction.
Commit extraction
If word/phoneme data has been processed by the extraction system, choosing "Commit" will overwrite the current working data.
Clear extraction
Throws away the "uncommitted" data.
Cleanup words/phonemes
Iterates through all phonemes and words, finds those that are within a couple of pixels of touching (or that overlap by such an amount), and fixes up the start/end times of the words/phonemes.
Change Speech API
The SDK version of FacePoser supports Microsoft SAPI 5.1 for performing automatic phoneme extraction from .wav files.
Import / export word data to .txt
If you need to work with the .wav file in a sound tool which strips our data chunks, you can save the original data lump into a .txt file and reapply after you edit the .wav externally.
Disable voice duck
The Source engine automatically lowers non-voice volume levels when a spoken wav is playing back. This behavior can be disabled for a spoken .wav by choosing "Disable voice duck" from the right-click menu.
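
As an illustration of what the "Cleanup words/phonemes" entry above describes, here is a minimal sketch of one way such a pass could work. Everything in it is an assumption made for illustration: the `Tag` type, the tolerance expressed in seconds (the tool itself works in pixels), and the choice to snap the later item's start time onto the earlier item's end time. It is not FacePoser's actual code.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical stand-in for a word or phoneme tag with times in seconds."""
    text: str
    start: float
    end: float


def cleanup_tags(tags, tolerance=0.01):
    """Close small gaps and remove small overlaps between neighboring tags.

    If two neighbors are within `tolerance` seconds of touching, or overlap
    by no more than that amount, the later tag's start is snapped to the
    earlier tag's end. Larger gaps and overlaps are left alone.
    """
    ordered = sorted(tags, key=lambda t: t.start)
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.start - prev.end  # negative when the two tags overlap
        if abs(gap) <= tolerance and prev.end < cur.end:
            cur.start = prev.end
    return ordered
```

For example, a word ending at 0.5 s followed by one starting at 0.505 s would be joined at 0.5 s, while a word starting at 1.3 s after one ending at 1.0 s would be left untouched.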

Mouse actions

The general interaction UI works as follows:

  • To select, click items with the left mouse button (LMB).
  • To deselect, click outside the item area for the type of item being used.
  • To shift the position of an item left or right, hold down Shift.
  • To shift a boundary/edge of an item, hold down CTRL.

Note that the cursor reflects the current mode: a four-way cursor means the item can be shifted, and an east-west cursor means the item can be resized.

Waveform Editing

To select a portion of the waveform, click and drag with the left mouse button (LMB). To move the selection area, hold Shift and drag the area with the left mouse button. To resize the selection, hover the mouse over the solid blue lines at either edge while holding CTRL. To deselect, click anywhere outside of the current selection, or press ESCAPE. You can play the current selection or re-extract phonemes using the right mouse context menu, and you can also play the selection by hitting SPACE.

Word Editing

Use the left mouse button to select words. Once selected, one or more words can be moved by holding down the Shift key and dragging the selection with the mouse. If a single word is selected, it can be moved by holding down Shift and using the Left or Right arrow key to shift it pixel by pixel. The size of a word can be adjusted by holding CTRL and hovering the mouse over the edge of the word, then clicking and dragging the edge left or right. The right boundary (end time) of a word can be adjusted from the keyboard by holding CTRL and using the Left/Right arrow keys.

To deselect words, click anywhere outside of the word area (e.g., just above the words area works just fine)

Right-clicking without words selected brings up a context menu with just a couple of options. First, the "Edit sentence text…" option allows you to specify the entire text of the current sentence. Clicking OK to exit the dialog will cause phoneme extraction to be performed again. Additionally, "Cleanup words/phonemes" is available any time a .wav is loaded.

If you have one or more words selected, the right menu shows additional options:

Delete word
You can delete the selected word(s) using this option.
Edit word
If there is just one word selected, you can type in new text for the word by selecting this option. Only one word may be entered.
Insert word before/after word
If you have a single word selected, and there is sufficient time before/after the word, you can insert a new word by choosing this menu item. A dialog appears in which you can type a single word. Once you click OK, another dialog appears that allows you to pick one or more phonemes for the word just entered. You can type a space-separated list of phonemes, click one or more phoneme buttons to create the phoneme list for the newly entered word, or just click Cancel to put in a word with no phonemes.
Add phoneme to word
If the selected word doesn't have any phonemes, you can choose this option to allow entry of a string of one or more phonemes to use for the word.
Select all words before/after word
If a single word is selected, you can use this option to select the rest of the row in either direction (so you can easily shift everything down with the mouse).
Deselect all
Deselects all words/phonemes currently selected.
Merge words
If two or more contiguous words are selected, choosing "Merge words" will make the start time of each word match the end time of the previous word.
Separate words
If two or more contiguous selected words are close together, this option will provide a bit of space between the words.
Clear Undo
Resets undo information, deleting the undo history.
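
The "Merge words" and "Separate words" entries above can be pictured with a small sketch. The `Word` type and function names are hypothetical. The merge rule follows the description directly (each start time is set to the previous end time). The separate rule is a guess: the text only says "a bit of space" is added, so the gap size and the choice to split the adjustment evenly between both words are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word tag with start/end times in seconds."""
    text: str
    start: float
    end: float


def merge_words(words):
    """Make each word start exactly where the previous word ends."""
    for prev, cur in zip(words, words[1:]):
        cur.start = prev.end
    return words


def separate_words(words, min_gap=0.02):
    """Open up at least `min_gap` seconds between neighboring words.

    Assumed behavior: the missing space is taken half from the end of the
    earlier word and half from the start of the later word.
    """
    for prev, cur in zip(words, words[1:]):
        deficit = min_gap - (cur.start - prev.end)
        if deficit > 0:
            prev.end -= deficit / 2
            cur.start += deficit / 2
    return words
```

Both functions expect the selected words in left-to-right order and modify them in place.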

Phoneme Editing

The phoneme area behaves almost identically to the word area as far as mouse and keyboard interaction are concerned.

When using the mouse to drag one or more selected phonemes/words, both the selection rubber band and the move itself are bounded to the valid amount of space.

Phoneme Editor Keyboard Shortcuts

ESCAPE
If a .wav is currently being played, stop playback. If not, deselects all words/phonemes/selection areas
PGUP/PGDN
Moves the keyboard focus either to the word area (PGUP) or the phoneme area (PGDN). The current focus area is shown by a light green bar along the top or bottom edge of the word or phoneme display. Clicking/manipulating words or phonemes will set the focus appropriately.
LEFT/RIGHT arrow
The right/left arrows move and select the next or previous word or phoneme. For phonemes, the arrows cycle within a word.
Tab / Shift + Tab
You can change words at any time by using the TAB key.
Shift + ARROW KEY
Move the selected word/phoneme to the right or left.
CTRL + ARROW KEY
Resize the end position of the selected word/phoneme.
INSERT / Shift + INSERT
Insert a new word to right/left of selected word/phoneme
DELETE
Delete selected word(s) (which deletes all phonemes of the word, too) or delete selected phoneme(s).
UP or CTRL+RETURN
Edit the selected word or phoneme.
CTRL+Z
Undo
CTRL+Y
Redo
SPACE
Play selection or entire wav file.

Phoneme Emphasis Editing

[Image: Faceposer phoneme emphasis tool.jpg]


By clicking on the "Emphasis" tab with a .wav loaded, you'll see most of the view grayed out but there will now be a work area with a blue line at the center of the screen. You can create an emphasis spline by laying down points using the CTRL key and left-clicking on points in the work area.

Once you have placed points, you can select them (shown in red) by dragging a rectangle around the desired points with the mouse. To move the points, just left-click on one or more selected points and move the mouse. If you right-click in the work area, there are various options for selecting/deselecting all points and for undo/redo of editing changes.

The emphasis track scales the intensity of phonemes during playback. For certain phonemes, you may want to author a "weak" and "strong" version and add these to the "phonemes_weak" and "phonemes_strong" expression class files. Note that Valve did not actually use this feature in shipping HL2 (but in theory, it should work).

The blue center line is the normal emphasis of the phonemes in the "phonemes" class. As the line goes to the top, the phoneme from "phonemes" is faded out and the phoneme from "phonemes_strong" is faded in. If a phoneme doesn't have a strong or weak override, then the absolute scale for emphasis is clamped accordingly.
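
The cross-fade described above can be sketched numerically. This is an illustrative model only, not the engine's implementation. It assumes the emphasis value runs from 0.0 (bottom) to 1.0 (top) with 0.5 at the blue center line, that the fade is linear, that the lower half fades toward "phonemes_weak" in the same way the upper half fades toward "phonemes_strong", and that a missing override simply leaves the normal phoneme at full weight. The function name and parameters are hypothetical.

```python
def emphasis_weights(emphasis, has_strong=True, has_weak=True):
    """Return blend weights for the three phoneme expression classes.

    emphasis: assumed range 0.0 (bottom) .. 1.0 (top), 0.5 = center line.
    has_strong / has_weak: whether the phoneme has an authored override.
    """
    e = min(max(emphasis, 0.0), 1.0)  # clamp to the assumed valid range
    t = (e - 0.5) * 2.0               # -1.0 at bottom, 0.0 at center, 1.0 at top
    if t > 0 and has_strong:
        return {"phonemes": 1.0 - t, "phonemes_strong": t, "phonemes_weak": 0.0}
    if t < 0 and has_weak:
        return {"phonemes": 1.0 + t, "phonemes_strong": 0.0, "phonemes_weak": -t}
    # Center line, or no override available: use the normal phoneme only.
    return {"phonemes": 1.0, "phonemes_strong": 0.0, "phonemes_weak": 0.0}
```

Under these assumptions, a point three quarters of the way up the work area would blend the normal and strong phonemes equally.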