Phoneme Tool/data format

The phoneme editor embeds the following ASCII text block at the end of a .wav file:

VERSION 1.0
PLAINTEXT
{
example sentence
}
WORDS
{
WORD example <start time> <end time>
{
<phoneme id> <phoneme name> <start time> <end time> <volume (unused, always 1)>
}
WORD sentence <as above>
{
<as above>
}
}
EMPHASIS
{
<time> <normalised value>
}
CLOSECAPTION
{
english
{
PHRASE unicode <size of text in *bytes*; the text is not NUL-terminated> <text, apparently encoded as either UCS-2 or UTF-16> <start time> <end time>
}
}
OPTIONS
{
voice_duck <1/0>
}

All sections are required, even if they are empty (as EMPHASIS often is).
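
As an illustration of the layout above, the following is a minimal Python sketch that extracts the WORDS section from a text block. The names `Phoneme`, `Word` and `parse_words` are hypothetical and not part of any Valve tool. The sketch assumes that the block has already been decoded to text, that fields are separated by whitespace, that braces sit on their own lines as shown above, and that times are decimal numbers. It does not attempt to read the CLOSECAPTION section.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    code: int
    name: str
    start: float
    end: float
    volume: float


@dataclass
class Word:
    text: str
    start: float
    end: float
    phonemes: list = field(default_factory=list)


def parse_words(block: str) -> list:
    """Extract the WORDS section from a phoneme text block (hypothetical helper)."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    words = []
    try:
        i = lines.index("WORDS") + 2  # skip "WORDS" and its opening "{"
    except ValueError:
        return words  # no WORDS section found
    while i < len(lines) and lines[i] != "}":
        tag, text, start, end = lines[i].split()
        if tag != "WORD":
            raise ValueError(f"expected WORD, got {tag!r}")
        word = Word(text, float(start), float(end))
        i += 2  # skip the WORD line and its opening "{"
        while lines[i] != "}":
            code, name, p_start, p_end, volume = lines[i].split()
            word.phonemes.append(
                Phoneme(int(code), name, float(p_start), float(p_end), float(volume))
            )
            i += 1
        i += 1  # skip the word's closing "}"
        words.append(word)
    return words
```

Each returned `Word` carries its own list of `Phoneme` entries, mirroring the nesting of the text block.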

Todo: Regarding the closed-captioning section: what other language identifiers are valid? Are encodings other than "unicode" valid? Can a language block contain multiple "phrases"? Is this method of closed-captioning deprecated, does it supersede the method described here, or do the two exist alongside each other?

VDAT chunk

WAV is a chunk-based file format derived from RIFF, and WAV files containing phoneme data store it in a custom chunk of type VDAT. The VDAT chunk begins with the four ASCII characters VDAT (56 44 41 54 in hexadecimal), followed by four bytes giving the length of the chunk, excluding the eight identifier and length bytes. All data is encoded as little-endian. The plaintext block described above follows as the body of the chunk.
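
The chunk layout described above can be sketched in Python as follows. The function name `find_vdat` is hypothetical. The sketch assumes a standard 12-byte RIFF/WAVE header and applies the usual RIFF rule that chunks are padded to an even length; whether every tool that writes VDAT chunks honours that padding is not confirmed here.

```python
import struct


def find_vdat(data: bytes):
    """Return the VDAT payload from RIFF/WAVE bytes, or None if absent."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first sub-chunk follows the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        # Four-byte little-endian length, excluding the 8 header bytes.
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"VDAT":
            return data[pos + 8:pos + 8 + size]
        # Assumption: chunks are padded to an even size, as in standard RIFF.
        pos += 8 + size + (size & 1)
    return None
```

For a file on disk, something like `find_vdat(open(path, "rb").read())` would return the raw bytes of the text block, which can then be decoded and parsed.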

Todo: Does the Phoneme Editor use the VDAT chunk or does that exist only in audio shipped with the games by Valve? Does Source still recognise phoneme data that doesn't use the VDAT chunk?

Phoneme IDs

95 <sil>
97 aa2
98 b
100 d
101 ey
102 f
103 g
104 hh
105 iy
106 y
107 c
108 l
109 m
110 n
111 ow
112 p
114 r2
115 s
116 t
117 uw
118 v
119 w
122 z
230 ae
240 dh
331 nx
593 aa
596 ao
601 ax
602 er
603 eh
604 ax2
605 er2
609 g2
614 hh2
616 ih2
618 ih
619 l2
633 r
635 r3
638 d2
643 sh
650 uh
652 ah
658 zh
676 jh
679 ch
952 th
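
A tool that reads or writes phoneme lines may want to check that an ID and a name agree with the table above. The following is a minimal sketch of such a check; `PHONEME_NAMES` and `check_phoneme` are hypothetical names, and the dictionary holds only an excerpt of the table.

```python
# Excerpt of the table above; extend with the remaining rows as needed.
PHONEME_NAMES = {
    95: "<sil>",
    98: "b",
    104: "hh",
    230: "ae",
    593: "aa",
    952: "th",
}


def check_phoneme(code: int, name: str) -> bool:
    """Return True if the ID/name pair agrees with the table excerpt."""
    return PHONEME_NAMES.get(code) == name
```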
