Phoneme Tool/data format

The phoneme editor embeds the following ASCII text block at the end of a .wav file:

PLAINTEXT
{
example sentence
}
WORDS
{
WORD example <start time> <end time>
{
<phoneme id> <phoneme name> <start time> <end time> 1
}
WORD sentence <as above>
{
<as above>
}
}
EMPHASIS
{
<time> <normalised value>
}
CLOSECAPTION
{
english
{
PHRASE unicode <size of text in bytes> <text, apparently UCS-2 or UTF-16, not NUL-terminated> <start time> <end time>
}
}
OPTIONS
{
voice_duck <1/0>
}

All sections are required, even if they are empty (as EMPHASIS often is).
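As an illustration of the layout above, here is a minimal Python sketch that reads the brace-delimited block into a nested tree and pulls out the words and their phonemes. The function names (`parse_sections`, `extract_words`) and the sample data are invented for this example, and it assumes that times are decimal numbers and that each header, brace and entry sits on its own line. It does not try to handle a CLOSECAPTION section containing binary text.

```python
# Hypothetical sample block; the timing values are made up for illustration.
SAMPLE = """PLAINTEXT
{
he
}
WORDS
{
WORD he 0.0 0.5
{
104 hh 0.0 0.2 1
105 iy 0.2 0.5 1
}
}
EMPHASIS
{
}
CLOSECAPTION
{
}
OPTIONS
{
voice_duck 0
}
"""


def parse_sections(text):
    """Return a list of (header, children) pairs.

    children is a list of the same kind of pairs when the header is
    followed by a { ... } block, or None for a plain line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def parse_body():
        nonlocal pos
        items = []
        while pos < len(lines) and lines[pos] != "}":
            header = lines[pos]
            pos += 1
            if pos < len(lines) and lines[pos] == "{":
                pos += 1  # consume "{"
                children = parse_body()
                pos += 1  # consume "}"
                items.append((header, children))
            else:
                items.append((header, None))
        return items

    return parse_body()


def extract_words(tree):
    """Collect WORD entries and their phonemes from a parsed tree."""
    sections = dict(tree)
    words = []
    for header, phoneme_lines in sections.get("WORDS") or []:
        _, text, start, end = header.split()
        phonemes = []
        for line, _ in phoneme_lines or []:
            pid, name, p_start, p_end, _flag = line.split()
            phonemes.append({"id": int(pid), "name": name,
                             "start": float(p_start), "end": float(p_end)})
        words.append({"text": text, "start": float(start),
                      "end": float(end), "phonemes": phonemes})
    return words


tree = parse_sections(SAMPLE)
words = extract_words(tree)
```

The trailing "1" on each phoneme line is read but ignored here, since its purpose is not documented.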

Todo: Purpose of the final "1" value of a phoneme.

Todo: For the closed-captioning section: what other language identifiers are valid? Are encodings other than "unicode" valid? Can a language block contain multiple PHRASE entries? Is this method of closed-captioning deprecated, does it supersede the method described here, or do the two exist alongside each other?
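Because the PHRASE size field counts bytes rather than characters, the two values differ for 16-bit text. The Python sketch below shows the arithmetic under the assumption that the text is little-endian UTF-16 without a byte-order mark; the actual byte order and encoding are not confirmed, and `phrase_size_bytes` is a name invented for this example.

```python
def phrase_size_bytes(text):
    """Size of the caption text in bytes, with no NUL terminator.

    Assumption: little-endian UTF-16 with no byte-order mark.
    """
    return len(text.encode("utf-16-le"))


caption = "example sentence"
size = phrase_size_bytes(caption)  # 16 characters, 2 bytes each
```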

Phoneme IDs

95 <sil>
97 aa2
98 b
100 d
101 ey
102 f
103 g
104 hh
105 iy
106 y
107 c
108 l
109 m
110 n
111 ow
112 p
114 r2
115 s
116 t
117 uw
118 v
119 w
122 z
230 ae
240 dh
331 nx
593 aa
596 ao
601 ax
602 er
603 eh
604 ax2
605 er2
609 g2
614 hh2
616 ih2
618 ih
619 l2
633 r
635 r3
638 d2
643 sh
650 uh
652 ah
658 zh
676 jh
679 ch
952 th
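A tool that writes phoneme lines needs to map each ID to its name. The Python sketch below uses a small subset of the table above (the full table would be filled in the same way) to format one phoneme line in the `<phoneme id> <phoneme name> <start time> <end time> 1` layout. The name `format_phoneme` and the three-decimal time format are assumptions made for this example.

```python
# Subset of the ID table above; extend with the remaining entries as needed.
PHONEME_NAMES = {
    95: "<sil>",
    104: "hh",
    105: "iy",
    115: "s",
    116: "t",
    603: "eh",
}


def format_phoneme(phoneme_id, start, end):
    """Build one phoneme line; raises KeyError for an ID not in the table."""
    name = PHONEME_NAMES[phoneme_id]
    # The final "1" is copied from the format description; its purpose is unknown.
    return f"{phoneme_id} {name} {start:.3f} {end:.3f} 1"


line = format_phoneme(104, 0.0, 0.2)
```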
