Phoneme Tool/data format
The phoneme editor embeds the following ASCII text block at the end of a .wav file:
PLAINTEXT
{
    example sentence
}
WORDS
{
    WORD example <start time> <end time>
    {
        <phoneme id> <phoneme name> <start time> <end time> 1
    }
    WORD sentence <as above>
    {
        <as above>
    }
}
EMPHASIS
{
    <time> <normalised value>
}
CLOSECAPTION
{
    english
    {
        PHRASE unicode <size of text in *bytes*. Text has no nul-termination.> <Text, formatted in what seems to be either UCS-2 or UTF-16> <start time> <end time>
    }
}
OPTIONS
{
    voice_duck <1/0>
}
All sections are required, even if they are empty (as emphasis often is).
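The WORDS layout above can be read with a small brace-depth scanner. The following Python is a minimal sketch, not the editor's own parser: it assumes the block has already been extracted from the .wav file as text, that times are decimal numbers, and that words contain no spaces. The names `Phoneme`, `Word` and `parse_words` are hypothetical, and the trailing "1" on each phoneme line is ignored because its purpose is unknown.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    code: int
    name: str
    start: float
    end: float


@dataclass
class Word:
    text: str
    start: float
    end: float
    phonemes: list = field(default_factory=list)


def parse_words(block: str) -> list:
    """Collect the WORD entries from the WORDS section of a text block."""
    words = []
    in_words = False  # True once the WORDS keyword has been seen
    depth = 0         # brace depth, counted from the WORDS section
    for raw in block.splitlines():
        ln = raw.strip()
        if not ln:
            continue
        if not in_words:
            in_words = (ln == "WORDS")
            continue
        if ln == "{":
            depth += 1
        elif ln == "}":
            depth -= 1
            if depth == 0:
                break  # closing brace of the WORDS section
        elif depth == 1 and ln.startswith("WORD "):
            # Assumes exactly: WORD <text> <start> <end>
            _, text, start, end = ln.split()
            words.append(Word(text, float(start), float(end)))
        elif depth == 2 and words:
            # Phoneme line; the fifth field (the final "1") is skipped.
            code, name, start, end = ln.split()[:4]
            words[-1].phonemes.append(
                Phoneme(int(code), name, float(start), float(end))
            )
    return words
```

Calling `parse_words(text)` returns one `Word` per WORD entry, each carrying its phonemes in file order. The other sections could be handled the same way, but CLOSECAPTION holds raw non-ASCII bytes, so it is safer to process that section as bytes rather than text.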
Todo: Determine the purpose of the final "1" value on each phoneme line.
Todo: For the closed-captioning section: what other language identifiers are valid? Are encodings other than "unicode" valid? Can there be multiple "phrases"? Is this method of closed-captioning deprecated, does it supersede the method described here, or do the two exist alongside each other?
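Because the PHRASE text is length-prefixed rather than terminated, a reader has to take exactly the stated number of bytes before looking for the times. The sketch below illustrates that idea only. It does not settle the open questions above: the `utf-16-le` default is an assumption (the actual encoding and byte order are unconfirmed), and the function name `parse_phrase` is hypothetical.

```python
def parse_phrase(line: bytes, encoding: str = "utf-16-le"):
    """Split one PHRASE line into (text, start, end).

    `line` is the raw bytes of the line, without a trailing newline.
    The encoding is a guess; pass another codec name to try alternatives.
    """
    prefix = b"PHRASE unicode "
    if not line.startswith(prefix):
        raise ValueError("not a PHRASE line")
    rest = line[len(prefix):]

    # The size field is ASCII digits followed by a single space.
    size_field, _, rest = rest.partition(b" ")
    size = int(size_field)

    # Take exactly `size` bytes: the text may itself contain bytes that
    # look like spaces or newlines, so it cannot be split on whitespace.
    raw, tail = rest[:size], rest[size:]
    text = raw.decode(encoding)

    # What remains should be the start and end times.
    start, end = (float(x.decode("ascii")) for x in tail.split())
    return text, start, end
```

For example, a line built from `"Hi".encode("utf-16-le")` (4 bytes) with times 0.0 and 1.5 parses to `("Hi", 0.0, 1.5)`.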
Phoneme IDs
- 95 <sil>
- 97 aa2
- 98 b
- 100 d
- 101 ey
- 102 f
- 103 g
- 104 hh
- 105 iy
- 106 y
- 107 c
- 108 l
- 109 m
- 110 n
- 111 ow
- 112 p
- 114 r2
- 115 s
- 116 t
- 117 uw
- 118 v
- 119 w
- 122 z
- 230 ae
- 240 dh
- 331 nx
- 593 aa
- 596 ao
- 601 ax
- 602 er
- 603 eh
- 604 ax2
- 605 er2
- 609 g2
- 614 hh2
- 616 ih2
- 618 ih
- 619 l2
- 633 r
- 635 r3
- 638 d2
- 643 sh
- 650 uh
- 652 ah
- 658 zh
- 676 jh
- 679 ch
- 952 th
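For tools that read or write phoneme lines, the list above can be kept as a lookup table. This is a minimal Python sketch; the names `NAME_BY_ID`, `ID_BY_NAME` and `check_phoneme` are hypothetical, and the table contains only the pairs listed on this page.

```python
# Phoneme IDs and names copied from the list above, as "id name" pairs.
_PHONEME_TABLE = """
95 <sil>  97 aa2   98 b     100 d    101 ey   102 f    103 g    104 hh
105 iy    106 y    107 c    108 l    109 m    110 n    111 ow   112 p
114 r2    115 s    116 t    117 uw   118 v    119 w    122 z    230 ae
240 dh    331 nx   593 aa   596 ao   601 ax   602 er   603 eh   604 ax2
605 er2   609 g2   614 hh2  616 ih2  618 ih   619 l2   633 r    635 r3
638 d2    643 sh   650 uh   652 ah   658 zh   676 jh   679 ch   952 th
"""

# Tokens alternate id, name, id, name, ...
_tokens = _PHONEME_TABLE.split()
NAME_BY_ID = {int(i): n for i, n in zip(_tokens[0::2], _tokens[1::2])}
ID_BY_NAME = {n: i for i, n in NAME_BY_ID.items()}


def check_phoneme(phoneme_id: int, name: str) -> bool:
    """Return True if the id/name pair matches the table."""
    return NAME_BY_ID.get(phoneme_id) == name
```

`check_phoneme(643, "sh")` is true, while a mismatched pair such as `check_phoneme(643, "s")` is false. As an aside, many of the IDs equal the Unicode code point of a related IPA symbol (643 is U+0283, ʃ), which may explain the numbering; this page does not state that.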