Task #4131
Task #3672: RA1d - Automatic cleaning of speech corpora
Task #3690: Annotation error detection
Task #3899: Submit a paper on anomaly-based annotation error detection (Jimp)
Task #4128: Final listening test based evaluation of annotation error detection
Select utterances containing units with detected annotation errors
Added by Matoušek Jindřich about 8 years ago.
Updated about 8 years ago.
- Analyze logs of synthesized utterances and grep for utterances that contain unit(s) with a detected annotation error
- Sort the filtered utterances according to the number of units with detected error
- Select the following utterances:
- 20 utterances with the most units that contain an error
- 20 utterances with just one unit containing an error
- 20 utterances with something in between (depending on the result of logging)
- Store texts and waveforms of the selected utterances
- Blocked by Task #4130: Prepare words with detected annotation errors added
- Blocked by Task #4129: Synthesize & log a large portion of text by TTS system with annotation errors added
- Blocks Task #4132: Synthesize the selected utterances by TTS system with/without the annotation errors added
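The selection steps in the description (filter, sort by the number of flagged units, then pick three groups) could be sketched roughly as follows. This is only an illustration: the function name `select_utterances`, the input mapping `error_counts` (utterance id to number of units with a detected error), and the choice to take the "in between" group from the middle of the remaining ranking are all assumptions, not something specified in this task.

```python
def select_utterances(error_counts, n=20):
    """Pick three groups of utterances by number of flagged units.

    error_counts: hypothetical dict mapping utterance id -> number of units
    with a detected annotation error (e.g. parsed from the synthesis logs).
    """
    # Keep only utterances that contain at least one flagged unit.
    flagged = {u: c for u, c in error_counts.items() if c > 0}

    # Sort by error count, highest first; ties are broken by utterance id
    # so that the result is deterministic.
    ranked = sorted(flagged, key=lambda u: (-flagged[u], u))

    # Group 1: utterances with the most flagged units.
    most = ranked[:n]
    most_set = set(most)

    # Group 2: utterances with exactly one flagged unit (not already taken).
    single = [u for u in ranked if flagged[u] == 1 and u not in most_set][:n]

    # Group 3: "something in between" -- assumed here to mean a window
    # around the middle of whatever remains after groups 1 and 2.
    taken = most_set | set(single)
    rest = [u for u in ranked if u not in taken]
    start = max(0, (len(rest) - n) // 2)
    middle = rest[start:start + n]

    return most, single, middle
```

With fewer candidates than requested, the groups simply come out shorter; how to handle that case would depend on the result of the logging.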
Words detected as containing annotation errors (and verified as actually misannotated) are attached here (#4130).
In annot-errors.detected.with_stats.txt, each word detected as containing an annotation error (in the file ...) is extended with the number of units from that word that were used during the synthesis of the large portion of text (see #4129). The list of words is sorted by the number of selected units.
- Status changed from Resolved to Feedback
- Assignee changed from Matoušek Jindřich to Tihelka Dan
The absolute numbers of units are fine, but it might be better to also specify the average number of units (containing annotation errors) per synthetic phrase.
- Status changed from Feedback to Resolved
- Assignee changed from Tihelka Dan to Matoušek Jindřich
The following items were selected:
- 20 "most frequent" items (a combination of the most frequently selected units in absolute and relative numbers) -- most_frequent.txt
- 20 "least frequent" items (a combination of the least frequently selected units in absolute and relative numbers) -- least_frequent.txt
- 20 "mean frequent" items (a combination of the moderately frequently selected units in absolute and relative numbers) -- mean_frequent.txt
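The note above does not say how the absolute and relative numbers were combined. One possible reading, shown here purely as a hedged sketch, is to rank the items separately by each number and order them by the sum of the two ranks. The names `combined_ranking` and `pick_groups`, the `stats` layout, and the rank-sum rule itself are assumptions made for illustration.

```python
def combined_ranking(stats):
    """Order items from most to least frequently selected.

    stats: hypothetical dict mapping item -> (absolute_count, relative_count).
    The combination rule (sum of the two ranks) is an assumption.
    """
    items = sorted(stats)  # alphabetical base order keeps ties deterministic
    by_abs = sorted(items, key=lambda w: -stats[w][0])
    by_rel = sorted(items, key=lambda w: -stats[w][1])
    abs_rank = {w: i for i, w in enumerate(by_abs)}
    rel_rank = {w: i for i, w in enumerate(by_rel)}
    # A lower rank sum means the item is frequent by both measures.
    return sorted(items, key=lambda w: (abs_rank[w] + rel_rank[w], w))


def pick_groups(ranked, n=20):
    """Take the top n, bottom n, and a middle window of n from a ranking."""
    most = ranked[:n]
    least = ranked[-n:]
    start = max(0, (len(ranked) - n) // 2)
    mean = ranked[start:start + n]
    return most, least, mean
```

The three returned lists would then correspond to most_frequent.txt, least_frequent.txt, and mean_frequent.txt, respectively.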