Task #4131
Task #3672 (closed): RA1d - Automatic cleaning of speech corpora
Task #3690: Annotation error detection
Task #3899: Submit a paper on anomaly-based annotation error detection (Jimp)
Task #4128: Final listening test based evaluation of annotation error detection
Select utterances containing units with detected annotation errors
0%
Description
- Analyze the logs of the synthesized utterances and extract (grep) the utterances that contain at least one unit with a detected annotation error
- Sort the filtered utterances by the number of units with a detected error
- Select the following utterances:
  - 20 utterances with the most units that contain an error
  - 20 utterances with exactly one unit containing an error
  - 20 utterances somewhere in between (depending on the result of logging)
- Store texts and waveforms of the selected utterances
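The selection steps above can be sketched roughly as follows. This is only an illustration: the log format (`unit_log` as a mapping from an utterance ID to the list of unit IDs used), the `error_units` set and the function name `select_utterances` are assumptions, not the actual TTS log structure. The "in between" group is taken from the middle of the remaining ranked utterances, which is just one possible reading of that item.

```python
def select_utterances(unit_log, error_units, per_group=20):
    """Pick utterances with the most, exactly one, and a moderate number of error units.

    unit_log    -- hypothetical format: dict {utterance_id: [unit_id, ...]}
    error_units -- set of unit IDs flagged as containing an annotation error
    """
    # Count flagged units per utterance and keep only utterances with at least one.
    counts = {}
    for utt_id, units in unit_log.items():
        n = sum(1 for u in units if u in error_units)
        if n > 0:
            counts[utt_id] = n

    # Sort by descending number of flagged units (utterance ID breaks ties).
    ranked = sorted(counts, key=lambda u: (-counts[u], u))

    most = ranked[:per_group]
    most_set = set(most)
    single = [u for u in ranked if counts[u] == 1 and u not in most_set][:per_group]

    # "In between": take a window from the middle of what is left.
    taken = most_set | set(single)
    pool = [u for u in ranked if u not in taken and counts[u] > 1]
    start = max(0, (len(pool) - per_group) // 2)
    middle = pool[start:start + per_group]

    return most, single, middle, counts
```

With fewer than 60 affected utterances, the groups simply come out shorter than `per_group`.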
Files
Related issues
Updated by Matoušek Jindřich about 8 years ago
- Blocked by Task #4130: Prepare words with detected annotation errors added
Updated by Matoušek Jindřich about 8 years ago
- Blocked by Task #4129: Synthesize & log a large portion of text by TTS system with annotation errors added
Updated by Matoušek Jindřich about 8 years ago
- Blocks Task #4132: Synthesize the selected utterances by TTS system with/without the annotation errors added
Updated by Matoušek Jindřich about 8 years ago
Updated by Tihelka Dan about 8 years ago
- File annot-errors.detected.with_stats.txt added
- Status changed from New to Resolved
- Assignee changed from Tihelka Dan to Matoušek Jindřich
In annot-errors.detected.with_stats.txt, each word detected as containing an annotation error (in the file ...) is extended with a number giving how many units from that word were used during the large-scale synthesis (see #4129). The list of words is sorted by the number of selected units.
Updated by Matoušek Jindřich about 8 years ago
- Status changed from Resolved to Feedback
- Assignee changed from Matoušek Jindřich to Tihelka Dan
The absolute numbers of units are fine, but it might be better to also give the average number of units (containing annotation errors) per synthetic phrase.
Updated by Tihelka Dan about 8 years ago
- W is the number of selections from the given word and
- P is the number of phrases it was used in.
The second column in annot-errors.detected.with_rels.txt corresponds to the first column in annot-errors.detected.with_stats.txt.
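For illustration, a relative figure can be derived from W and P as below. The exact formula used for annot-errors.detected.with_rels.txt is not shown in this note; the ratio W / P (average number of selections per phrase in which the word was used) is an assumption based on the request above, and the function name is hypothetical.

```python
def relative_selection_rate(w, p):
    """Assumed relative measure: selections from the word (W) per phrase it was used in (P)."""
    if p == 0:
        # The word was never used in any phrase, so no units were selected from it.
        return 0.0
    return w / p
```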
Updated by Tihelka Dan about 8 years ago
- Status changed from Feedback to Resolved
- Assignee changed from Tihelka Dan to Matoušek Jindřich
Updated by Matoušek Jindřich about 8 years ago
- File least_frequent.txt added
- File mean_frequent.txt added
- File most_frequent.txt added
- Status changed from Resolved to Closed
- 20 "most frequent" items (a combination of the most frequently selected units in absolute and relative numbers) -- most_frequent.txt
- 20 "least frequent" items (a combination of the least frequently selected units in absolute and relative numbers) -- least_frequent.txt
- 20 "mean frequent" items (a combination of the moderately frequently selected units in absolute and relative numbers) -- mean_frequent.txt
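How the absolute and relative numbers were combined is not stated here. One possible way, given purely as a sketch, is to rank the items by each number separately and sort by the sum of the two ranks. The function names and the input format (a dict mapping an item to its absolute and relative count) are assumptions.

```python
def combined_ranking(stats):
    """Order items by the sum of their absolute-count rank and relative-count rank.

    stats -- hypothetical format: dict {item: (absolute_count, relative_count)}
    """
    by_abs = sorted(stats, key=lambda w: (-stats[w][0], w))
    by_rel = sorted(stats, key=lambda w: (-stats[w][1], w))
    abs_rank = {w: i for i, w in enumerate(by_abs)}
    rel_rank = {w: i for i, w in enumerate(by_rel)}
    # A lower rank sum means the item is selected more often by both measures.
    return sorted(stats, key=lambda w: (abs_rank[w] + rel_rank[w], w))


def split_three(ranked, n=20):
    """Return the top n, a middle window of n, and the bottom n of a ranked list."""
    most = ranked[:n]
    least = ranked[-n:] if n else []
    start = max(0, (len(ranked) - n) // 2)
    mean = ranked[start:start + n]
    return most, mean, least
```

If the ranked list has fewer than 3 * n items, the three groups overlap, so the input should be large enough for the groups to stay distinct.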