Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY






Task #4131


Task #3672: RA1d - Automatic cleaning of speech corpora

Task #3690: Annotation error detection

Task #3899: Submit a paper on anomaly-based annotation error detection (Jimp)

Task #4128: Final listening test based evaluation of annotation error detection

Select utterances containing units with detected annotation errors

Added by Matoušek Jindřich about 8 years ago. Updated about 8 years ago.



  1. Analyze the logs of the synthesized utterances and extract (grep) the utterances that contain unit(s) with a detected annotation error
  2. Sort the filtered utterances by the number of units with a detected error
  3. Select the following utterances:
    1. 20 utterances with the most units containing an error
    2. 20 utterances with just one unit containing an error
    3. 20 utterances with an intermediate number of such units (depending on the logging results)
  4. Store the texts and waveforms of the selected utterances
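Steps 1 to 3 can be sketched roughly as below. This is only an illustration under assumptions: the real TTS log format is not given here, so the sketch assumes a hypothetical tab-separated log with one selected unit per line, "<utterance_id>\t<unit_id>\t<flag>", where flag is "1" if the unit comes from a word with a detected annotation error. The function names are hypothetical, and the "in between" group is simply taken from the middle of the remaining ranked list.

```python
from collections import Counter


def count_error_units(log_lines):
    """Step 1: count units with a detected annotation error per utterance.

    Assumes a hypothetical log format "<utterance_id>\t<unit_id>\t<flag>",
    one selected unit per line, flag "1" = unit from a misannotated word.
    """
    counts = Counter()
    for line in log_lines:
        utt_id, _unit_id, flag = line.rstrip("\n").split("\t")
        if flag == "1":
            counts[utt_id] += 1
    # Utterances without any erroneous unit never enter the counter,
    # so they are filtered out implicitly.
    return counts


def select_utterances(counts, n=20):
    """Steps 2 and 3: rank utterances and pick the three groups."""
    # Step 2: most erroneous units first; ties broken by utterance id.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Step 3.1: utterances with the most erroneous units.
    most = ranked[:n]
    # Step 3.2: utterances with exactly one erroneous unit.
    single = [kv for kv in ranked if kv[1] == 1][:n]
    # Step 3.3: from the remaining utterances with more than one
    # erroneous unit, take n from the middle of the ranking.
    used = {u for u, _ in most} | {u for u, _ in single}
    rest = [kv for kv in ranked if kv[0] not in used and kv[1] > 1]
    start = max(0, (len(rest) - n) // 2)
    between = rest[start:start + n]
    return most, single, between


# Tiny usage example with made-up log lines.
log = [
    "u1\ta\t1", "u1\tb\t1", "u1\tc\t0",
    "u2\td\t1",
    "u3\te\t0",
    "u4\tf\t1", "u4\tg\t1", "u4\th\t1",
]
most, single, between = select_utterances(count_error_units(log), n=1)
print(most, single, between)  # [('u4', 3)] [('u2', 1)] [('u1', 2)]
```

With a small log the groups can overlap (e.g. when every utterance has a single erroneous unit); with the real, large logs this should not matter.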


annot-errors.detected.with_stats.txt (13.9 KB), Tihelka Dan, 09.01.2017 13:37
annot-errors.detected.with_rels.txt (17.3 KB), Tihelka Dan, 10.01.2017 14:06
least_frequent.txt (1000 Bytes), Matoušek Jindřich, 10.01.2017 15:26
mean_frequent.txt (954 Bytes), Matoušek Jindřich, 10.01.2017 15:26
most_frequent.txt (1015 Bytes), Matoušek Jindřich, 10.01.2017 15:26

Related issues

Blocked by HQSYN16 - Task #4130: Prepare words with detected annotation errors (Closed; Matoušek Jindřich; 03.01.2017 to 06.01.2017)

Blocked by HQSYN16 - Task #4129: Synthesize & log a large portion of text by TTS system with annotation errors (Closed; Tihelka Dan; 03.01.2017 to 06.01.2017)

Blocks HQSYN16 - Task #4132: Synthesize the selected utterances by TTS system with/without the annotation errors (Closed; Tihelka Dan; 03.01.2017 to 13.01.2017)

#1

Updated by Matoušek Jindřich about 8 years ago

  • Blocked by Task #4130: Prepare words with detected annotation errors added

#2

Updated by Matoušek Jindřich about 8 years ago

  • Blocked by Task #4129: Synthesize & log a large portion of text by TTS system with annotation errors added

#3

Updated by Matoušek Jindřich about 8 years ago

  • Blocks Task #4132: Synthesize the selected utterances by TTS system with/without the annotation errors added

#4

Updated by Matoušek Jindřich about 8 years ago

Words detected as containing annotation errors (and confirmed to be actually misannotated) are attached in #4130.

#5

Updated by Tihelka Dan about 8 years ago

In annot-errors.detected.with_stats.txt, each word detected as containing annotation errors (in the file ...) is extended with the number of its units that were selected during the synthesis of the large portion of text (see #4129). The list of words is sorted by the number of selected units.
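The statistics in annot-errors.detected.with_stats.txt could be produced along the lines of the sketch below. It assumes, hypothetically, that the synthesis logs from #4129 have already been parsed into (phrase_id, word_id) pairs, one pair per selected unit, where word_id identifies the corpus word the unit was cut from; the function name and data layout are illustrative only, not the actual tooling.

```python
from collections import Counter


def word_unit_stats(selections, flagged_words):
    """Count how many units were selected from each flagged word.

    selections: iterable of (phrase_id, word_id) pairs, one per selected
        unit (a hypothetical pre-parsed form of the synthesis logs).
    flagged_words: ids of words detected as containing annotation errors.
    Returns (word_id, count) pairs sorted by count, highest first.
    """
    # Start every flagged word at zero so unused words stay in the list.
    counts = Counter({w: 0 for w in flagged_words})
    for _phrase_id, word_id in selections:
        if word_id in flagged_words:
            counts[word_id] += 1
    # Sort by the number of selected units; ties broken by word id.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


selections = [("p1", "w7"), ("p1", "w7"), ("p2", "w3"), ("p2", "w7"), ("p3", "w9")]
print(word_unit_stats(selections, {"w7", "w3", "w5"}))
# [('w7', 3), ('w3', 1), ('w5', 0)]
```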

#6

Updated by Matoušek Jindřich about 8 years ago

  • Status changed from Resolved to Feedback
  • Assignee changed from Matoušek Jindřich to Tihelka Dan

The absolute numbers of units are fine, but it might be better to also give the average number of units (containing annotation errors) per synthesized phrase.

#7

Updated by Tihelka Dan about 8 years ago

The attachment annot-errors.detected.with_rels.txt contains statistics similar to those in annot-errors.detected.with_stats.txt, except that the first number is W/(P + 1), where:
  • W is the number of units selected from the given word, and
  • P is the number of phrases in which the word was used.

The second column in annot-errors.detected.with_rels.txt corresponds to the first column in annot-errors.detected.with_stats.txt.
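A minimal sketch of the W/(P + 1) statistic, assuming the same hypothetical (phrase_id, word_id) pairs as input (one pair per selected unit). P is counted as the number of distinct phrases in which the word was used. Each output row mirrors the described layout: the relative value first, then the absolute count W, then the word id. Note that with the +1 the value stays defined even for a word used in no phrase (W = P = 0 gives 0).

```python
from collections import defaultdict


def relative_stats(selections, flagged_words):
    """Compute (W / (P + 1), W, word_id) rows for each flagged word.

    W: number of units selected from the word.
    P: number of distinct phrases in which the word was used.
    Input format (phrase_id, word_id) pairs is an assumption.
    """
    units = defaultdict(int)     # W per word
    phrases = defaultdict(set)   # distinct phrases per word
    for phrase_id, word_id in selections:
        if word_id in flagged_words:
            units[word_id] += 1
            phrases[word_id].add(phrase_id)
    rows = []
    for w in flagged_words:
        W = units[w]
        P = len(phrases[w])
        rows.append((W / (P + 1), W, w))
    # Highest relative value first; ties broken by word id.
    rows.sort(key=lambda r: (-r[0], r[2]))
    return rows


selections = [("p1", "w7"), ("p1", "w7"), ("p2", "w3"), ("p2", "w7"), ("p3", "w9")]
print(relative_stats(selections, {"w7", "w3"}))
# [(1.0, 3, 'w7'), (0.5, 1, 'w3')]
```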

#8

Updated by Tihelka Dan about 8 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Tihelka Dan to Matoušek Jindřich

#9

Updated by Matoušek Jindřich about 8 years ago

The following items were selected:
  • 20 "most frequent" items (a combination of the most frequently selected units in absolute and relative numbers) -- most_frequent.txt
  • 20 "least frequent" items (a combination of the least frequently selected units in absolute and relative numbers) -- least_frequent.txt
  • 20 "mean frequent" items (a combination of the moderately frequently selected units in absolute and relative numbers) -- mean_frequent.txt
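How exactly the absolute and relative numbers were combined is not stated here. The sketch below shows one possible combination, purely as an assumption: rank the words by the absolute count and by the relative value separately, average the two rank positions, and then take the top n, the bottom n, and n words from the middle of the combined ranking. The input rows follow the hypothetical (relative, absolute, word_id) layout used above.

```python
def combined_selection(rows, n=20):
    """Select 'most', 'least' and 'mean' frequent words.

    rows: (relative, absolute, word_id) tuples.
    Averaging the absolute and relative rank positions is an assumed
    combination rule, not necessarily the one used for the attachments.
    """
    by_abs = sorted(rows, key=lambda r: (-r[1], r[2]))
    by_rel = sorted(rows, key=lambda r: (-r[0], r[2]))
    rank_abs = {r[2]: i for i, r in enumerate(by_abs)}
    rank_rel = {r[2]: i for i, r in enumerate(by_rel)}
    # Lower average rank = more frequently selected overall.
    words = sorted(rank_abs, key=lambda w: ((rank_abs[w] + rank_rel[w]) / 2, w))
    most = words[:n]
    least = words[max(0, len(words) - n):]
    start = max(0, (len(words) - n) // 2)
    mean = words[start:start + n]
    return most, least, mean


rows = [(1.0, 3, "a"), (0.5, 4, "b"), (0.2, 1, "c"), (0.0, 0, "d"), (0.8, 2, "e")]
print(combined_selection(rows, n=1))  # (['a'], ['d'], ['e'])
```

If fewer than 3n words are available, the three groups overlap; the rule only behaves as intended for lists of at least 3n words.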
