Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
Datasets
- CSTR VCTK Corpus
- 8 speakers: 4 male and 4 female
- 12 experiments: cross-gender and intra-gender voice conversion (VC)
Reference system: based on [Hirokazu Kameoka et al., 2018], which outperforms the variational autoencoding GAN system.
1) Female to Male
p229 (Female) → p232 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p262 (Female) → p272 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p293 (Female) → p292 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p361 (Female) → p360 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
2) Male to Female
p232 (Male) → p229 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p272 (Male) → p262 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p292 (Male) → p293 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p360 (Male) → p361 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
3) Female to Female
p262 (Female) → p293 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p229 (Female) → p262 (Female)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
4) Male to Male
p272 (Male) → p292 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)
p232 (Male) → p272 (Male)
Source speech (natural) | Target speech (natural)
Input speech | Converted speech (Reference) | Converted speech (Proposed)