Authors: Mohammed Salah Al-Radhi, Omnia Ibrahim, Ali Raheem Mandeel, Tamás Gábor Csapó, Géza Németh
Inference: pretrained model
Fast Arabic Single-Speaker TTS
Phoneme Sequence #1: "lakin~a diraAsatahumo - >a^abotato >an~a Alomu$okilapa bimiSora - layosato faqaTo fiy kam~iy~api AlT~aEaAmi"
Natural
Tacotron_WaveGlow
FastSp2_HiFi
FastSp2_PWG
Phoneme Sequence #2: "yasotaDiyfu maEohadu AloEaAlami AloEarabiy~i fiy baAriysa - maEoriDAF biEunowaAni - kaAna yaA makaAn - qiTaAru Al$~aroqi Als~ariyEu"
Natural
Tacotron_WaveGlow
FastSp2_HiFi
FastSp2_PWG
Phoneme Sequence #3: "AloHimoDiy~aAtu gany~apN bimukawonaAtK SiH~iy~apK lijisomi AloinosaAni"
Natural
Tacotron_WaveGlow
FastSp2_HiFi
FastSp2_PWG
Phoneme Sequence #4: "watu&ak~idu EaAlimapu Aln~afosi >an~a Alo>asobaAba AlomanoTiqiy~apa AlomuHaf~izapa EalaY mumaArasapi Alr~iy~aADapi - laA takofiy waHodahaA"
Natural
Tacotron_WaveGlow
FastSp2_HiFi
FastSp2_PWG
Phoneme Sequence #5: "ak~ada AlokaAtibu waAln~aAqidu"
Natural
Tacotron_WaveGlow
FastSp2_HiFi
FastSp2_PWG
Visualization
The image below shows spectrograms and pitch contours extracted from natural and synthesized speech samples. Top-left: ground truth; top-right: Tacotron2; bottom-right: FastSp2-HiFi; bottom-left: the developed FastSp2-PWG.
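As a rough illustration of how a pitch contour like the ones in the figure can be obtained, the sketch below estimates the fundamental frequency frame by frame with a plain autocorrelation search. This page does not say which pitch extractor produced the figure, so the method, the function names (`estimate_f0`, `pitch_contour`), and all parameter values here are assumptions chosen for the example. It covers only the pitch contour, not the spectrogram.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame via autocorrelation peak picking.

    Hypothetical helper for illustration. Returns 0.0 for a frame
    with no energy, which is treated here as unvoiced.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    if sum(s * s for s in x) == 0.0:
        return 0.0

    # Search only lags that correspond to plausible speech pitch.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return sample_rate / best_lag if best_lag else 0.0


def pitch_contour(signal, sample_rate, frame_len=400, hop=200):
    """Return one F0 estimate per frame (assumed frame and hop sizes)."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Synthetic check: a 200 Hz sine sampled at 8 kHz, 0.2 s long.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200.0 * t / SAMPLE_RATE) for t in range(1600)]
contour = pitch_contour(tone, SAMPLE_RATE)
```

Real speech needs more than this: a voicing decision, octave-error handling, and smoothing across frames. Dedicated pitch trackers are the usual choice for figures like the one on this page.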