UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

Anonymous submission to ICLR 2025

Audio Samples

!!! Wearing headphones is strongly recommended to judge the audio quality !!!

Speech Tokenization & Resynthesis

Samples are randomly selected from the LibriSpeech test-clean subset.
         
Ground Truth (256k bps) | SpeechTokenizer (500 bps) | HuBERT + Unit-HiFiGAN (500 bps) | UniWav (500 bps) | SpeechTokenizer (1k bps) | UniWav (1k bps)
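
For readers unfamiliar with how the bitrates in the column headers are typically derived, the arithmetic can be sketched as below. This is only an illustration: the 16 kHz, 16-bit mono format, the 50 Hz token rate, and the 1024-entry codebook are assumptions chosen because they reproduce the listed figures, not values stated on this page, and the helper names `pcm_bitrate` and `token_bitrate` are hypothetical.

```python
import math


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def token_bitrate(frame_rate_hz: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bitrate of a discrete token stream in bits per second.

    Each token carries log2(codebook_size) bits; num_codebooks tokens
    are emitted per frame.
    """
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)


# Assumed: 16 kHz, 16-bit mono waveform.
print(pcm_bitrate(16_000, 16))        # 256000

# Assumed: 50 tokens per second from a single 1024-entry codebook.
print(token_bitrate(50, 1024))        # 500.0

# Assumed: same rate with two codebooks per frame.
print(token_bitrate(50, 1024, 2))     # 1000.0
```

Other combinations of token rate and codebook size can yield the same bitrate, so the numbers above should be read as one consistent configuration rather than the settings of any particular system.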

In-context Text-to-Speech

Samples are randomly selected from the LibriSpeech test-clean subset. The first 3 seconds of each ground-truth utterance are used as the audio prompt.
Text | UniWav | Ground Truth
on arriving at home at my own residence i found that our salon was filled with a brilliant company
at the inception of plural marriage among the latter day saints there was no law national or state against its practise
we are losing time and the fact is i have not come all this way to take a little sail upon a pond on a raft
it was the first great sorrow of his life it was not so much the loss of the cotton itself but the fantasy the hopes the dreams built around it
for some years it was not found feasible to operate motors on alternating current circuits and that reason was often urged against it seriously
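
The prompt construction described for the in-context text-to-speech samples (the first 3 seconds of each ground-truth utterance) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_prompt`, the 16 kHz sample rate, and the plain-list waveform are assumptions, and the page does not say how the remainder of the utterance is used.

```python
def split_prompt(samples, sample_rate_hz, prompt_seconds=3.0):
    """Split a waveform into (prompt, remainder) at prompt_seconds.

    samples is any sliceable sequence of audio samples (mono).
    If the utterance is shorter than the prompt length, the whole
    utterance becomes the prompt and the remainder is empty.
    """
    n_prompt = int(round(prompt_seconds * sample_rate_hz))
    n_prompt = min(n_prompt, len(samples))
    return samples[:n_prompt], samples[n_prompt:]


# Hypothetical 5-second silent utterance at an assumed 16 kHz sample rate.
waveform = [0.0] * (16_000 * 5)
prompt, remainder = split_prompt(waveform, 16_000)
print(len(prompt), len(remainder))  # 48000 32000
```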