Introducing Whisper

Other existing approaches frequently use smaller, more closely paired audio-text training datasets,[^reference-1][^reference-2][^reference-3] or use broad but unsupervised audio pretraining.[^reference-4][^reference-5][^reference-6] Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper’s zero-shot performance across many diverse datasets, we find it is much more robust and makes 50% fewer errors than those models.
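To make the "50% fewer errors" comparison concrete, here is a minimal sketch of how such a figure could be computed. It assumes errors are counted as word error rate (WER), the usual speech recognition metric; the text above does not name the metric, and the function names `word_error_rate` and `relative_error_reduction` are hypothetical helpers written for this illustration, not part of Whisper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization with no text normalization.
    Real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under this reading, a model that lowers the average WER from 0.20 to 0.10 across a set of datasets has a relative error reduction of 0.5, which is one way to interpret "50% fewer errors".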

About a third of Whisper’s audio dataset is non-English, and it is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation and outperforms the supervised SOTA on CoVoST2 to English translation zero-shot.

Date: 2022-09-21 03:00:00


Alina A, Toronto
Alina A is a UofT graduate and Google Certified Cyber Security analyst, currently based in Toronto, Canada. She is passionate about research and about writing on cybersecurity issues, trends, and concerns in an emerging digital world.

