iQIYI Holds World's First Low-Resource Voice Cloning Challenge to Accelerate Development of AI Voice Technology

BEIJING, Dec. 15, 2020 /PRNewswire/ -- iQIYI Inc. (NASDAQ: IQ) ("iQIYI" or the "Company"), an innovative market-leading online entertainment service in China, is pleased to announce that it has partnered with multiple organizations to hold a Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC) scheduled to run from 27 November 2020 to 11 February 2021.

M2VoC aims to enhance the quality of synthetic speech while reducing the dependence on the quantity and quality of training datasets. The Company hopes that participants can improve the intelligibility and naturalness of synthetic speech even under conditions in which there are limited resources.

iQIYI released detailed guidelines for M2VoC, the first low-resource voice cloning challenge in the world, on 27 November. Organized by a team of iQIYI experts and a number of organizations, the challenge aims to provide a common dataset and a fair test platform to facilitate research on voice cloning tasks.

As an ICASSP2021 Signal Processing Grand Challenge, M2VoC encourages researchers from academia and the computing industry to participate.

The competition comprises two categories: the 'few-shot' category and the 'one-shot' category. Target speakers for voice cloning validation and evaluation are provided for both categories.

In the few-shot category, each speaker has a different speaking style, with 100 samples available per speaker. In the one-shot category, each speaker has a different speaking style, with only 5 samples available per speaker.

For both categories, contestants will be provided with two base datasets for training a base model, each containing 5,000 training samples in a range of speech styles.

Winners in each category will be selected based on a weighted score across four criteria: speaker similarity, speech quality, style/expressiveness and pronunciation accuracy.
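As an illustration only, a weighted score over the four criteria could be computed along the lines of the Python sketch below. The announcement does not state the actual weights or the rating scale, so the equal weights, the dictionary keys, and the function names `weighted_score` and `rank_entries` are all hypothetical.

```python
# Hypothetical weights: the announcement names the four criteria
# but does not publish how they are weighted. Equal weights are assumed.
ASSUMED_WEIGHTS = {
    "speaker_similarity": 0.25,
    "speech_quality": 0.25,
    "style_expressiveness": 0.25,
    "pronunciation_accuracy": 0.25,
}


def weighted_score(scores, weights=ASSUMED_WEIGHTS):
    """Return the weighted average of per-criterion scores.

    `scores` maps each criterion name to a numeric rating. The result is
    normalized by the sum of the weights, so the weights need not sum to 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted criteria")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


def rank_entries(entries, weights=ASSUMED_WEIGHTS):
    """Return entry names ordered from highest to lowest weighted score."""
    return sorted(
        entries,
        key=lambda name: weighted_score(entries[name], weights),
        reverse=True,
    )
```

Under this sketch, each category would be ranked separately by passing that category's entries to `rank_entries`; changing the assumed weights changes how strongly each criterion influences the ranking.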

As an innovative technology in the field of artificial intelligence (AI), speech synthesis is essential for creating a good interactive experience. With valuable applications in areas such as voice assistants, broadcasting and audiobooks, it is a fast-growing field. The global market for speech recognition and speech-related technologies is projected to expand to $16 billion in the next seven to eight years, at a compound annual growth rate of 16%, according to market research firm Global Industry Analysts.

Thanks to deep learning, speech synthesis can now produce very realistic and natural-sounding speech in specific areas. However, the technology requires large volumes of training data recorded under highly demanding conditions. As a result, technological advancement in the field has been hindered by the capital and time required for dataset creation. There is still much room for improvement in the expressiveness and robustness of synthetic speech across different speakers and styles, especially in real-world or low-resource conditions. iQIYI hopes that M2VoC will help to address these issues and accelerate the development of AI voice technology.

The competition will also drive the development of cutting-edge technologies such as voice cloning and speech recognition, further broadening the application scope of AI and creating new opportunities in the audiovisual industry. Through this challenge, iQIYI hopes to team up with talented researchers and build solutions for low-resource voice cloning with advanced deep-learning technology and multi-stylistic voice morphing technology. The Company also anticipates that M2VoC will further elevate the interactive experience of video and drive the development and application of voice cloning technology.

In recent years, iQIYI has been leveraging AI to enable content creation, enhance users' entertainment experience and improve iQIYI's growing entertainment ecosystem. iQIYI's AI technology is currently applied across the full workflow, including content creation, production, distribution and commercialization. In the years ahead, iQIYI will continue to explore AI voice technology, unlocking its tremendous potential for use in the multimedia entertainment industry so that the Company can create a better audiovisual world for its users.