OpenAI's Sora pours 'cold water' on China's AI dreams, as text-to-video advancements prompt more soul-searching


OpenAI's recent text-to-video model Sora has fired a fresh warning shot to China about its gap with the world's top artificial intelligence (AI) technologies. It has triggered questions about why the country has no equivalent product, echoing the soul-searching that local researchers and investors went through after the 2022 launch of ChatGPT.

Just a few years ago, China had envisioned itself eventually dominating the global AI race by leveraging the country's vast troves of data to develop mature applications for functions like facial recognition. Recent developments in generative AI - which uses large models to produce content like text, images and video - have changed the calculus, making China look like a laggard once again.

Sora, launched on February 16, moves the AI battle into the realm of video generation just as China is facing greater challenges from a lack of access to key tools such as advanced graphics processing units (GPUs) developed by Nvidia, the leading AI chip designer, owing to escalating US export restrictions. The country's best AI players are already a number of years behind their American peers in generative AI, an area in which Beijing's self-trumpeted internet governance model looks like a liability.


Zhou Hongyi, the founder of Chinese internet security firm 360 Security Technology, which has joined China's race to launch its own ChatGPT-style large language model, said the introduction of Sora was like a "barrel of cold water poured down China's head", Chinese media outlet Yicai reported on Friday. "It cools down the heads of many people, forcing us to see the gap with leaders overseas," he added.

In one knee-jerk response to Sora this week, Beijing asked its most trusted state-owned enterprises to take a lead on AI. The State Council's State-owned Assets Supervision and Administration Commission on Monday urged firms under direct control of the central government to "embrace the profound changes brought about by AI". Ten of these firms were designated as champions to promote AI, but the watchdog did not name the selected companies.

Xie Saining, an assistant professor of Computer Science at Courant Institute of Mathematical Sciences at New York University, denied he was involved in the development of Sora and emphasised the importance of talent, data and computing power. In a widely reported social media post, Xie asked whether China is ready for Sora, saying the country should make sure the technology "won't be abused to serve as a profiteering and manipulation tool by some people or groups".

Sora's access is currently limited. Unlike some of OpenAI's earlier models, it is not open source, and only a small number of people have access to a trial of the model.

In mainland China, the national Cyberspace Administration requires all publicly available large language models (LLMs) to be registered with the authority. OpenAI does not make its services directly available in the mainland or Hong Kong, nor does Google make its Gemini AI product available in those markets. Microsoft's Copilot, which uses OpenAI's GPT models, is available in Hong Kong.

The absence of foreign players in the mainland has left several local tech giants jostling for position in a crowded market of over 200 LLMs. Chinese search giant Baidu, social media behemoth Tencent Holdings, and e-commerce king Alibaba Group Holding, which owns the South China Morning Post, have all unveiled their own LLMs.

Few are able to match Sora, however, partly because they are not yet using the novel Diffusion Transformer (DiT) architecture.

ByteDance, the Beijing-based owner of TikTok, said its in-house video motion control tool Boximator, used to assist video generation, is still in its infancy and not ready for mass release. "It still has a big gap with leading video generation models in terms of image quality, fidelity and duration."

Rather than matching Sora, however, some industry insiders see the more pressing issue as gaining access to OpenAI's model. Beijing-based Sinodata said it will be one of the first companies to apply for a Sora API subscription once the text-to-video tool becomes available on Azure, the cloud computing platform of Microsoft, which is OpenAI's biggest backer.

In the US, though, lawmakers are already looking at ways of curbing China's access to American AI cloud services.

Meanwhile, London-based unicorn Stability AI released its text-to-image model Stable Diffusion 3, which also uses DiT, an architecture that might become mainstream for building generative AI following the popularity of Sora. A Chinese developer, who declined to be named, said a likely path for Chinese AI engineers is to "first decode Sora and train it with their own data to churn out a similar product".

Xu Liang, an AI entrepreneur based in Hangzhou, eastern Zhejiang province, said it will not be long before China has similar services. "As soon as in the next one or two months, there will be Sora-like models coming out of the Chinese market and plenty in the next half year," he said. But Xu noted that there could still be a non-negligible gap between Chinese products and Sora.

Wang Shuyi, a professor who focuses on AI and machine learning at Tianjin Normal University (TJNU), said the experience of developing LLMs in the past year has allowed the Chinese Big Tech firms to build up their know-how in this area and stock up on necessary hardware, giving them the ability to produce Sora-like products in the next six months.

The Sora launch has prompted speculation about the secret behind its impressive output. Xie, the New York University professor and one of the two developers of DiT, tweeted that "data is likely the most critical factor for Sora's success". He estimated that Sora might have around 3 billion parameters.

"If true, this is not an unreasonable model size," he wrote. "It could suggest that training the Sora model might not require as many GPUs as one would anticipate - I would expect very fast iterations going forward."

A few months before Sora was out, a group of researchers launched VBench, a benchmarking tool designed to evaluate the performance of video generation models such as Runway's Gen-2 and Pika. Across 16 dimensions, Gen-2 stood out in areas including imaging quality and aesthetic quality, but was weak in dynamic range and appearance style. Pika, co-founded by Chinese PhD candidate Guo Wenjing at Stanford University, was best at background consistency and temporal flickering but needed improvements in imaging quality.

The VBench team, consisting of researchers from Singapore's Nanyang Technological University and Shanghai Artificial Intelligence Laboratory in China, found that Sora excels in overall video quality when compared with other models, based on the demos provided by OpenAI. There is limited information on how the model transforms text prompts into videos.

Baidu chairman and CEO Robin Li Yanhong discusses the company's Ernie Bot during the Baidu World conference in Beijing on October 17, 2023. Photo: Bloomberg

Lu Yanxia, research director for emerging technology at IDC China, said tech giants such as Baidu, Alibaba and Tencent will be among the first to roll out similar services in the country. Local AI players iFlyTek, SenseTime and Hikvision - all sanctioned by Washington - will also be in the race, she said.

But China still faces an uphill battle, as the country's tech market becomes increasingly walled off from the world in terms of capital, hardware, data and even people, according to analysts.

The market value gap between China's top tech firms and US peers such as Microsoft, Google and Nvidia has widened significantly in recent years, since Beijing decided to kneecap its tech giants in the name of reining in the "irrational expansion of capital".

And while China was once seen as having an advantage in its quantity of data, Lu said the country now faces a scarcity of quality data needed to train these newer models, compounding challenges from its limited access to advanced chips. A lack of talent is another concern, according to Lu, as the country's best and brightest in AI often find it easier to shine working for leading players in the US.

At OpenAI, for instance, tech professionals with an educational background from China form a key group. Among OpenAI's 1,677 associated members on LinkedIn, 23 of them studied at China's Tsinghua University, the ninth most common tertiary education institution among the start-up's employees, beating out the University of Cambridge and Yale University.

Stanford University, the University of California, Berkeley, and the Massachusetts Institute of Technology are the top three institutions among OpenAI workers, with 88, 80 and 59 employees, respectively, listing those schools on their LinkedIn profiles.

Even with the requisite talent, though, experts question how far China's home-grown generative AI can go while facing existing constraints from US-China trade tensions.

Ping An Securities warned in a report that continued semiconductor export restrictions from the US "may accelerate the maturity of the domestic AI chip industry", but "home-grown alternatives may fall short of expectations".

Washington has blocked Chinese companies from accessing the world's most advanced semiconductor tools through restrictions on related products that include any US-origin technology. In October, the US again tightened those restrictions, blocking the mainland's access to GPUs that Nvidia had specifically designed for Chinese clients in response to earlier curbs.

Alexander Harrowell, principal analyst for advanced computing at technology research and advisory group Omdia, noted that China has options beyond GPUs for training LLMs. "You could use Google's TPU [Tensor Processing Unit], Huawei's Ascend, AWS's Trainium, or one of quite a few start-ups' products," he said.

But replacing GPUs comes at a cost. "The further you go from the GPU route, the more effort it will cost you in software development and systems administration," Harrowell said.

There will also be opportunities specifically for the China market, according to Xu, the Hangzhou-based entrepreneur. "With the publication of the technical report on Sora, and upcoming open-source video models, there will be groundwork for the Chinese players to learn from," he said. Local video models will have better support for the Chinese language, he added.

TJNU's Wang noted that one of the Sora demo videos involves a scene of a dancing Chinese dragon, which he found to be a stereotypical depiction of the activity. China's numerous ethnic groups, folk traditions, customs, and geographic diversity offer a wealth of material for local video models to draw from to better cater to local users, he said.

Wang also balked at the idea that there is an "insurmountable divide" between Chinese and American AI.

"Would Chinese companies rather just follow suit and crank out rip-offs every time their US peers come up with a novel product, or would they rather set a bigger goal to strive for safe artificial general intelligence?" Wang asked.

This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.
