The ugly truth about ‘open’ A.I. models championed by Meta, Google, and other Big Tech players

Hello and welcome back to Eye on A.I. In a paper published this past week, researchers from Carnegie Mellon University and the AI Now Institute, along with Signal Foundation President Meredith Whittaker, dove deep into what exactly is—and is not—open about current “open” A.I. systems.

From Meta’s LLaMA-2 to OpenAI’s various models, many of the A.I. technologies being released are touted by their corporate creators as “open” or “open source,” but the authors argue that many of them aren’t so open after all, and that these terms are used in confusing and diverse ways that have more to do with aspiration and marketing than with any technical reality. The authors also interrogate how, given the vast differences between large A.I. systems and traditional software, even the most maximally “open” A.I. offerings do not ensure a level playing field or democratize A.I.; in fact, large companies have a clear playbook for using their open A.I. offerings to reap the benefits of owning the ecosystem and capture the industry.

“Over the past months, we’ve seen a wave of A.I. systems described as ‘open’ in an attempt at branding, even though the authors and stewards of these systems provide little meaningful access or transparency about the system,” the authors told Eye on A.I., adding that these companies claim “openness” while withholding key details about their A.I. systems, from model size and model weights to basic information about the training data used.

The paper comes amid a growing conversation about the reality of open source in the A.I. world, from recent opinion pieces calling out supposedly open-source A.I. systems for not actually being so, to backlash from Hugging Face users who were disappointed when the license for one of the company’s open-source projects was changed after the fact.

In the paper, the researchers break down their findings by category, including development frameworks, compute, data, labor, and models. Looking at LLaMA-2 as one example, the authors call Meta’s claims that the model is open source “contested, shallow, and borderline dishonest,” pointing out that it fails to meet key criteria for what is conventionally considered open source: its license, for instance, was written by Meta and isn’t recognized by the Open Source Initiative.

A crucial point of the paper is its discussion of how mass adoption of corporate giants’ A.I. systems further entrenches their ownership of the entire landscape, in turn chipping away at openness and giving them immense indirect power. In evaluating Meta’s PyTorch and Google’s TensorFlow, the two dominant A.I. development frameworks, the authors acknowledge that these frameworks speed up deployment for those who use them, but argue that they do so to the massive benefit of Meta and Google.

“Most significantly, they allow Meta, Google, and those steering framework development to standardize AI construction so it’s compatible with their own company platforms—ensuring that their framework leads developers to create AI systems that, Lego-like, snap into place with their own company systems,” reads the paper. The authors add that this lets these companies build onramps to their profitable compute offerings and shape the work of researchers and developers.

The takeaway is that, in A.I., labels like “open source” are not necessarily fact but rather language chosen by executives at powerful companies whose goals are to proliferate their technologies, capture the market, and boost their revenue.

And the stakes are high as these companies integrate A.I. into more of our world and governments rush to regulate them. Beyond the recent proliferation of not-so-open “open” A.I. efforts, the authors said it was lobbying by these companies that prompted them to undertake this research.

“What really set things off was observing the significant level of lobbying coming from industry players—like the Business Software Alliance, Google, and Microsoft’s GitHub—to seek exemption under the EU AI Act,” the authors said. “This was curious, given that these were the same companies that would, according to much of the rhetoric espousing ‘open’ AI’s benefits, be ‘disrupted’ were ‘open’ AI to proliferate.”

Overall, it’s not just about the muddiness and lack of definition around terms like “open” and “open source,” but about how they’re being used (or misused) by companies and how they can influence the laws that will guide this field and everything it touches going forward. Not to mention, some of these are the same companies currently being sued for stealing the data that made these very technologies possible.

“‘Open’ AI has emerged as a ‘rhetorical wand’ that, due to its ill-defined nature, allows it to mean many things to many people, which is useful in the context of fierce high-stakes regulatory debates,” the authors said.

Sage Lazzaro
sage.lazzaro@fortune.com
sagelazzaro.com


Programming note: Gain vital insights on how the most powerful and far-reaching technology of our time is changing businesses, transforming society, and impacting our future. Join us in San Francisco on Dec. 11–12 for Fortune’s third annual Brainstorm A.I. conference. Confirmed speakers include such A.I. luminaries as PayPal’s John Kim, Salesforce AI CEO Clara Shih, IBM’s Christina Montgomery, Quizlet CEO Lex Bayer, and more. Apply to attend today!
