J. W. Mason is Associate Professor of Economics at John Jay College, City University of New York and a Fellow at the Roosevelt Institute
Cross-posted from J. W. Mason’s Blog
Creative Commons Attribution-Share Alike 2.0 Generic license
I wanted to put down some thoughts on Large Language Models (LLMs), or so-called artificial intelligence. I apologize that this post is not going to include any links or quotes or data. It’s just an effort to work something out in my own head, something that, as far as I can tell (though it’s very likely I’ve overlooked it), has not been spelled out anywhere else in the discussion.
It’s a point that is, on one level, obvious, but one that I feel does not get sufficiently foregrounded: LLMs are, as the name says, language models. Given a corpus of text, they create a set of probabilities such that, for any given input, you can calculate the probability that a certain word should come next. They are, in other words, tools for transforming material that people have put up on the internet.
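To make the idea concrete, here is a toy sketch in Python. It assumes the simplest possible language model, a bigram model that just counts which word follows which in a corpus; real LLMs use neural networks over subword tokens and condition on much longer contexts, so this illustrates the principle, not how ChatGPT or any particular model is built. The function names `train_bigram` and `next_word_probability` are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict

# Toy bigram model (hypothetical illustration only): real LLMs use neural
# networks over tokens, but the output is likewise next-word probabilities.

def train_bigram(corpus: str) -> dict[str, dict[str, float]]:
    """Count which word follows which in the corpus, then normalize to probabilities."""
    words = corpus.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {w: n / total for w, n in followers.items()}
    return model

def next_word_probability(model: dict[str, dict[str, float]], current: str, candidate: str) -> float:
    """P(candidate | current); zero if the pair never appeared in the corpus."""
    return model.get(current.lower(), {}).get(candidate.lower(), 0.0)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(next_word_probability(model, "the", "cat"))  # 0.5
print(next_word_probability(model, "the", "dog"))  # 0.0
```

Everything the toy model "knows" is a tally of text someone already wrote: "the" is followed by "cat" in two of its four appearances, and a word that never appears in the corpus, like "dog", gets no probability at all.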
On one level, again, everyone knows this. It’s what critics mean when they call these programs “stochastic parrots.” It’s what the companies that make them are thinking about when they talk about the problem of training data. But I don’t think we think about it enough when we think about what these things actually are.
We’ve been primed by generations of science fiction stories to imagine machines that think, whether well or poorly, helpfully or malevolently. But maybe we would have a better understanding of LLMs – of what they do well as well as what they do poorly or not at all – if we thought of them not as thinking machines but as windows: windows onto the thinking that people have already done.
There is no thinking going on when you enter a query into ChatGPT, if by thinking we mean manipulating an abstract model of the world and then expressing the result in words. With the LLM, the words are all there are. The reason an LLM can answer a factual question is that someone has posted text on that specific question. The reason they make nice pictures is that there are an immense number of pictures on the web, with descriptive text attached. The reason they are such good coding buddies is that people have posted immense numbers of code snippets (and also that code is so nicely grammatical).
If you’re impressed that an LLM can give you a stat block for your DnD campaign (one genuinely positive use case I’ve seen) or answers to your economics homework or text for a form letter, what you should really be impressed by is that so many people have posted versions of exactly that over the past 30 years.
People talk about the software and the chips. And sure, it does need a whole lot of chips. But the real secret is that people have posted this immense amount of useful text on the web, for free. That’s where the magic comes from.
OK, they didn’t post it all for free – a lot of it was produced for money. But none of the text that LLMs draw on was produced for sale to LLMs. All of it is free from their point of view. What they are drawing on is the positive externalities of people communicating with each other, for their own reasons, on the web. What LLMs are doing, fundamentally, is reaping the benefits of a vast spontaneous, directly social, decommodified, decentralized production of use values.
When we look at the useful stuff that LLMs give us, we should not think, “how cool this technology is.” We should think, “what an amazing range of useful work people are willing to share online, freely, without any monetary compensation.” The machine is the least interesting part. It’s just summarizing it for us.
What makes LLMs work as a business is precisely that all this text is decommodified: as far as they’re concerned, it’s free. As they themselves say, they’d have to shut down if they had to pay for their training data. Yet all that data is the product of human labor. This cutting edge of capitalism – the biggest part of new business investment – rests on a substrate of communism.
People who criticize OpenAI and the rest of these companies for not adhering to copyrights are completely correct about their hypocrisy, and about the inconsistent application of the law. But they mostly get the correct resolution backward, in my opinion. Where we want to get to is a world where information is free for everyone, not one where OpenAI and company also respect the gates. You might ask: why does that follow? To which I would say: LLMs themselves demonstrate the value of making content, in the broadest sense, universally available for free.
The lesson we should be taking from LLMs is the immense social value there is in having all kinds of material – all kinds of products of human intellectual labor – freely available online. They should be reminding us of the early utopian promise of the web.
But now we must turn this around. The other side, of course – of course! – is that the companies making LLMs are not doing so with the goal of more easily sharing the material that people have made freely available on the web. They are doing so with the goal of enclosing it, of converting the products of free human activity into commodities.
The problem we have to deal with is that these companies are selling access to the freely shared products of human social activity, as the product of their own particular capitals. (And also that they are encouraging people to use it for dumb or pointless or socially destructive purposes.)
Worse: The project of enclosing and commodifying the world of online communication destroys what made it valuable in the first place. It’s the opposite of the tragedy of the commons – as if the villagers’ animals grazing on the green were what fertilized it and kept it valuable. This case, where joint use of common resources maintains rather than degrades them, is, I suspect, the more usual one in traditional farming and pastoral communities. In any case, it certainly applies to the information commons: private appropriation is incompatible with the collective activity that maintains it. You can’t expect people to keep posting on Reddit if all they hear back is AI slop.
Still, I think it’s important – especially for those of us who are deeply skeptical of “AI” as a business – to not lose sight of the genuinely positive and transformative aspect of this technology: the window it gives onto the possibilities of free, decommodified cooperative production.
The great debate going forward is not about this specific technology (though it is, to be clear, about its enormous energy demands). The real question is about the conditions under which people will continue to be able to share the products of intellectual work with each other on the web. The issue is not what “AI” will or will not do. The issue is how we can take advantage of the tremendous opportunities for sharing the products of actual human intelligence, which were opened up by the internet but have been increasingly closed off by its commercial overlords.