A legal and ethical battle is raging between AI companies scraping data and creators demanding compensation for their work.
John Walter, Founder of ProxyLink and President of the Contact Center AI Association, explains that the core issue isn't AI learning from content, but rather how it illicitly accesses data behind paywalls.
New tools from companies like Cloudflare aim to solve this by helping publishers enforce their perimeters, shifting the focus from policing AI output to controlling data access.
The internet is in the midst of a great rebalancing. On one side, AI companies are on a voracious, seemingly unstoppable quest for data to train their models. On the other, publishers and creators are watching decades of work get scraped and repurposed to build AI products that fuel trillion-dollar companies, with no compensation flowing back to the people who made the work. The result is a legal and ethical quagmire, where the laws of yesterday are struggling to govern the technology of tomorrow.
Into this fray steps Cloudflare, the internet infrastructure giant, with a potential solution: a new feature called Pay-Per-Crawl. Launched earlier this year, the tool gives website owners new control over AI crawlers, allowing them to block access or charge for it. Its relevance quickly became clear when Cloudflare later accused AI company Perplexity of using “stealth” crawlers to bypass publisher rules—a case study in why stronger guardrails are needed.
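The mechanics are simpler than they sound. Cloudflare has said the feature leans on HTTP 402, the long-dormant "Payment Required" status code. The sketch below illustrates the idea only; it is not Cloudflare's implementation, the user-agent list is a few well-known examples rather than an exhaustive registry, and the "crawler-token" header is a hypothetical stand-in for whatever payment credential a real system would use.

```python
# Illustrative sketch of the Pay-Per-Crawl idea, NOT Cloudflare's code.
# Known AI crawlers receive HTTP 402 ("Payment Required") unless they
# present a payment credential; ordinary visitors are unaffected.

AI_CRAWLER_AGENTS = {"GPTBot", "CCBot", "PerplexityBot"}  # examples only

def handle_request(user_agent: str, headers: dict) -> int:
    """Return an HTTP status code for an incoming request."""
    agent = user_agent.split("/")[0]  # "GPTBot/1.0" -> "GPTBot"
    if agent in AI_CRAWLER_AGENTS:
        # "crawler-token" is a hypothetical payment header for illustration.
        if headers.get("crawler-token"):
            return 200  # crawler has paid; serve the page
        return 402  # Payment Required: block or charge the crawler
    return 200  # regular browser traffic passes through
```

The point of the design is that the publisher's choice (block, charge, or allow) is enforced at the network edge, before the content is ever served.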
But technological fixes are only one piece of a much larger puzzle. To understand the fundamental conflict, we spoke with an expert uniquely positioned at the intersection of technology and the law.
John Walter is the President of the Contact Center AI Association and Founder of ProxyLink. His dual career, leading in the AI industry while also practicing law, gives him a rare vantage point on how large language models collide with existing legal frameworks.
The creator's angst: "There is a lot of pent-up angst from the creator class," Walter stated. "There is a strong sense from creatives saying, 'Hey, I don't think it's fair for the wealthiest, most powerful companies on earth to be achieving that status off what I labored over for years to create. I'm not being fairly compensated there.'"
A legal vacuum: "In my view, the law just doesn't provide a remedy there," he explained. The gap between what creators feel they deserve and what the law actually protects has created space for new solutions to emerge. "Cloudflare is satisfying [that demand]. They're finding a market opportunity that's based upon this dissatisfaction with the current contours of copyright law that were designed for a time that predates large language models."
Walter’s analysis highlighted a critical nuance often lost in the debate. The legal problem isn't as simple as AI companies "stealing" content. The law, as it stands, was built for a different world.
Learning vs. pirating: "Historically, there's never been a legal protection against learning from something. If you and I read a book and are inspired by that to go off and write something completely different, there is no copyright protection. My analysis is that the law is similar for a large language model. The current copyright law only protects output. So if someone infringes upon a copyright by outputting something that is replicating creative work, then that is a clear infringement."
This distinction is at the heart of the famous New York Times v. OpenAI lawsuit. But for Walter, the more pressing issue isn't whether a user can trick an AI into spitting out a copyrighted paragraph. The real question is how the AI got behind the paywall in the first place. This shifts the focus from policing output to enforcing the perimeter, a battle with significant geopolitical stakes.
The access problem: "How is the AI getting access to these articles? It's behind a paywall. That is a problem that I hope Cloudflare can solve. This is very important to American competitiveness and most Western countries which adhere to copyright protections. Most big companies like OpenAI and Google are going to try to comply with these copyright laws, but there will be other companies in other jurisdictions that won't be as sensitive. That's what I'm most hopeful for with what Cloudflare is doing: really helping with enforcement and making sure that people who are accessing restricted content are legitimate and are not using it for training data."
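Enforcement of the kind Walter describes ultimately comes down to verifying who is knocking. One widely documented technique is forward-confirmed reverse DNS, which search engines such as Google have long recommended for verifying their own crawlers: look up the hostname for the requesting IP, check it belongs to the claimed operator's domain, then confirm that hostname resolves back to the same IP. The sketch below shows that check in general form; the injectable resolver functions are an illustrative convenience, and real edge providers layer many more signals on top.

```python
# Sketch of forward-confirmed reverse DNS (FCrDNS) crawler verification.
# A self-declared user agent is cheap to fake; this checks the IP instead.
# resolve_ptr/resolve_a default to real DNS lookups but can be injected
# (e.g. for testing), which is an illustrative design choice here.
import socket

def verify_crawler_ip(ip, allowed_suffixes, resolve_ptr=None, resolve_a=None):
    """Return True if `ip` reverse-resolves to an allowed domain suffix
    and that hostname forward-resolves back to the same IP."""
    resolve_ptr = resolve_ptr or (lambda addr: socket.gethostbyaddr(addr)[0])
    resolve_a = resolve_a or socket.gethostbyname
    try:
        host = resolve_ptr(ip)  # step 1: reverse (PTR) lookup
    except OSError:
        return False
    if not host.endswith(tuple(allowed_suffixes)):
        return False  # step 2: hostname must belong to the claimed operator
    try:
        return resolve_a(host) == ip  # step 3: forward lookup must round-trip
    except OSError:
        return False
```

A scraper spoofing a crawler's user agent fails this check, because it cannot control the reverse-DNS records for its own IP addresses.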
The push for stronger perimeters has created an unexpected paradox. For years, the open internet fought against walled gardens, criticizing them as tools of monopolistic control. Today, many creators are asking for those same walls—not to lock users in, but to keep AI scrapers out. That shift puts Cloudflare in an unusually powerful role, raising questions about whether it is acting as a neutral guardian of the internet or becoming a gatekeeper.
Finding a middle ground: Walter argued that Cloudflare’s approach offers a middle ground—protection without the need to wall off the internet entirely. "In a world without Cloudflare's technology, to avoid having your content used as training data, you would have to have a paywall. But what Cloudflare is enabling is for businesses to continue having freely available media that's monetized through advertising while still keeping it out of the training data."
Yet even with these emerging tools, the sense of urgency among creators is palpable. For many, the damage has already been done. Walter ended the conversation with a sobering dose of reality that reframes the entire debate, shifting the stakes from reclaiming the past to protecting the future. "Anything Cloudflare does is only going to really protect the future at this point. Everything on the Internet is already in training data."