
AI is learning from what you say on Reddit, Stack Overflow, or Facebook. Are you OK with that?


Post a comment on Reddit, answer coding questions on Stack Overflow, edit a Wikipedia entry, or share a baby photo on your public Facebook or Instagram page, and you’re helping train the next generation of artificial intelligence.

Not everyone is happy about that, especially as the online forums they’ve spent years contributing to are increasingly flooded with AI-generated comments that mimic what real humans might say.

Some longtime users have tried to delete their previous posts or rewrite them to make them nonsensical, but the protests have had little effect. Some governments — including Brazil’s privacy regulator on Tuesday — have also tried to intervene.

“A significant portion of the population feels powerless,” said Sarah Gilbert, a volunteer moderator of Reddit who also studies online communities at Cornell University. “There’s nowhere to go except completely offline or not contributing in a way that brings value to them and value to others.”

Platforms are responding — with mixed results. Take Stack Overflow, the popular hub for computer programming tips. It first banned responses written by ChatGPT due to frequent errors, but now it’s working with AI chatbot developers and has punished some of its own users who tried to delete their previous contributions in protest.

It’s one of many social media platforms grappling with user wariness — and sometimes rebellion — as they try to adapt to the changes brought on by AI.

Software developer Andy Rotering of Bloomington, Minnesota, has used Stack Overflow daily for 15 years and said he worries the company “may inadvertently be hurting its greatest resource” — the community of contributors who spend their time helping other programmers.

“Keeping contributors motivated to comment is paramount,” he said.

Stack Overflow CEO Prashanth Chandrasekar said the company is trying to balance the growing need for instant programming support via chatbots with the desire for a community “knowledge base” where people still want to post and “get recognized” for what they contribute.

“Fast forward five years — there will be all kinds of machine-generated content on the web,” he said in an interview. “There will be very few places that will actually have original, authentic human thought. And we are one of those places.”

Chandrasekar likens Stack Overflow’s predicament to the “case studies” he examined at Harvard Business School about how a business survives, or doesn’t, after a disruptive technological change.

For more than a decade, users have typically come to Stack Overflow by Googling a coding question, finding an answer on the site, and copying and pasting it. The answers they’re most likely to see come from volunteers who have accumulated a reputation score, which in some cases can help them land a job.

Now, programmers can simply ask an AI chatbot — some of which have been trained on everything ever posted on Stack Overflow — and it can instantly spit out an answer.

The launch of ChatGPT in late 2022 threatened to bankrupt Stack Overflow, so Chandrasekar assembled a special 40-person team at the company to race to launch its own dedicated AI chatbot, called OverflowAI. The company then signed deals with Google and ChatGPT maker OpenAI, allowing AI developers to tap into Stack Overflow’s Q&A repository to further improve their AI language models.

Maria Roche, an assistant professor at Harvard Business School, said that kind of strategy makes sense but may come too late. “I’m surprised Stack Overflow didn’t do this sooner,” she said.

When some Stack Overflow users attempted to delete their previous comments after the OpenAI partnership was announced, the company responded by suspending their accounts, citing terms that make all contributions “perpetually and irrevocably licensed to Stack Overflow.”

“We were quick to address the issue and say, ‘Hey, that’s not acceptable behavior,’” Chandrasekar said, describing the protesters as a small minority, numbering in the “hundreds,” among the platform’s 100 million users.

Brazil’s national data protection agency moved Tuesday to ban social media giant Meta Platforms from training its AI models on Brazilians’ Facebook and Instagram posts. The agency has set a daily fine of 50,000 reais ($8,820) for non-compliance.

In a statement, Meta called the decision a “step backward in innovation,” said it was more transparent than many industry peers that conduct similar AI training on public content, and maintained that its practices comply with Brazilian law.

Meta has also faced pushback in Europe, where it recently paused plans to start feeding people’s public posts into its AI training systems; the rollout had been scheduled to begin last week. In the United States, where there are no national laws protecting online privacy, such training is likely already underway.

“Most people have no idea their data is being used,” Gilbert said.

Reddit took a different approach, partnering with AI developers such as OpenAI and Google and making clear that commercial entities cannot scrape its content en masse, “with no regard for user rights or privacy,” without the platform’s approval. Those deals helped Reddit raise the money it needed to debut on Wall Street in March, with investors pushing the company’s valuation to nearly $9 billion just seconds after it began trading on the New York Stock Exchange.

Reddit has never tried to punish users who object — nor is it easy to do so, since volunteer moderators have a lot of discretion over what happens in their specialized forums, known as subreddits. But what worries Gilbert, who helps moderate the “AskHistorians” subreddit, is the growing stream of AI-generated comments that moderators have to decide whether to allow or ban.

“People come to Reddit because they want to talk to people, they don’t want to talk to bots,” Gilbert said. “There are apps where they can talk to bots if they want to. But traditionally, Reddit is about connecting with people.”

She said it was ironic that the AI-generated content threatening Reddit is built from the comments of millions of Reddit users, adding that “there is a real risk that it could end up pushing people out.”

——

Associated Press journalist Eléonore Hughes in Rio de Janeiro contributed to this report.

——

The Associated Press and OpenAI have a licensing and technology agreement that gives OpenAI access to part of the AP’s text archive.
