Geoffrey Irving, a safety researcher at DeepMind, says the difference between this approach and previous methods is that DeepMind hopes to use “dialogue in the long term for safety”.
“That means we don’t expect that the problems we face in these models – misinformation or prejudice or whatever – are obvious at first glance, and we want to talk through them in detail. And that means between machines and humans as well,” he says.
Sara Hooker, who leads Cohere for AI, a nonprofit AI research lab, says DeepMind’s idea of using human preferences to optimize how AI models learn is not new.
“But the improvements are convincing and show clear benefits of human-guided optimization of dialogue agents in the context of large language models,” says Hooker.
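The approach Hooker is referring to is often called learning from human preferences (or reinforcement learning from human feedback). At its core is a reward model trained on pairs of responses that human raters have ranked against each other. The sketch below is a minimal, hypothetical illustration of that pairwise preference loss in PyTorch; it is not DeepMind’s Sparrow code, and all names and dimensions (`PreferenceRewardModel`, the embedding size, the toy data) are illustrative assumptions.

```python
# Minimal sketch of training a reward model from human preference pairs.
# Illustrative only; not DeepMind's implementation. Names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceRewardModel(nn.Module):
    """Maps a response embedding to a single scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style loss: push the reward of the response the human
    # rater preferred above the reward of the response they rejected.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random vectors stand in for embeddings of model responses.
model = PreferenceRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(8, 128)    # embeddings of human-preferred responses
rejected = torch.randn(8, 128)  # embeddings of rejected responses

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

Once trained this way, the reward model can score new responses, and the dialogue agent is then fine-tuned to produce answers that score highly, so human judgments steer the model without raters labeling every output.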
Douwe Kiela, a researcher at AI startup Hugging Face, says Sparrow is “a nice next step in a general trend in AI, where we are trying more seriously to improve the safety aspects of large-language-model deployments”.
But much work remains to be done before these conversational AI models can be deployed in the wild.
Sparrow still makes mistakes. The model sometimes goes off topic or generates random answers. Study participants were also able to make the model break the rules 8% of the time. (That is still an improvement over older models: DeepMind’s previous models broke the rules three times as often as Sparrow.)
“For areas where harm to people could be high if an agent answers, such as providing medical and financial advice, this may still feel to many like an unacceptably high failure rate,” Hooker says. The work is also limited to English, “while we live in a world where technology must safely and responsibly serve many different languages,” she adds.
And Kiela points out another problem: “Relying on Google to find information leads to unknown biases that are hard to detect, since everything is closed source.”