AI Scientist: ‘We Need to Think Beyond the Large Language Model’
Developers of generative artificial intelligence (Gen AI) are constantly pushing the boundaries of what is possible, with models such as Google’s Gemini 1.5 capable of processing millions of tokens of information at once.
However, even that level of achievement is not enough to make real progress in AI, according to some of Google’s direct competitors.
Also: 3 Ways Meta’s Llama 3.1 Is a Step Forward for Gen AI
“We need to think beyond the LLM framework,” said Yoav Shoham, co-founder and co-CEO of AI21 Labs, in an interview with ZDNET.
AI21 Labs, a privately held startup, competes with Google in LLMs, the large language models that are the foundation of Gen AI. Shoham, formerly a chief scientist at Google, is also a professor emeritus at Stanford University.
Also: AI21 and Databricks Show Open Source Can Dramatically Shrink AI
“They were great at the products they put out, but they didn’t really understand what they were doing,” he said of LLM developers. “I think even the most ardent neural network people didn’t think that you could just build a bigger language model and it would solve everything.”
Shoham’s company has pioneered novel approaches to next-generation AI that go beyond the traditional “transformers” at the core of most LLMs. In April, for example, the company launched a model called Jamba, an intriguing combination of a transformer with a second kind of neural network called a state-space model (SSM).
This combination helped Jamba outperform other AI models on key metrics.
Shoham elaborated for ZDNET on one important metric: context length.
Context length is the amount of input data, measured in tokens (usually words), that a program can process at once. Meta’s Llama 3.1 offers a 128,000-token context window. AI21 Labs’ Jamba, also open-source software, has double that: a 256,000-token context window.
In head-to-head testing using a benchmark built by Nvidia, Shoham said, the Jamba model was the only one other than Gemini that could maintain that 256K context window “in practice.” Context length may be advertised at one figure, but the claim can fall apart as a model’s scores drop with increasing context length.
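To make that concrete, benchmarks of this kind often bury a fact deep inside a long input and check whether the model can still retrieve it as the input grows. The following is a minimal sketch of that idea, not the Nvidia benchmark itself; the query_model function, the stub model, and the token counts are illustrative stand-ins.

```python
# Minimal "needle in a haystack" probe of effective context length.
# query_model is a hypothetical stand-in for any LLM API call.

def make_haystack(n_tokens: int, needle: str) -> str:
    # Roughly n_tokens words of filler, with the fact buried in the middle.
    filler = "the quick brown fox jumps over the lazy dog " * (n_tokens // 9)
    mid = len(filler) // 2
    return filler[:mid] + needle + filler[mid:]

def effective_context(query_model,
                      needle="The passcode is 7421.",
                      question="What is the passcode?"):
    for length in (8_000, 32_000, 128_000, 256_000):
        prompt = make_haystack(length, needle) + "\n" + question
        answer = query_model(prompt)
        print(f"{length:>7} tokens: {'pass' if '7421' in answer else 'fail'}")

# Stub model that "remembers" only the first ~600,000 characters it sees:
# it passes at 128K tokens but fails at 256K, the degradation Shoham describes.
effective_context(lambda p: "7421" if p.find("7421") < 600_000 else "no idea")
```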
“We are the only ones who have truth in advertising,” in terms of context length, Shoham said. “All the other models degrade as context length increases.”
Google’s Gemini couldn’t be tested beyond 128K, Shoham said, because of limits imposed by Google’s Gemini API. “They actually have a good effective context window, at least at 128K,” he said.
Shoham said Jamba is more cost-effective than Gemini at the same 128K window. “They are about 10 times more expensive than us,” he said, comparing the cost of serving predictions from Gemini versus Jamba, a process known as inference.
Shoham emphasized that all of that is a product of the “architectural” choice to do something different by connecting a transformer to an SSM. “You can show exactly how many [API] calls are made” to the model, he told ZDNET. “It’s not just the cost and latency, it’s the inherent issues in the architecture.”
Shoham described the findings in a blog post.
None of that progress would matter, however, unless Jamba could do something more advanced with it. The benefits of having a large context window became clear, Shoham said, as the world moved toward things like retrieval-augmented generation (RAG), an increasingly popular approach that connects LLMs to external information sources such as databases.
Also: Make Way for RAG: Gen AI’s Shifting Balance of Power
The large context window lets the LLM retrieve and organize more information from the RAG source in order to find the answer.
“At the end of the day, retrieve as much as you can [from the database], but not too much” is the right approach to RAG, Shoham said. “Now you can retrieve more than before, if you have a long context window, and now the language model has more information to work with.”
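In outline, that retrieval step looks something like the sketch below. It is a minimal illustration of the idea, not AI21’s pipeline; the retrieve, count_tokens, and llm helpers are hypothetical stand-ins for a vector store, a tokenizer, and a model API.

```python
# Minimal RAG sketch: pack as many relevant passages as the context window
# allows, then have the LLM answer from them. All helpers are hypothetical.

def answer_with_rag(question, retrieve, llm, count_tokens, window=256_000):
    passages = retrieve(question, top_k=100)   # ranked, most relevant first
    used = count_tokens(question) + 2_000      # reserve room for the answer
    context = []
    for passage in passages:
        cost = count_tokens(passage)
        if used + cost > window:               # "as much as you can, but not too much"
            break
        context.append(passage)
        used += cost
    prompt = ("\n\n".join(context)
              + f"\n\nQuestion: {question}\nAnswer using only the passages above.")
    return llm(prompt)
```

A longer window simply moves the break point: more passages fit before the budget runs out, which is the advantage Shoham is pointing to.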
When asked whether there were any real-world examples of the effort, Shoham told ZDNET: “It’s too early to show a running system. I can tell you that we have a number of customers who have been disappointed with RAG solutions who are currently working with us. And I’m pretty sure we’ll be able to publicly show the results, but it hasn’t been out long enough.”
Jamba, which has had 180,000 downloads since it was posted on Hugging Face, is available on Amazon’s AWS Bedrock inference service and on Microsoft Azure, and “people are doing interesting things with it,” Shoham said.
However, even an improved RAG is not the ultimate solution to Gen AI’s various shortcomings, from hallucinations to the risk that successive generations of the technology will collapse into nonsense.
“I think we’re going to see people demanding more, demanding systems that aren’t unreasonable, that have something that looks like real understanding, that have near-perfect answers,” Shoham said, “and that’s not going to be a pure LLM.”
Also: Beware AI ‘model collapse’: How training on synthetic data pollutes the next generation
In an article published last month on the arXiv preprint server with collaborator Kevin Leyton-Brown, titled ‘Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models’, Shoham demonstrated how, across a variety of tasks such as mathematics and manipulating tabular data, LLMs produce “explanations that seem convincing but are not worth the metaphorical paper on which they are written”.
“We showed that naively hooking up [an LLM] to a table will yield success 70% or 80% of the time,” Shoham told ZDNET. “That’s usually exciting because you get something for nothing, but if it’s mission-critical work, you can’t do that.”
Such failures mean that the whole approach to creating intelligence will have to concede that LLMs have a role, but as part of a larger AI system that enables things you can’t do with LLMs alone, Shoham said.
Among the things needed to move beyond LLMs are tools that have emerged in recent years, Shoham said. Features such as function calling allow LLMs to hand off tasks to another type of software built specifically for a particular job.
“If you want to do addition, language models can do addition, but they do it very badly,” Shoham said. “Hewlett-Packard gave us a calculator in 1970, why reinvent the wheel? It’s an example of a tool.”
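The mechanics of function calling, reduced to their simplest form, look roughly like the sketch below. It is a toy dispatcher under assumed conventions, not any vendor’s actual API: the model is prompted to emit a structured tool request instead of computing the answer itself, and ordinary code runs the tool.

```python
# Toy function-calling sketch: the LLM emits a JSON tool request and the
# surrounding code executes it. The llm helper and the JSON convention are
# assumptions for illustration, not a real vendor protocol.
import json

def add(a: float, b: float) -> float:
    return a + b  # exact arithmetic, which the LLM itself handles unreliably

TOOLS = {"add": add}

def run_with_tools(user_query, llm):
    # Assume the model is instructed to reply either with plain text or with
    # JSON like: {"tool": "add", "args": {"a": 2.5, "b": 4.25}}
    reply = llm(f"Answer directly, or emit a JSON tool call: {user_query}")
    try:
        call = json.loads(reply)
        return TOOLS[call["tool"]](**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return reply  # plain-text answer; no tool was needed

# Stub model that always delegates the arithmetic:
print(run_with_tools("What is 2.5 + 4.25?",
                     lambda q: '{"tool": "add", "args": {"a": 2.5, "b": 4.25}}'))
# -> 6.75
```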
The use of LLMs with tools is broadly classified by Shoham and others under the heading of “compound AI systems.” With the data management company Databricks, Shoham recently organized a seminar on the prospects for building such systems.
One example of using such tools is to present the LLM with a “semantic structure” of table-based data, Shoham said. “Now you get close to one hundred percent accuracy” from the LLM, he said, “which you wouldn’t get if you just used a language model without anything else.”
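One plausible way to read that idea is sketched below: give the model the table’s schema rather than its raw rows, have it write a query, and let a deterministic engine compute the answer. The llm helper and the sales schema are hypothetical; the article does not describe AI21’s actual implementation.

```python
# Sketch of pairing an LLM with a semantic description of tabular data:
# the model sees only the schema and writes SQL; SQLite, not the model,
# touches the rows. The llm helper and schema are illustrative assumptions.
import sqlite3

SCHEMA = "CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL);"

def ask_table(question, llm, db_path="sales.db"):
    sql = llm(f"Schema:\n{SCHEMA}\nWrite one SQLite query that answers: {question}")
    # In real use, the generated SQL should be validated before execution.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()  # exact answer from the engine
```

Because the final numbers come from the database engine rather than from next-token prediction, accuracy on the lookup itself no longer depends on the model getting arithmetic right.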
Beyond tools, Shoham advocates exploring scientific directions other than the pure deep-learning approach that has dominated AI for more than a decade.
“You’re not going to get strong reasoning just by applying backpropagation and hoping for the best,” Shoham said, referring to the learning rule by which most AI models today are trained.
Also: Anthropic Takes Tool Use for Claude Out of Beta, Promises Sophisticated Assistants
Shoham has been careful to avoid discussing future product initiatives, but he hinted that what might be needed is embodied, at least philosophically, in a system he and his colleagues introduced in 2022 called the MRKL (Modular Reasoning, Knowledge and Language) system.
The paper describes the MRKL system as both a “Neural System, consisting of a large general-purpose language model as well as other smaller, specialized LM systems” and a “Symbolic System, such as a calculator, a currency converter, or an API call to a database”.
That blend is a neuro-symbolic approach to AI. And in that way, Shoham agrees with some prominent thinkers who are concerned about the dominance of generative AI. Frequent AI critic Gary Marcus, for example, has argued that AI will never reach human-level intelligence if it cannot manipulate symbols.
MRKL has been deployed as a program called Jurassic-X, which the company is testing with partners.
Also: OpenAI is training the successor to GPT-4. Here are 3 big upgrades to expect from GPT-5
The MRKL system is meant to use the LLM to parse problems phrased in loose, conversational language, such as “There are ninety-nine beer bottles on the wall, one fell, so how many beer bottles are on the wall?” The actual calculation is handled by a second neural network that has access to arithmetic logic, using arguments extracted from the text by the first model.
The “router” between the two has the difficult task of choosing what to extract from the text parsed by the LLM and choosing which “module” to pass the results to in order to perform logic.
That work means “there is no free lunch, but in many cases it is affordable,” Shoham and his team write.
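A stripped-down sketch of that routing appears below. It illustrates the division of labor, not the MRKL implementation: the real router is learned, whereas this one is a hard-coded table, and the parse_with_llm helper is a hypothetical stand-in for the language model’s extraction step.

```python
# Stripped-down MRKL-style routing: the LLM extracts a structured argument
# from loose language, a router picks a module, and the module computes.
# parse_with_llm is a hypothetical helper; the real router is learned.

MODULES = {
    "arithmetic": lambda expr: eval(expr, {"__builtins__": {}}),  # toy calculator
    "chat": lambda text: text,  # fall back to the language model's own answer
}

def mrkl_answer(question, parse_with_llm):
    # e.g. "Ninety-nine bottles on the wall, one fell. How many remain?"
    #   -> {"module": "arithmetic", "args": "99 - 1"}
    route = parse_with_llm(question)
    return MODULES[route["module"]](route["args"])

# Stub extraction standing in for the LLM's parse of the beer-bottle riddle:
print(mrkl_answer("Ninety-nine beer bottles on the wall, one fell. How many?",
                  lambda q: {"module": "arithmetic", "args": "99 - 1"}))  # -> 98
```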
On the product and business side, “we want to continue to provide additional features so people can build products,” Shoham said.
The important point, he said, is that a system like MRKL doesn’t need to do everything to be practical. “If you’re trying to build a universal LLM that understands math problems and how to make pictures of donkeys on the moon, how to write poetry, and all of that, that can be expensive,” he noted.
“But 80% of data in business is text — you have tables, you have charts, but donkeys on the moon don’t matter much in business.”
Given Shoham’s skepticism about LLMs, is today’s new generation of AI at risk of bringing on a so-called AI winter, a sudden collapse in activity as interest and funding dry up?
“It’s a fair question, and I don’t really know the answer,” he said. “I think this time is different. Back in the 1980s,” during the last AI winter, “AI didn’t create enough value to offset the unfounded hype. Obviously there’s some unfounded hype now, but I feel like there’s enough value created to get us through it.”