The Future of AI-Powered Coding Is Coming
I am not a skilled programmer, but thanks to a free program called SWE-agent, I was able to debug and fix a gnarly issue involving a misnamed file across several code repositories on the software-hosting site GitHub.
I pointed SWE-agent at an issue on GitHub and watched as it ran through the code and reasoned about what could be wrong. It correctly identified that the root cause of the error was a line pointing to the wrong location for a file, then navigated through the project, located the file, and modified the code to get things working properly. This is the kind of thing that an inexperienced developer (like me) could spend hours trying to debug.
Many programmers already use artificial intelligence to write software faster. GitHub Copilot was the first integrated development environment to harness AI, but many IDEs now autocomplete chunks of code as a developer starts typing. You can also ask the AI questions about the code or ask it to suggest ways to improve what you’re doing.
Last summer, John Yang and Carlos Jimenez, two PhD students at Princeton, began discussing what it would take for an AI to become a real software engineer. This led them and others at Princeton to create SWE-bench, a set of benchmarks for testing AI tools on a variety of coding tasks. After releasing the benchmark in October, the team developed its own tool, SWE-agent, to master those tasks.
SWE-agent (“SWE” stands for “software engineering”) is one of a number of significantly more powerful AI coding programs that go beyond writing lines of code to act as software agents, wielding the tools needed to write, debug, and organize software. The startup Cognition generated buzz with a video demo of one such tool, called Devin, in March.
Ofir Press, a member of the Princeton team, says SWE-bench could help OpenAI test the performance and reliability of its software agents. “It’s just my opinion, but I think they’ll release a software agent soon,” Press says.
OpenAI declined to comment, but another source with knowledge of the company’s operations, who asked not to be named, told WIRED that “OpenAI is definitely working on coding agents.”
As GitHub Copilot has shown, large language models can write code and boost programmer productivity. Tools like SWE-agent could demonstrate that AI agents can operate reliably, starting with building and maintaining code.
Several companies are testing agents for software development. At the top of the SWE-bench leaderboard, which ranks different coding agents’ scores across a variety of tasks, is the AI startup Factory, followed by AutoCodeRover, an open source project from a group at the National University of Singapore.
Big players are getting in on the action, too. A software-writing tool from Amazon called Q is another top performer on SWE-bench. “Software development is more than just typing,” says Deepak Singh, vice president of software development at Amazon Web Services.
He adds that AWS has used the agent to translate entire software stacks from one programming language to another. “It’s like having a really smart engineer sitting next to you, writing and building the application with you,” Singh says. “I think that’s pretty transformative.”
A team at OpenAI recently helped the Princeton researchers improve SWE-bench, their benchmark for measuring the reliability and effectiveness of tools like SWE-agent, a sign that the company may also be honing agents for writing code or performing other tasks on computers.
Singh says some customers are already building complex back-end applications using Q. My own testing with SWE-agent suggests that anyone who knows how to code will soon want to use agents to enhance their programming skills, or risk being left behind.