
Goldman Sachs and the Rise of “AI Engineers”: Meet Devin, the Robot Coder

In July 2025, Goldman Sachs took a bold step that may reshape how Wall Street tackles its most technical challenges: hiring an AI software engineer named Devin. Devin isn’t a human; it’s an autonomous coding agent developed by the startup Cognition. What happens when an AI becomes the newest “employee” at a major investment bank?

Introducing Devin: a tireless code-writing teammate

Goldman Sachs has started piloting an AI agent called Devin, created by AI startup Cognition. Unlike traditional AI assistants that suggest code snippets, Devin can independently handle complex, multi-step development tasks—essentially acting as a full-stack engineer under human supervision. The bank’s Chief Information Officer, Marco Argenti, described Devin as “like our new employee,” with plans to deploy hundreds to thousands of instances depending on business needs.

This marks a clear shift from basic developer copilot tools toward truly agentic AI—machines that execute rather than just assist. Cognition’s valuation has surged to around $4 billion as investor enthusiasm for the technology continues to grow. The startup’s AI agent has garnered industry-wide attention, not just for its autonomous capabilities but also for its performance under real-world conditions.

Why now? Productivity gains and the hybrid workforce

Goldman employs approximately 12,000 software engineers, and Devin aims to augment these teams, not replace them. Argenti envisions a “hybrid workforce,” where engineers shift from writing boilerplate code to supervising AI agents, turning problems into prompts and ensuring output quality.

Cognition claims productivity improvements of 3–4× over previous AI-assisted workflows. Meanwhile, other Goldman AI tools—from a generative AI assistant rolled out to 10,000 employees to code translators and investment research assistants—suggest a broad, enterprise-level commitment to AI.

Employees from junior analysts to managing directors have reportedly saved hours on tasks like drafting reports, understanding legacy code, or translating documents. Goldman’s internal AI assistant has already transformed day-to-day work across departments, signaling a readiness to embrace more autonomous tools like Devin.

But: can Devin really deliver?

Despite the hype, independent tests show Devin still struggles with certain complex assignments. In one benchmark, it completed only 3 of 20 tasks (15%) and failed the rest. Although it outperformed standard language-model bots, it could not work reliably without human oversight.

The tool appears most effective when working within large, established codebases, where the surrounding code gives it ample context for informed decisions. Still, human developers remain critical for supervision, revision, and strategic prompt-writing. Devin, in its current form, is less of a miracle worker and more of a high-efficiency intern: capable, tireless, and prone to occasional errors.

One of the key challenges facing the deployment of agentic AI like Devin is ensuring trustworthiness. Unlike deterministic software, AI agents can make unpredictable decisions based on subtle prompt changes or ambiguous instructions. For this reason, Goldman is taking a cautious approach: Devin doesn’t push to production autonomously. Instead, it operates in sandbox environments where human engineers approve or reject its output.

Broader implications: jobs, workflows, and Wall Street business models

Goldman Sachs isn’t alone. Firms like JPMorgan and Morgan Stanley have begun rolling out AI assistants or copilots. Goldman’s Devin experiment may be the first deep integration of autonomous AI engineers in finance.

Yet the excitement is tempered by employment concerns. A Bloomberg study projects that 200,000 banking jobs could vanish within 3–5 years due to automation. Similarly, recent analysis suggests AI could complete 95% of the work on an IPO prospectus, formerly a multi-person, multi-week assignment, triggering major shifts in junior roles throughout finance.

The roles most at risk involve routine, mechanical work: financial modeling, data entry, compliance documentation, and code maintenance. At the same time, new roles are emerging—prompt engineers, AI supervisors, and trust validators. These jobs require a different skill set, blending domain knowledge with machine fluency.

Goldman’s strategy is not merely about efficiency. It’s about evolving the nature of financial work. As Argenti puts it, intelligence is no longer scarce—the value is in applying judgment. This reframing suggests a long-term vision: not fewer employees, but different ones.

Technical architecture: how Devin works

Devin is built atop a sophisticated LLM stack, combined with task planning, memory, tool integration, and execution capabilities. It can access version control systems like GitHub, navigate file systems, debug software, run test cases, and adapt its approach based on prior outcomes.

Unlike conventional models that generate static code snippets, Devin operates dynamically. For instance, when given a task to add a feature to a legacy codebase, Devin analyzes the structure, inserts the correct logic, tests the implementation, and updates the repository—all autonomously. Think of it as a software engineer with a photographic memory and the ability to iterate endlessly.
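The propose, test, and retry cycle described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Cognition’s implementation: `run_agent`, `propose_change`, and `run_tests` are hypothetical names, and the toy functions below stand in for the language model and the project’s test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentRun:
    """Record of one task handled by the (hypothetical) agent loop."""
    task: str
    attempts: int = 0
    succeeded: bool = False
    history: List[str] = field(default_factory=list)


def run_agent(
    task: str,
    propose_change: Callable[[str, str], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> AgentRun:
    """Propose a change, test it, and retry with the test feedback."""
    run = AgentRun(task=task)
    feedback = ""
    while run.attempts < max_attempts and not run.succeeded:
        run.attempts += 1
        change = propose_change(task, feedback)   # stand-in for the model call
        passed, feedback = run_tests(change)      # stand-in for the test suite
        outcome = "pass" if passed else "fail"
        run.history.append(f"attempt {run.attempts}: {outcome}")
        run.succeeded = passed
    return run


# Toy stand-ins (assumptions for illustration): the first proposal is wrong,
# and the second one uses the test feedback to correct itself.
def toy_propose(task: str, feedback: str) -> str:
    return "return a + b" if feedback else "return a - b"


def toy_tests(change: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(f"def add(a, b):\n    {change}", namespace)
    ok = namespace["add"](2, 3) == 5
    return ok, ("" if ok else "add(2, 3) should be 5")


result = run_agent("implement add", toy_propose, toy_tests)
print(result.history)
```

The attempt cap matters in practice: without `max_attempts`, an agent that never satisfies the tests would loop forever, which is one reason the human supervisor stays in the picture.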

However, these capabilities raise operational questions: What if Devin introduces bugs that humans overlook? How do teams ensure transparency in AI-generated code? Who is accountable for software shipped by a non-human agent?

To address this, Goldman has integrated logging, audit trails, and human sign-off into Devin’s workflow. Every step taken by the AI is recorded and reviewed, turning the human supervisor into a strategic reviewer rather than a coder. The goal is not just speed, but accountability.
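One way to picture logging plus human sign-off is a change record that notes every agent action and refuses to merge until a named reviewer approves. The sketch below is an assumption made for illustration, not a description of Goldman’s actual tooling: `ChangeRequest`, `AuditEntry`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AuditEntry:
    timestamp: str
    actor: str
    action: str


@dataclass
class ChangeRequest:
    """A proposed change plus the trail of who did what to it."""
    description: str
    log: List[AuditEntry] = field(default_factory=list)
    approved_by: Optional[str] = None

    def record(self, actor: str, action: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(AuditEntry(stamp, actor, action))

    def sign_off(self, reviewer: str) -> None:
        self.approved_by = reviewer
        self.record(reviewer, "approved")


def merge(change: ChangeRequest) -> bool:
    """Merge only after a human reviewer has signed off."""
    if change.approved_by is None:
        raise PermissionError("human sign-off required before merge")
    change.record("system", "merged")
    return True


change = ChangeRequest("add retry logic to a client module")
change.record("agent", "edited source file")
change.record("agent", "ran tests: passed")

try:
    merge(change)              # no reviewer yet, so this is refused
    blocked = False
except PermissionError:
    blocked = True

change.sign_off("human-reviewer")
merged = merge(change)
actions = [entry.action for entry in change.log]
print(blocked, merged, actions)
```

Keeping the approval in the same log as the agent’s actions means the trail answers the accountability question directly: each merged change carries the name of the person who accepted it.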

Cultural shift: adapting to AI colleagues

Beyond technology, the Devin pilot is forcing a cultural shift. Engineers accustomed to traditional coding must now become “AI conductors,” translating requirements into prompts and interpreting machine output. This requires both technical fluency and a mindset shift.

Goldman has launched internal training programs focused on prompt engineering, AI literacy, and best practices for supervising autonomous agents. The feedback loop between human engineers and Devin is being treated as a new form of collaboration, one where trust and clarity are paramount.

Interestingly, some junior employees report feeling less threatened and more empowered. With Devin handling routine grunt work, they have more time to focus on learning, strategy, and creativity. For many, the AI assistant is less a rival and more a highly skilled teammate.

Industry reactions and investor response

The finance sector is watching closely. Cognition’s valuation has jumped dramatically, attracting attention from both tech investors and traditional banking institutions. Other firms are evaluating whether to license Devin or build proprietary alternatives.

There is also a growing ecosystem of “agent ops” tools—software platforms designed to manage, monitor, and orchestrate fleets of AI agents. Goldman’s experience could serve as a blueprint for how to scale such deployments responsibly.

On the regulatory front, questions are emerging. Should AI agents be subject to the same compliance reviews as human developers? How do institutions prevent hallucinated code or biased outputs? The answers will likely shape how agentic AI is adopted across other regulated industries, from healthcare to insurance.

Looking ahead: where this could lead

Goldman’s bold rollout of Devin, alongside its GS AI Assistant and other tools, reflects a two-pronged strategy:

1. Automate routine, mechanical tasks, allowing humans to focus on higher-value thinking.

2. Build AI fluency across employees, empowering them to prompt, supervise, and refine AI-generated output.

But success will depend on continuous improvement of the agents, rigorous supervision protocols, and strategic deployment. The hybrid model hinges on human judgment deciding what to ask, what to trust, and how to act. As Argenti emphasizes, the goal isn’t just to replace effort with automation, but to elevate the entire organization’s intelligence quotient.

If Devin and tools like it can prove their value, we may see a transformation in how engineering, finance, and knowledge work are organized. Teams could evolve from static units of labor to dynamic orchestrators of algorithmic labor, with AI as both tool and teammate.

Conclusion

Goldman Sachs stepping into agentic AI with Devin represents a significant milestone: not just coding assistance, but distributed automation at scale. Whether this becomes a template for the financial industry or a cautionary tale depends on careful execution, real-world performance, and the ability of human teams to supervise, adapt, and innovate alongside these new colleagues.

As AI agents begin writing code, analyzing data, and making decisions, the very nature of work is being rewritten. Goldman Sachs isn’t just testing a tool; it’s pioneering a new workforce paradigm. The rest of the industry is watching, and taking notes.
