Measuring the ROI of GenAI in Engineering: The Cody Experiment

The Promise of GenAI in Engineering

In the ever-evolving world of software development, the allure of Generative AI (GenAI) is undeniable. It promises to accelerate workflows, reduce toil, and enhance productivity. But beyond the hype, how do you measure its true impact? How do you quantify the return on investment (ROI) of a tool that transforms the way engineers write and ship code?

This is the story of how we started to measure the ROI of Cody, a GenAI-powered coding assistant, in a complex engineering environment. It’s a journey of discovery, filled with data, insights, and a few unexpected revelations. But most importantly, we are right at the beginning, navigating uncharted territory, with no statistically significant proof yet that the approaches or metrics below are mature.

Why Measure ROI?

When we first introduced Cody to our engineering teams, excitement buzzed in the air. The promise was clear: Cody would help developers write better code faster, cut down manual effort, and ultimately ship more features. But as leaders, we needed more than excitement; we needed evidence. We needed metrics that would not only justify the investment but also help us understand how to optimize Cody’s usage for maximum impact.

This wasn’t just about proving Cody’s worth; it was about understanding its role in our developers’ day-to-day lives. Were they actually using it? Was it making them more productive? And most importantly, was it driving real business value?

The Metrics That Matter: Crafting the Framework

We knew that traditional metrics wouldn’t cut it. We needed a custom framework tailored to GenAI’s unique contributions to engineering productivity. Here’s how we approached it:

Cody Hours Saved

Sourcegraph, the company behind Cody, shipped the product with a set of metrics covering adoption, usage, and “hours saved”. The hours-saved metric was based on research among Sourcegraph’s clients: developers were asked how many minutes they saved each time they used one of Cody’s three main features at the time (autocomplete, chat, and commands).
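A survey-based metric like this amounts to multiplying event counts by self-reported minutes saved per feature. The sketch below illustrates the idea; the per-event minutes are hypothetical placeholders, not Sourcegraph’s actual survey figures:

```python
# Sketch of a survey-based "hours saved" estimate. The per-event minutes
# below are illustrative assumptions, not Sourcegraph's real survey data.

MINUTES_SAVED_PER_EVENT = {
    "autocomplete": 0.5,  # hypothetical: half a minute per accepted completion
    "chat": 5.0,          # hypothetical: five minutes per chat interaction
    "command": 2.0,       # hypothetical: two minutes per command run
}

def estimate_hours_saved(event_counts: dict[str, int]) -> float:
    """Convert monthly event counts into an estimated hours-saved figure."""
    total_minutes = sum(
        MINUTES_SAVED_PER_EVENT.get(event, 0.0) * count
        for event, count in event_counts.items()
    )
    return total_minutes / 60.0

# Example: one month of org-wide usage (made-up volumes)
monthly_events = {"autocomplete": 40_000, "chat": 6_000, "command": 1_500}
print(round(estimate_hours_saved(monthly_events), 1))  # → 883.3
```

The fragility is visible in the code itself: the entire result hinges on the `MINUTES_SAVED_PER_EVENT` constants, which come from self-reporting rather than measurement.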

The data was astounding. Cody saved hundreds of hours per month across the organization. Notably, the most significant savings came from chat events, suggesting that developers were leveraging Cody as a real-time problem-solving companion.

However, there was a significant challenge with this metric: it rested on self-reported estimates rather than statistically rigorous data, and there was no evidence that developers in different companies and industries would use the tool the same way and achieve the same gains. In short, it wasn’t a trustworthy metric. We needed something more relevant to understand the ROI behind the tool.

The Question: Is Cody actually saving time?

Why It Matters: Time saved translates directly to productivity gains. The more time engineers save, the more they can focus on strategic, high-impact work.

What We Learned: Cody Hours Saved was an enticing metric, but its lack of statistical rigor led us to seek out more meaningful measurements. This realization was pivotal in refining our approach to measuring GenAI’s impact.

Cody Usage Percentage (%)

The Question: Are developers adopting Cody?

Why It Matters: Adoption is the first step to realizing any productivity gains. If developers aren’t using Cody, then everything else is moot.

How We Measured It: We calculated the Cody usage percentage by measuring the proportion of active tech employees who interacted with Cody at least once. This metric revealed not only the overall adoption rate but also highlighted adoption trends across different teams and job functions.
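In its simplest form, that calculation is the intersection of the active-employee roster with the set of people who triggered at least one Cody event. A minimal sketch, with invented names and rosters:

```python
# Minimal sketch of the adoption metric: the share of active tech employees
# who interacted with Cody at least once in a period. Names are invented.

def cody_usage_percentage(active_employees: set[str], cody_users: set[str]) -> float:
    """Percentage of active employees with at least one Cody interaction."""
    if not active_employees:
        return 0.0
    # Intersect with the roster so departed or external users don't inflate the rate
    adopters = active_employees & cody_users
    return 100.0 * len(adopters) / len(active_employees)

team = {"ana", "ben", "chloe", "dev", "eli"}
used_cody = {"ana", "chloe", "zara"}  # zara is not on the active roster
print(cody_usage_percentage(team, used_cody))  # → 40.0
```

Slicing the same computation by team or job function is what surfaces the adoption trends mentioned above.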

What We Learned: Early results showed that adoption rates varied significantly across teams. Interestingly, many developers adopted the tool and then stopped using it almost immediately, reporting that it wasn’t useful (more on this under Training and Enablement below). This insight led us to tailor onboarding and training sessions to different team profiles, ultimately boosting overall adoption.

Speed to Merge Request (MR)

The Question: Does Cody accelerate code delivery?

Why It Matters: In software development, speed matters. The faster code moves from idea to production, the more value it delivers to customers.

How We Measured It: We compared the MR velocity of users who became daily Cody users to those who never used Cody. Specifically, we measured:

  • Number of MRs per month

  • Size of MRs (lines of code changed)

Key Findings:

  • MR Count Difference: Daily Cody users submitted 2.90 more MRs per month (30.4% increase).

  • MR Size Difference: Interestingly, MR sizes were smaller on average, suggesting that Cody users were committing more frequently, leading to more modular and manageable code changes.

  • These findings applied only to engineers using Cody on a daily basis, defined as 12+ active days per month, a threshold chosen to exclude weekends and the average time developers spend not actually writing code.
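The comparison above boils down to bucketing engineer-months by Cody activity and comparing mean MR counts. A sketch under assumed field names (the record shape and sample numbers are invented for illustration):

```python
# Sketch of the daily-user MR velocity comparison. The record shape,
# threshold rationale, and sample figures are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean

DAILY_THRESHOLD = 12  # active Cody days/month counted as "daily" usage

@dataclass
class EngineerMonth:
    cody_active_days: int  # days in the month with at least one Cody event
    mrs_merged: int        # merge requests merged that month

def compare_mr_velocity(records: list[EngineerMonth]) -> tuple[float, float]:
    """Mean MRs/month for daily Cody users vs. engineers who never used Cody."""
    daily = [r.mrs_merged for r in records if r.cody_active_days >= DAILY_THRESHOLD]
    never = [r.mrs_merged for r in records if r.cody_active_days == 0]
    return mean(daily), mean(never)

sample = [
    EngineerMonth(15, 13), EngineerMonth(20, 12),  # daily users
    EngineerMonth(0, 9), EngineerMonth(0, 10),     # never used Cody
    EngineerMonth(5, 11),                          # occasional user: excluded
]
daily_avg, never_avg = compare_mr_velocity(sample)
print(daily_avg - never_avg)  # → 3.0
```

Note that occasional users fall into neither bucket by design; comparing the extremes keeps the signal clean, at the cost of a smaller sample.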

The Surprising Insight: Not only were daily users more productive, but their MRs were also more concise, indicating cleaner, more maintainable code. This finding challenged our initial hypothesis that productivity gains would solely come from volume and highlighted the importance of code quality.

Beyond the Metrics: Insights and Decisions

The data began to show signs of validation of our investment in Cody, but more importantly, it revealed actionable insights:

  • Adoption Strategies: Adoption rates were significantly influenced by team culture. Teams with a strong learning culture adopted Cody faster and saw higher productivity gains.

  • Training and Enablement: We found that tailored onboarding sessions and peer-led training workshops boosted Cody usage significantly.

  • Developer Advocacy: Developers went from skeptics to advocates, citing a greater sense of accomplishment and the automation of mundane, unwanted tasks, all contributing to an improved developer experience.

The Bigger Picture: From ROI to Transformation

Measuring the ROI of Cody wasn’t just about justifying the investment; it was about understanding the transformative impact of GenAI on engineering productivity. By digging deep into usage data and productivity metrics, we uncovered a narrative of enhanced velocity, cleaner code, and happier developers.

But this story is far from over. We are continuously refining our metrics, exploring advanced usage profiles, and experimenting with new features to maximize GenAI’s potential.

What’s Next?

We are now exploring:

  • Active User Trends: Are active users increasing month over month?

  • MR Patterns: Do developers create more MRs when they use Cody more frequently? And are MRs with Cody footprint containing fewer vulnerabilities and errors?

  • Strategic Impact: Could Cody accelerate some of our replatforming efforts and ultimately reduce the time to replatform our legacy codebase from years to potentially months?

Measuring What Matters

Measuring the ROI of GenAI isn’t just about tracking numbers. It’s about understanding how technology transforms the way people work. It’s about connecting productivity gains to business outcomes. And most importantly, it’s about leveraging data-driven insights to make informed decisions.

Ready to Measure Your GenAI ROI?

If you’re considering GenAI for your engineering teams or already on this journey and looking to measure your impact, let’s connect. Let’s turn data into insights and insights into impact.

I’d like to thank the Sourcegraph team for all the partnership and collaboration to see this through.
