Cognition’s CEO on What Comes After Code

The future has a way of showing up early to some places. In software engineering, one of those places is Cognition, the startup that made headlines in early 2024 with Devin, the world’s first autonomous coding agent, and more recently with its acquisition of the AI code editor Windsurf. Scott Wu, Cognition’s cofounder and CEO, has a front-row seat to what comes next. In this episode of AI & I, we talk with Wu about why the fundamentals of computer science still matter in an AI-first world, the direction he sees for the short- and long-term future of programming, and why he believes we may already be living with AGI.

Timestamps:
00:00:00 – Start
00:02:02 – Introduction
00:02:32 – Why Scott thinks AGI is here
00:09:27 – Scott’s personal journey as a founder
00:16:55 – Why the fundamentals of computer science still matter
00:22:30 – How the future of programming will evolve
00:26:50 – A new workflow for the AI-first software engineer
00:29:33 – How Devin stacks up against Claude Code
00:40:05 – Reinforcement learning to build better coding agents
00:50:05 – What excites Scott about AI beyond Cognition

If you found this episode interesting, please like, subscribe, comment, and share!

Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It’s usually only for paying subscribers, but you can get it here for free.

To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper

Links to resources mentioned in the episode:
- Scott Wu: Scott Wu (@ScottWu46)
- Learn more about Cognition: https://cognition.ai/
- Try the world’s first autonomous coding agent: https://devin.ai/

Published: Sep 24, 2025
Uploaded: Jun 12, 2026
File type: Podcast
Queried: 00
Source: share.transistor.fm

Full transcript


AI-generated transcript with timestamped sections.

0:00-1:47

[00:00] Software engineering is being radically changed by AI. Being able to program in English by launching agents to build features and fix bugs is changing all of the techniques, primitives, and best practices that, if you're a software engineer, you've grown up with. And for a lot of people, there's a big question about whether we're going to need software engineers at all anymore. [00:30] an AI software engineer. Devin came out just a couple years ago, and it's already at 73 million in ARR. You can think of it like another teammate that you can talk to on Slack that has access to its own computer and can build small features or fix bugs, basically autonomously. Cognition also just acquired Windsurf, a Cursor competitor. [00:49] So in addition to having a fully autonomous AI software engineer, now they have a programming editor with AI built in. So they're one of the foremost players shaping the entire landscape of what programming is going to be like over the next 10 or 20 years. And that's what I talk about with Scott. We talk about what AGI is and whether it's here or not. And we talk about what programming is going to look like now and into the future. We talk about the different bets that players like Cognition and Anthropic and OpenAI are making to win the programming and AI race. [01:19] talk about the recent wave of CLI tools that have taken off, like Claude Code and Codex CLI. We talk about how Scott sees Devin fitting into the landscape and why he acquired Windsurf. This is a great episode because Scott is that rare kind of guest who both has really practical, hands-on frontier experience, because he's building, and is also incredibly smart and incredibly articulate. If you're looking for a conversation that gets down to the actual brass tacks of what the future of software engineering is like, this is the episode for you. Enjoy.

1:49-3:21

[02:01] Scott, welcome to the show. Yeah, yeah. Thanks for having me. Of course. So for people who don't know, you are the co-founder and CEO of Cognition. You are the makers of Devin, the AI software engineering agent, and recently the acquirers, the lucky winners of Windsurf. [02:19] It seems like you've had a really crazy couple months over there. [02:22] Yeah, it's been a fun few months for us. I was gonna say, I mean, it's been an interesting time for everyone, I think, in the AI coding space, but a crazy few months, especially for us. So the thing I want to start with first is you said something on John Collison's podcast recently that you think AGI is here. Explain. [02:39] Sure. No, I mean, so it's at least a little bit facetious, obviously, but I think it's worth thinking about: 10 years ago, what would we have called AGI? [02:51] Right? And we would have said, okay, it's got to be able to pass the Turing test, obviously. It's got to be able to converse with you and just, you know, come off as a human and just be able to relate to you and think, you know, in much the same way that a human does. It's got to be able to solve, like, you know, [03:06] tough, technical problems. It's got to be able to do a lot of the same tasks. It's got to be able to interact with the real world the same way that a human would. Right. And I think to first order, we've basically done all of those things. I mean, passing the Turing test, we've obviously done, you know, we have, you know,

3:21-4:52

[03:21] Uh, you know, OpenAI and others, you know, have released work on, you know, getting a gold medal at the IMO and the IOI, you know, solving incredibly hard technical problems, um, building agents that can go and actually interact and reason in the world. And obviously, um... [03:36] You know, I think there's an interesting question of, well, there's still so much more that humans do, and then there's so much more to it. And I think that's true. Maybe one way to put it is, you know, my view is that's going to be true for quite some time to come. [03:49] And sometimes people ask about AGI from the perspective of like, well, AGI is when humans have nothing left to do at all, you know, or something like that. Or, you know, one of the definitions that people use is like AGI does 80% of knowledge work or things like that. You know, I think these things are really hard to define and really hard to kind of [04:10] exactly clarify, because humans specifically do the parts that are not automated, right? I mean, it's kind of like, whether 80% of human work has been automated, I claim that it already has been, [04:23] a long time ago, actually, because as soon as we had the tractor and as soon as we had, you know, it's like, if you think about what people did a thousand years ago, we are doing way less than 20% of that work, right? And so a bit facetious for sure, but I guess my point is like, you know, I think we have a lot of levels of AI development that occur. I'm not sure there's one hard cutoff on what counts as AGI, but I think it's also very clear that, you know, we've hit a lot of the things that people would have considered [04:50] insane, you know, just a few years ago. So

4:52-6:24

[04:52] I think that's such a good point. That's why I hate the 80% of knowledge work definitions, because knowledge work changes. It's not a static thing. [05:01] And I think people underestimate the... Once you automate one level of work, there's always another level of work above that. We've all seen this over the last three years. A lot of the stuff that I do today is... [05:14] A lot of the stuff I was doing three years ago is automated now, and I'm just doing more per unit work, which is really interesting. [05:24] Yeah. And as humans, we're always so, you know, we're human-centric in our view, I guess, is the way to say it. We're very proud of ourselves and our work, which we should be. But, you know, you can imagine at some point, it's like, you'll just be able to think a couple thoughts and then have all of this come out and happen in reality. And we'll still be saying, oh, you know, well, AI can't do that. You know, humans are still doing all the important work, right? And of course, it's like AI, you know, at that point, or technology in general, at that point, will have made us like 10,000 times more, [05:50] you know, faster by virtue of doing 99.99% of the work. But it's kind of, it's very hard to define what counts as percentage of labor, right? Yeah. So I have a particular definition of AGI that I'd love to bat around with you. Feel free to criticize, poke holes in it, but also I'm interested in what you think. So the definition of AGI that I like actually comes from [06:13] child psychology. So when children are born, they are [06:18] essentially, like, totally dependent on their caregiver. You can't leave them alone for any length of time.

6:24-7:55

[06:24] Um, and as they get older, you can leave them on their own for progressively longer amounts of time. So, you know, an infant or like a toddler, you can leave for like five minutes or 10 minutes or something like that in their room. Um, as they become children, you know, you can leave them alone for like hours or more. As teenagers, they go away for, you know, maybe a night at a time, and then they go to college and they're like fully, fully autonomous. [06:54] and just AI in general, it has followed that same trajectory. So when GPT-3 was first on the scene, we were just at the tab complete level of autonomy. And now we're seeing Devin or GPT-5 or Claude Code run for 10 or 15 minutes, 10 or 15, 20 minutes at a time. And you can see this smooth lengthening of that leash in the same way that [07:18] you see a smooth lengthening of the leash for children. And so I think a good definition of AGI is... [07:24] when it is economically profitable to never turn your AI off. It's always working. It's always doing something. And when enough people are doing that, I think that counts as AGI. Yeah. Yeah. I think that's super fair, I would say. One thing is, obviously, it's very dependent on dynamics in the sense that, well, if everybody has an AGI, then the AGIs are competing with one another for their usefulness or something, right? And so there's some of that. But I think that's right. Yeah. I think there is a point... [07:54] when

7:55-9:26

[07:55] you can truly just have an always-on agent that's going and doing meaningful work and producing value off of it. [08:03] Um, [08:04] Economic value is, I think the one thing I would push back against is the idea of economic value, just because, as we're saying, so much of economic value is dependent on, you know, [08:13] how substitutable it is, what you're providing, you know, or things like that. [08:18] Um, [08:19] But no, that's cool. Yeah, I feel like to your point about the doubling time of how long these agents can operate, it is insane how long that trajectory has continued. It's like there's always this saying of you can never trust an exponential curve or you can never keep predicting points on an exponential curve. And yet, like... [08:40] they've kept coming. So they certainly have. I'm curious, actually, one of the things that this makes me think about is, um, [08:49] because I'm thinking about growing up and I'm thinking about the process of growing up for you. I think that Devin is your first, Cognition is the first company you started. Is it? So I actually, I started a company before this. It was called Lunchclub. [09:03] So I ran it for about five years. [09:06] Okay, so you're more of a veteran than I thought. Well, still very much a noob. [09:14] I am curious, though, what has that been like? You've been running Cognition for at least a few years. What were you like when it started, and what did you believe about yourself and the world, and what are you like now?

9:27-11:16

[09:27] Yeah, you know, it's honestly crazy. I think especially because I had run a company before, and the sense of like, you know, I think... [09:38] Over the last decade or so, there are a lot of different great companies that got built, but the pace of what's happened in AI and the trajectory of AI has [09:48] already, I think, been vastly different from a lot of that. And it's already gone much faster. I think for me, [09:55] there are a few elements for sure. There's a little bit of the chip on the shoulder of, you know, I feel like I could do better. I feel like I could do more. But honestly, I think there was also just a feeling of like... [10:05] It's just... [10:06] I have to try, is kind of how I thought about it at the time. [10:10] Like, the way I thought about it was, [10:14] if you try and you build something really meaningful and it works out great, obviously that's great. [10:19] But what happens if that doesn't happen? And the question for me was, would you rather try [10:28] and give it everything you had and just find out that, you know, it didn't work out and you weren't the one and whatever else? Or would you rather not try [10:38] and wonder about whether you could have done it? And I think for me, the answer was pretty clear that I just wanted to give it a go. And so that's a lot of what it was like for me. In practice, obviously, it was like, [10:53] it was almost like walking our way into building a company. It's not even necessarily the case that we were saying, oh, we're going to build a company. I think at the time, it was really just exploring ideas in AI and looking into the things that we found really interesting, and naturally, we're just nerds. And so AI coding is the coolest thing. And so we were messing around with a bunch of these things. It was me and a bunch of my friends who I've known for years and years.

11:16-13:01

[11:16] But as it became more clear that, hey, AI coding is going to take off and stuff like RL is going to really work over the next while and it's going to unlock a lot of these product experiences, I think there's a real question of like, [11:28] is this the thing that we want to commit to and spend all of our focus and all of our effort on? [11:35] And, [11:36] and that was the trade-off for me. [11:37] Yeah. And how are you different now? [11:41] Yeah. And now I think, um, [11:45] I've been having a great time, to be honest. [11:48] And I consider myself very lucky in that. And I think there's a few things, I think, [11:57] about it that are really nice. I think obviously the problem that we work on is great. I think of the people that you work with as the most important thing. You know, someone gave me this advice a long time ago that [12:07] you're going to spend most of your time working, and so [12:10] you might as well work with people that you really like. [12:12] And I'm still always thinking about that. And so like the... [12:18] Um, [12:20] No, I, I, a lot of the different, you know, I, I think I've learned more things. I, I, hopefully I've gotten a little bit better over the course of the last couple of years. Um, but yeah. [12:28] But in a lot of ways, I think it has been... [12:31] much the same way. You know, I think that... Do you know the line of like... [12:37] you know, leave it all on the field? [12:41] Yeah. And I really like that mindset of like, you want to go and try your best, but you also want to be able to live with the outcome. And I think that's how we very much think about it today of like, we give it everything that we have, we do the best that we can, you know, you can control the inputs, you can't control what happens at the end. And we just want to be able to say that we gave it everything we have.

13:02-14:44

[13:02] I feel that. I mean, I definitely feel like inside of Every, I just love the people that I work with. And that makes it so much better to run a company when everyone's having a great time together, you know. And I think another way to frame "leave it on the field" that I've felt myself is... [13:19] I would do this even if it failed. [13:21] Um, and even if it didn't make a lot of money, and I think that's actually somewhat rare today. Um, and it sounds like that is a big part of your journey too, because these are things that you're just interested in and playing around with yourself anyway, regardless of whether it was a gigantic company. [13:42] Yeah, it's like, you know, everything could blow up today. You know, who knows what happens? You know, we have a huge AI winter or, you know, something happens with the hardware or whatever. And then, you know, AI collapses. And I would still, you know, our company collapses and whatever, you know, God forbid. And I would still be like, that was an amazing two years. Like, I had a great time. I'm really happy that I got to do this for sure. [14:12] that you currently do to do that thing. [14:16] Yeah. Yeah, that's a good question. I grew up doing a lot of coding, obviously. I mean, funnily enough, despite all of these-- arguably, because of my day-to-day work, I get to do less coding now than I did before. And so I don't know exactly what that implies. And I obviously like to do more coding whenever I can. But I certainly have not gotten to the point of saturating my desire to write code, if that makes sense. I do think there's something really satisfying about--

14:45-16:30

[14:45] Just building the, like, you know, it's funny, I think one of the things that comes to mind here for me is like, [14:51] Um, [14:51] is the otter logo of Devin. I don't know if you've seen the otter on Twitter or things like that. It has always been an informal logo of ours. There's a question of whether to make it the official logo. But like, [15:03] I don't know, random company stuff, right? But the reason I say it is because it actually is really how we think about it internally. [15:12] Which is, Devin is just like our little buddy, you know, like a little, you know, [15:17] cute little otter with his own computer and just typing away and doing tasks for you. And that's kind of how we have always thought about it. I think if we were, I mean, we are all programmers ourselves, obviously. If we really felt like this was going to be the end of programming, I think we would be way less excited about this problem. [15:35] And I think instead, it's... [15:38] It is kind of just like teaching your own buddy how to code and then starting on that journey in a way that's been really fun for us. [15:45] I think that's a really good segue into one of the things I wanted to talk about, which is just how you see the discipline of programming changing. I think I've seen on my end, and I'm curious if this is similar to what you're seeing. I've seen on my end, obviously there's people who don't use AI at all. But then there's this sort of bifurcation in programming. [16:08] I think there are more traditional engineers who are adding AI into their existing processes. And then there's AI-first engineers who maybe only learned to code with AI, or maybe they are senior engineers from the past, but are just going full-on into AI. And they're AI-first, and they're only touching the code if they absolutely have to, which is a very different mindset.

16:38-18:24

[16:38] ways of thinking and different primitives for how to do good engineering if you're only orchestrating agents. I'm curious if that's what you're seeing or how you see the landscape evolving. [16:46] No, I think that's definitely the case. And it reminds me of like, [16:51] Um, you know, one of my favorite fun facts, actually, that I like to share is, do you know that teachers actually used to picket and protest against the idea of calculators? [17:00] I did not know that. So when calculators first came out, there were a lot of protests of like, you know, we can't have this, this threatens math education, and all that good stuff. And, you know, obviously, I mean, we did okay as a society, despite having calculators in our lives. And I guess the point that I want to make with it is like, sure, I think there are some things that go away. And, you know, for example, maybe people have their multiplication tables memorized a little bit less, you know, in the post-calculator era. But obviously, [17:30] humans with the tools and what they can do, you know, the answer is much, much more today. And so I think what is going to happen or, you know, what's kind of actually happening already is I think there's going to be a somewhat different education path for, you know, [17:43] how to be a really great engineer in the post-AI age. The thing that's interesting is there's so many levels of AI improvement that are actively happening right now that kind of change that answer. [17:55] But from what we can see, I think we can kind of imagine that a lot of what that looks like is more about [18:01] really deeply understanding, you know, logical fundamentals, being able to break down problems and articulate the answers to them, you know, being able to think about different strategic tradeoffs, thinking about architectures and so on. Right. And less about, you know, just going and debugging your Kubernetes or, you know, knowing all of these kind of obscure libraries or understanding, you know, some very particular, like, esoteric syntax or something like that.

18:31-20:09

[18:31] Some people say that that means that, like, computer science has no value. Some people say that computer science has way more value. I tend to think [18:40] it's more of the latter. [18:42] And the reason for that is because obviously you are still the one at the helm making the decisions, right? And a lot of how you make decisions and how you decide what to build and how you think about the trade-offs that you're making, it all goes back to computer science fundamentals. [18:57] Hmm. Yeah, I agree with that. I think if you want to make the analogy, and I think it is a good analogy to make, to say, well, you're becoming a manager instead of an IC when you use these agents, the best managers, [19:11] if you're an engineering manager, typically have technical backgrounds, or the best CEOs too for software products, for example, tend to be able to go deep into how everything works to help resolve issues. And also to have good expectations set for what can be done. Yeah, it's like turning bricklayers into architects is one of the things that we've said as well. And yeah, to your point, if anything, the technical architect at the company [19:36] is actually the sickest software engineer. It's not somebody who's just walked in. If you say, "Oh, this person, [19:45] insane software engineer," what do you mean? Usually you don't mean that they type really fast. What you mean is they can break down problems. They just have a really great feel for things. They just think really logically. They cover all the cases. And I think those are the same skills that you're going to want to have. I think the thing that's kind of interesting is, and this is, by the way, true not just in code, but I would describe it as a lot of professions today,

20:10-21:46

[20:10] You know, for people who just get started, there's almost like a hazing experience where you like spend your first few years doing the most boring stuff. Right. And then you get to graduate and do the interesting stuff. Right. And I think what we are going to have now is a little bit more of like an officer's school of like, you know, going straight into learning the interesting things. And if anything, that's more true. And like, you know what I mean? Like, it's probably more true in like investment banking or something like that, that like, you know, you spend like your first three years, you know, just going through spreadsheets, [20:40] right, and then you get to do some of the cool stuff. But I think in code, obviously, yeah. [20:44] I think it's like, uh, it's not even in coding, like it's maybe less intentional hazing, but you have to go through, uh, [20:52] six months of, you know, learning what a while loop is and what an if statement is before you can build anything interesting. And with Devin or Claude Code or ChatGPT, you can build something like, [21:04] on your first prompt. And that's a huge, huge difference. Yeah. Yeah. Yeah. I think it's like, it's, it's like, [21:12] And I think it's not intentional hazing, you know, anywhere, or at least in most places, we like to think. Well, I mean, investment banking, I don't know. [21:22] I guess I was going to say, it's kind of like, there's a lot of this work that has to be done. [21:27] And like somebody has to do it. And so it's kind of like, you know, naturally it ends up being like the most junior team members that go and have to take it on. But now that's Devin, right? And then you kind of get to, you know, to your point, it's kind of like skipping one of the rungs of the ladder and being able to be a manager directly and being able to be like an architect directly. Yeah, yeah.

21:46-23:18

[21:46] Totally. Well, but I want to get deeper into this. So, you know, we're talking about what does the future of software engineering look like? What does the future of the landscape look like? And in particular, what I'm really interested in is the day-to-day of... [22:02] what software engineers are doing in this new world. And I'm curious for specifics, because I think the best way to think about this stuff is just look at what [22:12] people are doing right now, because there are people who are living that way right now. I assume some of them are inside of Cognition. I assume some of them are your customers. So what I want to understand is what does that actually look like and what are the new interesting things you're learning about the way that engineering works from this perspective? [22:27] Yeah, yeah, for sure. Yeah, so I'll give the long-term answer and the short-term answer. It's my favorite topic to talk about, by the way. It's like, what is the future of software engineering? Because, you know, it is, I think, still a pretty open question. I think in the long term, I think it's very clear that... [22:43] obviously these systems will continue to get more powerful. And a lot of what that looks like is just you as an engineer being able to operate at higher and higher levels of abstraction. Right. And, you know, it's kind of in the same way that we've made the jump from like assembly to like Python or JavaScript or something. It's, you know, we're going to make that leap from like looking at a bunch of boilerplate Python code to just being able to express your ideas in English of what you want to build. Right. And so at some point, you're not looking at your code, you're just like looking at your own product. And you're, you know, I actually think [23:12] the Jarvis kind of Iron Man style future is in a lot of ways correct in terms of what we'll have. Like,

23:18-24:52

[23:18] A lot of the interfaces are going to change pretty significantly when you have an intelligent agent that can go and execute tons of things for you and you can just go and work with all these things. It's not obvious that keyboard and mouse is necessarily the right input format in that world. [23:32] And so that's the long-term future. What does that mean for us today? Obviously, we're not all working with our own personal Jarvis quite yet today, right? But I think in code, you kind of see these different form factors emerging. [23:45] And I think the older one that has existed is what I'd call the IDE category of basically making you faster when your hands are on the keyboard. And that's all the tab complete and the chat with code base and all the tools there. And then the newer school is kind of this fuller agentic thing, running agents asynchronously in the background, having them take on full tasks. [24:08] And, you know, [24:10] You know, and the simple way to describe it is up until the point where the agents are capable enough to handle everything and let you just operate 100% in that higher layer of abstraction. You want to have both, right? Because you want to have agents for the things that they can go and take on and just do it entirely independently. And then you want to have the synchronous IDE experience for the things that really need you at the wheel. [24:35] And I would guess that that phase lasts for about like, [24:39] let's call it three years or so. For the next three years, we will have both IDEs and agents. And then at some point beyond that, it's kind of like, [24:46] everything will just be dictating what you want to, to some kind of agent form factor. Obviously it's not that the,

24:52-26:29

[24:52] Again, there's a cutoff of like, well, what counts as an IDE, you know, and what counts as an agent and, you know, the interfaces. Yeah. When you talk about, for example, Claude Code, or sorry, when you talk about, for example, like the IDE or background agents, are you counting Claude Code as a background agent? Like where does all the new CLI stuff fit into this worldview? Yeah. So it's all a spectrum for sure. And I think about these CLI agents as... [25:17] It's definitely somewhere in the middle. I would describe it as a bit closer to like a synchronous agent. [25:22] Right. And so, you know, if you compare it to like Cascade or something in Windsurf, like it operates a little bit more like that, where, yes, it is an agent that can do multiple steps, but one, [25:32] you're kind of meant to be checking in with it more deeply. And two, it is not fully, fully autonomous in the sense that it doesn't go all the way and create pull requests for you and work with all of your systems. It usually doesn't go and test all your code for you or things like that. And so it's more a spectrum than a binary, for sure. I think in general, I think we will be kind of operating on the spectrum and things will be gradually shifting more and more agentic and more and more autonomous. But we will have the spectrum, at least for the next couple of years. [26:02] And then I think the natural question is like, what that experience should look like of using this suite of tools across the spectrum, right? And so you have like the full like synchronous things, like I think tab complete itself is probably the most synchronous thing. It's like, you are still really, you know, going and dictating every line of code, you know, tab is just like helping you go a little bit faster. Tab is kind of speculative decoding, is my nerd view on it. Do you know what I mean by that? And then it goes all the way to these like full autonomous agents.
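The "tab is kind of speculative decoding" comparison can be illustrated with a toy sketch. In speculative decoding, a cheap draft model proposes several tokens and a stronger target model keeps the longest prefix it agrees with; in the analogy, tab complete plays the draft and the programmer plays the target. Everything below is an assumption made for illustration: the token lists `TARGET` and `DRAFT`, the helper names, and the greedy, sequential verification (real systems verify the draft in one parallel pass and use a probabilistic acceptance rule).

```python
from typing import Callable, List

Token = str
NextToken = Callable[[List[Token]], Token]

# Hypothetical "models": each just replays a fixed token sequence.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":"]  # what the programmer wants
DRAFT = ["def", "add", "(", "a", ")", ":"]             # what tab complete guesses


def replay(sequence: List[Token]) -> NextToken:
    """Build a toy next-token function that replays `sequence` by position."""
    def next_token(context: List[Token]) -> Token:
        return sequence[len(context)] if len(context) < len(sequence) else "<eos>"
    return next_token


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[Token]:
    """Draft proposes k tokens; keep the prefix the target agrees with, plus one target token."""
    # 1. The cheap draft runs ahead and proposes k tokens.
    proposed: List[Token] = []
    context = list(prefix)
    for _ in range(k):
        token = draft_next(context)
        proposed.append(token)
        context.append(token)

    # 2. The target checks the proposals in order and stops at the first disagreement.
    accepted: List[Token] = []
    context = list(prefix)
    for token in proposed:
        if target_next(context) != token:
            break
        accepted.append(token)
        context.append(token)

    # 3. The target always contributes one token itself, so every step makes progress.
    accepted.append(target_next(context))
    return accepted


draft_next = replay(DRAFT)
target_next = replay(TARGET)
print(speculative_step([], draft_next, target_next, k=6))
```

In this toy run the draft gets the first four tokens right and then diverges, so the step keeps those four and takes the fifth from the target. The output always matches what the target alone would have produced; the draft only changes how quickly it gets there.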

26:32-28:07

[26:32] Great. [26:33] Right. And then the question is kind of like, how do you split up your tasks into which ones should go into which buckets? And also for more complex tasks that are going across buckets, like how do you use these tools in tandem with one another? [26:46] Right. And I honestly, I think that's a pretty unanswered question today. [26:50] Frankly, like I think there are, you know, and I think there's a lot of progress that different folks are making in the space. I think there's a lot of really, you know, really great thinkers in the space. But I think how you start with a synchronous experience and then hand off to an async and go back, you know, a simple example I might give is like, [27:08] Um, [27:09] a lot of what you want to do, let's say like in the world today, you know, with humans, you sit down and you're like, all right, [27:15] I have this project idea. I think we should build it. What's the first thing that you do? [27:21] Hopefully, it's not that you just immediately sit down and start typing code. A lot of it is you're just fleshing out the details, thinking about all the decisions that you're going to have to make. All right, we should build this new feature and [27:34] this feature is only going to apply to users in X, Y, and Z buckets. And if they're in bucket X, then they already have something that looks like this. And so we need to go replace that in the UI with the new thing that we're trying to build. If they're in Y, then it's like a totally fresh thing. So we should walk them through the onboarding. You're fleshing out all the details here [27:51] of what you need to do. [27:53] Right? And then I think the next step is typically something like building out a clear spec, right? Like thinking about the technical details and building out all of that. Right? And then, depending on the cycle, obviously, there's more things. And then at some point, it's like kind of handing off the implementation.
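The bucket example above can be written down as a small decision table, which is roughly what "fleshing out the details" produces before any implementation is handed off. This is a sketch only: the `Bucket` enum, the `rollout_plan` function, and the action strings are hypothetical names, and bucket Z is deliberately left unresolved because the conversation never says how it should be handled.

```python
from enum import Enum


class Bucket(Enum):
    """Hypothetical user segments from the example in the conversation."""
    X = "x"
    Y = "y"
    Z = "z"


def rollout_plan(bucket: Bucket) -> str:
    """Return the rollout action for a user bucket (action names are illustrative)."""
    if bucket is Bucket.X:
        # These users already have a similar feature, so swap it out in the UI.
        return "replace_existing_ui"
    if bucket is Bucket.Y:
        # The feature is entirely new to these users, so walk them through onboarding.
        return "run_onboarding"
    # Bucket Z is never specified in the conversation; fail loudly instead of guessing.
    raise NotImplementedError(f"no rollout decision recorded yet for bucket {bucket.name}")


for bucket in (Bucket.X, Bucket.Y):
    print(bucket.name, "->", rollout_plan(bucket))
```

Raising on the undecided bucket keeps the open question visible, which is the kind of gap a synchronous conversation with an agent would be expected to surface before the async implementation phase starts.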

28:08-29:44

[28:08] And I guess my point is, I think AI coding agents should be able to help you through the full cycle, or like this kind of suite of things should be able to help you through this whole thing, right? But naturally, there are parts of it that you want to be doing synchronously and parts of it that you want to be doing asynchronously, right? So going and making these decisions of how you handle users in different buckets or whatever is something where it's like... [28:29] you should be able to have the Jarvis experience of talking directly, you know, live with your agent. And your agent is like, oh, by the way, yeah, there's this if case, there's this case, there's this case. How do you want to handle each of these? Let's talk it through together. By the way, like, you know, I looked deeper into the code paths and here are the things that I found. But like, obviously, the decision is yours ultimately, right? [28:47] And then, you know, maybe there's other things of kind of just like building out the PRD itself, right? And then at some point when you're going and doing the literal implementation of the code and testing and just making sure that works, like that's something that probably should happen more async, right? Like you don't have to be involved once you've kind of fleshed out all the details in the actual implementation itself, right? And then maybe when it comes time to do the PR review cycle, then you want to be synchronously there. You want to be able to read the diffs, right? And so there are a lot of these kind of like... [29:16] Uh, yeah. [29:18] these flows in our work today where naturally you want to be able to go from like sync to async to sync to async, right? [29:25] Um, [29:26] And I think there's still a pretty open question of how you should be kind of working between those together. 
[29:33] One thing I'm curious your take on is, I think the AI coding space had this really interesting shift three months ago, where prior to three or four months ago,

29:44-31:13

[29:44] everybody at Every, including me, we were all using Cursor and Windsurf. [29:48] And then Claude Code came out. And I mean, I used Claude Code with Opus before it came out. And I was like, holy fucking shit, this is crazy. [30:01] And literally overnight, everybody at Every, and I think a lot of people around the space, switched to these new CLI form factors. What do you make of that? Why did that happen? Obviously, people are still using Windsurf. They're still using Cursor, [30:17] whatever, but a lot of the momentum went to CLIs. Why do you think it was so successful, and what do you make of it? Yeah. Yeah. Well, first of all, I think it's an incredible product experience, to be clear. And I think Anthropic has done very, very well with that. I think there are a few things going on here, you know, but broadly, the way I would describe it is... [30:39] The capabilities, you know, well, the capabilities change like every week, honestly, but there's, I would say, relatively meaningful step function changes every few months, let's say. And the thing that's really interesting is the correct form factor or like the correct interface [30:53] is a pretty tight function of the capabilities, if that makes sense, right? Like you, [30:59] if you were trying to do fully autonomous things, you know, I'm making fun of myself a little bit here. If you were trying to do fully autonomous things in the GPT-3.5 era, you know, it's like there were things that you could do, but obviously. Good luck, younger Scott.

31:29-33:00

[31:29] in a more autonomous way, for one. [31:31] And then I think the other thing is just like, I think Anthropic very clearly put a lot of love into the experience. You know, I actually think of it as I think a bit less... [31:42] So something I'll say, maybe this is a controversial take. I actually don't know that CLI itself is the most important part of the experience. [31:50] Um, and I would claim, you know, the reason I say that, you know, we have this debate actually internally at Cognition of like, [31:57] like, what is the form factor? You know, like, should the form factor, should it live in the IDE? Should it live in Slack? Should it live in, you know, should it be its own web app or whatever? And I think the answer that we kind of come to is like, [32:07] you know, [32:08] the form factor just is a software engineer, [32:11] if you know what I mean by that. And it's kind of a non-answer, but also, you know, hopefully it does say something, which is basically like, it's less a question of, well, where do you go to interact with your tools? It's more like a question of like, what do the tools do for you? [32:26] And how do you expect to work with them? [32:29] And so I think from that perspective, you know, engineers are spending a lot of time in the terminal. They're also spending a lot of time in the IDE. They're also spending a lot of time in Slack. You know, all of these are reasonable things to think about and to work with. [32:40] And I think as time goes on, I think probably a lot of these tools will be integrated with more and more of these so that you can kind of call them from anywhere. But I think the bigger question, which I think Claude Code took like a bit of a different spin on, is how should you be working with the tool? [32:56] If that makes sense. And I think the...

33:00-34:43

[33:00] The way that I would describe it is kind of like, [33:04] their view of the tool is like the tool is... [33:06] you... [33:07] You know, whereas if you're in Windsurf and you're doing a tab completion, it's like, [33:13] um, [33:14] this is like, [33:15] um, [33:16] you know, it's obviously very kind of like something that augments you, right? If you're calling something with Devin, um, [33:25] then it's very much kind of like you're [33:28] the software engineer sitting next to you and they have their own virtual machine where all of that operates and they've spun up the repo themselves. Claude Code is kind of like [33:37] handing the reins over to your AI buddy to take the wheel of your computer, right? Which I think is an interesting paradigm, frankly. [33:43] I agree. I think the thing that struck me about it was... [33:49] It was like a full send... [33:52] to [33:55] a new version of agentic engineering where previous, except for Devin, but previous iterations were always, well, the AI is like on the side. And the CLI form factor was actually, no, no, all you need is to talk to the AI. And that was the first time that that happened for something that you're using on your own computer, which I think is the other really crucial component. [34:20] Having it be able to run bash commands on your computer makes it way more extendable and customizable than if it's in some environment that you don't fully control. Or sometimes, for example, in Codex, the environment would be spun up and then spun down. I think it's more consistent, at least last time I checked, it was a consistent environment, but still harder to customize than your own machine.

34:43-36:15

[34:43] Yeah. Yeah. Yeah. And so I think there's a lot of different, we talk about the entire kind of like, [34:50] there's like a hyperspace of all of the decisions that you can make, right? There's kind of like, you know, synchronous versus asynchronous. There's like local versus remote in terms of like the environment that it operates in. There's like, you know, in the IDE versus in other things, there's like single player versus multiplayer and whatever. And I think what we're seeing [35:07] is just... [35:09] So there's obviously, you know, there's some exponential amount of possibilities in the space. I think a lot of different single points of the space are pretty interesting. And, you know, I think Devin is one point of the space. I think Windsurf and other IDEs are another point of the space. I think Anthropic unlocked a new point in the space with Claude Code. [35:26] I think that the... [35:29] Frankly, I think that there will be a lot of these points that exist for quite a while. And I think that the full suite kind of should have, [35:35] um, should have a lot of these different experiences because obviously it depends a lot on your particular use case or your flow, um, which one is best [35:42] at each one time. [35:43] What do you think the tradeoffs are of the point in hyperspace that you're in? So in particular, I think the thing that makes Devin unique, at least in my testing, has been it's an agent that lives on its own computer in the cloud persistently that you can talk to at any time, which is just, it is a different bet than pretty much any other big company has made. [36:03] I mean, it is almost like onboarding a software engineer, right? But once you do, there are a lot of tasks because Devin has its own environment and because it can go and learn how to test things and run all the tests itself that it just can do,

36:15-38:01

[36:15] you know, that basically nothing else out there can do. And I think it's interesting for us because, you know, even before this Windsurf, you know, acquisition, we were already thinking about this question of, well, what should we do [36:28] to [36:29] basically make it a lot easier and make an experience that's a lot more accessible for folks. We were talking about a few different ideas and more synchronous experiences. [36:40] And then obviously, you know, everything happened with Windsurf. It was a great opportunity for us. And so a lot of how we see it today is, you know, I think there is going to be a lot of work in terms of [36:51] not just like onboarding the agent, but also in human software engineers themselves, learning how to work with more and more async agents. [36:59] And I think... [37:01] I think it naturally has to kind of transition from a sync to an async environment. And so that's kind of what has led to a lot of our thinking today is using Windsurf or having Windsurf as kind of like a really fast time to value option that you can immediately just kind of download and use and get a lot of value out of. Over time, you learn how to work with that Cascade agent or you learn how to kind of like use the DeepWiki indexing in Windsurf. And then naturally that takes you to more of the async flow. [37:29] But yeah. [37:30] That makes a lot of sense. I'm curious. One of the things you said earlier that stuck out to me and I think is really true is that there is... [37:38] There's a tight dependency between model capability and what the right affordance or what the right harness is to use that model. So, you know, for example, with Claude Code, it becomes possible to do a lot of the CLI stuff because Opus 4 and other models of that generation are good enough to make that work. Whereas for GPT-3.5, if you did the CLI, it would have been horrible.

38:08-39:44

[38:08] You guys don't have your own models, as far as I know. Maybe you have fine-tuned versions of models, but you're not building your own coding models. [38:14] Is that true? So we do a lot of post-training of models. So we do like fine-tuning and RL and things like that. We don't pre-train base models. [38:22] Yeah. [38:23] I guess, why not? So given that there's a lot of overlap between the capability of the model and what the right harness or affordance is to use it, [38:34] um, [38:36] does that make you worried about competing with OpenAI or, [38:40] you know, Anthropic when they have this like very tight coupling between [38:45] these two things that are evolving together very rapidly? [38:49] On the other hand, I think post-training very much is. And some of the examples that I'd give you are things like, well, [38:56] you know, we want Devin to be able to predict its own confidence, to have an opinion on, you know, how likely it is to be able to do this or how well it understands the task or things like that. Right. Or, you know, maybe a more like direct practical one is like, [39:08] all right, get out. [39:09] One thing that engineers do a lot day to day is pulling up Datadog, finding the corresponding logs, and using that to debug what went wrong, and then making the right edits, right? [39:18] And that's a very specific flow that you obviously need to have custom training data for. The models don't just learn that on their own. All of these things, [39:28] they fit actually quite naturally into post-training as a category. And so, you know, I think from our perspective, it is really a question of what we think we spike most on and where we want to focus. You know, I think as a startup, your edge...

39:44-41:27

[39:44] It always has to be speed and focus. [39:46] And I think for us, it's kind of like, I think we know what our core DNA is about. And a lot of it is just like understanding the nuances of [39:57] real-world software engineering and basically teaching that to the models in a way that you can build a great product experience. And that's what we focus on. In the post-training world right now, RL environments are like really hot. [40:11] And it strikes me that you all have been probably purposely building the perfect RL environment for post-training a software engineer. Tell me about that. Sure. Yeah, I know. I mean, it's one of the beautiful things about code. Right. And people, you know, people talk about this, obviously. But the fact that code is... [40:29] It has a much cleaner feedback loop because you can run the code, or there's all the version control. You can see every commit that was made in history. You have so many of these tools, which you would love to have in creating these things. And I guess the only thing I would say here is, [40:49] it really does come down to just [40:53] building the exact custom environments for the use cases that you care about. And so all this, you know, we just talked about like Datadog or other things. There's, [41:02] honestly, like, [41:03] hundreds of these, you know, within code and like, you know, random things that come up, like COBOL. You know, turns out there's still a bunch of COBOL out there in the world, right? And it's not something that the language models are super kind of, you know, adapted to, understandably, right? But like that is real work and, you know, real stuff that takes a lot of time for people today, right? And I think a decent bit of the work is,

41:27-42:56

[41:27] of RL. [41:28] I've said this before, but my high-level view on RL is kind of like the platonic ideal of RL [41:35] is that you can go and solve any benchmark, right? [41:38] And, um, [41:40] once, you know, once we have that, which I think we are getting closer and closer to having, then the question is kind of just like, okay, well, what's the benchmark, right? And now the question that I think a lot of these application-layer companies are thinking about is basically, what is the benchmark? Like, what is the exact set of tasks and environments? What are the tools that you're going to use? What are the decisions you would make? How are you going to decide whether it is a success or failure? And if you have described all of those things exactly, and you've kind of collected enough data points around that, then you can train a model that just does it. [42:10] Got that. But obviously, it just means that, to your point, having the right environments and the right use cases is even more important. [42:17] And I'll say that's very hard. Getting the answer, like how do we decide whether this is good or whatever, that is actually quite hard. I guess it strikes me, though, that there's maybe two ways to solve the kinds of problems we're talking about. So, for example, making a great software engineer. [42:37] One is having RL environments set up that mirror the kinds of problems that a normal software engineer would encounter, and then using that to generate data to train the model to be able to solve those problems, like the Datadog example, right?
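To make the "what is the benchmark?" point concrete, here is a minimal Python sketch of what writing a benchmark down might look like: each task names its environment, its tools, and an explicit success check. This is only an illustration under stated assumptions, not Cognition's setup; `BenchmarkTask`, `pass_rate`, the task names, and the sandbox labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BenchmarkTask:
    """One task in a hypothetical benchmark definition."""
    name: str                          # identifier for the task
    environment: str                   # label for where the agent runs (hypothetical)
    is_success: Callable[[str], bool]  # decides success from the final state
    tools: List[str] = field(default_factory=list)  # tools the agent may use


def pass_rate(tasks: List[BenchmarkTask], final_states: Dict[str, str]) -> float:
    """Fraction of tasks whose recorded final state passes its success check."""
    if not tasks:
        return 0.0
    passed = sum(
        1 for task in tasks if task.is_success(final_states.get(task.name, ""))
    )
    return passed / len(tasks)


# Two made-up tasks; the success checks are deliberately simple string tests.
tasks = [
    BenchmarkTask(
        name="fix-dashboard",
        environment="sandbox-a",
        is_success=lambda state: state == "dashboard running",
        tools=["shell", "browser"],
    ),
    BenchmarkTask(
        name="debug-from-logs",
        environment="sandbox-b",
        is_success=lambda state: "tests passed" in state,
        tools=["shell"],
    ),
]

# Final states as they might be recorded after each episode (made up).
final_states = {
    "fix-dashboard": "dashboard running",
    "debug-from-logs": "tests failed",
}

print(pass_rate(tasks, final_states))  # prints 0.5
```

The hard part the speakers point to is the `is_success` field: everything else is bookkeeping, but deciding what counts as success is where most of the design effort goes.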

42:59-44:30

[42:59] and enumerate what are the likely things that people are going to have to do and what are our users saying they want to do, [43:04] collect all that data and then train the model. On the other end, though, there's a lot of talk about continual learning, which I kind of think continual learning is already happening. It's just very sample inefficient. So on the other end of the spectrum, instead of having to do all that work, if you just made the model more sample efficient to be able to try stuff and learn, it would make it so you wouldn't have to do as much of the RL environment post-training type [43:34] the bet over here in the RL environment post-training side instead of on the more sample efficient learning side. Yeah, yeah. No, it's a good question. It's a really good question. I think like, I'll give you my high-level view on this, is just kind of like almost like a philosophical question even, of like basically if we are going to have the full, you know, [43:59] talked about all the reasons that ASI or AGI is like a bad, but the full intelligence that can do everything that we want it to do, [44:05] then obviously at some point it has to learn all of the practicalities of the real world, right? [44:10] And I would argue we're bottlenecked by that right now. We're not bottlenecked by pure logical reasoning. Like, you know, we could do some pretty insane logical reasoning with language models, right? And so like, how do you learn-- you know, for an accountant who's doing all their work every day or for like a paralegal, how do you learn the practicalities of what somebody does that way and build a model that is like intelligent about that, right?

44:31-46:08

[44:31] And obviously, you know, the... [44:33] the best way to learn something is to do it. And so you need this actual data of what this particular paralegal does in some form. And so then to your point, I think there's kind of this fork in the road of where [44:46] Uh, [44:47] where that data comes from. [44:50] And I would say I would add a third one, just for completeness sake. I think most people agree. But the three that come to mind are, one is... [44:59] "Well, the data exists in the world already. You just have to go get it." That was kind of like the pre-training view of the world, right? You just take more and more of the internet, you train it all, and because everybody has stuff on the internet, [45:09] If you just keep doing that over and over, eventually you get a model that knows everything. Two is kind of like-- [45:15] Well, I don't know. [45:15] it has to go be built by, you know, experts themselves, you have to go and like really curate an environment and figure out exactly, you know, like these like 500 environments of, of your one task, and then do that for every task that you could possibly imagine. And that's how you do that. And then three, to your point is like, you have an agent that can go out and do it by itself. [45:36] Right. So I think two and three are kind of that. That's that. Those are the versions that you're describing of like, do we go and handcraft the environments versus versus is there some continual learning that works out? Right. Yeah. Like it does it and fails. You know, you onboard it to your company and it fucks up how to log into Datadog a few times, but then it figures it out, you know. [45:56] Yeah, yeah. And I think the I think the short answer is basically, I think we will get to three. But I think a lot of the problems that you solve along the way actually naturally apply to both.

46:08-47:43

[46:08] Right. And so, so I think it's kind of one, like the pre-training world, you know, [46:12] Personally, I kind of think we've, we've, [46:14] Roughly, we're kind of converging on a lot of the capabilities on pre-training at this point. Two is this RL world, which is actively the world that we're all in, is going and doing custom RL to find particular capabilities. And then three is this kind of long-term continual learning, which will obviously unlock some really big things. But I guess I would just point out that for both two and three, to your point, [46:38] A lot of what you have to do [46:39] is you just have to create actual agents that can operate in the real world, right? And the way I kind of think about it is I think the most important thing in 2 that's making it more successful is you can just be much tighter about curating the reward function. [46:53] Right. And so like simple example of this is, you know, one of our evals or one of our environments is like, you know, there's a there's a Grafana dashboard that you need to set up. And for some reason, it's not working. So please figure out what went wrong. And it's very much meant to be like a. [47:09] Messy real-world software, this is the kind of stuff that you'll run into day to day. And the way the task works is obviously you go and install the packages, you run the code and stuff, and then you find some error. It turns out the error is because the version lock of the packages was slightly off in a way that makes it... It's the kind of stuff that you run into all the time as an issue. So then you have to find this, and you realize you've got to downgrade the version of this, and then that leads to the next thing, and you figure out what went wrong with that error, and then you get the thing running, right? [47:37] And at the end, there is a dashboard. And the really beautiful thing about an eval like this is you can just make the eval just

47:43-49:09

[47:43] So what does the dashboard say? [47:46] And the thing is, if you did not get the Grafana dashboard running, you will never be able to answer that question. And if you did get it running, you will always get the right answer. Right. And this curation just means you have a much, much tighter feedback signal. Right. For three, you would want to be able to do this kind of like live in the real world and have that feedback cycle. The problem is it's like a lot tougher of like, well, you can get the dashboard running and then have the wrong number and then like, [48:08] How are we going to know that that was wrong? Or we could do this other thing, right? And so it's not an unsolvable problem. I mean, I think we will make more and more progress on it over time. But I guess my point is just like, either way, building the full environments and the tooling for the agents that can do this is kind of the primitive that you need to be able to do both of these routes. [48:28] That does make sense. [48:33] The thing that comes to mind there is, or a worry, which I'm curious for your opinion on, is the Grafana eval. [48:43] In that case, how generalizable have you found being good at that eval is? So the fact that it can set up a Grafana dashboard, if Grafana changes the way that their whole system works and that breaks the eval, are you training? Does it end up being too brittle if it's trained for these specific kinds of environments that could end up changing pretty fast? Yeah. So the first answer is yes, you do need a bunch of environments, naturally, right?
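The outcome-checked eval described above can be sketched in a few lines of Python. This is a rough illustration, not Cognition's actual environment: the class name, the `reward` function, and the expected value are hypothetical, and the sketch assumes the dashboard shows a single value the agent can only read once the dashboard is running.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCheckedTask:
    """A task graded only on a fact that is visible after success."""
    prompt: str           # what the agent is asked to do
    question: str         # asked once the episode ends
    expected_answer: str  # only readable from the working dashboard


def normalize(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def reward(task: OutcomeCheckedTask, agent_answer: str) -> float:
    """Return 1.0 if the agent's final answer matches, otherwise 0.0."""
    if normalize(agent_answer) == normalize(task.expected_answer):
        return 1.0
    return 0.0


# Hypothetical task instance; the expected value is made up for illustration.
task = OutcomeCheckedTask(
    prompt="The Grafana dashboard in this repo is not working. Fix it.",
    question="What does the dashboard say?",
    expected_answer="1,284 requests",
)

print(reward(task, " 1,284 Requests "))           # prints 1.0
print(reward(task, "I could not get it running"))  # prints 0.0
```

The design choice is that the grader never inspects how the agent fixed the dependency errors. It only checks an answer that cannot be produced without a running dashboard, which is what makes the feedback signal tight.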

49:13-50:47

[49:13] meta-level answer I'd give to you is kind of like, it really depends on how you set up the task, right? And so, you know, if the task is literally just like, all right, you just have to recall exactly what packages are needed to run this version of Grafana from memory, then yeah, it's like, that's not very generalizable. Because as soon as the next version comes out, that's not going to be a null, right? But if the task is meant to be such that, okay, you're going to go and Google this. And then you're going to go and read the docs page of Grafana. And you're going to use that and understand what went wrong. And then you're going to look at the error in your logs and [49:43] that file, you're going to use that, right? So I guess my point is like, [49:46] we as humans all figure it out somehow. And the way I would kind of broadly describe that we figure it out is we actually interact with the real world in a way that gives us the information that we need rather than just pulling it all from memory. And as long as your agent is set up to do that, that skill is something that is very generalizable. [50:05] What's an AI that you're excited about that has nothing to do with Cognition or Devin? Oh, interesting. I've always felt like [50:15] these personal agents should just be a thing. Like I'm surprised it hasn't happened already. I guess this is the way I would put it. Like, you know, I think obviously there was like Operator, right? And Deep Research, I think, is another good example. But like, I think just like a mass consumer agent that you can just have on your phone that just takes care of things for you. I mean, [50:34] it feels like the capabilities are there for that. Maybe I'm wrong. And it feels like something like that would be so valuable. Everyone always gives the example of like, schedule my dentist appointment or whatever. What does it do?

51:04-52:29

[51:04] make sure like the package delivery, like went through and then go at, you know, it feels like there should be something there or it's like buying my Amazon packages for me. It's just like, hey, like I need another order of X or Y or whatever that just goes again. And I've always thought, yeah, that that should be something. It's funny because we actually, I mean, [51:23] we obviously build Devin entirely for coding, but we've kind of messed around with it. I'm just seeing like, hey, can Devin do this? And it's like, yeah, we actually order all of our Amazon packages with Devin now. That's amazing. [51:38] The Slack message or the Linear ticket is not the right form factor for that. Somebody else should build the right... But the agent capabilities are there, I guess is my point. [51:53] keys, you might as well be able to handle your Amazon account as well. And because you have the browser, that is actually the set of things that you need to be able to do that. But it feels like [52:04] really taken off in a way that I would imagine, you know, like, I would imagine that 12 months from now, we will not be, you know, [52:12] like that we will have this, but I would love to have it sooner if possible. So that's really interesting. Well, I hope to have you back on the show 12 months from now talking about the future of personal agents and hopefully some really cool updates to Devin. [52:27] Yeah. Yeah. Awesome. Thank you so much for having me.

52:57-53:20

[52:57] insights and laughter that will leave you on the edge of your seat. [53:01] craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. [53:09] So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. [53:14] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
