you2idea@video:~$ watch z1ISq9Ty4Cg [1:25:13]
// transcript — 2871 segments
0:01 do lead work on Codex. >> Codex is OpenAI's coding agent. We think
0:05 of Codex as just the beginning of a software engineering teammate. It's
0:09 a bit like this really smart intern that refuses to read Slack, doesn't check
0:12 Datadog unless you ask it to. >> I remember Karpathy tweeted about the
0:16 gnarliest bugs that he runs into, that he just spends hours trying to figure out,
0:19 that nothing else has solved. He gives it to Codex, lets it run for an hour, and it
0:21 solves it. >> Starting to see glimpses of the future where we're actually starting to have
0:26 Codex be on call for its own training. Codex writes a lot of the code that
0:29 helps manage its training run, the key infrastructure. And so we
0:32 have Codex code review, which is catching a lot of mistakes. It's
0:34 actually caught some pretty interesting configuration mistakes. One
0:37 of the most mind-blowing examples of acceleration: the Sora Android app, a
0:42 fully new app. We built it in 18 days, and then 10 days later, so 28 days
0:45 total, we went to the public. >> How do you think you win in this
0:47 space? >> One of our major goals with Codex is to get to proactivity. If we're going to
0:51 build a super assistant, it has to be able to do things. One of the
0:54 learnings over the past year is that for models to do stuff, they are much more
0:57 effective when they can use a computer. It turns out the best way for models to
1:00 use computers is simply to write code. And so we're kind of getting to
1:02 this idea where if you want to build any agent, maybe you should be building a
1:04 coding agent. >> When you think about progress on Codex,
1:08 I imagine you have a bunch of evals and there's all these public benchmarks.
1:11 >> A few of us are constantly on Reddit. You know, there's
1:14 praise up there and there's a lot of complaints. What we can do as a product
1:17 team is just try to always think about how are we building a tool so that
1:20 it feels like we're maximally accelerating people rather than building
1:23 a tool that makes it more unclear what you should do as the human. Being at
1:27 OpenAI, I can't not ask about how far you think we are from AGI.
1:29 >> The current underappreciated limiting factor is literally human
1:32 typing speed or human multitasking speed. Today, my guest is Alexander Embiricos,
1:39 product lead for Codex, OpenAI's incredibly popular and powerful coding
1:44 agent. In the words of Nick Turley, head of ChatGPT and former podcast guest,
1:48 Alex is one of my all-time favorite humans I've ever worked with, and
1:51 bringing him and his company into OpenAI ended up being one of the best decisions
1:56 we've ever made. Similarly, Kevin Weil, OpenAI's CPO, said, "Alex is simply the
2:00 best." In our conversation, we chat about what it's truly like to build
2:04 product at OpenAI. How Codex allowed the Sora team to ship the Sora app,
2:07 which became the number one app in the App Store in under one month. Also, the
2:12 20x growth Codex is seeing right now and what they did to make it so good at
2:16 coding. Why his team is now focused on making it easier to review code, not
2:20 just write code. His AGI timelines, his thoughts on when AI agents will actually
2:25 be really useful, and so much more. A huge thank you to Ed Baze, Nick Turley,
2:28 and Dennis Yang for suggesting topics for this conversation. If you enjoy this
2:32 podcast, don't forget to subscribe and follow it in your favorite podcasting
2:35 app or YouTube. And if you become an annual subscriber of my newsletter, you
2:40 get a year free of 19 incredible products, including a year free of
2:45 Devin, Lovable, Replit, Bolt, Nadam, Linear, Superhuman, Descript, Wispr
2:48 Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin,
2:52 PostHog, and Stripe Atlas. Head on over to lennysnewsletter.com and click
2:56 product pass. With that, I bring you Alexander Embiricos after a short word
3:00 from our sponsors. Here's a puzzle for you. What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and
3:07 hundreds of other winning companies have in common? The answer is they're all
3:12 powered by today's sponsor, WorkOS. If you're building software for
3:14 enterprises, you've probably felt the pain of integrating single sign-on, SCIM,
3:20 RBAC, audit logs, and other features required by big customers. WorkOS turns
3:25 those deal blockers into drop-in APIs with a modern developer platform built
3:29 specifically for B2B SaaS. Whether you're a seed-stage startup trying to land your
3:33 first enterprise customer or a unicorn expanding globally, WorkOS is the
3:36 fastest path to becoming enterprise ready and unlocking growth.
3:40 They're essentially Stripe for enterprise features. Visit workos.com to
3:45 get started or just hit up their Slack support where they have real engineers
3:48 in there who answer your questions super fast. WorkOS allows you to build like
3:53 the best with delightful APIs, comprehensive docs, and a smooth
3:58 developer experience. Go to workos.com to make your app enterprise ready today.
4:03 This episode is brought to you by Fin, the number one AI agent for customer
4:07 service. If your customer support tickets are piling up, then you need
4:11 Fin. Fin is the highest performing AI agent on the market with a 65% average
4:16 resolution rate. Fin resolves even the most complex customer queries. No other
4:20 AI agent performs better. In head-to-head bake-offs with competitors,
4:25 Fin wins every time. Yes, switching to a new tool can be scary, but Fin works
4:29 on any help desk with no migration needed, which means you don't have to
4:32 overhaul your current system or deal with delays in service for your
4:35 customers. And Fin is trusted by over 6,000 customer service leaders
4:39 and top companies like Anthropic, Shutterstock, Synthesia, Clay, Vanta,
4:43 Lovable, Monday.com, and more. And because Fin is powered by the Fin AI
4:47 engine, which is a continuously improving system that allows you to
4:50 analyze, train, test, and deploy with ease, Fin can continuously improve your
4:54 results, too. So, if you're ready to transform your customer service and
4:58 scale your support, give Fin a try for only 99 cents per resolution. Plus, Fin
5:02 comes with a 90-day money-back guarantee. Find out how Fin can work
5:08 for your team at fin.ai/lenny. Alexander, thank you so much for being
5:18 here and welcome to the podcast. >> Thank you so much. I've been following
5:21 for ages and I'm excited to be here. >> I'm even more excited. I really
5:24 appreciate that. I want to start with your time at OpenAI. So, you joined
5:30 OpenAI about a year ago. Before that, you had your own startup for about 5
5:34 years. Before that, you were a product manager at Dropbox. I imagine OpenAI is
5:39 very different from every other place you've worked. Let me just ask you this.
5:44 What is most different about how OpenAI operates? And what's something that
5:47 you've learned there that you think you're going to take with you wherever
5:50 you go, assuming you ever leave? >> By far, I would say the speed and ambition of
5:54 working at OpenAI are just dramatically more than what I could have
5:58 imagined. And you know, I guess it's kind of an embarrassing thing to say because
6:01 you, you know, everyone who's a startup founder thinks like, "Oh yeah, my
6:04 startup moves super fast and the talent bar is super high and we're super
6:07 ambitious." But I have to say, working at OpenAI just kind of made
6:10 me reimagine what that even means. >> We hear this a lot about, you
6:14 know, feels like every AI company is just like, "Oh my god, I can't believe
6:17 how fast they're moving." Is there an example of just like, "Wow, that
6:19 wouldn't have happened this quickly anywhere else." >> The most obvious thing that comes to
6:23 mind is just the explosive growth of Codex itself. I think it's been a
6:27 while since we bumped our external number, but, you know, the
6:32 10xing of Codex's scale was just super fast, in a matter of months,
6:37 and it's well more since then. And, you know, once you've lived through
6:40 that, or at least speaking for myself, having lived through that, now I
6:45 feel like anytime I'm going to spend my time on, you know, building a tech
6:49 product, there's that kind of speed and scale that I now need to meet.
6:54 If I think of like what I was doing in my startup, it moved like way slower.
6:58 And I, you know, there's always this balance with startups of like how much
7:01 do you commit to an idea that you have versus like find out that it's not
7:06 working and then pivot. But I think one thing I've realized at OpenAI is
7:09 that the amount of impact that we can have, and in fact need to have to do
7:13 a good job, is so high that I have to be way more ruthless with
7:16 how I spend my time. >> Before we get to Codex, is there a way that they've
7:20 structured the org, or the way that OpenAI operates, that allows the
7:23 team to move this quickly? Because everyone wants to move super
7:27 fast. I imagine there's a structural approach to allowing this to happen.
7:30 >> I mean, so one thing is just the technology that we're building with has
7:35 like just transformed so many things, you know, from like both how we build
7:39 but also like what kinds of things we can enable uh for users. And you know we
7:43 spend most of our time talking about like the sort of improvements within the
7:47 foundation models, but I believe that even if we had no more progress today
7:51 with models, which is absolutely not the case, we are way behind
7:55 on product. There's so much more product to build.
7:59 >> So I think like just like the moment is ripe if that makes sense.
8:03 >> But I think there's a lot of sort of counterintuitive things that surprised
8:06 me when I arrived as far as like how things are structured. One example that
8:10 comes to mind is, when I was working on my startup, and before that when I
8:12 was at Dropbox, it was very important, you know, especially as a PM,
8:16 to always kind of rally the ship and make sure you're
8:18 pointed in the right direction and that you can like accelerate in that
8:24 direction. But here, because we don't exactly know what
8:27 capabilities will even come up soon, we don't know what's going to work
8:31 technically, and we also don't know what's going to land even if it works
8:34 technically, it's much more important for us to be very humble and learn
8:39 a lot more empirically and just try things quickly. And the org is
8:44 set up in that way, to be incredibly bottoms up. You know, this is again one
8:47 of those things that like as you were saying, everyone wants to move fast. I
8:50 think everyone likes to say that they're bottoms up or at least a lot of people
8:53 do, but OpenAI is like truly truly bottoms up and that's like been a
8:58 learning experience for me. It'll be interesting if I ever work
9:02 elsewhere. I don't think it'll even make sense to work at a non-AI
9:05 company in the future. I don't even know what that means. But if I were to
9:08 imagine it, or go back in time, I think I would run things totally differently.
9:12 >> What I'm hearing is kind of this "ready, fire, aim" approach, more
9:17 than "ready, aim, fire." And as you processed that,
9:21 because that may not come across well, but I actually have heard this a lot at
9:25 AI companies, and Nick Turley shared I think the same
9:28 sentiment: because you don't know how people will use it, it doesn't make
9:31 sense to spend a lot of time making it perfect. It's better to just get it out
9:36 there in a primordial way, see how people use it, and then go big on that use case.
9:41 >> Yeah. Okay, to use this analogy a little bit, I feel like
9:44 there is an aim component, but the aim component is much fuzzier. You know,
9:48 it's kind of like, roughly what do we think can happen? Someone I've
9:52 learned a ton from working here is a research lead, and he likes to say that
9:57 in OpenAI we can have really good conversations about something
10:01 that's like a year-plus from now, and you know, there's a lot of ambiguity in what
10:04 will happen, but that's the right sort of timeline. And then we can have
10:07 really good conversations about what's happening in low months or low
10:11 weeks. But there's kind of this awkward middle ground, which is as
10:14 you start approaching a year but you're not at a year, where it's very
10:18 difficult to reason about, right? And so as far as aiming, I think we
10:21 want to know like, okay, what are some of the futures that we're trying to
10:24 build towards. And a lot of the problems we're dealing with in AI,
10:26 such as alignment, are problems you need to be thinking about really far out
10:30 into the future. So, we're kind of aiming fuzzily there. But when it comes
10:34 down to the more tactical, like, oh yeah, what product will we build,
10:37 and therefore how will people use that product? That's the place where we're
10:40 much more like let's find out empirically. >> That's a good way of putting it.
10:44 >> Something else: when people hear this, they sometimes hear
10:49 companies like yours saying, "Okay, we're going to be bottoms up. We're going to
10:51 try a bunch of stuff. We're not going to have exactly a plan of where it's going
10:55 in the next few months." The key is you all hire the best people in the world.
10:59 And so that feels like a really key ingredient in order to be this
11:02 successful at bottoms-up work. >> It just super resonates, basically.
11:07 >> Um, I was just, again, surprised or even shocked when I arrived at the
11:11 level of individual drive and autonomy that everyone here has. So
11:18 I think the way that OpenAI runs, you can't just read this or
11:22 listen to a podcast and be like, "I'm just going to deploy this to my
11:26 company." Um, you know, maybe this is a harsh thing to say, but I think
11:28 very few companies have the talent caliber to be able to do that. So it
11:33 might need to be like adjusted if you were going to implement this.
11:36 >> Okay. So, let's talk Codex. You lead work on Codex. How's Codex going? What
11:40 numbers can you share? Is there anything you can share there? Also, not
11:43 everyone knows exactly what Codex is. Explain what Codex is. >> Totally. Yeah.
11:48 So, uh, I have the very lucky job of living in the future and leading
11:53 product on Codex. Um, and Codex is OpenAI's coding agent. So, super concretely,
11:59 that means it's an IDE extension, a VS Code extension, that you can install, or a
12:02 terminal tool that you can install, and when you do so, you can then basically
12:06 pair with Codex to answer questions about code, write code, you know, run
12:12 tests, execute code, and do a bunch of the work in that thick middle
12:15 section of the software development life cycle, which is all about, you know,
12:19 writing code that you're going to get into production. Uh, more broadly, we
12:25 think of Codex as, what it currently is, is just the beginning of a
12:29 software engineering teammate. And so, you know, when we use a
12:32 big word like teammate, some of the things we're imagining are that it's not
12:36 only able to write code, but actually it participates early on in
12:40 the ideation and planning phases of writing software, and then further
12:43 downstream in terms of validation, deploying, and maintaining code. To
12:48 make that a little more fun, one thing I like to imagine is, if you
12:51 think of what Codex is today, it's a bit like this really smart intern that
12:55 refuses to read Slack and doesn't check Datadog or, like, Sentry
12:59 unless you ask it to. And so, no matter how smart it is, how much
13:05 So that's how people use it mostly today is they pair with it.
13:08 >> But we want to get to the point where, you know, it can work just like a
13:12 new intern that you hire: you don't only ask them to write code, but you ask them
13:15 to participate across the cycle. And so you know that like even if they don't
13:17 get something right the first try, they're eventually going to be able to
13:20 iterate their way there. >> I thought the point
13:23 about not reading Slack and Datadog was that it's just not distracted. It's just
13:26 constantly focused and is always in flow. But I get what you're saying there:
13:30 it doesn't have all the context on everything that's going on.
13:33 >> And like that's not only true when it's performing a task, but again if you
13:36 think of like the best human teammates, like you don't tell them what to do,
13:39 right? Like, maybe when you first hire them, you have a couple meetings,
13:42 and you kind of learn, okay, these
13:45 prompts work for this teammate, these prompts don't, right? This is how to
13:48 communicate with this person." Then eventually you give them some starter
13:50 tasks. You delegate a few tasks. But then eventually you just say like, "Hey,
13:53 great. Okay, you're working with this set of people in this area of the
13:57 codebase. You know, feel free to work with other people in other parts of the
14:00 codebase too even." And yeah, you tell me what you think makes sense to be
14:03 done, right? And so, you know, we think of this as proactivity, and one
14:06 of our major goals with Codex is to get to proactivity.
14:12 I think this is critically important to achieving the mission of
14:15 OpenAI, which is to deliver the benefits of AI to all humanity. You know, I like
14:19 to joke today, and it's a half joke, that AI products are actually
14:23 like really hard to use because you have to like be very thoughtful about when it
14:29 could help you. And if you're not prompting a model to help you, it's
14:33 probably not helping you at that time. And if you think of how many times like
14:36 the average user is prompting AI today, it's probably like tens of times. But if
14:40 you think of how many times people could actually get benefit from a really
14:44 intelligent entity, it's thousands of times per day. And so a large
14:48 part of our goal with Codex is to figure out what is the shape of an
14:52 actual teammate agent that is sort of helpful by default. >> When people think
14:57 about Cursor and even Claude Code, it's like an IDE that helps you code and
15:01 kind of autocompletes code and maybe does some agentic work. What I'm hearing
15:05 here is the vision is different, which is: it's a teammate. It's like a remote
15:09 teammate, building code for you, that you talk to and ask to do things, and it
15:14 also does IDE autocomplete and things like that. Is that a kind of
15:17 differentiator in the way you think about Codex? >> It's basically this idea
15:22 that, if you're a developer and you're trying to get
15:25 something done, we want you to just feel like you have superpowers, and you're
15:29 able to move much much faster. But we don't think that in order for you to
15:33 reap those benefits, you need to be sitting there constantly thinking about
15:37 like how can I invoke AI at this point to do this thing. We want you to be able
15:40 to sort of like plug it in to the way that you work and have it just start to
15:43 do stuff without you having to think about it. >> Okay. I have a lot of questions along
15:46 those lines, but just, how's it going? Are there any stats, any numbers you can
15:49 share about how Codex is doing? >> Yeah, Codex has been growing
15:53 absolutely explosively since the launch of GPT-5 back in August. Um,
15:57 there's some definitely some interesting like product insights to talk about as
16:00 to how we unlocked that growth, if you're interested. But yeah, the
16:03 last stat we shared there was that we were well over 10x since
16:08 August. In fact, it's been 20x since then. Um, also, the Codex models
16:12 are serving many, many trillions of tokens a week now, and it's basically
16:17 our most served coding model. Um, one of the really cool things that we've
16:20 seen is that the way that we decided to set up the Codex team was to build a
16:25 you know really tightly integrated product and research team that are
16:28 iterating on the model and the harness together. And it turns out that lets you
16:32 just do a lot more and try many more experiments as to how these things will
16:36 work together. And so we were just training these models for use in our
16:40 first party harness that we were very opinionated about. And then what we've
16:44 started to see more recently actually is that other major sort of API coding
16:48 customers are now starting to adopt these models as well. And so we've
16:51 reached a point where actually the Codex model is the most served coding
16:55 model in the API as well. >> You hinted at this: what unlocked
17:00 this growth? I am extremely interested in hearing that. It felt like before, I
17:04 don't know, maybe this was before you joined the team. It just felt like Claude
17:07 Code was killing it. Everyone was sitting on top of Claude Code. It was by
17:11 far the best way to code. And then all of a sudden, Codex comes around. I
17:16 remember Karpathy tweeted that he had just never seen a model like this.
17:20 I think the tweet was: the gnarliest bugs that he runs into, that he just
17:23 spends hours trying to figure out, nothing else has solved. He gives it to
17:27 Codex, lets it run for an hour, and it solves it. What did you guys do? >> We
17:32 have this strong sort of mission here at OpenAI to, you know, basically build
17:38 AGI. Um, and so we think a lot about how we can shape the product so
17:43 that it can scale right you know earlier I was mentioning like hey like if you're
17:45 an engineer you should be getting help from from AI like thousands of times per
17:50 day right and so we thought a lot about the primitives for that when we launched
17:54 our first version of Codex, which was Codex cloud, and that was basically a
17:58 product that had its own computer, lives in the cloud, you could delegate to it,
18:02 and, you know, the coolest part about that was you could run many,
18:05 many tasks in parallel. But some of the challenges that we saw
18:11 are that it's a little bit harder to set that up, both in terms of
18:14 environment configuration, giving the model the tools it needs to validate
18:18 changes, and in learning how to prompt in that way. And sort of my analogy for
18:22 this is going back to this teammate analogy. It's like if you hired a
18:26 teammate but you're never allowed to get on a call with them and you can only go
18:30 back and forth, you know, asynchronously over time. That works for some
18:33 teammates and eventually that's actually how you want to spend most of your time.
18:36 So that's still the future, but it's hard to initially adopt.
18:40 So we still have that vision: that's what we're trying to get you to, a
18:43 teammate that you delegate to and that is then proactive, and we're seeing that
18:48 growing. But the key unlock is, actually, first you need to land with users in a
18:51 way that's much more intuitive and trivial to get value from. So the
18:56 way that most people discover Codex today, the vast majority of users,
19:00 is either they download an IDE extension, or they run it in their CLI,
19:05 and the agent works there with you on your computer interactively and uh it
19:09 works within a sandbox, which is actually a really cool piece of tech to
19:13 help that be safe and secure, but it has access to all those dependencies. So if
19:17 the agent needs to do something, like it needs to run a command, it can do so
19:20 within the sandbox. You don't have to set up any environment, and if it's a command
19:23 that doesn't work in the sandbox, it can just ask you. And so you can get into
19:27 this really strong feedback loop using the model. And then, over time,
19:31 our team's job is to help turn that feedback loop into you, sort of as a
19:35 byproduct of using the product, configuring it so that you can then be
19:39 delegating to it down the line. And, again, the analogy I keep coming back to:
19:43 if you hire a teammate and you ask them to do work, but you just give
19:46 them a fresh computer from the store, it's going to be hard for them to
19:49 do their job, right? But if, as you work with them side by side, you could
19:52 be like, "Oh, you don't have a password for this service we use. Here's the password
19:56 for this service." You know, "Yeah, don't worry. Feel free to run this command."
19:59 Then it's much easier for them to then go off and do work for hours
20:03 without you. >> So, what I'm hearing is the initial version of Codex was almost too
20:06 far in the future. It's like a remote, in-the-cloud agent that's coding for you
20:11 asynchronously. And what you did is, okay, let's actually come back a little
20:15 bit. Let's integrate into the way engineers already work, in IDEs and
20:20 locally, and help them kind of on-ramp to this new world. >> Totally. And it
20:26 was quite interesting, because we dogfood product a ton at OpenAI. You
20:30 know, dogfood as in we use our own product. And so Codex has been
20:34 accelerating OpenAI over the course of the entire year, and the cloud product
20:38 was a massive accelerant to the company as well. Um it just turns out that this
20:44 is one of those places where the signal we got from dogfooding is a little bit
20:47 different from the signal you get from the general market, because at
20:50 OpenAI, you know, we train reasoning models all day, and so we're very used to
20:54 this kind of prompting: think up front, run things
20:59 massively in parallel, and, you know, it would take some time, and then come back
21:03 to it later asynchronously. And so, now when we build, we still get a
21:06 ton of signal from dogfooding internally, but we're also
21:11 very cognizant of the different ways that different audiences use the
21:14 product. >> That's really funny. It's like, live in the future, but maybe not too far
21:17 in the future. And I could see how everyone at OpenAI is living very far in
21:21 the future, and sometimes that won't work for everyone.
21:25 >> Yeah. >> What about just, like, intelligence, training data? I don't
21:28 know. Is there something else that helped Codex accelerate its ability to
21:32 actually code? Is it better, cleaner data? Is it more just models
21:36 advancing? Is there anything else that really helped accelerate? >> Yeah. So
21:41 there's like a few components here. Um I guess you know you were mentioning
21:44 models, and the models have improved a ton. In fact, um, just last Wednesday we
21:50 shipped GPT-5.1-Codex-Max, a very, you know, accurately named model. Uh, that
21:56 is awesome. It is awesome both because, for any given task that you
22:01 were using GPT-5.1-Codex for, it's, you know, roughly 30% faster at
22:06 accomplishing that task, but also it unlocks a ton of intelligence. So if you
22:10 use it at our higher reasoning levels, it's just even smarter. Um, and, you
22:13 know, that tweet you were mentioning that Karpathy made,
22:16 about, hey, give us your gnarliest bugs: you know, obviously there's a
22:20 ton going on in the market right now, but Codex Max is definitely
22:24 carrying that mantle of, you know, tackling the hardest bugs. Um, so that
22:28 is super cool. But I will say, some of how we're
22:32 thinking about this is evolving a little bit, from being, yeah, we're just
22:35 going to think about the model, let's just train the best model, to
22:38 really thinking about what an agent actually is overall, right? And, you
22:43 know, I'm not going to try to define agent exactly, but at least the stack
22:46 that we think of it as having is: you have this model, a really smart
22:51 reasoning model that knows how to do a specific kind of task really well. So we
22:53 can talk about how we make that possible. But then actually we need to
22:59 serve that model through an API into a harness. And both of those things also
23:03 have a really big role here. So, for instance, one of the things that
23:07 we're really proud of is you can have GPT-5.1-Codex-Max work for really long
23:11 periods of time. That's not, like, normal, but you can set it up to do that, or that
23:15 might happen. But now, routinely, we'll hear about people saying, yeah, it
23:18 ran overnight, or it ran for 24 hours. >> Mm-hmm. >> And so, you know, for a model to work
23:22 continuously for that amount of time, it's going to exceed its context window,
23:25 and so we have a solution for that, which we call compaction. Um, but compaction is
23:30 actually a feature that uses all three layers of that stack. So you need
23:36 to have a model that has a concept of compaction and knows, okay, as I
23:39 start to approach this context window, I might be asked to prepare to be run
23:43 in a new context window. And then at the API layer, you need an API that
23:47 understands this concept and has an endpoint that you can hit to do this
23:50 change. And at the harness layer, you need a harness that can prepare the
23:53 payload for this to be done. And so shipping this compaction feature, which
23:56 now just made this behavior possible for anyone using Codex, has
23:59 actually meant working across all three things. And I think that's
24:03 increasingly going to be true. Another maybe underappreciated version of
24:08 this is: if you think about all the different coding products out there,
24:10 they all have very different tool harnesses, with very different
24:14 opinions on how the model should work. And so, if you want to train a model to
24:17 be good at all the different ways it could work... Like, you know, maybe
24:20 you have a strong opinion that it should work using semantic search, right? Maybe
24:24 you have a strong opinion that it should call bespoke tools, or maybe you
24:27 have, in our case, a strong opinion that it should just use the shell and
24:32 work in the terminal. You know, you can move much faster if
24:34 you're just optimizing for one of those worlds, right? And so the way that we
24:38 built Codex is that it just uses the shell. But in order to make that
24:43 safer and more secure, we have a sandbox that the model is used to operating in.
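The shell-plus-sandbox idea he describes here, plus the compaction behavior from a moment earlier, can be sketched roughly as follows. This is a hypothetical illustration, not OpenAI's actual harness: the allowlist policy, the budget numbers, and all the function names are assumptions for the example.

```python
import shlex
import subprocess

# Illustrative sandbox policy: only these binaries may run without asking the user.
SANDBOX_ALLOWLIST = {"ls", "cat", "grep", "python3", "pytest"}
# Stand-in for a token budget; real systems count tokens, not characters.
CONTEXT_BUDGET = 4000


def run_in_sandbox(command: str, timeout: int = 30) -> str:
    """Run a model-proposed shell command if it passes the sandbox policy."""
    argv = shlex.split(command)
    if not argv or argv[0] not in SANDBOX_ALLOWLIST:
        # Outside the policy: surface it to the user instead of executing.
        return f"BLOCKED: {command!r} needs user approval"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr


def compact(transcript: list[str]) -> list[str]:
    """Stand-in for compaction: replace old turns with a short summary,
    keeping only the most recent turns verbatim."""
    summary = f"[summary of {len(transcript) - 2} earlier steps]"
    return [summary] + transcript[-2:]


def agent_step(transcript: list[str], proposed_command: str) -> list[str]:
    """One loop iteration: run the command, record the result,
    and compact if the transcript is outgrowing the context budget."""
    transcript = transcript + [proposed_command, run_in_sandbox(proposed_command)]
    if sum(len(t) for t in transcript) > CONTEXT_BUDGET:
        transcript = compact(transcript)
    return transcript
```

The point of the sketch is the division of labor he outlines: the harness enforces the sandbox and prepares the compacted payload, while in the real system the model itself would decide what to summarize as it nears its context window.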
24:46 So I think one of the biggest accelerants, to go all the way back to
24:49 your original question, is just that we're building all three
24:52 things in parallel and kind of tuning each one, and, um, you know,
24:56 constantly experimenting with how those things work, with a tightly
24:59 integrated product and research team. >> How do you think you win in this space?
25:04 Do you think it'll always be this kind of race, with other
25:08 models constantly leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there like a path to just "we win"? >> Again, it comes back to this idea of
25:19 building a teammate and not just a teammate that you know uh participates
25:24 in team planning and prioritization. Not just a teammate that you know really
25:27 tests its code and like helps you maintain and deploy. But even a teammate
25:31 you know like if you think again an engineering teammate they can also like
25:34 schedule a calendar invite right or move standup or do whatever right. And so in
25:42 my mind, if we just imagine that every day or every week some like crazy new
25:46 capability is just going to be deployed by a research lab, it's just impossible
25:50 for us like you know as humans to keep up and like use all this technology. And
25:54 so I think we need to get to this world where you kind of just have like an AI
25:59 teammate or super assistant that you just talk to and it just knows how to be
26:04 helpful on its own, right? And so you don't have to be
26:07 reading the latest tips for how to use it. You just like you've plugged it in
26:11 and it just provides help. And so that's kind of the shape of what I think we're
26:14 building. And I think that will be like a very sticky like winning product if we
26:18 can do so. So the shape I have in my head, at least... well, maybe
26:23 a fun topic: is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for, in the same way that if I'm, like, on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's kind of the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are like a functional expert in a specific
26:49 domain such as coding, there's a GUI that you can pull up to go really
26:54 deep and, like, look at the code and work with the code. So I think what
26:59 we need to build at OpenAI is basically this idea that you have ChatGPT,
27:02 and that is a tool that's ubiquitously available to, like, everyone.
27:06 You start using it even like outside of work right to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you can just naturally go: yeah, I'm just going to ask it for this,
27:16 and I don't need to know about all the connectors or like all the different
27:19 features. I'm just going to ask it for help and it'll surface to me the
27:23 best way that it can help at this point in time and maybe even chime in when I
27:27 didn't ask it for help. Um, so in my mind, if we can get to that, I think
27:30 that's, you know, that's how we really build the winning product.
27:34 This is so interesting, because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was "super
27:41 assistant" or something like that. >> Yeah. >> And it's interesting that there's like
27:46 that approach to the super assistant and then there's this codeex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is okay, you start with coding and building and then it's doing all
27:56 this other stuff for you, scheduling meetings, I don't know, probably posting
28:01 in Slack, uh, I don't know, shipping designs. Is that the
28:04 idea there? Is this like the business version of ChatGPT, in a sense?
28:08 Or is there something else there? >> Yeah. So, you know, we're getting to
28:12 the one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you a contention on the plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So like we're going to
28:29 have a model and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need a super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 Uh, you know, you could try to hack the OS and like use accessibility APIs.
28:57 Maybe a bit easier is you could point and click. That's a little slow, you
29:02 know, and, uh, unpredictable sometimes. And it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where like, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 nontechnical user, they won't even know they're using a coding agent. The same
29:19 way that no one thinks about are they using the internet or not, which is
29:22 they're more just like is Wi-Fi on? Right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're kind of building an agent that can use a computer by writing
29:36 code. And so we're already seeing some pull for this. It's quite
29:39 early, but we're starting to see people who are using Codex for
29:43 coding-adjacent product purposes. And so as that develops, I think we'll
29:47 just naturally see that like, oh, it turns out like we should just always
29:50 have the agent write code if there is a coding way to solve a problem instead
29:53 of, you know, even if you're doing a financial analysis, right? Like maybe
29:56 write some code for that. So basically, you know, you were asking, hey, are
29:59 these like the two ends of this product, of the super assistant,
30:03 of ChatGPT? In my mind, coding is a core competency of any agent,
30:06 including ChatGPT. And so really what we think we're building is
30:10 that competency. But here's the really cool thing about
30:13 agents writing code: you can import code, right? Code is
30:19 composable, interoperable, right? Because one very reductive
30:23 view we could have for an agent is that it's just going to be given a computer and
30:26 it's just going to point and click and, you know, go around. But, you know, that
30:32 is the future, and how we get there is difficult to sort of chart a path for,
30:36 because a lot of the questions around building agents aren't like can the
30:41 agent do it but it's more about well how can we help the agent understand the
30:44 context that it's working in and like the team that's using it you know
30:47 probably has a way that they like to do things they have guidelines they
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do or they want to know that the agent understands sort of this detail
30:59 like an example would be, you know, if we're looking at a crash-reporting tool,
31:04 hitting a connector for it: every sub-team probably has a different meta
31:07 prompt for how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where like, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And, like, stuff that the agent does often, we
31:22 probably just want to like build in as a competency that this agent has that it
31:27 can do. So I think we end up with this generalizable thing that you were saying
31:31 of like an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think the really key part here is: can we make it so that
31:40 everything that the agent has to do often or that it does well we can just
31:44 like remember and store so that the agent doesn't have to write a script for
31:47 that again, right? Or maybe, if I just joined a team and you're already
31:51 on the same team as me, I can just use all those scripts that the agents
31:53 had written already. >> Yeah. It's like if this is our teammate
31:57 they can share things that it's learned from working with other
32:00 people at the company. Just makes sense as a metaphor. >> Yeah. It feels like you're in the uh
32:05 Karpathy camp of agents today are not that great and mostly slop and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great. I think... >> A ton of value. >> Right? Yep.
32:19 >> And then I think like agents outside of coding, it's still like very early and
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use code too, in a composable way.
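To make the "agents get better once they can write code" point concrete: instead of clicking through a dashboard, an agent can answer a question like "why did this metric move?" by writing and running a small throwaway script. This is a toy sketch with invented data, not how Codex does it:

```python
import csv
import io
import statistics

# Made-up metric data, standing in for whatever the agent pulled from a dashboard.
DATA = """day,signups
mon,120
tue,118
wed,240
thu,125
"""

rows = list(csv.DictReader(io.StringIO(DATA)))
values = [int(r["signups"]) for r in rows]
mean = statistics.mean(values)
stdev = statistics.pstdev(values)

# Flag days more than 1.5 population standard deviations from the mean.
outliers = [r["day"] for r, v in zip(rows, values) if abs(v - mean) > 1.5 * stdev]
print(f"mean={mean:.1f}, outliers={outliers}")  # wed stands out
```

The code itself is trivial; the point is that "write and run a script" is a more reliable tool for an agent than driving a GUI.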
32:29 This is kind of the fun part of building for software
32:33 engineers. Like, at my startup we were building for software engineers too for
32:36 a lot of that journey and they're just such a fun audience to build for because
32:41 you know they also like building for themselves and are often like even more
32:45 creative than we are in thinking about how to use the technology. Um, and so,
32:48 like by building for software engineers you get to just observe a ton of
32:52 emergent behaviors and like things that you should do and build into the
32:55 product. I love that you say that, because a lot of people building for
32:57 engineers get really annoyed, because the engineers are just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 problems and just, you know, code for them. Um, kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Uh clearly the way you're
33:21 describing it is it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you with the way you just think about
33:28 the impact on the field of engineering having this super intelligent
33:33 engineering teammate? >> I think there's two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just like a small part
33:46 of this like broader idea that like, hey, as we make code even more
33:48 ubiquitous... I mean, you could probably claim it's ubiquitous today, even
33:51 pre-AI, right? But as we make code even more ubiquitous, it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for humans with this competency. So that's
34:05 my view. I think this is like quite a complex topic. So, you know, it's
34:08 something we talk about a lot and we have to kind of see how it pans out. But
34:12 I think what we can do, basically, as a product team building in
34:15 the space is just try to always think about how are we building a tool so that
34:18 it feels like we're like maximally accelerating uh people you know rather
34:24 than building a tool that makes it like more unclear what you should do as the
34:29 human, right? Like, to give an example: right now,
34:33 when you work with a coding agent, it writes a ton of code, but it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. So then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And so I actually think we see this play
34:53 out all the time in a ton of micro-decisions. And so we as a product team
34:55 are always thinking about like okay, how do we make this more fun? How do we make
34:58 you feel more empowered? And where is it not working? I would argue that
35:01 reviewing agent-written code is a place that today is less fun. And
35:06 so you know then I think okay what can we do about that? Well, we can ship a
35:09 code review feature that helps you build confidence in the AI-written
35:12 code. Okay, cool. You know, another thing we can do is we can make it so
35:14 that the agent's like better able to validate its work. And you know, it gets
35:18 all the way down into micro-decisions. Like, if you're going to give
35:23 an agent the capability to validate work... and let's say, I'm thinking
35:27 of Codex web right now, you have a pane that sort of reflects the work the
35:30 agent did. What do you see first? Do you see the diff or do you see the image
35:34 preview of the code it wrote? Right? And you know, I think if you're thinking
35:36 about this from the perspective of how do I empower the human, how do I make them
35:40 feel as accelerated as possible, you obviously see the image first,
35:43 right? You shouldn't be reviewing the code before you've seen
35:46 the image... unless maybe it's been reviewed by an AI and now it's time for
35:49 you to take a look. When I had Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen this rise of something called spec-driven development, where
36:02 you kind of just write the spec and then the code, you know, the AI writes code
36:05 for you. And so you kind of start working at this higher abstraction
36:09 level. Is that something you see where we're going? Just like engineers not
36:12 having to actually write code or look at code and there's going to be this higher
36:16 level of abstraction that we focus on? >> Yeah, I mean, I think there are
36:19 constantly these levels of abstraction, and they're actually already
36:23 playing out today, right? Like, today, coding agents are mostly prompt to
36:29 patch, right? We're starting to see people doing spec-driven development or,
36:32 like, plan-driven development. That's actually one of the answers when people
36:35 ask, hey, how do you run Codex on a really long task? Well, it's often:
36:38 collaborate with it first to write a plan.md, a markdown file that's
36:42 your plan, and once you're happy with that, then you ask it to go off and do
36:46 work. And if that plan has verifiable steps, it'll work for much longer.
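A plan.md in that style might be as simple as the following (contents invented for illustration); what matters is that each step gives the agent something it can verify before moving on:

```markdown
# Plan: add rate limiting to the API

- [ ] Write a failing unit test for a `RateLimiter` middleware
- [ ] Implement the middleware until the test passes
      (verify: `pytest tests/test_rate_limit.py`)
- [ ] Wire the middleware into the app configuration
- [ ] Verify end-to-end: full test suite green; hitting the endpoint
      repeatedly returns HTTP 429
```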
36:51 Um, so we're totally seeing that. I think spec-driven development is an
36:55 interesting idea. It's not clear to me that it'll work out that way, because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some people will work that way. You know, a bit of
37:06 a joke idea, though: if you think of, um, the way that many teams work
37:11 today, they often don't necessarily have specs, but the team is
37:14 just really self-driven and so stuff just gets done. And so almost that is
37:17 like... I'm coming up with this on the spot, so it's, you know, not a good name,
37:21 but, like, chatter-driven development, where stuff is just happening, you
37:24 know on social media and like in your team communications tools and then as a
37:28 result like code gets written and deployed right so yeah I think I'm a
37:33 little bit more oriented in that way of you know I don't even necessarily want
37:37 to have to write a spec. Like, sometimes I will, if I like writing specs,
37:42 right? Uh, other times I might just want to say, hey, here's the
37:45 customer, you know, service channel and like tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of uh hypothetical future uh that I like to
37:58 share sometimes with people as a provocation, which is like in a world
38:01 where we have like truly amazing agents, like what does it look like to be a
38:04 solopreneur? Um, and, uh, you know, one terrible idea for how it could look:
38:12 there's a mobile app, and, um, every idea the agent has for what to do is
38:17 just, like, a vertical video on your phone, and then you can swipe left if you
38:21 think it's a bad idea and you can like swipe right if it's a good idea and like
38:24 you can press and hold and like speak to your phone if you want to get feedback
38:28 on the idea before you swipe, you know. So in this world, basically, your
38:31 job is just to plug this app into every single signal
38:36 system, you know, system of record, and then you just sort of sit back and
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets TikTok meets Codex. >> It's pretty terrible. >> No, this is great. So the idea here is
38:47 this agent is watching, right, listening to you, paying attention
38:51 to the market, your users, and it's like, cool, here's something I should do. It's
38:54 like a proactive engineer: here, we should build this feature, fix this
38:56 thing. >> Exactly. And I think they're communicating with you in, like, the
39:05 most modern way to communicate. >> Yeah. >> Swipe left or right, in a vertical feed,
39:10 and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that but like you know it's a fun idea.
39:17 I mean you see you know like in this example though like one of the things
39:19 that it's doing is it's consuming external signals right. I think the
39:23 other really interesting thing is like if we think about like what is the most
39:28 successful, like, AI product to date... um, I would argue... it's funny, actually,
39:34 not to confuse things at all, but the first time we used the brand
39:38 Codex at OpenAI was actually the model powering GitHub Copilot. This is
39:42 way back in the day, years ago. And we decided to reuse that brand
39:45 recently, um, because it's just so good, you know: Codex, code execution. But I
39:50 think actually autocompletion in IDEs is one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not like that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this like mixed initiative system that's like contextually responding to like
40:17 what you're attempting to do. And so in my mind, this is like a really
40:21 interesting thing for us at OpenAI as we're building. So for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 Like in my mind, one of the really interesting things we can then do is we
40:33 can then like contextually surface like ways that we can help you as you're
40:37 going about your day, right? And so we break out of this like, you know, we're
40:41 just looking at code or we're just in your terminal um into this idea that
40:44 like, hey, like a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So like,
40:51 you know, how can we help you with that? >> Man, there's so much there and I love
40:55 this. Okay, so autocomplete on web with the browser. That's so interesting. just
40:58 like here's all the things that we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas. I'll come back to that. Uh
41:05 Codex, code execution. Did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter... what is chatter-driven development? Uh, I
41:14 had a... No, this is a really good idea, but it reminds me: I had Dhanji Prasanna on the
41:19 podcast, the CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who
41:30 just has Goose watch his screen and listen to every meeting, and it
41:36 proactively does work that he will probably want done. So it ships a PR,
41:41 sends an email, drafts a Slack message. So he's doing exactly what you're
41:44 describing in in kind of a very early way. >> Yeah, that's super interesting. And you
41:49 know, I bet you... so, if we went and asked them what the bottleneck
41:52 to that productivity is... did they share what it is? >> Uh, probably looking at it, just making
41:57 sure this is the right thing to do. Yeah. >> Yeah. So, like, we see this now. Like, we
42:01 have a Slack integration for Codex. People love it. You know, if there's
42:04 some thing that you need done quickly, people just at-mention Codex: like,
42:07 why do you think this bug is happening? Right? It doesn't have to be an engineer.
42:10 Even, like... you know, data scientists here often use Codex a ton to just
42:14 answer questions, like: why do you think this metric moved? What happened?
42:18 So for questions, you get the answer right back in Slack. It's amazing, super
42:22 useful. But when it's writing code, then you have to go back
42:27 and look at the code, right? And so I think the real bottleneck right now
42:30 is validating that the code worked, and, like, doing code review.
42:34 So in my mind, if we wanted to get to something like that world your
42:38 friend was talking about, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. >> It makes sense. Like you said, writing code:
42:49 I was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code and just have to go through and be on the hook if
43:00 it is doing something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is the bottleneck is now like figuring out
43:09 what to build, and then it's at the end: okay, we have all this, like 100
43:13 hours of work to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira Product Discovery. The hardest part of
43:22 building products isn't actually building products. It's everything else.
43:26 It's proving that the work matters, managing stakeholders, trying to plan
43:30 ahead. Most teams spend more time reacting than learning, chasing updates,
43:34 justifying road maps, and constantly unblocking work to keep things moving.
43:39 Jira Product Discovery puts you back in control. With Jira Product Discovery,
43:43 you can capture insights and prioritize high impact ideas. It's flexible, so it
43:47 adapts to the way your team works and helps you build a road map that drives
43:51 alignment, not questions. And because it's built on Jira, you can track ideas
43:56 from strategy to delivery, all in one place. Less chasing, more time to think,
44:01 learn, and build the right thing. Get Jira Product Discovery for free at
44:06 atlassian.com/lenny. That's atlassian.com/lenny. What has the impact of Codex been on the
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted. Uh, code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at OpenAI? >> Yeah, I mean, I think mostly I just feel
44:28 much more empowered. I've always been a more technical-leaning PM, and especially when
44:34 I'm working on products for engineers, I feel like it's necessary to like you
44:37 know, dogfood the product. But even beyond that, I just feel like I can
44:42 do much, much more as a PM. And, you know, Scott Belsky talks about this idea
44:45 of compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that like maybe the boundaries between these roles
44:52 are a little bit like less needed than before because people can just do much
44:57 more, and every time someone can do more, you can skip one communication
45:00 boundary and make the team that much more efficient, right? So I
45:07 think we see it, you know, in a bunch of functions now. But since you
45:11 asked about product specifically: answering questions is now
45:15 much, much easier... you can just ask Codex for thoughts on something. A lot of
45:20 PM-type work, understanding what's changing... again, just ask Codex for help
45:25 with that. Um, prototyping is often faster than writing specs. This is something
45:29 that a lot of people have talked about. And something that's
45:33 slightly surprising is that we're mostly building
45:36 Codex to write code that's going to be deployed
45:40 to production, but actually we see a lot of throwaway code written with Codex
45:43 now. It kind of goes back to this idea of, you know, ubiquitous code.
45:48 So you'll see, you know, someone wants to do an analysis. Like, if I want to
45:51 understand something, it's: okay, just give Codex a bunch of data, but then ask
45:54 it to build an interactive data viewer for this data, right? That
45:56 was just too annoying to do in the past, but now it's
46:00 totally worth the time of just getting an agent to go do something. Um,
46:04 similarly, I've seen some pretty cool prototypes on our design team.
46:09 Like... well, a designer basically wanted to build an animation,
46:13 the coin animation in Codex, and normally it'd be too
46:17 annoying to program this animation. So they just vibe-coded an animation editor
46:21 and then they used the animation editor to build the animation, which they then
46:25 checked into the repo. Actually, with our designers there's a ton of
46:28 acceleration. And speaking of compressing the talent stack, I think our
46:31 designers are very PM-like. So, you know, they do a ton of product work. And they actually
46:38 have an entire vibe-coded side prototype of the Codex app. And
46:41 so, a lot of how we talk about things is like we'll have like a really quick jam
46:44 because there's, like, 10,000 things going on. And then the designer will go think
46:48 about how this should work, but instead of like talking about it again, they'll
46:50 just vibe-code a prototype of that in their standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe-code that prototype, or
46:59 vibe-engineer that prototype, into an actual PR to land. And then, depending on
47:02 their comfort with the codebase (the Codex CLI, in Rust, is a little harder),
47:06 Maybe they'll like land it themselves or they'll like get close and then an
47:09 engineer can help them land the PR. Um, you know, we recently shipped the
47:15 Sora Android app, and that was one of the most mind-blowing
47:19 examples of acceleration, actually. Usage of Codex internally at
47:24 OpenAI is obviously really, really high, but it's been growing over the course of
47:28 the year, both in terms of... now basically all technical staff use
47:32 it... but also the intensity and know-how of making the most of
47:35 coding agents has gone up by a ton. And so the Sora Android app, a
47:42 fully new app: we built it in 18 days. It went from zero to launch to
47:46 employees, and then 10 days later, so 28 days total, we went GA to
47:51 the public. And that was done with the help of Codex,
47:56 so pretty insane velocity. I would say it was a little bit... I don't want to
48:01 say easy mode, but there is one thing that Codex is really good at: if you're a
48:04 company that's building software on multiple platforms, and you've already
48:07 figured out some of the underlying APIs or systems, asking Codex
48:13 to port things over is really effective, because it has something
48:15 it can go look at. And so the engineers on that team were basically having
48:20 Codex go look at the iOS app, produce plans of work that needed to be done, and
48:23 then go implement those. And it was kind of looking at iOS and Android at the
48:27 same time. And so you know basically it was like two weeks to launch to
48:30 employees, four weeks total. Insanely fast. >> What makes it even more insane is it
48:35 was that it became the number one app in the App Store. >> This just boggles the mind.
48:39 Okay. So... >> Yeah. So imagine releasing the number one app on the App Store with a handful
48:45 of engineers... >> I think it was, like, two or three, possibly...
48:53 >> ...in a handful of weeks. Yeah, this is absurd. So... >> Yeah, so that's a really fun example
49:01 of acceleration. And then Atlas was the other one. I think Ben
49:06 did a podcast... the engineer on Atlas... sharing a little bit of how we
49:12 built it there. You know, Atlas is... I mean, it's a browser,
49:15 right? And building a browser is really hard. And so we had to build a lot
49:23 of difficult systems in order to do that and basically we got to the point where
49:27 that team has a ton of power users of Codex right now. And, you know, it got
49:32 to the point where... we were talking to them
49:34 about it because a lot of those engineers are people I used to work with
49:38 before at my startup and so they'd say you know before this would have taken us
49:42 like two to three weeks for two to three engineers and now it's like one engineer
49:48 one week. Um so massive acceleration there as well. And what's quite cool is
49:52 that uh you know we we shipped Atlas on on Mac first but now we're working on
49:56 the Windows version. So the team now is ramping up on
49:58 Windows, and they're helping us make Codex better on Windows too, which is
50:02 admittedly earlier. Like, just the model we shipped last week is the first
50:06 model that natively understands PowerShell. So, you know, PowerShell being
50:11 the native shell language on Windows. So yeah, it's been
50:16 really awesome to see the whole company getting accelerated by Codex:
50:21 you know, most obviously research, improving how
50:24 quickly we train models and how well we do it and then even like uh design as we
50:28 talked about, and marketing. Actually, we're at this point now where
50:32 my product marketer is often also making string changes directly from
50:36 Slack, or updating docs directly from Slack. >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible and this is how other companies
50:46 are going to work. Uh just shipping, again, what became the number one app in
50:49 the app store, beloved all over, it just took over the, I don't
50:54 know, the world for at least a week. Uh built, you said, in 28 days, and like, I don't
50:58 know, 18 days just to get the core of it working.
51:02 >> Yeah. So like 18 days we had a thing that employees were playing with and
51:05 then 10 days later we were out. >> And you said just a couple engineers.
51:07 >> Yeah. >> Two or three. Okay. And then Atlas you
51:11 said took a week to build. >> No, no, no. So Atlas, not the whole
51:16 thing in a week; Atlas was like a really meaty project. >> Yeah.
51:18 >> Um and so I was talking to one of the engineers on Atlas um about, like, you
51:23 know, just what they use Codex for, and it's basically: we use Codex for
51:25 absolutely everything. I was like, okay, well, how would you
51:29 measure the acceleration? And basically the answer I got back
51:31 was: >> previously it would have taken two to three weeks for two to three engineers,
51:36 and now it's like one engineer, one week. >> Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Like does it have to be an engineer
51:42 building this thing? Could it have been built by, I don't know, a PM or
51:46 designer. I think we will very much get to the point where well basically where
51:50 the boundaries are a little bit blurred, right? Like I think you're going to want
51:54 someone who's like understands the details of what they're building, but
51:58 what details those are will evolve. Kind of like how now like if you're writing
52:02 Swift, you don't have to speak assembly. You know, there's a handful of people in
52:05 the world who still speak assembly, and it's really important that they exist. Uh
52:09 maybe more than a handful, right? But that's like a specialized function that
52:14 like most companies don't need to have. So I think we're just going to naturally
52:17 see like an increase in layers of abstraction. And then the cool thing is
52:21 now we're entering like the language layer of abstraction like natural
52:25 language. And then natural language itself is really flexible, right? Like
52:29 you could have engineers talking about like a plan and then you could have
52:32 engineers talking about a spec and then you could have engineers talking about
52:35 just, you know, a product or an idea. So I think we can also like start moving up
52:39 those layers of of abstraction as well. But you know I I do think this is going
52:43 to be gradual. I don't think it's going to go, all of a sudden, to
52:46 nobody ever writes any code and it's just specs. I
52:49 think it's going to be much more like okay we've set up our coding agent to be
52:53 really good at like previewing the build or like at running tests. Maybe that's
52:56 the first part right that most people have set up. And it's like okay now
52:59 we've set it up so that it can like execute the build and it can like see
53:03 the results of its own changes, but you know we haven't yet built a good
53:06 integration harness. So that it can, in the case of Atlas (by the way, I
53:08 don't know if they've done any of this; I think they've done a lot of
53:11 it), but you know maybe the next stage is like enabling it to load a few
53:16 sample pages to see how well those work, right? So then okay, now we're going to
53:19 go set that up. And I think for some time at least we're going to
53:22 have humans kind of curating which of these connectors or systems or
53:26 components the agent needs to be good at talking to. And then you know in
53:30 the future there will be an even greater unlock where Codex tells you how to set
53:34 it up, or maybe sets itself up in a repo. >> What a wild time to be alive. Wow. I'm
53:38 curious about the second order effects of this sort of thing. Just how quick it
53:42 is to build stuff: what does that do? Does that mean distribution becomes much,
53:46 much more important? Does it mean uh ideas are just worth a lot more? It's
53:50 interesting to think about how that changes things. >> I'm curious what you think. I still
53:56 don't think ideas are worth as much as maybe a lot of people think. I
53:59 still think execution is really hard, right? Like you can build
54:01 something fast, but you still need to execute well on it. It still needs to make
54:06 sense and be a coherent thing overall. Um Yeah. And distribution is massive.
54:10 >> Yeah. Just feels like everything else is now more important. Everything that
54:13 isn't the building piece, which is >> coming up with an idea, getting to
54:17 market, profit, >> all that kind of stuff. I I think we
54:21 might have been in this weird temporary phase where, you know, for a while,
54:26 it was so hard to build product that you mostly just
54:31 had to be really good at building product, and it maybe didn't matter if
54:34 you had an intimate understanding of a specific customer.
54:39 Um, but now I think we're getting to this point where actually like if I
54:42 could only choose one thing to understand, it would be a really
54:46 meaningful understanding of the problems that a certain customer has,
54:49 right? If I could only go in with one core competency. So
54:54 I think that's ultimately still what's going to matter most,
54:57 right? Like if you're starting a new company today and you have like a really
55:02 good understanding and like network of customers that are currently underserved
55:05 by AI tools, I think you're like you're set, right? Whereas if you're like good
55:09 [clears throat] at building like you know websites, but you don't have any
55:12 specific customer to build for, I think you're in for a much harder time.
55:17 Bullish on vertical AI startups is what I'm hearing. Yeah, I completely agree.
55:20 There's like, you know, there's like the general thing that can solve a lot of
55:23 problems and then there's like we're going to solve presentations incredibly
55:25 well and we're going to understand the presentation problem uh better than
55:30 anyone and we're going to uh plug into your workflows and all these other
55:33 things that matter for a very specific problem. Okay. Incredible. When you
55:39 think about progress on Codex, I imagine you have a bunch of evals and
55:42 there's all these public benchmarks. What's something you look at to tell
55:45 you, okay, we're making really good progress. I imagine it's not going to be
55:48 the one thing, but what do you focus on? What's like something you're trying to
55:51 push? What's like a KPI or two? One of the things that I'm constantly reminding
55:56 myself of is that a tool like Codex sort of naturally is a tool that you would,
56:00 you know, become a power user of, right? And so we can accidentally spend a lot
56:03 of our time thinking about features that are like very deep in the user adoption
56:08 journey. Um, and so we can kind of end up oversolving for that. And so I think
56:12 it's like just critically important to go look at your D7
56:16 retention, right? Just go try the product. Like, sign up from scratch
56:19 again. Um I have a few too many ChatGPT Pro accounts that I've
56:24 signed up for on my Gmail, in order to maximally correctly dogfood, and they
56:27 charge me like 200 bucks a month. I need to expense those. But uh you know,
56:33 like I think just like the feeling of being a user and the early retention
56:37 stats are still like super important for us because you know as much as this
56:41 category is is taking off I think we're still in the very early days of like
56:45 people using them. Um, another thing that we do, and I
56:51 think we might be the most user-feedback-slash-social-media-pilled team out
56:56 there in this space, is a few of us are like constantly on Reddit and
57:01 Twitter, and uh you know there's praise up there and there's a
57:04 lot of complaints but we take the complaints like very seriously and look
57:08 at them. And I think that, again, because you can use a coding agent for so
57:12 many different things, um it often is kind of broken in all sorts of ways
57:17 for specific behaviors. Um, and so we actually monitor a lot just
57:20 what the vibes are on social media, pretty often. Especially, I think, for
57:27 Twitter X, um, it's a little bit more hypey, and then Reddit is a little more
57:34 negative, but real, actually. Um, so I've started increasingly paying attention to
57:37 how people are talking about using Codex on Reddit, actually.
57:41 >> This is uh important for people to know. Which subreddits do you check most?
57:44 Is there like an r/codex or >> I mean the algorithm is pretty good at
57:48 surfacing stuff, but like, r/codex is there. >> Okay, I'll take that. Very interesting. And then,
57:52 uh, if people tag you on Twitter you still see that, but maybe not as powerful
57:56 as seeing it on Reddit. >> Well yeah, the thing
57:58 with Twitter is it's a little bit more one-to-one, even if it's in public,
58:01 whereas with Reddit there's really good upvoting mechanics, and
58:05 maybe most people are still not bots, unclear. Um so you get like good
58:09 signal on what matters and what other people think. >> So uh interestingly uh
58:13 Atlas I want to talk about that briefly. Uh you guys launched Atlas. I tweeted
58:18 actually that I tried Atlas and then I don't love the AI-only uh search
58:23 experience. I was just like, I just want Google sometimes, or whatever; just
58:26 waiting for AI to give me an answer, I'm like, I don't want to. And there was no
58:29 way to switch. I just tweeted, hey, I'm switching back, it's not
58:32 great. And I feel like I made some PMs at OpenAI sad. And I saw someone tweet,
58:37 okay, we have this now, which I imagine was always part of the plan. It's
58:40 probably an example of: we just got to ship stuff, see how people use it,
58:43 and then we figure it out. Uh so I guess one is, I don't know, is there
58:46 anything there and two I'm just curious why are you guys building a web browser?
58:51 So I worked on Atlas for a bit. Um I don't work on it now. Um but you know,
58:55 a bit of the narrative here for me, just to tell my story a bit, was
58:58 like, I was working on this screen sharing, pair programming startup,
59:03 right, and then we joined OpenAI, and so the idea was really to build a
59:07 contextual desktop assistant. And the reason I believe that's so important is
59:11 because I think that it's really annoying to have to give all your
59:14 context to an assistant and then to figure out how it can help you, right? And
59:18 so if it could just understand what you are trying to do, then it could
59:23 maximally accelerate you. Um and so I would, you know, I still think of Codex
59:26 actually as like a contextual assistant um from a little bit of a different
59:30 angle, like starting with coding tasks. But um some of the
59:36 thinking, at least for me personally (I can't speak for the whole project),
59:40 was that a lot of work is done in the web, and if we could build a browser, then
59:45 we could be contextual for you, but in a much more first-class way. We weren't
59:48 hacking around other desktop software, which has very varied support
59:53 for what content it renders to the accessibility tree. Uh we
59:56 wouldn't be relying on screenshots, which are a little bit slower and unreliable.
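The point being made here, that structured extraction from a page's markup beats screenshot pixels for speed and reliability, can be illustrated with a toy example. This is a hedged sketch using Python's stdlib `html.parser` on a made-up page; it is not Atlas's actual extraction pipeline.

```python
# Toy illustration of structured context extraction: instead of OCR-ing a
# screenshot, walk the document structure and pull out visible text and links.
# Hypothetical sketch only; not how Atlas actually works.
from html.parser import HTMLParser

class ContextExtractor(HTMLParser):
    """Collect visible text and link targets from an HTML document."""
    def __init__(self) -> None:
        super().__init__()
        self.texts: list[str] = []
        self.links: list[str] = []
        self._skip = 0  # depth inside <script>/<style>, which aren't visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.texts.append(data.strip())

# Made-up page standing in for whatever the user is looking at.
page = """<html><body>
  <h1>Quarterly dashboard</h1>
  <script>trackPageview();</script>
  <p>Signups are <b>down 12%</b> week over week.</p>
  <a href="/reports/signups">Full report</a>
</body></html>"""

ex = ContextExtractor()
ex.feed(page)
# ex.texts holds the visible text; ex.links == ["/reports/signups"]
```

The extracted text and links arrive already structured, where a screenshot would need a slower, lossier vision step to recover the same information.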
60:00 Instead, we could be in the rendering engine, right? And
60:03 extract whatever we needed to help you. Um and also I like to think of
60:09 you know, video games. I don't know if you've played, say,
60:13 Halo, right: you walk up to an object (I mean, this is true for many games),
60:16 you press, man, it's been a long time, this is embarrassing, you press X and it just
60:21 does the right thing, right? And I was one of those guys who always read the
60:23 instruction manual for every video game that I bought. And I remember the first
60:26 time I read about a contextual action, and I just thought it was this
60:31 really cool idea. And uh you know the thing about a contextual action is
60:34 we need to know what you are attempting to do. We need to have a little bit of
60:37 context, and then we can help. Uh, and I think this is critically
60:43 important because, you know, imagine this world that we reach, right, where
60:45 we have agents that are helping you thousands of times per day. Um, imagine
60:50 if the only way we could tell you that we helped you is if we could like push
60:55 notify you. So, you get a thousand push notifications a day of an AI saying
60:59 like, "Hey, I did this thing. Do you like it?" It'd be super annoying, right?
61:03 Whereas, going back to software engineering, imagine I was looking at a
61:07 dashboard and I noticed some key metric had gone down,
61:12 and at that point in time an AI could maybe go take a look and then
61:15 surface the fact that it has an opinion on why this metric went down, and maybe a
61:19 fix, right there, right when I'm looking at the dashboard. That would
61:22 much more keep me in flow and enable the agent to take action
61:27 on many more things. So in my mind, part of why I'm excited for us to
61:32 have a browser is that I think we then have much more context around
61:37 what we should help with. Users have much more control over what they want us
61:40 to look at. It's like, hey, if you want us to take action
61:43 on something, you can open it in your AI browser. If you don't, then you can open
61:46 it in your other browser, right? So really clear control and boundaries. And
61:51 then we have the ability to build UX that's mixed-initiative, so that we
61:54 can surface contextual actions to you at the times they're helpful, as
61:58 opposed to just randomly notifying you. >> Hearing the vision for Codex being
62:01 the super assistant: it's not just there to code for you. It's trying to do a lot
62:05 for you as a teammate, as this kind of super teammate that makes you awesome at
62:10 work. So, I get this. Speaking of that, are there other non-engineering
62:15 common use cases for Codex? Just ways that non-engineers (we talked about,
62:18 you know, designers prototyping and building stuff), are there any, I don't
62:22 know, fun or unexpected ways people are using Codex that aren't engineers? >> I
62:25 mean there's a load of unexpected ways, but I think most of
62:31 where we're seeing real traction with people using things are still, for
62:35 now, very, I would say, coding-adjacent or sort of tech-oriented
62:39 places where there's a mature ecosystem, um or, you know, maybe you're
62:43 doing data analysis or something like that. I personally am expecting
62:47 that we're going to see a lot more of that over time. Um, but for now like
62:51 we're keeping the team very focused on just coding for now, because there's
62:54 so much more work to do. >> For people that are thinking about
62:58 trying out Codex, is there like, um, does it work for all kinds of code bases?
63:02 What code does it support? If you're, like, I don't know, SAP, can you
63:06 add Codex and start building things? What's kind of the sweet spot, or
63:11 does it start to not be amazing yet? >> I'm really glad you asked this
63:14 question, actually, because the best way to try Codex is to give it your hardest
63:19 tasks, which is a little different than some of the other coding agents. Like, you
63:23 know, with some tools you might think, okay, let me start easy, or just, you know,
63:27 vibe code something random and decide if I like the tool. Whereas
63:32 we're really building Codex to be the professional tool that you can give
63:36 your hardest problems to, um and, you know, that writes high quality code
63:40 in your enormous code base that is in fact not perfect right now. So yeah,
63:43 I think if you're going to try Codex, you want to try it on a real task
63:48 that you have, and not necessarily dumb that task down to something that's
63:53 trivial. Actually, a good one would be: you have a
63:55 hard bug and you don't know what's causing that bug, and you ask Codex to
63:59 help figure that out, or to implement the fix.
64:02 >> I love that answer. Just give it your hardest problem. >> I will say, like, you
64:05 know, if you're like, hey, okay, well, the hardest problem I have is that I
64:08 need to build a new unicorn business, then obviously that, you know,
64:13 is not going to work. Not yet. So I think it's like: give it the hardest
64:18 problem, but something that is still, like, one question, right, or one task,
64:23 to start, if you're testing. And then over time you can learn how to use
64:25 it for bigger things. >> Yeah. What languages does it
64:28 support? >> Basically, the way we've trained Codex is there's a distribution of
64:32 languages that we support, and it's fairly aligned with the frequency
64:36 of these languages in the world. So unless you're writing some very
64:39 esoteric language or some private language, it should do fine in your
64:42 language. >> If someone was just getting started, is there a tip you could share
64:46 to help them be successful? Like if you could just whisper a little tip into
64:49 someone just setting up Codex for the first time to help them have a really
64:53 good time, what's something you would whisper? >> I might say try a few things in
64:57 parallel, right? So you could try giving it a hard task. Um maybe ask it
65:03 to understand the codebase. Uh formulate a plan with it around an idea that you
65:07 have and kind of build your way up from there. And like sort of the meta idea
65:11 here is it's again it's like you're building trust with the new teammate,
65:15 right? And so you wouldn't go to a new teammate and just say, hey,
65:18 do this thing, here's zero context. You would start by first making sure
65:22 they understand the codebase, and then you would maybe align on an
65:24 approach, and then you would have them go off and do it bit by bit, right? And I think
65:28 if you use Codex in that way, you'll just sort of naturally start to
65:30 understand the different ways of prompting it, because it is a super
65:35 powerful agent and model, but it is a little bit different to prompt
65:38 Codex than other models. >> Just a couple more questions. One, we touched on this a
65:44 little bit as AI does more and more coding there's always this question of
65:48 should I learn to code, why should people spend time doing this sort of thing? For
65:52 people that are trying to figure out what to do with their career, especially
65:55 if they're into software engineering, computer science: do you think there's
65:59 specific elements of computer science that are more and more important to
66:03 lean into, maybe things they don't need to worry about? Like what do you think
66:06 people should be leaning into skill-wise in as this becomes more and more of a
66:11 thing in our workplace? I think there's like a couple angles you could go at
66:18 this from. Um, I think the, well, the easiest one to think of at
66:24 least, is just: be a doer of things. Um, I think that, you know, with coding
66:28 agents, um, getting better and better over time, what you can do as
66:33 even, like, someone in college or a new grad is just so much more than what
66:37 it was before. And so, I think you just want to be taking advantage of
66:40 that. You know, definitely when I'm looking at hiring folks who are
66:43 earlier career, something that I think about is: how
66:47 productive are they using the latest tools, right? They should be super
66:51 productive. And if you think of it in that way, they actually have less
66:55 of a handicap than before versus a more senior career person, because, you know,
66:59 the divide is actually getting smaller; they've got these amazing coding
67:02 agents now. Um, so that's one thing, which is, I guess, the
67:05 advice is just like learn about whatever you want but just make sure you spend
67:08 time doing things not just like fulfilling homework assignments. I guess
67:12 I think the other side of it though is that it's still deeply worth
67:17 understanding like what makes a good like overall software system. So I still
67:22 think that like skills like really strong systems engineering skills or
67:27 even like really effective like communication and collaboration with
67:31 your team, skills like that I think are are important are going to continue to
67:35 matter for for quite some time. Like I don't think it's going to be like all of
67:39 a sudden uh the AI coding agents are just able to build like perfect systems
67:43 without your help. I think it's going to look much more gradual where it's like
67:48 okay, we have these AI coding agents, they're able to validate their work; it's
67:52 still important. And, for example, I'm thinking of an engineer who was
67:55 working on Atlas, since we were talking about it. He set up Codex so it can
67:59 verify its own work, which is a little bit non-trivial because of the nature of
68:02 the Atlas project. The way that he did that was he actually prompted Codex,
68:05 like, hey, why can't you verify your work? Fix it. And did that on a loop, right?
68:11 So at various phases you're still going to want a human in the loop to
68:15 help configure the coding agent to be effective. And so I think you
68:19 still want to be able to reason about that. So maybe it's less important
68:23 that you can type really fast, or that you understand exactly how to write,
68:27 not that anyone writes, you know, a for-each loop or something, right, or,
68:31 you know, you don't need to know how to implement a specific algorithm. But
68:33 I think you need to be able to reason about the different systems and
68:36 what makes a software engineering team effective. So I think
68:40 that's the other really important thing. And then like maybe the last angle that
68:44 you could take is I think if you're on the frontier of knowledge for a given
68:49 thing, I still think that's like deeply interesting to go down partially because
68:54 that knowledge is still going to be valuable, you know, agents aren't going to be as
68:58 good at that. But also partially because I think that like by trying to advance
69:01 the frontier of a specific thing, you'll actually like end up like being forced
69:05 to take advantage of coding agents and like using them to accelerate your own
69:09 workflow as you go. >> What's an example that when you when you
69:12 talk about being at the frontier? >> So Codex writes a lot of the code that
69:15 helps manage its training runs, the key infrastructure. Uh you know, we move
69:21 pretty fast, and so we have Codex code review catching a lot of
69:23 mistakes. It's actually caught some pretty interesting configuration
69:27 mistakes. And uh you know we're starting to see glimpses of the future where
69:31 we're actually starting to have Codex even be on call for its own
69:36 training, which is pretty interesting. Um so there's lots there.
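One way to picture an agent being "on call" for a training run is a watcher that tracks a metric stream and flags values that diverge from the recent trend, so a human (or the agent) can pause or fix the run. A purely illustrative sketch; none of the names or thresholds below are OpenAI's actual tooling.

```python
# Illustrative training-run "babysitter": watch a metric stream and flag
# values that deviate sharply from the rolling window of recent values.
# Hypothetical sketch only, not OpenAI's real monitoring infrastructure.
from collections import deque

def make_spike_detector(window: int = 10, n_sigma: float = 3.0):
    """Return check(value) -> True when value sits more than n_sigma
    rolling standard deviations from the mean of the last `window` values."""
    history = deque(maxlen=window)

    def check(value: float) -> bool:
        anomalous = False
        if len(history) == window:  # only judge once the window is full
            mean = sum(history) / window
            std = (sum((x - mean) ** 2 for x in history) / window) ** 0.5
            anomalous = std > 0 and abs(value - mean) > n_sigma * std
        history.append(value)
        return anomalous

    return check

# Simulated loss curve: smooth decay, then a sudden divergence.
check = make_spike_detector(window=10, n_sigma=3.0)
losses = [2.0 - 0.01 * i for i in range(30)] + [9.5]
flags = [check(v) for v in losses]
# flags[-1] is True (the divergence); every earlier value passes quietly
```

In a real setup the flag would trigger an action, alert a human, pause the run, or hand the charts to an agent for diagnosis, rather than just returning a boolean.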
69:39 >> Uh wait, what does that mean, to be on call for its own training? So it's
69:42 running its training and it's like, oh, something broke, someone needs to fix it.
69:45 Does it alert people, or is it like, here, I'm going to fix the problem and
69:48 restart? >> This is an early idea that we're figuring out, but the basic
69:51 idea is that you know during a training run there's like a bunch of graphs that
69:54 like today like humans are looking at and it's like really important to like
69:58 look at those. Um we call this babysitting. >> Because it's very expensive to train, I
70:02 imagine, and very important to move fast. >> Exactly, and there's a lot of
70:06 systems underlying uh the training run, and so a system could
70:09 go down, or there could be an error somewhere that gets introduced, and so we
70:13 might need to fix it or pause things, or, I don't know, there's lots of
70:16 actions we might need to take. And so basically having Codex run on a
70:20 loop to evaluate how those charts are moving over time um is sort of this
70:24 idea that we have for how to enable us to train way more
70:27 efficiently. >> I love that. This is very much along the lines of: this is the
70:31 future of agents. Codex isn't just for building code, right? It's a
70:34 lot more than that. >> Yeah. >> Okay. Last question. Uh being at OpenAI,
70:41 uh I can't not ask about your AGI timeline and how far you think we are
70:45 from AGI. I know this isn't what you work on, but there's a lot of opinions,
70:50 a lot of, I don't know, timelines. How far do you think we are from a human-level
70:56 version of AI, whatever that means to you. >> For me, I think that it's a little
71:01 bit about like when do we see the acceleration curves kind of go like this
71:03 or I don't know which way I'm mirrored here, right? When do we see the hockey
71:08 stick? And I think that the current limiting factor, I mean there's many,
71:11 but I think a current underappreciated limiting factor is like literally human
71:16 typing speed, or human multitasking speed on writing prompts,
71:20 right? And like, you know, you were talking about it: you can have an
71:22 agent watch all the work you're doing, but if you don't have the agent
71:27 uh also validating its work, then you're still bottlenecked on, can you go
71:30 review all that code, right? So my view is that we need to um unblock those
71:36 productivity loops, from humans having to prompt and humans having to
71:40 manually validate all the work. And so if we can rebuild systems to let
71:45 the agent be default useful, we'll start unlocking hockey sticks.
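The productivity loop being described, an agent that validates its own work instead of waiting on a human reviewer, can be pictured as a propose/validate retry loop. A minimal sketch with stub functions standing in for the real agent and build/test harness; all names here are hypothetical.

```python
# Hypothetical sketch of an agent loop that validates its own work:
# propose a change, run the validation harness, feed failures back,
# and only surface the result to a human once checks pass.
from typing import Callable, Optional, Tuple

def self_validating_loop(
    propose: Callable[[Optional[str]], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Tuple[str, int]:
    """Run propose/validate until validation passes or attempts run out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        change = propose(feedback)        # agent drafts (or revises) a change
        ok, feedback = validate(change)   # build/tests judge it automatically
        if ok:
            return change, attempt        # human only reviews passing work
    raise RuntimeError(f"no passing change after {max_attempts} attempts")

# Stubs simulating an agent that fixes its change after one failure report.
def fake_propose(feedback: Optional[str]) -> str:
    return "fix: handle empty input" if feedback else "fix: naive patch"

def fake_validate(change: str) -> Tuple[bool, str]:
    if "empty input" in change:
        return True, ""
    return False, "test_empty_input failed"

change, attempts = self_validating_loop(fake_propose, fake_validate)
# change == "fix: handle empty input", attempts == 2
```

The point of the structure is that the human reviews one passing candidate rather than every intermediate attempt, which is exactly the bottleneck the speaker is describing.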
71:48 Unfortunately, I don't think that's going to be binary. I think it's going
71:51 to be very dependent on what you're building, right? So like I would imagine
71:55 that next year, if you're a startup and you're building a new piece of,
71:59 like, you know, some new app or something, it'll be possible for you to set it up
72:02 on a stack where agents are much more self-sufficient than not, right? But
72:07 now, let's say (I don't know, you mentioned SAP, right), let's say you work at SAP. They
72:11 have many complex systems, and they're not going to be able to just
72:13 get the agent to be self-sufficient overnight in those systems. So they're
72:17 going to have to slowly maybe replace systems or update systems to
72:21 allow the agent to handle more of the work end to end. And so basically my
72:25 sort of long answer to your question, maybe boring answer, is that I think
72:29 starting next year we're going to see early adopters starting to
72:33 hockey-stick their productivity. Um and then over the years that follow,
72:36 we're going to see larger and larger companies hockey-stick their
72:39 productivity. And then somewhere in that fuzzy middle is when that
72:44 hockey-sticking will be flowing back into the AI labs, and that's when we'll
72:48 basically be at the AGI tier. >> I love this answer. It's very practical,
72:52 and it's something that comes up a lot on this podcast just like the time to
72:55 review all the things AI is doing is really annoying and a big bottleneck. I
72:59 love that you're working on this because it's one thing to just make coding much
73:03 more efficient and do that for people. It's another to take care of that final
73:08 step of okay is this actually great? And that's so interesting that your sense is
73:11 that's the limiting factor. It comes back to your earlier point that even if AI
73:16 did not advance any more, we have so much more potential to unlock as we
73:22 learn to use it more effectively. Uh so that is a really unique answer. I
73:25 haven't heard that perspective on what the big unlock is: human typing speed to
73:29 review, basically, what AI is doing for us. >> Mhm. So good. Okay. Uh Alexander, we
73:35 covered a lot of ground. Is there anything that we haven't covered? Is
73:38 there anything you wanted to share, maybe double down on before we get to
73:44 our very exciting lightning round? >> I think uh one thing is that the Codex
73:48 team is growing, and uh as I was just saying, we're still somewhat limited by
73:51 human thinking speed and human typing speed. We're working on it. So um if
73:58 you're an engineer, um or a salesperson, or (I am hiring for product) a product
74:03 person, uh please hit us up. I'm not sure the best way to give contact info,
74:06 but I guess you can go to our jobs page. Or do they have contact for you?
74:10 Actually, do listeners have contact for you, >> before they send me like, "Hey, I want
74:13 to apply to Codex"? >> Uh, I do have a contact form at
74:16 lennyrachitsky.com. I'm afraid of all the amazing people that will ping me. But
74:19 there we go. We could try that. Let's see how that goes. >> Okay. Or, yeah, maybe an
74:24 easier way (we can edit all that out, or up to you), but uh yeah, I would just say
74:28 you can drop us a DM. Uh, for example, I'm @embirico on Twitter, and hit me up
74:32 if you're interested in joining the team. >> What a dream job for so many people.
74:38 >> What's a sign they, I don't know, what's a way to filter people a little bit
74:42 so they're not flooding your inbox? >> So, specifically, if you want to join
74:46 the Codex team, then you need to be a technical person who uses these tools.
74:50 And I think I would just ask yourself the question: uh, hey, let's say, you
74:54 know, I were to join OpenAI and work on Codex over the next six months, you
74:59 know, and crush it. What does the life of a software engineer look like then?
75:02 And I think if you have an opinion on that, you should apply. And if you don't
75:05 have an opinion on that and have to think about it first, you know,
75:09 depending on how long you have to think about it, I guess that would be the
75:12 filter, right? Like I think there's a lot of people thinking about the space
75:16 and so we're we're very interested in folks who sort of have already been
75:21 thinking about like what the future should look like with agents and like we
75:23 don't have to agree on where we're going, but I think we want people who
75:26 are very passionate about the topic, I guess. >> It's very rare to be working on a
75:32 product that has this much impact and is at such a bleeding edge of what's
75:37 possible. It's uh what a cool role for the right person. So, uh, um, it's
75:40 awesome that you have an opening, and this audience is, uh, a really good fit
75:45 potentially for that role. So, I hope we find someone that would be
24:59 integrated product and research team. How do you think you win in this space?
25:04 Do you think it'll always be this kind of, like, race with other
25:08 models constantly kind of leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there like a path to just "we win"? >> Again, it comes back to this idea of
25:19 building a teammate. Not just a teammate that, you know, participates
25:24 in team planning and prioritization. Not just a teammate that really
25:27 tests its code and helps you maintain and deploy. But even a teammate,
25:31 you know, like if you think again of an engineering teammate, they can also
25:34 schedule a calendar invite, right, or move standup or whatever, right? And so in
25:42 my mind, if we just imagine that every day or every week some like crazy new
25:46 capability is just going to be deployed by a research lab, it's just impossible
25:50 for us like you know as humans to keep up and like use all this technology. And
25:54 so I think we need to get to this world where you kind of just have like an AI
25:59 teammate or super assistant that you just talk to and it just knows how to be
26:04 helpful on its own, right? And so you don't have to be, like,
26:07 reading the latest tips for how to use it. You've just plugged it in
26:11 and it just provides help. And so that's kind of the shape of what I think we're
26:14 building. And I think that will be like a very sticky like winning product if we
26:18 can do so. So the shape I have in my head, at least, is what we build. You
26:23 know, maybe a fun topic: is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for. Uh, in the same way that if I think of, like, I'm on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's like it's kind of the the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are, like, a functional expert in a specific
26:49 domain such as coding, there's, like, a GUI that you can pull up to go really
26:54 deep and look at the code and work with the code. So I think what
26:59 we need to build as OpenAI is basically this idea that you have ChatGPT,
27:02 and that is a tool that's, like, ubiquitously available to everyone.
27:06 You start using it even like outside of work right to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you just can naturally just yeah I'm just going to ask it for this
27:16 and I don't need to know about all the connectors or like all the different
27:19 features. I'm just going to ask it for help, and it'll surface to me the
27:23 best way that it can help at this point in time and maybe even chime in when I
27:27 didn't ask it for help. Um, so in my mind, if we can get to that, I think
27:30 that's, you know, that's how we really build the winning product.
27:34 This is so interesting because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was super
27:41 assistant or something like that. >> Yeah. >> And it's interesting that there's, like,
27:46 that approach to the super assistant and then there's this Codex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is, okay, you start with coding and building, and then it's doing all
27:56 this other stuff for you, scheduling meetings, I don't know, probably posting
28:01 in Slack, uh, I don't know, shipping designs, I don't know. Is that the
28:04 idea there? This is like the business version of ChatGPT in a sense.
28:08 Or is there something else there? >> Yeah. So, you know, we're getting to
28:12 the one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you the intention and a plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So like we're going to
28:29 have a model and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need the super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 Uh, you know, you could try to hack the OS and like use accessibility APIs.
28:57 Maybe a bit easier is you could point and click. That's a little slow, you
29:02 know, and, uh, unpredictable sometimes. Um, and another way, it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where like, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 non-technical user, they won't even know they're using a coding agent. The same
29:19 way that no one thinks about whether they're using the internet or not; it's
29:22 more just, like, is Wi-Fi on? Right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're kind of building an agent that can use, uh, a computer by writing
29:36 code. And so we're already seeing some pull for this. It's quite
29:39 early, but we're starting to see people who are using Codex for
29:43 coding-adjacent product purposes. And so as that develops, I think we'll
29:47 just naturally see that like, oh, it turns out like we should just always
29:50 have the agent write code if there is a coding way to solve a problem.
29:53 You know, even if you're doing a financial analysis, right? Like maybe
29:56 write some code for that. So basically, you know, you were like, hey, are
29:59 these like the two ends of, uh, this product for the super assistant, right,
30:03 of ChatGPT? In my mind, coding is just a core competency of any agent,
30:06 including ChatGPT. And so what we really think we're building is
30:10 that competency. But here's the really cool thing about
30:13 agents writing code: you can import code, right? Code is
30:19 composable, interoperable, right? Because one very reductive
30:23 view we could have for an agent is it's just going to be given a computer and
30:26 it's just going to, like, point and click and, you know, go around. But, you know, that
30:32 is the future, and how we get there is difficult to sort of chart a path,
30:36 because a lot of the questions around building agents aren't like can the
30:41 agent do it but it's more about well how can we help the agent understand the
30:44 context that it's working in and like the team that's using it you know
30:47 probably has a way that they like to do things they have guidelines they
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do, or they want to know that the agent understands sort of this detail.
30:59 Like, an example would be, you know, if we're looking at a crash-reporting tool,
31:04 hitting a connector for it, every sub-team probably has a different
31:07 meta-prompt for how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where like, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And the stuff that the agent does often, we
31:22 probably just want to build in as a competency that this agent has, that it
31:27 can do. So I think we end up with this generalizable thing that you were saying
31:31 of like an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think the really key part here is, can we make it so that
31:40 everything that the agent has to do often or that it does well we can just
31:44 like remember and store so that the agent doesn't have to write a script for
31:47 that again. Right? Or maybe, like, if I just joined a team and you are already
31:51 on the same team as me, I can just use all those scripts that the agents
31:53 had written already. >> Yeah. It's like, if this is our teammate,
31:57 uh, they can share things they've learned from working with other
32:00 people at the company. It just makes sense as a metaphor. >> Yeah. It feels like you're in the, uh,
32:05 Karpathy camp of agents today are not that great and mostly slop and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great. I think >> a ton of value, >> right? Yep.
32:19 >> And then I think like agents outside of coding, it's still like very early and
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use coding too, in a composable way.
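The "agents act by writing code" idea he keeps returning to can be sketched in miniature. Everything below is illustrative, not how Codex is actually implemented: instead of point-and-click actions, the agent emits a small script, runs it in a subprocess, and reads the result (or the traceback) back as its observation.

```python
import subprocess
import sys
import textwrap

def run_generated_code(source: str, timeout: float = 10.0) -> str:
    """Run model-generated Python in a subprocess and return its stdout.

    A real agent harness would sandbox this far more carefully; this only
    sketches the "act by writing code" loop.
    """
    result = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(source)],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        # Hand the traceback back to the model so it can revise and retry.
        return f"ERROR:\n{result.stderr}"
    return result.stdout

# Pretend the model answered "how many top-level .py files mention TODO?"
# by emitting a script instead of clicking through a file browser:
generated = """
import pathlib
hits = sum(1 for p in pathlib.Path(".").glob("*.py")
           if "TODO" in p.read_text(errors="ignore"))
print(hits)
"""
print(run_generated_code(generated))
```

Each script that works can be saved and reused, which is the composability he's pointing at; point-and-click actions don't compose the same way.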
32:29 This is kind of the fun part of building for software
32:33 engineers. Like, at my startup we were building for software engineers, too, for
32:36 a lot of that journey, and they're just such a fun audience to build for because,
32:41 you know, they also like building for themselves and are often even more
32:45 creative than we are in thinking about how to use the technology. Um, and so,
32:48 like by building for software engineers you get to just observe a ton of
32:52 emergent behaviors and like things that you should do and build into the
32:55 product. I love how you say that, because a lot of people building for
32:57 engineers get really annoyed, because the engineers are just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 problems and just, you know, code for them. Um, kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Uh clearly the way you're
33:21 describing it is it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you. How do you think about
33:28 the impact on the field of engineering of having this super intelligent
33:33 engineering teammate? I think there's two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just like a small part
33:46 of this like broader idea that like, hey, as we make code even more
33:48 ubiquitous, I mean, you could probably claim it's ubiquitous today even
33:51 pre-AI, right? But as we make code even more ubiquitous, it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for people with this like humans with this competency. So that's
34:05 my view. I think this is like quite a complex topic. So, you know, it's
34:08 something we talk about a lot and we have to kind of see how it pans out. But
34:12 I think what we can do what we can do basically as a product team building in
34:15 the space is just try to always think about how are we building a tool so that
34:18 it feels like we're like maximally accelerating uh people you know rather
34:24 than building a tool that makes it more unclear what you should do as the
34:29 human, right? Like, to give an example: right now,
34:33 when you work with a coding agent, um, it writes a ton of code. But it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. So then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And I actually think this plays
34:53 out all the time in a ton of micro-decisions. And so we as a product team
34:55 are always thinking about, like, okay, how do we make this more fun? How do we make
34:58 you feel more empowered? Where is it not working? And I would argue that
35:01 reviewing agent-written code is a place that today is less fun. And
35:06 so, you know, then I think, okay, what can we do about that? Well, we can ship a
35:09 code review feature that helps you build confidence in the AI-written
35:12 code. Okay, cool. You know, another thing we can do is we can make it so
35:14 that the agent's like better able to validate its work. And you know, it gets
35:18 all the way down into, like, micro decisions. Like, if you're going to give
35:23 the agent a capability to validate work, and, let's say, I'm thinking
35:27 of Codex web right now, you have a pane that sort of reflects the work the
35:30 agent did. What do you see first? Do you see the diff or do you see the image
35:34 preview of the code it wrote? Right? And, you know, I think if you're thinking
35:36 about this from the perspective of how do I empower the human, how do I make them
35:40 feel as accelerated as possible, you obviously show the image first,
35:43 right? You shouldn't be reviewing the code until you've seen
35:46 the image, unless maybe it's being reviewed by an AI and now it's time for
35:49 you to take a look. When I had, uh, Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen this rise of something called spec-driven development, where
36:02 you kind of just write the spec and then the code, you know, the AI writes code
36:05 for you. And so you kind of start working at this higher abstraction
36:09 level. Is that something you see where we're going? Just like engineers not
36:12 having to actually write code or look at code and there's going to be this higher
36:16 level of abstraction that we focus on. Yeah, I mean, I think there's
36:19 constantly these levels of abstraction, and they're actually already
36:23 playing out today, right? Like, today, coding agents are mostly
36:29 prompt-to-patch. We're starting to see people doing spec-driven development or
36:32 plan-driven development. That's actually one of the ways, when people ask,
36:35 hey, how do you run Codex on a really long task? Well, it's often:
36:38 collaborate with it first to write a plan.md, a markdown file that's
36:42 your plan, and once you're happy with that, then you ask it to go off and do
36:46 work. And if that plan has verifiable steps, it'll work for much longer.
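The plan-first workflow he describes, collaborate on a plan file and then let the agent execute it, might produce something shaped like this. The task, file name, and checklist convention are invented for illustration; Codex doesn't mandate any particular format:

```markdown
# Plan: add rate limiting to the public API

Each step has a verifiable check, so the agent can confirm progress
before moving on and keep working autonomously for longer.

- [ ] 1. Add a failing unit test asserting a 429 response after N requests.
      Verify: the new test fails for the right reason.
- [ ] 2. Implement token-bucket logic in a `RateLimiter` middleware.
      Verify: the unit test passes.
- [ ] 3. Wire the middleware into the router behind a config flag.
      Verify: integration tests pass with the flag on and off.
- [ ] 4. Update the API docs. Verify: the docs build cleanly.
```

The point he makes is the "Verify" line on each step: a step the agent can check on its own is a step it can finish without a human in the loop.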
36:51 Um, so we're totally seeing that. I think spec-driven development is an
36:55 interesting idea. It's not clear to me that it'll work out that way, because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some people will work that way. You know, a bit of
37:06 a joke idea, though: if you think of the way that many teams work
37:11 today, they often don't necessarily have specs, but the team is
37:14 just really self-driven and so stuff just gets done. And so almost that is,
37:17 like, I'm coming up with this on the spot so it's, you know, not a good name, but
37:21 like chatter-driven development, where it's just like stuff is happening, you
37:24 know, on social media and in your team communication tools, and then as a
37:28 result, code gets written and deployed, right? So, yeah, I think I'm a
37:33 little bit more oriented in that way of you know I don't even necessarily want
37:37 to have to write a spec like sometimes I want to only if I like writing specs
37:42 right uh other times I might just want to say like hey here's like the
37:45 customer, you know, service channel and like tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of, uh, hypothetical future that I like to
37:58 share sometimes with people as a provocation, which is, in a world
38:01 where we have truly amazing agents, what does it look like to be a
38:04 solopreneur? Um, and, uh, you know, one terrible idea for how it could look is that it's
38:12 actually there's a mobile app, and, um, every idea that the agent has is
38:17 just like vertical video on your phone, and then you can swipe left if you
38:21 think it's a bad idea and you can like swipe right if it's a good idea and like
38:24 you can press and hold and like speak to your phone if you want to get feedback
38:28 on the idea before you swipe, you know. So in this world, basically,
38:31 your job is just to plug this app into every single signal
38:36 system, you know, system of record, and then you just sort of sit back and
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets TikTok meets Codex. >> It's pretty terrible. >> No, this is great. So the idea here is
38:47 this agent is watching and, right, listening to you, paying attention
38:51 to the market, your users, and it's like, cool, here's something I should do. It's
38:54 like a proactive engineer, just like, here, we should build this feature, fix this
38:56 thing. >> Exactly. I think it's communicating with you in, like,
39:05 the lowest-friction, most modern way to communicate. >> Yeah. >> Swipe left or right in a vertical
39:10 feed, and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that but like you know it's a fun idea.
39:17 I mean you see you know like in this example though like one of the things
39:19 that it's doing is it's consuming external signals right. I think the
39:23 other really interesting thing is like if we think about like what is the most
39:28 successful like AI product to date um I would argue um it's funny actually
39:34 not to confuse things at all, but the first time we used the brand
39:38 Codex at OpenAI was actually the model powering GitHub Copilot. This is like
39:42 way back in the day, years ago. And so we decided to reuse that brand
39:45 recently, um, because it's just so good, you know: Codex, code execution. But I
39:50 think actually, like, autocompletion in IDEs is one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not, like, that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this mixed-initiative system that's contextually responding to
40:17 what you're attempting to do. And so in my mind, this is a really
40:21 interesting thing for us at OpenAI as we're building. So for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 Like in my mind, one of the really interesting things we can then do is we
40:33 can then like contextually surface like ways that we can help you as you're
40:37 going about your day, right? And so we break out of this like, you know, we're
40:41 just looking at code or we're just in your terminal um into this idea that
40:44 like, hey, like a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So like,
40:51 you know, how can we help you with that? >> Man, there's so much there and I love
40:55 this. Okay, so autocomplete on web with the browser. That's so interesting. just
40:58 like here's all the things that we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas. I'll come back to that. Uh
41:05 Codex, code execution. Did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter, what is chatter-driven development? Uh, I
41:14 No, this is a really good idea, but it reminds me, I had John Gon on the
41:19 podcast, CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who
41:30 just has Goose watch his screen and listen to every meeting, and it
41:36 proactively does work that he will probably want done. So it ships a PR,
41:41 sends an email, drafts a Slack message. So it's doing exactly what you're
41:44 describing in kind of a very early way. >> Yeah, that's super interesting. And you
41:49 know, I bet you, if we went and asked them what the bottleneck
41:52 to that productivity is... did they share what it is? >> Uh, probably looking at it, just making
41:57 sure this is the right thing to do. Yeah. >> Yeah. So, like, we see this now. Like, we
42:01 have a Slack integration for Codex. People love it. You know, if there's
42:04 something that you need to do quickly, people just at-mention Codex, like,
42:07 why do you think this bug is happening? Right? Doesn't have to be an engineer.
42:10 Even, like, maybe, you know, data scientists here are often using Codex a ton to
42:14 just answer questions like, why do you think this metric moved? What happened?
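A "why did this metric move?" question is typically answered with exactly this kind of throwaway script: break the week-over-week change down by segment and see which one drove it. The data and segment names below are invented for illustration:

```python
# Hypothetical throwaway analysis: which segment drove the metric's move?
# Rows are (segment, last_week, this_week); the numbers are invented.
rows = [
    ("ios",     1200, 1180),
    ("android",  900, 1450),
    ("web",     2100, 2090),
]

def attribute_change(rows):
    """Per-segment week-over-week deltas, largest absolute contribution first."""
    deltas = [(seg, new - old) for seg, old, new in rows]
    return sorted(deltas, key=lambda d: abs(d[1]), reverse=True)

for segment, delta in attribute_change(rows):
    print(f"{segment:8s} {delta:+d}")
# android's +550 dwarfs everything else: the move is Android-driven.
```

Code this disposable used to be hard to justify writing by hand; handing it to an agent changes that calculus.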
42:18 For questions, you get the answer right back in Slack. It's amazing, super
42:22 useful. But when it's writing code, then you have to go back
42:27 and look at the code, right? And so the real bottleneck right now, I think,
42:30 is validating that the code worked and, like, doing code review.
42:34 So in my mind, if we wanted to get to something like that world you
42:38 were just talking about, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. It makes sense. Like you said, writing code:
42:49 I was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code and have to go through it and be on the hook if
43:00 it's doing something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is that the bottleneck is now figuring out
43:09 what to build, and then, at the end, it's like, okay, we have all this, like, 100
43:13 hours to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira Product Discovery. The hardest part of
43:22 building products isn't actually building products. It's everything else.
43:26 It's proving that the work matters, managing stakeholders, trying to plan
43:30 ahead. Most teams spend more time reacting than learning, chasing updates,
43:34 justifying road maps, and constantly unblocking work to keep things moving.
43:39 Jira Product Discovery puts you back in control. With Jira Product Discovery,
43:43 you can capture insights and prioritize high-impact ideas. It's flexible, so it
43:47 adapts to the way your team works and helps you build a road map that drives
43:51 alignment, not questions. And because it's built on Jira, you can track ideas
43:56 from strategy to delivery, all in one place. Less chasing, more time to think,
44:01 learn, and build the right thing. Get Jira Product Discovery for free at
44:06 atlassian.com/lenny. That's atlassian.com/lenny. What has the impact of Codex been on the
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted. Uh, code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at OpenAI? Yeah, I mean, I think mostly I just feel
44:28 much more empowered. Um, I've always been a more technical-leaning PM, and especially when
44:34 I'm working on products for engineers, I feel like it's necessary to, you
44:37 know, dogfood the product. But even beyond that, I just feel like I can
44:42 do much, much more as a PM. And, uh, you know, Scott Belsky talks about this idea
44:45 of compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that maybe the boundaries between these roles
44:52 are a little bit less needed than before, because people can just do much
44:57 more, and every time someone can do more, you can skip one communication
45:00 boundary and make the team that much more efficient, right? So I think I
45:07 think we see it, you know, in a bunch of functions now. But since you
45:11 asked about product specifically: uh, you know, answering questions is now
45:15 much, much easier. You can just ask Codex for thoughts on that. Uh, a lot of
45:20 PM-type work, understanding what's changing: again, just ask Codex for help
45:25 with that. Um, prototyping is often faster than writing specs. This is something
45:29 that a lot of people have talked about. Something that's
45:33 slightly surprising is, like, we're
45:36 mostly building Codex to write code that's going to be deployed
45:40 to production, but actually we see a lot of throwaway code written with Codex
45:43 now. It's kind of going back to this idea of, like, you know, ubiquitous code.
45:48 So you'll see, uh, you know, someone wants to do an analysis. Like, if I want to
45:51 understand something, it's like, okay, just give Codex a bunch of data, but then ask
45:54 it to build an interactive data viewer for this data, right? That
45:56 was just too annoying to do in the past, but now it's just, like,
46:00 totally worth the time of just getting an agent to go do something.
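An "interactive data viewer" in the throwaway spirit he describes can be as small as one generated HTML file. This is a sketch, not Codex output, just an illustration of why this went from "too annoying" to "worth an agent's time":

```python
import html
import json

def build_viewer(columns, rows, path="viewer.html"):
    """Write a self-contained HTML page: one table, click a header to sort."""
    header_cells = "".join(
        f"<th onclick='sortBy({i})'>{html.escape(col)}</th>"
        for i, col in enumerate(columns)
    )
    page = f"""<!doctype html>
<table id="t"><thead><tr>{header_cells}</tr></thead><tbody></tbody></table>
<script>
const rows = {json.dumps(rows)};
function render(r) {{
  document.querySelector("tbody").innerHTML = r.map(
    row => "<tr>" + row.map(v => "<td>" + v + "</td>").join("") + "</tr>"
  ).join("");
}}
function sortBy(i) {{
  render([...rows].sort((a, b) => (a[i] > b[i]) - (a[i] < b[i])));
}}
render(rows);
</script>"""
    with open(path, "w") as f:
        f.write(page)
    return path

# Dump some rows, then open viewer.html in a browser and click to sort.
build_viewer(["metric", "value"], [["signups", 42], ["churn", 3]])
```

Nobody maintains this file; it answers one question and gets deleted, which is exactly the throwaway-code pattern being described.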
46:04 Um, similarly, I've seen some pretty cool prototypes on our design team.
46:09 Like, a designer basically wanted to build an animation,
46:13 this is the coin animation in Codex, and normally it'd be too
46:17 annoying to program this animation. So they just vibe-coded an animation editor,
46:21 and then they used the animation editor to build the animation, which they then
46:25 checked into the repo. Actually, with our designers, there's a ton of
46:28 acceleration there. And speaking of compressing the talent stack, I think our
46:31 designers are very PM-like. So, you know, they do a ton of product work. And they actually
46:38 have an entire vibe-coded sort of side prototype of the Codex app. And
46:41 so, a lot of how we talk about things is we'll have a really quick jam,
46:44 because there's like 10,000 things going on. And then the designer will go think
46:48 about how this should work, but instead of talking about it again, they'll
46:50 just vibe-code a prototype of that in their standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe-code, or
46:59 vibe-engineer, that prototype into an actual PR to land. And then, depending on
47:02 their comfort with the codebase (the Codex CLI, in Rust, is a little harder),
47:06 maybe they'll land it themselves, or they'll get close and then an
47:09 engineer can help them land the PR. Um, you know, we recently shipped the
47:15 Sora Android app. Um, and that was one of the most mind-blowing
47:19 examples of acceleration, actually, because usage of Codex internally at
47:24 OpenAI is obviously really, really high, but it's been growing over the course of
47:28 the year, both in terms of, like, now basically all technical staff use
47:32 it, but even the intensity and know-how of how to make the most of
47:35 coding agents has gone up by a ton. And so the Sora Android app, right, like a
47:42 fully new app: we built it in 18 days. It went from zero to launch to
47:46 employees, and then 10 days later, so 28 days total, we went GA to
47:51 the public. And that was done just with the help of Codex,
47:56 so pretty insane velocity. I would say it was a little bit, I don't want to
48:01 say easy mode, but there is one thing that Codex is really good at: if you're a
48:04 company that's building software on multiple platforms, so you've already
48:07 figured out some of the underlying APIs or systems, asking Codex
48:13 to port things over is really effective, because it has something
48:15 it can go look at. And so the engineers on that team, uh, were basically having
48:20 Codex go look at the iOS app, produce plans of work that needed to be done, and
48:23 then go implement those. And it was kind of looking at iOS and Android at the
48:27 same time. And so, you know, basically it was like two weeks to launch to
48:30 employees, four weeks total. Insanely fast. >> What makes that even more insane is it
48:35 became the number one app in the App Store. >> This just boggles the mind.
48:39 Okay. So >> yeah, so imagine releasing the number one app on the App Store with, like, a handful
48:45 of engineers >> uh, I think it was like >> two or three, possibly,
48:53 >> uh, in a handful of weeks. Yeah, this is absurd. So >> yeah, so that's a really fun, um, example
49:01 of, uh, acceleration. And then Atlas was the other one. I think, um, Ben,
49:06 an engineer on Atlas, did a podcast sharing a little bit of how we
49:12 built it there. You know, Atlas is actually, I mean, it's a browser,
49:15 right, and building a browser is really hard. Um, and so we had to build a lot
49:23 of difficult systems in order to do that, and basically we got to the point where
49:27 that team has a ton of power users of Codex right now. And, um, you know, it got
49:32 to the point where, you know, we were talking to them
49:34 about it, because a lot of those engineers are people I used to work with
49:38 before at my startup, and so they'd say, you know, before this would have taken us
49:42 like two to three weeks for two to three engineers, and now it's like one engineer,
49:48 one week. Um, so massive acceleration there as well. And what's quite cool is
49:52 that, uh, you know, we shipped Atlas on Mac first, but now we're working on
49:56 the Windows version. So the team now is ramping up on
49:58 Windows, and they're helping us make Codex better on Windows too, which is
50:02 admittedly earlier. Like, the model we shipped last week is the first
50:06 model that natively understands PowerShell, PowerShell being,
50:11 uh, the native shell language on Windows. So yeah, it's been
50:16 really awesome to see the whole company getting accelerated by Codex,
50:21 you know, most obviously also research, improving how
50:24 quickly we train models and how well we do it, and then even, like, uh, design, as we
50:28 talked about, and marketing. Actually, we're at this point now where,
50:32 uh, my product marketer is often also making string changes directly from
50:36 Slack, or updating docs directly from Slack. >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible, and this is how other companies
50:46 are going to work. Just shipping, again, what became the number one app in
50:49 the App Store, beloved everywhere. It just took over the world
50:54 for at least a week. Built, you said, in 28 days, with
50:58 18 days just to get the core of it working.
51:02 >> Yeah. So in 18 days we had a thing that employees were playing with, and
51:05 then 10 days later we were out. >> And you said just a couple engineers.
51:07 >> Yeah. >> Two or three. Okay. And then Atlas you
51:11 said took a week to build. >> No, no, no. Atlas, not the whole thing in a
51:16 week. Atlas was a really meaty project. >> Yeah.
51:18 >> And so I was talking to one of the engineers on Atlas about
51:23 what they use Codex for, and it's basically, we use Codex for
51:25 absolutely everything. I asked, okay, how
51:29 would you measure the acceleration? And basically the answer I got back
51:31 was, >> previously it would have taken two to three weeks for two to three engineers,
51:36 and now it's like one engineer, one week. >> Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Like, does it have to be an engineer
51:42 building this thing? Could it have been built by, I don't know, a PM or
51:46 designer? >> I think we will very much get to the point where, well, basically where
51:50 the boundaries are a little bit blurred, right? Like, I think you're going to want
51:54 someone who understands the details of what they're building, but
51:58 what details those are will evolve. Kind of like how now, if you're writing
52:02 Swift, you don't have to speak assembly. You know, there's a handful of people in
52:05 the world who speak assembly, and it's really important that they exist.
52:09 Maybe more than a handful, right? But that's a specialized function that
52:14 most companies don't need to have. So I think we're just going to naturally
52:17 see an increase in layers of abstraction. And then the cool thing is
52:21 now we're entering the language layer of abstraction, natural
52:25 language. And natural language itself is really flexible, right? Like,
52:29 you could have engineers talking about a plan, and then you could have
52:32 engineers talking about a spec, and then you could have engineers talking about
52:35 just, you know, a product or an idea. So I think we can also start moving up
52:39 those layers of abstraction as well. But, you know, I do think this is going
52:43 to be gradual. I don't think all of a sudden
52:46 nobody ever writes any code and it's just specs. I
52:49 think it's going to be much more like, okay, we've set up our coding agent to be
52:53 really good at previewing the build, or at running tests. Maybe that's
52:56 the first part that most people have set up. And then it's like, okay, now
52:59 we've set it up so that it can execute the build and it can see
53:03 the results of its own changes, but, you know, we haven't yet built a good
53:06 integration harness so that it can, in the case of Atlas (by the way, I
53:08 don't know if they've done any of this or not, I think they've done a lot of
53:11 this), maybe the next stage is to enable it to load a few
53:16 sample pages to see how well those work, right? So then, okay, now we're going to
53:19 set that up. And I think for some time at least we're going to
53:22 have humans curating which of these connectors or systems or
53:26 components the agent needs to be good at talking to. And then, you know, in
53:30 the future there will be an even greater unlock where Codex tells you how to set
53:34 it up, or maybe sets itself up in a repo. >> What a wild time to be alive. Wow. I'm
53:38 curious about the second order effects of this sort of thing. Just how quick it
53:42 is to build stuff, what does that do? Does that mean distribution becomes much,
53:46 much more important? Does it mean ideas are just worth a lot more? It's
53:50 interesting to think about how that changes. >> I'm curious what you think. I still
53:56 don't think ideas are worth as much as maybe a lot of people think. I
53:59 still think execution is really hard, right? Like, you can build
54:01 something fast, but you still need to execute well on it. It still needs to make
54:06 sense and be a coherent thing overall. And distribution is massive.
54:10 >> Yeah. It just feels like everything else is now more important, everything that
54:13 isn't the building piece, which is >> coming up with an idea, getting to
54:17 market, profit, >> all that kind of stuff. I think we
54:21 might have been in this weird temporary phase where, for a while,
54:26 it was so hard to build product that you mostly just
54:31 had to be really good at building product, and it maybe didn't matter if
54:34 you had an intimate understanding of a specific customer.
54:39 But now I think we're getting to this point where, actually, if I
54:42 could only choose one thing to understand, it would be a really
54:46 meaningful understanding of the problems that a certain customer has,
54:49 right? If I could only go in with one core competency. So
54:54 I think that's ultimately still what's going to matter most,
54:57 right? Like, if you're starting a new company today and you have a really
55:02 good understanding and network of customers that are currently underserved
55:05 by AI tools, I think you're set, right? Whereas if you're good
55:09 at building, you know, websites, but you don't have any
55:12 specific customer to build for, I think you're in for a much harder time.
55:17 >> Bullish on vertical AI startups is what I'm hearing. >> Yeah, I completely agree.
55:20 There's, you know, the general thing that can solve a lot of
55:23 problems, and then there's, we're going to solve presentations incredibly
55:25 well, and we're going to understand the presentation problem better than
55:30 anyone, and we're going to plug into your workflows and all these other
55:33 things that matter for a very specific problem. >> Okay. Incredible. When you
55:39 think about progress on Codex, I imagine you have a bunch of evals and
55:42 there's all these public benchmarks. What's something you look at to tell
55:45 you, okay, we're making really good progress? I imagine it's not going to be
55:48 one thing, but what do you focus on? What's something you're trying to
55:51 push? What's a KPI or two? >> One of the things that I'm constantly reminding
55:56 myself of is that a tool like Codex is naturally a tool that you
56:00 become a power user of, right? And so we can accidentally spend a lot
56:03 of our time thinking about features that are very deep in the user adoption
56:08 journey, and we can kind of end up oversolving for that. And so I think
56:12 it's just critically important to go look at, say, your D7
56:16 retention, right? Just go try the product. Sign up from scratch
56:19 again. I have a few too many ChatGPT Pro accounts that, in
56:24 order to maximally correctly dogfood, I've signed up for on my Gmail, and they
56:27 charge me like 200 bucks a month. I need to expense those. But, you know,
56:33 I think just the feeling of being a user and the early retention
56:37 stats are still super important for us, because as much as this
56:41 category is taking off, I think we're still in the very early days of
56:45 people using these tools. Another thing that we do, I
56:51 think we might be the most user-feedback slash social-media-pilled team out
56:56 there in this space. A few of us are constantly on Reddit and
57:01 Twitter, and, you know, there's praise up there and there's a
57:04 lot of complaints, but we take the complaints very seriously and look
57:08 at them. And I think that, again, because you can use a coding agent for so
57:12 many different things, it often is kind of broken in some ways
57:17 for specific behaviors. And so we actually monitor
57:20 what the vibes are on social media pretty often. I think
57:27 Twitter/X is a little bit more hypey, and then Reddit is a little more
57:34 negative, but real, actually. So I've started increasingly paying attention to
57:37 how people are talking about using Codex on Reddit, actually.
57:41 >> This is important for people to know. Which subreddits do you check most?
57:44 Is there like an r/codex? >> I mean, the algorithm is pretty good at
57:48 surfacing stuff, but yeah, r/codex is there. >> Okay. Very interesting. And then,
57:52 if people tag you on Twitter, you still see that, but maybe not as powerful
57:56 as seeing it on Reddit. >> Well, yeah, the interesting thing
57:58 with Twitter is it's a little bit more one-to-one, even if it's in public,
58:01 whereas with Reddit there's really good upvoting mechanics, and
58:05 maybe most people are still not bots, unclear. So you get good
5:18 here and welcome to the podcast. >> Thank you so much. I've been following
5:21 for ages and I'm excited to be here. >> I'm even more excited. I really
5:24 appreciate that. I want to start with your time at OpenAI. So, you joined
5:30 OpenAI about a year ago. Before that, you had your own startup for about 5
5:34 years. Before that, you were a product manager at Dropbox. I imagine OpenAI is
5:39 very different from every other place you've worked. Let me just ask you this.
5:44 What is most different about how OpenAI operates? And what's something that
5:47 you've learned there that you think you're going to take with you wherever
5:50 you go, assuming you ever leave? >> By far, I would say the speed and ambition of
5:54 working at OpenAI are just dramatically more than what I could have
5:58 imagined. And, you know, I guess it's kind of an embarrassing thing to say, because
6:01 everyone who's a startup founder thinks, "Oh yeah, my
6:04 startup moves super fast and the talent bar is super high and we're super
6:07 ambitious." But I have to say, working at OpenAI just kind of made
6:10 me reimagine what that even means. >> We hear this a lot about, you
6:14 know, feels like every AI company is just like, "Oh my god, I can't believe
6:17 how fast they're moving." Is there an example of just like, "Wow, that
6:19 wouldn't have happened this quickly anywhere else." >> The most obvious thing that comes to
6:23 mind is just the explosive growth of Codex itself. I think it's been a
6:27 while since we bumped our external number, but, you know,
6:32 the 10x-ing of Codex's scale was just super fast, a matter of months,
6:37 and it's well more since then. And once you've lived through
6:40 that, or at least speaking for myself, having lived through that, now I
6:45 feel like any time I'm going to spend my time building a tech
6:49 product, there's that kind of speed and scale that I now need to meet.
6:54 If I think of what I was doing in my startup, it moved way slower.
6:58 And, you know, there's always this balance with startups of how much
7:01 do you commit to an idea that you have versus find out that it's not
7:06 working and then pivot. But I think one thing I've realized at OpenAI is
7:09 that the amount of impact that we can have, and in fact need to have to do
7:13 a good job, is so high that I have to be way more ruthless with
7:16 how I spend my time. >> Before we get to Codex, is there a way that they've
7:20 structured the org, or the way that OpenAI operates, that allows the
7:23 team to move this quickly? Because everyone wants to move super
7:27 fast. I imagine there's a structural approach to allowing this to happen.
7:30 >> I mean, one thing is just that the technology we're building with has
7:35 transformed so many things, you know, both how we build
7:39 but also what kinds of things we can enable for users. And, you know, we
7:43 spend most of our time talking about the sort of improvements within the
7:47 foundation models, but I believe that even if we had no more progress today
7:51 with models, which is absolutely not the case, but even if we had no more
7:55 progress, we are way behind on product. There's so much more product to build.
7:59 >> So I think the moment is ripe, if that makes sense.
8:03 >> But I think there's a lot of counterintuitive things that surprised
8:06 me when I arrived as far as how things are structured. One example that
8:10 comes to mind: when I was working on my startup, and before that when I
8:12 was at Dropbox, it was very important, especially as a PM,
8:16 to always kind of rally the ship, make sure you're
8:18 pointed in the right direction and that you can accelerate in that
8:24 direction. But here, because we don't exactly know what
8:27 capabilities will even come up soon, and we don't know what's going to work
8:31 technically, and we also don't know what's going to land even if it works
8:34 technically, it's much more important for us to be very humble and learn
8:39 a lot more empirically and just try things quickly. And the org is
8:44 set up in that way, to be incredibly bottoms-up. You know, this is again one
8:47 of those things where, as you were saying, everyone wants to move fast. I
8:50 think everyone likes to say that they're bottoms-up, or at least a lot of people
8:53 do, but OpenAI is truly, truly bottoms-up, and that's been a
8:58 learning experience for me. I don't think it'll ever even make
9:02 sense for me to work at a non-AI
9:05 company in the future. I don't even know what that means. But if I were to
9:08 imagine it, or go back in time, I think I would run things totally differently.
9:12 >> What I'm hearing is kind of this ready, fire, aim approach, more
9:17 than ready, aim, fire. And that may not come across well, but
9:21 I've actually heard this a lot at
9:25 AI companies. Nick Turley shared, I think, the same
9:28 sentiment: because you don't know how people will use it, it doesn't make
9:31 sense to spend a lot of time making it perfect. It's better to just get it out
9:36 there in a primordial way, see how people use it, and then go big on that use case.
9:41 >> Yeah. Okay, to use this analogy a little bit, I feel like
9:44 there is an aim component, but the aim component is much fuzzier. You know,
9:48 it's kind of like, roughly what do we think can happen? Someone
9:52 I've learned a ton from working here, a research lead, likes to say that
9:57 at OpenAI we can have really good conversations about something
10:01 that's a year-plus from now, and, you know, there's a lot of ambiguity in what
10:04 will happen, but that's the right sort of timeline. And then we can have
10:07 really good conversations about what's happening in low months
10:11 or weeks. But there's this awkward middle ground, as
10:14 you start approaching a year but you're not at a year, where it's very
10:18 difficult to reason about, right? And so, as far as aiming, I think we
10:21 want to know, okay, what are some of the futures that we're trying to
10:24 build towards? And a lot of the problems we're dealing with in AI,
10:26 such as alignment, are problems you need to be thinking about really far out
10:30 into the future. So we're kind of aiming fuzzily there. But when it comes
10:34 down to the more tactical, oh yeah, what product will we build,
10:37 and therefore how will people use that product, that's the place where we're
10:40 much more, let's find out empirically. >> That's a good way of putting it.
10:44 Something else: when people hear
10:49 companies like yours saying, "Okay, we're going to be bottoms-up. We're going to
10:51 try a bunch of stuff. We're not going to have exactly a plan of where it's going
10:55 in the next few months," the key is you all hire the best people in the world.
10:59 And so that feels like a really key ingredient in order to be this
11:02 successful at bottoms-up work. It just super resonates, basically.
11:07 >> I was, again, surprised or even shocked when I arrived at the
11:11 level of individual drive and autonomy that everyone here has. So
11:18 I think, the way that OpenAI runs, you can't just read this or
11:22 listen to a podcast and say, I'm just going to deploy this to my
11:26 company. You know, maybe this is a harsh thing to say, but I think
11:28 very few companies have the talent caliber to be able to do that. So it
11:33 might need to be adjusted if you were going to implement this.
11:36 >> Okay. So let's talk Codex. You lead work on Codex. How's Codex going? What
11:40 numbers can you share? Is there anything you can share there? Also, not
11:43 everyone knows exactly what Codex is. Explain what Codex is. >> Totally. Yeah.
11:48 So I have the very lucky job of living in the future and leading
11:53 product on Codex. And Codex is OpenAI's coding agent. So, super concretely,
11:59 that means it's an IDE extension, a VS Code extension, that you can install, or a
12:02 terminal tool that you can install, and when you do so, you can basically
12:06 pair with Codex to answer questions about code, write code, run
12:12 tests, execute code, and do a bunch of the work in that thick middle
12:15 section of the software development life cycle, which is all about
12:19 writing code that you're going to get into production. More broadly, we
12:25 think of Codex as, what it currently is, is just the beginning of a
12:29 software engineering teammate. And so, when we use a
12:32 big word like teammate, some of the things we're imagining are that it's not
12:36 only able to write code, but it actually participates early on in
12:40 the ideation and planning phases of writing software, and then further
12:43 downstream in terms of validation, deploying, and maintaining code. To
12:48 make that a little more fun, one thing I like to imagine is, if you
12:51 think of what Codex is today, it's a bit like this really smart intern that
12:55 refuses to read Slack and doesn't check Datadog or Sentry
12:59 unless you ask it to. And so no matter how smart it is, how much
13:02 are you going to trust it to write code without you also working with it, right?
13:05 So that's how people use it mostly today: they pair with it.
13:08 >> But we want to get to the point where it can work just like a
13:12 new intern that you hire: you don't only ask them to write code, you ask them
13:15 to participate across the cycle. And you know that even if they don't
13:17 get something right the first try, they're eventually going to be able to
13:20 iterate their way there. >> I thought the point
13:23 about not reading Slack and Datadog was that it's just not distracted. It's
13:26 constantly focused and always in flow. But I get what you're saying:
13:30 it doesn't have all the context on everything that's going on.
13:33 >> And that's not only true when it's performing a task. Again, if you
13:36 think of the best human teammates, you don't tell them what to do,
13:39 >> right? Like, maybe when you first hire them, you have a couple meetings,
13:42 and you kind of learn, okay, these
13:45 prompts work for this teammate, these prompts don't, right? This is how to
13:48 communicate with this person. Then eventually you give them some starter
13:50 tasks. You delegate a few tasks. But then eventually you just say, "Hey,
13:53 great. Okay, you're working with this set of people in this area of the
13:57 codebase. You know, feel free to work with other people in other parts of the
14:00 codebase too, even. And you tell me what you think makes sense to be
14:03 done," right? And so we think of this as proactivity, and one
14:06 of our major goals with Codex is to get to proactivity.
14:12 I think this is critically important to achieve the mission of
14:15 OpenAI, which is to deliver the benefits of AI to all humanity. You know, I like
14:19 to joke today, and it's a half joke, that AI products are actually
14:23 really hard to use, because you have to be very thoughtful about when AI
14:29 could help you. And if you're not prompting a model to help you, it's
14:33 probably not helping you at that time. And if you think of how many times
14:36 the average user is prompting AI today, it's probably tens of times. But if
14:40 you think of how many times people could actually get benefit from a really
14:44 intelligent entity, it's thousands of times per day. And so a large
14:48 part of our goal with Codex is to figure out the shape of an
14:52 actual teammate agent that is helpful by default. >> When people think
14:57 about Cursor and even Claude Code, it's an IDE that helps you code,
15:01 kind of autocompletes code, and maybe does some agentic work. What I'm hearing
15:05 here is the vision is different: it's a teammate. It's like a remote
15:09 teammate building code for you, that you talk to and ask to do things, and it
15:14 also does IDE autocomplete and things like that. Is that a kind of
15:17 differentiator in the way you think about Codex? >> It's basically this idea
15:22 that, if you're a developer and you're trying to get
15:25 something done, we want you to just feel like you have superpowers and you're
15:29 able to move much, much faster. But we don't think that, in order for you to
15:33 reap those benefits, you need to be sitting there constantly thinking about,
15:37 how can I invoke AI at this point to do this thing. We want you to be able
15:40 to plug it into the way that you work and have it just start to
15:43 do stuff without you having to think about it. >> Okay. I have a lot of questions along
15:46 those lines, but just, how's it going? Are there any stats, any numbers you can
15:49 share about how Codex is doing? >> Yeah, Codex has been growing
15:53 absolutely explosively since the launch of GPT-5 back in August.
15:57 There are definitely some interesting product insights to talk about as
16:00 to how we unlocked that growth, if you're interested. But yeah,
16:03 the last stat we shared was that we were well over 10x since
16:08 August. In fact, it's been like 20x since then. Also, the Codex models
16:12 are serving many, many trillions of tokens a week now, and it's basically
16:17 our most served coding model. One of the really cool things that we've
16:20 seen is that the way we decided to set up the Codex team was to build a
16:25 really tightly integrated product and research team that are
16:28 iterating on the model and the harness together. And it turns out that lets you
16:32 just do a lot more and try many more experiments as to how these things will
16:36 work together. And so we were just training these models for use in our
16:40 first-party harness that we were very opinionated about. And then what we've
16:44 started to see more recently is that other major API coding
16:48 customers are now starting to adopt these models as well. And so we've
16:51 reached a point where the Codex model is the most served coding
16:55 model in the API as well. >> You hinted at this. What unlocked
17:00 this growth? I am extremely interested in hearing that. It felt like before,
17:04 maybe this was before you joined the team, it just felt like Claude
17:07 Code was killing it. Everyone was sitting on top of Claude Code. It was by
17:11 far the best way to code. And then all of a sudden, Codex comes around. I
17:16 remember Karpathy tweeted that he has never seen a model like this.
17:20 I think the tweet was, the gnarliest bugs that he runs into, that he
17:23 spends hours trying to figure out, nothing else has solved. He gives them to
17:27 Codex, lets it run for an hour, and it solves them. What did you guys do? >> We
17:32 have this strong mission here at OpenAI to, basically, build
17:38 AGI. And so we think a lot about how we can shape the product so
17:43 that it can scale, right? Earlier I was mentioning, hey, if you're
17:45 an engineer, you should be getting help from AI thousands of times per
17:50 day, right? And so we thought a lot about the primitives for that when we launched
17:54 our first version of Codex, which was Codex cloud. That was basically a
17:58 product that had its own computer, lives in the cloud, you could delegate to it,
18:02 and the coolest part was you could run many,
18:05 many tasks in parallel. But some of the challenges that we saw
18:11 are that it's a little bit harder to set that up, both in terms of
18:14 environment configuration, like giving the model the tools it needs to validate
18:18 changes, and learning how to prompt in that way. My analogy for
18:22 this, going back to the teammate analogy, is, it's like if you hired a
18:26 teammate but you're never allowed to get on a call with them, and you can only go
18:30 back and forth asynchronously over time. That works for some
18:33 teammates, and eventually that's actually how you want to spend most of your time.
18:36 So that's still the future, but it's hard to initially adopt.
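The delegation model described here, many independent tasks running in parallel, can be sketched in a few lines. This is an illustration only, not the real Codex API: `run_codex_task` is a hypothetical stand-in for dispatching one task to an isolated cloud agent and collecting the resulting diff.

```python
# Illustrative sketch of fan-out task delegation (Codex cloud style).
# `run_codex_task` is a made-up placeholder, not a real client call.
from concurrent.futures import ThreadPoolExecutor

def run_codex_task(prompt: str) -> str:
    """Hypothetical stand-in for one delegated agent run."""
    return f"[diff] {prompt}"

tasks = [
    "fix the flaky retry test in the auth module",
    "add pagination to the /users endpoint",
    "migrate logging to structured JSON",
]

# Each delegated task is independent, so they can all run in parallel
# and be reviewed as the results come back.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(run_codex_task, tasks))

for prompt, diff in zip(tasks, results):
    print(f"{prompt} -> {diff}")
```

The point of the sketch is the shape of the workflow: delegation is cheap, so the bottleneck moves to specifying tasks well and reviewing the parallel results.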
18:40 So we still have that vision: that's what we're trying to get you to, a
18:43 teammate that you delegate to and that then is proactive, and we're seeing that
18:48 growing. But the key unlock is actually that first you need to land with users in a
18:51 way that's much more intuitive and trivial to get value from. So the
18:56 way the vast majority of users discover Codex
19:00 today is either they download an IDE extension or they run it in their CLI,
19:05 and the agent works there with you on your computer, interactively. It
19:09 works within a sandbox, which is actually a really cool piece of tech to
19:13 help that be safe and secure, but it has access to all those dependencies. So if
19:17 the agent needs to do something, like it needs to run a command, it can do so
19:20 within the sandbox. We don't have to set up any environment, and if it's a command
19:23 that doesn't work in the sandbox, it can just ask you. And so you can get into
19:27 this really strong feedback loop using the model, and then over time our
19:31 team's job is to help turn that feedback loop into, sort of as a
19:35 byproduct of using the product, configuring it so that you can then be
19:39 delegating to it down the line. Again, the analogy I keep coming back to, but
19:43 if you hire a teammate and you ask them to do work, but you just give
19:46 them a fresh computer from the store, it's going to be hard for them to
19:49 do their job, right? But if, as you work with them side by side, you could be
19:52 like, "Oh, you don't have a password for this service we use. Here's the password
19:56 for this service." You know, "Yeah, don't worry. Feel free to run this command."
19:59 Then it's much easier for them to then go off and do work for hours
20:03 without you. >> So, what I'm hearing is, the initial version of Codex was almost too
20:06 far in the future. It's a remote, in-the-cloud agent that's coding for you
20:11 asynchronously. And what you did is, okay, let's actually come back a little
20:15 bit. Let's integrate into the way engineers already work, into IDEs and
20:20 locally, and help them on-ramp to this new world. >> Totally. And this
20:26 was quite interesting, because we dogfood products a ton at OpenAI,
20:30 dogfood as in we use our own products. And so Codex has been
20:34 accelerating OpenAI over the course of the entire year, and the cloud product
20:38 was a massive accelerant to the company as well. It just turns out that this
20:44 is one of those places where the signal we got from dogfooding is a little bit
20:47 different from the signal you get from the general market, because at
20:50 OpenAI, you know, we train reasoning models all day, and so we're very used to
20:54 this kind of prompting: think up front, run things
20:59 massively in parallel, it would take some time, and then come back
21:03 to it later, asynchronously. And so now when we build, we still get a
21:06 ton of signal from dogfooding internally, but we're also
21:11 very cognizant of the different ways that different audiences use the
21:14 product. >> That's really funny. It's like, live in the future, but maybe not too far
21:17 in the future. And I could see how everyone at OpenAI is living very far in
21:21 the future, and sometimes that won't work for everyone.
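The sandbox-with-escalation flow described a moment ago can be sketched as a control loop: commands run inside a restricted environment by default, and anything the policy disallows is surfaced to the user for approval rather than silently failing. The real Codex CLI uses OS-level sandboxing (Seatbelt on macOS, Landlock/seccomp on Linux); the allow-list below is a hypothetical placeholder for that mechanism, illustrating only the control flow.

```python
# Simplified sketch of sandboxed execution with user escalation.
# The allow-list stands in for a real OS-level sandbox policy.
import subprocess

SANDBOX_ALLOWED = {"echo", "ls", "cat"}  # hypothetical policy

def run_in_sandbox(argv: list[str]) -> str:
    if argv[0] not in SANDBOX_ALLOWED:
        # In the product, this is where the agent pauses and asks the
        # user to approve running the command outside the sandbox.
        raise PermissionError(f"{argv[0]!r} requires user approval")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout

print(run_in_sandbox(["echo", "hello from the sandbox"]))
try:
    run_in_sandbox(["curl", "https://example.com"])  # not allowed: escalate
except PermissionError as err:
    print("escalated to user:", err)
```

The design point is the feedback loop he describes: blocked commands become questions to the user, and each approval teaches the setup what the agent should be allowed to do next time.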
21:25 >> Yeah. What about, like, intelligence, training data? I don't
21:28 know, is there something else that helped Codex accelerate its ability to
21:32 actually code? Is it better, cleaner data? Is it more just models
21:36 advancing? Is there anything else that really helped accelerate? >> Yeah. So
21:41 there's a few components here. You were mentioning
21:44 models, and the models have improved a ton. In fact, just last Wednesday we
21:50 shipped GPT-5.1-Codex-Max, a very, you know, accurately named model. It
21:56 is awesome both because, for any given task that you
22:01 were using GPT-5.1-Codex for, it's roughly 30% faster at
22:06 accomplishing that task, but also it unlocks a ton of intelligence. So if you
22:10 use it at our higher reasoning levels, it's just even smarter. And, you
22:13 know, that tweet you were mentioning that Karpathy made,
22:16 like, hey, give us your gnarliest bugs: obviously there's a
22:20 ton going on in the market right now, but Codex Max is definitely
22:24 carrying that mantle of tackling the hardest bugs. So that
22:28 is that is super cool. But I will say it's like some of what how we're
22:32 thinking about this is evolving a little bit from being like yeah we're just
22:35 going to think about the model and like let's just like train the best model to
22:38 really thinking about like what is an agent actually overall right and you
22:43 know I'm not going to try to define agent exactly but at least the stack
22:46 that we think of it as having is it's like you have this model really smart
22:51 reasoning model that knows how to do a specific kind of task really well. So we
22:53 can talk about how we make that possible. But then actually we need to
22:59 serve that model through an API into a harness. And both of those things also
23:03 have a really big role here. So for instance, one of the things uh that
23:07 we're really proud of is you can have GPT-5.1-Codex-Max work for really long
23:11 periods of time. That's not like normal, but you can set it up to do that or that
23:15 might happen. But now routinely we'll hear about people saying like yeah, it
23:18 ran like overnight or it ran for 24 hours. >> And so you know for a model to work
23:22 continuously for that amount of time it's going to exceed its context window
23:25 and so we have a solution for that which we call compaction. Um but compaction is
23:30 actually a feature that uses like all three layers of that stack. So you need
23:36 to have a model that has a concept of compaction and knows like okay as I
23:39 start to approach this context window I might be asked to like prepare to be run
23:43 in a new context window. And then at the API layer, you need an API that like
23:47 understands this concept and like has an endpoint that you can hit to do this
23:50 change. And at the harness layer, you need a harness that can like prepare the
23:53 payload for this to be done. And so like shipping this compaction feature that
23:56 now just like made this behavior possible to like anyone using Codex has
23:59 actually meant working across all three things. And I think that's like
24:03 increasingly going to be true. Another maybe like underappreciated version of
24:08 this is is if you think about all the different coding products out there,
24:10 they all have like very different tool harnesses with like very different
24:14 opinions on how the model should work. And so if you want to train a model to
24:17 be good at like all the different ways uh it could work. Like you know maybe
24:20 you have a strong opinion that it should work using semantic search, right? Maybe
24:24 you have a strong opinion that it should like call bespoke tools or maybe you
24:27 have like in our case a strong opinion that it should just use like the shell and
24:32 work in the terminal. You know, you can move much faster if
24:34 you're just optimizing for one of those worlds, right? And so the way that we
24:38 built Codex is that it just uses the shell. But in order to make that like
24:43 safer and more secure, we uh have a sandbox that the model is used to operating in.
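As a concrete (and entirely hypothetical) sketch of what that model/API/harness stack can look like, here's a toy Python agent loop: a sandboxed shell executor plus a stand-in compaction step that collapses old turns once a context budget is exceeded. The function names, the character budget, and the summary format are all invented for illustration; a real harness delegates the summarizing to the model and enforces a much stronger sandbox than a timeout.

```python
import subprocess

MAX_CONTEXT_CHARS = 2_000  # toy stand-in for a token budget

def run_in_sandbox(command: str, timeout: int = 10) -> str:
    """Run a shell command with a timeout. A real sandbox would also
    restrict filesystem and network access, not just wall-clock time."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

def compact(history: list[str]) -> list[str]:
    """Toy 'compaction': collapse old turns into a summary so the agent can
    keep working past its context budget. A real model writes this itself."""
    return [f"[summary of {len(history)} earlier steps]"]

def agent_step(history: list[str], command: str) -> list[str]:
    """One turn: run the command in the sandbox, record it, compact if over budget."""
    history = history + [f"$ {command}", run_in_sandbox(command)]
    if sum(len(turn) for turn in history) > MAX_CONTEXT_CHARS:
        history = compact(history)
    return history

history: list[str] = []
history = agent_step(history, "echo hello from the sandbox")
print(history[-1].strip())
```

The point of the sketch is just the shape: the model only ever sees shell output, and compaction is a behavior all three layers have to agree on.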
24:46 So I think one of the biggest accelerants, to go all the way back to
24:49 your original question, is just like we're building all three
24:52 things in parallel and like kind of tuning each one and um you know
24:56 constantly experimenting with how those things work with like a tightly
24:59 integrated product and research team. How do you think you win in this space?
25:04 Do you think it'll always be this kind of like race with other
25:08 models constantly kind of leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there like a path to just we win? >> Again comes back to this idea of like
25:19 building a teammate and not just a teammate that you know uh participates
25:24 in team planning and prioritization. Not just a teammate that you know really
25:27 tests its code and like helps you maintain and deploy. But even a teammate
25:31 you know like if you think again an engineering teammate they can also like
25:34 schedule a calendar invite right or move standup or do whatever right. And so in
25:42 my mind, if we just imagine that every day or every week some like crazy new
25:46 capability is just going to be deployed by a research lab, it's just impossible
25:50 for us like you know as humans to keep up and like use all this technology. And
25:54 so I think we need to get to this world where you kind of just have like an AI
25:59 teammate or super assistant that you just talk to and it just knows how to be
26:04 helpful like on its own, right? And so you don't you don't have to be like
26:07 reading the latest tips for how to use it. You just like you've plugged it in
26:11 and it just provides help. And so that's kind of the shape of what I think we're
26:14 building. And I think that will be like a very sticky like winning product if we
26:18 can do so. So the shape that in my head at least I have is that we build you
26:23 know maybe a fun topic is like is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for, uh in the same way that if I think of like I'm like on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's like it's kind of the the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are like a functional expert in a specific
26:49 domain such as coding, there's like a GUI that you can pull up to go really
26:54 deep and like look at the code and like work with the code. So I think like what
26:59 we need to build as OpenAI is basically this idea of like you have ChatGPT
27:02 and that is a tool that's like ubiquitously available to like everyone.
27:06 You start using it even like outside of work right to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you just can naturally just yeah I'm just going to ask it for this
27:16 and I don't need to know about all the connectors or like all the different
27:19 features. I'm just going to ask it for help and it'll surface to me the the
27:23 best way that it can help at this point in time and maybe even chime in when I
27:27 didn't ask it for help. Um, so in my mind, if we can get to that, I think
27:30 that's, you know, that's how we we really build like the winning product.
27:34 This is so interesting because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was super
27:41 assistant or something like that. >> Yeah. >> And it's interesting that there's like
27:46 that approach to the super assistant and then there's this Codex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is okay, you start with coding and building and then it's doing all
27:56 this other stuff for you, scheduling meetings, I don't know, probably posting
28:01 in Slack, uh I don't know, shipping designs, I don't know. Is that is the
28:04 idea there? This is like the the business version of ChatGpt in a sense.
28:08 Or is there something else there? >> Yeah. So, you know, we're getting to
28:12 the like one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you like a contention in like the plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So like we're going to
28:29 have a model and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need the super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 Uh, you know, you could try to hack the OS and like use accessibility APIs.
28:57 Maybe a bit easier is you could point and click. That's a little slow, you
29:02 know, and, uh, unpredictable sometimes. Um, and another way, it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where like, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 nontechnical user, they won't even know they're using a coding agent. The same
29:19 way that no one thinks about are they using the internet or not, which is
29:22 they're more just like is Wi-Fi on? Right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're kind of building an agent that can use uh a computer by writing
29:36 code. And so we're already seeing like some pull for this. It's like quite
29:39 early, but we're starting to see people who are using Codex for like
29:43 coding adjacent product purposes. And so as that develops, I think we'll
29:47 just naturally see that like, oh, it turns out like we should just always
29:50 have the agent write code if there is a coding way to solve a problem instead
29:53 of, you know, even if you're doing a financial analysis, right? Like maybe
29:56 write some code for that. So basically like, you know, you were like, hey, is
29:59 this like the two ends of uh of this product for the super assistant, right,
30:03 of ChatGPT? In my mind, like just coding is a core competency of any agent,
30:06 including ChatGPT. And so like what really what we think we're building is
30:10 like that competency. But so here's here's like the really cool thing about
30:13 agents writing code is that you can import code right code is like
30:19 composable, interoperable, right? Because if we you know one very reductive
30:23 view we could have for an agent is it's just going to be given a computer and
30:26 it's just going to like point and click and you know go around but you know that
30:32 is the future and then how we get there is difficult to sort of chart a path
30:36 because a lot of the questions around building agents aren't like can the
30:41 agent do it but it's more about well how can we help the agent understand the
30:44 context that it's working in and like the team that's using it you know
30:47 probably has a way that they like to do things they have guidelines they
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do or they want to know that the agent understands sort of this detail
30:59 like an example would be you know if we're looking at a crash reporting tool
31:04 hitting a connector for it, every sub team probably has a different meta
31:07 prompt for like how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where like, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And then like stuff that the agent does often, we
31:22 probably just want to like build in as a competency that this agent has that it
31:27 can do. So I think we end up with this generalizable thing that you were saying
31:31 of like an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think that the the really key part here is can we make it so that
31:40 everything that the agent has to do often or that it does well we can just
31:44 like remember and store so that the agent doesn't have to write a script for
31:47 that again. Right. Or maybe like if I just joined a team and you are already
31:51 on the same team as me. I can just like use all those scripts that the agents
31:53 had written already. >> Yeah. It's like if this is our teammate
31:57 uh we can they can share things that it's learned from working with other
32:00 people at the company. Just makes sense as a metaphor. >> Yeah. It feels like you're in the uh
32:05 Karpathy camp of agents today are not that great and mostly slop and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great. I think >> uh ton of value, >> right? Yep.
32:19 >> And then I think like agents outside of coding, it's still like very early and
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use coding too and like in a composable way.
32:29 This is it's kind of the fun part of like when you're building for software
32:33 engineers. Like I at my startup we were building for software engineers too for
32:36 a lot of that journey and they're just such a fun audience to build for because
32:41 you know they also like building for themselves and are often like even more
32:45 creative than we are and thinking about how to use the technology. Um and so
32:48 like by building for software engineers you get to just observe a ton of
32:52 emergent behaviors and like things that you should do and build into the
32:55 product. I love how you you say that because a lot of people building for
32:57 engineers get really annoyed because the engineers are so they're just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 problems and just, you know, code for them. Um, kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Uh clearly the way you're
33:21 describing it is it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you. What's the way you think about
33:28 the impact on the field of engineering of having this super intelligent
33:33 engineering teammate. I think there's there's two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just like a small part
33:46 of this like broader idea that like, hey, as we make code even more
33:48 ubiquitous, I mean, you could probably claim it's ubiquitous today, even
33:51 pre-AI, right? But as we make code even more ubiquitous, it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for people with this like humans with this competency. So that's
34:05 my view. I think this is like quite a complex topic. So, you know, it's
34:08 something we talk about a lot and we have to kind of see how it pans out. But
34:12 I think what we can do basically as a product team building in
34:15 the space is just try to always think about how are we building a tool so that
34:18 it feels like we're like maximally accelerating uh people you know rather
34:24 than building a tool that makes it like more unclear what you should do as the
34:29 human right like I think like to to you know give an example right now like
34:33 nowadays when you work with a coding agent um it writes a ton of code but it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. So then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And so I actually think like we see that, like, this comes out, plays
34:53 out all the time in like a ton of micro decisions. And so we as a product team
34:55 are always thinking about like okay, how do we make this more fun? How do we make
34:58 you feel more empowered where it's not working? And I would argue that like
35:01 reviewing agent-written code is like a place that today is like less fun. And
35:06 so you know then I think okay what can we do about that? Well, we can ship a
35:09 code review feature that like helps you build confidence in the AI-written
35:12 code. Okay, cool. You know, another thing we can do is we can make it so
35:14 that the agent's like better able to validate its work. And you know, it gets
35:18 all the way down into like micro decisions like if you're going to have
35:23 an agent capability to validate work and let's say you have like I'm thinking
35:27 of Codex web right now like you have a pane that sort of reflects the work the
35:30 agent did. What do you see first? Do you see the diff or do you see the image
35:34 preview of the code it wrote? Right? And you know, I think if you're thinking
35:36 about this from the perspective of like how do I empower the human? How do I make them
35:40 feel like as as accelerated as possible like you obviously see the image first,
35:43 right? You shouldn't be reviewing the code unless first you know you've seen
35:46 the image unless maybe it's being like reviewed by an AI and now it's time for
35:49 you to take a look. When I had uh Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen this rise of something called spec-driven development where
36:02 you kind of just write the spec and then the code, you know, the AI writes code
36:05 for you. And so you kind of start working at this higher abstraction
36:09 level. Is that something you see where we're going? Just like engineers not
36:12 having to actually write code or look at code and there's going to be this higher
36:16 level of abstraction that we focus on. Yeah, I mean I think I think there's
36:19 like constantly these levels of abstraction and they're actually already
36:23 played out today, right? Like today like coding agents mostly it's like prompt to
36:29 patch right we're starting to see people doing like spec driven development or
36:32 like planned driven development that's actually one of the ways when people ask
36:35 like hey how do you run codex on a really long task well it's like often
36:38 collaborate with it first to write like a plan.md, like a markdown file that's
36:42 your plan and once you're happy with that then you ask it to go off and do
36:46 work and if that plan has verifiable steps it'll like work for much longer.
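A toy sketch of that plan-driven loop (hypothetical Python, just to show the shape): the plan is a markdown checklist, and each turn the agent picks the next unchecked step and checks it off once the work is verified, so progress survives across turns. The plan contents and helper names here are invented for illustration:

```python
from typing import Optional

# A plan.md-style checklist: verifiable steps the agent works through in order.
PLAN = """\
- [x] Reproduce the bug with a failing test
- [ ] Fix the off-by-one in the pager
- [ ] Run the full test suite
"""

def next_step(plan: str) -> Optional[str]:
    """Return the first unchecked step, or None when the plan is complete."""
    for line in plan.splitlines():
        if line.startswith("- [ ] "):
            return line[len("- [ ] "):]
    return None

def mark_done(plan: str, step: str) -> str:
    """Check off a completed step so the next turn resumes after it."""
    return plan.replace(f"- [ ] {step}", f"- [x] {step}", 1)

step = next_step(PLAN)
print(step)
PLAN = mark_done(PLAN, step)
print(next_step(PLAN))
```

Because each step is independently checkable, the agent can run much longer without drifting: the checklist, not the context window, carries the state.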
36:51 Um so we're totally seeing that. I think spec driven development is like an
36:55 interesting idea. It's not clear to me that it'll work out that way because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some some people will work that way. You know, like a a bit of
37:06 a joke idea though is like if you think of like um the way that many teams work
37:11 today, they're they often like don't necessarily have specs, but the team is
37:14 just really self-driven and so stuff just gets done. And so almost that is
37:17 like I'm coming up with this on the spot so it's you know not a good name but
37:21 like chatter-driven development where it's just like stuff is happening you
37:24 know on social media and like in your team communications tools and then as a
37:28 result like code gets written and deployed right so yeah I think I'm a
37:33 little bit more oriented in that way of you know I don't even necessarily want
37:37 to have to write a spec like sometimes I want to only if I like writing specs
37:42 right uh other times I might just want to say like hey here's like the
37:45 customer, you know, service channel and like tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of uh hypothetical future uh that I like to
37:58 share sometimes with people as a provocation, which is like in a world
38:01 where we have like truly amazing agents, like what does it look like to be a
38:04 solopreneur? Um, and uh, you know, one terrible idea for how it could look is that it's
38:12 actually there's a mobile app and um, every idea that the agent has to do is
38:17 just like vertical video on your phone and then you can like swipe left if you
38:21 think it's a bad idea and you can like swipe right if it's a good idea and like
38:24 you can press and hold and like speak to your phone if you want to get feedback
38:28 on the idea before you swipe, you know. So in this world like basically what
38:31 your job is just to like plug in this app into like every single like signal
38:36 system you know system of record and then you just sort of sit back and like
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets Tik Tok meets codeex. >> It's pretty terrible. >> No, this is great. So the idea here is
38:47 this thing is this agent is watching and right listening to you paying attention
38:51 to the market your users and it's like cool here's something I should do. It's
38:54 like a proactive engineer just like here we should build this feature fix this
38:56 thing. >> Exactly. I think they're communicating with you in like the lowest like the
39:05 gyms like the modern way to communicate. >> Yeah. >> Swipe left or right and in vertical feed
39:10 and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that but like you know it's a fun idea.
39:17 I mean you see you know like in this example though like one of the things
39:19 that it's doing is it's consuming external signals right. I think the
39:23 other really interesting thing is like if we think about like what is the most
39:28 successful like AI product to date um I would argue um it's funny actually
39:34 not to confuse things at all but like the first time we used the the brand
39:38 Codex at OpenAI was actually the model powering GitHub Copilot. This is like
39:42 way back in the day, years ago. And so we decided to reuse that that brand
39:45 recently um because it's just so good, you know, Codex, code execution. But I
39:50 think actually like autocompletion in IDEs is like one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface like ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not like that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this like mixed initiative system that's like contextually responding to like
40:17 what you're attempting to do. And so in my mind, this is like a really
40:21 interesting thing for us at OpenAI as we're building. So for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 Like in my mind, one of the really interesting things we can then do is we
40:33 can then like contextually surface like ways that we can help you as you're
40:37 going about your day, right? And so we break out of this like, you know, we're
40:41 just looking at code or we're just in your terminal um into this idea that
40:44 like, hey, like a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So like,
40:51 you know, how can we help you with that? >> Man, there's so much there and I love
40:55 this. Okay, so autocomplete on web with the browser. That's so interesting. just
40:58 like here's all the things that we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas. I'll come back to that. Uh
41:05 Codex, code execution. Did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter, what is it, chatter-driven development? Uh
41:14 no, this is a really good idea, but it reminds me I had John Gon on the
41:19 podcast, CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who just
41:30 uh has Goose watch his screen and listen to every meeting and
41:36 proactively do work that he will probably want to do. So it ships a PR,
41:41 sends an email, drafts a Slack message. So he's doing exactly what you're
41:44 describing in in kind of a very early way. >> Yeah, that's super interesting. And you
41:49 know, I bet you... So, if we went and asked them what the bottleneck
41:52 to that productivity is, did they share what it is? >> Uh, probably looking at it just making
41:57 sure this is the right the right thing to do. Yeah. >> Yeah. So, like we see this now like we
42:01 have a Slack integration for Codex. People love it, you know, if there's like
42:04 something that you need to do quickly. People just like at-mention Codex like
42:07 why do you think this bug is happening? Right. Doesn't have to be an engineer.
42:10 Even like maybe you know data scientists here are often using Codex a ton to just
42:14 like answer questions like why do you think this metric moved? What happened?
42:18 So questions, you get the answer right back in Slack. It's amazing, super
42:22 useful. But as for when it's writing code, then you have to go back
42:27 and look at the code, right? And so the real like I think bottleneck right now
42:30 is like validating that the code worked and like doing code review.
42:34 So in my mind, if we wanted to get to something like uh you know that world
42:38 your friend was talking about, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. It makes sense. Like you said, writing code:
42:49 I used to be an engineer, was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code and just have to go through and be on the hook if
43:00 it is doing something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is the bottleneck is now like figuring out
43:09 what to build and then it's at the end of like, okay, we have all this all 100
43:13 hours to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira product discovery. The hardest part of
43:22 building products isn't actually building products. It's everything else.
43:26 It's proving that the work matters, managing stakeholders, trying to plan
43:30 ahead. Most teams spend more time reacting than learning, chasing updates,
43:34 justifying road maps, and constantly unblocking work to keep things moving.
43:39 Jira product discovery puts you back in control. With Jira product discovery,
43:43 you can capture insights and prioritize high impact ideas. It's flexible, so it
43:47 adapts to the way your team works and helps you build a road map that drives
43:51 alignment, not questions. And because it's built on Jira, you can track ideas
43:56 from strategy to delivery, all in one place. Less chasing, more time to think,
44:01 learn, and build the right thing. Get Jira Product Discovery for free at
44:06 atlassian.com/lenny. That's atlassian.com/lenny. What has the impact of Codex been on the
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted. Uh, code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at at OpenAI? Yeah, I mean I think mostly I just feel like
44:28 much more empowered. Um I've always been sort of more technical leaning PM and especially when
44:34 I'm working on products for engineers, I feel like it's necessary to like you
44:37 know dog food the product but even beyond that I I I just feel like I can
44:42 do much much more as a PM. And uh you know Scott Belsky talks about this idea
44:45 of like compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that like maybe the boundaries between these roles
44:52 are a little bit like less needed than before because people can just do much
44:57 more and every time you someone can do more you can like skip one communication
45:00 boundary and make the team like that much more efficient, right? So I think I
45:07 think we see it you know in a bunch of functions now but I guess since you
45:11 asked about like product specifically uh you know now like answering questions is
45:15 much much easier, you can now just ask Codex for thoughts on that uh a lot of
45:20 like PM-type work, understanding what's changing, again just ask Codex for help
45:25 with that um prototyping is often faster than writing specs this is something
45:29 that a lot of people have talked about. I think something that's not
45:33 super surprising but is slightly surprising is like we see like
45:36 we're mostly building Codex to write code that's going to be deployed
45:40 to production but actually we see a lot of throwaway code written with Codex
45:43 now. It's kind of going back to this idea of like you know ubiquitous code.
45:48 So you'll see uh you know someone wants to do an analysis like if I want to
45:51 understand something it's like okay just give Codex a bunch of data but then ask
45:54 it to build like an interactive like data viewer for this data, right? That
45:56 was just like too annoying to do in the past but now it's just like
46:00 totally worth the time of just getting an agent to go do something. Um,
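That kind of throwaway code might be as small as this (a hypothetical example, not actual Codex output): a one-off script that turns a pasted blob of CSV into a browsable HTML table instead of eyeballing raw text. The sample data is made up for illustration:

```python
import csv
import html
import io

# Made-up sample data standing in for "a bunch of data" you'd hand an agent.
CSV_DATA = """\
metric,week1,week2
signups,120,180
churn,14,9
"""

def csv_to_html_table(text: str) -> str:
    """Render CSV text as an HTML table; escaping keeps odd values safe."""
    rows = list(csv.reader(io.StringIO(text)))
    head = "".join(f"<th>{html.escape(cell)}</th>" for cell in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows[1:]
    )
    return f"<table><tr>{head}</tr>{body}</table>"

page = csv_to_html_table(CSV_DATA)
print(page)
```

Nobody would have bothered writing this by hand for a one-off question; the point is that an agent makes code this disposable cheap enough to generate on demand.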
46:04 similarly, I've seen like some pretty cool prototypes on our design team about
46:09 like, well, a designer basically wanted to build an animation
46:13 and this is the coin animation in Codex, and it was like normally it'd be too
46:17 annoying to program this animation. So they just vibe coded an animation editor
46:21 and then they use the animation editor to build the animation which they then
46:25 checked into the repo. Actually, with our designers there's a ton of
46:28 acceleration there. And like speaking of compressing the talent stack, I think our
46:31 designers are very PM. So, you know, they do a ton of product work. And like they actually
46:38 have like an entire like vibecoded sort of side prototype of the Codex app. And
46:41 so, a lot of how we talk about things is like we'll have like a really quick jam
46:44 because there's like 10,000 things going on. And then designer will like go think
46:48 about how this should work, but instead of like talking about it again, they'll
46:50 just like vibe code a prototype of that in their like standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe code that prototype into or
46:59 vibe engineer that prototype into an actual PR to land. And then depending on
47:02 their comfort with the codebase, like the Codex CLI in Rust is a little harder.
47:06 Maybe they'll like land it themselves or they'll like get close and then an
47:09 engineer can help them like land the PR. Um, you know, we recently shipped the
47:15 Sora Android app. Um and uh that was one of the most sort of mind-blowing
47:19 examples of acceleration actually because usage of Codex internally at
47:24 OpenAI is obviously really really high but it's been growing uh over the course of
47:28 the year both in terms of like now it's basically like all technical staff use
47:32 it uh but even like the intensity and knowhow of how to make the most of
47:35 coding agents has gone up by a ton and so the Sora Android app right like a
47:42 fully new app we built it in 18 days it went from like zero to launch to
47:46 employees and then 10 days later so 28 days total we went to just like GA to
47:51 the public and that was done just like with the help of Codex
47:56 so pretty insane velocity I would say it was like a little bit I don't want to
48:01 say easy mode but there is one thing that Codex is really good at if you're a
48:04 company that's like building software on multiple platforms so you've already
47:07 figured out like some of the underlying APIs or systems, asking Codex to
47:13 port things over is really effective because it has like something
47:15 you can go look at. And so the engineers on that team uh were basically having
47:20 Codex go look at the iOS app, produce plans of work that needed to be done and
48:23 then go implement those. And it was kind of looking at iOS and Android at the
48:27 same time. And so you know basically it was like two weeks to launch to
48:30 employees, four weeks total. Insanely fast. >> What makes that even more insane is it
48:35 became the number one app in the App Store. >> This just boggles the mind.
48:39 Okay. So >> Yeah. So imagine releasing the number one app on the App Store with like a handful
48:45 of engineers >> uh I think it was like >> two or three possibly
48:53 >> uh in a handful of weeks. Yeah, this is absurd. So >> yeah, so that's a really fun um example
49:01 of uh acceleration. And then Atlas was the other one. I think um Ben, the
49:06 engineer on Atlas, did a podcast sharing a little bit of how we
49:12 built there. You know, Atlas is actually, I mean, it's a browser,
49:15 right, and building a browser is really hard. Um and so we had to build a lot
49:23 of difficult systems in order to do that and basically we got to the point where
49:27 that team has a ton of power users of Codex right now. And um you know it got
49:32 to the point where they they basically were you know we were talking to them
49:34 about it because a lot of those engineers are people I used to work with
49:38 before at my startup and so they'd say you know before this would have taken us
49:42 like two to three weeks for two to three engineers and now it's like one engineer
49:48 one week. Um so massive acceleration there as well. And what's quite cool is
49:52 that uh you know we we shipped Atlas on on Mac first but now we're working on
49:56 the Windows version. You know, so the team now is ramping up on
49:58 Windows and they're helping us make Codex better on Windows too, which is
50:02 admittedly earlier; like, the model we shipped last week is the first
50:06 model that natively understands PowerShell. So you know PowerShell being
50:11 uh the native like shell language on Windows. So yeah, it's been it's been
50:16 really awesome to see like the whole company getting accelerated by Codex
50:21 like from and you know most obviously also research and like improving how
50:24 quickly we train models and how well we do it and then even like uh design as we
50:28 talked about and and marketing like actually we're at this point now where
50:32 uh my product marketer is often also making string changes just directly from
50:36 Slack or like updating docs directly from Slack. >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible and this is how other companies
50:46 are going to work. Uh just shipping again what became the number one app in
50:49 the App Store and just beloved all over; it just took over the, I don't
50:54 know, the world for at least a week. Uh built, you said, in 28 days, and like, I
50:58 don't know, 10 days, 18 days just to get like the core of it working.
51:02 >> Yeah. So like 18 days we had a thing that employees were playing with and
51:05 then 10 days later we were out. >> And you said just a couple engineers.
51:07 >> Yeah. >> Two or three. Okay. And then Atlas you
51:11 said was took a week to build. >> No, no, no. So Atlas, not the whole
51:16 week, but Atlas was like a really meaty project. >> Yeah.
51:18 >> Um and so I was talking to one of the engineers on Atlas um about like you
51:23 know just how what they use codex for and it's basically like we use codex for
51:25 absolutely everything. I was like okay well like you know how would you how
51:29 would you measure the acceleration? And so basically the the answer I got back
51:31 was >> previously it would have taken two to three weeks for two to three engineers
51:36 and now it's like one engineer one week. Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Like does it have to be an engineer
51:42 building this thing? Could sort of have built been built by I don't know a PM or
51:46 designer. I think we will very much get to the point where well basically where
51:50 the boundaries are a little bit blurred, right? Like I think you're going to want
51:54 someone who's like understands the details of what they're building, but
51:58 what details those are will evolve. Kind of like how now like if you're writing
52:02 Swift, you don't have to speak assembly. You know, there's a handful of people in
52:05 the world who speak assembly, and it's really important that they exist. Uh,
52:09 maybe more than a handful, right? But that's like a specialized function that
52:14 like most companies don't need to have. So I think we're just going to naturally
52:17 see like an increase in layers of abstraction. And then the cool thing is
52:21 now we're entering like the language layer of abstraction like natural
52:25 language. And then natural language itself is really flexible, right? Like
52:29 you could have engineers talking about like a plan and then you could have
52:32 engineers talking about a spec and then you could have engineers talking about
52:35 just, you know, a product or an idea. So I think we can also like start moving up
52:39 those layers of of abstraction as well. But you know I I do think this is going
52:43 to be gradual. I don't think it's going to go to like all of a sudden like
52:46 nobody ever writes anything and like you know any code and it's just specs. I
52:49 think it's going to be much more like okay we've set up our coding agent to be
52:53 really good at like previewing the build or like at running tests. Maybe that's
52:56 the first part right that most people have set up. And it's like okay now
52:59 we've set it up so that it can like execute the build and it can like see
53:03 the results of its own changes but you know we haven't yet built a good
53:06 integration harness so that it can like in the case of Atlas like by the way I
53:08 don't know if they've done any of this or not I think they've done a lot of
53:11 this but you know maybe the next stage is like enable it to like load a few
53:16 sample pages to see how well those work right so then okay now we're going to
53:19 like set up set up do that and I think for some time at least we're going to
53:22 have humans kind of curating like which of these connectors or systems or
53:26 components that the agent needs to be good at talking to and then you know in
53:30 the future there will be an even greater unlock where Codex tells you how to set
53:34 it up or maybe sets itself up in a repo. What a wild time to be alive. Wow. I'm
53:38 curious just the second order effects of this sort of thing. Just how quickly it
53:42 is to build stuff. What does that do? Does that mean distribution becomes much
53:46 much more important? Does it mean uh ideas are just worth a lot more? It's
53:50 interesting to think about how that changes. >> I'm curious what you think. I still
53:56 don't think ideas are worth as much as maybe a lot of people think. I
53:59 still think execution is really hard, right? Like you can build
54:01 something fast, but you still need to execute well on it. Still needs to make
54:06 sense and be a coherent thing overall. Um Yeah. And distribution is massive.
54:10 >> Yeah. Just feels like everything else is now more important. Everything that
54:13 isn't the building piece, which is >> coming up with an idea, getting to
54:17 market, profit, >> all that kind of stuff. I I think we
54:21 might have been in this weird temporary phase where you know for a while like
54:26 you could you could just it was so hard to build product that you mostly just
54:31 had to be really good at building product and it maybe didn't matter if
54:34 you like had an intimate understanding of a specific customer.
54:39 Um, but now I think we're getting to this point where actually like if I
54:42 could only choose like one thing to understand, it would be like really
54:46 meaningful understanding of like the problems that a certain customer has,
54:49 right? If I could only if I could only go in with one like core competency. So
54:54 I think that that's that's ultimately still what's going to matter most,
54:57 right? Like if you're starting a new company today and you have like a really
55:02 good understanding and like network of customers that are currently underserved
55:05 by AI tools, I think you're like you're set, right? Whereas if you're like good
55:09 [clears throat] at building like you know websites, but you don't have any
55:12 specific customer to build for, I think you're in for a much harder time.
55:17 Bullish on vertical AI startups is what I'm hearing. Yeah, I completely agree.
55:20 There's like, you know, there's like the general thing that can solve a lot of
55:23 problems and then there's like we're going to solve presentations incredibly
55:25 well and we're going to understand the presentation problem uh better than
55:30 anyone and we're going to uh plug into your workflows and all these other
55:33 things that matter for a very specific problem. Okay. Incredible. When you
55:39 think about progress on Codex, I imagine you have a bunch of evals and
55:42 there's all these public benchmarks. What's something you look at to tell
55:45 you, okay, we're making really good progress. I imagine it's not going to be
55:48 the one thing, but what do you focus on? What's like something you're trying to
55:51 push? What's like a KPI or two? One of the things that I'm constantly reminding
55:56 myself of is that a tool like Codex sort of naturally is a tool that you would,
56:00 you know, become a power user of, right? And so we can accidentally spend a lot
56:03 of our time thinking about features that are like very deep in the user adoption
56:08 journey. Um, and so we can kind of end up oversolving for that. And so I think
56:12 it's like just critically important to like go look at like your like D7
56:16 retention, right? just go try the product. Like sign up from scratch
56:19 again. Um I have a few too many like ChatGPT Pro accounts that I've just,
56:24 in order to maximally correctly dogfood, signed up for on my Gmail, and they
56:27 charge me like 200 bucks a month. I need to expense those. But uh you know
56:33 like I think just like the feeling of being a user and the early retention
56:37 stats are still like super important for us because you know as much as this
56:41 category is is taking off I think we're still in the very early days of like
56:45 people using them. Um, another thing that we do that that might might be I
56:51 think we might be the most like user-feedback-slash-social-media-pilled team out
56:56 there in this space is like a few of us are like constantly on Reddit and
57:01 Twitter and uh you know there's a there's praise up there and there's a
57:04 lot of complaints but we take the complaints like very seriously and look
57:08 at them and I think that again because you can use like coding agent for so
57:12 many different things, um, it often is like kind of broken in many sorts of ways
57:17 for like specific behaviors. Um, and so we we actually monitor a lot just like
57:20 what the vibes are on social media pretty often, especially I think for for
57:27 Twitter/X, um, it's a little bit more hypey, and then Reddit is a little more
57:34 negative but real actually. Um, so I've started increasingly paying attention to
57:37 like how people are talking about using Codex on Reddit. Actually,
57:41 >> this is uh important for people to know. Which the subreddits do you check most?
57:44 Is there like an r/codex or >> I mean the algorithm is pretty good at
57:48 surfacing stuff, but like, r/codex is there. >> Okay, very interesting. And then
57:52 uh if people tag you on Twitter you still see that but maybe not as powerful
57:56 as seeing it on Reddit. >> Well yeah the interesting well the thing
57:58 with Twitter is it's a little bit more onetoone even if it's like in public
58:01 whereas like with Reddit there's like really good upvoting mechanics and like
58:05 maybe most people are still not bots unclear. Um so you get you get like good
58:09 signal on what matters and what other people think. So uh interestingly uh
58:13 Atlas I want to talk about that briefly. Uh you guys launched Atlas. I tweeted
58:18 actually that I tried Atlas and then I I don't love the AI only uh search
58:23 experience. I was just like I just want Google sometimes or whatever like just
58:26 waiting for AI to give me an answer. I'm like I don't want to and there was no
58:29 way to switch. I just tweeted hey I'm I'm switching back. I don't it's not
58:32 great. And I feel like I made some PMs at OpenAI sad and I saw someone tweet
58:37 okay we have this now which I imagine was always part of the plan. It's
58:40 probably an example of we just ship we got to ship stuff, see how people use it
58:43 and then we figure it out. Uh so I guess one is that I don't know is there
58:46 anything there and two I'm just curious why are you guys building a web browser?
58:51 So I I worked on Atlas for a bit. Um I don't work on it now. Um but you know
58:55 like the a bit of the narrative here for for me just to tell my story a bit was
58:58 like I was working on this like screen sharing like pair programming startup
59:03 right and then we joined open AI and so the idea was really to build a
59:07 contextual desktop assistant and the reason I believe that's so important is
59:11 because I think that it's really annoying to have to give all your
59:14 context to an assistant and then to figure out how it can help you right and
59:18 so if it could just like understand what you are trying to do then it could
59:23 maximally accelerate you. Um, and so I would, you know, I still think of Codex
59:26 actually as like a contextual assistant um from a little bit of a different
59:30 angle like starting with coding tasks but um the some of the some of the
59:36 thinking at least for me personally I can't speak for the whole project but
59:40 was that a lot of work is done in the web and if we could build a browser then
59:45 we could be contextual for you but in a much more first class way we weren't
59:48 hacking like other desktop software, which has like very varied support
59:53 for what content they're rendering to the accessibility tree. Uh we
59:56 wouldn't be relying on screenshots which are a little bit slower and unreliable.
60:00 Instead, we we could like be in the rendering engine, right? And like
60:03 extract whatever we needed to to help you. Um and also I like to think of like
60:09 you know video games like I don't know if you've played like I don't know say
60:13 Halo, right, like you walk up to an object. I mean, this is true for many games,
60:16 you press... man, it's been a long time, this is embarrassing... press X and it just
60:21 does the right thing, right? And I was one of those guys who always read the
60:23 instruction manual for every video game that I bought. And I remember the first
60:26 time I read about a contextual action and I just thought it was like this
60:31 really cool idea. And uh you know the the thing about a contextual action is
60:34 we need to know what you are attempting to do. We need to have a little bit of
60:37 context and then we can and then we can help. Uh, and I think this is critically
60:43 important because you know, imagine this world that we reach, right, where we're
60:45 we have agents that are helping you thousands of times per day. Um, imagine
60:50 if the only way we could tell you that we helped you is if we could like push
60:55 notify you. So, you get a thousand push notifications a day of an AI saying
60:59 like, "Hey, I did this thing. Do you like it?" It'd be super annoying, right?
61:03 Whereas imagine going back to software engineering like I was looking at a
61:07 dashboard and I noticed some like key metric had like gone down
61:12 and you know at that point in time an AI could like maybe go take a look and then
61:15 surface the fact that it has an opinion on why this metric went down and maybe a
61:19 fix right there right when I'm looking at the dashboard right that would be
61:22 like that would much more keep me in flow and enable the agent to take action
61:27 on like many more things so in my mind like part of why I'm excited for us to
61:32 have a browser is that I think we have then like much more context around like
61:37 what we should help with. Users have much more control over what they want us
61:40 to look at. It's like hey if you want to open if you want us to like take action
61:43 on something you can open it in your AI browser. If you don't then you can open
61:46 it in your other browser right? So like really clear control and boundaries and
61:51 then we have the ability to build UX that's like mixed initiative so that we
61:54 can surface contextual actions to you like at the times they're helpful as
61:58 opposed to just like randomly notifying you. >> Hearing the vision for Codex being
62:01 the super assistant. It's not just there to code for you. It's trying to do a lot
62:05 for you as a teammate, as this kind of super teammate that makes you awesome at
62:10 work. So, I get this. Speaking of that, are there other non-engineering
62:15 common use cases for Codex? Just ways that non-engineers, we talked about it,
62:18 you know, designers prototyping and building stuff. Are there any, I don't
62:22 know, fun or unexpected ways people are using Codex that aren't engineers? I
62:25 mean there's a load of a load of unexpected ways but I think like most of
62:31 where we're seeing like real traction with people using things are still for
62:35 now like very like I would say coding adjacent or like sort of tech oriented
62:39 places where there's like a mature ecosystem um or you know maybe you're
62:43 doing data analysis or something like that. I personally am expecting
62:47 that we're going to see a lot more of that over time. Um, but for now like
62:51 we're keeping the team like very focused on just coding for now because there's
62:54 so much more work to do. >> For people that are thinking about
62:58 trying out Codex, is there like, um, does it work for all kinds of code bases?
63:02 What code does it support? If you're, like, I don't know, SAP, can you
63:06 add Codex and start building things? What's kind of like the sweet spot, or
63:11 does it start to not be amazing yet? This I'm really glad you asked this
63:14 question actually because the best way to try Codex is to give it your hardest
63:19 tasks which is a little different than some of the other coding agents like you
63:23 know some tools you might think okay let me like start easy or just like you know
63:27 like vibe code something random and decide if I like the tool whereas like
63:32 we're really building Codex to be the like professional tool that you can give
63:36 your like hardest problems to um and you know that writes like high quality code
63:40 in your like enormous code base that is in fact not perfect right now. So yeah,
63:43 I think if you're going to try Codex, you want to try it on like a real task
63:48 that you have and not necessarily like dumb that task down to something that's
63:53 like trivial, but actually like you know like a good one would be like you have a
63:55 hard bug and you don't know what's causing that bug and you ask Codex to
63:59 like help figure that out or like to implement that, you know, the fix.
64:02 >> I love that answer. Just give it your hardest problem. I will say like you
64:05 know if you if you're like hey okay well the hardest problem I have is that I
64:08 need to build like a new unicorn business like obviously that you know
64:13 it's not going to work not yet. So I think it's like give it like the hardest
64:18 problem but something that is still like one like question right or one task um
64:23 to start that's if you're testing and then over time you can learn how to use
64:25 it for like bigger things. >> Yeah. What languages does does it
64:28 support? Basically the way we've trained Codex is like there's a distribution of
64:32 languages that we support and it's like fairly aligned with like the frequency
64:36 of these languages in the world. So unless you're writing some like very
64:39 esoteric language or like some private language, it should do fine in your
64:42 language. If someone was just getting started, is there a tip you could share
64:46 to help them be successful? Like if you could just whisper a little tip into
64:49 someone just setting up Codex for the first time to help them have a really
64:53 good time, what's something you would whisper? >> I might say try a few things in
64:57 parallel, right? So you could try giving it a hard task. Um, maybe ask it
65:03 to understand the codebase. Uh formulate a plan with it around an idea that you
65:07 have and kind of build your way up from there. And like sort of the meta idea
65:11 here is it's again it's like you're building trust with the new teammate,
65:15 right? And so like you wouldn't go to a new teammate and just give them like hey
65:18 do this thing here's zero context. you would start by like first making sure
65:22 they understand the codebase and then you would like maybe align on a an
65:24 approach and then you would have them go off and do bit by bit right and I think
65:28 if you use Codex in that way you'll just sort of naturally start to
65:30 understand like the different ways of prompting it because it is it's a super
65:35 powerful agent and model, but it is a little bit different to prompt
65:38 Codex than other models. >> Just a couple more questions. One, we touched on this a
65:44 little bit as AI does more and more coding there's always this question of
65:48 should I learn to code why should they spend time doing this sort of thing. For
65:52 people that are trying to figure out what to do with their career, especially
65:55 if they're into software engineering, computer science, do you think there's
65:59 specific elements of computer science that are more and more important to
66:03 lean into maybe things they don't need to worry about? Like what do you think
66:06 people should be leaning into skill-wise in as this becomes more and more of a
66:11 thing in our workplace? I think there's like a couple angles you could go at
66:18 this from. Um, I think the, well, the easiest one to think of at
66:24 least is just like be a doer of things. Um, I think that, you know, with coding
66:28 agents, um, getting better and better over time. It's just what you can do as
66:33 even like someone in college or a new grad is just like so much more than what
66:37 that was before. And so, I think you just want to be taking advantage of
66:40 that. You know, definitely when I'm looking at like hiring folks who are
66:43 earlier career, it's like definitely something that I think about is how how
66:47 productive are they using the latest tools, right? They should be like super
66:51 productive. And if you think of it in that way, they actually have like less
66:55 of a handicap than before versus a more senior career person because, you know,
66:59 the divide is actually getting smaller because they've got these amazing coding
67:02 agents now. Um, so that's one thing which is like I guess the thing the
67:05 advice is just like learn about whatever you want but just make sure you spend
67:08 time doing things not just like fulfilling homework assignments. I guess
67:12 I think the other side of it though is that it's still deeply worth
67:17 understanding like what makes a good like overall software system. So I still
67:22 think that like skills like really strong systems engineering skills or
67:27 even like really effective like communication and collaboration with
67:31 your team, skills like that I think are are important are going to continue to
67:35 matter for for quite some time. Like I don't think it's going to be like all of
67:39 a sudden uh the AI coding agents are just able to build like perfect systems
67:43 without your help. I think it's going to look much more gradual where it's like
67:48 okay we have these AI coding agents they're able to validate their work it's
67:52 still important and like for example like I'm thinking of an engineer who was
67:55 working on Atlas, since we were talking about it. He set up Codex so it can
67:59 verify its own work, which is a little bit non-trivial because of the nature of
68:02 the Atlas project. So the way that he did that was he actually prompted Codex
68:05 like, "Hey, why can't you verify your work? Fix it," and did that on a loop, right?
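The loop described here, prompting the agent to clear its own verification blockers until checks pass, can be sketched roughly as below. This is a hypothetical illustration, not a real API: `run_codex` stands in for invoking the agent, and the blocker list stands in for whatever build or test issues the repo actually has.

```python
# Sketch of the "why can't you verify your work? fix it" loop described above.
# run_codex() is a hypothetical stand-in for invoking the coding agent; here it
# is simulated as removing one verification blocker per round so the loop ends.

blockers = ["no build script", "tests need a sandbox network flag"]

def run_codex(prompt: str) -> None:
    # Stand-in: pretend the agent fixes one verification blocker per prompt.
    if blockers:
        blockers.pop()

def verification_passes() -> bool:
    # Stand-in for the repo's real build/test check.
    return not blockers

def bootstrap_self_verification(max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        if verification_passes():
            return True  # the agent can now check its own work
        # Ask the agent to diagnose and fix whatever blocks verification.
        run_codex("Why can't you verify your own work in this repo? Fix it.")
    return verification_passes()
```

In the real workflow, `run_codex` would be an actual invocation of the agent and `verification_passes` would run the repo's build or test suite; the shape of the loop is the point.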
68:11 and so you still like at various phases are going to want a human in the loop to
68:15 like help configure the coding agent to be effective and so I think like you
68:19 still want to be able to reason about that so maybe it's like less important
68:23 that you can type really fast or that you understand exactly how to write,
68:27 not that anyone writes a, you know, for-each loop or something, right,
68:31 or, you know, you don't need to know how to implement a specific algorithm. But
68:33 I think you need to be able to reason about the different systems and like
68:36 what makes a software engineering team effective. So I think
68:40 that's the other really important thing. And then like maybe the last angle that
68:44 you could take is I think if you're on the frontier of knowledge for a given
68:49 thing, I still think that's like deeply interesting to go down partially because
68:54 that knowledge is still going to be like uh you know agents aren't going to be as
68:58 good at that. But also partially because I think that like by trying to advance
69:01 the frontier of a specific thing, you'll actually like end up like being forced
69:05 to take advantage of coding agents and like using them to accelerate your own
69:09 workflow as you go. >> What's an example that when you when you
69:12 talk about being at the frontier? So >> Codex writes a lot of the code that
69:15 helps like manage its training runs, the key infrastructure. Uh you know, we move
69:21 pretty fast, and so we have Codex code review, which is catching a lot of
69:23 mistakes. It's actually caught some like pretty interesting configuration
69:27 mistakes and uh you know we're starting to see glimpses of the future where
69:31 we're actually starting to have Codex even like be on call for its own
69:36 training which is pretty interesting. Um so there's lots there.
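One way to picture being "on call" for a training run, as a rough, hypothetical sketch: poll the run's metrics on a loop, let an agent judge the charts, and pause or page when something looks off. Every function here is a stand-in, not a real OpenAI infrastructure API.

```python
# Hypothetical sketch of an agent babysitting a training run. fetch_metrics(),
# ask_agent(), and pause_run() are stand-ins, not real APIs.

def fetch_metrics() -> dict[str, float]:
    # Stand-in for reading the dashboards humans currently watch.
    return {"loss": 2.31, "grad_norm": 1.8, "tokens_per_sec": 1.2e6}

def ask_agent(metrics: dict[str, float]) -> str:
    # Stand-in for the agent judging whether the charts look healthy.
    # Real versions would reason over trends, not fixed thresholds.
    if metrics["grad_norm"] > 100 or metrics["tokens_per_sec"] < 1e5:
        return "pause"
    return "ok"

def pause_run() -> None:
    # Stand-in for pausing the run and paging a human for follow-up.
    print("run paused; paging a human")

def babysit_once() -> str:
    verdict = ask_agent(fetch_metrics())
    if verdict == "pause":
        pause_run()
    return verdict
```

In practice the verdict would come from the model reading the actual charts rather than hard-coded thresholds, and this check would run continuously alongside the training job.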
69:39 >> Uh wait what does that mean to be on call for its own training? So it's
69:42 running its training and it's like, oh, something broke, someone needs... and
69:45 does it alert people, or is it like, here, I'm going to fix the problem and
69:48 restart? This is an early idea that we're like figuring out, but the basic
69:51 idea is that you know during a training run there's like a bunch of graphs that
69:54 like today like humans are looking at and it's like really important to like
69:58 look at those. Um we call this babysitting >> because it's very expensive to train I
70:02 imagine and very important to move fast and exactly and there's a lot of there's
70:06 a lot of systems underlying uh the training run and so like a system could
70:09 go down or there could be an error somewhere that gets introduced and so we
70:13 might need to like fix it or pause things or I don't know there's lots of
70:16 actions we might need to take. And so basically having Codex run on a
70:20 loop to evaluate how those charts are moving over time, um, is sort of this
70:24 idea that we have for how to enable us to train way more
70:27 efficiently. I love that. This is very much along the lines of this is the
70:31 future of agents. Codex isn't just for building code, right? It's a
70:34 lot more than that. >> Yeah. >> Okay. Last question. Uh being at OpenAI,
70:41 uh I can't not ask about your AGI timeline and how far you think we are
70:45 from AGI. I know this isn't what you work on, but there's a lot of opinions,
70:50 a lot of, I don't know, timelines. How far do you think we are from a human-level
70:56 version of AI? Whatever that means to you. For me, I think that it's a little
71:01 bit about like when do we see the acceleration curves kind of go like this
71:03 or I don't know which way I'm mirrored here, right? When do we see the hockey
71:08 stick? And I think that the current limiting factor, I mean there's many,
71:11 but I think a current underappreciated limiting factor is like literally human
71:16 typing speed or human multitasking speed on like writing prompts,
71:20 right? And like you know, you were talking about it's like you can have an
71:22 agent like watch all the work you're doing, but if you don't have the agent
71:27 uh also validating its work, then you're still bottlenecked on like can you go
71:30 review all that code, right? So my view is that we need to um unblock those
71:36 productivity loops from like humans having to prompt and humans having to
71:40 like manually validate all the work. And so if we can like rebuild systems to let
71:45 the agent like be default useful, we'll start unlocking hockey sticks.
71:48 Unfortunately, I don't think that's going to be binary. I think it's going
71:51 to be very dependent on what you're building, right? So like I would imagine
71:55 that like next year if you're a startup and you're building a new new piece of
71:59 like you know some new app or something it'll be possible for you to set it up
72:02 on a stack where agents are like much more self sufficient than not right but
72:07 now, let's say, I don't know, you mentioned SAP, right? Let's say you work at SAP, like
72:11 they have many like complex systems and they're not going to be able to just
72:13 like get the agent to be self-sufficient overnight in those systems so they're
72:17 going to have to slowly like maybe replace systems or update systems to
72:21 allow the agent to like handle more of the work end to end. And so basically my
72:25 sort of long answer to your question, maybe boring answer is that I think
72:29 starting next year we're going to see like early adopters like starting to
72:33 like hockey stick their productivity. Um and then over the years that follow,
72:36 we're going to see larger and larger companies like hockey stick that
72:39 productivity. And then somewhere in that fuzzy middle is like when that hockey
72:44 sticking will be like flowing back into the AI labs and that's when we'll we'll
72:48 basically be at the AGI tier. >> I love this answer. It's very practical
72:52 and it's something that comes up a lot on this podcast just like the time to
72:55 review all the things AI is doing is really annoying and a big bottleneck. I
72:59 love that you're working on this because it's one thing to just make coding much
73:03 more efficient and do that for people. It's another to take care of that final
73:08 step of okay is this actually great? And that's so interesting that your sense is
73:11 that's the limiting factor. It comes back to your earlier point: even if AI
73:16 did not advance anymore, we have so much more potential to unlock as we
73:22 learn to use it more effectively. Uh so that is a really unique answer. I
73:25 haven't heard that perspective on what the big unlock is: human typing speed to
73:29 review, basically, what AI is doing for us. >> Mhm. So good. Okay. Uh Alexander, we
73:35 covered a lot of ground. Is there anything that we haven't covered? Is
73:38 there anything you wanted to share, maybe double down on before we get to
73:44 our very exciting lightning round? I think uh one thing is that the Codex
73:48 team is growing and uh as I was just saying, we're still somewhat limited by
73:51 human thinking speed and human typing speed. We're working on it. So um if
73:58 you're an engineer um or a salesperson or I am hiring for product, a product
74:03 person, uh please hit us up. I'm not sure the best way to give contact info,
74:06 but I guess you can go to our jobs page or do they have contact for you?
74:10 Actually, do listeners have contact for you >> before they send me like, "Hey, I want
74:13 to apply to Codex." >> Uh, I do have a contact form at
74:16 lennyrachitsky.com. I'm afraid of all the amazing people that are pinging me. But
74:19 there we go. We could try that. Let's see how that goes. >> Okay. Or Yeah. Or another maybe an
74:24 easier. We can edit all that out or up to you. But uh yeah, or I would just say
74:28 you can drop us a DM. Uh, for example, I'm embirico on Twitter and hit me up
74:32 if you're interested in joining the team. >> What a dream job for so many people.
74:38 What's a sign they I don't know what's like a way to filter people a little bit
74:42 so they're not flooding your inbox. >> So, specifically, if you want to join
74:46 the Codex team, then you need to be a technical person who uses these tools.
74:50 And I think I would just ask yourself the question, uh, hey, let's say, you
74:54 know, I were to join OpenAI and work on Codex over the next six months, you
74:59 know, and crush it. What does the life of a software engineer look like then?
75:02 And I think if you have an opinion on that, you should apply. And if you don't
75:05 have an opinion on that and have to think about it first, you know,
75:09 depending on how long you have to think about it, I guess that would be the
75:12 filter, right? Like I think there's a lot of people thinking about the space
75:16 and so we're we're very interested in folks who sort of have already been
75:21 thinking about like what the future should look like with agents and like we
75:23 don't have to agree on where where we're going but I think we want people who
75:26 like are very passionate about the topic. I guess >> it's very rare to be working on a
75:32 product that has this much impact and is at such a bleeding edge of where it's
75:37 possible. It's uh what a cool role for the right person. So, uh, um, it's
75:40 awesome that you have an opening and this audience is, uh, a really good fit
75:45 potentially for for that role. So, I hope we find someone that would be
75:49 incredible. With that, we've reached our very exciting lightning round. I've got
75:53 five questions for you, Alexander. Are you ready? >> I don't know what these are, but I'm
75:57 excited. Let's do it. >> Uh, they're uh, the same questions I ask
76:02 everyone except for the last one. So, uh, probably not a surprise. I should
76:06 probably make them more often a surprise. Okay, first question. What are
76:09 a couple books that you recommend most to other people? Two or three books that
76:14 come to mind. I have been reading a lot of science fiction recently. And I'm
76:18 sure this has been recommended before, but The Culture. I think Iain Banks is the name of
76:24 the author. Part of why I love it is because it's like basically relatively
76:30 recent writing about a future with AI, but it's an optimistic future with AI.
76:34 Um, and I think, you know, a lot of sci-fi is like fairly dystopian. Um, but
76:39 this is like people uh sort of the joke at least on the Culture subreddit is
76:43 that let me let me see if I can get this right. It is a like space communist
76:49 utopia or or like I think it's a gay space communist utopia. Um, and uh I
76:54 just think it's like really fun to think about um like to use the culture as a
76:58 way to think about like what kind of world can we usher in and like what
77:01 decisions can we make today to help usher in that world. >> Wow. I've not I don't think anyone's
77:05 recommended that. I know you're reading, you mentioned before I started recording
77:09 Lord of the Rings right now. Uh if you want another AIish sci-fi book, uh have
77:15 you read A Fire Upon the Deep? >> No, I haven't. >> Okay. It's uh incredibly good. It's like
77:22 a a sci-fi space opera sort of epic tale with uh super intelligence.
77:25 >> Cool. >> Yeah. Somewhat mostly not optimistic,
77:30 but somewhat optimistic. Okay. Next question. Is there a favorite recent
77:35 movie or TV show that you've really enjoyed? >> Yeah, there's an anime called Jujutsu
77:41 Kaisen, which I really like. Um, again, it's got kind of a slightly dark topic
77:46 of like demons. Um, but what I love about it is that the hero is really
77:49 nice. And I think there's this new wave of like anime and cartoons where the
77:55 protagonists are really friendly and like people who care about the world rather than being like
78:01 sort of like if you look at like some older anime like that started the genre
78:07 like you know those like Evangelion or Akira and like those characters the
78:11 protagonists are like deeply flawed, like quite unhappy. Um, not that they started the genre, but
78:17 it was like a trend for a while to sort of poke poke fun at the idea that in
78:21 these in these cartoons the protagonist was very young but being given a
78:24 ridiculous amount of responsibility to like save the world. And so there was
78:30 kind of a wave of like uh content that was like critiquing this by making the
78:33 character like basically go through like serious like mental issues in the middle
78:37 of the show. Um and I'm not saying this is better, but at least it's quite fun
78:40 to have like these like really positive protagonists who are just trying to help
78:44 everyone around them. I love how much we're learning about your uh personality
78:49 during these recommendations. Nice protagonists, optimistic futures.
78:53 >> I think, you know, if you don't believe it, you can't will it into existence.
78:57 So, you're in a balance. >> This is your training data.
79:01 >> Is there a product you've recently discovered you really love? Could be an
79:05 app, could be some clothing, could be some kitchen gadget, tech gadget, a hat.
79:13 Yeah. So I have been like quite into uh you know combustion engines um and cars.
79:19 Actually the reason I came to America initially was cuz I wanted to work on
79:23 like US aircraft. Um but you know now I work in software. Um and so for the
79:28 longest time I basically only had like quite old sports cars. Uh old just
79:33 because they were more affordable. Um and then uh recently um we got a Tesla
79:38 instead. And I have to say that I find the Tesla software like quite inspiring. Um, in
79:45 particular, it has like the self-driving feature. And you know, I've mentioned a
79:49 few times like today like I think it's really interesting to think about how to
79:52 build like mixed initiative software that makes you feel maximally empowered
79:56 as a human, maximally in control, but yet you're getting a lot of help. And I
80:01 think they did a really good job with enabling sort of the car to drive
80:05 itself, but all these different ways that you can adjust what it's doing
80:08 without turning off the self-driving. So like you can accelerate, you know, it'll
80:12 like listen to that, you can turn a knob to change its speed, you can steer
80:17 slightly. Um, I think it's actually a masterclass in like building an agent
80:21 that still leaves the human in control. This reminds me of Nick Turley's whole uh
80:25 mantra was are we maximally accelerated? >> Yeah. Yeah,
80:28 >> feels like it's completely infiltrated everything at OpenAI, which makes sense.
80:33 That tracks. Uh, two more questions. Do you have a life motto that you often
80:38 think about and come back to in work or in life that's been helpful?
80:41 >> I don't know if I have a life motto, but maybe I can tell you about the number
80:45 one value, company value from my startup. >> Love it. >> Which is still something that sticks
80:51 with me, which is to be kind and candid. >> That tracks
80:55 kind and candid. Wow. Yeah. And we had to put them together because we as
81:02 founders realized that we often would be nice and it wasn't actually the right thing
81:09 to do. We would like delay the difficult conversations and we were not candid.
81:12 And so every time we would like remind ourselves of this motto and then we
81:15 would become more candid and then six months later we would realize that we
81:18 were in fact not candid six months ago and we needed to be even more candid. So
81:23 then the question is like okay like how how should we be candid? It's like okay
81:26 well let's let's think of being candid as an act of kindness but also think of
81:29 that both in terms of doing it and willing ourselves to do it but also in
81:32 terms of how we frame it to people. >> That is a beautiful uh way of
81:36 summarizing how to lead well. What's the uh the book about, uh,
81:42 challenge directly but care deeply? Uh, Radical Candor. >> Oh yeah yeah
81:45 >> yeah. So it's like another way of thinking about Radical Candor. Okay last
81:48 question. I was looking up your last name just like hey what's the what's the
81:52 story here? So your last name is Embiricos and I was talking to ChatGPT and
81:57 it told me the most famous individuals with the surname are the influential
82:02 Greek poet and psychoanalyst Andreas Embirikos and his relative the wealthy shipping
82:09 magnate and art collector George Embiricos. So the question is which of these two do
82:14 you most identify with? The Greek poet and psychoanalyst or the wealthy
82:19 shipping magnate and art collector? I think it's it's gonna have to be the
82:25 poet because uh he uh he loved the island that our family's from.
82:29 >> Wait, you know those people? Okay, this is not news to you. Okay.
82:32 >> Well, I mean it's an enormous family, but it's like Greek, so you know these
82:35 big families, everyone like everyone's your uncle, you know what I mean? Like
82:38 my mother's Malaysian and also like everyone is my uncle or aunt in
82:42 Malaysia, too, if that makes sense. >> Yeah. But yeah, he he loved this island
82:48 that the family sort of like originated from. I believe I don't actually know
82:51 where that shipping magnate lived. I think it was New York or something. But
82:54 anyway, we all came from this island called Andros. Um, which is a really
82:59 beautiful place and it's like there's more like livestock there than than
83:03 humans. Uh, not too many tourists go there. Uh, but I think he like part of
83:07 what I think is really cool is like he published a lot and a lot of his writing
83:11 is about like the beauty of that island which I think is super cool.
83:15 >> Wow, that was an amazing answer. Two more questions. Where can folks find you
83:17 if they want to follow you online and you know maybe reach out and then how
83:20 can listeners be useful to you? >> I I'm one of those people who has social
83:23 media only for the purposes of having work. You know my phone my phone turns
83:27 black and white at like 9:00 p.m. Uh but yeah, so Twitter or X,
83:34 embirico. Um, and uh, yeah, if you post in r/codex, I'll probably see it. Uh, so
83:40 you know, you can go there. Um, how can listeners be useful? Um, I would say
83:44 please try Codex. Please share feedback. Let us know what to improve.
83:48 We pay a ton of ton of attention to feedback. I think it's like honestly
83:51 like the growth has been amazing, but it's still very early times. Um, so we
83:56 still pay a lot of attention and hope to do so forever. Um and also um I would
84:01 say if you're interested in working on the future of coding agents and then
84:06 agents generally then please uh apply to our job site um and or message me in
84:11 those social media places. Alexander this was awesome. I always love meeting
84:15 people working on AI because it always feels like this very I don't know
84:20 sterile scary mysterious thing and then you meet the people building these tools
84:24 and they're always just so awesome and you especially just so nice and uh as
84:30 you like the examples you shared optimism and kindness you know this is
84:34 what we want to be this is these are the kinds of people we want to be building
84:37 these tools that are going to drive the future so um I'm I'm really thankful
84:42 that you did this Um, grateful to have met you and uh, thank you so much for
84:45 being here. >> Yeah, thanks so much for having me. This
84:48 is fun. Thank you so much for listening. If you found this valuable, you can subscribe
84:54 to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also,
84:58 please consider giving us a rating or leaving a review as that really helps
85:03 other listeners find the podcast. You can find all past episodes or learn more
85:08 about the show at lennyspodcast.com.
$

Inside OpenAI: 2026 is the year of agents, AI’s biggest bottleneck, and why compute isn’t the issue

@LennysPodcast 1:25:13 20 chapters
[AI agents and automation][developer tools and coding][marketing and growth hacking][e-commerce and conversion optimization][solo founder and bootstrapping]
// chapters
// description

Alexander Embiricos leads product on Codex, OpenAI’s powerful coding agent, which has grown 20x since August and now serves trillions of tokens weekly. Before joining OpenAI, Alexander spent five years building a pair programming product for engineers. He now works at the frontier of AI-led software development, building what he describes as a software engineering teammate—an AI agent designed to participate across the entire development lifecycle. *We discuss:* 1. Why Codex has grown 20x since

now: 0:00