// transcript — 2871 segments
0:00 Introduction to Alexander Embiricos
0:01 do lead work on Codex. >> Codex is OpenAI's coding agent. We think
0:05 of Codex as just the beginning of a software engineering teammate. It's
0:09 a bit like this really smart intern that refuses to read Slack, doesn't check
0:12 Datadog unless you ask it to. >> I remember Karpathy tweeted about the
0:16 gnarliest bugs that he runs into, that he just spends hours trying to figure out,
0:19 that nothing else has solved. He gives it to Codex, lets it run for an hour, and it
0:21 solves it. >> Starting to see glimpses of the future where we're actually starting to have
0:26 Codex be on call for its own training. Codex writes a lot of the code that
0:29 helps manage its training run, the key infrastructure. And so we
0:32 have Codex code review catching a lot of mistakes. It's
0:34 actually caught some pretty interesting configuration mistakes. One
0:37 of the most mind-blowing examples of acceleration: the Sora Android app,
0:42 a fully new app. We built it in 18 days, and then 10 days later, so 28 days
0:45 total, we went to the public. >> How do you think you win in this
0:47 space? >> One of our major goals with Codex is to get to proactivity. If we're going to
0:51 build a super assistant, it has to be able to do things. One of the
0:54 learnings over the past year is that for models to do stuff, they are much more
0:57 effective when they can use a computer. It turns out the best way for models to
1:00 use computers is simply to write code. And so we're kind of getting to
1:02 this idea where if you want to build any agent, maybe you should be building a
1:04 coding agent. >> When you think about progress on Codex,
1:08 I imagine you have a bunch of evals and there's all these public benchmarks.
1:11 >> A few of us are constantly on Reddit. You know, there's
1:14 praise up there and there's a lot of complaints. What we can do as a product
1:17 team is just try to always think about how are we building a tool so that
1:20 it feels like we're maximally accelerating people rather than building
1:23 a tool that makes it more unclear what you should do as the human. Being at
1:27 OpenAI, I can't not ask about how far you think we are from AGI.
1:29 >> The current underappreciated limiting factor is literally human
1:32 typing speed or human multitasking speed. Today, my guest is Alexander Embiricos,
1:39 product lead for Codex, OpenAI's incredibly popular and powerful coding
1:44 agent. In the words of Nick Turley, head of ChatGPT and former podcast guest,
1:48 Alex is one of my all-time favorite humans I've ever worked with, and
1:51 bringing him and his company into OpenAI ended up being one of the best decisions
1:56 we've ever made. Similarly, Kevin Weil, OpenAI CPO, said, "Alex is simply the
2:00 best." In our conversation, we chat about what it's truly like to build
2:04 product at OpenAI. How Codex allowed the Sora team to ship the Sora app,
2:07 which became the number one app in the app store in under one month. Also, the
2:12 20x growth Codex is seeing right now and what they did to make it so good at
2:16 coding. Why his team is now focused on making it easier to review code, not
2:20 just write code. His AGI timelines, his thoughts on when AI agents will actually
2:25 be really useful, and so much more. A huge thank you to Ed Baze, Nick Turley,
2:28 and Dennis Yang for suggesting topics for this conversation. If you enjoy this
2:32 podcast, don't forget to subscribe and follow it in your favorite podcasting
2:35 app or YouTube. And if you become an annual subscriber of my newsletter, you
2:40 get a year free of 19 incredible products, including a year free of
2:45 Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr
2:48 Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin,
2:52 PostHog, and Stripe Atlas. Head on over to Lennysnewsletter.com and click
2:56 Product Pass. With that, I bring you Alexander Embiricos after a short word
3:00 from our sponsors. Here's a puzzle for you. What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and
3:07 hundreds of other winning companies have in common? The answer is they're all
3:12 powered by today's sponsor, WorkOS. If you're building software for
3:14 enterprises, you've probably felt the pain of integrating single sign-on, SCIM,
3:20 RBAC, audit logs, and other features required by big customers. WorkOS turns
3:25 those deal blockers into drop-in APIs with a modern developer platform built
3:29 specifically for B2B SaaS. Whether you're a seed-stage startup trying to land your
3:33 first enterprise customer or a unicorn expanding globally, WorkOS is the
3:36 fastest path to becoming enterprise ready and unlocking growth.
3:40 They're essentially Stripe for enterprise features. Visit workos.com to
3:45 get started or just hit up their Slack support where they have real engineers
3:48 in there who answer your questions super fast. WorkOS allows you to build like
3:53 the best with delightful APIs, comprehensive docs, and a smooth
3:58 developer experience. Go to workos.com to make your app enterprise ready today.
4:03 This episode is brought to you by Fin, the number one AI agent for customer
4:07 service. If your customer support tickets are piling up, then you need
4:11 Fin. Fin is the highest performing AI agent on the market with a 65% average
4:16 resolution rate. Fin resolves even the most complex customer queries. No other
4:20 AI agent performs better. In head-to-head bake-offs with competitors,
4:25 Fin wins every time. Yes, switching to a new tool can be scary, but Fin works
4:29 on any help desk with no migration needed, which means you don't have to
4:32 overhaul your current system or deal with delays in service for your
4:35 customers. And Fin is trusted by over 6,000 customer service leaders
4:39 and top companies like Anthropic, Shutterstock, Synthesia, Clay, Vanta,
4:43 Lovable, Monday.com, and more. And because Fin is powered by the Fin AI
4:47 engine, which is a continuously improving system that allows you to
4:50 analyze, train, test, and deploy with ease, Fin can continuously improve your
4:54 results, too. So, if you're ready to transform your customer service and
4:58 scale your support, give Fin a try for only 99 cents per resolution. Plus, Fin
5:08 comes with a 90-day money-back guarantee. Find out how Fin can work for your team at fin.ai/lenny. >> Alexander, thank you so much for being
5:18 here and welcome to the podcast. >> Thank you so much. I've been following
5:21 for ages and I'm excited to be here. >> I'm even more excited. I really
5:24 appreciate that. I want to start with your time at OpenAI. So, you joined
5:30 OpenAI about a year ago. Before that, you had your own startup for about 5
5:34 years. Before that, you were a product manager at Dropbox. I imagine OpenAI is
5:39 very different from every other place you've worked. Let me just ask you this.
5:44 What is most different about how OpenAI operates? And what's something that
5:47 you've learned there that you think you're going to take with you wherever
5:50 you go, assuming you ever leave? >> By far, I would say the speed and ambition of
5:54 working at OpenAI are just dramatically more than what I could have
5:58 imagined. And, you know, I guess it's kind of an embarrassing thing to say, because
6:01 everyone who's a startup founder thinks, "Oh yeah, my
6:04 startup moves super fast and the talent bar is super high and we're super
6:07 ambitious." But I have to say, working at OpenAI just kind of made
6:10 me reimagine what that even means. We hear this a lot about, you
6:14 know, feels like every AI company is just like, "Oh my god, I can't believe
6:17 how fast they're moving." Is there an example of just like, "Wow, that
6:19 wouldn't have happened this quickly anywhere else?" >> The most obvious thing that comes to
6:23 mind is just the explosive growth of Codex itself. I think it's been a
6:27 while since we bumped our external number, but, you know,
6:32 the 10xing of Codex's scale was just super fast, in a matter of months,
6:37 and it's well more than that since then. And, you know, once you've lived through
6:40 that, or at least speaking for myself, having lived through that, now I
6:45 feel like anytime I'm going to spend my time on, you know, building a tech
6:49 product, there's that kind of speed and scale that I now need to meet.
6:54 If I think of what I was doing in my startup, it moved way slower.
6:58 And, you know, there's always this balance with startups of how much
7:01 do you commit to an idea that you have versus find out that it's not
7:06 working and then pivot. But I think one thing I've realized at OpenAI is
7:09 that the amount of impact that we can have, and in fact need to have to do
7:13 a good job, is so high that I have to be way more ruthless with
7:16 how I spend my time. >> Before we get to Codex, is there a way that they've
7:20 structured the org, or, I don't know, the way that OpenAI operates, that allows the
7:23 team to move this quickly? Because everyone wants to move super
7:27 fast. I imagine there's a structural approach to allowing this to happen.
7:30 >> I mean, one thing is just that the technology that we're building with has
7:35 transformed so many things, you know, from both how we build
7:39 but also what kinds of things we can enable for users. And, you know, we
7:43 spend most of our time talking about like the sort of improvements within the
7:47 foundation models, but I believe that even if we had no more progress today
7:51 with models, which is absolutely not the case, even if we had no more
7:55 progress, we are way behind on product. There's so much more product to build.
7:59 >> So I think just, like, the moment is ripe, if that makes sense.
8:03 >> But I think there's a lot of counterintuitive things that surprised
8:06 me when I arrived, as far as how things are structured. One example that
8:10 comes to mind is, when I was working on my startup, and before that when I
8:12 was at Dropbox, it was very important, you know, especially as a PM,
8:16 to always kind of rally the ship and make sure you're
8:18 pointed in the right direction so that you can accelerate in that
8:24 direction. But here, I think because we don't exactly know what
8:27 capabilities will even come up soon, and we don't know what's going to work
8:31 technically, and then we also don't know what's going to land even if it works
8:34 technically, it's much more important for us to be very humble, to learn
8:39 a lot more empirically, and to just try things quickly. And the org is
8:44 set up in that way, to be incredibly bottoms up. You know, this is again one
8:47 of those things that like as you were saying, everyone wants to move fast. I
8:50 think everyone likes to say that they're bottoms up or at least a lot of people
8:53 do, but OpenAI is truly, truly bottoms up, and that's been a
8:58 learning experience for me. Now, it'll be interesting if I ever work
9:02 somewhere else, though I don't think it'll even make sense to work at a non-AI
9:05 company in the future. I don't even know what that means. But if I were to
9:08 imagine it or go back in time, I think I would run things totally differently.
9:12 >> What I'm hearing is kind of this ready, fire, aim approach, more
9:17 than ready, aim, fire. And as you processed that,
9:21 because that may not come across well, but I actually have heard this a lot at
9:25 AI companies, and Nick Turley shared, I think, the same
9:28 sentiment: because you don't know how people will use it, it doesn't make
9:31 sense to spend a lot of time making it perfect. It's better to just get it out
9:36 there in a primordial way, see how people use it, and then go big on that use case.
9:41 >> Yeah. Okay, to use this analogy a little bit, I feel like
9:44 there is an aim component, but the aim component is much fuzzier. You know,
9:48 it's kind of like, roughly, what do we think can happen? Someone
9:52 I've learned a ton from working with here, a research lead, likes to say that
9:57 at OpenAI we can have really good conversations about something
10:01 that's like a year plus from now and you know there's a lot of ambiguity in what
10:04 will happen, but that's the right sort of timeline. And then we can have
10:07 really good conversations about what's happening in low months
10:11 or weeks. But there's kind of this awkward middle ground as
10:14 you start approaching a year but you're not at a year, where it's very
10:18 difficult to reason about, right? And so as far as aiming, I think we
10:21 want to know like, okay, what are some of the futures that we're trying to
10:24 build towards. And a lot of the problems we're dealing with in AI,
10:26 such as alignment, are problems you need to be thinking about really far out
10:30 into the future. So, we're kind of aiming fuzzily there. But when it comes
10:34 down to the more tactical, like, oh yeah, what product will we build
10:37 and therefore how will people use that product? That's the place where we're
10:40 much more like let's find out empirically. >> That's a good way of putting it.
10:44 Something else: when people hear this, they sometimes hear
10:49 companies like yours saying, "Okay, we're going to be bottoms up. We're going to
10:51 try a bunch of stuff. We're not going to have exactly a plan of where it's going
10:55 in the next few months." The key is you all hire the best people in the world.
10:59 And so that feels like a really key ingredient in order to be this
11:02 successful at bottoms-up work. It just super resonates, basically.
11:07 >> Um, I was just, again, surprised or even shocked when I arrived at the
11:11 level of individual drive and autonomy that everyone here has. So
11:18 I think, with the way that OpenAI runs, you can't read this or
11:22 listen to a podcast and be like, "I'm just going to deploy this to my
11:26 company." Um, you know, maybe this is a harsh thing to say, but I think
11:28 very few companies have the talent caliber to be able to do that. So it
11:33 might need to be adjusted if you were going to implement this.
11:34 Codex: OpenAI’s coding agent
11:36 >> Okay. So let's talk Codex. You lead work on Codex. How's Codex going? What
11:40 numbers can you share? Is there anything you can share there? Also, not
11:43 everyone knows exactly what Codex is. Explain what Codex is. >> Totally. Yeah.
11:48 So I have the very lucky job of living in the future and leading
11:53 product on Codex. And Codex is OpenAI's coding agent. So super concretely,
11:59 that means it's an IDE extension, a VS Code extension that you can install, or a
12:02 terminal tool that you can install. And when you do so, you can then basically
12:06 pair with Codex to answer questions about code, write code, run
12:12 tests, execute code, and do a bunch of the work in that thick middle
12:15 section of the software development life cycle, which is all about
12:19 writing code that you're going to get into production. More broadly, we
12:25 think of Codex as, well, what it currently is is just the beginning of a
12:29 software engineering teammate. And so when we use a
12:32 big word like teammate, some of the things we're imagining are that it's not
12:36 only able to write code, but it actually participates early on in
12:40 the ideation and planning phases of writing software, and then further
12:43 downstream in terms of validation, deploying, and maintaining code. To
12:48 make that a little more fun, one thing I like to imagine is: if you
12:51 think of what Codex is today, it's a bit like this really smart intern that
12:55 refuses to read Slack and doesn't check Datadog or Sentry
12:59 unless you ask it to. And so no matter how smart it is, how much
13:02 are you going to trust it to write code without you also working with it, right?
13:05 So that's how people use it mostly today: they pair with it.
13:08 >> But we want to get to the point where, you know, it can work just like a
13:12 new intern that you hire: you don't only ask them to write code, but you ask them
13:15 to participate across the cycle. And so you know that like even if they don't
13:17 get something right the first try, they're eventually going to be able to
13:20 iterate their way there. >> I thought the point
13:23 about not reading Slack and Datadog was that it's just not distracted. It's just
13:26 constantly focused and always in flow. But I get what you're saying there,
13:30 which is it doesn't have all the context on everything that's going on.
13:33 >> And like that's not only true when it's performing a task, but again if you
13:36 think of like the best human teammates, like you don't tell them what to do,
13:39 >> right? Like maybe when you first hire them, you have like a couple meetings
13:42 and you kind of learn, okay, these
13:45 prompts work for this teammate, these prompts don't, right? This is how to
13:48 communicate with this person. Then eventually you give them some starter
13:50 tasks. You delegate a few tasks. But then eventually you just say like, "Hey,
13:53 great. Okay, you're working with this set of people in this area of the
13:57 codebase. You know, feel free to work with other people in other parts of the
14:00 codebase too even." And yeah, you tell me what you think makes sense to be
14:03 done, right? And so, you know, we think of this as proactivity, and one
14:06 of our major goals with Codex is to get to proactivity.
14:12 I think this is critically important to achieve the mission of
14:15 OpenAI, which is to deliver the benefits of AI to all humanity. You know, I like
14:19 to joke today, and it's a half joke, that AI products are actually
14:23 really hard to use, because you have to be very thoughtful about when it
14:29 could help you. And if you're not prompting a model to help you, it's
14:33 probably not helping you at that time. And if you think of how many times like
14:36 the average user is prompting AI today, it's probably like tens of times. But if
14:40 you think of how many times people could actually get benefit from a really
14:44 intelligent entity, it's thousands of times per day. And so a large
14:48 part of our goal with Codex is to figure out what is the shape of an
14:52 actual teammate agent that is sort of helpful by default. >> When people think
14:57 about Cursor and even Claude Code, it's like an IDE that helps you code and
15:01 kind of autocompletes code and maybe does some agentic work. What I'm hearing
15:05 here is the vision is different, which is it's a teammate. It's like a remote
15:09 teammate, building code for you, that you talk to and ask to do things, and it
15:14 also does IDE autocomplete and things like that. Is that kind of a
15:17 differentiator in the way you think about Codex? >> It's basically this idea
15:22 that if you're a developer and you're trying to get
15:25 something done, we want you to just feel like you have superpowers and you're
15:29 able to move much much faster. But we don't think that in order for you to
15:33 reap those benefits, you need to be sitting there constantly thinking about
15:37 like how can I invoke AI at this point to do this thing. We want you to be able
15:40 to sort of like plug it in to the way that you work and have it just start to
15:43 Codex’s explosive growth
15:43 do stuff without you having to think about it. >> Okay. I have a lot of questions along
15:46 those lines, but uh just how's it going? Is there any stats, any numbers you can
15:49 share about how Codex is doing? >> Yeah, Codex has been growing
15:53 absolutely explosively since the launch of GPT-5 back in August. Um,
15:57 there's definitely some interesting product insights to talk about as
16:00 to how we unlocked that growth, if you're interested. But yeah, the
16:03 last stat we shared there was that we were well over 10x since
16:08 August. In fact, it's been 20x since then. Um, also the Codex models
16:12 are serving many many trillions of tokens a week now and it's basically
16:17 like our most served coding model. Um, one of the really cool things that we've
16:20 seen is that the way that we decided to set up the Codex team was to build a
16:25 you know really tightly integrated product and research team that are
16:28 iterating on the model and the harness together. And it turns out that lets you
16:32 just do a lot more and try many more experiments as to how these things will
16:36 work together. And so we were just training these models for use in our
16:40 first party harness that we were very opinionated about. And then what we've
16:44 started to see more recently actually is that other major sort of API coding
16:48 customers are now starting to adopt these models as well. And so we've
16:51 reached a point where actually the Codex model is the most served coding
16:55 model in the API as well. >> You hinted at this: what unlocked
17:00 this growth? I am extremely interested in hearing that. It felt like before,
17:04 I don't know, maybe this was before you joined the team, it just felt like Claude
17:07 Code was killing it. Everyone was sitting on top of Claude Code. It was by
17:11 far the best way to code. And then all of a sudden, Codex comes around. I
17:16 remember Karpathy tweeted that he just has never seen a model like this.
17:20 I think the tweet was: the gnarliest bugs that he runs into, that he just
17:23 spends hours trying to figure out, nothing else has solved. He gives it to
17:27 Codex, lets it run for an hour, and it solves it. What did you guys do? >> We
17:32 have this strong mission here at OpenAI to, you know, basically build
17:38 AGI. And so we think a lot about how we can shape the product so
17:43 that it can scale, right? You know, earlier I was mentioning, hey, if you're
17:45 an engineer, you should be getting help from AI thousands of times per
17:50 day, right? And so we thought a lot about the primitives for that when we launched
17:54 our first version of Codex, which was Codex cloud. And that was basically a
17:58 product that had its own computer, lives in the cloud, you could delegate to it,
18:02 and the coolest part about that was you could run many,
18:05 many tasks in parallel. But some of the challenges that we saw
18:11 are that it's a little bit harder to set that up, both in terms of
18:14 environment configuration, like giving the model the tools it needs to validate
18:18 changes, and in terms of learning how to prompt in that way. And my analogy for
18:22 this goes back to the teammate analogy. It's like if you hired a
18:26 teammate but you're never allowed to get on a call with them, and you can only go
18:30 back and forth, you know, asynchronously over time. That works for some
18:33 teammates, and eventually that's actually how you want to spend most of your time.
18:36 So that's still the future, but it's hard to initially adopt.
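The cloud delegation model described above, where each task gets its own isolated computer and many tasks run at once, can be sketched roughly like this. The function names and the stubbed agent call are illustrative, not OpenAI's actual API:

```python
import asyncio

async def run_cloud_task(task: str) -> str:
    # Stand-in for delegating one task to a cloud agent with its own
    # sandboxed computer; a real harness would clone the repo, run the
    # agent for minutes, and return a proposed diff.
    await asyncio.sleep(0)
    return f"proposed diff for: {task}"

async def delegate_all(tasks: list[str]) -> list[str]:
    # The appeal of the cloud product: tasks are independent, so they
    # run concurrently instead of one after another.
    return list(await asyncio.gather(*(run_cloud_task(t) for t in tasks)))

results = asyncio.run(delegate_all(["fix flaky test", "bump deps"]))
print(results)
```

The hard part he describes, environment configuration, is everything hidden inside `run_cloud_task`: giving each isolated machine the dependencies, credentials, and test tooling the model needs to validate its own changes.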
18:40 So we still have that vision: that's what we're trying to get to, a
18:43 teammate that you delegate to and that is then proactive, and we're seeing that
18:48 growing. But the key unlock is actually, first you need to land with users in a
18:51 way that's much more intuitive and trivial to get value from. So the
18:56 way that most people, the vast majority of users, discover Codex
19:00 today is either they download an IDE extension or they run it in their CLI,
19:05 and the agent works there with you on your computer interactively. And it
19:09 works within a sandbox, which is actually a really cool piece of tech to
19:13 help that be safe and secure, but it has access to all those dependencies. So if
19:17 the agent needs to do something, like it needs to run a command, it can do so
19:20 within the sandbox. We don't have to set up any environment, and if it's a command
19:23 that doesn't work in the sandbox, it can just ask you. And so you can get into
19:27 this really strong feedback loop using the model. And then over time,
19:31 our team's job is to help turn that feedback loop into you, sort of as a
19:35 byproduct of using the product, configuring it so that you can then be
19:39 delegating to it down the line. And again, this analogy, I keep coming back to it, but
19:43 if you hire a teammate and you ask them to do work, but you just give
19:46 them a fresh computer from the store, it's going to be hard for them to
19:49 do their job, right? But as you work with them side by side, you could be
19:52 like, "Oh, you don't have a password for this service we use. Here's the password
19:56 for this service. You know, yeah, don't worry. Feel free to run this command."
19:59 Then it's much easier for them to go off and do work for hours
20:03 without you. >> So, what I'm hearing is the initial version of Codex was almost too
20:06 far in the future. It's like a remote, in-the-cloud agent that's coding for you
20:11 asynchronously. And what you did is, okay, let's actually come back a little
20:15 bit. Let's integrate into the way engineers already work, in IDEs and
20:20 locally, and help them kind of on-ramp to this new world. >> Totally. And this
20:26 was quite interesting, because we dogfood product a ton at OpenAI. So, you
20:30 know, dogfood as in we use our own product. And so Codex has been
20:34 accelerating OpenAI over the course of the entire year, and the cloud product
20:38 was a massive accelerant to the company as well. It just turns out that this
20:44 is one of those places where the signal we got from dogfooding is a little bit
20:47 different from the signal you get from the general market, because at
20:50 OpenAI, you know, we train reasoning models all day, and so we're very used to
20:54 this kind of prompting thing: think up front, run things
20:59 massively in parallel, and, you know, it would take some time and then come back
21:03 to it later asynchronously. And so now when we build, we still get a
21:06 ton of signal from dogfooding internally, but we're also
21:11 very cognizant of the different ways that different audiences use the
21:14 product. >> That's really funny. It's like live in the future, but maybe not too far
21:17 in the future. And I could see how everyone at OpenAI is living very far in
21:21 the future, and sometimes that won't work for everyone.
21:25 >> Yeah. What about just, like, intelligence, training data? I don't
21:28 know. Is there something else that helped Codex accelerate its ability to
21:32 actually code? Is it better, cleaner data? Is it more just models
21:36 advancing? Is there anything else that really helped accelerate? >> Yeah. So
21:41 there's a few components here. I guess, you know, you were mentioning
21:44 models, and the models have improved a ton. In fact, just last Wednesday we
21:50 shipped GPT-5.1-Codex-Max, a very, you know, accurately named model. That
21:56 is awesome both because, for any given task that you
22:01 were using GPT-5.1-Codex for, it's roughly 30% faster at
22:06 accomplishing that task, but also it unlocks a ton of intelligence. So if you
22:10 use it at our higher reasoning levels, it's just even smarter. Um, and that
22:13 tweet you were mentioning that Karpathy made,
22:16 about, hey, give it your gnarliest bugs: you know, obviously there's a
22:20 ton going on in the market right now, but Codex Max is definitely
22:24 carrying that mantle of, you know, tackling the hardest bugs. Um, so that
22:28 is super cool. But I will say, how we're
22:32 thinking about this is evolving a little bit, from being like, yeah, we're just
22:35 going to think about the model, let's just train the best model, to
22:38 really thinking about what is an agent actually overall, right? And, you
22:43 know, I'm not going to try to define agent exactly, but at least the stack
22:46 that we think of it as having is: you have this model, a really smart
22:51 reasoning model that knows how to do a specific kind of task really well. We
22:53 can talk about how we make that possible. But then actually we need to
22:59 serve that model through an API into a harness. And both of those things also
23:03 have a really big role here. So for instance, one of the things that
23:07 we're really proud of is you can have GPT-5.1-Codex-Max work for really long
23:11 periods of time. That's not normal, but you can set it up to do that, or it
23:15 might happen. And now routinely we'll hear about people saying, yeah, it
23:18 ran overnight, or it ran for 24 hours. And so for a model to work
23:22 continuously for that amount of time, it's going to exceed its context window,
23:25 and so we have a solution for that, which we call compaction. But compaction is
23:30 actually a feature that uses all three layers of that stack. So you need
23:36 to have a model that has a concept of compaction and knows, okay, as I
23:39 start to approach this context window, I might be asked to prepare to be run
23:43 in a new context window. And then at the API layer, you need an API that
23:47 understands this concept and has an endpoint that you can hit to do this
23:50 change. And at the harness layer, you need a harness that can prepare the
23:53 payload for this to be done. And so shipping this compaction feature that
23:56 made this behavior possible for anyone using Codex has
23:59 actually meant working across all three things. And I think that's
24:03 increasingly going to be true. Another maybe underappreciated version of
24:08 this is, if you think about all the different coding products out there,
24:10 they all have very different tool harnesses, with very different
24:14 opinions on how the model should work. And so if you want to train a model to
24:17 be good at all the different ways it could work, like maybe
24:20 you have a strong opinion that it should work using semantic search, right? Maybe
24:24 you have a strong opinion that it should call bespoke tools, or maybe you
24:27 have, in our case, a strong opinion that it should just use the shell and
24:32 work in the terminal. You know, you can move much faster if
24:34 you're just optimizing for one of those worlds, right? And so the way that we
24:38 built Codex is that it just uses the shell. But in order to make that
24:43 safer and more secure, we have a sandbox that the model is used to operating in.
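The shell-plus-sandbox design he describes can be illustrated with a toy policy: approved commands run unattended, anything else is bounced back to the user for approval. This is only a sketch of the escalation behavior; a real sandbox (macOS Seatbelt, Linux seccomp/Landlock) also confines filesystem and network access, and the names here are illustrative, not Codex's actual implementation:

```python
import subprocess

# Commands the agent may run unattended in this toy policy; everything
# else requires explicit user approval, mirroring the "it can just ask
# you" behavior described above.
APPROVED = {"ls", "cat", "grep", "echo"}

def run_in_sandbox(argv: list[str]) -> str:
    if not argv or argv[0] not in APPROVED:
        # The real product pauses the agent and asks the user; here we
        # just signal that escalation is needed.
        return f"needs-approval: {' '.join(argv)}"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout

print(run_in_sandbox(["git", "push"]))    # escalated, never executed
print(run_in_sandbox(["echo", "hello"]))  # permitted, runs for real
```

The design choice is that the model only ever speaks one tool language, the shell, and safety lives in the policy layer around it rather than in a bespoke tool set.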
24:46 So I think one of the biggest accelerants, to go all the way back to
24:49 your original question, is just that we're building all three
24:52 things in parallel and kind of tuning each one, and, you know,
24:56 constantly experimenting with how those things work, with a tightly
24:59 integrated product and research team. >> How do you think you win in this space?
25:04 Do you think it'll always be this kind of race, with other
25:08 models constantly leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there a path to just "we win"? >> Again, it comes back to this idea of
25:19 building a teammate. Not just a teammate that participates
25:24 in team planning and prioritization. Not just a teammate that really
25:27 tests its code and helps you maintain and deploy. But even a teammate that,
25:31 if you think of an engineering teammate, can also
25:34 schedule a calendar invite, right? Or move standup, or whatever. And so in
25:42 my mind, if we just imagine that every day or every week some crazy new
25:46 capability is going to be deployed by a research lab, it's just impossible
25:50 for us as humans to keep up and use all this technology. And
25:54 so I think we need to get to this world where you just have an AI
25:59 teammate, or super assistant, that you just talk to, and it knows how to be
26:04 helpful on its own, right? So you don't have to be
26:07 reading the latest tips for how to use it. You've just plugged it in,
26:11 and it provides help. And that's the shape of what I think we're
26:14 building. And I think that will be a very sticky, winning product if we
26:18 can do so. So, the shape I have in my head at least... well, maybe
26:23 a fun topic is: is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for. In the same way that if I'm on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's kind of the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are a functional expert in a specific
26:49 domain such as coding, there's a GUI that you can pull up to go really
26:54 deep and look at the code and work with the code. So I think what
26:59 we need to build at OpenAI is basically this idea that you have ChatGPT,
27:02 and that is a tool that's ubiquitously available to everyone.
27:06 You start using it even outside of work, right, to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you can just naturally say, yeah, I'm just going to ask it for this,
27:16 and I don't need to know about all the connectors or all the different
27:19 features. I'm just going to ask it for help, and it'll surface to me the
27:23 best way it can help at this point in time, and maybe even chime in when I
27:27 didn't ask it for help. So in my mind, if we can get to that, I think
27:30 that's how we really build the winning product.
27:34 This is so interesting, because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was "super
27:41 assistant," or something like that. >> Yeah. >> And it's interesting that there's
27:46 that approach to the super assistant, and then there's this Codex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is, okay, you start with coding and building, and then it's doing all
27:56 this other stuff for you: scheduling meetings, I don't know, probably posting
28:01 in Slack, shipping designs, I don't know. Is that the
28:04 idea there? This is like the business version of ChatGPT, in a sense.
28:08 Or is there something else there? >> Yeah. So, you know, we're getting to
28:12 the one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you a contention and a plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So we're going to
28:29 have a model, and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need a super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 You know, you could try to hack the OS and use accessibility APIs.
28:57 Maybe a bit easier: you could point and click. That's a little slow, you
29:02 know, and unpredictable sometimes. And another way... it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 nontechnical user, they won't even know they're using a coding agent. The same
29:19 way that no one thinks about whether they're using the internet or not; it's
29:22 more just, is Wi-Fi on? Right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're building an agent that can use a computer by writing
29:36 code. And so we're already seeing some pull for this. It's quite
29:39 early, but we're starting to see people who are using Codex for
29:43 coding-adjacent purposes. And so as that develops, I think we'll
29:47 just naturally see that, oh, it turns out we should just always
29:50 have the agent write code if there is a coding way to solve a problem. Even
29:53 if you're doing a financial analysis, right? Maybe
29:56 write some code for that. So basically, you know, you were asking, are
29:59 these the two ends of this product, of the super assistant,
30:03 of ChatGPT? In my mind, coding is a core competency of any agent,
30:06 including ChatGPT. And so what we think we're building is really
30:10 that competency. But here's the really cool thing about
30:13 agents writing code: you can import code, right? Code is
30:19 composable, interoperable, right? Because one very reductive
30:23 view we could have for an agent is that it's just going to be given a computer, and
30:26 it's just going to point and click and go around. But, you know, that
30:32 is the future, and how we get there is difficult to chart a path to,
30:36 because a lot of the questions around building agents aren't "can the
30:41 agent do it?" It's more about, well, how can we help the agent understand the
30:44 context it's working in? The team that's using it
30:47 probably has a way that they like to do things. They have guidelines. They
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do. Or they want to know that the agent understands some detail.
30:59 An example would be, you know, if we're looking at a crash reporting tool,
31:04 hitting a connector for it: every sub-team probably has a different meta
31:07 prompt for how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And the stuff that the agent does often, we
31:22 probably just want to build in as a competency that this agent
31:27 has. So I think we end up with this generalizable thing you were describing,
31:31 of an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think the really key part here is: can we make it so that
31:40 everything the agent has to do often, or that it does well, we can just
31:44 remember and store, so that the agent doesn't have to write a script for
31:47 that again? Right? Or maybe, if I just joined a team and you are already
31:51 on the same team as me, I can just use all those scripts that the agents
31:53 had written already. >> Yeah. It's like, if this is our teammate,
31:57 it can share things that it's learned from working with other
32:00 people at the company. Just makes sense as a metaphor. >> Yeah. It feels like you're in the
32:05 Karpathy camp of: agents today are not that great, mostly slop, and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great. >> A ton of value. >> Right? Yep.
32:19 >> And then agents outside of coding, it's still very early, and,
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use coding too, in a composable way.
32:29 This is kind of the fun part of building for software
32:33 engineers. At my startup we were building for software engineers too, for
32:36 a lot of that journey, and they're just such a fun audience to build for, because
32:41 they also like building for themselves, and are often even more
32:45 creative than we are in thinking about how to use the technology. And so
32:48 by building for software engineers, you get to observe a ton of
32:52 emergent behaviors and things that you should build into the
32:55 product. I love that you say that, because a lot of people building for
32:57 engineers get really annoyed, because the engineers are just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 problems and just, you know, code for them. Kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Clearly the way you're
33:21 describing it is: it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you. What's the way you think about
33:28 the impact on the field of engineering of having this super intelligent
33:33 engineering teammate? I think there are two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just a small part
33:46 of this broader idea that, hey, as we make code even more
33:48 ubiquitous (I mean, you could probably claim it's ubiquitous today, even pre-
33:51 AI, right?), it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for humans with this competency. So that's
34:05 my view. I think this is quite a complex topic, so, you know, it's
34:08 something we talk about a lot, and we have to see how it pans out. But
34:12 I think what we can do, basically, as a product team building in
34:15 the space, is just try to always think about how we're building a tool so that
34:18 it feels like we're maximally accelerating people, rather
34:24 than building a tool that makes it more unclear what you should do as the
34:29 human. Right? To give an example:
34:33 nowadays, when you work with a coding agent, it writes a ton of code. But it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. So then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And I actually think we see this play
34:53 out all the time in a ton of micro decisions. And so we as a product team
34:55 are always thinking about, okay, how do we make this more fun? How do we make
34:58 you feel more empowered where it's not working? And I would argue that
35:01 reviewing agent-written code is a place that today is less fun. And
35:06 so, you know, then I think, okay, what can we do about that? Well, we can ship a
35:09 code review feature that helps you build confidence in the AI-written
35:12 code. Okay, cool. Another thing we can do is make it so
35:14 that the agent's better able to validate its work. And, you know, it gets
35:18 all the way down into micro decisions. If you're going to give
35:23 an agent the capability to validate work, let's say (I'm thinking
35:27 of Codex web right now) you have a pane that reflects the work the
35:30 agent did. What do you see first? Do you see the diff, or do you see the image
35:34 preview of what the code renders? Right? And, you know, if you're thinking
35:36 about this from the perspective of, how do I empower the human? How do I make them
35:40 feel as accelerated as possible? You obviously show the image first,
35:43 right? You shouldn't be reviewing the code unless first you've seen
35:46 the image. Unless maybe it's being reviewed by an AI, and now it's time for
35:49 you to take a look. When I had Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen this rise of something called spec-driven development, where
36:02 you just write the spec, and then the AI writes the code
36:05 for you. And so you start working at this higher abstraction
36:09 level. Is that something you see, where we're going? Like, engineers not
36:12 having to actually write code or look at code, and there's going to be this higher
36:16 level of abstraction that we focus on? Yeah, I mean, I think there are
36:19 constantly these levels of abstraction, and they're actually already
36:23 playing out today, right? Today, coding agents are mostly
36:29 prompt-to-patch. We're starting to see people doing spec-driven development, or
36:32 plan-driven development. That's actually one of the answers when people ask,
36:35 hey, how do you run Codex on a really long task? Well, often you
36:38 collaborate with it first to write a plan.md, a markdown file that's
36:42 your plan, and once you're happy with that, then you ask it to go off and do the
36:46 work. And if that plan has verifiable steps, it'll work for much longer.
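As a concrete, entirely hypothetical sketch of what "a plan with verifiable steps" could look like: suppose each step in the plan.md carries a shell command that checks it. The `| check:` convention here is invented for illustration, not a Codex feature; the point is that a harness can run the check after each attempted step, giving the agent an objective signal that the step landed before it moves on.

```python
import re

# A toy plan where every step names a command that verifies it.
# File names and commands below are made up for the example.
PLAN_MD = """\
# Plan: add retry logic to the fetch client
- [ ] Add exponential backoff to fetch() | check: pytest tests/test_fetch.py
- [ ] Document the new retry behavior   | check: grep -q backoff README.md
"""

def parse_plan(plan: str) -> list[dict]:
    """Extract '- [ ] <step> | check: <cmd>' lines so a harness could run
    each check command after the agent attempts the corresponding step."""
    steps = []
    for line in plan.splitlines():
        m = re.match(r"- \[ \] (.+?)\s*\|\s*check:\s*(.+)", line.strip())
        if m:
            steps.append({"step": m.group(1), "check": m.group(2).strip()})
    return steps
```

The loop this enables is: attempt a step, run its check, and only advance on success, which is exactly why verifiable steps let the agent keep working longer unattended.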
36:51 So we're totally seeing that. I think spec-driven development is an
36:55 interesting idea. It's not clear to me that it'll work out that way, because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some people will work that way. You know, a bit of
37:06 a joke idea, though: if you think of the way that many teams work
37:11 today, they often don't necessarily have specs, but the team is
37:14 just really self-driven, and so stuff just gets done. And so almost... I'm
37:17 coming up with this on the spot, so it's, you know, not a good name, but
37:21 call it chatter-driven development, where stuff is happening, you
37:24 know, on social media and in your team communication tools, and then as a
37:28 result, code gets written and deployed, right? So, yeah, I'm a
37:33 little bit more oriented that way. You know, I don't necessarily want
37:37 to have to write a spec. Sometimes I will, if I like writing specs,
37:42 right? Other times I might just want to say, hey, here's the
37:45 customer service channel, tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of hypothetical future that I like to
37:58 share sometimes with people as a provocation, which is: in a world
38:01 where we have truly amazing agents, what does it look like to be a
38:04 solopreneur? And, you know, one terrible idea for how it could look is that
38:12 there's a mobile app, and every idea the agent has is
38:17 just vertical video on your phone, and then you can swipe left if you
38:21 think it's a bad idea, and swipe right if it's a good idea, and
38:24 you can press and hold and speak to your phone if you want to give feedback
38:28 on the idea before you swipe, you know. So in this world, basically,
38:31 your job is just to plug this app into every single signal
38:36 system, you know, system of record, and then you just sort of sit back and
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets TikTok meets Codex. >> It's pretty terrible. >> No, this is great. So the idea here is,
38:47 this agent is watching and listening to you, paying attention
38:51 to the market, your users, and it's like, cool, here's something I should do. It's
38:54 like a proactive engineer: here, we should build this feature, fix this
38:56 thing. >> Exactly. And I think it's communicating with you in, like, the
39:05 modern way to communicate. >> Yeah. >> Swipe left or right, in a vertical feed,
39:10 and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that, but, you know, it's a fun idea.
39:17 I mean, you see in this example, though, that one of the things
39:19 it's doing is consuming external signals, right? I think the
39:23 other really interesting thing is, if we think about what is the most
39:28 successful AI product to date... I would argue... it's funny, actually,
39:34 not to confuse things at all, but the first time we used the brand
39:38 Codex at OpenAI was actually the model powering GitHub Copilot. This is
39:42 way back in the day, years ago. And we decided to reuse that brand
39:45 recently because it's just so good, you know: Codex, code execution. But I
39:50 think autocompletion in IDEs is actually one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this mixed-initiative system that's contextually responding to
40:17 what you're attempting to do. And so in my mind, this is a really
40:21 interesting thing for us at OpenAI as we're building. So for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 In my mind, one of the really interesting things we can then do is
40:33 contextually surface ways we can help you as you're
40:37 going about your day, right? And so we break out of this, you know, "we're
40:41 just looking at code" or "we're just in your terminal" into this idea that,
40:44 hey, a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So,
40:51 you know, how can we help you with that? >> Man, there's so much there, and I love
40:55 this. Okay, so autocomplete on the web with the browser. That's so interesting. Just,
40:58 here are all the things we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas; I'll come back to that.
41:05 Codex, code execution: did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter... what was it, chatter-driven development? I
41:14 had a... No, this is a really good idea, but it reminds me: I had Dhanji Prasanna on the
41:19 podcast, CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who just
41:30 has Goose watch his screen and listen to every meeting, and
41:36 proactively do work that he will probably want done. So it ships a PR,
41:41 sends an email, drafts a Slack message. So he's doing exactly what you're
41:44 describing, in kind of a very early way. >> Yeah, that's super interesting. And you
41:49 know, I bet... so if we went and asked them what the bottleneck
41:52 to that productivity is, did they share what it is? >> Uh, probably looking at it, just making
41:57 sure this is the right thing to do. Yeah. >> Yeah. So, we see this now: we
42:01 have a Slack integration for Codex. People love it, you know, if there's
42:04 some thing that you need done quickly. People just at-mention Codex:
42:07 why do you think this bug is happening? Right? Doesn't have to be an engineer.
42:10 Even, you know, data scientists here are using Codex a ton to just
42:14 answer questions like, why do you think this metric moved? What happened?
42:18 Questions where you get the answer right back in Slack. It's amazing, super
42:22 useful. But when it's writing code, then you have to go back
42:27 and look at the code, right? And so the real bottleneck right now, I think,
42:30 is validating that the code worked, and code review.
42:34 So in my mind, if we wanted to get to something like the world your
42:38 friend was describing, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. Makes sense. Like you said, writing code:
42:49 I was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code, have to go through it, and be on the hook if
43:00 it does something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is: the bottleneck is now figuring out
43:09 what to build, and then, at the end, okay, we have 100
43:13 hours of work to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira Product Discovery.
44:06 What has the impact of Codex been on the
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted: code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at OpenAI? Yeah, I mean, I think mostly I just feel
44:28 much more empowered. I've always been a more technical-leaning PM, and especially when
44:34 I'm working on products for engineers, I feel like it's necessary to
44:37 dogfood the product. But even beyond that, I just feel like I can
44:42 do much, much more as a PM. And, you know, Scott Belsky talks about this idea
44:45 of compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that maybe the boundaries between these roles
44:52 are a little bit less needed than before, because people can just do much
44:57 more, and every time someone can do more, you can skip one communication
45:00 boundary and make the team that much more efficient, right? So I
45:07 think we see it in a bunch of functions now, but since you
45:11 asked about product specifically: you know, answering questions is now
45:15 much, much easier. You can just ask Codex for thoughts. A lot of
45:20 PM-type work, understanding what's changing: again, just ask Codex for help
45:25 with that. Prototyping is often faster than writing specs; this is something
45:29 a lot of people have talked about. Something that's not
45:33 super surprising, but is slightly surprising:
45:36 we're mostly building Codex to write code that's going to be deployed
45:40 to production, but we actually see a lot of throwaway code written with Codex
45:43 now. It goes back to this idea of, you know, ubiquitous code.
45:48 So you'll see, you know, someone wants to do an analysis. If I want to
45:51 understand something, okay: just give Codex a bunch of data, but then ask
45:54 it to build an interactive data viewer for that data. That
45:56 was just too annoying to do in the past, but now it's
46:00 totally worth the time of just getting an agent to go do it.
46:04 Similarly, I've seen some pretty cool prototypes on our design team.
46:09 A designer basically wanted to build an animation,
46:13 the coin animation in Codex, and normally it'd be too
46:17 annoying to program that animation. So they just vibe-coded an animation editor,
46:21 and then they used the animation editor to build the animation, which they then
46:25 checked into the repo. Actually, for our designers, there's a ton of
46:28 acceleration there. And speaking of compressing the talent stack, I think our
46:31 designers are very PM: they do a ton of product work. And they actually
46:38 have an entire vibe-coded side prototype of the Codex app. And
46:41 so a lot of how we talk about things is: we'll have a really quick jam,
46:44 because there are, like, 10,000 things going on. And then a designer will go think
46:48 about how this should work, but instead of talking about it again, they'll
46:50 just vibe-code a prototype of that in their standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe-code that prototype, or
46:59 vibe-engineer it, into an actual PR to land. And then, depending on
47:02 their comfort with the codebase (Codex CLI in Rust is a little harder),
47:06 maybe they'll land it themselves, or they'll get close and then an
47:09 engineer can help them land the PR. You know, we recently shipped the
47:15 Sora Android app. And that was one of the most mind-blowing
47:19 examples of acceleration, actually, because usage of Codex internally at
47:24 OpenAI is obviously really, really high, but it's been growing over the course of
47:28 the year, both in the sense that now basically all technical staff use
47:32 it, and in that the intensity and know-how of how to make the most of
47:35 coding agents has gone up by a ton. And so the Sora Android app, a
47:42 fully new app: we built it in 18 days. It went from zero to launch to
47:46 employees, and then 10 days later, so 28 days total, we went GA to
47:51 the public. And that was done with the help of Codex.
47:56 So, pretty insane velocity, I would say. It was a little bit... I don't want to
48:01 say easy mode, but there is one thing Codex is really good at: if you're a
48:04 company building software on multiple platforms, and you've already
48:07 figured out some of the underlying APIs or systems, asking Codex
48:13 to port things over is really effective, because it has something
48:15 it can go look at. And so the engineers on that team were basically having
48:20 Codex go look at the iOS app, produce plans of the work that needed to be done, and
48:23 then go implement those. And it was kind of looking at iOS and Android at the
48:27 same time. So, you know, basically it was two weeks to launch to
48:30 employees, four weeks total. Insanely fast. >> What makes that even more insane is
48:35 that it became the number one app in the App Store. >> This just boggles the mind.
48:39 Okay. So >> Yeah. So imagine releasing the number one app in the App Store with a handful
48:45 of engineers >> I think it was >> two or three, possibly,
48:53 >> in a handful of weeks. Yeah, this is absurd. So >> yeah, so that's a really fun example
49:01 of acceleration. And then Atlas was the other one. I think Ben,
49:06 an engineer on Atlas, did a podcast sharing a little bit of how we
49:12 built there. You know, Atlas is actually... I mean, it's a browser,
49:15 right, and building a browser is really hard. And so we had to build a lot
49:23 of difficult systems in order to do that, and basically we got to the point where
49:27 that team has a ton of power users of Codex right now. And, you know, it got
49:32 to the point where we were talking to them
49:34 about it, because a lot of those engineers are people I used to work with
49:38 before at my startup, and they'd say, you know, before this would have taken
49:42 us two to three weeks for two to three engineers, and now it's one engineer,
49:48 one week. So massive acceleration there as well. And what's quite cool is
49:52 that, you know, we shipped Atlas on Mac first, but now we're working on
49:56 the Windows version. So the team is now ramping up on
49:58 Windows, and they're helping us make Codex better on Windows too, which is
50:02 admittedly earlier: the model we shipped last week is the first
50:06 model that natively understands PowerShell, PowerShell being
50:11 the native shell language on Windows. So, yeah, it's been
50:16 really awesome to see the whole company getting accelerated by Codex:
50:21 most obviously research, improving how
50:24 quickly we train models and how well we do it, but also design, as we
50:28 talked about, and even marketing. Actually, we're at the point now where
50:32 my product marketer is often also making string changes directly from
50:36 Slack, or updating docs directly from Slack. >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible, and this is how other companies
50:46 are going to work. Just shipping, again, what became the number one app in
50:49 the App Store, and just beloved. It took over the, I don't
50:54 know, the world for at least a week. Built, you said, in 28 days, with
50:58 18 days just to get the core of it working.
51:02 >> Yeah. So in 18 days we had a thing that employees were playing with, and
51:05 then 10 days later we were out. >> And you said just a couple engineers.
51:07 >> Yeah. >> Two or three. Okay. And then Atlas you
51:11 said took a week to build. >> No, no, no. Not the whole thing in a
51:16 week. Atlas was like a really meaty project. >> Yeah.
51:18 >> And so I was talking to one of the engineers on Atlas about
51:23 what they use Codex for, and it's basically like, we use Codex for
51:25 absolutely everything. I was like, okay, well, how
51:29 would you measure the acceleration? And basically the answer I got back
51:31 was >> previously it would have taken two to three weeks for two to three engineers,
51:36 and now it's like one engineer, one week. Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Like, does it have to be an engineer
51:42 building this thing? Could it have been built by, I don't know, a PM or
51:46 designer? I think we will very much get to the point where, well, basically where
51:50 the boundaries are a little bit blurred, right? Like I think you're going to want
51:54 someone who understands the details of what they're building, but
51:58 what details those are will evolve. Kind of like how now, if you're writing
52:02 Swift, you don't have to speak assembly. There's a handful of people in
52:05 the world who speak assembly, and it's really important that they exist.
52:09 Maybe more than a handful, right? But that's a specialized function that
52:14 most companies don't need to have. So I think we're just going to naturally
52:17 see like an increase in layers of abstraction. And then the cool thing is
52:21 now we're entering like the language layer of abstraction like natural
52:25 language. And then natural language itself is really flexible, right? Like
52:29 you could have engineers talking about like a plan and then you could have
52:32 engineers talking about a spec and then you could have engineers talking about
52:35 just, you know, a product or an idea. So I think we can also like start moving up
52:39 those layers of abstraction as well. But I do think this is going
52:43 to be gradual. I don't think all of a sudden
52:46 nobody ever writes any code and it's just specs. I
52:49 think it's going to be much more like okay we've set up our coding agent to be
52:53 really good at like previewing the build or like at running tests. Maybe that's
52:56 the first part right that most people have set up. And it's like okay now
52:59 we've set it up so that it can like execute the build and it can like see
53:03 the results of its own changes, but we haven't yet built a good
53:06 integration harness. In the case of Atlas, by the way, I
53:08 don't know if they've done any of this or not, I think they've done a lot of
53:11 this, but maybe the next stage is to enable it to load a few
53:16 sample pages to see how well those work, right? So then, okay, now we're going to
53:19 set that up and do that. And I think for some time at least we're going to
53:22 have humans kind of curating which of these connectors or systems or
53:26 components the agent needs to be good at talking to, and then in
53:30 the future there will be an even greater unlock where Codex tells you how to set
53:34 it up or maybe sets itself up in a repo. What a wild time to be alive. Wow. I'm
53:38 curious about the second order effects of this sort of thing. Just how quick it
53:42 is to build stuff. What does that do? Does that mean distribution becomes much,
53:46 much more important? Does it mean ideas are just worth a lot more? It's
53:50 interesting to think about how that changes things. >> I'm curious what you think. I still
53:56 don't think ideas are worth as much as a lot of people think. I
53:59 still think execution is really hard, right? Like you can build
54:01 something fast, but you still need to execute well on it. It still needs to make
54:06 sense and be a coherent thing overall. Yeah. And distribution is massive.
54:10 >> Yeah. Just feels like everything else is now more important. Everything that
54:13 isn't the building piece, which is >> coming up with an idea, getting to
54:17 market, profit, >> all that kind of stuff. I I think we
54:21 might have been in this weird temporary phase where you know for a while like
54:26 you could you could just it was so hard to build product that you mostly just
54:31 had to be really good at building product and it maybe didn't matter if
54:34 you like had an intimate understanding of a specific customer.
54:39 Um, but now I think we're getting to this point where actually like if I
54:42 could only choose like one thing to understand, it would be like really
54:46 meaningful understanding of like the problems that a certain customer has,
54:49 right? If I could only go in with one core competency. So
54:54 I think that that's that's ultimately still what's going to matter most,
54:57 right? Like if you're starting a new company today and you have like a really
55:02 good understanding and like network of customers that are currently underserved
55:05 by AI tools, I think you're set, right? Whereas if you're good
55:09 at building, you know, websites, but you don't have any
55:12 specific customer to build for, I think you're in for a much harder time.
55:17 Bullish on vertical AI startups is what I'm hearing. Yeah, I completely agree.
55:20 There's like, you know, there's like the general thing that can solve a lot of
55:23 problems and then there's like we're going to solve presentations incredibly
55:25 well and we're going to understand the presentation problem uh better than
55:30 anyone and we're going to uh plug into your workflows and all these other
55:33 things that matter for a very specific problem. Okay. Incredible. When you
55:39 think about progress on Codex, I imagine you have a bunch of evals and
55:42 there's all these public benchmarks. What's something you look at to tell
55:45 you, okay, we're making really good progress. I imagine it's not going to be
55:48 the one thing, but what do you focus on? What's like something you're trying to
55:51 push? What's like a KPI or two? One of the things that I'm constantly reminding
55:56 myself of is that a tool like Codex sort of naturally is a tool that you would,
56:00 you know, become a power user of, right? And so we can accidentally spend a lot
56:03 of our time thinking about features that are like very deep in the user adoption
56:08 journey, and so we can kind of end up oversolving for that. So I think
56:12 it's just critically important to go look at your D7
56:16 retention, right? Just go try the product. Like, sign up from scratch
56:19 again. I have a few too many ChatGPT Pro accounts that,
56:24 in order to maximally correctly dogfood, I've signed up for on my Gmail, and they
56:27 charge me like 200 bucks a month. I need to expense those. But you know,
56:33 like I think just like the feeling of being a user and the early retention
56:37 stats are still like super important for us because you know as much as this
56:41 category is is taking off I think we're still in the very early days of like
56:45 people using them. Another thing we do, and I
56:51 think we might be the most user-feedback-slash-social-media-pilled team out
56:56 there in this space, is a few of us are constantly on Reddit and
57:01 Twitter, and you know, there's praise up there and there's a
57:04 lot of complaints, but we take the complaints very seriously and look
57:08 at them. And I think that, again, because you can use a coding agent for so
57:12 many different things, it often is kind of broken in many sorts of ways
57:17 for specific behaviors. And so we actually monitor
57:20 what the vibes are on social media pretty often. Especially, I think, for
57:27 Twitter, X, it's a little bit more hypey, and then Reddit is a little more
57:34 negative but real, actually. So I've started increasingly paying attention to
57:37 like how people are talking about using Codex on Reddit. Actually,
57:41 >> This is important for people to know. Which subreddits do you check most?
57:44 Is there like an r/codex, or >> I mean the algorithm is pretty good at
57:48 surfacing stuff, but r/codex is there. >> Okay, I'll take that. Very interesting. And then
57:52 uh if people tag you on Twitter you still see that but maybe not as powerful
57:56 as seeing it on Reddit. >> Well, yeah, the thing
57:58 with Twitter is it's a little bit more one-to-one, even if it's in public,
58:01 whereas with Reddit there's really good upvoting mechanics, and
58:05 maybe most people are still not bots, unclear. So you get good
58:09 signal on what matters and what other people think. So uh interestingly uh
58:13 Atlas, I want to talk about that briefly. You guys launched Atlas. I tweeted
58:18 actually that I tried Atlas and I don't love the AI-only search
58:23 experience. I was just like, I just want Google sometimes, not just
58:26 waiting for AI to give me an answer, and there was no
58:29 way to switch. I just tweeted, hey, I'm switching back, it's not
58:32 great. And I feel like I made some PMs at OpenAI sad, and I saw someone tweet,
58:37 okay, we have this now, which I imagine was always part of the plan. It's
58:40 probably an example of, we've got to ship stuff, see how people use it,
58:43 and then we figure it out. So I guess one is, is there
58:46 anything there and two I'm just curious why are you guys building a web browser?
58:51 So I worked on Atlas for a bit. I don't work on it now. But
58:55 a bit of the narrative here, for me, just to tell my story a bit, was
58:58 I was working on this screen-sharing pair programming startup,
59:03 right, and then we joined OpenAI, and the idea was really to build a
59:07 contextual desktop assistant. And the reason I believe that's so important is
59:11 because I think it's really annoying to have to give all your
59:14 context to an assistant and then to figure out how it can help you, right? And
59:18 so if it could just understand what you are trying to do, then it could
59:23 maximally accelerate you. And so I still think of Codex
59:26 actually as a contextual assistant, from a little bit of a different
59:30 angle, starting with coding tasks. But some of the
59:36 thinking, at least for me personally, I can't speak for the whole project,
59:40 was that a lot of work is done in the web, and if we could build a browser, then
59:45 we could be contextual for you but in a much more first-class way. We weren't
59:48 hacking around other desktop software, which has very varied support
59:53 for what content it renders to the accessibility tree. We
59:56 wouldn't be relying on screenshots, which are a little bit slower and unreliable.
60:00 Instead, we could be in the rendering engine, right? And
60:03 extract whatever we needed to help you. And also I like to think of,
60:09 you know, video games. I don't know if you've played, say,
60:13 Halo, right? You walk up to an object, and this is true for many games,
60:16 you press, man, it's been a long time, this is embarrassing, you press X and it just
60:21 does the right thing, right? And I was one of those guys who always read the
60:23 instruction manual for every video game that I bought. And I remember the first
60:26 time I read about a contextual action, and I just thought it was this
60:31 really cool idea. And the thing about a contextual action is
60:34 we need to know what you are attempting to do. We need to have a little bit of
60:37 context, and then we can help. And I think this is critically
60:43 important because, you know, imagine this world that we reach, right, where
60:45 we have agents that are helping you thousands of times per day. Imagine
60:50 if the only way we could tell you that we helped you is if we could push
60:55 notify you. So, you get a thousand push notifications a day of an AI saying,
60:59 "Hey, I did this thing. Do you like it?" It'd be super annoying, right?
61:03 Whereas imagine, going back to software engineering, I was looking at a
61:07 dashboard and I noticed some key metric had gone down,
61:12 and at that point in time an AI could maybe go take a look and then
61:15 surface the fact that it has an opinion on why this metric went down, and maybe a
61:19 fix, right there, right when I'm looking at the dashboard. That would
61:22 much more keep me in flow and enable the agent to take action
61:27 on many more things. So in my mind, part of why I'm excited for us to
61:32 have a browser is that I think we have then like much more context around like
61:37 what we should help with. Users have much more control over what they want us
61:40 to look at. It's like, hey, if you want us to take action
61:43 on something, you can open it in your AI browser. If you don't, then you can open
61:46 it in your other browser, right? So really clear control and boundaries, and
61:51 then we have the ability to build UX that's like mixed initiative so that we
61:54 can surface contextual actions to you like at the times they're helpful as
1:01:58 Non-engineering use cases for Codex
61:58 opposed to just randomly notifying you. >> Hearing the vision for Codex being
62:01 the super assistant, it's not just there to code for you. It's trying to do a lot
62:05 for you as a teammate, as this kind of super teammate that makes you awesome at
62:10 work. So, I get this. Speaking of that, are there other non-engineering
62:15 common use cases for Codex? Just ways that non-engineers, we talked about,
62:18 you know, designers prototyping and building stuff. Are there any, I don't
62:22 know, fun or unexpected ways people are using Codex that aren't engineers? I
62:25 mean, there's a load of unexpected ways, but I think most of
62:31 where we're seeing real traction with people using things is still, for
62:35 now, very coding-adjacent or sort of tech-oriented
62:39 places where there's a mature ecosystem, or maybe you're
62:43 doing data analysis or something like that. I personally am expecting
62:47 that we're going to see a lot more of that over time. But for now
62:51 we're keeping the team very focused on just coding, because there's
1:02:53 Codex’s capabilities
62:54 so much more work to do. >> For people that are thinking about
62:58 trying out Codex, does it work for all kinds of code bases?
63:02 What code does it support? If you're, I don't know, SAP, can you
63:06 add Codex and start building things? What's the sweet spot, or
63:11 where does it start to not be amazing yet? I'm really glad you asked this
63:14 question, actually, because the best way to try Codex is to give it your hardest
63:19 tasks, which is a little different than some of the other coding agents. With
63:23 some tools you might think, okay, let me start easy, or
63:27 vibe code something random and decide if I like the tool, whereas
63:32 we're really building Codex to be the professional tool that you can give
63:36 your hardest problems to, and that writes high-quality code
63:40 in your enormous code base that is in fact not perfect right now. So yeah,
63:43 I think if you're going to try Codex, you want to try it on a real task
63:48 that you have, and not necessarily dumb that task down to something that's
63:53 trivial. A good one would be: you have a
63:55 hard bug and you don't know what's causing that bug, and you ask Codex to
63:59 help figure that out, or to implement the fix.
64:02 >> I love that answer. Just give it your hardest problem. I will say, if
64:05 you're like, hey, okay, well, the hardest problem I have is that I
64:08 need to build a new unicorn business, obviously
64:13 that's not going to work. Not yet. So I think it's: give it the hardest
64:18 problem, but something that is still one question, right, or one task,
64:23 to start, if you're testing, and then over time you can learn how to use
64:25 it for bigger things. >> Yeah. What languages does it
64:28 support? Basically, the way we've trained Codex, there's a distribution of
64:32 languages that we support, and it's fairly aligned with the frequency
64:36 of these languages in the world. So unless you're writing some very
64:39 esoteric language or some private language, it should do fine in your
64:42 language. If someone was just getting started, is there a tip you could share
64:46 to help them be successful? Like if you could just whisper a little tip into
1:04:49 Tips for getting started with Codex
64:49 someone just setting up Codex for the first time to help them have a really
64:53 good time, what's something you would whisper? >> I might say try a few things in
64:57 parallel, right? So you could try giving it a hard task. Maybe ask it
65:03 to understand the codebase. Formulate a plan with it around an idea that you
65:07 have, and kind of build your way up from there. And the meta idea
65:11 here is, again, that you're building trust with a new teammate,
65:15 right? And so you wouldn't go to a new teammate and just say, hey,
65:18 do this thing, here's zero context. You would start by first making sure
65:22 they understand the codebase, and then you would maybe align on an
65:24 approach, and then you would have them go off and do it bit by bit, right? And I think
65:28 if you use Codex in that way, you'll just sort of naturally start to
65:30 understand the different ways of prompting it. Because it is a super
65:35 powerful agent and model, but it is a little bit different to prompt
1:05:37 Skills to lean into in the AI age
65:38 Codex than other models. >> Just a couple more questions. One, we touched on this a
65:44 little bit: as AI does more and more coding, there's always this question of,
65:48 should I learn to code, why should I spend time doing this sort of thing. For
65:52 people that are trying to figure out what to do with their career, especially
65:55 if they're into software engineering, computer science, do you think there are
65:59 specific elements of computer science that are more and more important to
66:03 lean into, maybe things they don't need to worry about? What do you think
66:06 people should be leaning into skill-wise as this becomes more and more of a
66:11 thing in our workplace? I think there's like a couple angles you could go at
66:18 this from. Well, the easiest one to think of at
66:24 least is just: be a doer of things. With coding
66:28 agents getting better and better over time, what you can do as
66:33 even someone in college or a new grad is just so much more than what
66:37 it was before. And so I think you just want to be taking advantage of
66:40 that. Definitely when I'm looking at hiring folks who are
66:43 earlier in their career, something that I think about is how
66:47 productive they are using the latest tools, right? They should be super
66:51 productive. And if you think of it in that way, they actually have less
66:55 of a handicap than before versus a more senior person, because
66:59 the divide is actually getting smaller now that they've got these amazing coding
67:02 agents. So that's one thing. The
67:05 advice is: learn about whatever you want, but just make sure you spend
67:08 time doing things, not just fulfilling homework assignments. I guess
67:12 I think the other side of it though is that it's still deeply worth
67:17 understanding like what makes a good like overall software system. So I still
67:22 think that skills like really strong systems engineering, or
67:27 really effective communication and collaboration with
67:31 your team, skills like that are going to continue to
67:35 matter for quite some time. I don't think it's going to be all of
67:39 a sudden uh the AI coding agents are just able to build like perfect systems
67:43 without your help. I think it's going to look much more gradual where it's like
67:48 okay, we have these AI coding agents, they're able to validate their work, but humans are
67:52 still important. For example, I'm thinking of an engineer who was
67:55 working on Atlas, since we were talking about it. He set up Codex so it can
67:59 verify its own work, which is a little bit non-trivial because of the nature of
68:02 the Atlas project. The way he did that was he actually prompted Codex,
68:05 like, hey, why can't you verify your work? Fix it. And did that on a loop, right?
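As an editorial aside, the loop the engineer describes, repeatedly asking the agent why it can't verify its own work and to fix the blocker, can be sketched roughly like this. This is an illustrative sketch only, not the Atlas team's actual setup: `ask_agent` is a placeholder for however you invoke Codex, and the prompt wording is invented.

```python
# Sketch of the "fix your own verification" loop described above.
# ask_agent is a placeholder for whatever invokes the coding agent
# (e.g. shelling out to a CLI); it is injected here so the loop logic
# is self-contained and testable.

def verification_loop(ask_agent, can_verify, max_rounds=5):
    """Repeatedly ask the agent to remove whatever blocks self-verification.

    ask_agent(prompt) -> str : sends one prompt to the agent
    can_verify() -> bool     : True once the agent can check its own work
    """
    for round_num in range(max_rounds):
        if can_verify():
            return round_num  # number of fix rounds that were needed
        ask_agent(
            "Why can't you verify your own work in this repo? "
            "Identify the blocker (missing build step, test harness, "
            "sandbox permission, etc.) and fix it."
        )
    raise RuntimeError("agent still cannot self-verify after max_rounds")

# Tiny demonstration with a fake agent that needs two fix rounds:
state = {"blockers": 2}

def fake_agent(prompt):
    state["blockers"] -= 1  # pretend each round removes one blocker
    return "fixed one blocker"

rounds = verification_loop(fake_agent, lambda: state["blockers"] <= 0)
print(rounds)  # 2
```

In practice `ask_agent` might shell out to the Codex CLI; the point is the shape of the loop: have the agent remove its own verification blockers before handing it bigger tasks.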
68:11 So you still, at various phases, are going to want a human in the loop to
68:15 help configure the coding agent to be effective, and so I think you
68:19 still want to be able to reason about that. So maybe it's less important
68:23 that you can type really fast, or that you understand exactly how to write,
68:27 not that anyone writes, you know, a for-each loop or something, right,
68:31 or that you know how to implement a specific algorithm. But
68:33 I think you need to be able to reason about the different systems and
68:36 what makes a software engineering team effective. So I think
68:40 that's the other really important thing. And then like maybe the last angle that
68:44 you could take is I think if you're on the frontier of knowledge for a given
68:49 thing, I still think that's deeply interesting to go down. Partially because
68:54 that knowledge is still valuable, you know, agents aren't going to be as
68:58 good at that. But also partially because I think that by trying to advance
69:01 the frontier of a specific thing, you'll actually like end up like being forced
69:05 to take advantage of coding agents and like using them to accelerate your own
69:09 workflow as you go. >> What's an example, when you
69:12 talk about being at the frontier? >> So, Codex writes a lot of the code that
69:15 helps manage its training runs, the key infrastructure. You know, we move
69:21 pretty fast, and so we have Codex code review catching a lot of
69:23 mistakes. It's actually caught some pretty interesting configuration
69:27 mistakes. And we're starting to see glimpses of the future where
69:31 we're actually starting to have Codex even be on call for its own
69:36 training, which is pretty interesting. So there's lots there.
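The "on call for its own training" idea, which the next exchange unpacks as babysitting the graphs of a training run, could be sketched like this. Purely illustrative: the spike heuristic, thresholds, and the `ask_codex` hook are all invented for this example, not OpenAI's actual setup.

```python
# Illustrative sketch of an agent "babysitting" a training run:
# watch a metric stream and hand anomalies to the agent to triage.

def detect_anomaly(history, window=5, spike_factor=2.0):
    """Flag the latest value if it spikes above the recent average."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):-1]  # the window before the latest point
    baseline = sum(recent) / len(recent)
    return history[-1] > baseline * spike_factor

def babysit(metric_stream, ask_codex):
    """Consume (step, loss) pairs; escalate spikes to the agent."""
    losses, alerts = [], []
    for step, loss in metric_stream:
        losses.append(loss)
        if detect_anomaly(losses):
            alerts.append(step)
            ask_codex(f"Loss spiked at step {step} ({loss:.3f}). "
                      "Check recent config and infra changes; propose a fix "
                      "or recommend pausing the run.")
    return alerts

# Demo with a synthetic loss curve containing one spike at step 7:
stream = [(i, 1.0) for i in range(7)] + [(7, 9.0), (8, 1.0)]
print(babysit(stream, ask_codex=lambda prompt: None))  # [7]
```

A real version would read the metrics from the training infrastructure rather than an in-memory list, but the loop shape is the same: sample, detect, escalate to the agent.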
69:39 >> Uh, wait, what does that mean, to be on call for its own training? So it's
69:42 running its training and it's like, oh, something broke, someone needs to act.
69:45 Does it alert people, or is it like, here, I'm going to fix the problem and
69:48 restart? >> This is an early idea that we're figuring out, but the basic
69:51 idea is that during a training run there's a bunch of graphs that
69:54 today humans are looking at, and it's really important to
69:58 look at those. We call this babysitting. >> Because it's very expensive to train, I
70:02 imagine, and very important to move fast. >> Exactly. And there's a
70:06 lot of systems underlying the training run, and so a system could
70:09 go down, or there could be an error somewhere that gets introduced, and so we
70:13 might need to fix it or pause things, or, I don't know, there's lots of
70:16 actions we might need to take. And so basically having Codex run on a
70:20 loop to evaluate how those charts are moving over time is sort of this
70:24 idea we have for how to enable us to train way more
70:27 efficiently. >> I love that. This is very much along the lines of, this is the
70:31 future of agents. Codex isn't just for building code, right? It's a
70:34 lot more than that. >> Yeah. >> Okay. Last question. Being at OpenAI,
1:10:36 How far are we from a human version of AI?
70:41 I can't not ask about your AGI timeline and how far you think we are
70:45 from AGI. I know this isn't what you work on, but there's a lot of opinions,
70:50 a lot of, I don't know, timelines. How far do you think we are from a human-level
70:56 version of AI? Whatever that means to you. >> For me, I think that it's a little
71:01 bit about like when do we see the acceleration curves kind of go like this
71:03 or I don't know which way I'm mirrored here, right? When do we see the hockey
71:08 stick? And I think that the current limiting factor, I mean there's many,
71:11 but I think a current underappreciated limiting factor is like literally human
71:16 typing speed or human multitasking speed on like writing prompts,
71:20 right? And you know, you were talking about how you can have an
71:22 agent watch all the work you're doing, but if you don't have the agent
71:27 also validating its work, then you're still bottlenecked on, can you go
71:30 review all that code, right? So my view is that we need to unblock those
71:36 productivity loops from humans having to prompt and humans having to
71:40 manually validate all the work. And so if we can rebuild systems to let
71:45 the agent be default useful, we'll start unlocking hockey sticks.
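One way to picture "default useful": the agent's output is gated by automated checks, so a human only reviews work that already passes. This is a hypothetical sketch, with the check names and the shape of a "change" invented for illustration:

```python
# Sketch of auto-validation gating: humans review only pre-validated work,
# and failing changes bounce back to the agent with the failure list.

def triage(changes, checks):
    """Split agent-proposed changes into ready-for-review vs bounced-back."""
    ready, bounced = [], []
    for change in changes:
        failures = [name for name, check in checks if not check(change)]
        if failures:
            bounced.append((change, failures))  # agent retries with this feedback
        else:
            ready.append(change)                # human sees only validated work
    return ready, bounced

# Demo with toy checks over fake "changes":
checks = [("builds", lambda c: c["builds"]),
          ("tests", lambda c: c["tests_pass"])]
changes = [{"id": 1, "builds": True, "tests_pass": True},
           {"id": 2, "builds": True, "tests_pass": False}]
ready, bounced = triage(changes, checks)
print([c["id"] for c in ready])  # [1]
```

The checks would be real builds and test suites in practice; the point is that human review time is spent only on work that has already validated itself.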
71:48 Unfortunately, I don't think that's going to be binary. I think it's going
71:51 to be very dependent on what you're building, right? So I would imagine
71:55 that next year, if you're a startup and you're building a new piece of
71:59 software, some new app or something, it'll be possible for you to set it up
72:02 on a stack where agents are much more self-sufficient than not. But
72:07 now let's say, I don't know, you mentioned SAP, right? Let's say you work at SAP. They
72:11 have many complex systems, and they're not going to be able to just
72:13 get the agent to be self-sufficient overnight in those systems, so they're
72:17 going to have to slowly maybe replace systems or update systems to
72:21 allow the agent to handle more of the work end to end. And so basically my
72:25 sort of long answer to your question, maybe boring answer is that I think
72:29 starting next year we're going to see like early adopters like starting to
72:33 like hockey stick their productivity. Um and then over the years that follow,
72:36 we're going to see larger and larger companies like hockey stick that
72:39 productivity. And then somewhere in that fuzzy middle is when that hockey
72:44 sticking will be flowing back into the AI labs, and that's when we'll
72:48 basically be at the AGI tier. >> I love this answer. It's very practical
72:52 and it's something that comes up a lot on this podcast just like the time to
72:55 review all the things AI is doing is really annoying and a big bottleneck. I
72:59 love that you're working on this because it's one thing to just make coding much
73:03 more efficient and do that for people. It's another to take care of that final
73:08 step of okay is this actually great? And that's so interesting that your sense is
73:11 that's the limiting factor. It comes back to your earlier point that even if AI
73:16 did not advance anymore, we have so much more potential to unlock as we
73:22 learn to use it more effectively. So that is a really unique answer. I
73:25 haven't heard that perspective on what is the big unlock human typing speed to
73:29 review basically what AI is doing for us. >> Mhm. So good. Okay. Uh Alexander, we
1:13:31 Hiring and team growth at Codex
73:35 covered a lot of ground. Is there anything that we haven't covered? Is
73:38 there anything you wanted to share, maybe double down on before we get to
73:44 our very exciting lightning round? >> I think one thing is that the Codex
73:48 team is growing, and as I was just saying, we're still somewhat limited by
73:51 human thinking speed and human typing speed. We're working on it. So if
73:58 you're an engineer, or a salesperson, or, I am hiring for product, a product
74:03 person, please hit us up. I'm not sure the best way to give contact info,
74:06 but I guess you can go to our jobs page, or do they have contact for you?
74:10 Actually, do listeners have contact for you >> before they send me like, "Hey, I want
74:13 to apply to Codex." >> Uh, I do have a contact form at
74:16 lennyrachitsky.com. I'm afraid of all the amazing people that will ping me. But
74:19 there we go. We could try that. Let's see how that goes. >> Okay. Or, yeah, maybe an
74:24 easier way, and we can edit all that out, up to you, but I would just say
74:28 you can drop us a DM. For example, I'm @embirico on Twitter; hit me up
74:32 if you're interested in joining the team. >> What a dream job for so many people.
74:38 What's a sign, I don't know, what's a way to filter people a little bit
74:42 so they're not flooding your inbox? >> So, specifically, if you want to join
74:46 the Codex team, then you need to be a technical person who uses these tools.
74:50 And I would just ask yourself the question: hey, let's say
74:54 I were to join OpenAI and work on Codex over the next six months, you
74:59 know, and crush it. What does the life of a software engineer look like then?
75:02 And I think if you have an opinion on that, you should apply. And if you don't
75:05 have an opinion on that and have to think about it first, you know,
75:09 depending on how long you have to think about it, I guess that would be the
75:12 filter, right? I think there's a lot of people thinking about the space,
75:16 and so we're very interested in folks who have already been
75:21 thinking about what the future should look like with agents. And we
75:23 don't have to agree on where we're going, but I think we want people who
75:26 are very passionate about the topic. >> I guess it's very rare to be working on a
75:32 product that has this much impact and is at such a bleeding edge of where it's
75:37 possible. What a cool role for the right person. So it's
75:40 awesome that you have an opening, and this audience is a really good fit,
75:45 potentially, for that role. So, I hope we find someone that would be a great fit.
1:15:47 Lightning round and final thoughts
24:59 The future of AI and coding agents
24:59 integrated product and research team. How do you think you win in this space?
25:04 Do you think it'll always be this kind of race with other
25:08 models constantly leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there like a path to just, we win? >> Again, it comes back to this idea of
25:19 building a teammate, and not just a teammate that participates
25:24 in team planning and prioritization. Not just a teammate that really
25:27 tests its code and helps you maintain and deploy. But even a teammate that,
25:31 if you think again of an engineering teammate, can also
25:34 schedule a calendar invite, right, or move standup, or do whatever. And so in
25:42 my mind, if we just imagine that every day or every week some like crazy new
25:46 capability is just going to be deployed by a research lab, it's just impossible
25:50 for us as humans to keep up and use all this technology. And
25:54 so I think we need to get to this world where you kind of just have like an AI
25:59 teammate or super assistant that you just talk to and it just knows how to be
26:04 helpful on its own, right? And so you don't have to be
26:07 reading the latest tips for how to use it. You've just plugged it in,
26:11 and it just provides help. And so that's kind of the shape of what I think we're
26:14 building. And I think that will be like a very sticky like winning product if we
26:18 can do so. The shape that I have in my head, at least... well, maybe a fun
26:23 topic is: is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for, in the same way that if I'm on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's kind of the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are like a functional expert in a specific
26:49 domain such as coding, there's a GUI that you can pull up to go really
26:54 deep and like look at the code and like work with the code. So I think like what
26:59 we need to build as OpenAI is basically this idea that you have ChatGPT,
27:02 and that is a tool that's ubiquitously available to everyone.
27:06 You start using it even like outside of work right to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you just can naturally just yeah I'm just going to ask it for this
27:16 and I don't need to know about all the connectors or like all the different
27:19 features. I'm just going to ask it for help, and it'll surface to me the
27:23 best way that it can help at this point in time and maybe even chime in when I
27:27 didn't ask it for help. Um, so in my mind, if we can get to that, I think
27:30 that's, you know, how we really build the winning product.
27:34 This is so interesting, because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was super
27:41 assistant, or something like that. >> Yeah. >> And it's interesting that there's
27:46 that approach to the super assistant, and then there's this Codex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is okay, you start with coding and building and then it's doing all
27:56 this other stuff for you, scheduling meetings, I don't know, probably posting
28:01 in Slack, I don't know, shipping designs. Is that the
28:04 idea there? This is like the business version of ChatGPT, in a sense.
28:08 Or is there something else there? >> Yeah. So, you know, we're getting to
28:12 the one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you the contention, like the plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So like we're going to
28:29 have a model and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need the super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 Uh, you know, you could try to hack the OS and like use accessibility APIs.
28:57 Maybe a bit easier is you could point and click. That's a little slow, you
29:02 know, and unpredictable sometimes. And another way: it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where like, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 nontechnical user, they won't even know they're using a coding agent. The same
29:19 way that no one thinks about are they using the internet or not, which is
29:22 they're more just like is Wi-Fi on? Right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're kind of building an agent that can use a computer by writing
29:36 code. And so we're already seeing some pull for this. It's quite
29:39 early, but we're starting to see people who are using Codex for
29:43 coding-adjacent purposes. And so as that develops, I think we'll
29:47 just naturally see that like, oh, it turns out like we should just always
29:50 have the agent write code if there is a coding way to solve a problem instead
29:53 of, you know, even if you're doing a financial analysis, right? Like maybe
29:56 write some code for that. So basically like, you know, you were like, hey, is
29:59 this like the two ends of this product for the super assistant, right,
30:03 of ChatGPT? In my mind, coding is just a core competency of any agent,
30:06 including ChatGPT. And so what we really think we're building is
30:10 like that competency. But here's the really cool thing about
30:13 agents writing code: you can import code, right? Code is
30:19 composable, interoperable, right? Because one very reductive
30:23 view we could have for an agent is that it's just going to be given a computer and
30:26 it's just going to point and click and go around. But, you know, that
30:32 is the future, and how we get there is difficult to chart a path for,
30:36 because a lot of the questions around building agents aren't, can the
30:41 agent do it? It's more about, well, how can we help the agent understand the
30:44 context that it's working in? And the team that's using it
30:47 probably has a way that they like to do things. They have guidelines. They
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do. Or they want to know that the agent understands sort of this detail.
30:59 An example would be, you know, if we're looking at a crash-reporting tool,
31:04 hitting a connector for it: every sub-team probably has a different meta
31:07 prompt for how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where like, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And stuff that the agent does often, we
31:22 probably just want to like build in as a competency that this agent has that it
31:27 can do. So I think we end up with this generalizable thing that you were saying
31:31 of like an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think that the the really key part here is can we make it so that
31:40 everything that the agent has to do often or that it does well we can just
31:44 like remember and store so that the agent doesn't have to write a script for
31:47 that again, right? Or maybe, if I just joined a team and you are already
31:51 on the same team as me, I can just use all those scripts that the agents
31:53 have written already. >> Yeah. It's like, if this is our teammate,
31:57 it can share things that it's learned from working with other
32:00 people at the company. Just makes sense as a metaphor. >> Yeah. It feels like you're in the
32:05 Karpathy camp of agents today are not that great and mostly slop and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great, I think. >> A ton of value. >> Right? Yep.
32:19 >> And then I think like agents outside of coding, it's still like very early and
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use coding too, in a composable way.
32:29 This is kind of the fun part of building for software
32:33 engineers. At my startup, we were building for software engineers too for
32:36 a lot of that journey and they're just such a fun audience to build for because
32:41 you know they also like building for themselves and are often like even more
32:45 creative than we are and thinking about how to use the technology. Um and so
32:48 like by building for software engineers you get to just observe a ton of
32:52 emergent behaviors and like things that you should do and build into the
32:55 product. >> I love how you say that, because a lot of people building for
32:57 engineers get really annoyed, because engineers are just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 The impact of AI on engineering
33:11 problems and just, you know, code for them. Um, kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Uh clearly the way you're
33:21 describing it is it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you. What's the way you think about
33:28 the impact on the field of engineering of having this super intelligent
33:33 engineering teammate? >> I think there's two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just like a small part
33:46 of this like broader idea that like, hey, as we make code even more
33:48 ubiquitous. I mean, you could probably claim it's ubiquitous today, even
33:51 pre-AI, right? But as we make code even more ubiquitous, it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for people, humans, with this competency. So that's
34:05 my view. I think this is like quite a complex topic. So, you know, it's
34:08 something we talk about a lot and we have to kind of see how it pans out. But
34:12 I think what we can do, basically, as a product team building in
34:15 the space is just try to always think about how are we building a tool so that
34:18 it feels like we're like maximally accelerating uh people you know rather
34:24 than building a tool that makes it like more unclear what you should do as the
34:29 human, right? To give an example: right now,
34:33 when you work with a coding agent, it writes a ton of code, but it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. So then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And I actually think we see that this plays
34:53 out all the time in a ton of micro-decisions. And so we as a product team
34:55 are always thinking about, okay, how do we make this more fun? How do we make
34:58 you feel more empowered where it's not working? And I would argue that
35:01 reviewing agent-written code is a place that today is less fun. And so
35:06 then I think, okay, what can we do about that? Well, we can ship a
35:09 code review feature that helps you build confidence in the AI-written
35:12 code. Okay, cool. You know, another thing we can do is we can make it so
35:14 that the agent's like better able to validate its work. And you know, it gets
35:18 all the way down into micro-decisions. Like, if you're going to have
35:23 an agent capability to validate work, and, let's say, I'm thinking
35:27 of Codex web right now, you have a pane that sort of reflects the work the
35:30 agent did. What do you see first? Do you see the diff or do you see the image
35:34 preview of the code it wrote? Right? And you know, I think if you're thinking
35:36 about this from the perspective of, how do I empower the human? How do I make them
35:40 feel as accelerated as possible, then you obviously see the image first,
35:43 right? You shouldn't be reviewing the code unless you've first seen
35:46 the image, unless maybe it's been reviewed by an AI and now it's time for
35:49 you to take a look. >> When I had Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen this rise of something called spec-driven development, where
36:02 you kind of just write the spec and then the code, you know, the AI writes code
36:05 for you. And so you kind of start working at this higher abstraction
36:09 level. Is that something you see where we're going? Just like engineers not
36:12 having to actually write code or look at code and there's going to be this higher
36:16 level of abstraction that we focus on. >> Yeah, I mean, I think there are
36:19 constantly these levels of abstraction, and they're actually already
36:23 playing out today, right? Today, coding agents are mostly prompt to
36:29 patch. We're starting to see people doing spec-driven development, or
36:32 plan-driven development. That's actually one of the ways: when people ask,
36:35 hey, how do you run Codex on a really long task? Well, it's often:
36:38 collaborate with it first to write a plan.md, a markdown file that's
36:42 your plan, and once you're happy with that, you ask it to go off and do the
36:46 work, and if that plan has verifiable steps, it'll work for much longer.
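For the technically curious, the plan-driven workflow described above can be sketched roughly like this. The plan contents, filename contents, and the prompt are illustrative, not OpenAI's actual format; the only assumption taken from the conversation is "a plan.md with verifiable steps":

```python
from pathlib import Path

# Sketch of the plan-driven workflow: collaborate with the agent to produce a
# plan.md whose steps each carry their own verification, then hand the plan
# back for a long-running task. The plan below is a made-up example.
plan = """\
# Plan: migrate the settings page to the v2 API
- [ ] Add a client for /v2/settings (verify: unit tests pass)
- [ ] Switch reads over to the new client (verify: integration tests pass)
- [ ] Delete the v1 client (verify: repo-wide search finds no v1 imports)
"""
Path("plan.md").write_text(plan)

# Then kick off the long task, e.g. from a terminal (hypothetical prompt):
#   codex "Work through plan.md step by step, checking off each verified step."
```

Because each step carries its own verification, the agent can check its work as it goes, which is what lets it run longer without drifting.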
36:51 So we're totally seeing that. I think spec-driven development is an
36:55 interesting idea. It's not clear to me that it'll work out that way, because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some people will work that way. You know, a bit of
37:06 a joke idea, though: if you think of the way that many teams work
37:11 today, they often don't necessarily have specs, but the team is
37:14 just really self-driven, and so stuff just gets done. And so, almost, that is,
37:17 I'm coming up with this on the spot, so it's not a good name, but,
37:21 like, chatter-driven development, where stuff is just happening
37:24 on social media and in your team communication tools, and then, as a
37:28 result, code gets written and deployed, right? So, yeah, I think I'm a
37:33 little bit more oriented that way: I don't even necessarily want
37:37 to have to write a spec. Sometimes I want to, only if I like writing specs,
37:42 right? Other times I might just want to say, hey, here's the
37:45 customer service channel; tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of hypothetical future that I like to
37:58 share sometimes with people as a provocation, which is like in a world
38:01 where we have truly amazing agents, what does it look like to be a
38:04 solopreneur? And, you know, one terrible idea for how it could look is that
38:12 there's actually a mobile app, and every idea the agent has for what to do is
38:17 just vertical video on your phone, and then you can swipe left if you
38:21 think it's a bad idea and you can like swipe right if it's a good idea and like
38:24 you can press and hold and like speak to your phone if you want to get feedback
38:28 on the idea before you swipe, you know. So in this world, basically,
38:31 your job is just to plug this app into every single signal
38:36 system, you know, system of record, and then you just sort of sit back and
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets TikTok meets Codex. >> It's pretty terrible. >> No, this is great. So the idea here is
38:47 this agent is watching, right, listening to you, paying attention
38:51 to the market, your users, and it's like, cool, here's something I should do. It's
38:54 like a proactive engineer: here, we should build this feature, fix this
38:56 thing. >> Exactly. I think they're communicating with you in, like, the lowest
39:05 common denominator, the modern way to communicate. >> Yeah. >> Swipe left or right, in a vertical feed,
39:10 and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that but like you know it's a fun idea.
39:17 I mean, you know, in this example, though, one of the things
39:19 that it's doing is consuming external signals, right? I think the
39:23 other really interesting thing is like if we think about like what is the most
39:28 successful AI product to date, I would argue, and it's funny, actually,
39:34 not to confuse things at all, but the first time we used the brand
39:38 Codex at OpenAI was actually the model powering GitHub Copilot. This is
39:42 way back in the day, years ago. And so we decided to reuse that brand
39:45 recently because it's just so good, you know: Codex, code execution. But I
39:50 think autocompletion in IDEs is actually one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not like that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this like mixed initiative system that's like contextually responding to like
40:17 what you're attempting to do. And so in my mind, this is like a really
40:21 interesting thing for us at OpenAI as we're building. So for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 Like in my mind, one of the really interesting things we can then do is we
40:33 can then like contextually surface like ways that we can help you as you're
40:37 going about your day, right? And so we break out of this like, you know, we're
40:41 just looking at code or we're just in your terminal um into this idea that
40:44 like, hey, like a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So like,
40:51 you know, how can we help you with that? >> Man, there's so much there and I love
40:55 this. Okay, so autocomplete on web with the browser. That's so interesting. just
40:58 like here's all the things that we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas. I'll come back to that. Uh
41:05 Codex, code execution. Did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter-driven development.
41:14 No, this is a really good idea, but it reminds me, I had Dhanji Prasanna on the
41:19 podcast, CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who
41:30 has Goose watch his screen and listen to every meeting, and it
41:36 proactively does work that he will probably want done. So it ships a PR,
41:41 sends an email, drafts a Slack message. So he's doing exactly what you're
41:44 describing, in kind of a very early way. >> Yeah, that's super interesting. And, you
41:49 know, I bet you... So, if we went and asked them what the bottleneck
41:52 to that productivity is, did they share what it is? >> Uh, probably looking at it, just making
41:57 sure this is the right thing to do. Yeah. >> Yeah. So, we see this now. We
42:01 have a Slack integration for Codex. People love it. You know, if there's
42:04 something that you need to do quickly, people just @-mention Codex, like,
42:07 "Why do you think this bug is happening?" Right? It doesn't have to be an engineer.
42:10 Even data scientists here are often using Codex a ton to just
42:14 like answer questions like why do you think this metric moved? What happened?
42:18 So you ask questions and you get the answer right back in Slack. It's amazing, super
42:22 useful. But as for when it's writing code, then you have to go back
42:27 and look at the code, right? And so the real bottleneck right now, I think,
42:30 is validating that the code worked, and code review.
42:34 So in my mind, if we wanted to get to something like that
42:38 world you were talking about, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. >> It makes sense. Like you said, writing code:
42:49 I was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code and just have to go through and be on the hook if
43:00 it is doing something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is the bottleneck is now like figuring out
43:09 what to build, and then, at the end, it's like, okay, we have all this, 100
43:13 hours to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira Product Discovery. The hardest part of
43:22 building products isn't actually building products. It's everything else.
43:26 It's proving that the work matters, managing stakeholders, trying to plan
43:30 ahead. Most teams spend more time reacting than learning, chasing updates,
43:34 justifying road maps, and constantly unblocking work to keep things moving.
43:39 Jira Product Discovery puts you back in control. With Jira Product Discovery,
43:43 you can capture insights and prioritize high impact ideas. It's flexible, so it
43:47 adapts to the way your team works and helps you build a road map that drives
43:51 alignment, not questions. And because it's built on Jira, you can track ideas
43:56 from strategy to delivery, all in one place. Less chasing, more time to think,
44:01 learn, and build the right thing. Get Jira Product Discovery for free at
44:06 atlassian.com/lenny. That's atlassian.com/lenny. What has the impact of Codex been on the
44:08 How Codex has impacted the way PMs operate
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted. Uh, code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at OpenAI? >> Yeah, I mean, I think mostly I just feel like
44:28 much more empowered. I've always been a more technical-leaning PM, and especially when
44:34 I'm working on products for engineers, I feel like it's necessary to like you
44:37 know, dogfood the product, but even beyond that, I just feel like I can
44:42 do much, much more as a PM. And, you know, Scott Belsky talks about this idea
44:45 of compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that maybe the boundaries between these roles
44:52 are a little bit less needed than before, because people can just do much
44:57 more, and every time someone can do more, you can skip one communication
45:00 boundary and make the team that much more efficient, right? So I think I
45:07 think we see it in a bunch of functions now. But since you
45:11 asked about product specifically: answering questions is now
45:15 much, much easier; you can just ask Codex for thoughts on that. A lot of
45:20 PM-type work, understanding what's changing, again, just ask Codex for help
45:25 with that. Prototyping is often faster than writing specs; this is something
45:29 that a lot of people have talked about. I don't think it's
45:33 super surprising, but something that's slightly surprising is:
45:36 we're mostly building Codex to write code that's going to be deployed
45:40 Throwaway code and ubiquitous coding
45:40 to production, but actually we see a lot of throwaway code written with Codex
45:43 now. It's kind of going back to this idea of ubiquitous code.
45:48 So you'll see someone wants to do an analysis. If I want to
45:51 understand something, it's like, okay, just give Codex a bunch of data, but then ask
45:54 it to build an interactive data viewer for this data, right? That
45:56 would just have been too annoying to do in the past, but now it's
46:00 totally worth the time of just getting an agent to go do it.
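As a concrete illustration of that throwaway-code pattern, here's a minimal sketch of the kind of one-off viewer an agent might generate on request; the data, column names, and output filename are all made up:

```python
import html

# Throwaway "data viewer": dump a handful of rows into a standalone HTML
# table you can open in a browser. Disposable by design, so no polish needed.
rows = [
    {"endpoint": "/v2/settings", "p50_ms": 41, "p99_ms": 310},
    {"endpoint": "/v2/profile", "p50_ms": 28, "p99_ms": 190},
]
cols = list(rows[0])
header = "".join(f"<th>{html.escape(c)}</th>" for c in cols)
body = "".join(
    "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in cols) + "</tr>"
    for r in rows
)
page = f"<table border='1'><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
with open("viewer.html", "w") as f:
    f.write(page)
```

The point isn't the code itself; it's that the marginal cost of asking an agent for a disposable tool like this is now low enough to be worth it for a single analysis.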
46:04 Similarly, I've seen some pretty cool prototypes on our design team.
46:09 A designer basically wanted to build an animation,
46:13 the coin animation in Codex, and normally it'd be too
46:17 annoying to program this animation. So they just vibe-coded an animation editor
46:21 and then used the animation editor to build the animation, which they then
46:25 checked into the repo. Actually, for our designers, there's a ton of
46:28 acceleration there. And, speaking of compressing the talent stack, I think our
46:31 designers are very PM-like. So, you know, they do a ton of product work. And they actually
46:38 have an entire vibe-coded side prototype of the Codex app. And
46:41 so a lot of how we talk about things is, we'll have a really quick jam,
46:44 because there's, like, 10,000 things going on. And then the designer will go think
46:48 about how this should work, but instead of talking about it again, they'll
46:50 just vibe-code a prototype of that in their standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe-code that prototype into, or
46:59 vibe-engineer that prototype into, an actual PR to land. And then, depending on
47:02 their comfort with the codebase (the Codex CLI, in Rust, is a little harder),
47:06 maybe they'll land it themselves, or they'll get close and then an
47:09 engineer can help them land the PR. Um, you know, we recently shipped the
47:10 Shipping the Sora Android app
47:15 Sora Android app, and that was one of the most mind-blowing
47:19 examples of acceleration, actually, because usage of Codex internally at
47:24 OpenAI is obviously really, really high, but it's been growing over the course of
47:28 the year, both in terms of, now basically all technical staff use
47:32 it, but also the intensity and know-how of how to make the most of
47:35 coding agents has gone up by a ton. And so the Sora Android app, a
47:42 fully new app: we built it in 18 days. It went from zero to launch to
47:46 employees, and then 10 days later, so 28 days total, we went GA to
47:51 the public. And that was done with the help of Codex,
47:56 so pretty insane velocity. I would say it was, I don't want to
48:01 say easy mode, but there is one thing that Codex is really good at: if you're a
48:04 company that's building software on multiple platforms, so you've already
48:07 figured out some of the underlying APIs or systems, asking Codex
48:13 to port things over is really effective, because it has something
48:15 it can go look at. And so the engineers on that team were basically having
48:20 Codex go look at the iOS app, produce plans of work that needed to be done, and
48:23 then go implement those. And it was kind of looking at iOS and Android at the
48:27 same time. So basically it was two weeks to launch to
48:30 employees, four weeks total. Insanely fast. >> What makes that even more insane is it
48:35 was that it became the number one app in the App Store. >> This just boggles the mind.
48:39 Okay. >> Yeah. So imagine releasing the number one app in the App Store with a handful
48:45 of engineers >> uh, I think it was like >> two or three, possibly
48:53 >> in a handful of weeks. Yeah, this is absurd. >> Yeah, so that's a really fun example
49:01 Building the Atlas browser
49:01 of acceleration. And then Atlas was the other one. I think Ben, an
49:06 engineer on Atlas, did a podcast sharing a little bit of how we
49:12 built there. I mean, Atlas is a browser,
49:15 right, and building a browser is really hard. And so we had to build a lot
49:23 of difficult systems in order to do that and basically we got to the point where
49:27 that team has a ton of power users of Codex right now. And it got
49:32 to the point where, basically, we were talking to them
49:34 about it, because a lot of those engineers are people I used to work with
49:38 before at my startup, and they'd say, you know, before this would have taken us
49:42 two to three weeks for two to three engineers, and now it's one engineer,
49:48 one week. So massive acceleration there as well. And what's quite cool is
49:52 that we shipped Atlas on Mac first, but now we're working on
49:56 the Windows version. So the team now is ramping up on
49:58 Windows, and they're helping us make Codex better on Windows too, which is
50:02 admittedly earlier. Like, the model we shipped last week is the first
50:06 model that natively understands PowerShell, PowerShell being
50:11 the native shell language on Windows. So, yeah, it's been
50:16 really awesome to see like the whole company getting accelerated by codeex
50:21 like from and you know most obviously also research and like improving how
50:24 quickly we train models and how well we do it and then even like uh design as we
50:28 talked about and and marketing like actually we're at this point now where
50:32 uh my product marketer is often also making string changes just directly from
50:36 Slack or like updating docs directly from Slack. >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible, and this is how other companies
50:46 are going to work. Just shipping, again, what became the number one app in
50:49 the App Store, beloved all over, it just took over the world, I don't
50:54 know, for at least a week. Built, you said, in 28 days, and
50:58 10 days, 18 days just to get the core of it working.
51:02 >> Yeah. So in 18 days we had a thing that employees were playing with, and
51:05 then 10 days later we were out. >> And you said just a couple engineers.
51:07 >> Yeah. >> Two or three. Okay. And then Atlas you
51:11 said took a week to build. >> No, no, no. So Atlas, not the whole
51:16 thing in a week. Atlas was a really meaty project. >> Yeah.
51:18 >> I was talking to one of the engineers on Atlas about
51:23 what they use Codex for, and it's basically, we use Codex for
51:25 absolutely everything. I was like, okay, well, how
51:29 would you measure the acceleration? And basically the answer I got back
51:31 was >> previously it would have taken two to three weeks for two to three engineers,
51:36 and now it's one engineer, one week. Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Does it have to be an engineer
51:42 building this thing? Could it have been built by, I don't know, a PM or a
51:46 designer? I think we will very much get to the point where, basically, where
51:50 the boundaries are a little bit blurred, right? Like I think you're going to want
51:54 someone who understands the details of what they're building, but
52:02 what details those are will evolve. Kind of like how now, if you're writing
52:05 Swift, you don't have to speak assembly. There's a handful of people in
52:09 the world who speak assembly, and it's really important that they exist, maybe
52:14 more than a handful, right? But that's a specialized function that
52:14 like most companies don't need to have. So I think we're just going to naturally
52:17 see like an increase in layers of abstraction. And then the cool thing is
52:21 now we're entering like the language layer of abstraction like natural
52:25 language. And then natural language itself is really flexible, right? Like
52:29 you could have engineers talking about like a plan and then you could have
52:32 engineers talking about a spec and then you could have engineers talking about
52:35 just, you know, a product or an idea. So I think we can also like start moving up
52:39 those layers of abstraction as well. But I do think this is going
52:43 to be gradual. I don't think it's going to go, all of a sudden, to
52:46 nobody ever writing any code and it's just specs. I
52:49 think it's going to be much more like, okay, we've set up our coding agent to be
52:53 really good at previewing the build or at running tests. Maybe that's
52:56 the first part, right, that most people have set up. And then it's, okay, now
52:59 we've set it up so that it can execute the build and it can see
53:03 the results of its own changes, but we haven't yet built a good
53:06 integration harness. In the case of Atlas, by the way, I
53:08 don't know if they've done any of this or not, I think they've done a lot of
53:11 this, but maybe the next stage is to enable it to load a few
53:16 sample pages to see how well those work, right? So then, okay, now we're going to
53:19 set it up to do that. And I think for some time at least we're going to
53:22 have humans curating which of these connectors or systems or
53:26 components the agent needs to be good at talking to. And then in
53:30 the future there will be an even greater unlock where Codex tells you how to set
53:34 Codex’s impact on productivity
53:34 it up, or maybe sets itself up in a repo. >> What a wild time to be alive. Wow. I'm
53:38 curious about the second order effects of this sort of thing, just how quick it
53:42 is to build stuff. What does that do? Does that mean distribution becomes much,
53:46 much more important? Does it mean ideas are just worth a lot more? It's
53:50 interesting to think about how that changes. >> I'm curious what you think. I still
53:56 don't think ideas are worth as much as maybe a lot of people think. I
53:59 still think execution is really hard, right? You can build
54:01 something fast, but you still need to execute well on it. It still needs to make
54:06 sense and be a coherent thing overall. Yeah. And distribution is massive.
54:10 >> Yeah. Just feels like everything else is now more important. Everything that
54:13 isn't the building piece, which is >> coming up with an idea, getting to
54:17 market, profit, >> all that kind of stuff. I think we
54:21 might have been in this weird temporary phase where, for a while,
54:26 it was so hard to build product that you mostly just
54:31 had to be really good at building product, and it maybe didn't matter if
54:34 you had an intimate understanding of a specific customer.
54:39 But now I think we're getting to this point where, if I
54:42 could only choose one thing to understand, it would be a really
54:46 meaningful understanding of the problems that a certain customer has,
54:49 right? If I could only go in with one core competency. So
54:54 I think that's ultimately still what's going to matter most,
54:57 right? If you're starting a new company today and you have a really
55:02 good understanding and network of customers that are currently underserved
55:05 by AI tools, I think you're set, right? Whereas if you're good
55:09 at building, you know, websites, but you don't have any
55:12 specific customer to build for, I think you're in for a much harder time.
55:17 >> Bullish on vertical AI startups is what I'm hearing. >> Yeah, I completely agree.
55:20 There's the general thing that can solve a lot of
55:23 problems, and then there's, we're going to solve presentations incredibly
55:25 well, and we're going to understand the presentation problem better than
55:30 anyone, and we're going to plug into your workflows and all these other
55:33 things that matter for a very specific problem. >> Okay. Incredible. When you
55:35 Measuring progress on Codex
55:39 think about progress on Codex, I imagine you have a bunch of evals and
55:42 there's all these public benchmarks. What's something you look at to tell
55:45 you, okay, we're making really good progress. I imagine it's not going to be
55:48 the one thing, but what do you focus on? What's like something you're trying to
55:51 push? What's a KPI or two? >> One of the things that I'm constantly reminding
55:56 myself of is that a tool like Codex is naturally a tool that you would,
56:00 you know, become a power user of, right? And so we can accidentally spend a lot
56:03 of our time thinking about features that are very deep in the user adoption
56:08 journey, and we can kind of end up oversolving for that. And so I think
56:12 it's just critically important to go look at your D7
56:16 retention, right? Just go try the product. Sign up from scratch
56:19 again. I have a few too many ChatGPT Pro accounts that, in order to
56:24 maximally correctly dogfood, I've signed up for on my Gmail, and they
56:27 charge me like 200 bucks a month. I need to expense those. But, you know,
56:33 I think the feeling of being a user and the early retention
56:37 stats are still super important for us, because as much as this
56:41 category is taking off, I think we're still in the very early days of
56:45 people using these tools. Another thing that we do, and I
56:51 think we might be the most user-feedback-slash-social-media-pilled team out
56:56 there in this space, is a few of us are constantly on Reddit and
57:01 Twitter, and there's praise up there and there's a
57:04 lot of complaints, but we take the complaints very seriously and look
57:08 at them. And I think that, again, because you can use a coding agent for so
57:12 many different things, it often is kind of broken in all sorts of ways
57:17 for specific behaviors. And so we actually monitor
57:20 what the vibes are on social media pretty often. Especially for
57:27 Twitter, X, it's a little bit more hypey, and then Reddit is a little more
57:34 negative, but real, actually. So I've started increasingly paying attention to
57:37 how people are talking about using Codex on Reddit, actually.
57:41 >> This is important for people to know. Which subreddits do you check most?
57:44 Is there an r/codex or >> I mean, the algorithm is pretty good at
57:48 surfacing stuff, but r/codex is there. >> Okay. Very interesting. And then
57:52 if people tag you on Twitter you still see that, but maybe it's not as powerful
57:56 as seeing it on Reddit. >> Well, yeah, the thing
57:58 with Twitter is it's a little bit more one-to-one, even if it's in public,
58:01 whereas with Reddit there's really good upvoting mechanics, and
58:05 maybe most people are still not bots, unclear. So you get good
58:09 Why they are building a web browser
5:13 The speed and ambition at OpenAI
5:18 here and welcome to the podcast. >> Thank you so much. I've been following
5:21 for ages and I'm excited to be here. >> I'm even more excited. I really
5:24 appreciate that. I want to start with your time at OpenAI. So, you joined
5:30 OpenAI about a year ago. Before that, you had your own startup for about 5
5:34 years. Before that, you were a product manager at Dropbox. I imagine OpenAI is
5:39 very different from every other place you've worked. Let me just ask you this.
5:44 What is most different about how OpenAI operates? And what's something that
5:47 you've learned there that you think you're going to take with you wherever
5:50 you go, assuming you ever leave? >> By far, I would say the speed and ambition of
5:54 working at OpenAI are just like dramatically more than what I can
5:58 imagine. And I guess it's kind of an embarrassing thing to say, because
6:01 everyone who's a startup founder thinks, "Oh yeah, my
6:04 startup moves super fast and the talent bar is super high and we're super
6:07 ambitious." But I have to say, working at OpenAI just made
6:10 me reimagine what that even means. >> We hear this a lot about, you
6:14 know, feels like every AI company is just like, "Oh my god, I can't believe
6:17 how fast they're moving." Is there an example of just like, "Wow, that
6:19 wouldn't have happened this quickly anywhere else." >> The most obvious thing that comes to
6:23 mind is the explosive growth of Codex itself. I think it's been a
6:27 while since we bumped our external number, but the
6:32 10xing of Codex's scale was just super fast, a matter of months,
6:37 and it's well more since then. And once you've lived through
6:40 that, or at least speaking for myself, having lived through that, now I
6:45 feel like anytime I'm going to spend my time on building a tech
6:49 product, there's that speed and scale that I now need to meet.
6:54 If I think of what I was doing in my startup, it moved way slower.
6:58 And there's always this balance with startups of how much
7:01 do you commit to an idea that you have versus find out that it's not
7:06 working and then pivot. But one thing I've realized at OpenAI is
7:09 the amount of impact that we can have, and in fact need to have to do
7:13 a good job, is so high that I have to be way more ruthless with
7:16 how I spend my time. >> Before we get to Codex, is there a way that they've
7:20 structured the org, or, I don't know, the way that OpenAI operates, that allows the
7:23 team to move this quickly? Because everyone wants to move super
7:27 fast. I imagine there's a structural approach to allowing this to happen.
7:30 >> I mean, one thing is just that the technology that we're building with has
7:35 transformed so many things, both how we build
7:39 but also what kinds of things we can enable for users. And we
7:43 spend most of our time talking about the improvements within the
7:47 foundation models, but I believe that even if we had no more progress today
7:51 with models, which is absolutely not the case, even if we had no more
7:55 progress, we are way behind on product. There's so much more product to build.
7:59 So I think the moment is ripe, if that makes sense.
8:03 >> But I think there were a lot of counterintuitive things that surprised
8:06 me when I arrived as far as how things are structured. One example that
8:10 comes to mind: when I was working on my startup, and before that when I
8:12 was at Dropbox, it was very important, especially as a PM,
8:16 to always kind of rally the ship, make sure you're
8:18 pointed in the right direction and that you can accelerate in that
8:24 direction. But here, because we don't exactly know what
8:27 capabilities will even come up soon, and we don't know what's going to work
8:31 technically, and then we also don't know what's going to land even if it works
8:34 technically, it's much more important for us to be very humble and learn
8:39 a lot more empirically and just try things quickly. And the org is
8:44 set up in that way, to be incredibly bottoms-up. This is again one
8:47 of those things that, as you were saying, everyone wants to move fast. I
8:50 think everyone likes to say that they're bottoms-up, or at least a lot of people
8:53 do, but OpenAI is truly, truly bottoms-up, and that's been a
8:58 learning experience for me. It'll be interesting if I ever work
9:02 at... I don't think it'll even make sense to work at a non-AI
9:05 company in the future. I don't even know what that means. But if I were to
9:08 imagine it, or go back in time, I think I would run things totally differently.
9:12 >> What I'm hearing is kind of this ready, fire, aim approach, more
9:17 than ready, aim, fire. And as you processed that,
9:21 because that may not come across well, but I actually have heard this a lot at
9:25 AI companies, and Nick Turley shared, I think, the same
9:28 sentiment: because you don't know how people will use it, it doesn't make
9:31 sense to spend a lot of time making it perfect. It's better to just get it out
9:36 there in a primordial way, see how people use it, and then go big on that use case.
9:41 >> Yeah. Okay, to use this analogy a little bit, I feel like
9:44 there is an aim component, but the aim component is much fuzzier.
9:48 It's kind of, roughly what do we think can happen? Someone
9:52 I've learned a ton from working with here is a research lead, and he likes to say that
9:57 at OpenAI we can have really good conversations about something
10:01 that's a year-plus from now, and there's a lot of ambiguity in what
10:04 will happen, but that's the right sort of timeline. And then we can have
10:07 really good conversations about what's happening in low months
10:11 or weeks. But there's kind of this awkward middle ground, as
10:14 you start approaching a year but you're not at a year, where it's very
10:18 difficult to reason about, right? And so as far as aiming, I think we
10:21 want to know, okay, what are some of the futures that we're trying to
10:24 build towards? And a lot of the problems we're dealing with in AI,
10:26 such as alignment, are problems you need to be thinking about really far out
10:30 into the future. So we're kind of aiming fuzzily there. But when it comes
10:34 down to the more tactical, oh yeah, what product will we build
10:37 and therefore how will people use that product? That's the place where we're
10:40 much more, let's find out empirically. >> That's a good way of putting it.
10:44 Something else: when people hear this, they sometimes hear
10:49 companies like yours saying, "Okay, we're gonna be bottoms-up. We're gonna
10:51 try a bunch of stuff. We're not going to have exactly a plan of where it's going
10:55 in the next few months." The key is you all hire the best people in the world.
10:59 And so that feels like a really key ingredient in order to be this
11:02 successful at bottoms-up work. >> It just super resonates, basically.
11:07 I was, again, surprised or even shocked when I arrived at the
11:11 level of individual drive and autonomy that everyone here has. So
11:18 I think the way that OpenAI runs, you can't just read this or
11:22 listen to a podcast and be like, I'm just going to deploy this to my
11:26 company. Maybe this is a harsh thing to say, but I think
11:28 very few companies have the talent caliber to be able to do that. So it
11:33 might need to be adjusted if you were going to implement this.
11:36 >> Okay. So let's talk Codex. You lead work on Codex. How's Codex going? What
11:40 numbers can you share? Is there anything you can share there? Also, not
11:43 everyone knows exactly what Codex is. Explain what Codex is. >> Totally. Yeah.
11:48 So I have the very lucky job of living in the future and leading
11:53 product on Codex. And Codex is OpenAI's coding agent. Super concretely,
11:59 that means it's an IDE extension, a VS Code extension, that you can install, or a
12:02 terminal tool that you can install, and when you do so, you can basically
12:06 pair with Codex to answer questions about code, write code, run
12:12 tests, execute code, and do a bunch of the work in that thick middle
12:15 section of the software development life cycle, which is all about
12:19 writing code that you're going to get into production. More broadly, we
12:25 think of Codex as... what it currently is, is just the beginning of a
12:29 software engineering teammate. And so when we use a
12:32 big word like teammate, some of the things we're imagining are that it's not
12:36 only able to write code, but it actually participates early on in
12:40 the ideation and planning phases of writing software, and then further
12:43 downstream in terms of validation, deploying, and maintaining code. To
12:48 make that a little more fun, one thing I like to imagine is, if you
12:51 think of what Codex is today, it's a bit like this really smart intern that
12:55 refuses to read Slack and doesn't check Datadog or Sentry
12:59 unless you ask it to. And so no matter how smart it is, how much
13:02 are you going to trust it to write code without you also working with it, right?
13:05 So that's how people mostly use it today: they pair with it.
13:08 >> But we want to get to the point where it can work just like a
13:12 new intern that you hire. You don't only ask them to write code, but you ask them
13:15 to participate across the cycle, so that even if they don't
13:17 get something right the first try, they're eventually going to be able to
13:20 iterate their way there. >> I thought the point
13:23 about not reading Slack and Datadog was that it's just not distracted. It's just
13:26 constantly focused and is always in flow. But I get what you're saying:
13:30 it doesn't have all the context on everything that's going on.
13:33 >> And that's not only true when it's performing a task. Again, if you
13:36 think of the best human teammates, you don't tell them what to do,
13:39 right? Maybe when you first hire them, you have a couple of meetings,
13:42 and you kind of learn, okay, these
13:45 prompts work for this teammate, these prompts don't, right? This is how to
13:48 communicate with this person. Then eventually you give them some starter
13:50 tasks. You delegate a few tasks. But then eventually you just say, "Hey,
13:53 great. Okay, you're working with this set of people in this area of the
13:57 codebase. Feel free to work with other people in other parts of the
14:00 codebase too, even. And you tell me what you think makes sense to be
14:03 done," right? And so we think of this as proactivity, and one
14:06 of our major goals with Codex is to get to proactivity.
14:12 I think this is critically important to achieve the mission of
14:15 OpenAI, which is to deliver the benefits of AI to all humanity. I like
14:19 to joke today, and it's a half joke, that AI products
14:23 are actually really hard to use, because you have to be very thoughtful about when it
14:29 could help you. And if you're not prompting a model to help you, it's
14:33 probably not helping you at that time. And if you think of how many times
14:36 the average user is prompting AI today, it's probably tens of times. But if
14:40 you think of how many times people could actually get benefit from a really
14:44 intelligent entity, it's thousands of times per day. And so a large
14:48 part of our goal with Codex is to figure out, what is the shape of an
14:52 actual teammate agent that is helpful by default? >> When people think
14:57 about Cursor, and even Claude Code, it's an IDE that helps you code and
15:01 kind of autocompletes code and maybe does some agentic work. What I'm hearing
15:05 here is the vision is different: it's a teammate. It's like a remote
15:09 teammate, building code for you, that you talk to and ask to do things, and it
15:14 also does IDE autocomplete and things like that. Is that a kind of
15:17 differentiator in the way you think about Codex? >> It's basically this idea
15:22 that if you're a developer and you're trying to get
15:25 something done, we want you to just feel like you have superpowers, and you're
15:29 able to move much, much faster. But we don't think that in order for you to
15:33 reap those benefits, you need to be sitting there constantly thinking about
15:37 how can I invoke AI at this point to do this thing. We want you to be able
15:40 to plug it in to the way that you work and have it just start to
15:43 do stuff without you having to think about it. >> Okay. I have a lot of questions along
15:46 those lines, but just, how's it going? Are there any stats, any numbers you can
15:49 share about how Codex is doing? >> Yeah, Codex has been growing
15:53 absolutely explosively since the launch of GPT-5 back in August. There are
15:57 definitely some interesting product insights to talk about as
16:00 to how we unlocked that growth, if you're interested. But the
16:03 last stat we shared there was that we were well over 10x since
16:08 August. In fact, it's been like 20x since then. Also, the Codex models
16:12 are serving many, many trillions of tokens a week now, and it's basically
16:17 our most-served coding model. One of the really cool things that we've
16:20 seen is that the way we decided to set up the Codex team was to build a
16:25 really tightly integrated product and research team that are
16:28 iterating on the model and the harness together. And it turns out that lets you
16:32 just do a lot more and try many more experiments as to how these things will
16:36 work together. And so we were training these models for use in our
16:40 first-party harness that we were very opinionated about. And then what we've
16:44 started to see more recently is that other major API coding
16:48 customers are now starting to adopt these models as well. And so we've
16:51 reached a point where the Codex model is the most-served coding
16:55 model in the API as well. >> You hinted at this: what unlocked
17:00 this growth? I am extremely interested in hearing that. It felt like before, I
17:04 don't know, maybe this was before you joined the team, it just felt like Claude
17:07 Code was killing it. Everyone was sitting on top of Claude Code. It was by
17:11 far the best way to code. And then all of a sudden, Codex comes around. I
17:16 remember Karpathy tweeted that he has just never seen a model like this.
17:20 I think the tweet was: the gnarliest bugs that he runs into, that he just
17:23 spends hours trying to figure out, nothing else has solved. He gives it to
17:27 Codex, lets it run for an hour, and it solves it. What did you guys do? >> We
17:32 have this strong mission here at OpenAI to, basically, build
17:38 AGI. And so we think a lot about how we can shape the product so
17:43 that it can scale, right? Earlier I was mentioning, hey, if you're
17:45 an engineer, you should be getting help from AI thousands of times per
17:50 day, right? And so we thought a lot about the primitives for that when we launched
17:54 our first version of Codex, which was Codex cloud, and that was basically a
17:58 product that had its own computer, lives in the cloud, you could delegate to it,
18:02 and the coolest part about that was you could run many,
18:05 many tasks in parallel. But some of the challenges that we saw
18:11 are that it's a little bit harder to set that up, both in terms of
18:14 environment configuration, giving the model the tools it needs to validate
18:18 changes, and learning how to prompt in that way. My analogy for
18:22 this goes back to the teammate analogy: it's like if you hired a
18:26 teammate but you're never allowed to get on a call with them, and you can only go
18:30 back and forth asynchronously over time. That works for some
18:33 teammates, and eventually that's actually how you want to spend most of your time.
18:36 So that's still the future, but it's hard to initially adopt.
18:40 So we still have that vision: what we're trying to get you to is a
18:43 teammate that you delegate to and that's proactive, and we're seeing that
18:48 growing. But the key unlock is that first you need to land with users in a
18:51 way that's much more intuitive and trivial to get value from. So the
18:56 way the vast majority of users discover Codex
19:00 today is either they download an IDE extension or they run it in their CLI,
19:05 and the agent works there with you, on your computer, interactively. And it
19:09 works within a sandbox, which is actually a really cool piece of tech to
19:13 help that be safe and secure, but it has access to all your dependencies. So if
19:17 the agent needs to do something, like it needs to run a command, it can do so
19:20 within the sandbox. We don't have to set up any environment, and if it's a command
19:23 that doesn't work in the sandbox, it can just ask you. And so you can get into
19:27 this really strong feedback loop using the model, and then over time
19:31 our team's job is to help turn that feedback loop into you, as a
19:35 byproduct of using the product, configuring it so that you can then be
19:39 delegating to it down the line. Again, to keep coming back to the analogy:
19:43 if you hire a teammate and you ask them to do work, but you just give
19:46 them a fresh computer from the store, it's going to be hard for them to
19:49 do their job, right? But if you work with them side by side, you could be
19:52 like, "Oh, you don't have a password for this service we use. Here's the password
19:56 for this service. Don't worry, feel free to run this command."
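[The loop he is describing, run what's safe inside the sandbox, escalate anything else to the human, and feed the output back to the agent, can be sketched roughly like this. This is a toy Python illustration; the names (SAFE_PREFIXES, run_agent_command) and the prefix-allowlist policy are assumptions for the sketch, not how Codex actually implements its sandbox.]

```python
import subprocess

# Toy allowlist of read-only / test commands the "sandbox" runs unattended.
SAFE_PREFIXES = ("ls", "cat", "grep", "pytest")

def is_sandboxed(command: str) -> bool:
    """Toy policy: allow only known-safe command prefixes to run automatically."""
    return command.startswith(SAFE_PREFIXES)

def run_agent_command(command: str, ask_user) -> str:
    """Run a command the agent proposed, escalating to the human when needed."""
    if not is_sandboxed(command):
        # Mirrors "if it's a command that doesn't work in the sandbox,
        # it can just ask you": pause and request approval.
        if not ask_user(f"Agent wants to run outside the sandbox: {command!r}. Allow?"):
            return "denied by user"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    # The agent reads stdout/stderr and decides its next step from the output,
    # which is the "really strong feedback loop" described above.
    return result.stdout or result.stderr
```

[Over time, approvals like these become configuration, so the agent can later be delegated to without a human in the loop.]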
19:59 Then it's much easier for them to go off and do work for hours
20:03 without you. >> So, what I'm hearing is the initial version of Codex was almost too
20:06 far in the future. It's a remote, in-the-cloud agent that's coding for you
20:11 asynchronously. And what you did is, okay, let's actually come back a little
20:15 bit. Let's integrate into the way engineers already work, into IDEs and
20:20 locally, and help them kind of on-ramp to this new world. >> Totally. And it
20:26 was quite interesting, because we dogfood product a ton at OpenAI,
20:30 dogfood as in we use our own product. And so Codex has been
20:34 accelerating OpenAI over the course of the entire year, and the cloud product
20:38 was a massive accelerant to the company as well. It just turns out that this
20:44 is one of those places where the signal we got from dogfooding is a little bit
20:47 different from the signal you get from the general market, because at
20:50 OpenAI we train reasoning models all day, and so we're very used to
20:54 this kind of prompting: think up front, run things
20:59 massively in parallel, it would take some time, and then come back
21:03 to it later asynchronously. And so now when we build, we still get a
21:06 ton of signal from dogfooding internally, but we're also
21:11 very cognizant of the different ways that different audiences use the
21:14 product. >> That's really funny. It's like, live in the future, but maybe not too far
21:17 in the future. And I could see how everyone at OpenAI is living very far in
21:21 the future, and sometimes that won't work for everyone.
21:25 >> Yeah. >> What about intelligence, training data? I don't
21:28 know. Is there something else that helped Codex accelerate its ability to
21:32 actually code? Is it better, cleaner data? Is it more just models
21:36 advancing? Is there anything else that really helped accelerate? >> Yeah. So
21:41 there's like a few components here. Um I guess you know you were mentioning
21:44 models, and the models have improved a ton. In fact, just last Wednesday we
21:50 shipped GPT-5.1-Codex-Max, a very, you know, accurately named model. That
21:56 is awesome both because, for any given task that you
22:01 were using GPT-5.1-Codex for, it's roughly 30% faster at
22:06 accomplishing that task but also it unlocks a ton of intelligence. So if you
22:10 use it at our higher reasoning levels, it's just like even smarter. Um, and you
22:13 know, that feedback, or that tweet you were mentioning, that Karpathy made
22:16 about, like, hey, give it your gnarliest bugs: obviously there's a
22:20 ton going on in the market right now, but like Codex Max is definitely like
22:24 carrying that mantle of uh, you know, tackling the hardest bugs. Um, so that
22:28 is super cool. But I will say, some of how we're
22:32 thinking about this is evolving a little bit: from
22:35 just thinking about the model, like, let's just train the best model, to
22:38 really thinking about what an agent actually is overall, right? And you
22:43 know I'm not going to try to define agent exactly but at least the stack
22:46 that we think of it as having is: you have this really smart
22:51 reasoning model that knows how to do a specific kind of task really well. So we
22:53 can talk about how we make that possible. But then actually we need to
22:59 serve that model through an API into a harness. And both of those things also
23:03 have a really big role here. So for instance, one of the things uh that
23:07 we're really proud of is you can have GPT-5.1-Codex-Max work for really long
23:11 periods of time. That's not like normal, but you can set it up to do that or that
23:15 might happen. But now routinely we'll hear about people saying like yeah, it
23:18 ran, like, overnight, or it ran for 24 hours. >> Mm. >> And so, you know, for a model to work
23:22 continuously for that amount of time it's going to exceed its context window
23:25 and so we have a solution for that which we call compaction. Um but compaction is
23:30 actually a feature that uses like all three layers of that stack. So you need
23:36 to have a model that has a concept of compaction and knows like okay as I
23:39 start to approach this context window I might be asked to like prepare to be run
23:43 in a new context window. And then at the API layer, you need an API that like
23:47 understands this concept and like has an endpoint that you can hit to do this
23:50 change. And at the harness layer, you need a harness that can like prepare the
23:53 payload for this to be done. And so shipping this compaction feature, which
23:56 just made this behavior possible for anyone using Codex,
23:59 actually meant working across all three things. And I think that's
24:03 increasingly going to be true. Another maybe underappreciated version of
24:08 this is, if you think about all the different coding products out there,
24:10 they all have very different tool harnesses, with very different
24:14 opinions on how the model should work. And so if you want to train a model to
24:17 be good at like all the different ways uh it could work. Like you know maybe
24:20 you have a strong opinion that it should work using semantic search, right? Maybe
24:24 you have a strong opinion that it should like call bespoke tools or maybe you
24:27 have, like in our case, a strong opinion that it should just use the shell and
24:32 work in the terminal. You know, you can move much faster if
24:34 you're just optimizing for one of those worlds, right? And so the way that we
24:38 built Codex is that it just uses the shell. But in order to make that
24:43 safer and more secure, we have a sandbox that the model is used to operating in.
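The pieces described here, a shell-only tool surface, a sandbox around it, and compaction when the context fills up, can be sketched as a toy harness loop. Everything below (the allowlist, the character budget, the summary format) is a hypothetical stand-in, not how Codex actually implements any of this:

```python
import shlex
import subprocess

CONTEXT_LIMIT = 8000  # hypothetical character budget standing in for a token limit
ALLOWED = {"echo", "ls", "cat", "pwd"}  # toy allowlist standing in for a real sandbox

def run_in_sandbox(command: str) -> str:
    """Run a shell command only if its program is allowlisted. A real harness
    would use OS-level sandboxing rather than a name check."""
    program = shlex.split(command)[0]
    if program not in ALLOWED:
        return f"[blocked: {program} is not permitted in the sandbox]"
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=10)
    return result.stdout + result.stderr

def compact(history):
    """Stand-in for compaction: collapse earlier turns into a short summary so
    the agent can keep working past its context window. In the real system the
    model writes the summary itself, via an API that understands the concept."""
    return [f"[summary of {len(history)} earlier steps]"]

def agent_loop(commands):
    """Toy harness loop: execute each step, compacting when context grows."""
    history = []
    for cmd in commands:
        if sum(len(turn) for turn in history) > CONTEXT_LIMIT:
            history = compact(history)
        history.append(f"$ {cmd}\n{run_in_sandbox(cmd)}")
    return history
```

A real harness would enforce the sandbox at the OS level and have the model produce its own compaction summary; the point of the sketch is only how the model, the API, and the harness each play a role in the same feature.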
24:46 So I think one of the biggest accelerants, to go all the way back to
24:49 your original question, is just that we're building all three
24:52 things in parallel, kind of tuning each one, and, you know,
24:56 constantly experimenting with how those things work with like a tightly
24:59 integrated product and research team. How do you think you win in this space?
25:04 Do you think it'll always be this kind of race with other
25:08 models constantly leapfrogging each other? Do you think there's a world
25:11 where someone just runs away with it and no one else can ever catch up? Is
25:15 there, like, a path to just, "we win"? >> It again comes back to this idea of
25:19 building a teammate. And not just a teammate that participates
25:24 in team planning and prioritization; not just a teammate that really
25:27 tests its code and helps you maintain and deploy; but even a teammate
25:31 that, if you think again of an engineering teammate, can also
25:34 schedule a calendar invite, right, or move standup, or do whatever. And so in
25:42 my mind, if we just imagine that every day or every week some like crazy new
25:46 capability is just going to be deployed by a research lab, it's just impossible
25:50 for us like you know as humans to keep up and like use all this technology. And
25:54 so I think we need to get to this world where you kind of just have like an AI
25:59 teammate or super assistant that you just talk to and it just knows how to be
26:04 helpful, like, on its own, right? And so you don't have to be
26:07 reading the latest tips for how to use it. You've just plugged it in
26:11 and it just provides help. And so that's kind of the shape of what I think we're
26:14 building. And I think that will be like a very sticky like winning product if we
26:18 can do so. So the shape that I have in my head at least... you
26:23 know, maybe a fun topic is: is chat the right interface for AI? I actually
26:27 think chat is a very good interface when you don't know what you're supposed to
26:30 use it for, in the same way that, if I'm on Teams or in
26:34 Slack with a teammate, chat is pretty good. I can ask for whatever I want,
26:37 right? It's like it's kind of the the common denominator for everything. So
26:40 you can chat with a super assistant about whatever topic you want, whether
26:45 it be coding or not. And then if you are like a functional expert in a specific
26:49 domain such as coding, there's, like, a GUI that you can pull up to go really
26:54 deep and look at the code and work with the code. So I think what
26:59 we need to build as OpenAI is basically this idea that you have ChatGPT,
27:02 and that is a tool that's ubiquitously available to everyone.
27:06 You start using it even like outside of work right to just help you. You become
27:09 very comfortable with the idea of being accelerated with AI. And so then you get
27:13 to work and you just can naturally just yeah I'm just going to ask it for this
27:16 and I don't need to know about all the connectors or like all the different
27:19 features. I'm just going to ask it for help, and it'll surface to me the
27:23 best way that it can help at this point in time and maybe even chime in when I
27:27 didn't ask it for help. Um, so in my mind, if we can get to that, I think
27:30 that's, you know, that's how we we really build like the winning product.
27:34 This is so interesting, because in my chat with Nick Turley, the head of
27:37 ChatGPT, I think he shared that the original name for ChatGPT was super
27:41 assistant or something like that. >> Yeah. >> And it's interesting that there's, like,
27:46 that approach to the super assistant, and then there's this Codex approach. It's
27:49 almost like the B2C version and the B2B version. And what I'm hearing is the
27:53 idea here is okay, you start with coding and building and then it's doing all
27:56 this other stuff for you, scheduling meetings, I don't know, probably posting
28:01 in Slack, uh I don't know, shipping designs, I don't know. Is that is the
28:04 idea there? This is like the business version of ChatGPT, in a sense.
28:08 Or is there or is there something else there? >> Yeah. So, you know, so we're getting to
28:12 the like the like one-year time horizon conversation. A lot of this might happen
28:16 sooner, but in terms of fuzziness, I think we're at the one year. So I'll
28:19 give you, like, the contention and a plausible way we get there, but as for
28:23 how it happens, who knows? So basically, if we're going to build a super
28:26 assistant, it has to be able to do things, right? So like we're going to
28:29 have a model and it's going to be able to do stuff affecting your world.
28:33 >> And one of the learnings I think we've seen over the past year or so is that
28:38 for models to do stuff, they're much more effective when they can use a
28:41 computer, right? Okay. So now we're like, okay, we need the super assistant that can use a
28:47 computer, right? Or many computers. And now the question is, okay, well, how
28:50 should it use the computer, right? And there's lots of ways to use a computer.
28:54 Uh, you know, you could try to hack the OS and like use accessibility APIs.
28:57 Maybe a bit easier is you could point and click. That's a little slow, you
29:02 know, and unpredictable sometimes. And it turns out the
29:06 best way for models to use computers is simply to write code, right? And so
29:09 we're kind of getting to this idea where like, well, if you want to build any
29:12 agent, maybe you should be building a coding agent. And maybe to the user, a
29:17 nontechnical user, they won't even know they're using a coding agent, the same
29:19 way that no one thinks about whether they're using the internet or not;
29:22 it's more just, is the Wi-Fi on, right? So I think that what we're doing
29:27 with Codex is we're building a software engineering teammate. And as part of
29:30 that, we're kind of building an agent that can use uh a computer by writing
29:36 code. And so we're already seeing like some pull for this. It's like quite
29:39 early, but we're starting to see people who are using Codex for
29:43 coding-adjacent purposes. And so as that develops, I think we'll
29:47 just naturally see that like, oh, it turns out like we should just always
29:50 have the agent write code if there is a coding way to solve a problem instead
29:53 of, you know, even if you're doing a financial analysis, right? Like maybe
29:56 write some code for that. So basically, you know, you were asking, hey, are
29:59 these like the two ends of this product, of the super assistant, right,
30:03 of ChatGPT? In my mind, coding is a core competency of any agent,
30:06 including ChatGPT. And so what we think we're really building is
30:10 that competency. But here's the really cool thing about
30:13 agents writing code: you can import code, right? Code is
30:19 composable, interoperable, right? Because one very reductive
30:23 view we could have of an agent is that it's just going to be given a computer,
30:26 and it's just going to point and click and go around. But maybe that
30:32 is the future, and then how we get there is difficult to sort of chart a path,
30:36 because a lot of the questions around building agents aren't like can the
30:41 agent do it but it's more about well how can we help the agent understand the
30:44 context that it's working in. And the team that's using it, you know,
30:47 probably has a way that they like to do things; they have guidelines; they
30:50 probably want certain deterministic guarantees about what the agent can or
30:54 cannot do or they want to know that the agent understands sort of this detail
30:59 like, an example would be, you know, if we're looking at a crash-reporting tool,
31:04 hitting a connector for it: every sub-team probably has a different
31:07 meta-prompt for how they want the crashes to be analyzed, right? And
31:12 so we start to get to this thing where like, yeah, we have this agent sitting
31:15 in front of a computer, but we need to make that configurable for the team or
31:19 for the user, right? And, like, stuff that the agent does often, we
31:22 probably just want to build in as a competency that this agent
31:27 has. So I think we end up with this generalizable thing that you were saying
31:31 of like an agent that can just write its own scripts for whatever it wants to do.
31:36 But I think the really key part here is: can we make it so that
31:40 everything that the agent has to do often, or that it does well, we can just
31:44 remember and store, so that the agent doesn't have to write a script for
31:47 that again, right? Or maybe, if I just joined a team and you're already
31:51 on the same team as me, I can just use all those scripts that the agents
31:53 have already written. >> Yeah. It's like, if this is our teammate,
31:57 it can share things that it's learned from working with other
32:00 people at the company. It just makes sense as a metaphor. >> Yeah. It feels like you're in the
32:05 Karpathy camp of agents today are not that great and mostly slop and maybe in
32:09 the future they'll be awesome. Does that resonate? >> I think so. I think coding agents are
32:14 pretty great. I think >> uh ton of value, >> right? Yep.
32:19 >> And then I think like agents outside of coding, it's still like very early and
32:23 you know, this is just my opinion, but I think they're going to get a whole lot
32:26 better once they can use coding too and like in a composable way.
32:29 This is kind of the fun part of building for software
32:33 engineers. Like, at my startup, we were building for software engineers too, for
32:36 a lot of that journey, and they're just such a fun audience to build for, because
32:41 they also like building for themselves and are often even more
32:45 creative than we are in thinking about how to use the technology. And so,
32:48 like by building for software engineers you get to just observe a ton of
32:52 emergent behaviors and like things that you should do and build into the
32:55 product. >> I love how you say that, because a lot of people building for
32:57 engineers get really annoyed, because the engineers are just always
33:00 complaining about stuff. They're like, "Ah, that sucks. Why'd you build it this
33:04 way?" I love that you enjoy it, but I think it's probably because you're
33:06 building such an amazing tool for engineers that can actually solve
33:11 problems and just, you know, code for them. Um, kind of along those lines, you
33:15 know, there's always this talk of what will happen with jobs, engineers,
33:18 coding, do you have to learn coding, all these things? Uh clearly the way you're
33:21 describing it is it's a teammate. It's going to work with you, make you more
33:24 superhuman. It's not going to replace you. What's the way you think about
33:28 the impact on the field of engineering of having this super-intelligent
33:33 engineering teammate? >> I think there's two sides to it, but the one we
33:37 were just talking about is this idea that maybe every agent should actually
33:43 use code and be a coding agent. And in my mind, that's just like a small part
33:46 of this like broader idea that like, hey, as we make code even more
33:48 ubiquitous, I mean, you could probably claim it's ubiquitous today, even pre
33:51 AAI, right? But as we make code even more ubiquitous, it's actually just
33:56 going to be used for many more purposes. And so there's just going to be a ton
33:59 more need for humans with this competency. So that's
34:05 my view. I think this is like quite a complex topic. So, you know, it's
34:08 something we talk about a lot and we have to kind of see how it pans out. But
34:12 I think what we can do, basically, as a product team building in
34:15 the space, is just try to always think about how we are building a tool so that
34:18 it feels like we're maximally accelerating people, rather
34:24 than building a tool that makes it more unclear what you should do as the
34:29 human, right? Like, to give an example: right now,
34:33 when you work with a coding agent, it writes a ton of code. But it
34:36 turns out writing code is actually one of the most fun parts of software
34:40 engineering for many software engineers. And so then you end up reviewing AI code,
34:45 right? And that's often a less fun part of the job for many software engineers,
34:49 right? And so I actually think this plays
34:53 out all the time in a ton of micro-decisions. And so we as a product team
34:55 are always thinking about, okay, how do we make this more fun? How do we make
34:58 you feel more empowered? Where is it not working? And I would argue that
35:01 reviewing agent-written code is a place that today is less fun. And
35:06 so then I think, okay, what can we do about that? Well, we can ship a
35:09 code review feature that helps you build confidence in the AI-written
35:12 code. Okay, cool. Another thing we can do is we can make it so
35:14 that the agent's like better able to validate its work. And you know, it gets
35:18 all the way down into micro-decisions. Like, if you're going to give
35:23 an agent the capability to validate work, and, let's say, I'm thinking
35:27 of Codex web right now, you have a pane that sort of reflects the work the
35:30 agent did: what do you see first? Do you see the diff, or do you see the image
35:34 preview of the code it wrote, right? And if you're thinking
35:36 about this from the perspective of, how do I empower the human, how do I make them
35:40 feel as accelerated as possible, you obviously see the image first,
35:43 right? You shouldn't be reviewing the code unless you've first seen
35:46 the image, unless maybe it's been reviewed by an AI and now it's time for
35:49 you to take a look. >> When I had Michael Truell, the CEO of Cursor, on the
35:53 podcast, he had this kind of vision of us moving to something beyond code.
35:58 And I've seen the rise of something called spec-driven development, where
36:02 you kind of just write the spec and then the code, you know, the AI writes code
36:05 for you. And so you kind of start working at this higher abstraction
36:09 level. Is that something you see where we're going? Just like engineers not
36:12 having to actually write code or look at code and there's going to be this higher
36:16 level of abstraction that we focus on. Yeah, I mean I think I think there's
36:19 like constantly these levels of abstraction and they're actually already
36:23 played out today, right? Like today like coding agents mostly it's like prompt to
36:29 patch right we're starting to see people doing like spec driven development or
36:32 like, plan-driven development. That's actually one of the ways, when people ask,
36:35 hey, how do you run Codex on a really long task: it's often,
36:38 collaborate with it first to write a plan.md, a markdown file that's
36:42 your plan, and once you're happy with that, then ask it to go off and do
36:46 work. And if that plan has verifiable steps, it'll work for much longer.
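As a concrete illustration, a plan file with verifiable steps might look something like this (a made-up example; the file name and structure are just conventions you agree on with the agent):

```markdown
# Plan: add rate limiting to the public API

1. [ ] Add a rate-limiter middleware with a configurable requests-per-minute cap.
2. [ ] Wire it into the request pipeline behind a feature flag.
3. [ ] Verify: the test suite passes, and a burst of requests over the cap returns 429.
4. [ ] Update the API docs to mention the new limits.
```

The explicit checks in step 3 are the kind of verifiable steps that let the agent confirm its own progress and keep working longer.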
36:51 Um so we're totally seeing that. I think spec driven development is like an
36:55 interesting idea. It's not clear to me that it'll work out that way because a
36:57 lot of people don't like writing specs either, but it seems
37:02 plausible that some people will work that way. You know, a bit of
37:06 a joke idea, though: if you think of the way that many teams work
37:11 today, they often don't necessarily have specs, but the team is
37:14 just really self-driven, and so stuff just gets done. And so, almost, that is
37:17 (I'm coming up with this on the spot, so it's not a good name, but)
37:21 chatter-driven development, where stuff is just happening, you
37:24 know, on social media and in your team communication tools, and then, as a
37:28 result, code gets written and deployed, right? So, yeah, I think I'm a
37:33 little bit more oriented that way. You know, I don't even necessarily want
37:37 to have to write a spec. Sometimes I will, but only if I like writing specs,
37:42 right? Other times I might just want to say, hey, here's the
37:45 customer, you know, service channel and like tell me what's interesting to know,
37:49 but if it's a small bug, just fix it. I don't want to have to write a spec for
37:51 that, right? >> I have this sort of hypothetical future that I like to
37:58 share sometimes with people as a provocation, which is: in a world
38:01 where we have truly amazing agents, what does it look like to be a
38:04 solopreneur? And, you know, one terrible idea for how it could look is that
38:12 there's a mobile app, and every idea that the agent has is
38:17 just, like, vertical video on your phone, and then you can swipe left if you
38:21 think it's a bad idea and you can like swipe right if it's a good idea and like
38:24 you can press and hold and like speak to your phone if you want to get feedback
38:28 on the idea before you swipe, you know. So in this world, basically, your
38:31 job is just to plug this app into every single, like, signal
38:36 system, you know, system of record, and then you just sort of sit back and
38:39 swipe. I don't know. >> I love this. So this is like Tinder
38:42 meets TikTok meets Codex. >> It's pretty terrible. >> No, this is great. So the idea here is
38:47 this agent is watching, right, listening to you, paying attention
38:51 to the market, your users, and it's like, cool, here's something I should do. It's
38:54 like a proactive engineer: just, hey, we should build this feature, fix this
38:56 thing. >> Exactly. And I think they're communicating with you in, like, the
39:05 Gen Z way, the modern way to communicate. >> Yeah. >> Swipe left or right, in a vertical feed,
39:10 and then the Sora video. Okay. So I see how this all connects now. I see.
39:13 >> Yeah. To be clear, we're not building that, but, you know, it's a fun idea.
39:17 I mean, in this example, though, one of the things
39:19 that it's doing is consuming external signals, right? I think the
39:23 other really interesting thing is like if we think about like what is the most
39:28 successful AI product to date... I would argue... it's funny, actually,
39:34 not to confuse things at all, but the first time we used the brand
39:38 Codex at OpenAI was actually for the model powering GitHub Copilot. This is
39:42 way back in the day, years ago. And so we decided to reuse that brand
39:45 recently, because it's just so good, you know: Codex, code execution. But I
39:50 think, actually, autocompletion in IDEs is one of the most
39:54 successful AI products to date. And part of what's so magical about it is that
40:01 it can surface ideas for helping you really rapidly. When it's
40:05 right, you're accelerated. When it's wrong, it's not that annoying. It
40:08 can be annoying, but it's not that annoying, right? And so you can create
40:12 this mixed-initiative system that's contextually responding to
40:17 what you're attempting to do. And so, in my mind, this is a really
40:21 interesting thing for us at OpenAI as we're building. So, for instance, you
40:25 know, when I think about launching a browser, which we did with Atlas, right?
40:29 Like in my mind, one of the really interesting things we can then do is we
40:33 can then like contextually surface like ways that we can help you as you're
40:37 going about your day, right? And so we break out of this like, you know, we're
40:41 just looking at code or we're just in your terminal um into this idea that
40:44 like, hey, like a real teammate is dealing with a lot more than just code,
40:47 right? They're dealing with a lot of things that are web content. So like,
40:51 you know, how can we help you with that? >> Man, there's so much there and I love
40:55 this. Okay, so autocomplete on web with the browser. That's so interesting. just
40:58 like here's all the things that we can help you with as you're browsing and
41:01 going about your day. I want to talk about Atlas; I'll come back to that.
41:05 Codex, code execution: did not know that. That's really clever. I get it
41:10 now. Okay. And then this chatter... what was it, chatter-driven development?
41:14 No, this is a really good idea, but it reminds me: I had Dhanji Prasanna on the
41:19 podcast, the CTO of Block, and they have this product called Goose, which is
41:24 their own internal agent thing. And he talked about an engineer at Block who just
41:30 has Goose watch his screen and listen to every meeting, and it
41:36 proactively does work that he will probably want done. So it ships a PR,
41:41 sends an email, drafts a Slack message. So he's doing exactly what you're
41:44 describing, in kind of a very early way. >> Yeah, that's super interesting. And, you
41:49 know, I bet if we went and asked them what the bottleneck
41:52 to that productivity is... did they share what it is? >> Uh, probably looking at it, just making
41:57 sure this is the right thing to do. Yeah. >> Yeah. So we see this now. We
42:01 have a Slack integration for Codex. People love it. You know, if there's
42:04 something that you need done quickly, people just at-mention Codex:
42:07 why do you think this bug is happening? Right? It doesn't have to be an engineer.
42:10 Even, you know, data scientists here are often using Codex a ton to just
42:14 answer questions, like, why do you think this metric moved? What happened?
42:18 Those questions, you get the answer right back in Slack. It's amazing, super
42:22 useful. But as for when it's writing code, then you have to go back
42:27 and look at the code, right? And so the real bottleneck right now,
42:30 I think, is validating that the code worked, and doing code review.
42:34 So in my mind, if we wanted to get to something like that
42:38 world your friend was talking about, I think we really need to figure out
42:42 how to get people to configure their coding agents to be much more autonomous
42:46 on those later stages of the work. >> It makes sense. Like you said, writing code...
42:49 I was an engineer for 10 years. Really fun to write code.
42:53 Really fun to just get in the flow, build, architect, test. Not so fun to
42:56 look at everyone else's code and just have to go through and be on the hook if
43:00 it is doing something dumb that's going to take down production. And now that
43:03 building has become easier, what I've always heard from companies that are
43:06 really at the cutting edge of this is the bottleneck is now like figuring out
43:09 what to build, and then, at the end, it's like, okay, we have all this, 100
43:13 hours to review. Who's going to go through all that? >> Right. Yeah.
43:19 This episode is brought to you by Jira product discovery. The hardest part of
43:22 building products isn't actually building products. It's everything else.
43:26 It's proving that the work matters, managing stakeholders, trying to plan
43:30 ahead. Most teams spend more time reacting than learning, chasing updates,
43:34 justifying road maps, and constantly unblocking work to keep things moving.
43:39 Jira product discovery puts you back in control. With Jira product discovery,
43:43 you can capture insights and prioritize high impact ideas. It's flexible, so it
43:47 adapts to the way your team works and helps you build a road map that drives
43:51 alignment, not questions. And because it's built on Jira, you can track ideas
43:56 from strategy to delivery, all in one place. Less chasing, more time to think,
44:01 learn, and build the right thing. Get Jira Product Discovery for free at
44:06 atlassian.com/lenny. That's atlassian.com/lenny. >> What has the impact of Codex been on the
44:13 way you operate as a product person, as a PM? It's clear how engineering is
44:19 impacted. Uh, code is written for you. What has it done to the way you operate,
44:24 the way PMs operate at at OpenAI? Yeah, I mean I think mostly I just feel like
44:28 much more empowered. I've always been sort of a more technical-leaning PM, and, especially when
44:34 I'm working on products for engineers, I feel like it's necessary to
44:37 dogfood the product. But even beyond that, I just feel like I can
44:42 do much, much more as a PM. And, you know, Scott Belsky talks about this idea
44:45 of like compressing the talent stack. I'm not sure if I've phrased that right,
44:48 but it's basically this idea that like maybe the boundaries between these roles
44:52 are a little bit like less needed than before because people can just do much
44:57 more. And every time someone can do more, you can skip one communication
45:00 boundary and make the team like that much more efficient, right? So I think I
45:07 think we see it you know in a bunch of functions now but I guess since you
45:11 asked about product specifically: you know, now, answering questions is
45:15 much, much easier. You can just ask Codex for thoughts on something. A lot of
45:20 PM-type work, understanding what's changing: again, just ask Codex for help
45:25 with that. Prototyping is often faster than writing specs; this is something
45:29 that a lot of people have talked about. And something that's
45:33 slightly surprising is: we're mostly
45:36 building Codex to write code that's going to be deployed
45:40 to production, but actually we see a lot of throwaway code written with Codex
45:43 now. It's kind of going back to this idea of, you know, ubiquitous code.
45:48 So you'll see, you know, someone wants to do an analysis. If I want to
45:51 understand something, it's like, okay, just give Codex a bunch of data, but then ask
45:54 it to build an interactive data viewer for that data, right?
45:56 That was just too annoying to do in the past, but now it's
46:00 totally worth the time of just getting an agent to go do something.
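The kind of throwaway tool being described can be tiny. Here is a sketch of a one-off data viewer an agent might generate (entirely hypothetical, not anything Codex actually emits):

```python
import html
import tempfile

def quick_viewer(rows):
    """Dump a list of records into a throwaway HTML table you can open in a
    browser. One-off code that wasn't worth writing by hand before agents."""
    cols = list(rows[0].keys())
    header = "".join(f"<th>{html.escape(c)}</th>" for c in cols)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in cols) + "</tr>"
        for r in rows
    )
    page = f"<table border='1'><tr>{header}</tr>{body}</table>"
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write(page)
        return f.name  # open this path in a browser to inspect the data

path = quick_viewer([
    {"metric": "signups", "delta": "+12%"},
    {"metric": "churn", "delta": "-3%"},
])
print(path)
```

The point is disposability: the script exists only to answer one question, and nobody expects to maintain it afterward.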
46:04 Similarly, I've seen some pretty cool prototypes on our design team.
46:09 Like, a designer basically wanted to build an animation,
46:13 the coin animation in Codex, and normally it'd be too
46:17 annoying to program this animation. So they just vibe-coded an animation editor
46:21 and then used the animation editor to build the animation, which they then
46:25 checked into the repo. Actually, with our designers, there's a ton of
46:28 acceleration there. And, speaking of compressing the talent stack, I think our
46:31 designers are very PM-y. So, you know, they do a ton of product work. And they actually
46:38 have, like, an entire vibe-coded side prototype of the Codex app. And
46:41 so, a lot of how we talk about things is: we'll have a really quick jam,
46:44 because there's, like, 10,000 things going on, and then the designer will go think
46:48 about how this should work. But instead of talking about it again, they'll
46:50 just vibe-code a prototype of that in their standalone prototype.
46:54 We'll play with it. If we like it, they'll vibe-code that prototype, or
46:59 vibe-engineer that prototype, into an actual PR to land. And then, depending on
47:02 their comfort with the codebase (the Codex CLI, in Rust, is a little harder),
47:06 maybe they'll land it themselves, or they'll get close and then an
47:09 engineer can help them land the PR. You know, we recently shipped the
47:15 Sora Android app, and that was one of the most mind-blowing examples of acceleration. Usage of Codex internally at OpenAI is obviously really, really high, but it's been growing over the course of the year, both in that now basically all technical staff use it, and in that the intensity and know-how of making the most of coding agents has gone up by a ton. So the Sora Android app, a fully new app: we built it in 18 days, going from zero to launching to employees, and then 10 days later, 28 days total, we went GA to the public. And that was done with the help of Codex.
47:56 So pretty insane velocity. I would say it was, I don't want to say easy mode, but there is one thing Codex is really good at: if you're a company building software on multiple platforms, you've already figured out some of the underlying APIs or systems, and asking Codex to port things over is really effective, because it has something it can go look at. The engineers on that team were basically having Codex go look at the iOS app, produce plans of the work that needed to be done, and then go implement those. It was kind of looking at iOS and Android at the same time. So basically it was two weeks to launch to employees, four weeks total. Insanely fast.
48:30 >> What makes that even more insane is that it became the number one app in the App Store. This just boggles the mind. Imagine releasing the number one app on the App Store with a handful of engineers...
48:45 >> I think it was two or three, possibly.
48:53 >> ...in a handful of weeks. This is absurd.
>> Yeah, so that's a really fun example
49:01 of acceleration. And then Atlas was the other one. I think Ben, the engineer on Atlas, did a podcast sharing a little bit of how we built it. Atlas is a browser, and building a browser is really hard. We had to build a lot of difficult systems to do that, and we got to the point where that team has a ton of power users of Codex. We were talking to them about it, because a lot of those engineers are people I used to work with before at my startup, and they'd say: before, this would have taken us two to three weeks for two to three engineers, and now it's one engineer, one week. So massive acceleration there as well. And what's quite cool is
49:52 that we shipped Atlas on Mac first, but now we're working on the Windows version. The team is now ramping up on Windows, and they're helping us make Codex better on Windows too, which is admittedly earlier: the model we shipped last week is the first model that natively understands PowerShell, PowerShell being the native shell language on Windows. So it's been really awesome to see the whole company getting accelerated by Codex: most obviously research, improving how quickly we train models and how well we do it, but also design, as we talked about, and even marketing. We're actually at the point now where my product marketer is often making string changes directly from Slack, or updating docs directly from Slack.
50:36 >> These are amazing examples. You guys are
50:42 living at the bleeding edge of what is possible, and this is how other companies are going to work. Shipping, again, what became the number one app in the App Store, beloved, it just took over the world for at least a week. Built, you said, in 28 days, and 18 days just to get the core of it working.
51:02 >> Yeah. In 18 days we had a thing that employees were playing with, and then 10 days later we were out.
>> And you said just a couple engineers.
51:07 >> Yeah.
>> Two or three. Okay. And then Atlas, you said, took a week to build?
51:11 >> No, no, no. Not the whole thing in a week; Atlas was a really meaty project.
>> Yeah.
51:18 >> I was talking to one of the engineers on Atlas about what they use Codex for, and it's basically: we use Codex for absolutely everything. I asked, okay, how would you measure the acceleration? And the answer I got back was: previously it would have taken two to three weeks for two to three engineers, and now it's one engineer, one week.
51:36 >> Do you think this eventually moves to
51:39 non-engineers doing this sort of thing? Does it have to be an engineer building it, or could it have been built by, I don't know, a PM or a designer?
51:46 >> I think we will very much get to the point where the boundaries are a little blurred. You're going to want someone who understands the details of what they're building, but which details those are will evolve. Kind of like how, if you're writing Swift now, you don't have to speak assembly. There's a handful of people in the world who do, and it's really important that they exist, maybe more than a handful, but that's a specialized function most companies don't need to have. So I think we're just going to naturally see an increase in layers of abstraction. And the cool thing is that now we're entering the natural-language layer of abstraction, and natural language itself is really flexible: you could have engineers talking about a plan, then engineers talking about a spec, then engineers talking about just a product or an idea. So I think we can also start moving up those layers of abstraction. But I do think this is going to be gradual. I don't think all of a sudden nobody ever writes any code and it's just specs. I
52:49 think it's going to be much more like: okay, we've set up our coding agent to be really good at previewing the build, or at running tests; maybe that's the first part most people have set up. Then we've set it up so it can execute the build and see the results of its own changes, but we haven't yet built a good integration harness. So, in the case of Atlas (by the way, I don't know exactly how much of this they've done, though I think a lot), maybe the next stage is to enable it to load a few sample pages to see how well those work. And for some time, at least, we're going to have humans curating which of these connectors or systems or components the agent needs to be good at talking to. Then, in the future, there will be an even greater unlock, where Codex tells you how to set it up, or maybe sets itself up in a repo.
53:34 >> What a wild time to be alive. Wow. I'm
53:38 curious about the second-order effects of this sort of thing, of just how quick it is to build stuff. What does that do? Does distribution become much, much more important? Are ideas just worth a lot more? It's interesting to think about how that changes.
53:50 >> I'm curious what you think. I still don't think ideas are worth as much as a lot of people think; I still think execution is really hard. You can build something fast, but you still need to execute well on it. It still needs to make sense and be a coherent thing overall. And distribution is massive.
54:10 >> Yeah. It just feels like everything else is now more important, everything that isn't the building piece, which is...
>> Coming up with an idea, getting to market, profit...
54:17 >> All that kind of stuff. I think we might have been in this weird, temporary phase where, for a while, it was so hard to build product that you mostly just had to be really good at building product, and maybe it didn't matter whether you had an intimate understanding of a specific customer. But now we're getting to the point where, if I could only go in with one core competency, it would be a really meaningful understanding of the problems a certain customer has. I think that's ultimately still what's going to matter most. If you're starting a new company today and you have a really good understanding of, and network among, customers who are currently underserved by AI tools, I think you're set. Whereas if you're good at building, say, websites, but you don't have a specific customer to build for, I think you're in for a much harder time.
55:17 >> Bullish on vertical AI startups is what I'm hearing.
>> Yeah, I completely agree. There's the general thing that can solve a lot of problems, and then there's: we're going to solve presentations incredibly well, we're going to understand the presentation problem better than anyone, and we're going to plug into your workflows and all the other things that matter for that very specific problem.
55:33 >> Okay. Incredible. When you think about progress on Codex, I imagine you have a bunch of evals, and there are all these public benchmarks. What's something you look at to tell you, okay, we're making really good progress? I imagine it's not going to be one thing, but what do you focus on? What's something you're trying to push, a KPI or two?
55:51 >> One of the things I'm constantly reminding
55:56 myself of is that a tool like Codex is naturally one you become a power user of. So we can accidentally spend a lot of our time thinking about features that are very deep in the user adoption journey, and end up over-solving for that. So I think it's critically important to go look at your D7 retention, and to just go try the product: sign up from scratch again. I have a few too many ChatGPT Pro accounts that I've signed up for on my Gmail in order to dogfood maximally correctly, and they charge me 200 bucks a month; I need to expense those. But the feeling of being a user and the early retention stats are still super important for us, because as much as this category is taking off, I think we're still in the very early days of people using these tools.
56:48 Another thing we do: I think we might be the most user-feedback-and-social-media-pilled team in this space. A few of us are constantly on Reddit and Twitter. There's praise up there and there's a lot of complaints, but we take the complaints very seriously and look at them, because you can use a coding agent for so many different things, so it often is kind of broken in all sorts of ways for specific behaviors. So we monitor what the vibes are on social media pretty often. Twitter, or X, is a little more hypey, and Reddit is a little more negative, but real, actually. So I've started paying increasing attention to how people are talking about using Codex on Reddit, actually.
57:41 >> This is important for people to know. Which subreddits do you check most? Is there an r/codex?
57:44 >> I mean, the algorithm is pretty good at surfacing stuff, but yes, r/codex is there.
57:48 >> Okay, very interesting. And if people tag you on Twitter, you still see that, but maybe it's not as powerful as seeing it on Reddit?
57:58 >> Well, the thing with Twitter is that it's a little more one-to-one, even if it's in public, whereas Reddit has really good upvoting mechanics, and maybe most people are still not bots, unclear. So you get good signal on what matters and what other people think.
58:09 >> Interestingly, I want to talk briefly about Atlas. You guys launched Atlas, and I tweeted that I tried it and didn't love the AI-only search experience. Sometimes I just want Google rather than waiting for AI to give me an answer, and there was no way to switch. So I tweeted, hey, I'm switching back, it's not great. I feel like I made some PMs at OpenAI sad, and then I saw someone tweet, okay, we have this now, which I imagine was always part of the plan. It's probably an example of: we've just got to ship stuff, see how people use it, and then figure it out. So one, I don't know if there's anything there, and two, I'm just curious: why are you guys building a web browser?
58:51 >> So, I worked on Atlas for a bit; I don't work on it now. But a bit of the narrative here, just to tell my story: I was working on this screen-sharing, pair-programming startup, and then we joined OpenAI, and the idea was really to build a contextual desktop assistant. The reason I believe that's so important is that it's really annoying to have to give all your context to an assistant and then figure out how it can help you. If it could just understand what you're trying to do, then it could maximally accelerate you. I still think of Codex as a contextual assistant, from a slightly different angle, starting with coding tasks. But some of the thinking, at least for me personally, I can't speak for the whole project, was that a lot of work is done on the web, and if we could build a browser, then we could be contextual for you in a much more first-class way. We wouldn't be hacking around other desktop software, which has very varied support for what content it renders to the accessibility tree. We wouldn't be relying on screenshots, which are a little slower and unreliable. Instead, we could be in the rendering engine and extract whatever we needed to help you. And also, I like to think of
60:09 video games. I don't know if you've played, say, Halo: you walk up to an object, and this is true for many games, you press, man, it's been a long time, this is embarrassing, you press X, and it just does the right thing. I was one of those guys who always read the instruction manual for every video game I bought, and I remember the first time I read about a contextual action; I thought it was a really cool idea. The thing about a contextual action is that we need to know what you are attempting to do, we need a little bit of context, and then we can help. And I think this is critically important, because imagine the world we reach where agents are helping you thousands of times per day. Imagine if the only way we could tell you that we helped is to push-notify you, so you get a thousand push notifications a day of an AI saying, "Hey, I did this thing. Do you like it?" It'd be super annoying, right?
61:03 Whereas, going back to software engineering, imagine I was looking at a dashboard and noticed some key metric had gone down. At that point in time, an AI could go take a look and surface the fact that it has an opinion on why the metric went down, and maybe a fix, right there, right when I'm looking at the dashboard. That would keep me much more in flow, and it would enable the agent to take action on many more things. So in my mind, part of why I'm excited for us to have a browser is that we then have much more context around what we should help with, and users have much more control over what they want us to look at. If you want us to take action on something, you can open it in your AI browser; if you don't, you can open it in your other browser. Really clear control and boundaries. And then we have the ability to build mixed-initiative UX, so we can surface contextual actions to you at the times they're helpful, as opposed to just randomly notifying you.
61:58 >> Hearing the vision for Codex being
62:01 the super assistant: it's not just there to code for you, it's trying to do a lot for you as a teammate, this kind of super teammate that makes you awesome at work. So I get this. Speaking of that, are there other common non-engineering use cases for Codex, beyond what we talked about, designers prototyping and building stuff? Any fun or unexpected ways people who aren't engineers are using Codex?
62:22 >> There are loads of unexpected ways, but I think most of where we're seeing real traction is still, for now, very coding-adjacent, or sort of tech-oriented places where there's a mature ecosystem, or maybe you're doing data analysis or something like that. I'm personally expecting we'll see a lot more of that over time, but for now we're keeping the team very focused on just coding, because there's so much more work to do.
62:54 >> For people that are thinking about
62:58 trying out Codex: does it work for all kinds of codebases? What code does it support? If you're at, I don't know, SAP, can you add Codex and start building things? What's the sweet spot, and where does it start to not be amazing yet?
63:11 >> I'm really glad you asked this, because the best way to try Codex is to give it your hardest tasks, which is a little different from some other coding agents. With some tools you might think, okay, let me start easy, or just vibe code something random, and decide if I like the tool. Whereas we're really building Codex to be the professional tool you can give your hardest problems to, the one that writes high-quality code in your enormous codebase that is, in fact, not perfect right now. So if you're going to try Codex, you want to try it on a real task you have, and not dumb that task down to something trivial. A good example: you have a hard bug and you don't know what's causing it, and you ask Codex to help figure that out, or to implement the fix.
64:02 >> I love that answer. Just give it your hardest problem.
64:05 >> I will say, if your hardest problem is, hey, I need to build a new unicorn business, obviously that's not going to work, not yet. So give it the hardest problem that is still one question or one task, to start, if you're testing it. Then over time you can learn how to use it for bigger things.
>> Yeah. What languages does it
64:28 support?
>> Basically, the way we've trained Codex, there's a distribution of languages we support, fairly aligned with the frequency of these languages in the world. So unless you're writing some very esoteric language, or some private language, it should do fine in your language.
64:42 >> If someone was just getting started, is there a tip you could share to help them be successful? If you could whisper a little tip to someone setting up Codex for the first time, to help them have a really good time, what would you whisper?
64:53 >> I might say, try a few things in
64:57 parallel. You could try giving it a hard task. Maybe ask it to understand the codebase. Formulate a plan with it around an idea you have, and build your way up from there. The meta idea here is, again, that you're building trust with a new teammate. You wouldn't go to a new teammate and just say, hey, do this thing, here's zero context. You'd start by making sure they understand the codebase, then maybe align on an approach, then have them go off and do things bit by bit. If you use Codex in that way, you'll naturally start to understand the different ways of prompting it, because it is a super powerful agent and model, but it is a little bit different to prompt Codex than other models.
65:38 >> Just a couple more questions. One, we touched on this a
65:44 little bit: as AI does more and more coding, there's always this question of, should I learn to code? Why should people spend time doing this? For people trying to figure out what to do with their career, especially if they're into software engineering or computer science, do you think there are specific elements of computer science that are more and more important to lean into, and maybe things they don't need to worry about? What should people be leaning into, skill-wise, as this becomes more and more of a thing in our workplace?
66:11 >> I think there are a couple of angles you could come at this from. The easiest one to think of, at
66:24 least, is just: be a doer of things. With coding agents getting better and better over time, what you can do, even as someone in college or a new grad, is just so much more than it was before, and I think you want to take advantage of that. When I'm looking at hiring folks who are earlier in their careers, something I definitely think about is how productive they are with the latest tools; they should be super productive. And if you think of it that way, they actually have less of a handicap than before versus a more senior person, because the divide is getting smaller now that they have these amazing coding agents. So that's one piece of advice: learn about whatever you want, but make sure you spend time doing things, not just fulfilling homework assignments. I guess
67:12 I think the other side of it, though, is that it's still deeply worth understanding what makes a good overall software system. Skills like really strong systems engineering, or really effective communication and collaboration with your team, are going to continue to matter for quite some time. I don't think AI coding agents are suddenly going to build perfect systems without your help. It's going to look much more gradual: okay, we have these AI coding agents, and they're able to validate their own work, and that setup is still important. For example, I'm thinking of an engineer who was working on Atlas, since we were talking about it. He set up Codex so it can verify its own work, which is a little bit non-trivial because of the nature of the Atlas project. The way he did that was to prompt Codex: hey, why can't you verify your work? Fix it. And he did that on a loop.
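The "fix your own verification" loop he describes can be sketched roughly like this. This is a hypothetical illustration, not the Atlas team's actual setup: `run_agent` and `verification_passes` are stand-ins for invoking a coding agent with that prompt and for running your build or test suite.

```python
def self_verification_loop(run_agent, verification_passes, max_attempts=5):
    """Ask the agent to remove whatever blocks its own verification.

    run_agent(prompt) and verification_passes() are placeholders for,
    e.g., a Codex CLI invocation and a build/test run. Returns the
    number of agent runs it took, or None if still blocked.
    """
    prompt = ("Try to verify your own work in this repo (build it, run the "
              "tests). If you can't, figure out why and fix whatever is "
              "blocking verification.")
    for attempt in range(max_attempts):
        if verification_passes():
            return attempt        # verification works; agent ran `attempt` times
        run_agent(prompt)         # let the agent try to unblock itself
    return None                   # still blocked; a human needs to step in
```

The bounded retry count is the design choice that matters here: the agent gets a few chances to unblock itself, and when it can't, the loop hands the problem back to a human rather than running forever.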
68:11 So at various phases you're still going to want a human in the loop to help configure the coding agent to be effective, and you still want to be able to reason about that. Maybe it's less important that you can type really fast, or that you know exactly how to write a for-each loop (not that anyone writes those by hand anymore), or how to implement a specific algorithm. But you need to be able to reason about the different systems, and about what makes a software engineering team effective. So I think
68:40 that's the other really important thing. And then maybe the last angle you could take is: if you're on the frontier of knowledge for a given thing, I still think that's deeply interesting to go down, partially because agents aren't going to be as good at that knowledge, but also partially because, by trying to advance the frontier of a specific thing, you'll end up being forced to take advantage of coding agents and use them to accelerate your own workflow as you go.
69:09 >> What's an example of what you mean by being at the frontier?
69:12 >> Codex writes a lot of the code that
69:15 helps manage its training runs, the key infrastructure. We move pretty fast, and so Codex code review is catching a lot of mistakes; it's actually caught some pretty interesting configuration mistakes. And we're starting to see glimpses of the future, where we actually have Codex be on call for its own training, which is pretty interesting. So there's lots there.
69:39 >> Wait, what does it mean to be on call for its own training? It's running its training and, oh, something broke, someone needs to act, and does it alert people, or does it fix the problem and restart?
69:48 >> This is an early idea that we're still figuring out, but the basic idea is that during a training run there are a bunch of graphs that, today, humans are looking at, and it's really important to look at those. We call this babysitting.
>> Because it's very expensive to train, I
70:02 imagine, and very important to move fast.
70:04 >> Exactly. And there are a lot of systems underlying the training run, so a system could go down, or an error could get introduced somewhere, and we might need to fix it, or pause things; there are lots of actions we might need to take. So having Codex run on a loop to evaluate how those charts are moving over time is one idea we have for how to enable us to train way more efficiently.
70:27 >> I love that. This is very much along the lines of the future of agents: Codex isn't just for building code, it's a lot more than that.
>> Yeah.
70:36 >> Okay. Last question. Being at OpenAI,
70:41 I can't not ask about your AGI timeline, how far you think we are from AGI. I know this isn't what you work on, but there are a lot of opinions, a lot of timelines. How far do you think we are from a human-level version of AI, whatever that means to you?
70:56 >> For me, it's a bit about when we see the acceleration curves go like this, or, I don't know which way I'm mirrored here, when we see the hockey stick. And I think the current limiting factor, I mean, there are many, but a currently underappreciated limiting factor is literally human typing speed, or human multitasking speed at writing prompts,
71:20 right? And like you know, you were talking about it's like you can have an
71:22 agent like watch all the work you're doing, but if you don't have the agent
71:27 uh also validating its work, then you're still bottlenecked on like can you go
71:30 review all that code, right? So my view is that we need to um unblock those
71:36 productivity loops from like humans having to prompt and humans having to
71:40 like manually validate all the work. And so if we can like rebuild systems to let
71:45 the agent like be default useful, we'll start unlocking hockey sticks.
71:48 Unfortunately, I don't think that's going to be binary. I think it's going
71:51 to be very dependent on what you're building, right? So like I would imagine
71:55 that like next year if you're a startup and you're building a new new piece of
71:59 like you know some new app or something it'll be possible for you to set it up
72:02 on a stack where agents are like much more self sufficient than not right but
72:07 now let's say I don't know you message SAP right let's say you work in SAP like
72:11 they have many like complex systems and they're not going to be able to just
72:13 like get the agent to be self-sufficient overnight in those systems so they're
72:17 going to have to slowly like maybe replace systems or update systems to
72:21 allow the agent to like handle more of the work end to end. And so basically my
72:25 sort of long answer to your question, maybe boring answer is that I think
72:29 starting next year we're going to see like early adopters like starting to
72:33 like hockey stick their productivity. Um and then over the years that follow,
72:36 we're going to see larger and larger companies like hockey stick that
72:39 productivity. And then somewhere in that fuzzy middle is like when that hockey
72:44 sticking will be flowing back into the AI labs, and that's when we'll
72:48 basically be at the AGI tier. >> I love this answer. It's very practical
72:52 and it's something that comes up a lot on this podcast just like the time to
72:55 review all the things AI is doing is really annoying and a big bottleneck. I
72:59 love that you're working on this because it's one thing to just make coding much
73:03 more efficient and do that for people. It's another to take care of that final
73:08 step of okay is this actually great? And that's so interesting that your sense is
73:11 that's the limiting factor. It comes back to your earlier point that even if AI
73:16 did not advance anymore, we have so much more potential to unlock as we
73:22 learn to use it more effectively. So that is a really unique answer. I
73:25 haven't heard that perspective on what the big unlock is: human typing speed to
73:29 review, basically, what AI is doing for us. >> Mhm. So good. Okay. Uh, Alexander, we
73:35 covered a lot of ground. Is there anything that we haven't covered? Is
73:38 there anything you wanted to share, maybe double down on before we get to
73:44 our very exciting lightning round? >> I think one thing is that the Codex
73:48 team is growing, and as I was just saying, we're still somewhat limited by
73:51 human thinking speed and human typing speed. We're working on it. So if
73:58 you're an engineer, or a salesperson, or, I am hiring for product, a product
74:03 person, please hit us up. I'm not sure the best way to give contact info,
74:06 but I guess you can go to our jobs page. Or, do they have contact for you?
74:10 Actually, do listeners have a way to contact you, before they send me, like, "Hey, I want
74:13 to apply to Codex"? >> Uh, I do have a contact form at
74:16 lennyrachitsky.com. I'm afraid of all the amazing people that will ping me. But
74:19 there we go. We could try that. Let's see how that goes. >> Okay. Or, yeah, maybe an
74:24 easier way. We can edit all that out, or up to you. But, uh, yeah, I would just say
74:28 you can drop us a DM. For example, I'm @embirico on Twitter, and hit me up
74:32 if you're interested in joining the team. >> What a dream job for so many people.
74:38 What's a sign they... I don't know, what's a way to filter people a little bit
74:42 so they're not flooding your inbox? >> So, specifically, if you want to join
74:46 the Codex team, then you need to be a technical person who uses these tools.
74:50 And I think I would just ask yourself the question, uh, hey, let's say, you
74:54 know, I were to join OpenAI and work on Codex over the next six months, you
74:59 know, and crush it. What does the life of a software engineer look like then?
75:02 And I think if you have an opinion on that, you should apply. And if you don't
75:05 have an opinion on that and have to think about it first, you know,
75:09 depending on how long you have to think about it, I guess that would be the
75:12 filter, right? Like I think there's a lot of people thinking about the space
75:16 and so we're very interested in folks who sort of have already been
75:21 thinking about what the future should look like with agents. And we
75:23 don't have to agree on where we're going, but I think we want people who
75:26 are very passionate about the topic. >> I guess it's very rare to be working on a
75:32 product that has this much impact and is at such a bleeding edge of what's
75:37 possible. What a cool role for the right person. So, it's
75:40 awesome that you have an opening, and this audience is a really good fit,
75:45 potentially, for that role. So, I hope we find someone. That would be
75:49 incredible. With that, we've reached our very exciting lightning round. I've got
75:53 five questions for you, Alexander. Are you ready? >> I don't know what these are, but I'm
75:57 excited. Let's do it. >> Uh, they're the same questions I ask
76:02 everyone, except for the last one. So, probably not a surprise. I should
76:06 probably make them more often a surprise. Okay, first question. What are
76:09 a couple books that you recommend most to other people? Two or three books that
76:14 come to mind? >> I have been reading a lot of science fiction recently. And I'm
76:18 sure this has been recommended before, but The Culture series; I think Iain Banks is the name of
76:24 the author. Part of why I love it is because it's basically relatively
76:30 recent writing about a future with AI, but it's an optimistic future with AI.
76:34 Um, and I think, you know, a lot of sci-fi is like fairly dystopian. Um, but
76:39 this is like... sort of the joke, at least on the Culture subreddit, is
76:43 that, let me see if I can get this right, it is a, like, space communist
76:49 utopia, or, I think, a gay space communist utopia. Um, and I
76:54 just think it's like really fun to think about um like to use the culture as a
76:58 way to think about like what kind of world can we usher in and like what
77:01 decisions can we make today to help usher in that world. >> Wow. I don't think anyone's
77:05 recommended that. I know, you mentioned before I started recording, you're reading
77:09 Lord of the Rings right now. If you want another AI-ish sci-fi book, have
77:15 you read A Fire Upon the Deep? >> No, I haven't. >> Okay. It's incredibly good. It's
77:22 a sci-fi space opera, sort of an epic tale, with superintelligence.
77:25 >> Cool. >> Yeah. Somewhat mostly not optimistic,
77:30 but somewhat optimistic. Okay. Next question. Is there a favorite recent
77:35 movie or TV show that you've really enjoyed? >> Yeah, there's an anime called Jujutsu
77:41 Kaisen, which I really like. Um, again, it's got kind of a slightly dark topic
77:46 of like demons. Um, but what I love about it is that the hero is really
77:49 nice. And I think there's this new wave of like anime and cartoons where the
77:55 protagonists are really friendly and like people who care about the world rather than being like
78:01 sort of... if you look at some older anime that started the genre,
78:07 like, you know, Evangelion or Akira, those characters, the
78:11 protagonists, are deeply flawed, quite unhappy. Um, they didn't start the genre, but
78:17 it was like a trend for a while to sort of poke fun at the idea that in
78:21 these cartoons the protagonist was very young but being given a
78:24 ridiculous amount of responsibility to like save the world. And so there was
78:30 kind of a wave of like uh content that was like critiquing this by making the
78:33 character like basically go through like serious like mental issues in the middle
78:37 of the show. Um and I'm not saying this is better, but at least it's quite fun
78:40 to have like these like really positive protagonists who are just trying to help
78:44 everyone around them. >> I love how much we're learning about your personality
78:49 during these recommendations. Nice protagonists, optimistic futures.
78:53 >> I think, you know, if you don't believe it, you can't will it into existence.
78:57 So, you know, it's a balance. >> This is your training data.
79:01 >> Is there a product you've recently discovered you really love? Could be an
79:05 app, could be some clothing, could be some kitchen gadget, tech gadget, a hat.
79:13 >> Yeah. So I have been quite into, you know, combustion engines and cars.
79:19 Actually, the reason I came to America initially was because I wanted to work on
79:23 US aircraft. Um, but you know, now I work in software. Um, and so for the
79:28 longest time I basically only had like quite old sports cars. Uh old just
79:33 because they were more affordable. Um and then uh recently um we got a Tesla
79:38 instead. And I have to say that I find the Tesla software like quite inspiring. Um, in
79:45 particular, it has like the self-driving feature. And you know, I've mentioned a
79:49 few times like today like I think it's really interesting to think about how to
79:52 build like mixed initiative software that makes you feel maximally empowered
79:56 as a human, maximally in control, but yet you're getting a lot of help. And I
80:01 think they did a really good job with enabling sort of the car to drive
80:05 itself, but all these different ways that you can adjust what it's doing
80:08 without turning off the self-driving. So like you can accelerate, you know, it'll
80:12 like listen to that, you can turn a knob to change its speed, you can steer
80:17 slightly. Um, I think it's actually a masterclass in building an agent
80:21 that still leaves the human in control. >> This reminds me, Nick Turley's whole
80:25 mantra was, "Are we maximally accelerated?" >> Yeah. Yeah,
80:28 >> feels like it's completely infiltrated everything at OpenAI, which makes sense.
80:33 That tracks. Uh, two more questions. Do you have a life motto that you often
80:38 think about and come back to in work or in life that's been helpful?
80:41 >> I don't know if I have a life motto, but maybe I can tell you about the number
80:45 one value, company value from my startup. >> Love it. >> Which is still something that sticks
80:51 with me, which is to be kind and candid. >> That tracks:
80:55 kind and candid. Wow. >> Yeah. And we had to put them together because we as
81:02 founders realized that we often would be nice and it wasn't actually the right thing
81:09 to do. We would like delay the difficult conversations and we were not candid.
81:12 And so every time we would like remind ourselves of this motto and then we
81:15 would become more candid and then six months later we would realize that we
81:18 were in fact not candid six months ago and we needed to be even more candid. So
81:23 then the question is, okay, how should we be candid? It's like, okay,
81:26 well, let's think of being candid as an act of kindness, and think of
81:29 that both in terms of doing it, willing ourselves to do it, and in
81:32 terms of how we frame it to people. >> That is a beautiful way of
81:36 summarizing how to lead well. What's the book about...
81:42 challenge directly but care deeply? Radical Candor. >> Oh yeah, yeah.
81:45 >> Yeah. So it's like another way of thinking about Radical Candor. Okay, last
81:48 question. I was looking up your last name, just like, hey, what's the
81:52 story here? So your last name is Embiricos, and I was talking to ChatGPT, and
81:57 it told me the most famous individuals with the surname are the influential
82:02 Greek poet and psychoanalyst Andreas Embirikos and his relative, the wealthy shipping
82:09 magnate and art collector George Embiricos. So the question is, which of these two do
82:14 you most identify with? The Greek poet and psychoanalyst or the wealthy
82:19 shipping magnate and art collector? >> I think it's gonna have to be the
82:25 poet, because he loved the island that our family's from.
82:29 >> Wait, you know those people? Okay, this is not news to you. Okay.
82:32 >> Well, I mean it's an enormous family, but it's like Greek, so you know these
82:35 big families, everyone's your uncle, you know what I mean? Like
82:38 my mother's Malaysian and also like everyone is my uncle or aunt in
82:42 Malaysia, too, if that makes sense. >> Yeah. But yeah, he loved this island
82:48 that the family sort of originated from. I don't actually know
82:51 where that shipping magnate lived. I think it was New York or something. But
82:54 anyway, we all came from this island called Andros, which is a really
82:59 beautiful place, and there's more livestock there than
83:03 humans. Not too many tourists go there. But part of
83:07 what I think is really cool is that he published a lot, and a lot of his writing
83:11 is about like the beauty of that island which I think is super cool.
83:15 >> Wow, that was an amazing answer. Two more questions. Where can folks find you
83:17 if they want to follow you online and you know maybe reach out and then how
83:20 can listeners be useful to you? >> I'm one of those people who has social
83:23 media only for the purposes of work. You know, my phone turns
83:27 black and white at, like, 9:00 p.m. But yeah, so Twitter or X:
83:34 @embirico. Um, and, yeah, if you post in r/codex, I'll probably see it. So,
83:40 you know, you can go there. Um, how can listeners be useful? Um, I would say
83:44 please try Codex. Please share feedback. Let us know what to improve.
83:48 We pay a ton of attention to feedback. I think, honestly,
83:51 like the growth has been amazing, but it's still very early times. Um, so we
83:56 still pay a lot of attention and hope to do so forever. Um and also um I would
84:01 say if you're interested in working on the future of coding agents, and
84:06 agents generally, then please apply on our jobs site and/or message me in
84:11 those social media places. >> Alexander, this was awesome. I always love meeting
84:15 people working on AI, because it always feels like this very, I don't know,
84:20 sterile, scary, mysterious thing, and then you meet the people building these tools,
84:24 and they're always just so awesome, and you especially, just so nice. And, as with
84:30 the examples you shared, optimism and kindness, you know, these are
84:34 the kinds of people we want to be building
84:37 these tools that are going to drive the future. So I'm really thankful
84:42 that you did this. Um, grateful to have met you, and thank you so much for
84:45 being here. >> Yeah, thanks so much for having me. This
84:48 is fun. >> Thank you so much for listening. If you found this valuable, you can subscribe
84:54 to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also,
84:58 please consider giving us a rating or leaving a review as that really helps
85:03 other listeners find the podcast. You can find all past episodes or learn more
85:08 about the show at lennyspodcast.com.