OpenAI explains how they scaled ChatGPT

Ahead of his talk at LeadDev West Coast in October, OpenAI’s Evan Morikawa goes behind-the-scenes on the challenges they faced scaling ChatGPT, the constant need to reset as a leader, and how GPUs don’t in fact grow on trees.

It was one of the most significant technology launches of a generation when OpenAI decided to throw open the floodgates and give users access to its large language model-powered ChatGPT tool. Since then, more than 100 million people have prompted the generative AI tool to answer questions, write computer code, and even write their resumes.

Ahead of Evan Morikawa’s behind-the-scenes talk at LeadDev West Coast later this year, LeadDev’s Editor in Chief, Scott Carey (SC), checked in with Morikawa (EM), Applied Engineering Manager at OpenAI, to hear more about how his team overcame this unique scaling challenge.

The below conversation has been edited for clarity and brevity. Evan’s full talk will be delivered at LeadingEng West Coast on October 18 in Oakland, California.

SC: We’re so excited to welcome you to LeadDev West Coast later this year. What are you going to be talking about?

EM: I would like to tell everyone a little bit about how we launched and scaled ChatGPT, including some of the engineering challenges and a little bit of the backstory under the hood with regard to what happened to make that grow.

Some of these aspects, I think, will be pretty familiar to anybody who’s scaled engineering teams and back-end infrastructure systems, but there are some unique challenges in the way these large language models (LLMs) work and behave that I will get into.

SC: What was one challenge that really struck you when you were trying to scale ChatGPT to the world?

EM: GPU supply is by far one of our biggest challenges, and remains a challenge today. I will go into much more depth about that during the talk, but when you’re multiplying billions and billions of numbers together, you have to have this very specialized hardware. I don't think anybody, including us, was really prepared to make this happen, supply chain wise. As you can imagine, it’s fairly difficult to turn around new chips on a dime. I think we’ve all seen that over the past several years and it only continues today.

SC: Was there anything that surprised you about the ChatGPT launch?

EM: I would say that we had a pretty strong hunch about a couple of things going into it. Internally, we had already had the chance to play with these things and felt like they were really fun, but you don’t really know how anybody’s going to react. Several months prior to the launch, we launched the image generation tool DALL-E, and that was also an extremely cool, fun thing internally. But how that [ChatGPT] would take off was completely unpredictable.

We did know that this was going to be the first time that we had no waitlist on a product, which was an intentional decision when launching here because historically, capacity constraints and safety reasons led us to use a waitlist, but no one likes waitlists.

That being said, when we did launch, you could talk to these models already through our API developer playground, so in effect, we came into this thinking that not a huge amount would change. The models were slightly newer and safer, and we had not released GPT-4 yet. That this would hit a nerve with people was not exactly a predictable thing.

SC: As an engineering leader, when you go through a launch of that scale, how do you re-adjust after that?

EM: You readjust continuously. Everything was basically changing day-to-day at that point, and not just on the technology and infrastructure side. We very quickly realized that we also needed to grow the size of the team.

As you grow any sort of organization, you hit different step functions: the complexity of your team structures gets larger, there are more people to coordinate with, and there are a lot of new people. All of these things definitely hit us. Trying to get ahead of a lot of that has been a large personal focus of mine, as well as of a lot of the other teams, in addition to all of the engineering that our teams have had to do to make that happen.

SC: What’s one thing that you're hoping the audience takes away from your talk?

EM: I’d like to demystify some of this whole “AI is a magical black box” thing. Certainly, it's got a lot of attention as a field recently, but I still think there is a feeling that the technology and infrastructure behind this is a bit of a black box. In fact, there’s a huge amount of research taking place to unveil what is happening and make it interpretable.

This was one of the earliest things that caused me to join OpenAI back in 2020. I generally thought I knew how computers worked, sort of, except for this area, which still felt a little bit magical. At the end of the day, when you break things down, and you're forced to look at the realities of scaling a system like this, you treat it somewhat like any other engineering system, with a couple of unique quirks to it, like the implementations of how various chips and things like that work.

The research side has lots of different ways of thinking about these models, but from the engineering, deployment, inference, and scaling side, it has been quite approachable for me and the team. In fact, for nearly all of our applied engineering group, having deep machine learning experience has not been a prerequisite. Most of our challenges are not theoretical, statistical, or mathematical. They’re about really deeply understanding some part of a system, building fault-tolerant, distributed systems, and reliable pieces of software that other humans know how to read, interpret, and build upon.
