Technical choices have non-technical consequences. Here's why engineering leaders need to factor in business needs.

Have you ever known an engineer to thrive in one company, receive rave reviews, and then switch jobs to end up clashing with a new team?

Maybe this engineer has a particular set of assumptions about the “right way” to do software engineering that runs afoul of their new team’s practices. Maybe they keep clashing with teammates over CI failures or resistance to writing tests. Or maybe they care deeply about quality and are disheartened by a team that is more focused on shipping “good enough” MVP features at max speed.

Regardless of the symptoms, the core issue is the same: the engineer assumes that technical “best practices” are absolute, when in fact they are situational and tied to the business and team context.

It can be hard to see, but things that look like purely technical decisions are often legitimate business trade-offs, and choices that make perfect sense in one business may be completely wrong in another.

To explore this more deeply, let's look at two particular examples where technical choices around scalability and testing are deeply influenced by the business context around them.

Trade-off 1: Scalability vs. iteration speed

This trade-off is near and dear to my heart, because I’ve seen it so many times working in a startup environment. Startups often hire engineers for their pedigree (i.e., having worked at a massive, successful company like Google or Amazon) and are then surprised when those engineers fail to thrive and even make choices that actively harm the company.

Engineers coming out of Google or other big tech companies tend to assume that every application should be architected for scale. The most visible example is starting every application on a NoSQL database that can reach megascale.

This is a choice that makes sense inside of Google. When Google launches something, it is going to get a firehose of traffic from day one, and if it is unable to keep up with that, it reflects poorly on the business.

However, using NoSQL solutions comes with a fundamental trade-off in terms of speed of iteration. If your access pattern changes or you start using a dataset in a different way, in a SQL-based solution you might be able to quickly migrate or add a new secondary index. In a NoSQL system, you are stuck with an expensive rekeying and re-architecture.
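To make the SQL side of that comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are hypothetical, chosen only for illustration. It shows a new access pattern (looking up orders by status) being supported with a single statement, without touching the existing rows or the primary key.

```python
import sqlite3

# In-memory database with a hypothetical orders table, keyed by id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(1, "shipped"), (2, "pending"), (1, "pending")],
)

# New access pattern: query by status. One statement adds a secondary index;
# the table's rows and primary key stay exactly as they were.
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

pending = conn.execute(
    "SELECT id, customer_id FROM orders WHERE status = ? ORDER BY id",
    ("pending",),
).fetchall()

# List the indexes that now exist on the table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'"
    )
]
```

In a key-value store where records are laid out by a single key, supporting the same query efficiently would typically mean rewriting the data under a new key, which is the rekeying cost described above.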

Inside of a startup environment, far from having a torrent of traffic from the beginning, the most common outcome for a new product launch is that almost no one sees it. You gather feedback from the tens or hundreds of people (or, if you're lucky, thousands) who end up using the product, and quickly iterate on the product to improve it. In most startups, you end up throwing away 90% of your attempts before you ever find one that succeeds enough to require scalability.

As a consequence, if you architect for scale from the very beginning, you're wasting a ton of effort, which may slow you down to the point that you never even find the version that is successful in the market before you run out of money.

Thus it makes business sense to deliberately choose solutions that may not scale well, when those solutions give you benefits in terms of iteration speed. As painful as it may feel to write unscalable code, or use a database that will need to be replaced if you get to very high load, those trade-offs can gain you enough development speed to iterate three or four or five times more frequently than if you built for scale from the beginning. And in a startup environment, that can be the difference between survival and bankruptcy.

Trade-off 2: Rigorous testing vs. “test it in production”

This is a subject of great debate within the engineering world. Should you go for perfect test coverage, rigorously testing everything in development environments, or go straight to production and “test it in prod”?

There was a great viral Twitter thread from Gergely Orosz over the summer that highlighted what can happen when engineers bring their assumptions from one company to another. In this case, an engineer coming from Facebook into a different environment committed a change that broke another team’s tests. Instead of fixing the tests or working with the other team, they deleted the tests and force-landed the changes. This was not a single rogue engineer, but instead an example of an engineer failing to adapt to different engineering cultures driven by very different trade-offs in the business context.

The level of testing that makes sense in a particular codebase depends on a number of factors, some of them purely technical, but many others related to business maturity, target audience, and business domain.

There are two dimensions being traded off: costs and benefits.

First, let’s look at costs. What is the cost of a bug making it into production vs. the cost of having rigorous automated testing? Automated tests have both a substantial upfront cost to write and ongoing maintenance costs to keep them up to date with feature and functionality changes. As a result, if the cost of a production bug is low, rigorous automated testing may well not be worth the effort.

So what influences the cost of a production bug? On the technical side, it depends on what other controls you have in place. Do you have a tiered/automated rollout system? Are you able to automatically detect issues and roll back before they impact many users? How good is your logging, such that you can quickly isolate and fix those issues? Or are bug reports manual, and debugging painful and slow?
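As a rough illustration of how a tiered rollout limits the blast radius of a bug, here is a minimal sketch. The function names, stage fractions, and error thresholds are all hypothetical, and the `error_rate_at` callback stands in for whatever monitoring a real system would query.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    error_rate_at: Callable[[float], float],
    max_error_rate: float,
) -> tuple[str, float]:
    """Widen exposure stage by stage; stop as soon as errors exceed the budget.

    Returns the outcome and the largest fraction of users who saw the release.
    """
    exposed = 0.0
    for fraction in stages:
        exposed = fraction
        if error_rate_at(fraction) > max_error_rate:
            return ("rolled_back", exposed)
    return ("completed", exposed)


# Hypothetical monitoring hooks: one release errors on 5% of requests,
# the other on 0.1%.
def buggy_release(fraction: float) -> float:
    return 0.05


def healthy_release(fraction: float) -> float:
    return 0.001


stages = [0.01, 0.10, 0.50, 1.0]
bad = staged_rollout(stages, buggy_release, max_error_rate=0.01)
good = staged_rollout(stages, healthy_release, max_error_rate=0.01)
```

In this toy setup the buggy release is stopped after reaching 1% of users, while the healthy one reaches everyone. The fewer users a bad release can reach, the lower the cost of letting a bug slip past your tests.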

On the business side, what is the consequence of such a production bug? If your product is a social network or game, the consequence might be a few users being a little annoyed when they can't use the product correctly for a short time. If your product is a ride-hailing application, it might lead to a user being stranded and unable to get home, a much more negative consequence. If your product is a medical device, a bug might be the difference between life and death.

The more costly a bug in production along these dimensions, the more important it is to prioritize rigorous testing. The less costly, the more it may make sense to deprioritize or even actively avoid writing tests.
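One way to reason about this is as a back-of-the-envelope comparison. The sketch below is only that: the class, its fields, and every number in it are made-up assumptions for illustration, not measurements from any real team.

```python
from dataclasses import dataclass


@dataclass
class SuiteEconomics:
    """Hypothetical yearly economics of an automated test suite."""

    bugs_prevented_per_year: float  # bugs the tests would catch before release
    cost_per_production_bug: float  # business cost of one escaped bug
    upfront_test_cost: float        # cost to write the tests
    yearly_maintenance_cost: float  # cost to keep the tests current

    def net_benefit(self, years: float) -> float:
        avoided = self.bugs_prevented_per_year * self.cost_per_production_bug * years
        spent = self.upfront_test_cost + self.yearly_maintenance_cost * years
        return avoided - spent


# Same test effort, very different cost per bug (all figures invented).
casual_game = SuiteEconomics(10.0, 500.0, 20_000.0, 10_000.0)
ride_hailing = SuiteEconomics(10.0, 20_000.0, 20_000.0, 10_000.0)

game_result = casual_game.net_benefit(years=1.0)
ride_result = ride_hailing.net_benefit(years=1.0)
```

With these invented figures, the same suite loses money for the game and pays for itself several times over for the ride-hailing app. The point is not the specific numbers but that the sign of the answer depends on inputs that come from the business, not from the codebase.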

On the other side of the ledger are the benefits. Automated tests do more than prevent production bugs. In particular, they can dramatically help with refactoring and maintaining code. But these benefits, too, fall on a spectrum depending on other factors.

On the technical side, other factors may include things like the language you are using. Using a statically typed language, or a static type checker layered on top of a dynamically typed language, can get you many of the same refactoring benefits as extensive testing.
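As a small illustration, here is a sketch in Python with type hints. The `Invoice` class and `format_total` function are hypothetical. The code runs as ordinary Python, and the annotations are what let an external static checker such as mypy verify call sites after a rename.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    # Imagine this field was just renamed from `total` during a refactor.
    total_cents: int


def format_total(invoice: Invoice) -> str:
    # A stale reference such as `invoice.total` would be flagged by a static
    # checker before the code ever runs. Without annotations, the mistake
    # would only surface at runtime as an AttributeError.
    return f"${invoice.total_cents / 100:.2f}"


label = format_total(Invoice(total_cents=1999))
```

A test suite could catch the same stale reference, but only if some test happens to exercise that code path. The type checker covers every annotated call site at once.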

And on the business side, you need to ask what the expected life of your code is. If you are still in the early exploration phase of a startup, the lifetime of your code may be only a few months as you rapidly iterate on your product. Maintainability only becomes more of a concern once you have discovered something with some traction that will need to stick around.

Wrapping things up

The common theme between these examples is that technical choices have non-technical consequences. As an engineer, you are used to making optimization decisions based on the requirements of a system. To be successful as an engineering leader, you need to extend the inputs for your optimization function to the business context you are working in and the needs of that business.

How do you do that? The first step is asking questions. Ask your counterparts in different departments what they need from engineering, and what determines success for them. Ask your leaders about their goals and strategies, and how they see your team's work fitting into them. And ask yourself what assumptions you're holding about engineering practices that are not fundamental truths but instead choices that hold in some contexts and not in others.

To me, this is why the best engineers are those who are curious and adaptable. They have opinions about best practices, but recognize that those practices are applicable to particular situations and constraints. And when they change teams, they go out of their way to understand the context of their new team, and which practices are most appropriate to apply.