Wartime Engineering

Peacetime vs Wartime

The "Peacetime vs Wartime CEO" paradigm, popularized by Ben Horowitz, delineates the contrasting leadership approaches required during times of stability versus crisis. In the world of engineering, this dichotomy is just as pertinent. A "Peacetime Engineering Leader" might operate in an environment where innovation, long-term planning, and iterative improvements are priorities. They have the luxury to experiment, refine, and optimize, taking calculated risks to advance the technological frontier. Conversely, a "Wartime Engineering Leader" is thrust into a scenario of pressing challenges—be it system outages, architectural flaws, or urgent product rollouts. Here, swift decision-making, resource triage, and immediate problem-solving become the order of the day. Much like Horowitz's CEOs, engineers too must adapt their strategies and mindsets based on the situational demands of their projects and organizations.

When would this be necessary?

Lack of an Engineering Roadmap

When I took over the team, there wasn't a set engineering roadmap for the product. The team started off as a few engineers writing a wrapper service around Semantics3, but the service quickly grew well past its intended use case once stakeholders saw its potential. The result was a product-driven roadmap: a lot of interesting features got built, but at the expense of proper technical architecture. Adding something new used to break some other part of the service. The service would not scale. Several other bugs plagued it, and soon the stakeholders' optimism turned into frustration.

Lack of Consistent Processes

Deployment Schedules

Whenever I saw a message on the #emergency Slack channel, I used to have a mini-panic attack. Did one of my teams release something that broke the service? Anything on the emergency channel was visible to all stakeholders, so we had to immediately jump on the alert and triage.

Lack of Motivation

Without a goal to be working towards, the engineers started losing the drive to work on the tickets. This was a death spiral and resulted in poorer and poorer output over time.

Everything is Failing

We sometimes had no idea what was happening or what was going wrong. We had several tools at our disposal for monitoring, SLOs, and observability, but they were not being used to their fullest potential.

What to do?

Take ownership

The first instinct of a new person joining an existing team is to look at all the ways things are going wrong. When I was brought on board, while recognizing the issues, I tried to dig deeper into why things were the way they were. This led me to uncover the decisions that had been taken along the way and the assumptions of the service owners.

Once I had this distilled down, I knew I had to be the one to ring the alarm on the situation. As a leader, you should not only be the one ringing the alarm, you should also be the one bringing solutions to the table. So I drafted a detailed Notion doc listing out how we got here, why this was unsustainable, and what I'd like to propose.

Armed with the proposal, I now had to convince the rest of the org that this was the path to take.

Alignment with PED counterparts

The next course of action is to get buy-in from your product, engineering, and design (PED) counterparts. This is probably the most crucial step, because if you do not have alignment with them, you are going to be at loggerheads with them over every product feature or ticket.

We agreed to allocate a higher-than-usual share of story points to engineering-related work: 50% at the start, gradually ramping down to 20% as the service became more stable and we hit predefined milestones.
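
To make that ramp-down concrete, here is a minimal Python sketch of how such a split could be computed per sprint. Only the 50% and 20% endpoints come from our agreement. The linear schedule, the function names `engineering_share` and `split_sprint`, and the milestone counts are assumptions for illustration, not the process we actually followed.

```python
from __future__ import annotations


def engineering_share(milestones_hit: int, total_milestones: int,
                      start: float = 0.50, end: float = 0.20) -> float:
    """Fraction of sprint story points reserved for engineering work.

    Assumes (hypothetically) a linear ramp from `start` to `end`
    as pre-defined milestones are reached.
    """
    if total_milestones <= 0:
        raise ValueError("total_milestones must be positive")
    # Clamp progress to [0, 1] so the share never leaves the agreed range.
    hit = min(max(milestones_hit, 0), total_milestones)
    progress = hit / total_milestones
    return start - (start - end) * progress


def split_sprint(capacity_points: int, milestones_hit: int,
                 total_milestones: int) -> tuple[int, int]:
    """Return (engineering_points, product_points) for one sprint."""
    share = engineering_share(milestones_hit, total_milestones)
    engineering = round(capacity_points * share)
    return engineering, capacity_points - engineering


if __name__ == "__main__":
    # Hypothetical sprint of 40 story points and 3 milestones.
    for hit in range(4):
        print(hit, split_sprint(40, hit, 3))
```

With no milestones hit, half of the sprint goes to engineering work. Once every milestone is hit, the split settles at the 20% floor.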

Transparency with stakeholders

I vividly remember having this call in May 2022 - over 10 people on call representing different aspects of the business - editors, ads, affiliate revenue and others. I was prepared for this call to be a challenging one for multiple reasons:

  • I was about to pitch a plan that asked for 6 months to turn things around. This meant some of their product asks would need to take a backseat.
  • I knew I wasn't the first person proposing drastic changes to the architecture. There were a few attempts before this and they weren't successful.

I pre-empted this and made sure my presentation addressed these concerns. I set out 3 plans:

  • In 6 weeks, deliver 2 quick wins in terms of improvements to data quality and uptime
  • In 3 months, fix 2 medium sized issues
  • In 6 months, fix 2 foundational architectural issues

I wasn't in this alone by this point though. I had the PMs backing my plan and we presented a united front.

Eventually, I had a few questions come in from the stakeholders but they decided to give me a chance and gave me the thumbs-up.

All hands on deck

I aimed to ensure that all engineers involved in the product were aware of our current situation, the promises I had made to stakeholders, and the ongoing progress within our established timelines.

To achieve this, I initiated a weekly engineering all-hands meeting dubbed the "War Room." I strictly emphasized that these discussions were exclusive to me and the engineers under my supervision. This exclusivity was not well-received by my counterparts in product management. Even my direct superior, the VP, was only briefed on the happenings of the meetings but not invited to participate. This approach was designed to foster a secure environment where engineers could express their thoughts without reservation. I pledged to consider all feedback attentively, ensuring no suggestion was dismissed without due consideration.

The call's Notion doc only had 3 sections:

  • Topics of Discussion (Filled up by any engineer with questions or suggestions the night before the call)
  • Highlights (Filled up by engineers who have fixed a previously identified issue or a new feature)
  • Action Items (Filled up by me, during the call, as we run through the discussion topics)

Outcome

Week after week, for an entire year, our focus was on solidifying the platform. By December 2022, after nearly 6 months of these intensive sessions, we successfully shifted our discussions to a different phase, rebranding our meetings as simply "All-hands." Our persistent efforts had resolved all pressing issues, allowing us to repurpose these gatherings. Instead of troubleshooting, we moved on to sharing weekly updates and giving engineers the floor to showcase their recent work.

The stakeholders were overjoyed - we delivered what we committed to delivering and also set the stage for faster feature releases.