Reflection on 4 years of SWE

I’ve now spent a little over four years as a full-time software engineer, so I figured it’d be a good time to reflect on what I’ve learned so far. To briefly summarize my path: I worked at Cisco Systems right after school for a little over two years, studied for a few months at the Recurse Center, then joined a startup called Paxos, where I’ve been for the last two years.

Ramping up

There is always a switching cost associated with leaving a company and joining a new one. The cost does decrease with experience; new technologies become significantly easier to pick up if you already have something analogous in mind. I’d say it took a good six months before I was highly productive at Cisco and less than three months at Paxos, despite the different languages and some new technologies.

To get up to speed on a codebase efficiently, you can start by just running the tests and stepping through the code with a debugger. When I first joined Cisco, I tried to make a diagram of classes to understand the relationships, but I found that use-case oriented learning via test stepping was more efficient. A class diagram fails to capture the aspect of time. When you step through the code, you are simultaneously learning what the objects are and how exactly they are used in a realistic user flow.
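
As a concrete illustration, here’s a minimal sketch of what this looks like in a Go codebase using the delve debugger. The package, types and test are all invented, and the dlv commands in the comment assume delve is installed.

```go
// A minimal sketch (hypothetical package and test) of use-case oriented learning:
// pick a test that exercises a realistic flow, then step through it with delve:
//
//   dlv test ./cart -- -test.run TestCheckout
//   (dlv) break cart.(*Cart).Checkout
//   (dlv) continue
//   (dlv) next
package cart

import "testing"

type Cart struct{ items []string }

func (c *Cart) Add(item string) { c.items = append(c.items, item) }
func (c *Cart) Checkout() int   { return len(c.items) }

// Stepping through this test shows both what the objects are and how they are
// used together in one realistic user flow.
func TestCheckout(t *testing.T) {
	c := &Cart{}
	c.Add("book")
	c.Add("pen")
	if got := c.Checkout(); got != 2 {
		t.Fatalf("expected 2 items in the order, got %d", got)
	}
}
```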

If the project involves a complicated architecture, in terms of networking or sprawling tools like Kubernetes, you definitely need to poke around to get a handle on how things are actually connected. The problem with architecture diagrams is that they can go stale fast. So although reading any existing documentation is a good starting point, you really need to find the live deployment configuration files, open a set of ssh terminals, install whatever tools are required to describe a live system, and use those tools to ascertain the current block diagram of the architecture. I like to save all the commands I use in a plain text file alongside a description of what they do, as it comes in handy when you need to debug something in production or answer a question like “is TLS used between these two hosts.”
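
For the TLS question specifically, one hedged example of the kind of throwaway check you can run: attempt a handshake from one host to the other and print what was negotiated. The address below is a made-up placeholder.

```go
// A throwaway check for "is TLS used between these two hosts": run it from host A,
// point it at host B, and see whether a TLS handshake succeeds.
package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	addr := "service-b.internal:8443" // hypothetical target host:port
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		InsecureSkipVerify: true, // we only care whether TLS is spoken, not cert validity
	})
	if err != nil {
		log.Fatalf("no TLS handshake with %s: %v", addr, err)
	}
	defer conn.Close()
	state := conn.ConnectionState()
	fmt.Printf("TLS negotiated with %s: version 0x%x, cipher suite 0x%x\n",
		addr, state.Version, state.CipherSuite)
}
```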

In both places I’ve worked, new hires were assigned a mentor to direct questions to. It’s important to strike a balance between asking too many or trivial questions and asking too few and getting stuck on something for too long. A simple structured approach to this problem is time boxing: give yourself X minutes or hours to figure something out yourself, after which you reach out for help. When you do reach out, you also want to gather as much important information as you can in a single discussion, to avoid interrupting the mentor multiple times. Good follow-up questions include ones about the history of specific components, future directions, edge cases and which specific engineers are experts in which areas. Inevitably you’ll stumble across something that doesn’t make sense. The worst thing you can do is try to immediately change it or deride the original author; the best thing you can do is respectfully ask, “Is there a reason that X is like that?”

Every project I’ve ever worked on has spent long stretches in a migration period between an older version and a better one. I’ve just accepted that when jumping into a new code base, there will most likely be some in-flight migration threads (e.g. we’re trying to get all these classes to use the more generic helper) and the only thing you can do is be aware of them.

One of the primary goals in software engineering is to build well-tested, versatile components. As you’re working on a new component or expanding an existing one, it’s always a good idea to look at related code and look for generic patterns that can be extracted. This is exactly why I like asking about the future direction of components during the ramp-up phase. It’s common for a code area owner to have undocumented opinions on where the generic patterns are emerging or likely to emerge, and by leveraging that knowledge you can make valuable contributions earlier than you otherwise would have.
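
As a toy example of the kind of generic pattern I mean (all names invented): imagine noticing that a couple of components each hand-roll the same retry loop around flaky calls; pulling it into one reusable helper is the sort of early contribution that knowledge enables.

```go
// Package retry is an invented example of an extracted generic pattern: several
// components were each writing their own retry loop, so the loop lives in one
// reusable helper instead.
package retry

import (
	"fmt"
	"time"
)

// Do runs fn up to attempts times, sleeping between failures, and returns the
// last error if every attempt fails.
func Do(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}
```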

Stay learning

I’ve seen a pretty wide variance in engineer growth. Some engineers seem to amass critical knowledge in many important areas very quickly and start to drive technical discussions because of it, and some don’t. Granted, some of that discussion-driving prowess is personality dependent and not purely technical, but engineers are very receptive to technical arguments. I think the difference comes down to autodidact ability. The variety of approaches people take to tackle an amorphous goal like “learn Golang” results in a variety of knowledge depth. Everyone will learn through the natural osmosis of doing the job, but you can accelerate that by crafting your own learning plan to cover the more rarely encountered situations. Then when they come up, you’ll be ready to jump on them.

To tackle amorphous goals like “learn Golang,” I think the most important thing is that your learning plan rests on active learning as opposed to passive. Active learning activities to me include writing code, stepping through third-party code, writing out concepts and presenting overviews to other engineers, to name a few. I’ve found presentations particularly rewarding because of their dual purpose: you’re forced to learn something well quickly, and it contributes to your reputation as someone with expertise in the area. I probably gave the most “Lunch & Learns” of any engineer at the startup and it helped solidify my position as a blockchain expert in the company, even though I didn’t necessarily know anything about the topics I chose to present on up front. Passive learning activities include reading the spec or articles, reading code, watching videos or listening to podcasts. Obviously you need to do some passive activities in order to move forward with active ones, but there is a big difference between strategically looking for a particular answer in order to implement it and reading, watching or listening aimlessly. The highest throughput activity for me has always been a project on some relevant topic in a language I wanted to learn. I’ve done this a couple of times; some salient examples are go-is-is and exchange. In both cases, I explored a concept and a language simultaneously. I also find it easier to stay motivated when there is a clear output, a functioning project, at the end.

Big company or small, I’ve found that deadlines are always tight and projects tend to run overtime. An unfortunate consequence of tight deadlines is that engineers take shortcuts. If you always take shortcuts, you never get exposed to more sophisticated designs that are more scalable or more generic; you never learn how to produce those more sophisticated designs within the constraints of a tight deadline. The best engineers recognize opportunities for growth when they arise and either spend extra hours exploring solutions for a more sophisticated design to get exposure to it, or after the deliverable is out the door, they reflect on it and follow up with a refactor.

I’ve read a fair number of software engineering books while on the job. If you don’t have much experience, then I’d recommend The Pragmatic Programmer as a nice overview of basic SWE practices. I read it straight out of school and found it immediately useful. For backend web service system design, Designing Data-Intensive Applications is the best. Contrary to popular sentiment about how algorithmic interviews are terrible and unrelated to the job, I actually thought that reading Elements of Programming Interviews and doing some leetcode made me a much better programmer. I found that armed with that practice, I started to see a lot more opportunities for sophistication in everyday programming work. I definitely don’t recommend reading a specific book on a programming language; it’s better to read Structure and Interpretation of Computer Programs as a generic “how programming languages work” book, then lean on practical experience and everyday work for learning the nuances of a particular language.

Testing

I think it’s obvious that unit tests are a good thing, but like everything in engineering, you need to strike a balance. Unit tests serve many purposes: documentation of existing functionality, defense against regressions (which in turn promotes refactoring) and a structured development process that catches bugs as early in the lifecycle as possible. However, slow, bloated unit tests can be a tax on the whole team’s development throughput, so you need to be thinking in terms of the return on investment. A bloated unit test is expensive to maintain: if the functionality changes, someone needs to refactor the test as well. When writing a test, you should be optimizing for the minimum amount of fast test code that covers the maximum amount of new, complex, likely bug-producing code.
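
To make the return-on-investment idea concrete, here’s a sketch with a made-up function: one small, fast table-driven test concentrated on the branchy, bug-prone logic rather than on trivial code.

```go
// Hypothetical example: tierFee is the new, branchy logic worth covering, and one
// compact table-driven test covers every branch in a few milliseconds.
package fees

import "testing"

func tierFee(volume int64) int64 {
	switch {
	case volume <= 0:
		return 0
	case volume < 1_000:
		return volume / 100 // 1%
	case volume < 10_000:
		return volume / 200 // 0.5%
	default:
		return volume / 400 // 0.25%
	}
}

func TestTierFee(t *testing.T) {
	cases := []struct{ volume, want int64 }{
		{-5, 0}, {0, 0}, {999, 9}, {1_000, 5}, {9_999, 49}, {10_000, 25},
	}
	for _, c := range cases {
		if got := tierFee(c.volume); got != c.want {
			t.Errorf("tierFee(%d) = %d, want %d", c.volume, got, c.want)
		}
	}
}
```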

Mocks are a fantastic way to isolate the specific functionality you want to test and cover error cases, all while keeping the tests fast. The most common things I’ve seen mocked are external connections and storage. However, something as fully featured as Postgres is probably not worth mocking, because such a mock would be expensive to maintain and it would be difficult to keep the mock and the real implementation aligned. I’ve learned to embrace the Golang idiom of “accepting interfaces” when instantiating new components because it keeps the code extremely testable; pass in a mock, a connection to a test environment or a connection to production, and the code stays the same.
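
Here’s a minimal sketch of that idiom with invented names: the component accepts a narrow Store interface, the test passes a mock, and production would pass a real database-backed implementation without the component changing.

```go
// A minimal "accept interfaces" sketch (hypothetical names): the component depends
// only on the narrow Store interface it actually needs.
package balance

import (
	"errors"
	"testing"
)

type Store interface {
	Balance(account string) (int64, error)
}

type Service struct{ store Store }

func NewService(s Store) *Service { return &Service{store: s} }

func (s *Service) CanSpend(account string, amount int64) (bool, error) {
	bal, err := s.store.Balance(account)
	if err != nil {
		return false, err
	}
	return bal >= amount, nil
}

// mockStore lets the test cover error cases quickly, with no database at all.
type mockStore struct {
	balances map[string]int64
	err      error
}

func (m mockStore) Balance(account string) (int64, error) {
	if m.err != nil {
		return 0, m.err
	}
	return m.balances[account], nil
}

func TestCanSpend(t *testing.T) {
	svc := NewService(mockStore{balances: map[string]int64{"alice": 100}})
	if ok, _ := svc.CanSpend("alice", 50); !ok {
		t.Fatal("expected alice to be able to spend 50")
	}

	failing := NewService(mockStore{err: errors.New("connection refused")})
	if _, err := failing.CanSpend("alice", 50); err == nil {
		t.Fatal("expected the storage error to propagate")
	}
}
```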

I won’t do a better job of summarizing the test hierarchy than Martin Fowler, so I’d just recommend reading his write-up on the test pyramid. The main takeaway is to be even more selective about what you test the higher you go in the pyramid.

Debugging

You have to debug in a systematic way. There is always a reason for everything; nothing is magical. An easy way to burn valuable time is to tweak one line of code, add another print statement, run a test and repeat. In contrast, a systematic debugging cycle involves observing the symptoms of the issue, reading and understanding the relevant code deeply, formulating a hypothesis, e.g. “I think this specific pointer is null because this log said it failed to initialize,” and then writing a test locally to reproduce the issue and confirm the fix. One of the quickest ways to track down a bug is to focus the code reading phase on the changes introduced between the last known working version and the erroneous one.
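
To illustrate that last step with a made-up example: once the hypothesis is “this pointer is nil because initialization failed,” the local test constructs exactly that state and locks in the fix.

```go
// A sketch of turning a hypothesis into a reproducing test (invented names): the
// hypothesis is that cfg is nil because initialization failed, so the test builds
// exactly that state and confirms the fix (the nil guard below).
package handler

import "testing"

type Config struct{ Timeout int }

type Handler struct{ cfg *Config }

// TimeoutOrDefault used to dereference cfg unconditionally; the nil guard is the fix.
func (h *Handler) TimeoutOrDefault() int {
	if h.cfg == nil {
		return 30
	}
	return h.cfg.Timeout
}

func TestTimeoutWithNilConfig(t *testing.T) {
	h := &Handler{} // cfg left nil, as the failed-initialization log suggested
	if got := h.TimeoutOrDefault(); got != 30 {
		t.Fatalf("expected default timeout 30, got %d", got)
	}
}
```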

Sometimes the bug is intermittent, takes too long to appear or can’t be reproduced locally. In those more difficult cases, the approach still begins the same way: you observe symptoms, read code and formulate a hypothesis. It’s just that now you need to redeploy a debug version with enough additional information that you can either prove or disprove the theory. Repeat if your theory is wrong. If you are flat out of theories, you may need to hand off the bug to someone with fresh eyes, as it’s easy to get stuck in a particular thought pattern. This cycle can take a long time if the bug is hard to reproduce; there were cases at Cisco where it took months to track down a bug that only occurred in large, difficult to simulate environments.

Another situation is that you just don’t have time to reproduce it and fix it because it’s an important production issue. In those cases, you may need to deploy a workaround or temporary fix and follow up later with a further investigation.

Project management

Design documents

For a while I pushed back on detailed design documents, reasoning that it’s better to just figure out more of the details while implementing. Having now written more than a couple of them myself, I’ve found that browsing through the code and figuring out how some new large feature can be built atop the existing system is simply a separate skill, one that I used to be bad at. Like anything, it improves with practice; over time I’ve been able to come up with more detailed designs before implementation. You learn how to decompose the larger goal into little bundles of related problems that individual engineers can solve. Design documents are incredibly valuable because they force you to ask a product manager all the hard questions up front about edge cases and user experience. There is definitely a challenge around design documents going stale and thus becoming a confusing form of code documentation. It’s extremely easy not to go back to a design document after the code is live and update it. I think it’s OK for the nuanced details to remain as code comments, but the design document should be updated if significant architectural changes were made due to some discovery during implementation.

Estimation

Software estimation is notoriously difficult; it only gets more accurate with smaller scope, more experience in general and more familiarity with the specific codebase and team. Tasks can be underestimated for a plethora of reasons: more testing was required, the initial scope was wrong, unknown dependencies, unplanned work and so on. I’m a fan of putting ranges on estimates and, if a specific number is required, just taking the upper end of the range.

Managers

I’ve had 5 managers over the past 4 years through various projects and there is a huge range of managerial styles out there. Nevertheless, I’ve noticed 3 common elements that managers put different emphasis on: hitting explicit targets, emotional support and technical guidance. A manager’s finite amount of energy is spent on those three things in wildly different proportions. Too much focus on any particular one can make for a poor manager. I’ve had managers who provided near zero technical guidance but were very supportive emotionally, and vice versa. I’ve had managers who instilled fear in the team and led death marches to hit targets. They all have their own philosophy on what’s best to efficiently achieve company goals. Personally, I’ve found that technical acumen is the most critical factor. Without a very healthy dose of that, projects can head down bad architectural paths, deadlines can be missed because the manager doesn’t really understand what the engineers are working on, and team morale can collapse as engineers lose faith in their leader. A technically strong manager, who doesn’t overstep their boundaries and try to do everything themselves, can be a great source of inspiration and birth a positive feedback cycle of high quality work and team growth.

Career trajectory

Promotions

Everyone who I’ve seen get promoted was vocal, well-known and worked effectively on high profile, visible projects. They’re not timid about asking to work on specific things. Shipping high quality code for pre-planned tasks is a requirement of course, but you also need to be valuable in discussions around what exactly to build and how exactly to build it.

Big vs small companies

Naval Ravikant’s podcast How to Get Rich is a lot more than just a framework for generating wealth; it also covers some philosophical aspects of one’s career. A key takeaway for me is that “a good career trajectory is one where you start at a big company and then move progressively smaller until you eventually work for yourself.” More established companies need robust structure to keep them alive. The upside of that structure is that the training and mentorship is likely to be explicit and include more hand-holding. If you are fresh out of school, that’s usually a good thing. Of course, some people have the owner mentality already and don’t need the hand-holding. The downside of the structure is that when you want to take the reins and increase your ownership and accountability, you have to wait in line. That’s a great time to move to a startup. I ended up following that trajectory before I even listened to the podcast, but it did turn out exactly as he describes. I needed the hand-holding at Cisco and learned the ropes, then I got to write a ton of greenfield code and be around for early system design discussions at the startup.

Specialist vs generalist

Conventional wisdom is that the modern engineering world rewards specialists, and I agree. However, to me there’s something even more valuable about becoming a specialist than the remuneration alone. A colleague who’s a physics PhD turned software engineer once remarked that “it’s easier to go deep on something once you’ve already gone deep on something else.” That really resonated with me. By specializing, you’re not closing the door on all the other wonderful topics out there; it’s actually the opposite: you’re practicing how to specialize, which enables you to become a deep generalist, as opposed to the shallow generalist that people often contrast with the specialist.

When to manage

The range of management roles you can take is a function of your technical expertise. In general, you need to have done the job before you can manage it. As I mentioned above, my opinion is that great managers actually have significantly more expertise than their subordinates, as it helps to inspire and get projects done efficiently. Again, it’s a tradeoff: by staying technical longer you expand your options of what is manageable, at the expense of management experience. An engineering career can be long though, and resuming technical work is difficult after moving away from it, so it seems better to err on the side of staying technical for longer.

I view software engineering as a fundamental skill that can be worked on by itself or in conjunction with a specific field like finance, biology and so on. It can be used to solve business problems, to express yourself as an art form, or anything in between. Four years is really just the beginning and I expect to keep exploring for many years to come!

Written on October 27, 2020