fetched at November 8, 2020
I came across the book System Design Interview: an Insider's Guide by accident (it's available as a paperback and as an online course). I was looking for good book resources after several people asked me how they can get better at building distributed systems or designing systems at scale, especially when they don't have the opportunity to do so as part of their day-to-day work.
The topic is something of a chicken-and-egg problem: you'll know how to design a large system after you've designed one before. But if you haven't, how would you build a URL shortener like bit.ly, with hundreds of millions of links? A chat app like WhatsApp? A file storage system like Dropbox or Google Drive?
There are many resources online, the best-known ones being the System Design Primer on GitHub and the articles on High Scalability. In my case, I was looking for a more "structured" approach, as opposed to a dump of concepts you need to know for these interviews.
This book is the most "real-world" systems design book I've come across, and it makes a solid effort to teach concepts, step by step, to people who have yet to work on systems at scale. It's also a welcome refresher for those who are familiar with some of these systems but would like to venture into other types of large systems. It is clear from the start that the book was written by someone familiar with systems at scale: the author is Alex Xu, a software engineer previously at Oracle, Zynga, and Twitter.
The book comes with more than 10 case studies and introduces a framework that it applies consistently across them. There's also an accompanying online course with the same content as the book, which you can follow along with in a web browser, and where the diagrams are in color.
A framework for the systems design interview
System design interviews can feel intimidating, and having a framework for navigating them can help you feel more in control. The book recommends a 4-step process that I also agree with:
- Understand the problem and establish the design scope. I like to phrase this as confirming the problem, asking questions, and making the constraints clear. "In a systems design interview, giving out an answer quickly gives you no bonus points," the book suggests. And it's right.
- Propose a high-level design and get buy-in. All too often, I see people jump into implementation without confirming that their approach satisfies the constraints and that they're not over-engineering. Interviewers expect a conversation, similar to real-life design, and this step helps you achieve exactly that.
- Design deep-dive. Once you know you're on the right track, it's time to roll up your sleeves and get into the details. This is the part where you'll need an understanding and vocabulary of the systems domain. The book will help you understand several of the concepts you'll need. Resources like the System Design Primer also help with this phase.
- Wrap-up. With a design that seems sensible, you might close by identifying the bottlenecks and areas for improvement.
The book lays out time allocation suggestions for an hour-long interview: a few minutes for understanding the problem, 10-15 minutes for the high-level design, 10-25 for the deep-dive, and a few more for the wrap-up. I wouldn't be overly prescriptive, but I would suggest not starting the deep-dive in the first 10 minutes (gather enough context first) and leaving time for the wrap-up.
I've done dozens of systems design interviews as an interviewer. Back when I was interviewing at the likes of Facebook and Uber, I also got feedback on how good (or not great) my approach was.
One thing you should avoid is "just memorizing" the approaches to the problems. That's far from the point. I made this mistake when I interviewed at Facebook and was asked to build a part of Instagram. I had done this exercise before, so I just drew out a complicated system. I never talked about constraints or tradeoffs with my interviewer. In fact, I never had a two-way conversation.
A systems design interview is as much about communicating with the interviewer as it is about your systems and architecture knowledge. This is why, while the book will help fill gaps you might have on how large systems are built, it's no substitute for collaborating with someone on designing a system.
The book's case studies work well as they go deeper and deeper into the problem domain, forcing you to understand relevant concepts at each step. Take the rate limiter problem and how it's tackled:
- Client-side vs server-side rate limiting, and their tradeoffs.
- Rate limiting algorithms: token bucket, leaking bucket, fixed window, sliding window log & counter.
- Deepdive: rate limiting rules. A look at Lyft's rate limiting component.
- Rate limiters in distributed environments, supporting multiple servers and/or concurrent threads.
- Performance optimization & monitoring. Most of this falls under productionization and operating a real-world system.
- References for further reading, linking to industry sources like how Cloudflare built their rate limiter, or the AWS API rate limiting settings.
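To make the algorithms in the list above more concrete, here is a minimal, single-process sketch of two of them in Python: a token bucket and a sliding window log. This is my own illustration rather than code from the book; the class names, parameters, and the injectable `clock` argument are assumptions chosen so the behavior is easy to test. A limiter shared across multiple servers would need to keep this state in a shared store instead of in process memory.

```python
import time
from collections import deque
from typing import Callable, Deque


class TokenBucket:
    """Token bucket: refills `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens accrued since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowLog:
    """Sliding window log: allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.log: Deque[float] = deque()  # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have fallen out of the trailing window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

With `TokenBucket(capacity=2, rate=1.0)`, two back-to-back requests pass, a third is rejected, and one more is allowed once a second has elapsed. This sliding window log variant only records accepted requests; some descriptions also log rejected ones, which penalizes clients that keep retrying.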
One strength of the book is that its case studies cover a lot of ground and a wide variety of problems:
- "Basics": rate limiter, consistent hashing, key-value store, zero to millions of users.
- Web: URL shortener, web crawler.
- Social: newsfeed, notification system, chat
- Videos & storage: design YouTube, design Google Drive
- Misc: unique ID generator in distributed systems, search autocomplete
Conclusion
This book is a solid recommendation from me, and not just for preparing for the systems design interview, but also for strengthening your systems design muscle day to day. The book/course comes with typical design problems and brings a pretty good, step-by-step approach to them. But if you just read through them, you'll miss out on the real value of such a resource.
Aim to draw out how you would design each system before reading how the author tackled the problem. You'll go through the book more slowly, but the concepts will stick. And you'll have approaches to use not just in the interview, but also when debating with colleagues on how to build a system.
There were a few topics I missed in the book and would have liked to see covered. Though the book does a good job of going deep on fundamental concepts like rate limiting, consistent hashing, and sharding, and of exploring the internals of key-value stores, I wish topics like caching and replication strategies had been explored more. Both are relevant in many scenarios.
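Consistent hashing is a good example of a fundamental concept that clicks faster with a bit of code. Below is a minimal Python sketch of a hash ring with virtual nodes. It's my own illustration, not taken from the book; the class name, the default of 100 virtual nodes per server, and the use of MD5 purely as a stable (not security-relevant) hash are all assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's randomized built-in hash().
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hash ring with virtual nodes: a key maps to the first node clockwise from its hash."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # (hash, node) pairs, kept sorted
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place several virtual points per node to spread its load around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        # Filtering preserves the sorted order of the remaining points.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # (h,) sorts before any (h, node), so this finds the first point with hash >= h.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        if idx == len(self._ring):
            idx = 0  # wrap around past the largest hash
        return self._ring[idx][1]
```

The property worth noticing: when a node is added, each key either stays where it was or moves to the new node, so only a fraction of keys need to be relocated. With `hash(key) % n`, changing `n` reshuffles most keys.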
While the book presents decent solutions to each of the problems, I missed having alternative solutions with their tradeoffs. In several cases, you can trade off the number of machines (and thus cost) against latency, disaster resilience against cost or latency, and so on. These concepts are easier to grasp with examples. While the book goes deeper in this space than what I have otherwise seen, there is room for more depth.
Additionally, the book focuses on backend systems design. Client-side systems design problems for native mobile engineers or web engineers are usually different; I've helped design both types of interviews. In all fairness, covering those approaches is likely out of scope for this book. Still, for non-backend engineers, the book can be helpful, though potentially less applicable.
Fun facts about the book: from the author himself
After reading the book, I reached out to the author, Alex, to congratulate him on a solid resource. As I'm also writing a book, we started talking about how he approached writing and what he's learned from the experience. Here are a few fun facts, straight from the author:
- Alex started to write the book when he was preparing for systems design interviews and could not find good resources to do so. His friends quickly became interested, and he ended up releasing the first version as a course and on Amazon.
- The first version of the book received lots of reader feedback. While the book had a good number of readers, many of them complained about unclear diagrams and too few case studies. Alex decided to act on all the feedback and redid most of the book for the second version.
- The book took a year to write. Alex progressed at roughly one chapter per month. He shared how coming up with "easy to understand" diagrams was time-consuming, as was pacing the material at a "good enough" speed for the reader to follow.
- The book has organically become a top 100 Computers & Technology book on Amazon. At the time of my writing, it ranks #89 in this category, ahead of Clean Code. Alex shared how this was an organic process. The book and accompanying course are now both popular enough for him to consider spending even more time on them.
Further systems design resources
- Google's architecture in 2008 on High Scalability. I find it helpful to understand how companies handled scale over a decade ago. Keep in mind, Google was already bigger in 2008 than many companies need to worry about in 2020.
- YouTube architecture in 2008. Building a system similar to YouTube is a common challenge. It's good to understand how YouTube addressed this early on.
- Scale at Facebook in 2010: how Facebook approached scaling challenges in their earlier days.
- Scaling Twitter in 2009 and in 2013
- The Netflix experimentation platform in 2016
- How these companies approached scale: Flickr, Dropbox, LinkedIn, Uber, WhatsApp, Pinterest.
- Money movements at scale at Uber: this one hits slightly close to home, as I worked with the system described in this article and wrote about some of my learnings. Airbnb scaling their payments platform is also a good read.
- System Design Primer on GitHub: the largest collection of systems-related concepts worth knowing.
- High Scalability blog: the place to go for real-world scalability articles and discussions.