Scaling – Web Development
So we had these app servers that were all running their own caches, and then we had a couple of databases that were all replicas of each other. At this point we added a load balancer, and this load balancer was probably just a program running on one of these app machines. These guys were still keeping their caches in sync by interacting with the databases directly, and we had a limit to how many app servers we could have because we had this complicated cache-syncing scheme.

The next thing we added was the memcached layer. Instead of these app servers each containing their own in-memory cache, they would communicate via memcached. So instead of having to keep their caches in sync, we just had one cache that was shared among all of our app servers. I’m sad it took us so long to figure this out, because memcached existed when we started reddit, and we should have been using it from the beginning. This is what allowed us to get all of that state, all of that cache, out of the apps and into memcached, and allowed us to add apps arbitrarily. Once we had that going, we could scale our apps and they stayed in sync, so we could add an app or lose an app and not have to worry about it.

The next thing we had to start
dealing with was the database load. We were already replicating for durability and for performance reasons, so we could spread our reads across multiple machines. Then we started segmenting on type. So we’d have a database for just links; then we could separate comments out into their own database. These would still replicate to each other, but if you’re only submitting a link, you only have to touch the links database, and if you’re only reading a comment, for example, you only have to touch the comments database. And this is actually still basically the general setup reddit has today in terms of how the database is scaled.

We never wrote sharding in the beginning, and I really regret that. When I rewrote ThingDB, the second version of it, I had in the back of my head, you know, I should add sharding, because we’re going to need that someday. And then I just wanted to get the damn thing into production, so I stopped. The big lesson I’ve learned is when
you’re writing a big system like that, if you don’t do the hard parts up front, you may never get another opportunity to do them, because now the database is so big that if we wanted to bolt on sharding, that’s a huge project. It’s easier right now to just add bigger machines and more caching. It’s not going to work forever, and somebody’s going to have to bite the bullet and do that. And it would have been a lot easier to do it at the time. Since we stopped using joins in all of our queries when we switched to ThingDB, sharding’s actually fairly straightforward if you do it right from the beginning.

Over time some of the software on these app servers changed. We’ve always been using Python. I don’t remember what app server we used originally. We switched from whatever we used initially to web.py, which is a framework that we wrote at reddit. Aaron was basically the main author of that, and it’s still out there on the Internet somewhere. And this is the first time I recall seeing a framework that had the notion of a handler class and then functions for get and post, and I’ve become kind of addicted to thinking of web applications that way. Actually Google App Engine’s webapp2 framework inherited a lot of that design from web.py, which is nice. Nice for me, at least, is that that design decision has stuck around for a little while, so I think that means it was a good one.

Now reddit uses a web framework called Pylons, but it uses a very old version of Pylons. Basically, when we switched to Pylons, Aaron had stopped maintaining web.py. I didn’t want to maintain it, so we switched to something maintained by somebody else. And then we basically shredded most of it and made it function just like web.py. In hindsight we probably should have just written our own, because that’s effectively what we did, but we did it on top of Pylons. So if you want to use the reddit version of Pylons, it’s open source. It’s online. But it doesn’t look anything like the actual Pylons web framework at this point. And to my knowledge, that’s still what they use today, this hacked-up version of Pylons.
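
To make the handler-class idea mentioned above a bit more concrete, here is a minimal sketch in plain Python of one class per URL with functions for get and post. It is not web.py's or webapp2's actual API; the names `Handler`, `SubmitLink`, `ROUTES`, and `handle` are hypothetical, and the in-memory list stands in for a real database.

```python
# Illustrative sketch only: a tiny dispatcher in the "handler class with
# get/post functions" style. All names here are made up for this example.

ALLOWED_METHODS = ("get", "post")


class Handler:
    """Base class: one subclass per URL, one function per HTTP method."""

    def dispatch(self, method, **params):
        name = method.lower()
        # Only route to known HTTP method names that the subclass defines.
        func = getattr(self, name, None) if name in ALLOWED_METHODS else None
        if func is None:
            return (405, "Method Not Allowed")
        return func(**params)


class SubmitLink(Handler):
    links = []  # assumption: stand-in for the links database

    def get(self):
        # A GET just renders the submission form.
        return (200, "submit form")

    def post(self, url=""):
        # A POST validates the input, stores it, and redirects home.
        if not url:
            return (400, "url required")
        self.links.append(url)
        return (302, "/")


# Map URL paths to handler classes.
ROUTES = {"/submit": SubmitLink}


def handle(method, path, **params):
    """Find the handler class for a path and call its method function."""
    cls = ROUTES.get(path)
    if cls is None:
        return (404, "Not Found")
    return cls().dispatch(method, **params)
```

Calling `handle("POST", "/submit", url="http://example.com")` would store the link and return a redirect, while a method the class does not define falls through to a 405. The appeal of the design is that each URL's behavior lives in one small class, which is roughly the shape described in the talk.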