Back-of-the-envelope calculations
Back-of-the-envelope calculations help us ignore the nitty-gritty details of the system (at least at the design level) and focus on more important aspects.
Some examples of a back-of-the-envelope calculation could be:
The number of concurrent TCP connections a server can support.
The number of requests per second (RPS) a web, database, or cache server can handle.
The storage requirements of a service.
Types of data center servers
Web servers
Web servers are the first point of contact after load balancers. Data centers have racks full of web servers that usually handle API calls from clients. For example, Facebook has used a web server with only 32 GB of RAM and 500 GB of storage but a custom 16-core Intel processor (2011).
Application servers
Application servers usually carry the business logic and the heavy computation. They primarily provide dynamic content, whereas web servers mostly serve static content to the client, which is usually a web browser. Application servers can require extensive computational and storage resources. For example, Facebook has used application servers with up to 256 GB of RAM and two types of storage, traditional rotating disks and flash, with a capacity of up to 6.5 TB (2011).
Storage servers
Storage servers naturally need a large volume of disk space. At the software level, however, different types of storage systems handle different types of data. Take YouTube as an example:
Blob (Binary Large Object) storage for encoded videos.
Temporary processing-queue storage for newly uploaded videos that are waiting to be processed.
Bigtable for video thumbnails.
Relational database management system (RDBMS) for video metadata, account information, comments, etc.
Other systems, such as Hadoop's HDFS for analytical storage.
Standard numbers to remember
Component | Time (ns) |
---|---|
L1 cache reference | 0.9 |
L2 cache reference | 2.8 |
L3 cache reference | 12.9 |
Main memory reference | 100 |
Compress 1 KB with Snzip | 3,000 (3 µs) |
Read 1 MB sequentially from memory | 9,000 (9 µs) |
Read 1 MB sequentially from SSD | 200,000 (200 µs) |
Round trip within same datacenter | 500,000 (500 µs) |
Read 1 MB sequentially from SSD with speed ~1 GB/sec | 1,000,000 (1 ms) |
Disk seek | 4,000,000 (4 ms) |
Read 1 MB sequentially from disk | 2,000,000 (2 ms) |
Send packet SF -> NYC | 71,000,000 (71 ms) |
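To get a feel for how these numbers combine, here is a minimal Python sketch that scales the per-megabyte figures from the table up to a larger read. The function name `sequential_read_ms`, the 1,024 MB example size, and the choice to add exactly one disk seek are assumptions made for illustration. The SSD figure used is the 200 µs row, not the ~1 GB/sec row.

```python
# Latency figures copied from the table above, in nanoseconds.
LATENCY_NS = {
    "read_1mb_memory": 9_000,
    "read_1mb_ssd": 200_000,   # the 200 µs row, not the ~1 GB/sec row
    "read_1mb_disk": 2_000_000,
    "disk_seek": 4_000_000,
}


def sequential_read_ms(size_mb: float, medium: str) -> float:
    """Estimate the time to read size_mb megabytes sequentially, in milliseconds."""
    total_ns = size_mb * LATENCY_NS[f"read_1mb_{medium}"]
    if medium == "disk":
        # Assumption: a single initial seek before the sequential read.
        total_ns += LATENCY_NS["disk_seek"]
    return total_ns / 1_000_000


for medium in ("memory", "ssd", "disk"):
    print(f"1,024 MB from {medium}: {sequential_read_ms(1024, medium):.1f} ms")
```

With these inputs, memory comes out in the single-digit milliseconds, SSD in the hundreds of milliseconds, and disk at roughly two seconds, which is the kind of order-of-magnitude gap the table is meant to convey.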
The table below lists typical queries-per-second (QPS) figures. These vary with many factors, such as the type of query and the machine configuration.
Types | Rate |
---|---|
QPS handled by MySQL | 1000 |
QPS handled by key-value store | 10,000 |
QPS handled by cache server | 100,000 – 1M |
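As a rough illustration of how the QPS figures can be used, the sketch below divides a target load by the per-node rate and rounds up. The target of 50,000 QPS and the helper name `nodes_needed` are hypothetical, and the cache figure uses the lower end of the range from the table.

```python
import math

# Rough per-node throughput from the table above (queries per second).
QPS_PER_NODE = {
    "mysql": 1_000,
    "key_value_store": 10_000,
    "cache_server": 100_000,  # lower end of the 100,000 to 1M range
}


def nodes_needed(target_qps: int, component: str) -> int:
    """Number of nodes needed to serve target_qps, ignoring replication and headroom."""
    return math.ceil(target_qps / QPS_PER_NODE[component])


target_qps = 50_000  # hypothetical load
for component in QPS_PER_NODE:
    print(f"{component}: {nodes_needed(target_qps, component)} node(s)")
```

This ignores replication, failover, and uneven load, so it gives a lower bound, not a provisioning plan.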
Requests estimation
How many client requests can a single server handle? The metric is requests per second (RPS).
There are two types of requests:
CPU-bound requests
Memory-bound requests
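CPU-bound: The rate mainly depends on how long a task takes and on the number of CPU threads. Then we have
$RPS_{CPU} = Num_{threads} \times \frac{1}{Task_{time}}$
Memory-bound: Similarly, the rate depends on how much memory each worker consumes (relative to the RAM available) and on how long a task takes:
$RPS_{memory} = \frac{RAM_{size}}{Worker_{memory}} \times \frac{1}{Task_{time}}$
If we assume a server receives half CPU-bound requests and half memory-bound requests, then it can handle a total of roughly
$RPS = \frac{RPS_{CPU} + RPS_{memory}}{2}$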
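One way to turn the description above into numbers is sketched below. It assumes the simple models that the CPU-bound rate is threads divided by task time, that the memory-bound rate is the number of workers that fit in RAM divided by task time, and that an even mix can be approximated by a plain average of the two. All hardware figures (72 threads, 240 GB of RAM, 300 MB per worker, and the task times) are hypothetical.

```python
def rps_cpu_bound(num_threads: int, task_time_s: float) -> float:
    """Each thread finishes one task every task_time_s seconds."""
    return num_threads / task_time_s


def rps_memory_bound(ram_bytes: float, worker_bytes: float, task_time_s: float) -> float:
    """RAM limits how many workers run at once; each finishes one task per task_time_s."""
    concurrent_workers = ram_bytes / worker_bytes
    return concurrent_workers / task_time_s


def rps_even_mix(cpu_rps: float, memory_rps: float) -> float:
    """Assumption: a 50/50 request mix is approximated by the plain average."""
    return (cpu_rps + memory_rps) / 2


# Hypothetical server: 72 threads, 240 GB RAM, 300 MB per worker.
cpu_rps = rps_cpu_bound(num_threads=72, task_time_s=0.2)
memory_rps = rps_memory_bound(ram_bytes=240e9, worker_bytes=300e6, task_time_s=1.0)

print(f"CPU-bound:    {cpu_rps:.0f} RPS")
print(f"Memory-bound: {memory_rps:.0f} RPS")
print(f"Even mix:     {rps_even_mix(cpu_rps, memory_rps):.0f} RPS")
```

The plain average is a deliberate simplification; a more careful model would weight by how long each request type occupies the server.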
The calculation above holds only under ideal conditions. In practice, latency, code errors, and poor logic also affect the result.
Calculations
Number of servers required
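Assume we have 500 million daily active users (DAU) and that a single user makes 20 requests per day on average. Each server can handle 8,000 RPS. Then we have
$\frac{500\ \text{M} \times 20}{86{,}400\ \text{s}} \approx 115{,}740\ \text{RPS}, \qquad \frac{115{,}740\ \text{RPS}}{8{,}000\ \text{RPS per server}} \approx 14.5 \rightarrow 15\ \text{servers}$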
Only 15 servers are needed, which is clearly unrealistic for 10 billion daily requests. This shows that the factors we omitted are not negligible. We should also be aware that a client request is usually processed by multiple servers, such as web servers, application servers, and storage servers. That is why a data center has many more servers than we estimated.
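The DAU gives us an upper bound on how many requests a service can receive at a single point in time. We can therefore treat the DAU as the largest request burst in a day (every daily active user sending one request in the same second) and provision servers for it. Under that assumption, the service above would need
$\frac{500\ \text{M requests/s}}{8{,}000\ \text{RPS per server}} = 62{,}500\ \text{servers}$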
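The two estimates above can be sketched in Python as follows. The inputs (500 million DAU, 20 requests per user per day, 8,000 RPS per server) come from the text. The function names are illustrative, and the peak case assumes, as the text does, that the burst equals one request from every daily active user in the same second.

```python
import math

SECONDS_PER_DAY = 86_400


def servers_for_average_load(dau: int, requests_per_user: int, server_rps: int) -> int:
    """Spread the daily requests evenly over the day, then divide by per-server RPS."""
    average_rps = dau * requests_per_user / SECONDS_PER_DAY
    return math.ceil(average_rps / server_rps)


def servers_for_peak_load(peak_rps: int, server_rps: int) -> int:
    """Provision for a burst of peak_rps requests arriving in one second."""
    return math.ceil(peak_rps / server_rps)


dau = 500_000_000
print(servers_for_average_load(dau, requests_per_user=20, server_rps=8_000))
# Assumption: the peak burst is one request from every daily active user.
print(servers_for_peak_load(peak_rps=dau, server_rps=8_000))
```

The gap between the two results shows how sensitive the estimate is to the load assumption, before even counting the several server tiers each request passes through.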
Storage requirements
The assumptions: 250 M DAU, and each user makes 3 posts a day on average. 10% of posts contain an image and 5% of posts contain a video. An image is 200 KB and a video is 3 MB on average. Each post's metadata requires 250 bytes in the database.
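The total storage required for a day is then about 128 TB:
$\text{Posts per day} = 250\ \text{M} \times 3 = 750\ \text{M}$
$\text{Metadata} = 750\ \text{M} \times 250\ \text{B} = 187.5\ \text{GB}$
$\text{Images} = 750\ \text{M} \times 10\% \times 200\ \text{KB} = 15\ \text{TB}$
$\text{Videos} = 750\ \text{M} \times 5\% \times 3\ \text{MB} = 112.5\ \text{TB}$
$\text{Total} = 0.1875\ \text{TB} + 15\ \text{TB} + 112.5\ \text{TB} \approx 128\ \text{TB per day}$
That is about 47 PB a year ($128\ \text{TB} \times 365 \approx 46.7\ \text{PB}$).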
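The storage estimate can be reproduced with a few lines of Python. This sketch uses the assumptions listed above and decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); the function name `daily_storage_bytes` and its return structure are illustrative.

```python
def daily_storage_bytes(dau: int, posts_per_user: int) -> dict:
    """Break the daily storage requirement down by content type, in bytes."""
    posts = dau * posts_per_user
    breakdown = {
        "metadata": posts * 250,          # 250 bytes of metadata per post
        "images": posts * 0.10 * 200e3,   # 10% of posts carry a 200 KB image
        "videos": posts * 0.05 * 3e6,     # 5% of posts carry a 3 MB video
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


storage = daily_storage_bytes(dau=250_000_000, posts_per_user=3)
for name, size in storage.items():
    print(f"{name}: {size / 1e12:.4f} TB per day")

yearly_pb = storage["total"] * 365 / 1e15
print(f"yearly: {yearly_pb:.1f} PB")
```

Video dominates the total even though only 5% of posts contain one, which is worth keeping in mind when deciding what to optimize first.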
Bandwidth requirements
To estimate the bandwidth, we need the following steps:
Estimate the daily amount of incoming data
Estimate the daily amount of outgoing data
Estimate the bandwidth per second by dividing the daily totals by the number of seconds in a day
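We can use the daily storage requirement estimated earlier as the total amount of incoming data.
That is
$BW_{incoming} = \frac{128\ \text{TB} \times 8\ \text{bits/byte}}{86{,}400\ \text{s}} \approx 12\ \text{Gbps}$
For the outgoing traffic, we assume each of the 250 M users views 50 posts a day, with the same ratio of images and videos as in the uploaded posts. That is
$250\ \text{M} \times 50 = 12.5\ \text{billion posts viewed per day}$
Bandwidth for the metadata:
$\frac{12.5\ \text{B} \times 250\ \text{B} \times 8}{86{,}400\ \text{s}} \approx 0.3\ \text{Gbps}$
For the image data:
$\frac{12.5\ \text{B} \times 10\% \times 200\ \text{KB} \times 8}{86{,}400\ \text{s}} \approx 23.2\ \text{Gbps}$
For the video data:
$\frac{12.5\ \text{B} \times 5\% \times 3\ \text{MB} \times 8}{86{,}400\ \text{s}} \approx 174\ \text{Gbps}$
Thus, rounding each term up slightly, we require
$BW_{outgoing} \approx 0.3\ \text{Gbps} + 23.2\ \text{Gbps} + 174\ \text{Gbps} = 197.5\ \text{Gbps}$
In total, this system needs $BW_{outgoing} + BW_{incoming} = 197.5\ \text{Gbps} + 12\ \text{Gbps} = 209.5\ \text{Gbps}$.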
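The bandwidth steps can be checked with the short Python sketch below, which uses the same assumptions (128 TB of incoming data per day, 250 M users viewing 50 posts each, 10% with a 200 KB image, 5% with a 3 MB video, 250 bytes of metadata per post) and decimal units. The helper name `gbps` is illustrative. Because the code does not round intermediate values, its results come out slightly below the rounded figures in the text (about 197 Gbps outgoing and about 209 Gbps in total).

```python
SECONDS_PER_DAY = 86_400
BITS_PER_BYTE = 8


def gbps(bytes_per_day: float) -> float:
    """Convert a daily data volume in bytes to an average rate in gigabits per second."""
    return bytes_per_day * BITS_PER_BYTE / SECONDS_PER_DAY / 1e9


# Incoming: the daily storage estimate (128 TB) is treated as uploaded data.
incoming_gbps = gbps(128e12)

# Outgoing: every user views 50 posts a day.
views_per_day = 250_000_000 * 50
metadata_gbps = gbps(views_per_day * 250)
image_gbps = gbps(views_per_day * 0.10 * 200e3)
video_gbps = gbps(views_per_day * 0.05 * 3e6)
outgoing_gbps = metadata_gbps + image_gbps + video_gbps

print(f"incoming: {incoming_gbps:.2f} Gbps")
print(f"outgoing: {outgoing_gbps:.2f} Gbps")
print(f"total:    {incoming_gbps + outgoing_gbps:.2f} Gbps")
```

These are averages over a full day; real traffic has peaks, so provisioned bandwidth would need headroom above these values.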