Back-of-the-envelope Calculations


Back-of-the-envelope calculations let us set aside the nitty-gritty details of a system (at least at the design level) and focus on the figures that shape the design, such as load, storage, and bandwidth.

Some examples of back-of-the-envelope calculations are:

  • The number of concurrent TCP connections a server can support.

  • The number of requests per second (RPS) a web, database, or cache server can handle.

  • The storage requirements of a service.

Types of data center servers


Web servers

Web servers are the first point of contact after the load balancers. Data centers have racks full of web servers that usually handle API calls from clients. For example, in 2011 Facebook used web servers with 32 GB of RAM, 500 GB of storage, and a custom 16-core Intel processor.

Application servers

Application servers usually run the business logic and heavy computation tasks. They primarily serve dynamic content, whereas web servers mostly serve static content to the client, which is usually a web browser. Application servers can require extensive computational and storage resources. For example, in 2011 Facebook used application servers with up to 256 GB of RAM and two types of storage (traditional rotating disks and flash) with a capacity of up to 6.5 TB.

Storage servers

Storage servers, of course, need large amounts of disk capacity. At the software level, however, different storage systems handle different types of data. Take YouTube as an example:

  1. Blob (Binary Large Object) storage for encoded videos.

  2. Temporary processing-queue storage for newly uploaded videos that are waiting to be processed.

  3. Bigtable for video thumbnails.

  4. Relational database management system (RDBMS) for video metadata, account information, comments, etc.

  5. Others, such as Hadoop's HDFS for analytics storage.

Standard numbers to remember

Component                                  Time (ns)
L1 cache reference                         0.9
L2 cache reference                         2.8
L3 cache reference                         12.9
Main memory reference                      100
Compress 1 KB with Snzip                   3,000 (3 μs)
Read 1 MB sequentially from memory         9,000 (9 μs)
Read 1 MB sequentially from SSD            200,000 (200 μs)
Round trip within the same data center     500,000 (500 μs)
Read 1 MB sequentially from a ~1 GB/s SSD  1,000,000 (1 ms)
Disk seek                                  4,000,000 (4 ms)
Read 1 MB sequentially from disk           2,000,000 (2 ms)
Send packet SF -> NYC                      71,000,000 (71 ms)
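The per-megabyte read figures above scale roughly linearly with data size, which makes them handy for quick estimates. The Python sketch below is a minimal illustration, assuming purely sequential reads, linear scaling, and decimal units (1 GB = 1,000 MB); the helper name `sequential_read_ms` is hypothetical and only the latencies come from the table.

```python
# Sequential-read latency per 1 MB, copied from the table above (nanoseconds).
READ_1MB_NS = {
    "memory": 9_000,
    "ssd": 200_000,
    "disk": 2_000_000,
}


def sequential_read_ms(megabytes: float, medium: str) -> float:
    """Hypothetical helper: estimated sequential read time in milliseconds.

    Assumes read time grows linearly with the amount of data read.
    """
    return megabytes * READ_1MB_NS[medium] / 1_000_000


if __name__ == "__main__":
    for medium in READ_1MB_NS:
        # 1 GB is treated as 1,000 MB (decimal units) for simplicity.
        print(f"1 GB from {medium}: {sequential_read_ms(1_000, medium):,.0f} ms")
```

Under these assumptions, reading 1 GB takes about 9 ms from memory, 200 ms from a fast SSD, and 2 s from disk, a gap of more than two orders of magnitude between memory and disk.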

Below are some rough numbers for queries per second (QPS). Actual values vary with many factors, such as the type of query and the machine configuration.

Component        QPS handled
MySQL            1,000
Key-value store  10,000
Cache server     100,000 – 1M

Request estimation

How many client requests can a single server handle? The metric is requests per second (RPS).

There are two types of requests:

  • CPU-bound requests

  • Memory-bound requests

CPU-bound: The rate mainly depends on the time a task takes and the number of CPU threads: $RPS_{CPU} = N_{CPU} \times \frac{1}{T_{task}}$

Memory-bound: Similarly, the rate depends on how much memory each worker consumes and the time a task takes: $RPS_{memory} = \frac{RAM_{size}}{Worker_{memory}} \times \frac{1}{T_{task}}$

If a server receives half CPU-bound requests and half memory-bound requests, it can handle a total of $RPS_{total} = \frac{1}{2} RPS_{CPU} + \frac{1}{2} RPS_{memory}$

These calculations assume ideal conditions. In practice, factors such as network latency, code errors, and inefficient logic also reduce the achievable RPS.
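The request-estimation formulas can be sketched in Python as below. This is a minimal sketch: the function names are hypothetical, the half-and-half mix is modeled as a weighted average of the two rates (our reading of the mixed-request formula), and the example server (72 hardware threads, 240 GB of RAM, 300 MB per worker, 200 ms per CPU-bound task, 50 ms per memory-bound task) is an illustrative assumption, not a measurement.

```python
def rps_cpu(num_threads: int, task_time_ms: float) -> float:
    """RPS_CPU = N_CPU * (1 / T_task), with T_task given in milliseconds."""
    return num_threads * 1000 / task_time_ms


def rps_memory(ram_mb: float, worker_memory_mb: float, task_time_ms: float) -> float:
    """RPS_memory = (RAM_size / Worker_memory) * (1 / T_task)."""
    workers = ram_mb / worker_memory_mb  # how many workers fit in RAM at once
    return workers * 1000 / task_time_ms


def rps_total(cpu_rps: float, memory_rps: float, cpu_share: float = 0.5) -> float:
    """Weighted mix of CPU-bound and memory-bound requests (half/half by default)."""
    return cpu_share * cpu_rps + (1 - cpu_share) * memory_rps


if __name__ == "__main__":
    # Hypothetical server parameters, chosen only for illustration.
    cpu = rps_cpu(num_threads=72, task_time_ms=200)
    mem = rps_memory(ram_mb=240_000, worker_memory_mb=300, task_time_ms=50)
    print(cpu, mem, rps_total(cpu, mem))
```

With these assumed numbers the CPU-bound rate is 360 RPS, the memory-bound rate is 16,000 RPS, and the mix comes out at 8,180 RPS, in the same ballpark as the 8,000 RPS per server used in the next section.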

Calculations

Number of servers required

Assume we have 500 million daily active users (DAU) and that a single user makes 20 requests per day on average. Also assume each server can handle 8,000 RPS. Then we have:

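$\frac{500M \times 20}{86400} \approx 115{,}741 \text{ RPS}, \quad \frac{115{,}741 \text{ RPS}}{8000 \text{ RPS per server}} \approx 14.5 \Rightarrow 15 \text{ servers}$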

That is only 15 servers, which is clearly unrealistic for 10 billion daily requests. This shows that the factors we omitted are not negligible. We should also be aware that a client request is usually processed by multiple servers, such as web servers, application servers, and storage servers. That is why a data center has far more servers than this estimate suggests.

The DAU also gives an upper bound on how many requests the service can receive at any single point in time. If we assume the largest possible burst is every daily active user sending one request at the same moment, and provision servers for that burst, the service above needs at most $\frac{500M}{8000} = 62{,}500$ servers to stay available.
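Both estimates, the average-load one and the worst-case burst, fit in a few lines of Python. The function names below are hypothetical, and the sketch assumes requests are spread evenly over 86,400 seconds for the average case and rounds up because servers come in whole units.

```python
import math

SECONDS_PER_DAY = 86_400


def servers_average(dau: int, requests_per_user: int, server_rps: int) -> int:
    """Servers needed if the daily requests were spread evenly over the day."""
    total_requests = dau * requests_per_user
    average_rps = total_requests / SECONDS_PER_DAY
    return math.ceil(average_rps / server_rps)  # round up to whole servers


def servers_peak(dau: int, server_rps: int) -> int:
    """Servers needed if every daily active user sent a request in the same second."""
    return math.ceil(dau / server_rps)


if __name__ == "__main__":
    dau = 500_000_000
    print(servers_average(dau, requests_per_user=20, server_rps=8_000))
    print(servers_peak(dau, server_rps=8_000))
```

Running it prints 15 for the average case (10 billion requests a day is about 115,741 RPS) and 62,500 for the burst case, matching the figures in the text.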

Storage requirements

Assumptions: 250M DAU, and each user makes 3 posts a day on average. 10% of posts contain an image and 5% contain a video. An image is 200 KB and a video is 3 MB on average. Each post's metadata requires 250 bytes of database storage.

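That gives $250M \times 3 = 750M$ posts per day, which need:

  • Metadata: $750M \times 250 \text{ B} = 187.5 \text{ GB}$

  • Images: $750M \times 10\% \times 200 \text{ KB} = 15 \text{ TB}$

  • Videos: $750M \times 5\% \times 3 \text{ MB} = 112.5 \text{ TB}$

The total storage required per day is therefore about 128 TB, which is $128 \text{ TB} \times 365 \approx 47 \text{ PB}$ a year.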
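The storage estimate can be reproduced with the Python sketch below. It is a minimal illustration under the section's own assumptions; the function name and parameter names are hypothetical, and it uses decimal units (1 KB = 1,000 bytes) with integer arithmetic so the byte count is exact.

```python
def daily_storage_bytes(
    dau: int,
    posts_per_user: int,
    image_pct: int,
    video_pct: int,
    image_bytes: int,
    video_bytes: int,
    metadata_bytes: int,
) -> int:
    """Bytes of new data written per day (decimal units: 1 KB = 1,000 bytes)."""
    posts = dau * posts_per_user
    metadata = posts * metadata_bytes  # every post stores metadata
    images = posts * image_pct // 100 * image_bytes  # only image posts
    videos = posts * video_pct // 100 * video_bytes  # only video posts
    return metadata + images + videos


if __name__ == "__main__":
    per_day = daily_storage_bytes(
        dau=250_000_000, posts_per_user=3, image_pct=10, video_pct=5,
        image_bytes=200_000, video_bytes=3_000_000, metadata_bytes=250,
    )
    print(f"{per_day / 1e12:.1f} TB/day, {per_day * 365 / 1e15:.1f} PB/year")
```

It prints 127.7 TB/day and 46.6 PB/year, which round to the 128 TB and 47 PB quoted above. Videos dominate: they account for 112.5 TB of each day's total.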

Bandwidth requirements

To estimate the bandwidth, we need the following steps:

  • Estimate the daily amount of incoming data

  • Estimate the daily amount of outgoing data

  • Estimate the bandwidth by second

We can use the daily storage requirement computed earlier as the daily amount of incoming data.

That is $\frac{128 \times 10^{12} \text{ B}}{86400 \text{ s}} \times 8 \text{ bits} \approx 12 \text{ Gbps}$ of incoming bandwidth.
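Converting a daily data volume into average bandwidth is a single division; a minimal Python sketch (hypothetical helper name, decimal units, traffic assumed evenly spread over the day) looks like this:

```python
SECONDS_PER_DAY = 86_400


def bandwidth_gbps(bytes_per_day: float) -> float:
    """Average bandwidth in gigabits per second for a given daily data volume."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY  # 8 bits per byte
    return bits_per_second / 1e9


if __name__ == "__main__":
    incoming = bandwidth_gbps(128e12)  # 128 TB/day of uploaded data
    print(f"{incoming:.1f} Gbps")
```

This prints 11.9 Gbps, which the text rounds to 12 Gbps. Peak traffic would be higher than this average.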

For outgoing traffic, assume each user views 50 posts a day, with the same ratio of images and videos. That is $250M \times 50 = 12.5B$ posts viewed per day.

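$\frac{12.5B}{86400} \approx 145{,}000$ posts viewed per second.

Bandwidth for the metadata: $145{,}000 \times 250 \times 8 \text{ bits} \approx 0.3 \text{ Gbps}$.

For the image data: $145{,}000 \times \frac{10}{100} \times 200{,}000 \times 8 \text{ bits} = 23.2 \text{ Gbps}$.

For the video data: $145{,}000 \times \frac{5}{100} \times 3 \times 10^{6} \times 8 \text{ bits} = 174 \text{ Gbps}$.

Thus, we require $0.3 + 23.2 + 174 = 197.5 \text{ Gbps}$ of outgoing bandwidth.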

In total, this system needs $BW_{outgoing} + BW_{incoming} = 197.5 \text{ Gbps} + 12 \text{ Gbps} = 209.5 \text{ Gbps}$.
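The outgoing-bandwidth steps can be sketched in Python as below. The function names are hypothetical, decimal units are assumed, and the incoming figure is taken as the rounded 12 Gbps from the text. Unlike the hand calculation, the sketch does not round posts per second up to 145,000 (the exact value is about 144,676), so its results come out slightly lower.

```python
SECONDS_PER_DAY = 86_400


def to_gbps(bytes_per_second: float) -> float:
    """Convert bytes per second to gigabits per second."""
    return bytes_per_second * 8 / 1e9


def outgoing_gbps(
    dau: int,
    views_per_user: int,
    image_pct: float,
    video_pct: float,
    image_bytes: int,
    video_bytes: int,
    metadata_bytes: int,
) -> dict[str, float]:
    """Average outgoing bandwidth per data type, in Gbps."""
    posts_per_second = dau * views_per_user / SECONDS_PER_DAY
    return {
        "metadata": to_gbps(posts_per_second * metadata_bytes),
        "images": to_gbps(posts_per_second * image_pct / 100 * image_bytes),
        "videos": to_gbps(posts_per_second * video_pct / 100 * video_bytes),
    }


if __name__ == "__main__":
    parts = outgoing_gbps(
        dau=250_000_000, views_per_user=50, image_pct=10, video_pct=5,
        image_bytes=200_000, video_bytes=3_000_000, metadata_bytes=250,
    )
    outgoing = sum(parts.values())
    incoming = 12  # Gbps, the rounded incoming estimate from the text
    print({name: round(gbps, 1) for name, gbps in parts.items()})
    print(f"outgoing ~ {outgoing:.0f} Gbps, total ~ {outgoing + incoming:.0f} Gbps")
```

Without the rounding, outgoing traffic is about 197 Gbps and the total about 209 Gbps, well within the precision of an envelope estimate.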

By bruce
