Back-of-the-envelope calculations

Back-of-the-envelope calculations help us ignore the nitty-gritty details of the system (at least at the design level) and focus on more important aspects.

Some examples of a back-of-the-envelope calculation could be:

The number of concurrent TCP connections a server can support.
The number of requests per second (RPS) a web, database, or cache server can handle.
The storage requirements of a service.

Types of data center servers

Web servers

Web servers are the first point of contact after load balancers. Data centers have racks full of web servers that usually handle API calls from clients. For example, Facebook has used web servers with 32 GB of RAM and 500 GB of storage, paired with a custom 16-core Intel processor (2011).

Application servers

Application servers usually run the business logic and heavy computation tasks. They primarily provide dynamic content, whereas web servers mostly serve static content to the client, which is usually a web browser. Application servers can require extensive computational and storage resources. For example, Facebook has used application servers with up to 256 GB of RAM and two types of storage (traditional rotating disks and flash) with a capacity of up to 6.5 TB (2011).

Storage servers

Storage servers need a large volume of disk capacity. At the software level, different types of storage systems handle different types of data. Take YouTube as an example:

Blob (Binary Large Object) storage for encoded videos.
Temporary processing queue storage for daily uploaded videos that are waiting to be processed.
Bigtable for video thumbnails.
Relational database management system (RDBMS) for video metadata, account info, comments, etc.
Other systems, such as analytical storage on Hadoop's HDFS.

Standard numbers to remember

Component	Time (value, or order of magnitude where the value is missing)
L1 cache reference	0.9 $ns$
L2 cache reference	2.8 $ns$
L3 cache reference	12.9 $ns$
Main memory reference	100 $ns$
Compress 1 KB with Snzip	microseconds ($\mu s$)
Read 1 MB sequentially from memory	microseconds ($\mu s$)
Read 1 MB sequentially from SSD	microseconds ($\mu s$)
Round trip within same datacenter	microseconds ($\mu s$)
Read 1 MB sequentially from SSD with speed ~1 GB/sec	~1 $ms$ (1 MB at ~1 GB/sec)
Disk seek	milliseconds ($ms$)
Read 1 MB sequentially from disk	milliseconds ($ms$)
Send packet SF → NYC	milliseconds ($ms$)

The following queries-per-second (QPS) figures are also worth remembering, although actual values vary with many factors, such as the type of query and the machine configuration.

Types	Rate
QPS handled by MySQL	1000
QPS handled by key-value store	10,000
QPS handled by cache server	100,000 – 1M

Requests estimation

How many client requests can a single server handle? The metric is requests per second (RPS).

There are two types of requests:

CPU-bound requests
Memory-bound requests

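CPU-bound: $RPS_{CPU} = N_{CPU} \times \frac{1}{T_{Task}}$, where $N_{CPU}$ is the number of CPU cores and $T_{Task}$ is the time each task takes to complete.

Memory-bound: $RPS_{memory} = \frac{RAM_{size}}{Worker_{memory}} \times \frac{1}{T_{Task}}$, where $RAM_{size}$ is the total RAM and $Worker_{memory}$ is the memory used by one worker handling a request.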

Assuming requests are split evenly between the two types, the server handles on average $RPS_{total} = \frac{RPS_{CPU} + RPS_{memory}}{2}$

The calculation above assumes ideal conditions. In practice, latency, code errors, and poor logic can also affect the result.
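As a rough illustration, the CPU-bound and memory-bound formulas can be sketched in Python. The server specification and task times below (72 cores, 240 GB of RAM, 300 MB per worker, 200 ms and 50 ms tasks) are hypothetical values chosen for the example, and combining the two rates by averaging them is an assumption that requests are split evenly between the two types.

```python
def rps_cpu(num_cores: int, task_time_s: float) -> float:
    """CPU-bound: each core completes 1 / task_time_s requests per second."""
    return num_cores / task_time_s


def rps_memory(ram_bytes: float, worker_bytes: float, task_time_s: float) -> float:
    """Memory-bound: RAM limits how many workers can run at the same time."""
    workers = ram_bytes / worker_bytes
    return workers / task_time_s


# Hypothetical server (illustrative values, not taken from the text above).
cpu = rps_cpu(num_cores=72, task_time_s=0.2)
mem = rps_memory(ram_bytes=240e9, worker_bytes=300e6, task_time_s=0.05)

# Assumption of this sketch: half the requests are CPU-bound, half memory-bound.
total = (cpu + mem) / 2

print(f"CPU-bound: {cpu:,.0f} RPS")
print(f"Memory-bound: {mem:,.0f} RPS")
print(f"Average: {total:,.0f} RPS")
```

With these made-up inputs the average lands in the neighborhood of 8,000 RPS, but the result is only as good as the assumed task times and worker size.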

Calculations

Number of servers required

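Assume we have 500 million daily active users (DAU), a single user makes 20 requests per day on average, and each server can handle 8,000 RPS. Then we have

$500 M \times 20 = 10 B$ requests per day.

$\frac{10 B}{86400} \approx 115,000$ requests per second.

$\frac{115,000}{8000} \approx 14.4$, rounded up to 15 servers.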

That is only 15 servers, which is clearly unrealistic for 10 billion daily requests. This shows that the factors we omitted are not negligible. We should also be aware that a client request is usually processed by multiple servers, such as web servers, application servers, and storage servers. That is why a data center has many more servers than we estimated.

As an upper bound, if all 500 M daily active users sent a request in the same second, we would need $\frac{500 M}{8000} = 62,500$ servers to stay available.
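The two server estimates above can be reproduced with a short Python sketch. The function names are made up for this example. The inputs are the ones stated in the text: 500 M DAU, 20 requests per user per day, and 8,000 RPS per server.

```python
import math

SECONDS_PER_DAY = 86_400


def servers_for_average_load(dau: int, requests_per_user: int, rps_per_server: int) -> int:
    """Servers needed if requests are spread evenly over the whole day."""
    requests_per_second = dau * requests_per_user / SECONDS_PER_DAY
    return math.ceil(requests_per_second / rps_per_server)


def servers_for_peak_load(dau: int, rps_per_server: int) -> int:
    """Upper bound: every daily active user sends a request in the same second."""
    return math.ceil(dau / rps_per_server)


average = servers_for_average_load(500_000_000, 20, 8_000)
peak = servers_for_peak_load(500_000_000, 8_000)

print(f"Average load: {average} servers")
print(f"Peak load:    {peak:,} servers")
```

The gap between the two results shows how much the answer depends on the traffic model, which is why a realistic estimate usually falls somewhere between them.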

Storage requirements

The assumptions: 250 M DAU, and each user makes 3 posts a day on average. 10% of posts contain an image and 5% of posts contain a video. An image is 200 KB and a video is 3 MB on average. Each post's metadata requires 250 bytes of database storage.

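The total storage required per day is then about 128 TB: roughly 0.19 TB of metadata ($750 M \times 250$ bytes), 15 TB of images ($75 M \times 200$ KB), and 112.5 TB of videos ($37.5 M \times 3$ MB). That is about 47 PB a year.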
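A minimal Python sketch of the storage estimate, using the assumptions listed above. It assumes decimal units (1 KB = 1,000 bytes, 1 TB = $10^{12}$ bytes) and a 365-day year. The variable names are chosen for this example.

```python
DAU = 250_000_000
POSTS_PER_USER_PER_DAY = 3
METADATA_BYTES = 250          # per post
IMAGE_BYTES = 200_000         # 200 KB, decimal units assumed
VIDEO_BYTES = 3_000_000       # 3 MB, decimal units assumed

posts_per_day = DAU * POSTS_PER_USER_PER_DAY
image_posts = posts_per_day * 10 // 100   # 10% of posts contain an image
video_posts = posts_per_day * 5 // 100    # 5% of posts contain a video

metadata_bytes = posts_per_day * METADATA_BYTES
image_bytes = image_posts * IMAGE_BYTES
video_bytes = video_posts * VIDEO_BYTES

daily_bytes = metadata_bytes + image_bytes + video_bytes
yearly_bytes = daily_bytes * 365

print(f"Metadata: {metadata_bytes / 1e12:.2f} TB/day")
print(f"Images:   {image_bytes / 1e12:.2f} TB/day")
print(f"Videos:   {video_bytes / 1e12:.2f} TB/day")
print(f"Total:    {daily_bytes / 1e12:.0f} TB/day, {yearly_bytes / 1e15:.0f} PB/year")
```

Videos dominate the total even though only 5% of posts contain one, so the video size and share are the assumptions most worth double-checking.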

Bandwidth requirements

To estimate the bandwidth, we need the following steps:

Estimate the daily amount of incoming data
Estimate the daily amount of outgoing data
Convert both to bandwidth per second by dividing by the number of seconds in a day

We can use the storage requirement we learned earlier as the total amount of incoming data.

$\frac{128\times 10^{12}}{86400} \times 8 \approx 12 Gbps$ incoming bandwidth.

For outgoing data, assume each user views 50 posts per day:

$250 M \times 50 posts = 12.5 B$ posts are viewed per day.

$\frac{12.5B}{86400} \approx 145000$ posts viewed per second.

Post metadata (250 bytes each): $145000 \times 250 \times 8 bits \approx 0.3 Gbps$.

Images (10% of posts): $145000 \times \frac{10}{100} \times 200000 \times 8 bits = 23.2 Gbps$.

Videos (5% of posts): $145000 \times \frac{5}{100} \times 3 \times 10^6 \times 8 bits = 174 Gbps$.

$0.3 + 23.2 + 174 = 197.5 Gbps$ outgoing bandwidth.

In total, this system needs $BW_{outgoing} + BW_{incoming} = 197.5 Gbps + 12 Gbps = 209.5 Gbps$.
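The bandwidth steps can be sketched in Python as follows. The helper `to_gbps` is a made-up name for this example. The inputs (128 TB of daily uploads, 250 M DAU, 50 posts viewed per user per day, and the post, image, and video sizes) come from the text.

```python
SECONDS_PER_DAY = 86_400


def to_gbps(bytes_per_second: float) -> float:
    """Convert bytes per second to gigabits per second (decimal units)."""
    return bytes_per_second * 8 / 1e9


# Incoming: the daily storage requirement spread evenly over one day.
incoming_gbps = to_gbps(128e12 / SECONDS_PER_DAY)

# Outgoing: posts viewed per second, split into metadata, images, and videos.
views_per_second = 250_000_000 * 50 / SECONDS_PER_DAY
metadata_gbps = to_gbps(views_per_second * 250)
image_gbps = to_gbps(views_per_second * 0.10 * 200_000)
video_gbps = to_gbps(views_per_second * 0.05 * 3_000_000)
outgoing_gbps = metadata_gbps + image_gbps + video_gbps

total_gbps = incoming_gbps + outgoing_gbps

print(f"Incoming: {incoming_gbps:.1f} Gbps")
print(f"Outgoing: {outgoing_gbps:.1f} Gbps")
print(f"Total:    {total_gbps:.1f} Gbps")
```

Because the sketch does not round the view rate up to 145,000 posts per second, its results come out slightly below the rounded figures in the text.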

By bruce
