What is a CDN Database?

What is a CDN database, and how do CDNs work? Explained.



A content delivery network (CDN) refers to a geographically distributed group of servers which work together to provide fast delivery of Internet content.

A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos. The popularity of CDN services continues to grow, and today the majority of web traffic is served through CDNs, including traffic from major sites like Facebook, Netflix, and Amazon.

A properly configured CDN may also help protect websites against some common malicious attacks, such as Distributed Denial of Service (DDoS) attacks.

The Stateful CDN is a model that incorporates a distributed database into the technology stack. The purpose of the CDN database is to provide state to stateless services such as edge functions, edge containers, and serverless functions. Cloudflare is the first startup to introduce a CDN database. In due time, more CDNs will follow. This post is a high-level overview of NewSQL and the emerging CDN database.

CDN Explanation.

A few years ago, building a distributed SQL database with strong consistency and ACID compliance was extremely difficult. Some tried but failed. Because of the rigidity of SQL architecture, NoSQL emerged as an alternative. Companies like Google and Amazon sacrificed consistency for availability because it meant they could scale horizontally and add millions of clients.

Then Calvin happened, followed shortly thereafter by Google Spanner. The thought leaders behind Calvin and Spanner introduced a novel approach to building a distributed database that combined strong consistency with high availability. As a result, a new crop of startups popped up that followed their lead.

For the first time, startups and CDNs have the tools and information to develop a CDN database that has strong consistency and high availability, globally. Academia and startups have published deep insight on the subject, in the form of white papers, videos, and blog posts.

When a user tries to access a website from a particular location, and a CDN is configured for that website, the request first goes to a nearby, optimal CDN node. If that node already has the requested data cached, the data is served to the user directly from the node, without going back to the origin server. If the data is not cached on the node serving that user, the node forwards the request to the origin server, fetches the data, and serves the user’s request. The node also caches this data so it can serve future requests for it, whether from this user or from any other user reaching the same node. So, in principle, only the first user requesting specific content through a given node suffers extra latency; all users requesting that content afterwards get faster access because it is served from the CDN node. Of course, this depends on the cache settings of the content (e.g. the website's cache settings), as these may require the CDN node to fetch the content again once it expires. This is explained a bit more at the bottom of this page.
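The hit/miss flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular CDN is implemented: the `EdgeNode` class, its `ttl_seconds` setting, and the `fetch_from_origin` callback are all hypothetical names made up for this example, and a single fixed expiry time stands in for the website's real cache settings.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeNode:
    """Hypothetical CDN edge node with a simple expiry-based cache."""
    fetch_from_origin: Callable[[str], bytes]      # assumed origin fetch callback
    ttl_seconds: float = 60.0                      # assumed content expiration time
    clock: Callable[[], float] = time.monotonic    # injectable clock, handy for testing
    _cache: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)
    origin_fetches: int = 0

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self.clock()
        entry = self._cache.get(path)
        if entry is not None and entry[1] > now:
            # Cached and not expired: serve from the node, origin is not contacted.
            return entry[0], "HIT"
        # Not cached (or expired): go to the origin, then cache for later requests.
        body = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self._cache[path] = (body, now + self.ttl_seconds)
        return body, "MISS"


if __name__ == "__main__":
    node = EdgeNode(fetch_from_origin=lambda p: f"origin:{p}".encode())
    print(node.get("/quote.png")[1])  # first request goes to the origin
    print(node.get("/quote.png")[1])  # second request is served from the node
```

Only the first request for `/quote.png` reaches the origin here; later requests are answered from the node until the entry expires.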

We have identified three startups that are leading the movement toward the distributed SQL database. Although these startups have built some parts of the database from scratch, many incorporate existing technologies like the RocksDB storage engine and Raft consensus.

At its core, a CDN is a network of servers linked together with the goal of delivering content as quickly, cheaply, reliably, and securely as possible. In order to improve speed and connectivity, a CDN will place servers at the exchange points between different networks.

CDN stands for Content Delivery Network. As the name suggests, it is a network of distributed nodes (also known as edge location servers) which helps deliver content (webpages, video, images, etc.) to the end user based on the user’s location, the content origin server, and the edge server location. CDN nodes can cache content and serve it to a user from a location that is geographically close to that user. CDN providers deploy nodes in multiple geographic locations, and these can span multiple ISP (Internet Service Provider) networks.
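To make "geographically close" concrete, here is a rough Python sketch that picks the closest edge node by great-circle distance. The node names and coordinates are hypothetical, and the approach itself is a simplification: real CDNs generally route with mechanisms such as DNS or anycast and take network conditions into account, not just distance on a map.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical edge locations with approximate coordinates.
EDGE_NODES: Dict[str, Coord] = {
    "london": (51.51, -0.13),
    "new-york": (40.71, -74.01),
    "singapore": (1.35, 103.82),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_node(user: Coord, nodes: Dict[str, Coord] = EDGE_NODES) -> str:
    """Return the name of the node with the smallest distance to the user."""
    return min(nodes, key=lambda name: haversine_km(user, nodes[name]))


if __name__ == "__main__":
    print(nearest_node((48.86, 2.35)))  # a user in Paris -> "london"
```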

These Internet exchange points (IXPs) are the primary locations where different Internet providers connect in order to provide each other access to traffic originating on their different networks. By having a connection to these high-speed and highly interconnected locations, a CDN provider is able to reduce costs and transit times in high-speed data delivery.

CDNs just bring static information closer to the users, caching that information in points-of-presence (PoPs) around the globe. This is mostly done by web servers sitting within those PoPs. So whatever you can't retrieve by HTTP GET will likely be a problem. For example, the legacy protocol RTMP (also video) is supported by legacy CDNs (Level3/Akamai/EdgeCast), but not by the more recent Cloudflare/CloudFront and so on, because it requires add-ons to the web server and clutters workflows.

Technically, any static database can be stored in a file, and the file can be cached by a CDN. But then, again, it would be your code that takes care of the db->file->db metamorphosis. Therefore, if something is static, you don't really want to use a database for it (to be future/CDN-proof). Subtitles are just text files, so let them be files in asset folders. I appreciate that the high-level architecture might be beyond your control here (due to a specific ingesting system, for instance), but then the answer is that you won't be able to do what you are trying to do, and the resulting performance will suffer.
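If you do end up with the subtitles in a database, the db->file step could look roughly like this in Python. This is only a sketch: the `subtitles` table, its `video_id`, `lang`, and `body` columns, and the `.vtt` file naming are assumptions made for the example, not anything from your setup, and it assumes the IDs are safe to use in file names.

```python
import sqlite3
from pathlib import Path
from typing import List


def export_subtitles(conn: sqlite3.Connection, out_dir: Path) -> List[Path]:
    """Write each subtitle row to a static text file a CDN can cache via HTTP GET."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Hypothetical schema: subtitles(video_id TEXT, lang TEXT, body TEXT)
    rows = conn.execute(
        "SELECT video_id, lang, body FROM subtitles ORDER BY video_id, lang"
    )
    for video_id, lang, body in rows:
        path = out_dir / f"{video_id}.{lang}.vtt"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

You would run something like this once at ingest time (or whenever the subtitles change) and point the web server at the output folder, so the CDN only ever sees plain files.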

Let’s assume you own a website which is hosted on a web server (the origin of the content) and serves images of inspirational quotes to its visitors. When you started your website, you hosted it in a particular geographic location, say the UK. Initially, you may see visitors from the UK coming to your website to view and download content. As your website gets more popular, more users from the UK and abroad start to access it. This creates three challenges. First, as more users access the same web server, the load on it increases, and it may not be able to serve all the users at the same time, which in turn causes a bad user experience. Second, as the origin server is located in the UK, users accessing it from abroad face latency because the data is transferred over the Internet from a geographically distant location. The third challenge is the bandwidth cost of serving content from the origin server, as a greater number of users requires more bandwidth.

These three challenges can be tackled by using a CDN. As a CDN is a network of geographically distributed servers for serving content, it acts as a middleman between the end user and the origin server. For the first challenge, once a user accesses the website, the request is sent to the most suitable node for content delivery, and the content is then served from the CDN nodes on the same principle as described above. So, in this case, instead of all requests from all users going to the origin server, they are sent to the distributed nodes, and the load on the origin server is minimised.
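As a back-of-the-envelope illustration of how much load this takes off the origin, the Python sketch below counts origin fetches for a small request log. It assumes, for simplicity, that cached content never expires, so the origin is contacted once per distinct (node, path) pair; the function names and the sample log are hypothetical.

```python
from typing import Iterable, List, Tuple

Request = Tuple[str, str]  # (edge node that received the request, requested path)


def origin_requests(log: Iterable[Request]) -> int:
    """Origin fetches needed if every node caches content forever (an assumption)."""
    return len(set(log))


def offload_ratio(log: List[Request]) -> float:
    """Fraction of requests answered by CDN nodes without touching the origin."""
    if not log:
        return 0.0
    return 1 - origin_requests(log) / len(log)


if __name__ == "__main__":
    log = [
        ("london", "/quote1.png"),
        ("london", "/quote1.png"),
        ("london", "/quote2.png"),
        ("new-york", "/quote1.png"),
        ("new-york", "/quote1.png"),
        ("london", "/quote1.png"),
    ]
    print(origin_requests(log), "of", len(log), "requests reach the origin")
```

Without a CDN, all six requests in the sample log would hit the origin; under the stated assumption only three do, and that share shrinks further as more users request the same content.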
