All of us use the internet extensively in our day-to-day lives, but few of us really think about how it works on a fundamental level. Understandably, most people assume that the inner workings of the internet are far too complicated for them to understand. Let’s change that, shall we?
Aside from satisfying curiosity, it seems that, given the prominent role that the internet plays in our lives, we should have a better general understanding of how it works. With modern high-speed internet connections, websites and online services load in seconds. This can mask just how much is going on behind the scenes whenever you connect to the internet and leave your homepage.
Requests and Responses
A simplified explanation of how the internet works would be this: you (the client) send a request to a website (the host). The website sends back a response, which is then displayed in your web browser. This all happens so fast that you might not realize there are several things going on under the hood when you connect to a website.
Most websites consist of a variety of different elements, each of which needs to be requested individually. The underlying code of a webpage, written in HTML (Hypertext Markup Language), tells the browser what it needs to request in order to display the webpage, as well as general instructions on how to display it correctly. Together with the stylesheets it references, this enables responsive websites that display differently according to the device being used.
Before data is sent back to the client from the host, it is broken up into small packets; a large file may need thousands of them. Each packet carries a small piece of the file, and the pieces are put back together in the right order when they reach the client. These files can be divided into two categories: assets and code files.
Code files can be written in a number of languages, such as HTML, CSS, and JavaScript. (Languages such as Perl run on the server and generate these files rather than being sent to the browser.) Assets include images, video, audio, and any other media to be displayed or downloaded.
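To make the idea of packets more concrete, here is a minimal sketch in Python. It is a toy model rather than how real networking code works: the function names and the 1024-byte payload size are assumptions chosen for illustration, and real packets also carry headers with addresses, checksums, and more.

```python
import random


def split_into_packets(data: bytes, payload_size: int = 1024) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs."""
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort the packets by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


# A stand-in for a file sent by the host.
page = b"<html><body>Hello, world!</body></html>" * 100

packets = split_into_packets(page)
random.shuffle(packets)  # packets may arrive in a different order than they were sent
restored = reassemble(packets)
```

The sequence numbers are what allow the receiving side to rebuild the file correctly even when the pieces arrive out of order.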
Servers
Websites that you connect to are hosted on servers. You can think of a server as a big, internet-connected computer. Servers that just need to serve up website data might be quite basic in terms of computing power. However, you can also rent servers that have the specs of a high-end gaming rig.
More complex websites require more powerful servers, and websites that experience huge volumes of traffic often use multiple servers, with load balancers directing new users to the servers where the load is lightest.
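As a rough illustration of sending new users to the least busy server, the sketch below picks whichever server currently has the fewest active connections. The server names and connection counts are made up, and real load balancers can use other strategies as well.

```python
def pick_server(active_connections: dict[str, int]) -> str:
    """Return the name of the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical servers and their current number of connected users.
servers = {"server-a": 12, "server-b": 3, "server-c": 7}

chosen = pick_server(servers)
servers[chosen] += 1  # the new user now counts toward that server's load
```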
Servers have operating systems installed – usually special versions of Linux or Windows. These operating systems are designed to sit on servers and handle their connections. Think of the operating system as one of those switchboard operators that used to route phone calls. When a client connects to a host server, the operating system knows what to do with the incoming connection.
When the server receives an HTTP request, it knows how to process it and, if successful, return an appropriate response.
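Here is a minimal sketch of what "processing a request" could look like on the server side. It assumes a tiny site whose pages live in a dictionary (PAGES, build_response, and handle_request are hypothetical names), and it only understands simple GET requests. Real web servers handle far more cases.

```python
# Hypothetical site content, keyed by the requested path.
PAGES = {
    "/": b"<h1>Home</h1>",
    "/pogs/boglins": b"<h1>Boglins</h1>",
}


def build_response(status: int, reason: str, body: bytes) -> bytes:
    """Assemble a status line, two headers, a blank line, and the body."""
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def handle_request(raw_request: bytes) -> bytes:
    """Turn the raw bytes of an HTTP request into the raw bytes of a response."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    try:
        method, path, _version = request_line.split(" ")
    except ValueError:
        return build_response(400, "Bad Request", b"Bad request")
    if method != "GET":
        return build_response(405, "Method Not Allowed", b"Only GET is supported")
    body = PAGES.get(path)
    if body is None:
        return build_response(404, "Not Found", b"Not found")
    return build_response(200, "OK", body)
```

The status code at the start of the response (200, 404, and so on) is how the server tells the client whether the request was successful.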
Protocols
In order for the client to communicate with the host, they both need to use a common set of protocols. These common protocols ensure that any devices that need to connect to the internet are able to do so. Without this standardization, different parts of the internet would require different software for access. Conversely, different protocols can be used, as is the case with Tor, to create networks that behave differently.
There are three protocols that are fundamental to how the internet works: Hypertext Transfer Protocol (HTTP), Transmission Control Protocol (TCP), and Internet Protocol (IP). Collectively, these are the protocols that allow online devices to communicate with one another.
- HTTP: This protocol defines a common language for the client and the host to use to talk to one another. Requests that you send to online servers will be in HTTP format.
- TCP/IP: These communication protocols dictate how data travels across the internet. Specifically, the IP layer is responsible for directing packets to a specific computer. The TCP layer makes sure the packets arrive complete and in order, and uses port numbers to deliver them to the right application on that computer.
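To show how these protocols fit together, the sketch below builds the text of a simple HTTP request and pairs it with the kind of destination TCP/IP needs: an IP address and a port. The function name is hypothetical, and the address is a placeholder from a range reserved for documentation, not a real server.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which website on the server we want
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# IP gets the packets to the right computer; the TCP port (80 is the
# standard port for HTTP) gets them to the right application on it.
# 192.0.2.10 is a placeholder documentation address.
destination = ("192.0.2.10", 80)

request = build_get_request("www.example.com", "/pogs/boglins")
```

HTTP itself is just structured text. TCP/IP is what carries that text to the machine and application named in the destination.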
Domain Name System
Consider the URL, ‘http://www.example.com/pogs/boglins’. The ‘www.example.com’ part of the URL contains information about the host’s location and identity. The ‘/pogs/boglins’ part of the address specifies which bit of the website you want to access. You can think of the first part as being like a phone number, which puts you through to a particular business, and the latter part as an extension that enables you to connect directly to the right phone.
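If you want to see that split for yourself, Python's standard library can take a URL apart. This short sketch uses the example URL from above; the variable names are just for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.example.com/pogs/boglins")

scheme = parts.scheme  # "http": the protocol to use
host = parts.hostname  # "www.example.com": the "phone number" part
path = parts.path      # "/pogs/boglins": the "extension" part
```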
But computers don’t work with words; they work with numbers. The domain name example.com is nice and easy for a human to read, but it’s not what your computer wants. In order to connect to a host, your browser needs to know the IP address of the server you are trying to access.
In order to find the IP address, your browser performs a Domain Name System (DNS) lookup. This process is akin to a person looking in the phone book for a telephone number.
Registering a domain name does not, by itself, give you an IP address. Instead, the owner of the domain sets up DNS records that point the name at the IP address of the server where the website is stored. When a browser performs a DNS lookup for example.com, it will find that IP address and will establish a connection with the server before sending the user’s request.
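The lookup itself can be tried from Python, which simply asks the operating system's resolver to do the work. This is a minimal sketch: lookup_ip_addresses is a hypothetical helper name, and the addresses you get back for a real domain depend on your network and on how that domain is configured.

```python
import socket


def lookup_ip_addresses(hostname: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr),
    # and sockaddr[0] is the IP address as a string.
    return sorted({result[4][0] for result in results})
```

Calling lookup_ip_addresses("example.com") on a machine with internet access should return one or more addresses. Without a connection, the call raises socket.gaierror.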
HTML
The main component of most webpages is an HTML file. This is the source code of the web page; it contains all the instructions a browser needs for displaying the page correctly. It also tells the browser what assets it needs to request from the host.
Each additional asset requires a separate request between the client and the host. Sometimes, the HTML code will call for an external script, usually a JavaScript file. Within this script, there may be subsequent calls for other assets. No assets are loaded unless they are requested. However, many web browsers will cache certain elements so that if you visit the page again, you only need to load content that has changed.
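To get a feel for how a browser works out what else it needs to request, here is a minimal sketch that scans a small HTML page for referenced assets. The sample page, the AssetFinder class, and the short list of tags it checks are assumptions for illustration. Real browsers look at many more tags and attributes.

```python
from html.parser import HTMLParser


class AssetFinder(HTMLParser):
    """Collect the URLs of assets that a page asks the browser to fetch."""

    # Tag name -> attribute holding the asset URL (a small, incomplete selection).
    ASSET_ATTRIBUTES = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ASSET_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.assets.append(value)


# A made-up page that references a stylesheet, a script, and an image.
SAMPLE_PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="/style.css">
    <script src="/app.js"></script>
  </head>
  <body><img src="/logo.png" alt="Logo"></body>
</html>
"""

finder = AssetFinder()
finder.feed(SAMPLE_PAGE)
# finder.assets now lists the three URLs in the order they appear in the page.
```

Each URL collected this way would become its own request to the host, unless the browser already has a fresh copy in its cache.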
Loading a Website
Let’s take a look at how all of those individual components come together when you load a website.
When you click a hyperlink or enter a URL into your web browser, your web browser performs a DNS lookup, which tells it the IP address of the server hosting the website you are requesting. It then establishes a connection with the host using TCP/IP.
Once a connection is established, the client sends an HTTP request to the host, either asking for the homepage, or for a specific page or file.
If the client’s request is successful, the host will send the data requested. Before the data is sent back to the client’s web browser, it is broken into packets. The web browser then takes all those individual pieces and puts them together into individual files. Initially, this is usually the HTML source code for the web page, which tells the web browser what assets to request and how to display them. The browser then renders the website you see.
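The whole exchange can be tried on a single machine. The sketch below starts a tiny web server on your own computer, connects to it over TCP, sends an HTTP request, and reads back the response. It is a simplified stand-in for what a browser does: HelloHandler, fetch, and demo are hypothetical names, and because it talks to the local address 127.0.0.1 directly, no real DNS lookup takes place.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """A toy host that answers every GET request with the same small page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host: str, port: int, path: str = "/") -> tuple[str, bytes]:
    """Open a TCP connection, send an HTTP request, return (status line, body)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)  # the response arrives in pieces
            if not chunk:            # an empty read means the server closed the connection
                break
            chunks.append(chunk)
    raw = b"".join(chunks)
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body


def demo() -> tuple[str, bytes]:
    """Run the toy server in the background and fetch its page once."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status line and the HTML body. A real browser would then parse that HTML, request any assets it references, and render the page.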
Whenever you connect to a website, there is a lot more going on than it initially seems. Think about how many individual elements make up the websites you use every day. Even though it happens in the blink of an eye, your web browser and the server have to do a lot of communicating.