All of us use the internet extensively in our day-to-day lives, but few of us
really think about how it works on a fundamental level. Understandably, most
people assume that the inner workings of the internet are far too complicated
for them to understand. Let’s change that, shall we?
Aside from satisfying curiosity, the prominent role that the internet plays in
our lives is reason enough to have a better general understanding of how it
works. With modern high-speed internet connections, websites and online
services load in seconds. This can mask just how much is going on behind the
scenes whenever you open a web page.
Requests and Responses
A simplified explanation of how the internet works would be this: you (the
client) send a request to a website (the host). The website sends back a
response, which is then displayed in your web browser. This all happens so fast
that you might not realize there are several things going on under the hood when
you connect to a website.
Most websites consist of a variety of different elements, each of which needs
to be requested individually. The underlying code of a webpage, written in HTML
(Hypertext Markup Language), tells the browser what it needs to request in
order to display the webpage, as well as general instructions on how to display
it correctly. This is also what allows responsive websites that display
differently according to the device being used.
Before data is sent back to the client from the host, it is broken up into
packets; a large file can be split into thousands of them. Each packet carries
a small piece of the file, along with information that lets the receiving
computer put the pieces back in the right order, and the reassembled file is
handed to the browser. These files can be divided into two categories: code
files and assets.
Code files can be written in a number of languages, such as HTML, CSS, and
JavaScript. Assets include images, video, audio, and any other media to be
displayed or downloaded.
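To make the packet idea a little more concrete, here is a minimal sketch of splitting a file into numbered pieces and putting it back together. It is a toy model, not how TCP/IP is really implemented: the function names, the 8-byte packet size, and the sample page are all assumptions made for illustration.

```python
import random


def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort the chunks by sequence number and join them back together."""
    return b"".join(chunk for _, chunk in sorted(packets))


# Hypothetical page content and packet size, chosen only for the demo.
page = b"<html><body>Hello, internet!</body></html>"
packets = split_into_packets(page, size=8)
random.shuffle(packets)         # packets may arrive in a different order
restored = reassemble(packets)  # identical to `page` again
```

The sequence number attached to each chunk is what makes the shuffle harmless: the receiver sorts on it before joining the pieces.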
Servers
Websites that you connect to are hosted on servers. You can think of a server as a big, internet-connected computer. Servers that just need to serve up website data might be quite basic in terms of computing power. However, you can also rent servers that have the specs of a high-end gaming rig.
More complex websites require more powerful servers, and websites that
experience huge volumes of traffic often use multiple servers, with load
balancers directing new users to the servers where the load is lightest.
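As a rough illustration of "send new users where the load is lightest", the sketch below picks the server with the fewest active connections. Real load balancers use a range of strategies; the server names, the connection counts, and the `pick_server` helper are hypothetical.

```python
def pick_server(active_connections: dict[str, int]) -> str:
    """Return the name of the server currently handling the fewest connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical servers and how many visitors each one is serving right now.
servers = {"web-1": 12, "web-2": 4, "web-3": 9}

chosen = pick_server(servers)  # "web-2", the least busy server
servers[chosen] += 1           # the new visitor now counts towards its load
```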
Servers have operating systems installed – usually special versions of
Linux or Windows. These operating systems are designed to sit on servers
and handle their connections. Think of the operating system as one of those
switchboard operators that used to route phone calls. When a client connects to
a host server, the operating system knows what to do with the incoming
connection.
When the server receives an HTTP request, it knows how to process it and, if successful,
return an appropriate response.
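Here is a small sketch of that request-and-response exchange, using the HTTP server that ships with Python. It runs entirely on your own machine. The `PAGES` dictionary, the `Handler` class, and the `fetch` helper are assumptions made for the example, not part of any real website.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/": b"<h1>Home page</h1>"}  # hypothetical site content


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Known paths get "200 OK"; anything else gets "404 Not Found".
        body = PAGES.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = b"Not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(port: int, path: str) -> tuple[int, bytes]:
    """Send a GET request to the local server and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    result = (response.status, response.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

home = fetch(server.server_port, "/")         # status 200 plus the home page
missing = fetch(server.server_port, "/nope")  # status 404

server.shutdown()
server.server_close()
```

The status code is how the server tells the client whether the request succeeded before the client even looks at the body.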
Protocols
For the client to communicate with the host, both need to use a common set of
protocols. These common protocols ensure that any devices that need to
connect to the internet are able to do so. Without this standardization,
different parts of the internet would require different software for access.
Different protocols can also be used deliberately, as is the case with Tor, to
create networks that behave differently.
There are three protocols that are fundamental to how the internet works:
Hypertext Transfer Protocol, Transmission Control Protocol, and Internet
Protocol. Collectively, these are the protocols that allow online devices to
communicate with one another.
- HTTP: This protocol defines a common language for the client and the host to
use to talk to one another. Requests that you send to web servers will be in
HTTP format.
- TCP/IP: These communication protocols dictate how data travels across the
internet. Specifically, the IP layer is responsible for directing packets to a
specific computer. The TCP layer is responsible for delivering that data
reliably, in order, and to the right port on that computer, so it reaches the
right application.
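The "IP address picks the computer, port picks the application" idea can be tried out on a single machine. The sketch below is only an illustration: it opens a tiny TCP service on the loopback address 127.0.0.1 and connects to it. The `serve_once` helper and its upper-casing behaviour are made up for the demo.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and send the received bytes back in upper case."""
    conn, _client_address = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


# The address is an (IP, port) pair; port 0 asks the OS for any free port.
listener = socket.create_server(("127.0.0.1", 0))
ip, port = listener.getsockname()

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# The client needs both parts of the address to reach the right application.
with socket.create_connection((ip, port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)  # b"HELLO"

worker.join()
listener.close()
```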
Domain Name System
Consider the URL ‘http://www.example.com/pogs/boglins’. The ‘example.com’ part
of the URL contains information about the host’s location and identity. The
‘/pogs/boglins’ part of the address specifies which bit of the website you want
to access. You can think of the first part as being like a phone number, which
puts you through to a particular business, and the latter part as an extension
that enables you to connect directly to the right phone.
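If you want to see that split for yourself, Python’s standard library can take a URL apart. This is just a quick sketch using the example URL from above; the variable names are arbitrary.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.example.com/pogs/boglins")

scheme = parts.scheme  # "http": the protocol to speak
host = parts.hostname  # "www.example.com": who to call (the phone number)
path = parts.path      # "/pogs/boglins": what to ask for (the extension)
```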
But computers don’t work with words; they work with numbers. The domain name
example.com is nice and easy for a human to read, but it’s not what your
computer wants. In order to connect to a host, your browser needs to know
the IP address of the server you are trying to access.
In order to find the IP address, your browser performs a Domain Name System (DNS) lookup. This process is akin to a person looking in the phone book for a telephone number.
When you register a domain name, it is added to the domain name registry. The
owner then sets up DNS records that point the name at the IP address of the
server hosting the website. When a browser performs a DNS lookup for
example.com, it will find the associated IP address of the server where the
website is stored and will establish a connection before sending the user’s
request.
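A lookup like this can be sketched with Python’s `socket` module, which asks the operating system’s resolver (the same one your browser typically relies on) to turn a name into addresses. The `lookup` helper is an assumption for the example, and it is run against `localhost` so that it works without an internet connection.

```python
import socket


def lookup(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result ends with a (address, port) pair; keep the unique addresses.
    return sorted({sockaddr[0] for *_, sockaddr in results})


addresses = lookup("localhost")  # resolved on this machine, no network needed
```

Calling `lookup("example.com")` on a machine with internet access would return the public address or addresses currently published for that name.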
HTML
The main component of most webpages is an HTML file. This is the source code of
the web page; it contains all the instructions a browser needs for displaying
the page correctly. It also tells the browser what assets it needs to request
from the host.
Each additional asset represents a separate request between the client and the
host. Sometimes, the HTML code will call for an external script. In the browser
this is usually JavaScript; scripts written in languages such as Python or Perl
run on the server instead. Within a script, there may be subsequent calls for
other assets. No assets are loaded unless they are requested. However, many web
browsers will cache certain elements so that if you visit the page again, you
only need to load content that has changed.
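To show how a browser might work out what else it needs to request, here is a simplified sketch that scans some HTML for asset URLs. Real browsers look at far more than three tags; the `AssetFinder` class, the tag selection, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class AssetFinder(HTMLParser):
    """Collect the URLs a page asks the browser to fetch."""

    # Tag name -> attribute that holds the URL (a small, non-exhaustive selection).
    URL_ATTRIBUTES = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.assets.append(value)


# Hypothetical page source.
page_source = """<html><head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head><body><img src="logo.png"><p>Hi</p></body></html>"""

finder = AssetFinder()
finder.feed(page_source)
assets = finder.assets  # ["style.css", "app.js", "logo.png"]
```

Each entry in `assets` would turn into its own request to the host, unless the browser already has a fresh copy in its cache.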
Loading a Website
Let’s take a look at how all of those individual components come together when you load a website.
When you click a hyperlink or enter a URL into your web browser, your web
browser performs a DNS lookup, which tells it the IP address of the server
hosting the website you are requesting. It then establishes a connection with
the host using the TCP/IP protocols.
Once a connection is established, the client sends an HTTP request to the host,
either asking for the homepage, or for a specific page or file.
If the client’s request is successful, the host will send the data requested. Before
the data is sent back to the client’s web browser, it is broken into packets.
The web browser then takes all those individual pieces and puts them together
into individual files. Initially, this is usually the HTML source code for the
web page, which tells the web browser what assets to request and how to display
them. The browser then renders the website you see.
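The steps above can be strung together in a short sketch. It is deliberately simplified and makes several assumptions: it only handles plain `http://` URLs, ignores query strings, speaks HTTP/1.0 so the server closes the connection when it is done, and skips everything a real browser does afterwards (parsing, rendering, caching). The function names are made up for the example.

```python
import socket
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, request bytes) for a plain-HTTP GET of url."""
    parts = urlsplit(url)
    host, port, path = parts.hostname, parts.port or 80, parts.path or "/"
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
    return host, port, request.encode("ascii")


def parse_response(raw: bytes) -> tuple[str, bytes]:
    """Split a raw HTTP response into its status line and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.split(b"\r\n", 1)[0].decode("ascii"), body


def load(url: str) -> tuple[str, bytes]:
    """Fetch url and return (status line, body)."""
    host, port, request = build_request(url)
    # 1. DNS lookup: turn the host name into an IP address.
    ip = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )[0][4][0]
    # 2. Open a TCP connection to that address and port.
    with socket.create_connection((ip, port), timeout=5) as conn:
        conn.sendall(request)            # 3. Send the HTTP request.
        chunks = []
        while chunk := conn.recv(4096):  # 4. The response arrives in pieces.
            chunks.append(chunk)
    return parse_response(b"".join(chunks))  # 5. Put the pieces back together.
```

With an internet connection, `load("http://www.example.com/")` should return the status line and the HTML of that page, which a browser would then go on to parse for further assets.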
Whenever you connect to a website, there is a lot more going on than it
initially seems. Think about how many individual elements make up the websites
you use every day. Even though it happens in the blink of an eye, your web
browser and the server have to do a lot of communicating.