Network Protocols

A network protocol is a set of rules for exchanging information over a network. Protocols are to networks what language is to humans. For two devices on a network to successfully communicate with each other, they must both follow the same protocols.

A networking system is designed as a stack of layers, each with its own protocols for managing and transmitting data. The Open Systems Interconnection (OSI) model is a conceptual model that defines seven such layers. Without going into the details of every layer, the uppermost one is known as the Application layer and the lowermost one as the Physical layer. The Application layer is where the software application that initiates the sending of data interacts with the network; on the receiving side, it delivers the data to the application that consumes it. The Physical layer is where the actual transmission across a medium, such as a cable or radio waves, takes place. The five layers in between manage everything required to transform the data from the format in which the application sends it into a format that the physical medium can transmit.

Each protocol defines the rules for computers to interact at its own level and relies on the services of the layer below it, all the way down to the lowest layer, which controls the hardware that sends information across the transmission medium.
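The layering described above can be pictured as each layer wrapping the data it receives from the layer above in its own header on the way down, and each layer on the receiving side removing that header on the way up. The following Python sketch is a toy model of this idea only: it assumes simplified text labels in place of the real binary headers that each protocol defines, and the functions `encapsulate` and `decapsulate` are hypothetical names.

```python
# Toy model of layering: the five OSI layers between Application and Physical.
# The bracketed text "headers" are hypothetical stand-ins for real protocol headers.
LAYERS = ["Presentation", "Session", "Transport", "Network", "Data Link"]


def encapsulate(app_data: str) -> str:
    """Sending side: each layer, top to bottom, prepends its own header."""
    unit = app_data
    for layer in LAYERS:
        unit = f"[{layer}]" + unit
    return unit  # this is what the Physical layer would transmit


def decapsulate(received: str) -> str:
    """Receiving side: each layer, bottom to top, removes the header its peer added."""
    unit = received
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not unit.startswith(header):
            raise ValueError(f"expected {header} header")
        unit = unit[len(header):]
    return unit  # handed to the Application layer


wire = encapsulate("GET /")
print(wire)               # [Data Link][Network][Transport][Session][Presentation]GET /
print(decapsulate(wire))  # GET /
```

Because the outermost header belongs to the lowest layer, the receiver can peel headers off in the reverse order in which they were added.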

The two most widely used protocols at the lower layers are the Transmission Control Protocol (TCP), which operates at the transport layer, and the Internet Protocol (IP), which operates at the network layer. Together they are known as TCP/IP.

Widely used application-layer protocols include the HyperText Transfer Protocol (HTTP), the Simple Mail Transfer Protocol (SMTP), and the File Transfer Protocol (FTP).

TCP/IP

Transmission Control Protocol (TCP) and Internet Protocol (IP), together referred to as TCP/IP, is a set of communications protocols used in the Internet and similar computer networks. The protocols specify how data should be formatted, addressed, transmitted, routed, and received.

The Internet Protocol defines an identifier, known as the IP address (often shortened to just "IP"), for each node on a network. TCP/IP uses the IP addresses of nodes to identify the source and destination of each data transfer. Hence IP addresses must be unique within a network and across interconnected networks.

For example, two office local area networks that will never connect to each other can use the same set of IP addresses within their respective networks. If they are to be connected, they must use IP addresses that are unique across both networks. Likewise, a node connected directly to the Internet must have an IP address that is unique across the Internet.
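The uniqueness rule above amounts to checking that no address is used in both networks before they are connected. This minimal Python sketch assumes each network is simply a list of address strings; the function `conflicting_addresses` and the example addresses are hypothetical.

```python
def conflicting_addresses(network_a: list[str], network_b: list[str]) -> list[str]:
    """Return addresses used in both networks; these would clash if the networks were connected."""
    return sorted(set(network_a) & set(network_b))


# Hypothetical address assignments in two separate office networks.
office_a = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
office_b = ["192.168.1.12", "192.168.1.20"]

print(conflicting_addresses(office_a, office_b))  # ['192.168.1.12']
```

An empty result means the two networks could be joined without renumbering any device.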

Each device connected to a network is assigned an IP address, either by a network administrator or automatically by the network. Such assignments may be static or dynamic. In a static assignment, a device keeps the same IP address as long as it is on the network, even if it shuts down and restarts or temporarily leaves and rejoins the network. In a dynamic assignment, a device may be given a different IP address from a pool of available addresses each time it restarts or rejoins the network. On the Internet, address allocation is coordinated by a global body, the Internet Assigned Numbers Authority (IANA), so that every address used on the public Internet is unique.
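The difference between static and dynamic assignment can be sketched with a small allocator. This is a toy model in Python, not the DHCP protocol that networks commonly use for dynamic assignment; the class `AddressPool`, the device names, and the addresses are all hypothetical.

```python
class AddressPool:
    """Toy allocator: fixed reservations plus a dynamic pool (hypothetical, not real DHCP)."""

    def __init__(self, static: dict[str, str], dynamic_pool: list[str]):
        self.static = dict(static)      # device name -> fixed IP address
        self.free = list(dynamic_pool)  # dynamic addresses not currently in use
        self.leased = {}                # device name -> dynamically assigned address

    def join(self, device: str) -> str:
        """Give a device its address when it joins the network."""
        if device in self.static:
            return self.static[device]  # static: always the same address
        if device in self.leased:
            return self.leased[device]
        if not self.free:
            raise RuntimeError("address pool exhausted")
        address = self.free.pop(0)      # dynamic: next free address in the pool
        self.leased[device] = address
        return address

    def leave(self, device: str) -> None:
        """Return a dynamic address to the back of the pool when its device leaves."""
        address = self.leased.pop(device, None)
        if address is not None:
            self.free.append(address)


pool = AddressPool({"printer": "10.0.0.5"}, ["10.0.0.100", "10.0.0.101"])
first = pool.join("laptop")    # 10.0.0.100
pool.leave("laptop")
second = pool.join("laptop")   # 10.0.0.101: a new address after rejoining
pool.leave("printer")          # no effect on a static reservation
printer = pool.join("printer") # 10.0.0.5, every time
print(first, second, printer)
```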

IP addresses have a specific structure. An address consists of 32 bits, written as four 8-bit binary numbers separated by dots. Each number can range from 00000000 to 11111111 in binary, or 0 to 255 in decimal, and is normally written in decimal. A complete IP address can therefore be anything from 0.0.0.0 to 255.255.255.255.
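The relationship between the familiar dotted-decimal form and the underlying bits can be checked with a short Python sketch. The helper names `to_binary` and `to_integer` are hypothetical; they only illustrate the format described above.

```python
def to_binary(address: str) -> str:
    """Convert a dotted-decimal address to dotted binary, 8 bits per number."""
    numbers = [int(part) for part in address.split(".")]
    if len(numbers) != 4 or any(not 0 <= n <= 255 for n in numbers):
        raise ValueError(f"not a valid IP address: {address!r}")
    return ".".join(format(n, "08b") for n in numbers)


def to_integer(address: str) -> int:
    """Read all 32 bits as a single binary number."""
    return int(to_binary(address).replace(".", ""), 2)


print(to_binary("192.168.1.10"))      # 11000000.10101000.00000001.00001010
print(to_integer("192.168.1.10"))     # 3232235786
print(to_integer("255.255.255.255"))  # 4294967295, the largest 32-bit value
```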

[Figure: IP address format]

Some IP addresses are reserved for specific purposes on TCP/IP networks. For example:

IP Address	Purpose
0.0.0.0	The unspecified address, meaning "this host on this network". A device uses it, for example, as its source address before it has been assigned an IP address of its own.
255.255.255.255	The broadcast address, reserved for messages that should go to all computers on the local network.
127.0.0.1	The loopback address, which a computer uses to send network traffic to itself.
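The special addresses in the table can be recognized programmatically. This Python sketch uses the standard library's `ipaddress` module (`is_unspecified` and `is_loopback` are real attributes of `IPv4Address`) and compares against 255.255.255.255 directly for the broadcast case; the function name `describe` and its labels are hypothetical.

```python
import ipaddress

BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def describe(address: str) -> str:
    """Label an IPv4 address as one of the reserved cases from the table, or ordinary."""
    ip = ipaddress.IPv4Address(address)
    if ip.is_unspecified:   # true only for 0.0.0.0
        return "unspecified"
    if ip == BROADCAST:
        return "broadcast"
    if ip.is_loopback:      # true for the whole 127.x.x.x range, not just 127.0.0.1
        return "loopback"
    return "ordinary"


for addr in ["0.0.0.0", "255.255.255.255", "127.0.0.1", "192.168.1.10"]:
    print(addr, describe(addr))
```

Note that the whole 127.x.x.x range is reserved for loopback, although 127.0.0.1 is the address almost always used in practice.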

Because of the growth of the Internet, the roughly 4.3 billion (2^32) addresses available in this 32-bit format, known as IPv4, are running out. A newer version of the protocol, IPv6, uses 128-bit addresses and is increasingly used in newer systems.
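The difference in scale between the two versions, and what an IPv6 address looks like, can be shown with Python's standard `ipaddress` module. The example IPv6 address comes from the 2001:db8::/32 range, which is reserved for documentation.

```python
import ipaddress

ipv4_total = 2 ** 32   # 4,294,967,296 possible IPv4 addresses
ipv6_total = 2 ** 128  # about 3.4e38 possible IPv6 addresses

v6 = ipaddress.ip_address("2001:db8::1")  # "::" abbreviates a run of zero groups
v4 = ipaddress.ip_address("192.168.1.10")

print(v4.version, v6.version)  # 4 6
print(v6.exploded)             # 2001:0db8:0000:0000:0000:0000:0000:0001
print(ipv6_total // ipv4_total == 2 ** 96)  # True
```

An IPv6 address is written as eight groups of four hexadecimal digits (16 bits each) separated by colons, rather than four decimal numbers separated by dots.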

HyperText Transfer Protocol (HTTP/HTTPS)

HTTP is an application-layer protocol that follows a request-response pattern between a client and a server. The client opens a connection to the server and sends it a request. The client then waits, keeping the connection open, until it receives a response from the server, which could be the requested data or an error. In the original version of HTTP, the client then closes the connection and opens a new one for the next request; HTTP/1.1 and later versions can keep a connection open and reuse it for several requests. HTTP is a stateless protocol: neither the client nor the server is required to keep any data (state) between two requests, so each request must carry all the information the server needs to handle it.
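The request-response cycle can be tried out entirely on one machine. This Python sketch starts a small server on the loopback address using the standard `http.server` module and acts as the client with `http.client`; the `/hello` path, the handler class, and the helper `fetch` are hypothetical. The standard handler speaks HTTP/1.0 by default, so the server closes the connection after each response, matching the original one-connection-per-request model.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server: answers GET /hello, returns an error for anything else."""

    def do_GET(self):
        if self.path == "/hello":
            body, status = b"Hello, client!", 200
        else:
            body, status = b"Not found", 404  # an error response
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port: int, path: str) -> tuple[int, bytes]:
    """Open a connection, send one request, wait for the response, then close."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
hello = fetch(server.server_address[1], "/hello")
missing = fetch(server.server_address[1], "/missing")
server.shutdown()
server.server_close()
print(hello)    # (200, b'Hello, client!')
print(missing)  # (404, b'Not found')
```

Each call to `fetch` is independent: the server keeps nothing from the first request when handling the second, which is what statelessness means in practice.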

HTTP is the application protocol used by the World Wide Web (WWW, or simply the Web). The Web started as a relatively simple way to share content in multiple formats, including text documents, images, videos, and audio. It has since evolved into a comprehensive platform that underlies most of what we do on the Internet today. That is why the terms Internet and World Wide Web are so often used interchangeably, although they are technically different things: the Internet is the global network of interconnected networks, while the Web is a service that runs on top of it.

In the context of the World Wide Web, the client is usually a web browser and the server is a web server. The text that the client requests and the server sends is known as hypertext. Hypertext contains hyperlinks, which link hypertext documents to one another. Hypermedia extends the same idea to content other than text, including images, video, and audio. The World Wide Web is a collection of hypermedia.

HTTP is not limited to requesting content; a request can also carry a transaction. The client can send data of any kind to the server, which processes it and sends back a response. Nor is HTTP limited to web browsers: other kinds of clients use it too. For example, in machine-to-server communication in an IoT solution, requests are initiated by a machine rather than a human. Most web and mobile applications use HTTP for their transactions.
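A transaction-style request differs from a content request mainly in that the client sends data for the server to process. This Python sketch runs a local server whose hypothetical `/sum` endpoint adds up a list of numbers sent as JSON in a POST request; the client plays the role of a machine, such as a sensor reporting readings. The endpoint, handler class, helper `post_numbers`, and readings are all hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class SumHandler(BaseHTTPRequestHandler):
    """Hypothetical transaction endpoint: processes submitted numbers and returns their total."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        numbers = json.loads(self.rfile.read(length))  # the data sent by the client
        body = json.dumps({"total": sum(numbers)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def post_numbers(port: int, numbers: list[float]) -> tuple[int, dict]:
    """A non-browser client sending data for the server to process."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/sum", body=json.dumps(numbers),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), SumHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = post_numbers(server.server_address[1], [21.5, 19.0, 22.5])
server.shutdown()
server.server_close()
print(result)  # (200, {'total': 63.0})
```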

Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL), is a security protocol used together with application protocols to add a layer of security, such as encryption and server authentication, to the data being transferred. HTTP running over TLS is called HTTPS, where the S stands for "secure".
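In code, switching from HTTP to HTTPS usually means wrapping the connection in TLS and, by convention, using port 443 instead of port 80. This Python sketch uses the standard `ssl` and `http.client` modules; the helper name `fetch_https` is hypothetical, and calling it requires network access to a host of your choosing.

```python
import http.client
import ssl

# A default client context verifies the server's certificate and checks its host name.
context = ssl.create_default_context()


def fetch_https(host: str, path: str = "/") -> int:
    """Send one GET request over TLS (that is, HTTPS) and return the status code."""
    conn = http.client.HTTPSConnection(host, http.client.HTTPS_PORT,
                                       context=context, timeout=10)
    try:
        conn.request("GET", path)  # the same HTTP request, now encrypted by TLS
        return conn.getresponse().status
    finally:
        conn.close()


print(http.client.HTTP_PORT, http.client.HTTPS_PORT)  # 80 443
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```

With network access, a call such as `fetch_https("www.example.com")` sends the same kind of request as plain HTTP, but the bytes on the wire are encrypted, and the connection fails if the server's certificate cannot be verified.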

Domain Name System

TCP/IP, which is used to send data between nodes on a network, requires every device on the network to have a unique identifier known as an IP address. IP addresses are written as four numbers, each ranging from 0 to 255, separated by dots, so an IP address can be anything from 0.0.0.0 to 255.255.255.255.

Because such IP addresses are neither user-friendly nor easy to remember, the Domain Name System (DNS) was created to map user-friendly names to IP addresses. A domain name consists of two or more parts separated by dots; google.com, for example, is one of the most widely used domain names.

Domain names are structured as a hierarchy. Reading from right to left, each part to the left of a dot is a child of the part to its right. The rightmost part is known as the top-level domain (TLD). Top-level domains indicate a category of organization: for example, .com is used for commercial businesses worldwide, .edu for educational institutions, .gov for government organizations, and so on. The set of TLDs started small but has since expanded to include countries (for example, .in for India, under which .co.in is used for commercial businesses) and specific industries (such as .bank).
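The right-to-left hierarchy can be made concrete by listing every domain on the path from the top-level domain down to the full name. The function `domain_hierarchy` in this Python sketch is hypothetical; it simply splits the name at its dots.

```python
def domain_hierarchy(name: str) -> list[str]:
    """List the domains from the top-level domain down to the full name."""
    labels = name.lower().rstrip(".").split(".")
    # Walk from the rightmost label leftward; each step adds one child level.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_hierarchy("www.google.com"))
# ['com', 'google.com', 'www.google.com']
print(domain_hierarchy("www.google.co.in"))
# ['in', 'co.in', 'google.co.in', 'www.google.co.in']
```

The second example shows how .co.in sits beneath the country domain .in.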

Some TLDs are restricted to entities that meet certain criteria (for example, .edu can only be assigned to an accredited educational institution), while others, like .com, are open to anyone, including individuals.

The next part is the name of the entity. For example, google.com and stanford.edu are domain names assigned specifically to Google, a business, and Stanford, an educational institution. These two-part combinations must be globally unique: no two entities can be assigned the same two-part domain name. You cannot have two businesses with google.com or two educational institutions with stanford.edu, but you can have google.com and google.co.in (one domain for Google globally and another for Google in India).

The first part, such as www in www.google.com or www.stanford.edu, refers to a specific node (host) on the network. The full three-part domain name is mapped to that node's IP address. Now, instead of needing to remember an IP address for a device, you can remember a far more user-friendly domain name.
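Programs rarely perform this name-to-address mapping themselves; they ask the operating system to do it. This Python sketch uses the standard `socket.getaddrinfo` call restricted to IPv4; the helper name `resolve` is hypothetical, and whether an answer comes from DNS or from a local hosts file depends on the system's configuration.

```python
import socket


def resolve(name: str) -> list[str]:
    """Ask the operating system's resolver for the IPv4 addresses of a name."""
    results = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); for IPv4, sockaddr is (ip, port).
    return sorted({sockaddr[0] for *_, sockaddr in results})


print(resolve("127.0.0.1"))  # ['127.0.0.1']: a numeric address needs no lookup
```

On most systems, `resolve("localhost")` returns the loopback address, while a call such as `resolve("www.example.com")` requires network access to a DNS server and may return more than one address.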

ICANN (the Internet Corporation for Assigned Names and Numbers), one of the global bodies that coordinate the Internet, manages the allocation of unique domain names together with a global network of domain registries and registrars. Ideally, a business, an academic institution, or any other entity would like a domain name that matches its name exactly, and policies exist to prevent one entity from holding a name that misrepresents another. Because many common domain names are already taken, however, new businesses often have to settle for near-matches.

Uniform Resource Locator (URL)

The World Wide Web is a collection of hypermedia (text, images, videos, and content in other formats that link to each other). These hypermedia elements are referred to as resources. To identify a specific resource, web browsers use what is known as a Uniform Resource Locator (URL). A typical URL will have the form http://www.example.com/resource.html, which includes the protocol to be used to access the resource (HTTP), the domain name that identifies the server on which the resource is located (www.example.com), and finally, the resource name (resource.html).
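The three parts of the example URL can be pulled apart with Python's standard `urllib.parse.urlsplit`, which calls the protocol part the "scheme". The helper `describe_url` and its dictionary keys are hypothetical labels chosen to match the terms used above.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict[str, str]:
    """Split a URL into the protocol, domain name, and resource name."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,                   # how to access the resource
        "domain": parts.hostname,                   # which server holds it
        "resource": parts.path.rsplit("/", 1)[-1],  # the last segment of the path
    }


print(describe_url("http://www.example.com/resource.html"))
# {'protocol': 'http', 'domain': 'www.example.com', 'resource': 'resource.html'}
```

A browser performs essentially these steps: it uses the domain name to find the server's IP address, connects using the named protocol, and asks for the resource.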

[Figure: URL structure]
