Introduction to Kubernetes


Muhammad Waseem · Mar 25, 2024 7m read

#Containerization #Databases #Deployment #Docker #Framework #Kubernetes #Microservices #Other

In this article, we will cover the following topics:

What is Kubernetes?
Main Kubernetes (K8s) Components

What is Kubernetes?

Kubernetes is an open-source container orchestration framework originally developed by Google. In essence, it automates the deployment, scaling, and management of containers and helps you manage applications consisting of multiple containers. Additionally, it allows you to operate them in different environments, e.g., physical machines, virtual machines, cloud environments, or even hybrid deployment environments.

What problems does it solve?

The rise of microservices has increased the use of containers and given rise to applications composed of many containers.

However, managing those containers across multiple environments using scripts and self-made tools can be complex and error-prone. This created the need for container orchestration tools such as Kubernetes, which offer the following features:

High availability means, in simple words, that the application has no downtime, so it is always accessible to the users.

Scalability means that the application can be scaled up or down with the load, so it keeps loading fast and responding quickly even as the number of users grows.

Disaster recovery means that if the infrastructure has problems (e.g., data is lost, servers fail, or something else goes wrong in the data center), there is a mechanism to back up the data and restore it to the latest state, so that the containerized application loses no information and can run from the most recent state after the recovery.

Main Kubernetes (K8s) Components

Below, you can find an overview of the core components of Kubernetes:

Pod

The Pod is a fundamental concept and one of the core components of Kubernetes. A Pod is the smallest deployable unit in Kubernetes, representing a single instance of a running process in a cluster. It can contain one or more containers that are tightly coupled and share the same network namespace, allowing them to communicate with each other.

What does a pod do? It creates a running environment, or a layer of abstraction, on top of the container. We can have an application pod for our application that uses a database pod with its own container. An important concept here is that we typically expect a pod to run one application container. You can run multiple containers inside one pod, but this generally happens only when the main application container needs a helper container or a side service running alongside it. For example, a single server (node) might run two pods, one for the application and one for the database, each being an abstraction layer on top of its container.
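To make the idea of one main container plus an optional helper more concrete, here is a minimal sketch that builds a Pod manifest as a plain Python dictionary and prints it as JSON. The helper function make_pod, the pod name, and the image names are hypothetical and only serve as an illustration; the field names (apiVersion, kind, metadata, spec.containers) are the ones Kubernetes expects in a Pod manifest.

```python
import json


def make_pod(name, image, port, sidecar_image=None):
    """Build a Pod manifest as a plain dict (hypothetical helper)."""
    # The main application container: typically the only one in the pod.
    containers = [
        {"name": name, "image": image, "ports": [{"containerPort": port}]}
    ]
    if sidecar_image is not None:
        # Optional helper container that runs alongside the main one
        # and shares the pod's network namespace.
        containers.append({"name": f"{name}-helper", "image": sidecar_image})
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {"containers": containers},
    }


# Image names below are placeholders, not real images.
app_pod = make_pod("my-app", "example/my-app:1.0", 8080)
app_pod_with_helper = make_pod(
    "my-app", "example/my-app:1.0", 8080, sidecar_image="example/log-shipper:1.0"
)

print(json.dumps(app_pod_with_helper, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and passed to kubectl apply -f, assuming the images actually exist in a registry the cluster can reach.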

Kubernetes offers a virtual network. It means that each pod, not each container, gets its own IP address. As a result, pods can communicate with each other using these internal IP addresses.

Pods in Kubernetes are also "ephemeral", which is another important concept. It means that they can die easily. When that happens (e.g., if you lose a database pod because its container crashed or because the node it was running on ran out of resources), a new pod is created to replace it, and that new pod is assigned a new IP address. This is inconvenient if you communicate with the database using its IP address, since you would have to adjust the address every time the pod is replaced. That is why another Kubernetes component called "Service" is used.

Service

A Service provides a static, permanent IP address that can be placed in front of each pod, meaning that the application pod will have its own service, and the database pod will have another one. The good thing here is that the life cycles of the service and the pod are not connected, so even if the pod dies, the service and its IP address remain, and you do not have to change the endpoint anymore.
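The following toy model illustrates why a stable service address helps when pods come and go. It is only a simulation in plain Python, not how Kubernetes is implemented: the Pod and Service classes, the IP allocator, and all addresses are assumptions made up for this sketch. The one real idea it borrows is that a service finds its pods through a label selector rather than through fixed pod IP addresses.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical IP allocator: every new pod gets the next free address.
_next_host = count(1)


@dataclass
class Pod:
    name: str
    labels: dict
    ip: str = field(default_factory=lambda: f"10.0.0.{next(_next_host)}")


@dataclass
class Service:
    name: str
    selector: dict
    cluster_ip: str  # stays the same for the lifetime of the service

    def endpoints(self, pods):
        """Return the IPs of all pods whose labels match the selector."""
        return [
            pod.ip
            for pod in pods
            if all(pod.labels.get(k) == v for k, v in self.selector.items())
        ]


db_service = Service(name="db-service", selector={"app": "db"}, cluster_ip="10.96.0.10")

pods = [Pod(name="db-abc", labels={"app": "db"})]
ip_before = db_service.endpoints(pods)

# The database pod dies and a replacement is created with a new IP address.
pods = [Pod(name="db-xyz", labels={"app": "db"})]
ip_after = db_service.endpoints(pods)

print("pod IPs before/after:", ip_before, ip_after)
print("service IP (unchanged):", db_service.cluster_ip)
```

The client only ever needs to know the service address; which pod IP sits behind it at any given moment is resolved through the selector.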

To access an application through the browser, you would have to create an external service, i.e., a service that opens the communication to external sources. However, you would, obviously, not want your database to be open to public requests. For that, you would create an internal service instead.
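As a rough sketch of the difference, the two kinds of services could be described with manifests like the ones below. The helper make_service and all names and port numbers are hypothetical. The sketch assumes that the internal service uses the ClusterIP type (reachable only inside the cluster) and that the external one uses the NodePort type, which is one of several ways to expose a service and, by default, requires a port in the 30000-32767 range.

```python
import json


def make_service(name, app_label, port, target_port, external=False, node_port=None):
    """Build a Service manifest as a plain dict (hypothetical helper)."""
    spec = {
        # The selector decides which pods receive the traffic.
        "selector": {"app": app_label},
        "ports": [{"port": port, "targetPort": target_port}],
        # ClusterIP: internal only. NodePort: also reachable on every node's IP.
        "type": "NodePort" if external else "ClusterIP",
    }
    if external and node_port is not None:
        spec["ports"][0]["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": spec,
    }


# Internal service for the database: not reachable from outside the cluster.
db_service = make_service("db-service", "db", port=5432, target_port=5432)

# External service for the application (port numbers are assumptions).
app_service = make_service(
    "my-app-service", "my-app", port=8080, target_port=8080,
    external=True, node_port=30080,
)

print(json.dumps([db_service, app_service], indent=2))
```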

Ingress

When we create an external service, the URL typically consists of the HTTP protocol, a node IP address, and the port number of the service, e.g., http://129.80.102.2:8080. Although it is good enough for testing purposes, for the final product you would want your URL to look like http://domain-name. That is why another component of Kubernetes called "Ingress" was introduced. So, instead of going directly to the service, the request goes first to Ingress, which then forwards it to the service.
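A minimal sketch of such a routing rule is shown below. The helper make_ingress, the host name, and the service name are hypothetical; the structure follows the networking.k8s.io/v1 Ingress resource. Note that an Ingress resource only describes the rule; the cluster also needs an Ingress controller running for the rule to take effect.

```python
import json


def make_ingress(name, host, service_name, service_port):
    """Build an Ingress manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [
                {
                    # Requests for this host name are matched by the rule.
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                # Matching requests are forwarded to this service.
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }


# Host and service names are placeholders for illustration.
ingress = make_ingress("my-app-ingress", "my-app.example.com", "my-app-service", 8080)
print(json.dumps(ingress, indent=2))
```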

ConfigMap

As a rule of thumb, pods communicate with each other via a service. In this case, our application will have a database endpoint that it uses to reach the database. Yet, where do we usually configure this database URL or endpoint? Typically, we would do it in the application properties file or in an environment variable, but in either case it usually ends up inside the built image of the application. So, if the endpoint of the service changes, we have to adjust that URL in the application, rebuild the application with a new version, and push the image to the repository. Then we have to pull that new image in our pod and restart the whole thing. As you can see, it is a bit tedious for such a small change as a database URL. For that reason, Kubernetes has a component called "ConfigMap".

It usually contains configuration data such as the URLs of a database or of other services that we use in Kubernetes. We simply connect it to the pod so that the pod can get the data that the ConfigMap contains. From now on, if we change the name of the service and the endpoint, we only need to adjust the ConfigMap.

It helps us manage configuration changes independently of the application code, making it easier to modify settings without rebuilding and redeploying the application.
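Here is a small sketch of how a ConfigMap and a container that reads from it could look. The names app-config, database_url, and DATABASE_URL, as well as the image, are assumptions chosen for this example; configMapKeyRef is the standard way for a container to pull a single key from a ConfigMap into an environment variable.

```python
import json

# The external configuration: lives outside the application image.
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config"},
    "data": {"database_url": "db-service:5432"},
}

# Container definition that reads the value into an environment variable.
app_container = {
    "name": "my-app",
    "image": "example/my-app:1.0",  # placeholder image name
    "env": [
        {
            "name": "DATABASE_URL",
            "valueFrom": {
                "configMapKeyRef": {"name": "app-config", "key": "database_url"}
            },
        }
    ],
}

# If the database service is renamed, only the ConfigMap changes;
# the application image stays the same.
config_map["data"]["database_url"] = "new-db-service:5432"

print(json.dumps({"configMap": config_map, "container": app_container}, indent=2))
```

One caveat with this approach: environment variables are read when the container starts, so a running pod would typically need to be restarted to pick up the new value. That is still much cheaper than rebuilding and pushing a new image.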

Secret

Putting a password or other credentials into a ConfigMap in plain text would be insecure, even though it is an external configuration.

For this purpose, Kubernetes has another component called "Secret". It resembles ConfigMap, but the difference is that we use it to store secret data such as credentials, and the values are kept in Base64-encoded format. Keep in mind that Base64 is an encoding, not encryption, so it does not protect the data by itself.

Just as we did with ConfigMap, we should simply connect the secret to our pod so that the pod can see the data and read from the secret.
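The sketch below shows how values could be Base64-encoded for a Secret and how easily they can be decoded again. The secret name db-credentials and the credentials themselves are made up for this example; the encoding uses Python's standard base64 module.

```python
import base64
import json


def encode(value: str) -> str:
    """Base64-encode a string the way Secret data values are stored."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical credentials, used only for illustration.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "type": "Opaque",
    "data": {
        "username": encode("admin"),
        "password": encode("s3cr3t"),
    },
}

print(json.dumps(secret, indent=2))

# Anyone who can read the manifest can decode the values again.
decoded_password = base64.b64decode(secret["data"]["password"]).decode("utf-8")
print("decoded password:", decoded_password)
```

Because decoding is this simple, a manifest like the one above should be treated as sensitive and not committed to a public repository as-is.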

Volume

It is time to look closer at another crucial concept: data storage. Suppose our application uses a database. If the database container or the pod gets restarted, the data would be gone. It is problematic and inconvenient because we want our database or log data to persist reliably long term. The way to do it in Kubernetes is by using another component called "Volume".

A Volume attaches physical storage on a hard drive to your pod. The storage could be located either on the local machine (meaning on the same server node where the pod is running) or on remote storage outside of the Kubernetes cluster, such as cloud storage or your own on-premises storage, to which the cluster only holds an external reference. Now, when the database pod or container gets restarted, all the data will persist. You can simply think of it as an external hard drive plugged into the Kubernetes cluster.
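One common way to attach such storage is sketched below. It assumes a PersistentVolumeClaim, which is a request for a piece of storage that a pod can then mount; this is one option among several volume types and is not necessarily the setup the text above has in mind. The claim name db-data, the image, the mount path, and the size are placeholders.

```python
import json

# A request for 1Gi of storage that outlives any single pod.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# A database pod that mounts the claimed storage into its container.
db_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db", "labels": {"app": "db"}},
    "spec": {
        "containers": [
            {
                "name": "db",
                "image": "example/db:1.0",  # placeholder image name
                # Where the storage appears inside the container.
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        # Links the volume name used above to the claim defined earlier.
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "db-data"}}
        ],
    },
}

print(json.dumps([pvc, db_pod], indent=2))
```

If the pod is deleted and recreated with the same claim name, it would mount the same storage again, which is what lets the data survive a restart.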

Deployment

What will happen if our application pod dies or crashes, or if we have to restart the pod because we have built a new container image? In this case, we will have some downtime during which users cannot reach our application. That is very bad if it happens in production, but this is precisely where distributed systems and containers have an advantage. Instead of relying on only one pod, we replicate everything on multiple servers. We do not create each replica by hand, though; we define a blueprint for the application pod and specify how many replicas of it we want to run. That blueprint is called "Deployment".

So, now we will have another node where a replica or clone of our application can run. It will also be connected to the same service. Remember that the service acts as a persistent static IP address (with a DNS name), so you do not have to adjust the endpoint when a pod dies. The service also distributes requests across the replicas: if one of the replicas of your application pod fails, the service will forward the requests to another one, so our application will still be accessible to the user.
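A Deployment blueprint could be sketched as follows. The helper make_deployment, the names, and the image are hypothetical; the structure (replicas, selector, and a pod template) follows the apps/v1 Deployment resource. The labels in the pod template are what a service with a matching selector would use to find all replicas.

```python
import json


def make_deployment(name, image, port, replicas):
    """Build a Deployment manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # How many copies of the pod should be running.
            "replicas": replicas,
            # Which pods belong to this Deployment.
            "selector": {"matchLabels": labels},
            # The pod blueprint that every replica is created from.
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Two replicas of the application (image name is a placeholder).
deployment = make_deployment("my-app", "example/my-app:1.0", 8080, replicas=2)
print(json.dumps(deployment, indent=2))
```

Changing the replicas value and applying the manifest again is enough to scale the application up or down; the individual pods are never edited directly.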

StatefulSet

Likewise, if our database pod dies, we need a database replica as well. Still, we cannot replicate a database using a Deployment. The reason is that a database has state, namely its data.

It means that replicas of the database cannot be treated as interchangeable clones: their reads and writes to the data must be coordinated to avoid data inconsistencies. Kubernetes supports this with another component called "StatefulSet". In addition to replicating pods, a StatefulSet gives each replica a stable identity (a fixed name and network address) and its own persistent storage, which the database's replication mechanism can rely on to decide which pods accept writes and which ones only serve reads.

Database pods should be created using StatefulSets rather than Deployments so that the database reads and writes can be kept synchronized.
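For comparison with a Deployment, a StatefulSet could be sketched like this. The helper make_stateful_set and all names, the image, and the storage size are assumptions for illustration; the structure follows the apps/v1 StatefulSet resource. In this sketch, volumeClaimTemplates gives each replica its own storage claim, and the replicas receive predictable names with an ordinal suffix.

```python
import json


def make_stateful_set(name, image, replicas, service_name, storage):
    """Build a StatefulSet manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Service that gives the pods their stable network identity.
            "serviceName": service_name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ]
                },
            },
            # Each replica gets its own claim created from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


db = make_stateful_set("db", "example/db:1.0", 3, "db-headless", "1Gi")

# Unlike Deployment pods, StatefulSet pods are named <name>-0, <name>-1, ...
pod_names = [f"{db['metadata']['name']}-{i}" for i in range(db["spec"]["replicas"])]

print(json.dumps(db, indent=2))
print(pod_names)
```

The manifest alone does not replicate any data; how the database instances synchronize with each other still depends on the database itself.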

To summarize, we have explored the most common Kubernetes components:

We started with pods and the services they need to communicate with each other.
Then, we examined the Ingress component used to route traffic into the cluster.
We have also looked at an external configuration using ConfigMaps and Secrets.
Afterward, we analyzed data persistence using Volumes.
Finally, we have taken a quick look at pod blueprints with replicating mechanisms like Deployments and StatefulSets, where the latter is employed specifically for such stateful applications as databases.

Thanks