Software-Defined Storage and Containers

Comments

As far as I understand, they are discussing mounting volumes from another container.

As far as I know, something like this was possible in earlier versions of docker-compose: you could configure it to start multiple containers, and one container could mount volumes from another with volumes_from. Now that docker-compose has grown and supports swarm configuration, they decided to replace volumes_from with plain named volumes. This way you define one volume and use it in several containers at the same time, and it works pretty well. I think that is quite enough: with the different drivers available for volumes, you can use data from almost anywhere.

And it is now possible even with current Caché versions.
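
To illustrate the named-volume approach described above, here is a minimal sketch of a compose file. The service names, image names, mount path and volume name are all hypothetical placeholders, and the `local` driver is just the default; another volume driver could be substituted.

```yaml
# Sketch only: service, image and volume names are made up for illustration.
version: "3"

services:
  app:
    image: example/app:latest        # hypothetical image
    volumes:
      - shared-data:/data            # named volume mounted at /data

  worker:
    image: example/worker:latest     # hypothetical image
    volumes:
      - shared-data:/data            # same named volume, shared with "app"

volumes:
  shared-data:
    driver: local                    # default driver; other drivers can be plugged in here
```

Both containers see the same files under /data, and the volume keeps its content when either container is removed and recreated.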

But I think the approach offered in the article could be less effective. You still have to mount a volume, just in a different way; what matters is how the real storage is used.

If these data containers are just connectors between cluster storage and the application, I think it is still possible to use that storage as a mounted volume, or you can configure the mount during the build of an image.

Hi Rob,

I think the article is confusing and not clear. There are various technologies coming into play here, and simple terms like "volumes" are just not enough to convey the context they are used in. I wish IT could speak as in Edward DeBono's "The DeBono Code Book". Things would be clearer, maybe...  :)

This post could degenerate into a long essay, so I'm going to try to focus on explaining the basics. My hope is that future Q&A in this thread will give us the opportunity to clarify things further and allow us all to learn more, as technologies are developed while we sleep...

The wider context that the marketing person from RH is writing about is that of containers & persistent data. That is enough to write a book :) In essence, a container does NOT exclude persistent data. The common description of containers as ephemeral is only partially true because:

  • They actually have a writeable layer
  • They force us to think about the separation of concerns between code and data (this is a good thing)

The other part of the context of the article is that the writer refers to Kubernetes, a container orchestration engine that you will now find in the next Docker CE engine you download. Yes, Docker has Docker Swarm (their container orchestrator) featured in the engine, but you can now play with Kubernetes (K8s) too. In the cloud era everybody is friends with everybody :)

Now, K8s was selected last December as the main application orchestrator for OpenShift (a cloud-like, full application-provisioning and management platform owned by Red Hat). That was a major change: no more VMs to handle an application; just provide containers and let K8s manage them.

Now, we all know that having 100 Nginx containers running is one thing, but having 100 stateful, transactional & mirrored Caché, Ensemble or InterSystems IRIS containers is a completely different game. Well, you would not be surprised to know that persistence is actually the toughest issue to solve and manage... with & without containers. Think of an EC2 instance that just died. What happened to that last transaction you committed? Hopefully, you had an AWS EBS volume mounted on it (i.e. you did not use the boot/OS disk) and the data is still in that volume, which has a life of its own. All you have to do is spin up another EC2 instance and mount that same volume. The point is that you must know that you have an EBS volume and take very good care of it and of its data: you snapshot it, you replicate the data, etc.

The exact same thing is true when you use K8s: you must provision those volumes. In K8s, volumes have a slightly different definition: yes, a volume defines the storage and is a higher-level abstraction (see their docs for more info), but the neglected part is that a volume is declared as part of a POD. A POD is a logical grouping of containers with its own definition and lifecycle; a K8s PersistentVolume (PV), however, has a distinct life of its own and can survive the crash of a POD. Here we get into software-defined storage (SDS) territory and hyperconverged storage/infrastructure. This is a very interesting and evolving part of the technology world that I believe will enable us to be even more elastic in the future than we are now (both on-prem and in public clouds).

K8s is just trying to make the most of what is available now while keeping a higher-level abstraction, so that they can further tune their engine as new solutions come to market.
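
To make the POD/volume relationship above a bit more concrete, here is a minimal sketch of a claim for persistent storage and a POD that mounts it. All names, the image, the mount path and the requested size are hypothetical, and how the claim actually gets bound to real storage depends entirely on the provisioner available in your cluster.

```yaml
# Sketch only: names, image, path and size are assumptions for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                      # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                  # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi                  # arbitrary example size
---
apiVersion: v1
kind: Pod
metadata:
  name: db                           # hypothetical POD name
spec:
  containers:
    - name: db
      image: example/db:latest       # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /data           # where the container sees the storage
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data           # refers to the claim defined above
```

If the POD dies, the claim and the storage behind it stay around, and a replacement POD that references the same claimName can mount the same data, subject to the caveats of the underlying storage.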

To conclude, and to clear up some misconceptions the article might have created:

There aren't any special data containers in K8s. There are, however, with Docker (just to confuse things further :) but you should not use them for your database storage.

With K8s and Docker there are SDS drivers/provisioners (or plugins in Docker lingo) that can leverage available storage. K8s allows you to mount them in your containers. It is supposed to do other things too, like formatting the volumes, but not all drivers are equal and, as usual, YMMV.

What those drivers and plugins can do for you is mount the same K8s PV from another spun-up container in another POD if your original container or POD dies. The usual caveats apply: for example, if you are in another AZ or Region you won't be able to do that. IOW we are still tied to what the low-level storage can offer us. And of course, we would still need to deal with the consistency of that DB that suddenly died. So, how do you deal with that?

With the InterSystems IRIS data platform you can use its Mirroring technology, which can safeguard you against those container or POD disappearances. Furthermore, because our Mirroring replicates the data, you could have a simpler (easier, lower TCO, etc.) volume definition with lower downtime. Just make sure you tell K8s not to interfere with what it does not know: DB transactions ;)

HTH