Distributed Cloud Computing
SlapOS is based on a Master and Slave design. We provide here an overview of the SlapOS architecture. In particular, we explain the role of the Master node and of the Slave nodes, as well as the software components on which they rely to operate a Distributed Cloud.
Slave nodes ask Master nodes which software they should install and which software they should run, and report to the Master node how many resources each running software has used over a given period of time. Master nodes keep track of available Slave node capacity and available software. The Master node also acts as a Web portal and Web service so that end users and software bots can request software instances, which are instantiated and run on Slave nodes. Master nodes are stateful. Slave nodes are stateless. More precisely, all information required to rebuild a Slave node is stored in the Master node. This may include the URL of a backup service which keeps an online copy of data so that, if a Slave node fails, a replacement Slave node can be rebuilt with the same data.
It is thus very important to make sure that the state data present in the Master node is well protected. This could be implemented by hosting the Master node on a trusted IaaS infrastructure with redundant resources. Or, better, by hosting multiple Master nodes on many Slave nodes located in different regions of the world, thanks to an appropriate data redundancy heuristic. We are touching here on the first reflexive nature of SlapOS. A SlapOS Master is normally a running instance of the SlapOS Master software instantiated on a collection of Slave nodes which, together, form a trusted hosting infrastructure. In other words, SlapOS is self-hosted.
Let us now review in more detail the role of the SlapOS Master node. SlapOS keeps track of the identity of all parties involved in the process of requesting, accounting and billing Cloud resources. This includes end users (Person) and their company (Organisation). It includes suppliers of Cloud resources as well as consumers of Cloud resources. It also includes so-called computer partitions, which may run a software robot to request Cloud resources without human intervention. It also includes Slave nodes, which need to ask the SlapOS Master which resources should be allocated. SlapOS generates X509 certificates for each type of identity: an X509 certificate for each person, like you and me, who logs in, an X509 certificate for each server which contributes to the resources of SlapOS, and an X509 certificate for each running software instance which may need to send requests or notifications to the SlapOS Master. A SlapOS Master node with a single Slave node, a single user and 10 computer partitions will thus generate up to 12 X509 certificates: one for the Slave node, one for the user and 10 for the computer partitions.
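The certificate count above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name and its parameters are hypothetical, and it assumes that every Slave node carries the same number of computer partitions.

```python
def certificate_count(slave_nodes, users, partitions_per_node):
    """Upper bound on X509 certificates: one per Slave node, one per user,
    one per computer partition (assumes identical nodes)."""
    return slave_nodes + users + slave_nodes * partitions_per_node


# The example from the text: 1 Slave node, 1 user, 10 computer partitions.
print(certificate_count(slave_nodes=1, users=1, partitions_per_node=10))  # 12
```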
Any user, software or Slave node with an X509 certificate may request resources from the SlapOS Master node. The SlapOS Master node plays here the same role as the back office of a marketplace. Each allocation request is recorded in the SlapOS Master node as if it were a resource trading contract in which a resource consumer requests a given resource under certain conditions. The resource can be a NoSQL storage, a virtual machine, an ERP, etc. The conditions can include price, region (e.g. China) or specific hardware (e.g. 64-bit CPU). Such conditions are sometimes called Service Level Agreements (SLA) in other architectures, but they are considered here as trading specifications rather than guarantees. It is even possible to specify a given computer rather than relying on the automated marketplace logic of the SlapOS Master.
By default, the SlapOS Master acts as an automatic marketplace. Requests are processed by trying to find a Slave node which meets all the conditions that were specified. SlapOS thus needs to know which resources are available at a given time, at which price and with which characteristics. Last, the SlapOS Master also needs to know which software can be installed on which Slave node and under which conditions.
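The matching step of such a marketplace could look roughly like the following Python sketch. It is not the SlapOS Master implementation: the SlaveNode and Request classes, their field names and the "cheapest matching node wins" tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlaveNode:
    name: str
    region: str
    cpu_bits: int
    price: float
    free_partitions: int
    installable_software: set = field(default_factory=set)


@dataclass
class Request:
    software: str
    region: Optional[str] = None
    cpu_bits: Optional[int] = None
    max_price: Optional[float] = None
    computer: Optional[str] = None  # pin the request to one named node


def meets_conditions(node, request):
    """True when the node satisfies every condition stated in the request."""
    return (
        node.free_partitions > 0
        and request.software in node.installable_software
        and (request.region is None or node.region == request.region)
        and (request.cpu_bits is None or node.cpu_bits == request.cpu_bits)
        and (request.max_price is None or node.price <= request.max_price)
        and (request.computer is None or node.name == request.computer)
    )


def allocate(nodes, request):
    """Return the cheapest matching node (an assumed policy), or None."""
    candidates = [node for node in nodes if meets_conditions(node, request)]
    return min(candidates, key=lambda node: node.price, default=None)


nodes = [
    SlaveNode("node-a", "China", 64, 5.0, 3, {"kvm", "wordpress"}),
    SlaveNode("node-b", "France", 64, 3.0, 0, {"wordpress"}),
    SlaveNode("node-c", "China", 64, 4.0, 10, {"wordpress"}),
]
chosen = allocate(nodes, Request(software="wordpress", region="China", cpu_bits=64))
print(chosen.name)  # node-c
```

Setting the hypothetical computer field bypasses the marketplace logic and pins the request to one machine, as described above.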
SlapOS Slave nodes are pretty simple compared to the Master node. Every slave node needs to run software requested by the Master node. It is thus on the Slave nodes that software is installed. To save disk space, Slave nodes only install the software which they really need.
Each Slave node is divided into a certain number of so-called computer partitions. One may view a computer partition as a lightweight secure container, based on Unix users and directories rather than on virtualization. A typical barebone PC can easily provide 100 computer partitions and can thus run 100 WordPress blogs or 100 e-commerce sites, each with its own independent database. A larger server can contain 200 to 500 computer partitions.
The SlapOS approach to computer partitions was designed to reduce costs drastically compared to approaches based on disk images and virtualization. It does not prevent running virtualization software inside a computer partition, which makes SlapOS both cost efficient and compatible with legacy software.
SlapOS Slave software consists of a POSIX operating system, SlapGRID, supervisord and buildout.
SlapOS is designed to run on any operating system which supports GNU's glibc and supervisord. Such operating systems include for example GNU/Linux, FreeBSD, MacOS/X, Solaris, AIX, etc. We hope that Microsoft Windows will also be supported as a host in the future (Microsoft Windows is already supported as a guest), through a glibc implementation on Windows and a port of supervisord to Windows.
SlapOS relies on mature software: buildout and supervisord. Both are controlled by SlapGRID, the only original software of SlapOS. SlapGRID acts as a glue between the SlapOS Master node (ERP5) and both buildout and supervisord. SlapGRID asks the SlapOS Master node which software should be installed and executed. SlapGRID uses buildout to install software and supervisord to start and stop software processes. SlapGRID also collects accounting data produced by each running software and sends it back to the SlapOS Master. Let us now study in more detail the role of supervisord and buildout.
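The glue role described above can be sketched as one pass of a control loop. This is a simplified model under stated assumptions, not SlapGRID's actual code: FakeMaster, FakeSupervisor, run_once and all method names are hypothetical, and the install and measure callables stand in for buildout and for the accounting data collection.

```python
class FakeMaster:
    """Stand-in for the SlapOS Master; a real one is reached over the network."""

    def __init__(self, to_install, to_run):
        self.to_install = to_install
        self.to_run = to_run
        self.reports = []

    def software_to_install(self, node_id):
        return list(self.to_install)

    def instances_to_run(self, node_id):
        return set(self.to_run)

    def report_usage(self, node_id, usage):
        self.reports.append((node_id, usage))


class FakeSupervisor:
    """Stand-in for supervisord: remembers which processes are running."""

    def __init__(self):
        self._running = set()

    def running(self):
        return set(self._running)

    def start(self, name):
        self._running.add(name)

    def stop(self, name):
        self._running.discard(name)


def run_once(node_id, master, install, supervisor, measure):
    """One pass of the glue logic: install, start and stop, then report usage."""
    for software in master.software_to_install(node_id):
        install(software)  # role played by buildout
    wanted = master.instances_to_run(node_id)
    for name in sorted(wanted - supervisor.running()):
        supervisor.start(name)  # role played by supervisord
    for name in sorted(supervisor.running() - wanted):
        supervisor.stop(name)
    usage = {name: measure(name) for name in sorted(supervisor.running())}
    master.report_usage(node_id, usage)


installed = []
master = FakeMaster(to_install=["wordpress"], to_run={"blog-1", "blog-2"})
supervisor = FakeSupervisor()
supervisor.start("old-instance")  # no longer requested, so it gets stopped
run_once("node-0", master, installed.append, supervisor, measure=lambda name: 0.0)
```

Because the Slave node only applies what the Master reports, repeating run_once on a freshly rebuilt node converges to the same set of installed software and running instances.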
supervisord is a process control daemon. It can be used to programmatically start and stop processes running as different users, and to handle their output, their log files, their errors, etc. It is a kind of much improved init.d which can be remotely controlled. supervisord is lightweight and old enough to be really mature (i.e. no memory leaks).
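As an illustration, a supervisord program section for one process of one computer partition could be generated as follows. The option names (command, user, directory, autostart, autorestart, stdout_logfile, stderr_logfile) are standard supervisord settings, but the helper function, the program name, the command path and the log file locations are hypothetical, and SlapGRID may well generate its configuration differently.

```python
import configparser
import io


def program_section(name, command, user, directory):
    """Render a supervisord [program:x] section as INI text."""
    config = configparser.ConfigParser(interpolation=None)
    config[f"program:{name}"] = {
        "command": command,
        "user": user,  # the process runs as the partition's Unix user
        "directory": directory,
        "autostart": "true",
        "autorestart": "true",
        "stdout_logfile": f"{directory}/{name}.out.log",
        "stderr_logfile": f"{directory}/{name}.err.log",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


section = program_section(
    name="blog",
    command="/srv/slapgrid/slappart0/bin/run-blog",  # hypothetical wrapper script
    user="slapuser0",
    directory="/srv/slapgrid/slappart0",
)
print(section)
```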
Quoting the Buildout website, "Buildout is a Python-based build system for creating, assembling and deploying applications from multiple parts, some of which may be non-Python-based. It lets you create a buildout configuration and reproduce the same software later." Buildout originated in the Zope/Plone community to automate the deployment of customized instances of their software. Led by Jim Fulton, CTO of Zope Corporation, Buildout became a stable and mature product over the years.
Buildout is used in SlapOS to define which software must be executed on a Slave node. It plays a key role in the industrial successes of SlapOS. Without it, SlapOS could not exist. However, buildout is also often misunderstood, sometimes purposely, by observers who criticize its use in SlapOS. Many people still do not realize that there is no possible software standard on the Cloud and that buildout is the solution to this impossibility. Experts know for example that any large scale production system which is operated on the Cloud (e.g. a social network system) or privately (e.g. banking software) uses patched software. Relational databases are patched to meet the performance requirements of given applications as soon as data grows. If a Cloud operating system does not provide the possibility to patch just about any of its software components, it is simply unusable for large scale production applications. SlapOS is usable because its definition of what a piece of software is rests on the possibility of patching any dependent software component.
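To make the idea of patchable software definitions concrete, here is a sketch of a buildout-style profile in which a component is built from source with a patch applied, followed by a small Python helper that lists the patches of each part. The part name and both URLs are placeholders, and the use of the slapos.recipe.cmmi recipe with its patches and patch-options settings is an assumption about how such a profile might be written, not an excerpt from a real SlapOS profile.

```python
import configparser

# Illustrative profile only: part name and URLs are placeholders.
PROFILE = """
[buildout]
parts =
    mariadb

[mariadb]
recipe = slapos.recipe.cmmi
url = https://example.invalid/mariadb-x.y.z.tar.gz
patches =
    https://example.invalid/patches/performance-fix.patch
patch-options = -p1
"""


def patched_parts(profile_text):
    """Map each part listed in [buildout] to the patches it applies."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(profile_text)
    result = {}
    for part in parser["buildout"]["parts"].split():
        result[part] = parser[part].get("patches", "").split()
    return result


print(patched_parts(PROFILE))
```

The point of the sketch is that the patch is part of the software definition itself, so rebuilding the profile on another Slave node reproduces the same patched component.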
Demonstrate how a Master works, using www.vifib.net as an example.
Every computer partition consists of a dedicated IPv6 address, a dedicated local IPv4 address, a dedicated tap interface (slaptapN), a dedicated user (slapuserN) and a dedicated directory (/srv/slapgrid/slappartN). Optionally, a dedicated block device and a routable IPv4 address can be defined.
SlapOS is usually configured to use IPv6 addresses. Although the use of IPv6 is not a requirement (an IPv4-only SlapOS deployment is possible), it is strongly recommended. IPv6 greatly simplifies the deployment of SlapOS, both for public Cloud applications and for private Cloud applications. In the case of public Clouds, IPv6 helps interconnect SlapOS Slave nodes hosted at home without having to set up tunnels or complex port redirections. In the case of private Clouds, IPv6 replaces existing corporate tunnels with a more resilient protocol which also provides a wider and flat corporate addressing space. IPv6 addressing makes it possible to allocate hundreds of IPv6 addresses on a single server. Each running process can thus be attached to a different IPv6 address, without having to change its default port settings. Accounting for network traffic per computer partition is simplified. All this would of course be possible with IPv4 or through VPNs, but it would be much more difficult or less resilient. The exhaustion of IPv4 addresses prevents in practice the allocation of so many public IPv4 addresses to a single computer. After one year of experimentation with IPv6 in France, using the native IPv6 Internet access of Free (more than 50% of worldwide IPv6 traffic), we found that IPv6 is simple to use and creates the conditions for many innovations which would otherwise be impossible.
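The following Python sketch shows how easily one address per computer partition can be derived from a single IPv6 prefix. The prefix 2001:db8::/64 is the range reserved for documentation, and the function name, the count of 100 and the port 8080 are assumptions chosen for the example.

```python
import ipaddress
from itertools import islice


def partition_addresses(prefix, count):
    """Return the first `count` usable addresses of an IPv6 prefix."""
    network = ipaddress.IPv6Network(prefix)
    return list(islice(network.hosts(), count))


addresses = partition_addresses("2001:db8::/64", 100)

# Every process can keep the same default port because each one
# listens on its own address.
endpoints = [(str(address), 8080) for address in addresses]
print(endpoints[0], endpoints[-1])
```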
Even though IPv6 is used to interconnect processes globally on a SlapOS public or private Cloud, we found that most existing software is incompatible with IPv6. Reasons vary. Sometimes, IP addresses are stored in a structure of 4 integers, which is incompatible with IPv6. Sometimes, IPv6 URLs are not recognized because only the dot is recognized as a separator in IP addresses. For this reason, we provide each computer partition with a dedicated, local, non-routable IPv4 address. Legacy software listens on this IPv4 address. A kind of proxy mechanism is then used to create a bridge between IPv6 and IPv4. In the case of HTTP applications, Apache usually plays this role, in addition to the role of application firewall (mod_security) and strong security (TLS). In the case of other protocols, we usually use stunnel for the same purpose. We will discuss this approach in the next chapter and study in particular how stunnel can turn a legacy application into an IPv6-compatible application without changing a single line of the original code.
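The kind of incompatibility described above is easy to reproduce. The sketch below contrasts a hypothetical legacy-style parser, which only understands dot-separated IPv4 octets, with Python's standard ipaddress module, which accepts both address families. Both function names are made up for the example.

```python
import ipaddress


def naive_parse(text):
    """Legacy-style parser: expects exactly four dot-separated integers."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted IPv4 address: {text!r}")
    return tuple(int(part) for part in parts)


def robust_parse(text):
    """Accepts IPv4 and IPv6 alike by relying on the standard library."""
    return ipaddress.ip_address(text)


print(naive_parse("10.0.0.1"))
print(robust_parse("2001:db8::1").version)

try:
    naive_parse("2001:db8::1")
except ValueError as error:
    print("legacy parser fails:", error)
```

Software built around something like naive_parse cannot be given an IPv6 address at all, which is why it is left listening on a local IPv4 address behind a proxy.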
For some applications, IP is not the appropriate OSI layer. We provide such applications with a tap interface which emulates a physical Ethernet interface. This interface is usually bridged with one of the server's physical Ethernet interfaces. tap is often used by virtualization software such as kvm to provide access to the outside network. This is for example how the default kvm implementation of SlapOS is configured. But it could also be used for other applications, such as virtual private networks or virtual switches, which require direct access to Ethernet. In a Computer with 100 computer partitions, tap interfaces are usually named slaptap0, slaptap1, etc. up to slaptap99.
Every computer partition is linked to a user and a directory. In a Computer with 100 computer partitions, users are usually named slapuser0, slapuser1, etc. up to slapuser99. Directories are usually set to /srv/slapgrid/slappart0, /srv/slapgrid/slappart1, etc. up to /srv/slapgrid/slappart99. Directory /srv/slapgrid/slappart0 is owned by user slapuser0 and by group slapuser0. Directory /srv/slapgrid/slappart1 is owned by user slapuser1 and by group slapuser1. slapuser0 is not able to access files in /srv/slapgrid/slappart1. slapuser1 is not able to access files in /srv/slapgrid/slappart0. Moreover, tap interface slaptap0 is owned by slapuser0, tap interface slaptap1 is owned by slapuser1, etc. Q: what about individual IPv6 addresses, who owns them?
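The naming convention can be summarized in a short Python sketch. The ComputerPartition class is hypothetical and only derives the names stated in the text (slapuserN, slaptapN and /srv/slapgrid/slappartN); it does not create any user, interface or directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputerPartition:
    index: int

    @property
    def user(self):
        return f"slapuser{self.index}"

    @property
    def group(self):
        # The directory is owned by a group named after the user.
        return self.user

    @property
    def tap(self):
        return f"slaptap{self.index}"

    @property
    def directory(self):
        return f"/srv/slapgrid/slappart{self.index}"


partitions = [ComputerPartition(index) for index in range(100)]
last = partitions[-1]
print(last.user, last.tap, last.directory)
# slapuser99 slaptap99 /srv/slapgrid/slappart99
```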
For some applications, it could be necessary to attach a raw block device to some computer partitions. This could be useful to maximize disk I/O performance under certain configurations of kvm, or to access directly a physical partition of an SSD disk. This possibility has been included in the design of SlapOS, although it is not yet fully implemented.
For some applications, such as providing a shared front-end and accelerated cache, a dedicated IPv4 address is required. This possibility has been included in the design of SlapOS, although it is not yet fully tested (but it should be before Q3 2011).
To summarize security, a Computer Partition is configured to have no access to any information of another Computer Partition. Access rights in SlapOS thus have 3 different levels: global access, computer partition only access and superuser only access. SlapOS Slave nodes are normally configured in such a way that global hardware status has the global access right. Installing monitoring software is thus possible without further customization. Every software running in a computer partition has access to all files of that computer partition, which are owned by the same user. Software running in a computer partition has no possibility to access or modify files owned by the superuser. As a general design rule, we refuse to grant any superuser privilege to applications or computer partitions. Only SlapGRID and supervisord are executed with superuser privilege.
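The three access levels can be modeled in a few lines. This is a conceptual sketch only: on a real Slave node the rules are enforced by Unix users and file permissions, and the Access enumeration, the can_access function and the treatment of "root" as the superuser name are assumptions made for the example.

```python
from enum import Enum


class Access(Enum):
    GLOBAL = "global"
    PARTITION = "computer partition only"
    SUPERUSER = "superuser only"


def can_access(actor, level, owner=None):
    """Decide whether a Unix user may access a resource of the given level."""
    if actor == "root":
        return True  # SlapGRID and supervisord run with superuser privilege
    if level is Access.GLOBAL:
        return True  # e.g. global hardware status, readable by monitoring tools
    if level is Access.PARTITION:
        return actor == owner  # only the partition's own user
    return False  # superuser-only resources


print(can_access("slapuser0", Access.PARTITION, owner="slapuser0"))  # True
print(can_access("slapuser0", Access.PARTITION, owner="slapuser1"))  # False
print(can_access("slapuser0", Access.SUPERUSER))                     # False
```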
It is time to demonstrate how the parts of a Slave node are laid out in a Linux distribution. The demonstration is divided into the following parts:
Show the buildout .cfg files created by SlapOS during allocation and deployment.
Demonstrate which processes are controlled by supervisord and which network interfaces are created for each Computer Partition.
Demonstrate example usages of slapconsole and vifib.net: requesting the deployment of a Software Release on a Slave node, and requesting a Software Instance.
For more information, please contact Jean-Paul, CEO of Nexedi (+33 629 02 44 25).