
    Get started with SlapOS, the Distributed Cloud System Architecture

    Presentation of SlapOS at SBAC
    • Last Update:2017-01-26
    • Version:001
    • Language:en

    Get started with SlapOS


    the Distributed Cloud System Architecture

    by Rafael M. Monnerat (Nexedi)
    and Christophe Cérin (University Paris 13)

    This tutorial explains the simple concepts which underlie the SlapOS architecture. SlapOS is a distributed, open source Cloud operating system. With SlapOS, anyone can become a Cloud provider and sell Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS). With SlapOS it does not matter whether one uses private hardware infrastructure or public, shared infrastructure. SlapOS can accommodate the diversity of Cloud resources and gather them as if they were one's own. SlapOS also helps optimize resource usage across different Cloud providers.

    Agenda

    • Inheritance: the Desktop Grid (DG) Paradigm
    • SlapOS components and concepts
      • Masters and Slaves
      • Computer Partitions
      • Networking
    • Full example
    • Conclusion

    This tutorial has four parts. In the first part we explain the concept of desktop grids, because the SlapOS view is, in part, very close to the concept and architecture of volunteer computing. In the second part we explain the different components: Master and Slave nodes, computer partitions, and the SlapOS approach to networking. The third part walks through a full example, and the last part concludes.

    The DG paradigm (1st generation)

    Desktop grids were popularized by the SETI@home project (from Berkeley) more than ten years ago. It is currently the largest distributed computing effort, with over 3 million users. Participants run a program that downloads and analyzes radio telescope data for the Search for Extraterrestrial Intelligence (SETI).

    DG Architecture

    As you can see in the picture, a DG is a federation of thousands of nodes, with the Internet as the communication layer and PCs as the compute nodes. The first generation was said to be 'monolithic' because you could run only one application and could not replace the scheduler (or any other piece). Note also that DGs are characterized by node volatility, local IP addresses and firewall problems.

    The DG paradigm (2nd generation)

    The 2nd generation is an effort towards a less monolithic architecture.

    DG Architecture


    (images: fci@lri.fr)

    In the picture you can see that the scheduler can be plugged in: it is a component that can be replaced. The storage and the transport protocol can also be plugged in. Moreover, there is direct communication between peers. People also implemented security (including result certification, because computations are done on PCs). Finally, applications come from any e-Science domain.

    The DG paradigm (3rd generation)

    The 3rd generation (2006) aimed at a fully distributed architecture with no central element at all. The current generation is probably revisiting the concept using Web 2.0 tools, 'pushing' it down to tablets and smartphones. In this case, the interactions between entities have to be revisited in terms of Web 2.0 concepts.

    BonjourGrid Architecture


    (Cérin, Abbes, Jemni - 2009)

    The picture shows BonjourGrid (Cérin, Abbes, Jemni), which is not a new DG middleware but a coordination protocol able to coordinate multiple instances of DG middleware (BOINC, Condor, XtremWeb). As you can see, different "Computing Elements" run concurrently. A Computing Element is a master (in red) plus slaves (in green). The BonjourGrid protocol allows any slave to work for any master. It is a vision of multiple servers coordinated in a distributed way. BonjourGrid incorporates mechanisms to handle the failure of a master... but that is out of the scope of this tutorial.

    SlapOS Cloud Technology

    SlapOS is a cloud technology with no virtualization (VMware, virtual machines...) and no data centers (data are spread over “clients”).

    Everything is a Process ...

    SlapOS has the motto "Everything is a Process". This motto summarizes the key philosophy that unifies SaaS, PaaS and IaaS. SlapOS is able to deploy, start, stop, monitor and destroy a process or a group of processes of a given software. SlapOS is also able to measure the resource consumption of the process (disk, CPU, memory) and produce usage reports, for billing or scientific purposes.
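
    As a rough illustration of that last point, the sketch below samples the consumption of one process with the psutil library, in the spirit of such usage reports. The helper and the report keys are illustrative, not SlapOS's actual accounting code.

        # Minimal sketch: sampling CPU, memory and disk usage of one managed
        # process. psutil is a real library; the helper and the report keys
        # are illustrative, not SlapOS's actual accounting code.
        import psutil

        def sample_usage(pid):
            """Return a usage snapshot for one process."""
            proc = psutil.Process(pid)
            with proc.oneshot():           # batch the underlying /proc reads
                cpu = proc.cpu_times()     # user/system CPU seconds consumed
                mem = proc.memory_info()   # resident and virtual memory
                io = proc.io_counters()    # bytes read/written (Linux)
            return {
                "cpu_seconds": cpu.user + cpu.system,
                "rss_bytes": mem.rss,
                "disk_read_bytes": io.read_bytes,
                "disk_write_bytes": io.write_bytes,
            }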

    SlapOS and Desktop Grid Platform

    In desktop grid computing, we manage computation; networking issues (bandwidth) and data storage (input/output data) are the key points.

    SlapOS as a generic tool...

    SlapOS does not manage data- or compute-intensive computing applications out of the box... but the framework is rich enough to implement a dedicated data infrastructure (kumofs, NEO, S3...), implement a billing system, plug in a reservation system, etc.

    Master and Slaves

    SlapOS is based on a Master and Slave design. We provide here an overview of the SlapOS architecture. In particular, we explain the roles of the Master node and the Slave nodes, as well as the software components they rely on to operate a distributed Cloud.

    Master and Slaves

    Slave nodes ask the Master node which software they should install and which software they should run, and report to the Master node how many resources each running software has been using for a certain period of time. The Master node keeps track of available Slave node capacity and available software. The Master node also acts as a Web portal and Web service so that end users and software bots can request software instances, which are instantiated and run on Slave nodes.

    Master nodes are stateful. Slave nodes are stateless. More precisely, all information required to rebuild a Slave node is stored in the Master node. This may include the URL of a backup service which keeps an online copy of data, so that in case of failure of a Slave node, a replacement Slave node can be rebuilt with the same data.

    It is thus very important to make sure that the state data present in the Master node is well protected. This could be implemented by hosting the Master node on a trusted IaaS infrastructure with redundant resources. Or - better - by hosting multiple Master nodes on many Slave nodes located in different regions of the world, thanks to appropriate data redundancy heuristics.

    We are touching here the first reflexive nature of SlapOS. A SlapOS Master is normally a running instance of the SlapOS Master software instantiated on a collection of Slave nodes which, together, form a trusted hosting infrastructure. In other words, SlapOS is self-hosted.

    Master Node

    Let us now review in more detail the role of the SlapOS Master node. SlapOS keeps track of the identity of all parties involved in the process of requesting, accounting and billing Cloud resources. This includes end users (Person) and their companies (Organisation). It includes suppliers of Cloud resources as well as consumers of Cloud resources. It also includes so-called computer partitions, which may run a software robot to request Cloud resources without human intervention. It also includes Slave nodes, which need to ask the SlapOS Master which resources should be allocated. SlapOS generates an X509 certificate for each type of identity: an X509 certificate for each person who logs in, an X509 certificate for each server which contributes to the resources of SlapOS, and an X509 certificate for each running software instance which may need to request or notify the SlapOS Master. A SlapOS Master node with a single Slave node, a single user and 10 computer partitions will thus generate up to 12 X509 certificates: one for the Slave node, one for the user and 10 for the computer partitions.

    Any user, software or Slave node with an X509 certificate may request resources from the SlapOS Master node. The SlapOS Master node plays here the same role as the back office of a marketplace. Each allocation request is recorded in the SlapOS Master node as if it were a resource trading contract, in which a resource consumer requests a given resource under certain conditions. The resource can be a NoSQL storage, a virtual machine, an ERP, etc. The conditions can include price, region (e.g. China) or specific hardware (e.g. 64-bit CPU). Such conditions are called Service Level Agreements (SLA) in other architectures, but they are considered here as trading specifications rather than guarantees. It is even possible to specify a given computer rather than relying on the automated marketplace logic of the SlapOS Master.
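
    In practice, such a request can be expressed with the slapos.slap Python client library. The sketch below follows that library's usage as we understand it; the Master URL, certificate paths, software release URL, partition name and SLA values are all illustrative.

        # Sketch: requesting a software instance from a SlapOS Master under
        # trading conditions (SLA). Method names follow the slapos.slap
        # client library as we understand it; all values are illustrative.
        from slapos import slap

        client = slap.slap()
        client.initializeConnection(
            "https://slap.example.org",        # hypothetical Master URL
            key_file="/etc/slapos/user.key",   # X509 identity of the requester
            cert_file="/etc/slapos/user.crt",
        )

        partition = client.registerOpenOrder().request(
            software_release="https://example.org/kvm/software.cfg",
            partition_reference="my-kvm-1",    # name chosen by the requester
            filter_kw={"region": "China"},     # the trading conditions (SLA)
        )
        print(partition.getConnectionParameter("url"))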

    By default, the SlapOS Master acts as an automatic marketplace. Requests are processed by trying to find a Slave node which meets all the conditions which were specified. SlapOS thus needs to know which resources are available at a given time, at which price and with which characteristics.

    Last, SlapOS Master also needs to know which software can be installed on which Slave node and under which conditions.

    Slave Nodes

    SlapOS Slave nodes are pretty simple compared to the Master node.

    Every slave node needs to run software requested by the Master node. It is thus on the Slave nodes that software is installed. To save disk space, Slave nodes only install the software which they really need.

    Each Slave node is divided into a certain number of so-called computer partitions. One may view a computer partition as a lightweight secure container, based on Unix users and directories rather than on virtualization. A typical barebone PC can easily provide 100 computer partitions and can thus run 100 WordPress blogs or 100 e-commerce sites, each with its own independent database. A larger server can contain 200 to 500 computer partitions.

    The SlapOS approach to computer partitions was designed to reduce costs drastically compared to approaches based on disk images and virtualization. And it does not prevent running virtualization software inside a computer partition, which makes SlapOS at the same time cost efficient and compatible with legacy software.

    Master Software

    The reference implementation of the SlapOS Master node is based on ERP5. The SlapOS Master node is actually derived from an ERP5 implementation for a Central Bank. The underlying idea is that currency clearing and Cloud resource clearing are very similar; they should thus be implemented with the same software. Since ERP5 had already been implemented to run a Central Bank in 8 countries, it was a natural choice. Moreover, ERP5 has demonstrated its scalability for large CRM applications (e.g. Beteireflow) and its dependability for accounting. Thanks to NEO (NEOPPOD), its distributed NoSQL database, ERP5 can provide the transactional behavior and scalability which are required for a stateful marketplace.

    Implementing the SlapOS Master on top of ERP5 was a direct application of the ERP5 Universal Business Model (UBM) technology, a model which unifies all sciences of management and which has been acknowledged by numerous IEEE publications as a major shift in enterprise application design. Each Computer is represented by an Item in UBM. Allocation requests, resource deliveries and resource accounting are represented by Movements in UBM. The movement resource can be: software hosting, CPU usage, disk usage, network usage, RAM usage, login usage, etc. The software hosting movement starts whenever the software starts running in the computer partition and stops whenever it stops. Resource usage movements start and stop for each accounting period, independently of the software running state. The software release which is run on the computer partition is also an Item in UBM, just like the subscription contract identifier. The parties (client, supplier) are represented as Nodes in UBM. More surprisingly, each Network is also considered a Node in UBM, just as a storage cell is represented as a Node in logistics.

    Slave Software

    SlapOS Slave software consists of a POSIX operating system, SlapGRID, supervisord and buildout.

    SlapOS is designed to run on any operating system which supports GNU's glibc and supervisord. Such operating systems include, for example, GNU/Linux, FreeBSD, MacOS/X, Solaris, AIX, etc. We hope that in the future Microsoft Windows will also be supported as a host (it is already supported as a guest) through a glibc implementation on Windows and a port of supervisord to Windows.

    SlapOS relies on mature software: buildout and supervisord. Both are controlled by SlapGRID, the only original software of SlapOS. SlapGRID acts as the glue between the SlapOS Master node (ERP5) on one side and buildout and supervisord on the other. SlapGRID asks the SlapOS Master node which software should be installed and executed. SlapGRID uses buildout to install software and supervisord to start and stop software processes. SlapGRID also collects the accounting data produced by each running software and sends it back to the SlapOS Master. Let us now study in more detail the roles of supervisord and buildout.

    supervisord is a process control daemon. It can be used to programmatically start and stop processes with different users, handle their output, their log files, their errors, etc. It is a kind of much improved init.d which can be remotely controlled. supervisord is lightweight and old enough to be really mature (i.e. no memory leaks).
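
    The remote control happens over supervisord's standard XML-RPC interface. The short sketch below (the port number and process name are illustrative) shows how a SlapGRID-like tool can list, start and stop processes programmatically.

        # Sketch: driving supervisord through its standard XML-RPC interface,
        # the same kind of channel SlapGRID uses to control partition
        # processes. The URL and process name below are illustrative.
        from xmlrpc.client import ServerProxy

        server = ServerProxy("http://127.0.0.1:9001/RPC2")  # supervisord HTTP port

        # List every process supervisord knows about, with its state.
        for info in server.supervisor.getAllProcessInfo():
            print(info["name"], info["statename"])

        # Start and stop a process by name; both calls block until done.
        server.supervisor.startProcess("mysqld")
        server.supervisor.stopProcess("mysqld")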

    Quoting the Buildout website: "Buildout is a Python-based build system for creating, assembling and deploying applications from multiple parts, some of which may be non-Python-based. It lets you create a buildout configuration and reproduce the same software later." Buildout originated in the Zope/Plone community to automate the deployment of customized instances of their software. Led by Jim Fulton, CTO of Zope Corporation, Buildout became a stable and mature product over the years.

    Buildout is used in SlapOS to define which software must be executed on a Slave Node. It has a key role in SlapOS industrial successes. Without it, SlapOS could not exist. However, buildout is also often misunderstood - sometimes purposely - by observers who criticize its use in SlapOS. Many people still do not realize that there is no possible software standard on the Cloud and that buildout is the solution to this impossibility. Experts know for example that any large scale production system which is operated on the Cloud (ex. a social network system) or privately (ex. a banking software) uses patched software. Relational databases are patched to meet performance requirements of given applications as soon as data grows. If a Cloud operating system does not provide the possibility to patch about any of its software components, it is simply unusable for large scale production applications. SlapOS is usable because its definition of what is a software is based on the possibility of patching any dependent software component.

    Where is my patch?

    Still, people who name a software such as "kvm" or "MySQL" believe that this is enough (and for them, SlapOS provides aliases for the words "kvm" and "mysql" which link to an explicit buildout definition). However, the reality is not that straightforward. For example, some releases of kvm support the NBD protocol over IPv6 and some do not. Some releases of kvm support the sheepdog distributed block storage and some do not. Some releases of kvm support the CEPH distributed block storage and some do not. Most users who run kvm to try a software do not care about IPv6, sheepdog or CEPH. But those users who run kvm on SlapOS need IPv6 support to access NBD, and this is for now only available as a patch. Those who want resilient storage may want sheepdog support, which is only available from version 0.13. And those who want CEPH support also need a patch. Those who want the IPv6 patch may prefer not to use the CEPH patch, which is not yet officially stable. And those who want the CEPH patch may distrust the IPv6 patch. All in all, there is no way to agree on a single version of kvm. All the different releases of kvm may have to be installed on SlapOS Slave nodes in order to meet market requirements. Since the patch possibilities are so wide, the easiest way to know which kvm is being installed on a SlapOS node is simply to list where its original source code was obtained and which patches were applied. This is exactly what buildout does, in just a few lines of configuration. Buildout also eliminates any complex or time-consuming process to distribute binary packages on a wide range of hardware architectures, thanks to a trusted, distributed caching mechanism which does not even centralize signatures.
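
    To make this concrete, here is roughly what such a buildout part could look like. This is an illustrative sketch, not an actual SlapOS profile: the recipe choice, URLs, md5sum and patch list are made up.

        [buildout]
        parts = kvm

        [kvm]
        # Fetch one precise upstream release and apply an explicit list of
        # patches; the resulting build is fully reproducible.
        recipe = slapos.recipe.cmmi
        url = http://downloads.example.org/qemu-kvm-1.0.tar.gz
        md5sum = 0123456789abcdef0123456789abcdef
        patches =
            http://patches.example.org/kvm-nbd-ipv6.patch
            http://patches.example.org/kvm-ceph.patch
        patch-options = -p1

    Anyone reading such a part knows exactly where the source code came from and which patches were applied; changing one line yields a different, equally reproducible kvm.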

    The problem we are discussing here with kvm is even more complex with MySQL. There are now multiple sources of MySQL: the official one (MySQL), the one by MySQL's original author Michael Widenius (MariaDB), the one by the Percona InnoDB experts, and Cubrid, which is not MySQL but claims to be 90% compatible with it. For each source, there are different versions. Default compilation options may also differ. Authors of large scalable applications know very well that the performance of their application can be dramatically impacted by subtle changes to the SQL optimizer. Changing the version or source of MySQL may simply lead to a performance collapse. We remember an application for which we had to change the default parameters in a MySQL header file in order to scan 32 rows instead of 8 for query optimization. If we did not have the possibility to choose which source of MySQL to use and which patches to apply to it, we simply could not have run enterprise applications with SlapOS and shown industrial success stories.

    Arguments and counter-arguments against Buildout

    The use of buildout by SlapOS is disruptive compared to traditional approaches to software distribution. It has enabled industrial success faster. But it has also led to slower adoption of SlapOS by certain communities, often based on incorrect rationales, which we discuss below.

    What about disk images?

    Some people consider that buildout is irrelevant since the Cloud should be based on disk images and virtual machines. What those people do not realize is that not only can SlapOS run about any disk image format, but buildout can also be used to automate the production of disk images, probably much better than many other tools. And it is open source.

    What about distributions' packaging systems?

    Some people consider that buildout is irrelevant since it is possible to achieve the same with the packaging systems of GNU/Linux distributions. What they do not realize is that not only can buildout rely on existing GNU/Linux distribution packages (at the expense of portability), but buildout can also be used to automate the production of packages for multiple GNU/Linux distributions with little effort. Also, the buildout format is much more concise when it comes to patching or adding dependencies to existing software, thanks to the "extends" mechanism. Last, buildout provides a kind of packaging format which can reuse language-based packaging formats (eggs, gems, CPAN, etc.) in a way which is neither specific to a given GNU/Linux distribution nor to GNU/Linux itself. In a sense, buildout integrates much better with native language distribution systems than GNU/Linux packaging systems do. And native language distribution systems are currently becoming the de facto standard for developers.

    What about separation between software and instance?

    Some people consider that buildout prevents sharing the same executable among multiple instances of the same application. This is a common misconception. SlapOS is a typical example of how to deploy one software, made of shared libraries and executable binaries, a single time and then create a hundred instances of it without any binary code duplication and without wasting resident RAM.

    I need something that is language agnostic

    Some people consider that buildout is designed for Python only. What they do not realize is that buildout is already used to build software based on C, C++, Java, Perl, Ruby, etc. And it would not be an issue to extend SlapOS to support any buildout equivalent. But we are not aware of any system builder comparable to buildout which can support as many different architectures and languages in such a flexible way.

    Come on, I'm on Windows

    Some people consider that buildout is not for Windows or that it does not support proprietary software distributed in binary form, without source code. Again, this is a misconception. Buildout is just an automation tool. Whenever source code is not available, buildout can take a binary file as input. This is what is often done, for example, to build Java applications based on .war distribution archives, or to deploy openoffice binaries which would otherwise take 24 hours to compile. Buildout is also compatible with Windows. Automating the installation or the replication of Windows-based software with buildout is possible. Buildout would even be an excellent candidate to automate the conversion of Windows disk images from one host environment to another. Generally speaking, running SlapOS natively on Windows could be very useful both for SlapOS and... for Windows.

    It destroys the work made by GNU/Linux distributions

    Overall, what makes buildout so debated by some observers is that it shows a different path for software distribution, especially for open source software distribution. Instead of focusing - as GNU/Linux distributions do - on providing a consistent set of about any possible open source application with perfectly resolved dependencies and maximized sharing of libraries, it focuses on building a single application only and its dependencies in a way which maximizes the portability between different GNU/Linux distributions and POSIX compliant operating systems. Application developers only need to care about their own application and stabilize its distribution. Unlike what happens with most GNU/Linux distributions, they do not need to care about possible consequences of changing one shared library on other applications hosted on the same operating system. Buildout is after all an approach to software distribution in which the most complex software has about 100 dependencies to resolve, compared to 10,000+ interdependent packages in a traditional GNU/Linux distribution. Buildout puts the burden of maintenance on each application packager and removes the burden of managing global dependencies, thus allowing parallel and faster release cycles for every application. All this with a very concise approach.

    Not convinced yet?

    If this discussion has not yet convinced you that buildout is an efficient solution to specify a software executable and deploy it on the Cloud, please consider the following problem to solve: automate the packaging of the ERP5 open source ERP and all its dependencies (OpenOffice, patched Zope, patched MariaDB, etc.) on all major GNU/Linux distributions, in such a way that it is possible to provide the same behavior on every GNU/Linux distribution and to run 100 instances of ERP5 on the same server, each of which with its own MariaDB daemon and Zope daemon. Obviously, if you find a better solution, please let us know.

    SLAP Protocol

    SlapOS is based on the SLAP protocol. Both the SlapOS Master reference implementation based on ERP5 and the SlapGRID reference implementation in Python could be replaced. An implementation of the SLAP protocol was for example already made in Java, on the client side, in a few days. Implementing SLAP in about any language should be just as easy.

    The SLAP protocol is a polling protocol. Every SlapOS Slave node contacts the SlapOS Master node through HTTP for four different purposes: to define capacity, to collect the list of software to install, to collect the list of computer partitions to configure, and to post accounting information.

    At boot time, each Slave node contacts the SlapOS Master node to notify it that the boot process has completed, and provides a list of available computer partitions, in particular their identifiers and IPv6 addresses. This is the set-capacity request. This request is then launched again every 24 hours in order to take into account possible changes of network configuration, which normally should not happen but sometimes do.

    Every 5 minutes, the SlapOS Slave node requests the list of software which should be installed. As for most parts of the SLAP protocol, the values which are exchanged are promises to reach, not actions to take. The SlapOS Master thus returns the complete list of software which is expected to be installed on the Slave node, without taking into account whether such software is already installed or not. Conversely, if a software which was installed is no longer in the list, it implies that it should be removed. Just remember: SlapOS Slave nodes are supposed to be stateless, just like the SLAP protocol.
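
    A minimal sketch of this promise-driven reconciliation is shown below; the helpers are in-memory stand-ins for the real SlapGRID and buildout calls.

        # Sketch of SLAP's promise-driven reconciliation: the Master returns
        # the full desired state and the Slave node converges to it. The
        # helpers are in-memory stand-ins for real SlapGRID/buildout calls.

        installed = {"https://example.org/kvm/software.cfg"}   # local state

        def install_with_buildout(url):
            print("installing", url)   # real code: run buildout, minutes to hours
            installed.add(url)

        def remove_software(url):
            print("removing", url)     # absence from the list is also a promise
            installed.discard(url)

        def reconcile(desired):
            for url in set(desired) - installed:
                install_with_buildout(url)
            for url in installed - set(desired):
                remove_software(url)

        # The Master's answer is the complete desired list, not a delta:
        reconcile(["https://example.org/mysql/software.cfg"])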

    Every 5 minutes, the SlapOS Slave node requests the list of computer partitions to configure. This is handled by a different process. The underlying idea is that installing a software could take anywhere between a couple of minutes (if it was already compiled and cached for the same architecture) and a couple of hours (if it needs to be compiled for the architecture). Configuring an instance, on the other hand, should take less than a couple of seconds, and ideally less than a second. Each time the SlapOS Slave node requests the list of computer partitions, this may eventually lead to the reconfiguration of all partitions. A large server could contain 300 partitions; if the configuration of a single partition takes one second, it takes 5 minutes to reconfigure all partitions. Obviously, SlapGRID tries to optimize partition configuration and will only reconfigure those partitions whose configuration has changed since the last run. But in case of an incident, such as an earthquake or an electricity shortage in a region, it is possible that all computer partitions of a given server need to be reconfigured at the same time, even though this is not desirable. In order to make sure that such massive reconfigurations do not lead to system collapse, we have taken the design decision to run configuration with a single process and a single thread, so that most cores of the host server remain available for running what they are actually supposed to run, instead of running configuration software.

    Every day, accounting information is collected from every computer partition. It is the role of the software instance running in the computer partition to produce a file which contains usage and incident reports in TioXML format. All files are aggregated and posted to the SlapOS Master, which then uses them for further accounting and billing. One should note that the accounting information which is exchanged is very abstract and can cover physical usage (e.g. CPU, RAM, disk), virtual usage (e.g. number of users, number of transactions) and incidents (e.g. failure to access data for 5 minutes). The TioXML format is easy to extend in order to cover about any possible billing requirement.

    We are currently considering extending the get-cp-list request with HTTP long polling or WebSockets in order to make the system more reactive and, at the same time, poll the SlapOS Master less often. For now, it is not a priority. The goal of the SLAP protocol will probably never be to instantly provide a Cloud resource. For instant provisioning, we rather recommend a predictive pre-allocation approach: instead of allocating on demand, one should pre-allocate based on forecasts or for safety, and simply pass the pre-allocated resource to the requester. We even think that slowing down the provisioning of resources is a good approach to reduce the risk of speculation on the availability of Cloud resources, and thus an efficient way to increase Cloud resilience. Further research combining computer science and economics could eventually confirm or refute our assertion. In any case, we think that more scalability could be reached through an HTTP-based push protocol. It remains to be seen how well such a protocol can resist frequent network interruptions over intercontinental Internet transit routes.

    Master Software at Slave Node

    The Master software can itself be deployed on a Slave node. This allows any company or institution to have its own Master.

    Computer Partitions

    The concept of Computer Partition is fundamental to understanding the structure of a SlapOS Slave node. A Computer Partition can be seen as a lightweight container or jail. It provides a reasonable level of isolation, based on the host operating system's user and group management. It does not, however, provide the same level of isolation as the one which exists between virtual machines, unless of course computer partitions are used to run virtualization software, something SlapOS can do. We came up with the idea of computer partitions after trying other approaches. Around 2004, we started using chrooted filesystems and linux-vserver jails. We also tried to run virtual machines on the same server hardware. We found that both linux-vserver jails and virtual machines required maintaining one complete filesystem per instance of the application, which generated much additional effort compared to maintaining a single filesystem. It was also impossible to run hundreds of filesystems or virtual machines on the same host, because of the huge overhead of each filesystem and virtual machine. This meant that reaching low-cost hosting for standard open source applications was close to impossible with this approach. We then discovered buildout and found that it was possible to split a buildout in two independent profiles: one profile to build the software in a self-contained way, and one profile to generate configuration files in a directory with links to a shared software directory. The concept of Computer Partition was born. Thanks to this concept, it is now possible to reach a hosting cost of less than 1 EUR / month per hosted application. Competition with Cloud monopolies becomes possible for all independent software vendors.

    Let us now review the details of a Computer Partition.

    Computer Partition N

    • dedicated global IPv6
    • dedicated local IPv4
    • dedicated slaptapN
    • dedicated slapuserN
    • /srv/slapgrid/slappartN
    • optional /dev/sdaX and IPv4

    Every computer partition consists of a dedicated global IPv6 address, a dedicated local IPv4 address, a dedicated tap interface (slaptapN), a dedicated user (slapuserN) and a dedicated directory (/srv/slapgrid/slappartN). Optionally, a dedicated block device and a routable IPv4 address can be defined.

    SlapOS is usually configured to use IPv6 addresses. Although the use of IPv6 is not a requirement (an IPv4-only SlapOS deployment is possible), it is a strong recommendation. IPv6 greatly simplifies the deployment of SlapOS, for public Cloud applications as well as for private Cloud applications. In the case of public Clouds, the use of IPv6 helps interconnect SlapOS Slave nodes hosted at home without having to set up tunnels or complex port redirections. In the case of private Clouds, IPv6 replaces existing corporate tunnels with a more resilient protocol which also provides a wider, flat corporate addressing space. IPv6 addressing makes it easy to allocate hundreds of IPv6 addresses on a single server. Each running process can thus be attached to a different IPv6 address, without having to change its default port settings. Accounting network traffic per computer partition is simplified. All this would of course be possible with IPv4 or through VPNs, but it would be much more difficult or less resilient. The exhaustion of IPv4 addresses prevents, in practice, allocating so many public IPv4 addresses to a single computer. After one year of experimentation with IPv6 in France, using Free's native IPv6 Internet access (more than 50% of worldwide IPv6 traffic), we found that IPv6 is simple to use and creates the conditions for many innovations which would otherwise be impossible.

    Even though IPv6 is used to interconnect processes globally on a SlapOS public or private Cloud, we found that most existing software is incompatible with IPv6. Reasons vary. Sometimes IP addresses are stored in a structure of 3 integers, which is incompatible with IPv6. Sometimes IPv6 URLs are not recognized, since only the dot is recognized as a separator in IP addresses. For this reason, we provide each computer partition with a dedicated, local, non-routable IPv4 address. Legacy software listens on this IPv4 address. A kind of proxy mechanism is then used to create a bridge between IPv6 and IPv4. In the case of HTTP applications, Apache usually plays this role, in addition to the role of applicative firewall (mod_security) and strong security (TLS). In the case of other protocols, we usually use stunnel for the same purpose. We will discuss this approach in the next chapter and study in particular how stunnel can turn a legacy application into an IPv6-compatible application without changing a single line of the original code.

    For some applications, IP is not the appropriate OSI layer. We provide such applications with a tap interface which emulates a physical Ethernet interface. This interface is usually bridged with one of the server's physical Ethernet interfaces. tap is often used by virtualization software such as kvm to provide access to the outside network. This is, for example, how the default kvm implementation of SlapOS is configured. But it could also be used for other applications, such as virtual private networks or virtual switches, which require direct access to Ethernet. In a computer with 100 computer partitions, tap interfaces are usually named slaptap0, slaptap1, etc. up to slaptap99.

    Every computer partition is linked to a user and a directory. In a computer with 100 computer partitions, users are usually named slapuser0, slapuser1, etc. up to slapuser99. Directories are usually set to /srv/slapgrid/slappart0, /srv/slapgrid/slappart1, etc. up to /srv/slapgrid/slappart99. Directory /srv/slapgrid/slappart0 is owned by user slapuser0 and group slapuser0; directory /srv/slapgrid/slappart1 is owned by user slapuser1 and group slapuser1. slapuser0 is not able to access files in /srv/slapgrid/slappart1, and slapuser1 is not able to access files in /srv/slapgrid/slappart0. Moreover, tap interface slaptap0 is owned by slapuser0, tap interface slaptap1 is owned by slapuser1, etc.
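
    Since the naming scheme is regular, it can be sketched in a few lines of Python. Paths follow the convention described above; the ownership check is illustrative and assumes it runs on a configured Slave node.

        # Sketch: the regular naming scheme of computer partitions, plus a
        # check that each partition directory is owned by its matching user.
        import os
        import pwd

        def partition_layout(n):
            return {
                "user": "slapuser%d" % n,
                "tap": "slaptap%d" % n,
                "directory": "/srv/slapgrid/slappart%d" % n,
            }

        def check_ownership(n):
            part = partition_layout(n)
            st = os.stat(part["directory"])
            owner = pwd.getpwuid(st.st_uid).pw_name
            assert owner == part["user"], "partition %d owned by %s" % (n, owner)

        for i in range(100):        # a typical barebone PC: 100 partitions
            print(partition_layout(i)["directory"])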

    For some applications, it could be necessary to attach a raw block device to some computer partitions. This could be useful to maximize disk I/O performance under certain configurations of kvm, or to directly access a physical partition of an SSD disk. This possibility has been included in the design of SlapOS, although it is not yet fully implemented.

    For some applications, such as providing a shared front-end and accelerated cache, a dedicated IPv4 address is required. This possibility has been included in the design of SlapOS, although it is not yet fully tested (but it should be before Q3 2011).

    To summarize the security model: a Computer Partition is configured to have no access to any information of another Computer Partition. Access rights in SlapOS thus have 3 different levels: global access, computer-partition-only access and superuser-only access. SlapOS Slave nodes are normally configured in such a way that global hardware status has global access rights; installing monitoring software is thus possible without further customization. Every software running in a computer partition has access to all files of the computer partition, owned by the same user. Software running in a computer partition has no possibility to access or modify files owned by the superuser. As a general design rule, we refuse to grant any superuser privilege to applications or computer partitions. Only SlapGRID and supervisord are executed with superuser privileges.

    Computer Partition N

    • Process(N, 0)
    • Process(N, 1)
    • ...
    • Process(N, q)

    A single computer partition is intended to host a single elementary application, such as a database, an application server or a test runner. Yet, multiple UNIX processes may be required for this purpose. If we consider the case of a Zope Web application server, at least two processes are allocated: one Apache process acting as a secure applicative firewall (mod_security + mod_ssl), and the Zope application server itself. In the case of a database, one process is the database itself and another process is the stunnel application which maps IPv6 ports to local IPv4 ports.

    The number of processes is even higher for full applications. Running ERP5 requires no less than 12 processes: backend_apache, certificate_authority, conversion_server, crond, erp5_update, kumo_gateway, kumo_manager, kumo_server, memcached, mysql_update, mysqld, zope_1. In this case, the computer partition acts as a one-place-fits-all container for ERP5 and all its dependencies. A similar approach would be followed for any shrink-wrapped application, including Apache/PHP/MySQL applications. This is acceptable since the concept of "elementary" still relates to the idea that only one instance of the application is launched and that, most of the time, it is idle. Multiple computer partitions can thus be allocated on a single computer. However, this approach does not address scaling up.

    Some users even use a single computer partition to run multiple instances of the same application server. The computer partition is no longer elementary in this case: it acts as a mini-cluster and ends up consuming all the resources of a computer. We are no longer within the original intention of elementary usage. This kills both the scalability of the application and the possibility to optimize resources in SlapOS through fine-grained resource allocation.

    SlapOS Networking

    It is a design choice of SlapOS to consider that the only commonality between nodes of a distributed Cloud is IP, and that there is no possibility to rely on network management services such as BGP to implement value-added networking. SlapOS networking is thus based on a flat IP addressing model. There is no notion of virtual local area network (VLAN) at the core of SlapOS. There is no notion of quality of service at the core of SlapOS. There is no encryption and no security at the core of SlapOS. It is the role of applications to implement such concepts, by allocating appropriate resources and encapsulating them into insecure and unpredictable IP transit.

    It would be an interesting research topic to discuss how to provide quality of service or virtual local area network management on top of insecure and somewhat unpredictable IP transit. We hope that someone will contribute to this research, for example by implementing a complete Infrastructure as a Service (IaaS) stack on top of SlapOS, with the idea of deploying it over a collection of computers spread all over the world. This topic is however out of the scope of SlapOS core design.

    IPv6

    The use of IPv6 is recommended in order to create a global, distributed, peer-to-peer, unencrypted network of intercommunicating processes with a single, flat addressing space. In an ideal SlapOS implementation, all software instances allocated on computer partitions of Slave nodes can communicate with each other through IPv6 connections. Some users, represented in the drawing with a laptop, access SlapOS processes using IPv6 directly. This is the case for developers who need to access processes directly, without a front end. Most legacy users, however, access SlapOS application processes through IPv4 and application front ends. Application front ends are thus allocated on special computer partitions with dual IPv4 and IPv6 addressing.

    The use of IPv6 is sometimes questioned by observers. For end users, IPv4 front ends provide access to the IPv6 backend; the use of IPv6 is thus transparent. On the other hand, any reasonable developer is nowadays able to set up an IPv6 tunnel, using miredo for example, or through tunnel brokers such as Hurricane Electric. Until now, we have been able to obtain IPv6 access in about any condition: on mobile 3G connections, on home ADSL, in a university in China, etc. In the worst case, we simply connect through IPv4 and HTTP to a remote virtual machine hosted on SlapOS and accessible through a front end, and use that virtual machine instead of our local machine.

    Yet, some large organisations refuse to implement IPv6. In this case, IPv6 can be replaced by IPv4 in SlapOS, as long as a VPN is deployed to provide a global, flat addressing space with enough available addresses. It should be possible to allocate 100 IPv4 addresses on each SlapOS Slave node. Distributed VPN technologies such as tinc could eventually be integrated at the core of SlapOS to implement a large IPv4 flat addressing space, without sacrificing the key concept of distribution of resources which is at the core of SlapOS.

    stunnel: Security and Legacy

    The main problem with IPv6 is that it is poorly supported by most applications. Another problem with IPv6 is IPSEC. Although IPSEC is a beautiful technology, it is not easy to deploy in a way which provides encryption and authentication on a per-UNIX-user basis. It is also difficult to deploy in a completely decentralized way.

    stunnel provides a solution to both problems. Whenever secure communication is needed between two applications, an stunnel process is created at both ends. stunnel maps local IPv4 addresses to global IPv6 addresses and encrypts all communication. stunnel is also used to restrict access to a few X509 certificates.
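
    To make the mechanism concrete, here is a toy, single-connection version of what an stunnel client does: accept clear text on a local IPv4 address and relay it over TLS to a global IPv6 endpoint. The addresses are illustrative, and a real deployment should of course use stunnel itself.

        # Toy sketch of an stunnel client: listen in clear text on a local
        # IPv4 address and relay each connection over TLS to an IPv6 server.
        # Single connection, one direction, no error handling; addresses are
        # illustrative. Use stunnel itself in real deployments.
        import socket
        import ssl

        LOCAL = ("127.0.0.1", 3306)        # where the legacy client connects
        REMOTE = ("2001:db8::10", 3306)    # IPv6 address of the real server

        ctx = ssl.create_default_context()

        listener = socket.socket(socket.AF_INET)
        listener.bind(LOCAL)
        listener.listen(1)
        client, _ = listener.accept()

        remote = socket.create_connection(REMOTE)   # resolves IPv6 hosts too
        tls = ctx.wrap_socket(remote, server_hostname="db.example.org")

        # Pump bytes one way for brevity; stunnel relays both directions.
        while True:
            data = client.recv(4096)
            if not data:
                break
            tls.sendall(data)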

    stunnel is used, for example, in SlapOS to connect to MySQL database servers hosted on public IPv6 servers. The MySQL client itself only partially supports IPv6 and does not encrypt connections. With stunnel, it is possible to access MySQL over IPv6 with encryption and, possibly, strong authentication. The same approach is used to access memcached servers. memcached was originally designed for a trusted Local Area Network (LAN). By encapsulating the memcached protocol in stunnel, we get IPv6 support, encryption and authentication at once.

    Generally speaking, we found while implementing SlapOS that most software components used by large Web infrastructures, such as social networks, SaaS or search engines, are designed for trusted environments and private clusters. Porting those applications to distributed Clouds and untrusted networks requires an additional effort to secure the connections. Rather than using a centralized VPN approach, we found that stunnel could be used as a very efficient peer-to-peer VPN, and at the same time solve the IPv6 migration problem.

    stunnel itself provides enough performance compared to the available IP network transit bandwidth. According to the stunnel authors, stunnel performance on a Core 2 Duo architecture can reach up to 600 Mbit/s (http://www.stunnel.org/?page=perf):

    Data throughput:
      RC4-MD5        75 MB/s     600 Mbit/s
      AES128-SHA     55 MB/s     440 Mbit/s
      AES256-SHA     47 MB/s     296 Mbit/s
      DES-CBC3-SHA   15.5 MB/s   124 Mbit/s

    New connections (1024-bit RSA key):
      without session cache, software              290 conn/s
      without session cache, hardware (estimated)  approx. 1,000 conn/s
      with session cache                           2,150 conn/s

    Max. concurrent sessions:
      Unix poll() / Win32   over 10,000
      Unix select()         500

    Virtual memory usage: 155 KB per concurrent connection

    Full Example

    It is now time to discuss what to do next with SlapOS, now that we have an idea of its architecture. As a general guideline, we recommend practical implementation and practical use. Experimentation is a great source of ideas and innovation, and helps further understand the strengths and weaknesses of any system, including SlapOS.

    Available Components

    There are nowadays more than 140 components available to build your own application stack.

    Live Demonstration

    community.slapos.org/wiki

    The live demonstration is based on a tutorial published on the SlapOS wiki. You can reproduce the tutorial and get similar results on your own Linux machine.

    Use Cases

    SlapOS is still beta-quality software as of June 30th, 2011. Yet, it is no longer in its early development stage. SlapOS has reached a good level of stability thanks to wide coverage by unit tests and functional tests. It is stable enough to be used on production systems and to deploy mission-critical ERP applications.

    Use Cases Examples

    • Nexedi (Many, Production )
    • IFF (Cloudooo, Research)
    • Central Bank (ERP5, Testing)
    • Aerospace Company (ERP5, Production)
    • Transportation Company (ERP5, Production)
    • SANEF Tooling UK (ERP5, Production)
    • CompatibleOne (Many, Research)
    • ViFiB (Many, Beta)

    SlapOS early success stories include the provisioning of virtual machines at Nexedi for developers who use a network PC instead of a laptop, as well as unit testing and scalability testing of software. SlapOS was used to teach Cloud computing and PaaS to students at Paris 13 University. It was also used to deploy the UNG Docs open source Web Office, and to teach scalability testing and ERPs at the Instituto Federal Fluminense (IFF) in Brazil.

    Four major customers of the ERP5 open source ERP have moved their production systems to SlapOS. A Central Bank has been using ERP5 for currency issuing and global banking since May 30th, 2011. An aerospace company is finalizing tests of ERP5 over SlapOS for ERP/CRM applications. ERP5 has been implemented on SlapOS for SANEF UK tolling, as well as in another highway tolling company.

    The CompatibleOne collaborative project uses SlapOS to run scalability tests on NoSQL storages and gather results. SlapOS implementations of sheepdog and kumofs were made for this purpose.

    Last, the ViFiB hosting company has started alpha testing of SlapOS for developers of SaaS and PaaS applications, which may or may not be based on IaaS. ViFiB supports various stacks such as LAMP, Java or ERP5. It provides Billing as a Service (BaaS) to any software publisher looking for the shortest time-to-market solution to offer their software on the Cloud.

    Research Topics

    Interesting Ideas

    • Consumption vs. Size
    • Allocation Algorithm
    • Resilience
    • Mobile Devices
    • Multi Master

    Conclusion

    It is time to conclude the tutorial.

    SlapOS: Distributed Cloud OS

    • The vision (close to DG)
    • The technologies
    • ... and now you can provide a SaaS!

    Business

    • SlapOS is an infrastructure for building “social networking communities” based on volunteers (mutual aid, “solidarity”)
    • Assumption: if the business model is not compliant with these ideas, all the work will fail

    Why it may rise?

    The infrastructure is the network and the “data centers” are at home => low cost

    Kondo, Derrick, et al.: Cost-Benefit Analysis of Cloud Computing versus Desktop Grids. In: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing. Washington, DC, USA: IEEE Computer Society, 2009.

    Note: a similar work comparing SlapOS costs and commercial Cloud costs would be a nice idea. To be fair, one might compare French hosting providers (because they are known to charge low prices) with SlapOS... and on a scientific basis (define metrics that make sense!). This research paper raises challenging issues, for instance: given these monetary cost-benefits, how does Volunteer Computing (VC) compare with Cloud platforms (EC2...)? Can Cloud computing platforms be used in combination with VC platforms to improve cost-effectiveness even further? The authors found that about 2.83 active volunteer nodes are needed to achieve the compute power of one small EC2 instance, and that the pay-per-use prices of “EC2-like Clouds” would have to decrease by an “order of magnitude” to compete. Their results are encouraging!

    Do you want to know more?

    • community.slapos.org/
    • community.slapos.org/wiki
    • Desktop Grid Computing
      (Chapman and Hall/CRC Numerical Analysis and Scientific Computation Series)

    Feel free to request more information about SlapOS from our presenters.

    Rafael Monnerat works at Nexedi Brazil as Director and is a core developer of the ERP5 project. He has participated in several ERP implementations and R&D projects worldwide. In addition to ERP5, Rafael is a core developer of several other open source Cloud computing applications, such as SlapOS, CloudOOo, UNG and ViFiB. He is co-author of the book chapter "ERP5: Designing for Maximum Adaptability" (Beautiful Code) and of a few other book chapters and academic papers.

    Christophe Cérin is a full professor at the Laboratoire d'Informatique de Paris Nord (LIPN), University Paris 13. His research focuses on high performance parallel systems: developing efficient parallel libraries and building fully distributed middleware (BonjourGrid and PastryGrid) for desktop grids. He also studies memory management in thread libraries in heterogeneous environments for multicore machines. He is currently working on Cloud technologies under the umbrella of the Resilience project, a project funded by the French ministry of industry (Sept 2011 - Aug 2013). The Resilience project includes the development of SlapOS, which reuses some techniques inherited from Desktop Grid concepts.

    You can also find more information about this topic in the book Desktop Grid Computing (Chapman and Hall/CRC Numerical Analysis and Scientific Computation Series).