
An Architectural View of Apache OpenWhisk

3 Feb 2017 7:35am

An open source project driven by IBM and Adobe, Apache OpenWhisk is a Functions-as-a-Service (FaaS) platform that can be deployed in the cloud or within the data center. Compared to other serverless projects, it is designed as a robust, scalable platform that supports thousands of concurrent triggers and invocations.

You can sign up for the hosted version of OpenWhisk running in Bluemix or deploy a Vagrant-based environment on your development machine. Refer to our previous coverage on getting started with OpenWhisk.

In this article, we’ll explore the design and architecture of OpenWhisk. We will go behind the scenes of the deployment to identify various components and their role.

To get the best out of this guide, we recommend having a fully configured, Vagrant-based OpenWhisk environment running on your machine. The OpenWhisk CLI, wsk, should be configured to talk to the local setup.

Quick Recap of the Architecture

Apache OpenWhisk is designed to act as an asynchronous and loosely coupled execution environment that can run functions against external triggers. Developers write standalone functions that are uploaded as Actions, which are completely autonomous and independent of the event sources. They can be invoked as long as an event source passes the set of parameters that the invocation requires.
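As a minimal sketch of that parameter contract, the Action below reads a single input and falls back to a default when the event source omits it. OpenWhisk passes the invocation parameters to main as one object; the name parameter and the greet.js file name are hypothetical and chosen only for illustration.

```javascript
// greet.js: a hypothetical Action that depends on one input parameter.
// `name` is an illustrative parameter, not something OpenWhisk defines.
function main(params) {
    // Fall back to a default so the Action still succeeds without input.
    const name = (params && params.name) || 'World';
    return { msg: 'Hello ' + name };
}
```

Assuming it was created as greetAction, it could be invoked with wsk -i action invoke greetAction --param name OpenWhisk.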

Once the Actions are in place, developers can create Triggers, which are endpoints that are explicitly called by event sources such as databases, stream processing engines, file systems, and line-of-business applications. A Trigger is independent of Actions, which means that it may or may not have any Action bound to it. When an event source fires a Trigger, it has no knowledge of the Actions that may be invoked. The set of Actions bound to a Trigger is discovered and executed only at runtime.

But how do developers bind Actions to Triggers? That’s where Rules come into play. They act as the glue between Triggers and Actions by creating a loosely coupled association between them. This design pattern enables the same Action to be invoked by different Triggers.
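To make the relationship concrete, here is a small in-memory model of Triggers, Rules, and Actions. It is only a sketch of the idea and not how OpenWhisk stores or resolves these entities internally; the function names and the trigger names fileUploaded and rowInserted are invented for the example.

```javascript
// In-memory model of Triggers, Rules, and Actions (illustrative only;
// this is not how OpenWhisk stores or resolves them internally).
const actions = new Map();   // action name -> function
const rules = [];            // each rule links one trigger to one action

function createAction(name, fn) {
    actions.set(name, fn);
}

function createRule(name, trigger, action) {
    rules.push({ name, trigger, action });
}

// The actions bound to a trigger are resolved only when it fires.
function fireTrigger(trigger, params) {
    return rules
        .filter((rule) => rule.trigger === trigger)
        .map((rule) => actions.get(rule.action)(params));
}

createAction('helloAction', () => ({ msg: 'Hello World' }));
createRule('uploadRule', 'fileUploaded', 'helloAction');
createRule('insertRule', 'rowInserted', 'helloAction');

console.log(fireTrigger('fileUploaded', {}));   // one result, from helloAction
console.log(fireTrigger('unboundTrigger', {})); // empty array: no rule, no action
```

The same helloAction runs for two different triggers, while a trigger without any rule simply produces nothing.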

 

This loosely coupled architecture makes OpenWhisk a scalable, reliable, and robust serverless platform. Each layer involved in the execution is designed to scale independently.

Actions, Rules, and Triggers can be created and managed through REST endpoints. All that the event source needs to do to invoke an Action is to call the Trigger REST API.
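The sketch below shows roughly what such a call could look like from an event source written in Node.js. It only builds the request and does not send it. The URL layout (/api/v1/namespaces/{namespace}/triggers/{name}) and the use of the wsk auth key as Basic credentials reflect my understanding of the OpenWhisk REST API, so verify them against your deployment; the helper name, host, and the uuid:key placeholder are assumptions.

```javascript
// Build (but do not send) the HTTP request that fires a Trigger.
// apiHost, namespace, and authKey are placeholders for your own setup.
function buildTriggerFireRequest(apiHost, namespace, triggerName, authKey, payload) {
    return {
        method: 'POST',
        url: 'https://' + apiHost + '/api/v1/namespaces/' +
            encodeURIComponent(namespace) + '/triggers/' +
            encodeURIComponent(triggerName),
        headers: {
            // The wsk auth key has the form "uuid:key" and is sent as Basic auth.
            'Authorization': 'Basic ' + Buffer.from(authKey).toString('base64'),
            'Content-Type': 'application/json'
        },
        // The payload becomes the parameters passed along to bound Actions.
        body: JSON.stringify(payload || {})
    };
}

const request = buildTriggerFireRequest(
    '192.168.33.13', 'guest', 'helloTrigger', 'uuid:key', { source: 'demo' });
console.log(request.method, request.url);
```

The resulting object can be handed to any HTTP client. On a local Vagrant setup with a self-signed certificate, the client would also need to skip certificate verification, which is what the -i flag does for wsk.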

The workflow illustrated below highlights the simple yet powerful mechanism of creation and invocation of code:

$ cat > hello.js << EOF
function main() {
    console.log('Hello World');
    return {msg: 'Hello World'};
}
EOF

$ wsk -i action create helloAction hello.js
ok: created action helloAction

$ wsk -i trigger create helloTrigger
ok: created trigger helloTrigger

$ wsk -i rule create helloRule helloTrigger helloAction
ok: created rule helloRule

$ wsk -i trigger fire helloTrigger
ok: triggered /guest/helloTrigger with id 350364de139547ba8c95113ef0908911

The Building Blocks of OpenWhisk

Let’s now take a closer look at the core components of OpenWhisk.

The following diagram depicts the high-level architecture of OpenWhisk. From Nginx to Kafka to Docker, multiple technologies are powering this serverless platform.

If you have access to the Vagrant box, SSH into it to check the running Docker containers and the images pulled by the system:

$ docker ps --format "{{.Names}} - {{.Image}}"
wsk0_425_warmJsContainer_20170202T014345371Z - whisk/nodejs6action:latest
wsk0_424_warmJsContainer_20170202T014153330Z - whisk/nodejs6action:latest
nginx - nginx:1.11
invoker0 - whisk/invoker:latest
controller - whisk/controller:latest
kafka - ches/kafka:0.10.0.1
zookeeper - zookeeper:3.4
registrator - gliderlabs/registrator
consul - consul:0.7.0
couchdb - couchdb:1.6

The first two containers represent the recently invoked Actions while the other containers directly map to the core components.

Let’s understand the role of each of these components.

Nginx

This open source web server exposes the public-facing HTTP(S) endpoint to the clients. It is primarily used as a reverse proxy for the API and for terminating SSL. Every request hitting the OpenWhisk infrastructure, including those originating from the wsk CLI, goes through this layer. Since it is entirely stateless, the Nginx layer can be easily scaled out.

Controller

After a request passes through the reverse proxy, it hits the Controller, which acts as the gatekeeper of the system. Written in Scala, this component is responsible for the actual implementation of the OpenWhisk API. It authenticates and authorizes every request before handing over control to the next component. Think of it as the orchestrator of the system, which decides the path that the request will eventually take.

CouchDB

The state of the system is maintained and managed in CouchDB, an open source JSON data store. The credentials, metadata, namespaces, and the definitions of Actions, Triggers, and Rules are stored in CouchDB. The Controller verifies the credentials of each request against the ones stored in this database.
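The credential check can be pictured with the sketch below. It is an illustration of the idea, not the Controller's actual Scala code: a Map stands in for the CouchDB database, and the record shape, the sample credentials, and the authenticate function are all assumptions.

```javascript
// A Map stands in for the CouchDB database that holds credentials
// (an assumption for this sketch; the real record layout may differ).
const subjects = new Map([
    ['uuid-1234', { key: 'secret-key', namespace: 'guest' }]
]);

// Parse a Basic auth header and compare it with the stored credentials.
// Returns the caller's namespace, or null when authentication fails.
function authenticate(authorizationHeader) {
    if (!authorizationHeader || !authorizationHeader.startsWith('Basic ')) {
        return null;
    }
    const decoded = Buffer.from(authorizationHeader.slice(6), 'base64').toString('utf8');
    const separator = decoded.indexOf(':');
    if (separator < 0) {
        return null;
    }
    const uuid = decoded.slice(0, separator);
    const key = decoded.slice(separator + 1);
    const subject = subjects.get(uuid);
    return subject && subject.key === key ? subject.namespace : null;
}

const header = 'Basic ' + Buffer.from('uuid-1234:secret-key').toString('base64');
console.log(authenticate(header)); // the namespace of the matching subject
```

Only when this step succeeds would the request move on to the authorization check and, eventually, to an Invoker.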

You can access the database from a browser by visiting http://192.168.33.13:5984/_utils/ (the link works only on the developer machine).

The initial set of Actions, Triggers, and Rules that we created are visible at http://192.168.33.13:5984/_utils/database.html?vagrant_vagrant-ubuntu-trusty-64_whisks:

It is interesting to note that the source code of our helloAction is also persisted in CouchDB, which has the definition of the Action, default parameters, and the assigned quota of resources:

Each invocation in OpenWhisk results in an Activation that contains the output of the Action. For example, the following command asynchronously invokes the helloAction resulting in a new Activation Id:

$ wsk -i action invoke helloAction
ok: invoked /guest/helloAction with id 8004746e06f147b99c1e8b0be875ec64

The output of this invocation can be retrieved through the following command:

$ wsk -i activation get --summary 8004746e06f147b99c1e8b0be875ec64
activation result for /guest/helloAction (success at 2017-02-02 08:01:46 +0530 IST)
{
   "msg": "Hello World"
}

We can see the output being stored in CouchDB.

Consul

Contemporary distributed computing platforms like OpenWhisk, Kubernetes, and Swarm rely on distributed key/value stores for state management. OpenWhisk uses Consul as the single source of truth accessible to every component of the system. Consul also provides service discovery capabilities, making it easy for the Controller to discover the entities that will invoke an Action. These entities are called Invokers, and they are directly responsible for executing the code. Consul maintains a list of available Invokers and their health status.

Consul is supported by Registrator, which watches for new Docker containers and inspects them to decide the services they provide. When the Docker engine creates a new container, Registrator receives an event which gets pushed into Consul.

When the Controller needs to delegate the Action to an Invoker, it looks up the Consul store to pick the ideal candidate.
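The registration and lookup steps can be sketched as follows. A Map stands in for the Consul key/value store, and the least-busy selection policy is purely an assumption, since this article does not describe how OpenWhisk ranks the candidates; registerInvoker and pickInvoker are hypothetical names.

```javascript
// A Map stands in for the Consul key/value store (illustrative only).
const invokers = new Map();

// Registrator-style hook: record an invoker when its container appears.
function registerInvoker(name, healthy) {
    invokers.set(name, { healthy, activeCount: 0 });
}

// Controller-style lookup: pick the healthy invoker with the fewest
// in-flight activations. The selection policy is an assumption.
function pickInvoker() {
    let best = null;
    for (const [name, state] of invokers) {
        if (!state.healthy) {
            continue; // never hand work to an unhealthy invoker
        }
        if (best === null || state.activeCount < invokers.get(best).activeCount) {
            best = name;
        }
    }
    if (best !== null) {
        invokers.get(best).activeCount += 1;
    }
    return best; // null when no healthy invoker is registered
}

registerInvoker('invoker0', true);
registerInvoker('invoker1', false);
registerInvoker('invoker2', true);
console.log(pickInvoker(), pickInvoker(), pickInvoker());
```

In this run the unhealthy invoker1 is never chosen, and the work alternates between the two healthy ones.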

Hit the http://192.168.33.13:8500/ui/#/dc1/services URL to explore Consul:

You can see the available Invokers by visiting http://192.168.33.13:8500/ui/#/dc1/kv/invokers/:

Kafka

Apache Kafka is typically used for building real-time data pipelines and streaming applications. It powers many production workloads that need reliable, high-velocity data ingestion. OpenWhisk takes advantage of Kafka to connect Controller with Invokers.

Kafka buffers the messages sent by the Controller before delivering them to the Invoker handpicked from Consul during the previous phase. When Kafka confirms that the message is delivered, the Controller immediately responds with the Activation ID. This stateless architecture makes OpenWhisk highly scalable.
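The effect of this buffering is that the caller gets an Activation ID before any code has run. The sketch below imitates that flow with an array in place of a Kafka topic and a Map in place of the activation store; both stand-ins, the ID format, and the function names are assumptions made to keep the example self-contained.

```javascript
// An array stands in for a Kafka topic, and a Map for the activation
// store; both are assumptions made to keep the sketch self-contained.
const topic = [];
const activations = new Map();
let nextId = 1;

// Controller side: enqueue the request and return an ID right away,
// before any code has run.
function invokeAsync(actionName, params) {
    const activationId = 'activation-' + nextId++;
    topic.push({ activationId, actionName, params });
    return activationId;
}

// Invoker side: drain the topic later and record each result.
function drain(actionTable) {
    while (topic.length > 0) {
        const message = topic.shift();
        const result = actionTable[message.actionName](message.params);
        activations.set(message.activationId, result);
    }
}

const id = invokeAsync('helloAction', {});
console.log(activations.has(id)); // false: nothing has executed yet
drain({ helloAction: () => ({ msg: 'Hello World' }) });
console.log(activations.get(id)); // the stored result
```

Because the two sides share only the queue, either one can be scaled or restarted without the other noticing.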

Apache ZooKeeper maintains and manages the Kafka cluster. ZooKeeper’s primary job is to track the status of the nodes in the Kafka cluster and to keep track of topics, messages, and quotas.

Invoker

Written in Scala, the Invoker tackles the final stage of the execution process. Based on the runtime requirements and the quota allocation, it spins up a new Docker container that acts as the unit of execution for the chosen Action. The Invoker copies the source code from CouchDB and injects it into the Docker container. Once the execution is completed, it stores the outcome of the Activation in CouchDB for later retrieval. For each new invocation, the Invoker decides whether to reuse an existing “hot” container, start a paused “warm” container, or launch a new “cold” container. It looks up the state of containers in Consul to make an appropriate call.
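The hot, warm, or cold decision can be expressed as a simple preference order. The sketch below is a hypothetical illustration and not the Invoker's real logic: the { action, state } container shape, the 'running' and 'paused' state names, and the chooseContainer function are all assumptions.

```javascript
// Decide how to run an Action given the containers an Invoker knows about.
// Each container is described as { action, state }, where state is
// 'running' or 'paused'; this shape is hypothetical.
function chooseContainer(actionName, containers) {
    const candidates = containers.filter((c) => c.action === actionName);

    // Prefer a container that is already running the same Action.
    const hot = candidates.find((c) => c.state === 'running');
    if (hot) {
        return { start: 'hot', container: hot };
    }

    // Otherwise resume a paused container for that Action.
    const warm = candidates.find((c) => c.state === 'paused');
    if (warm) {
        return { start: 'warm', container: warm };
    }

    // Nothing reusable: a new container has to be launched.
    return { start: 'cold', container: null };
}

const pool = [
    { action: 'helloAction', state: 'paused' },
    { action: 'otherAction', state: 'running' }
];
console.log(chooseContainer('helloAction', pool).start); // warm
console.log(chooseContainer('newAction', pool).start);   // cold
```

The ordering reflects cost: reusing a running container avoids any startup work, resuming a paused one is cheaper than a fresh launch, and a cold start is the fallback.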

There may be multiple Invokers, depending on the load and utilization of the platform.

Depending on the metadata, the Invoker chooses which container image to use. Below are two containers that resulted from invoking helloAction.

$ docker ps --format "{{.Names}} - {{.Image}}"| grep "JsContainer"
wsk0_457_warmJsContainer_20170202T035154330Z - whisk/nodejs6action:latest
wsk0_456_warmJsContainer_20170202T035037341Z - whisk/nodejs6action:latest

Docker

Apache OpenWhisk is built on top of proven open source technologies including Docker, which plays a very important role. Almost all the components of OpenWhisk are packaged and deployed as containers. From Nginx to Kafka to Consul, everything in the platform runs as a container.

Refer to the available images on Docker Hub for a list of OpenWhisk container images.

Coming Soon: API Gateway

Though not fully integrated yet, OpenWhisk also has a built-in API Gateway to expose Actions as HTTP endpoints. Based on OpenResty and Nginx, this open source project is maintained by Adobe.

This capability is currently available as an experimental feature. The following command shows the possible options.

$ wsk -i api-experimental
work with APIs

Usage:
 wsk api-experimental [command]

Available Commands:
 create      create a new API
 get         get API details
 delete      delete an API
 list        list APIs

Summary

OpenWhisk is a fascinating project for learning about distributed systems and serverless platforms. IBM and Adobe deserve credit for donating it to the Apache Software Foundation and encouraging community involvement. Though many open source serverless platforms have mushroomed in the recent past, OpenWhisk stands out for its robust architecture and design.

The open source components chosen to build OpenWhisk are best of breed. They contribute to the overall success of the platform.

I expect to see increased adoption of OpenWhisk in the coming months. Watch this space for tutorials on integrating OpenWhisk with popular databases, storage engines, and API gateways.

IBM is a sponsor of The New Stack.

