Software-Defined Storage with an Understandable Interface, the Ceph Way, Part Three

The Ceph storage cluster, with all its robust and scalable design, would not be a usable piece of storage without some standard interfaces that a storage consumer can understand. The standard interfaces that Ceph provides for this purpose are:
- RADOS Gateway (RGW): an HTTP object interface.
- RADOS Block Device (RBD): a block interface.
- CephFS: a POSIX-compliant file interface.
For the purposes of this article, we will be discussing RGW.
RADOS Gateway (RGW)
RGW is a Representational State Transfer (RESTful) API service that provides a standard HTTP/Web object interface to its clients and uses ‘librados’ to store those clients’ data as RADOS objects in the Ceph storage cluster. An HTTP/Web object is a client-side concept that the Ceph storage cluster has no knowledge of. Similarly, RADOS objects describe how the data is viewed by the Ceph storage cluster, and HTTP clients talking to RGW have no idea about RADOS objects. RGW is a Ceph client that knows about Ceph storage clusters and cluster maps, and it is also aware of the semantics its HTTP clients understand. There is no one-to-one mapping between a RADOS object and an HTTP/Web object. Instead, an HTTP/Web object is striped into many RADOS objects, and the Ceph storage cluster takes care of managing those RADOS objects. The metadata and semantics of an HTTP/Web object are stored separately (again, in the form of RADOS objects), and only RGW is aware of, and responsible for, translating RADOS objects into HTTP/Web objects and vice versa.
There are three concepts related to this striping mechanism:
1. Stripe Unit
A stripe unit is the smallest unit of data. An HTTP object may be striped into many stripe units, based on the object size and the stripe width (stripe size) defined by the RGW Ceph client. Typically, all stripe units are of equal size, except the last one. For example, an HTTP object of 1.1 MB can be striped into five stripe units (stripe count = 5) with a stripe width of 256 KB: four stripe units of 256 KB each, plus a last stripe holding only the remaining ~100 KB of data (a short sketch after this list illustrates the arithmetic).
2. RADOS Object
A RADOS object has a configurable maximum size (for example, 2 MB or 4 MB); the default RADOS object size is 4 MB. Each RADOS object has the capacity to hold many stripe units.
3. Object Set
These stripe units are written to various RADOS objects. The Ceph client (RGW) writes all stripe units belonging to a particular Web/HTTP object to these RADOS objects in parallel. The RADOS objects act like a set of disks in a RAID-0 configuration and are collectively referred to as an “object set.” Because each RADOS object belongs to a different PG, OSD and disk, the Ceph client achieves excellent performance gains: it is not limited by the speed of a single disk, but benefits from the aggregated speed of many disks.
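To make the striping arithmetic above concrete, here is a minimal Python sketch. It is not part of RGW; the stripe width, RADOS object size and HTTP object size are simply the example values used above.

STRIPE_WIDTH = 256 * 1024              # stripe unit size, 256 KB
RADOS_OBJECT_SIZE = 4 * 1024 * 1024    # default RADOS object size, 4 MB

def stripe_layout(http_object_size):
    """Split an HTTP object's byte count into stripe unit sizes."""
    full_units, remainder = divmod(http_object_size, STRIPE_WIDTH)
    return [STRIPE_WIDTH] * full_units + ([remainder] if remainder else [])

units = stripe_layout(int(1.1 * 1024 * 1024))   # a 1.1 MB HTTP object
print(len(units), "stripe units:", units)       # 5 units; the last one is smaller
print(RADOS_OBJECT_SIZE // STRIPE_WIDTH, "stripe units fit into one 4 MB RADOS object")

Running the sketch prints five stripe units: four of 262144 bytes and a last partial unit of roughly 100 KB, matching the example above.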
HTTP Object/Web Object
An HTTP or Web object is data stored in such a way that the owner of the data only needs to know a unique uniform resource identifier (URI) and user credentials. The owner creates, updates, deletes and retrieves the data using standard HTTP methods like PUT, POST, GET, DELETE and HEAD. On the other end, the web server (or its affiliates serving the client request) handles all related data management functions: availability, reliability and access control.
The communication between clients and servers is based on a RESTful API. RESTful web architecture is a widely adopted standard due to its simple design, performance and maintainability. A detailed discussion of RESTful web services and their architecture is beyond the scope of this article; however, there are ample resources available on the subject. Some common examples of such services are Amazon S3 and Rackspace Cloud Files. There are standard sets of published APIs to access and consume these services; the Amazon S3 and OpenStack Swift APIs are the most widely used object store APIs. RGW supports both the S3 and Swift APIs, and also provides some interoperability between them.
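As a purely illustrative sketch of what these RESTful object operations look like on the wire, the following Python snippet drives a Swift-style endpoint with plain HTTP verbs using the requests library. The storage URL, container name and token are hypothetical placeholders, not values produced anywhere in this article.

import requests

storage_url = "http://rgw1:8080/swift/v1"      # assumed Swift-style storage endpoint
token = {"X-Auth-Token": "AUTH_tk_example"}     # assumed, previously issued token
obj = storage_url + "/mycontainer/hello.txt"

# PUT creates (or overwrites) a container or an object; the body is the object's data.
requests.put(storage_url + "/mycontainer", headers=token)
requests.put(obj, headers=token, data=b"hello world")

# GET retrieves the data, HEAD only its metadata, DELETE removes it.
print(requests.get(obj, headers=token).content)
print(requests.head(obj, headers=token).headers)
requests.delete(obj, headers=token)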
RGW RESTful Service and Web Front-End
RGW was initially developed as a FastCGI module that can be plugged into an Apache web server: it serves REST requests coming from web clients and, on the other end, uses ‘librados’ to talk to a Ceph storage cluster. The community, however, is now moving toward an approach in which RGW handles web clients itself through tight integration with the embedded CivetWeb server. The integrated approach makes the RGW service easier to configure and also saves on response time, as seen by the web client.
To act as an RGW server, a host must be a Ceph client of a Ceph storage cluster. The generic Ceph client packages (ceph-common and librados) are needed, and in addition the radosgw and radosgw-agent packages are required to become an RGW server. After installing these packages, ceph.conf needs to be updated accordingly. Keeping in mind the globally distributed design of today’s data centers, a deployment can have multiple RGW servers in different geographical regions and zones. An RGW region is a logical geographical region, and each region may have many zones. An RGW zone is a logical group of one or more RGW instances. Each region has a master zone; any data coming from a client is first written to the master zone and is then copied to the other zones in that region. An object may be read from all zones in a region, but can be written only to the master zone. The radosgw-agent is responsible for inter-zone and inter-region metadata synchronization. By default, each RGW region and zone uses a set of Ceph pools to store its data, although RGW can be configured to use different pools as well. By default, each region will have the following Ceph pools:
- .RegionName-ZoneName.rgw
- .RegionName-ZoneName.rgw.control
- .RegionName-ZoneName.rgw.gc
- .RegionName-ZoneName.log
- .RegionName-ZoneName.intent-log
- .RegionName-ZoneName.usage
- .RegionName-ZoneName.users
- .RegionName-ZoneName.users.email
- .RegionName-ZoneName.users.swift
- .RegionName-ZoneName.users.uid
All these pools are used by RGW instances to keep region and zone information, along with user information and metadata. The pools <pool_prefix>.rgw.buckets and <pool_prefix>.rgw.buckets.index are the default data placement pool and the bucket index pool, respectively. Different pool names can be used as well, but it is always good practice to keep the suffixes of these pools the same.
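One way to confirm which pools RGW has created is to list the cluster’s pools through the rados Python binding. This is a minimal sketch: it assumes the python-rados package is installed and that /etc/ceph/ceph.conf and an admin keyring are readable on the node where it runs.

import rados

# Connect to the cluster as client.admin using the local ceph.conf.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Print every pool whose name looks like an RGW pool.
for pool in cluster.list_pools():
    if 'rgw' in pool or pool.startswith('.'):
        print(pool)

cluster.shutdown()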
RGW User and Data Management
In a file-based access mechanism, data is viewed in terms of files arranged in directories, where a file and directory belong to a user or a user group. Similarly, in HTTP-based object storage systems, data is stored in Web objects (which can be equated with a file plus its associated metadata) arranged in containers (OpenStack Swift terminology) or buckets (Amazon S3 terminology). The user management and access control mechanism is provided by an external or integrated authentication system.
In OpenStack Swift parlance, each Swift object belongs to a container. A container is associated with some account. A user should be registered with the account to access the container and objects in that account. A user has certain assigned roles and has some, all or no access to objects and containers. Each user is identified in the system by a unique account and username combination. Each such user has a password. A user can either use account:username and password every time and let the authentication system generate a token for it, or it can generate a token once and use this token in all subsequent requests. A token will expire after a certain timeframe.
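To make the token exchange described above concrete, here is a small Python sketch against the Swift-compatible auth endpoint that radosgw exposes. The user, secret key and URL are placeholders borrowed from the deployment example later in this article, so treat them as assumptions.

import requests

# Request a token from the Swift v1 auth endpoint served by radosgw.
resp = requests.get(
    "http://rgw1:8080/auth",
    headers={
        "X-Auth-User": "user1:swift1",            # account:username
        "X-Auth-Key": "the-subuser-secret-key",   # placeholder secret key
    },
)

# The response carries a storage URL and a token to reuse on subsequent requests.
storage_url = resp.headers["X-Storage-Url"]
token = resp.headers["X-Auth-Token"]
print(storage_url, token)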
In Amazon S3 parlance, each S3 object is stored in a bucket. A user has one or more pairs of access_key and secret_key. A user may also generate a temporary token using its access_key and secret_key. Each user will have some or all the access permissions on buckets and objects.
RGW user management is a mix of both approaches, to accommodate both APIs. A user is first created that serves as an S3 user and has one access_key and secret_key pair; more key pairs can be added to that user. Each user may have many subusers. The user:subuser combination can be used as the account:user combination that the Swift API understands. A subuser is generated with its own secret key, and tokens can also be generated for it. Each subuser may have different access rights, such as read-only, write, read-write and full-control. The radosgw-admin utility provides a way to manage users, subusers and keys.
Setting Up and Testing a Small radosgw Deployment
The steps given here are for illustration purposes only, so it would be wise to go through the official documentation for any real deployment. We’ve already discussed deploying a Ceph storage cluster in the previous article. For this example, we will assume that you already have a running Ceph storage cluster with the following topology:
- Three monitors: mon-1, mon-2 and mon-3.
- Three OSD nodes: osd-node-1, osd-node-2 and osd-node-3.
- One Admin-node: admin-node.
Install
We will add a new node named rgw1. This node needs the Ceph client-side packages, namely ceph-common, librados, radosgw and radosgw-agent. You may use ceph-deploy or apt-get to install these packages on the client.
Key Management
Generate cephx keys and add them so that node rgw1 can access the Ceph storage cluster. Execute the following commands from any monitor node:
sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.rgw1.keyring
sudo chmod +r /etc/ceph/ceph.client.rgw1.keyring
sudo ceph-authtool /etc/ceph/ceph.client.rgw1.keyring -n client.rgw1 --gen-key
sudo ceph-authtool -n client.rgw1 --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.rgw1.keyring
sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.rgw1 -i /etc/ceph/ceph.client.rgw1.keyring
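Before going further, it can be handy to verify that the new client.rgw1 identity can actually reach the cluster. The following is a small sketch using the rados Python binding; it assumes python-rados is installed on rgw1 and that the keyring and ceph.conf have been copied to the paths used above.

import rados

# Connect as client.rgw1 using the keyring generated above.
cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    name='client.rgw1',
    conf={'keyring': '/etc/ceph/ceph.client.rgw1.keyring'},
)
cluster.connect()
print("connected to cluster", cluster.get_fsid())
cluster.shutdown()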
Configure RGW Node
After the Ceph storage cluster deployment, admin-node must have a generated ceph.conf file. Add the following section to configure the new radosgw:
[client.rgw1]
host = rgw1
rgw_socket_path = /tmp/rgw1.sock
rgw_dns_name = rgw1
log_file = /var/log/radosgw/rgw1.log
rgw_frontends = "civetweb port=8080"
After making any changes in ceph.conf, it should be copied to all the nodes in the cluster. This can be easily achieved using ceph-deploy as follows:
ceph-deploy --overwrite-conf admin mon-1 mon-2 mon-3 osd-node-1 osd-node-2 osd-node-3 rgw1
Start The Service
Make sure you set your firewall accordingly; the port mentioned in ceph.conf should be reachable by clients.
rgw1# sudo /etc/init.d/radosgw start
Check the logs at /var/log/radosgw/rgw1.log to see whether it’s been started properly.
Create User/Subuser & Keys
sudo radosgw-admin user create --uid=user1 --display-name="user1" --email=user1@example.com
sudo radosgw-admin subuser create --uid=user1 --subuser=user1:swift1 --access=full
sudo radosgw-admin key create --subuser=user1:swift1 --key-type=swift --gen-secret
After executing these commands, you should see output similar to the following:
{ "user_id": "user1",
  "display_name": "user1",
  "email": "user1@example.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
        { "id": "user1:swift1",
          "permissions": "full-control"}],
  "keys": [
        { "user": "user1:swift1",
          "access_key": "0H7I8DW8Q20C5HDQTV0O",
          "secret_key": ""},
        { "user": "user1",
          "access_key": "RTR9BKRLB9DRCDN5R6DE",
          "secret_key": "JzjRmEgPepMVvG7dqea7OXmxJzW06t7j9cM95Qg7"}],
  "swift_keys": [
        { "user": "user1:swift1",
          "secret_key": "asdJdfjsdhkjfhG7dqea7OXmxJzW06t7j9cM95Qg7yw5"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}
Here, the S3 user user1 has:
access_key = 'RTR9BKRLB9DRCDN5R6DE'
and:
secret_key = 'JzjRmEgPepMVvG7dqea7OXmxJzW06t7j9cM95Qg7'
The Swift user user1:swift1 has:
password (secret_key) = "asdJdfjsdhkjfhG7dqea7OXmxJzW06t7j9cM95Qg7yw5"
Testing The Swift User
The Python Swift client can be installed on any machine that can reach the radosgw host rgw1. The following steps describe how to use the user1:swift1 credentials to create and access Swift objects served by the Ceph storage cluster and radosgw:
any-node# swift -A http://rgw1:8080/auth -U user1:swift1 -K secret-key-generated list
any-node# swift -A http://rgw1:8080/auth -U user1:swift1 -K secret-key-generated upload mycontainer a-file-in-localdir
any-node# swift -A http://rgw1:8080/auth -U user1:swift1 -K secret-key-generated list
any-node# swift -A http://rgw1:8080/auth -U user1:swift1 -K secret-key-generated download mycontainer any-object-name
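The same operations can also be scripted with the python-swiftclient library instead of the CLI. This is a minimal sketch: the container and file names are arbitrary, and secret-key-generated stands for the swift_keys secret printed by radosgw-admin above.

from swiftclient.client import Connection

conn = Connection(
    authurl="http://rgw1:8080/auth",
    user="user1:swift1",
    key="secret-key-generated",    # the subuser's Swift secret key
)

conn.put_container("mycontainer")
conn.put_object("mycontainer", "hello.txt", contents=b"hello from radosgw")

# List containers and objects, then read the object back.
print(conn.get_account()[1])                    # list of containers
print(conn.get_container("mycontainer")[1])     # list of objects
print(conn.get_object("mycontainer", "hello.txt")[1])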
Similarly, you may use s3curl, s3cmd or any other S3 client to perform S3 object operations. The complete list of supported Swift and S3 API operations is available at ceph.com.
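As one such option, here is a hedged sketch using the boto3 library. The endpoint and bucket name are assumptions for illustration; the access and secret keys are the S3 pair shown above for user1.

import boto3

# Point the S3 client at radosgw instead of AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw1:8080",
    aws_access_key_id="RTR9BKRLB9DRCDN5R6DE",
    aws_secret_access_key="JzjRmEgPepMVvG7dqea7OXmxJzW06t7j9cM95Qg7",
)

s3.create_bucket(Bucket="mybucket")
s3.put_object(Bucket="mybucket", Key="hello.txt", Body=b"hello from radosgw")
print(s3.get_object(Bucket="mybucket", Key="hello.txt")["Body"].read())
print([b["Name"] for b in s3.list_buckets()["Buckets"]])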
Pushpesh Sharma is currently working as a senior test development engineer at SanDisk India Device Design Center in Bangalore. He has over six years experience in evaluating cloud, virtualization and storage technologies. He holds a Bachelors degree in Engineering (Information Technology) from the Government Engineering College Kota (Raj.), India. He also holds a Certificate in Marketing and HRM from SJMSOM, IIT, Bombay. In his free time he likes to read (anything, mostly) and listen to good music, and he enjoys good food and wine.
Feature image via Flickr Creative Commons.