iRobot Confronts the Challenges of Running Serverless at Scale

Apr 12th, 2017 1:00am by

What are the challenges of running serverless at scale? Imagine you are running an international consumer robot company on serverless. The majority of the robot units are in homes outside of the United States. You need to secure your robots when they connect to the cloud. You need to be able to identify which robots need firmware updates and send those updates out in batches. And you want to start thinking about how your fleet of 15 million home robots can start talking with other home automation products or share data to the internet in new ways.

Is serverless mature enough to operate at this scale?

This is the challenge facing Ben Kehoe, cloud robotics research scientist at iRobot. Kehoe is pretty much all in on serverless, as his talks at previous Serverlessconf events in Brooklyn and London attest. He will be speaking again at the next Serverlessconf, being held in Austin on April 26-28. This time he will focus on what’s missing from serverless providers.

At Serverlessconf, Kehoe has shared several updates on his journey using serverless at iRobot. While a keen advocate and believer, he is also more than willing to discuss in his Medium blogs what the serverless community needs to focus on next in order to mature the tooling.

Managing a Global Robot Fleet in a Serverless Architecture

iRobot provides a range of vacuum cleaner robots and has branched out into adjacent robotic products, like mops, gutter-cleaners and the like. Kehoe reports that there are some 15 million iRobot models now in the market being used globally, with only 40 percent of them in the United States. The newest Roombas are internet connected and come with an app so that consumers can control their iRobot via their smartphone.

For now, Kehoe is focusing on the iRobot Roomba 900 series and encouraging that to be a true Internet of Things product. The other Roombas and iRobot products are internet connected, most recently with an integration to Alexa, and Kehoe points out there is also a Create robot, which is a “Roomba for hackers.”

“We see in the future the majority of our products will be internet connected,” Kehoe said, signaling a move not uncommon for smart appliance manufacturers that are transitioning from being a device company to a connected machine company.

To manage serverless at scale, Kehoe uses a range of AWS products, including the AWS IoT platform, the infrastructure-as-code service CloudFormation and the serverless functionality of AWS Lambda.

In using these services in combination, iRobot needs to meet the auditability requirements of a global enterprise, which is why, Kehoe said, he needs “Serverless Ops.” That led iRobot’s cloud development team to build software, called Cloudr, to manage its serverless microservices architecture.

“We create an application consisting of a set of microservices. It is made up of one stack per microservice,” explained Kehoe. “The developer defines a CloudFormation template in YAML. We inject a DynamoDB table for service discovery. All of the Lambdas have to talk to each other and to external services, and you may not know what they are going to be called at deployment time, you only know that at run time, so the name of this table gets injected into the lambdas that you deploy, so they can look up information there, after they are deployed,” he said.
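The runtime lookup Kehoe describes can be sketched roughly as follows. This is a minimal illustration, not iRobot's Cloudr code: the environment variable name, the `service` key and the `resource_name` attribute are all assumptions, since the article says only that the table name is injected into each Lambda. The lookup takes any object with a boto3-style `get_item(Key=...)` method, so it can be tried locally against an in-memory stand-in.

```python
import os

# Hypothetical variable name; the article only says the table name is
# injected into each Lambda at deployment time.
TABLE_ENV_VAR = "SERVICE_DISCOVERY_TABLE"


def get_discovery_table():
    """Return a DynamoDB Table handle for the injected table name."""
    import boto3  # imported lazily so the lookup logic runs without AWS

    return boto3.resource("dynamodb").Table(os.environ[TABLE_ENV_VAR])


def resolve(service_name, table):
    """Look up the deployed resource name for a logical service name.

    `table` needs only a boto3-style get_item(Key=...) method. The
    "service" key and "resource_name" attribute are assumed schema names.
    """
    response = table.get_item(Key={"service": service_name})
    item = response.get("Item")
    if item is None:
        raise LookupError(f"no discovery entry for {service_name!r}")
    return item["resource_name"]


class InMemoryTable:
    """Stand-in for a DynamoDB table, for local experimentation only."""

    def __init__(self, items):
        self._items = {item["service"]: item for item in items}

    def get_item(self, Key):
        item = self._items.get(Key["service"])
        return {"Item": item} if item is not None else {}
```

Inside a deployed function, a handler would call `resolve("some-service", get_discovery_table())` at run time, which is the point at which the real resource names are finally known.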

The company maintains a separate stack for the application as a whole. “That allows us to create the wiring of the application layer, and then those references go into the custom resource lambdas for the stacks that populate those DynamoDB tables and that information propagates back into the service,” he said. “This also allows us to do cross-service policies based on dependencies that you declare so that only the resources that need access to any given AWS resource are allowed it.”
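One way to picture dependency-driven policies is a function that turns declared dependencies into an IAM-style policy document. This is a hedged sketch of the general least-privilege idea, not the policies iRobot generates: the shape of the dependency declarations and the example ARN are assumptions made for illustration.

```python
def build_policy(dependencies):
    """Build an IAM-style policy document from declared dependencies.

    `dependencies` is a list of (actions, resource_arn) pairs, an assumed
    declaration format. Anything not declared is simply never granted.
    """
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": arn}
        for actions, arn in dependencies
    ]
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical declaration: this service reads one table and nothing else.
example_policy = build_policy(
    [
        (
            {"dynamodb:GetItem", "dynamodb:Query"},
            "arn:aws:dynamodb:us-east-1:123456789012:table/example-table",
        ),
    ]
)
```

Because the policy is derived from the declarations, a service that does not declare a dependency on a resource gets no statement for it.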

Updating Serverless Architecture at Scale

Current serverless offerings are most lacking when it comes to helping roll out firmware updates to a global fleet of iRobots.

“With IaaS and low-level PaaS, you have a lot of control, you can roll out behind the load balancer or create a new instance and direct traffic over,” explained Kehoe. “In a serverless environment, you can’t roll out behind the load balancer because we don’t get any control over that. You could deploy in place but then it is just a big lever that you pull and it either works or it doesn’t. So how do we host multiple versions simultaneously?”

Kehoe said it is possible but inelegant. Engineers can create multiple versions of the architecture on the API gateway and have separate APIs for each. But AWS doesn’t allow two APIs to use the same custom domain. “For AWS IoT it gets even worse,” said Kehoe. “There is one MQTT server per account in a region, although it is possible to have multiple accounts. But certificates of device identities can only be registered in one account in one region. That is a source of frustration for us.”
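To make the "multiple versions simultaneously" problem concrete, here is a generic sketch of percentage-based routing, in which a stable hash of the device ID decides which version a robot talks to. This is a common canary technique rather than anything the article attributes to iRobot, and the device ID format and version labels are hypothetical.

```python
import hashlib


def bucket(device_id, buckets=100):
    """Map a device ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_version(device_id, new_version_percent, old="v1", new="v2"):
    """Send a fixed percentage of devices to the new version.

    The same device always lands in the same bucket, so raising the
    percentage only ever moves devices from the old version to the new one.
    """
    return new if bucket(device_id) < new_version_percent else old
```

A router like this still needs somewhere to run, which is the gap Kehoe points to: with a managed gateway in front, the platform rather than the developer controls where traffic goes.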

Working around these shortcomings makes the serverless architecture increasingly complex. Even identifying which robots might need an update requires three service discovery components.

Complex architecture for upgrading robots in serverless

Today, one of the key missing pieces is integration testing.

“The Netflix Chaos Monkey can go in and muck up your system and show you what your system does when it is degraded. That becomes much less possible when you are serverless because the only control you have is over your Lambda functions,” Kehoe explained. He gave the example that if DynamoDB has difficulties writing data, the developers don’t know what the log messages will look like, so they can’t instrument for that risk scenario.

This is why iRobot came up with its own infrastructure destruction agent, the aptly named Monkeyless Chaos. “So you may be able to make the Lambda see errors when they are not happening, for example, when you see a DynamoDB entry, or you write to a Kinesis stream and the Lambda function forces it to take an extra five seconds. We are interested in that type of integration testing to verify the robustness of the system,” Kehoe explained.
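The kind of fault injection Kehoe describes, where code inside the function pretends a downstream call failed or ran slowly, might look something like the decorator below. This is a minimal sketch of the idea and not the Monkeyless Chaos implementation, which the article does not detail; the decorator name, its parameters and the wrapped function are all hypothetical.

```python
import functools
import random
import time


class InjectedFault(Exception):
    """Raised in place of a real downstream failure."""


def chaos(error_rate=0.0, delay_seconds=0.0, rng=random.random, sleep=time.sleep):
    """Wrap a downstream call so it can be slowed down or made to fail.

    `rng` and `sleep` are injectable so the behavior can be tested
    deterministically, without real randomness or real waiting.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if delay_seconds > 0:
                sleep(delay_seconds)  # simulate a slow dependency
            if rng() < error_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical downstream call: fails about 10% of the time and takes an
# extra five seconds, echoing the Kinesis example in the quote above.
@chaos(error_rate=0.1, delay_seconds=5.0)
def write_to_stream(record):
    return {"written": record}
```

In practice the rates and delays would come from configuration that is switched on only for test runs, so the same function code can be deployed with the faults disabled.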

Leveraging Serverless to Focus on Business Concerns

But this is still very much the start of the journey for iRobot, which is thinking several steps ahead about what an Internet of Things serverless architecture looks like at scale. While the company is building and testing serverless with the Roomba 980, company engineers are also thinking through what sort of additional connected products customers might want if their robots could do more than just talk to the app.

“What is our place in the smart home?” Kehoe asked. Roomba robots, for instance, build maps through their visual technology of the floor area space they are cleaning. “At the moment, we don’t expose that. There are other products that can build maps of the home, but none of them have the same level of understanding or the ease of use as a vacuum that you are already using. The first visual navigating robot in the home is a very unique space for us from which to build new smart home devices.”

Operating Serverless at Scale

“Serverless is all about computational functions, so IoT is a good match for that as well, so there is a lot of synergy and alignment there. But as we get more into the big data aspect of our products, we are not going to be on Lambda forever, at some point we will have big batch processing jobs. The serverless part of it will be primarily for the event-driven piece, while the batch driven piece will move to ECS and things like that,” Kehoe said.

One factor will be the per-cycle cost of compute, which is higher on Lambda. “So when you have a certain scale of it, it is not cost effective to force it into Lambda,” Kehoe said, adding that, for now, the costs are not prohibitive so they can do reading from SQS queues in Lambda rather than move to Elastic Beanstalk, for example.
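The trade-off can be framed as a simple break-even calculation. The sketch below assumes a simplified model in which Lambda is billed per request plus per GB-second of execution, while the alternative is a flat monthly cost for always-on capacity. All prices are parameters rather than real figures, and the model ignores free tiers, data transfer and operations effort.

```python
def lambda_monthly_cost(invocations, seconds_each, memory_gb,
                        price_per_gb_second, price_per_request):
    """Estimate monthly cost under a per-request plus per-GB-second model."""
    compute = invocations * seconds_each * memory_gb * price_per_gb_second
    return compute + invocations * price_per_request


def break_even_invocations(fixed_monthly_cost, seconds_each, memory_gb,
                           price_per_gb_second, price_per_request):
    """Invocations per month at which pay-per-use matches a flat monthly cost."""
    per_invocation = (seconds_each * memory_gb * price_per_gb_second
                      + price_per_request)
    return fixed_monthly_cost / per_invocation
```

Below the break-even volume, pay-per-use is cheaper under this model; above it, the flat-cost option wins, which matches Kehoe's point that large batch jobs eventually belong elsewhere.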

“With any technology, there are the places where it works and where it doesn’t, you are always trying to find a balance with how pragmatic the solution is to meet your goals,” Kehoe said.

Kehoe said the ability to focus on the business logic and scaling applications was a big part of the gain in starting out with a serverless architecture. “With 15 million robots in the field, however, we have the business scale, and we need the cloud scale to be ready for that,” said Kehoe.

Kehoe will be speaking later this month at Serverlessconf in Austin on what is missing from serverless providers.

Feature image by Chris Bartle, licensed under CC BY 2.0.
