Q&A with Microsoft’s Jeremy Winter: Automation is the New Visualization

12 May 2016 8:38am, by

We cover a lot of open source technologies that help organizations develop and deploy software at scale. But what is happening in the enterprise Microsoft shops? What does the Redmond giant offer to help its largest customers build scale-out cloud-based environments?

We sat down with Jeremy Winter, Microsoft’s partner director of program management for the Microsoft Operations Management Suite (OMS), to find out what this service offers for the cloud curious and the cloud native. Introduced in 2015, OMS is a collection of management tools that span a number of different product categories, including a workflow engine, log analysis, and backup and recovery capabilities, all offered as cloud services.

While OMS support for containers is still pretty nascent, Microsoft is clearly up to something ambitious here, hoping to establish OMS as a central hub for monitoring and managing any and all computational resources, not just Microsoft’s. It will do this through the tireless ingestion of log data through a variety of means, including Fluentd, the open source data collector. OMS can be used to analyze log data from any application or hardware, and, more importantly, automate operations that use log data conditions as triggers.

TNS: What can OMS do?

Winter: For the Microsoft Azure customer, we provide enterprise-grade management across all their Azure instances. You can deal with them on a per-VM basis, or you can manage them across the entire environment. So if I just want to back up one of those VM instances out there, I can back it up to the cloud. Or I can take a snapshot of a VM to make sure that I’ve got a historical view for recovery.

Automation in OMS

It’s the same process if I want to run an automation job that shuts down all the machines that are running idle during a certain period of the day or the evening.

OMS can help you think about how you’re dealing with your Windows and your Linux VMs sitting inside AWS as well. We can manage instances running on AWS. We can manage Ubuntu running on AWS. We can pull syslogs off of it. I can use AWS CloudTrail; I can look at AWS CloudWatch.

IT is going to be sitting with multiple clouds, multiple technologies. A lot of our existing customers have a big footprint of management already with Microsoft System Center capabilities. We support Linux and predominantly Windows. I need to help them as they move to whatever cloud technology and whatever operating systems they use with a central set of policy management and automation tools.

We’ve leveraged a lot of the technology coming from projects like Solr and Lucene, and we’ve been listening to what Linux-based customers want.

So that analytics can be used for free. We can ingest 500MB and keep it for seven days.

TNS: 500MB of what?

Winter: Of just raw data that’s ingested. So I can send in my logs, I can send syslogs, I can send custom logs.

TNS: What can it do with the logs?

Winter: We have a solution gallery of all sorts of very management specific things that an enterprise manager would need. So I need to see the dependencies of how my application sits. I might be moving a workload to the cloud, and I want to know what’s dependent on it. If I’m running it in a service, I’m going to want to know what are all the components it needs. How’s the network? How’s the storage connection?

So I may want to have it run every 15 minutes to look for a type of condition, maybe report on machines that aren’t patched. You may want to back up your VMs. These could be Azure VMs. It could back up things from your own premises, leveraging our cloud storage. You may want alerts, and may want your alerts sent to a pager. We use webhooks to hook into whatever system you use.
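As a rough illustration of the scheduled check and webhook alert described above, here is a minimal Python sketch. The machine records, the `missing_updates` field and the alert payload shape are all hypothetical and are not the OMS data model or webhook format. Scheduling and the HTTP call are left out, so the sketch only selects unpatched machines and builds the message body.

```python
import json


def unpatched(machines):
    """Return the sorted names of machines that still have updates pending."""
    return sorted(m["name"] for m in machines if m["missing_updates"] > 0)


def build_alert(names):
    """Build a JSON body a webhook receiver could accept (the shape is made up)."""
    return json.dumps(
        {"alert": "unpatched-machines", "count": len(names), "machines": names}
    )


# Hypothetical inventory; a real check would read this from collected log data.
inventory = [
    {"name": "web-01", "missing_updates": 0},
    {"name": "db-01", "missing_updates": 3},
    {"name": "app-02", "missing_updates": 1},
]

names = unpatched(inventory)
if names:
    # A real job would POST this body to the configured webhook URL.
    print(build_alert(names))
```

A scheduler would run this check at the chosen interval, such as the 15 minutes mentioned above, and send the body only when the list is non-empty.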

The OMS Graphical Runbook

The customer gets a dashboard of kind of the core paradigms that they care about in their environment. So if I wanted to see, for example, my system updates, I just choose system updates. It now gives me a view of the status of where I’m at on all my system updates.

TNS: So I could use this for AWS, for Azure, or for servers running on virtual machines or my bare metal servers, as long as I’m shipping the logs over to OMS.

Winter: That’s right, on your own premises. If you have Red Hat Linux machines running on an OpenStack infrastructure, and you care about all those instances, you can pull them right in.

That’s a bit of that mind twist for people. You can import your own custom data, create your own custom fields. So we have customers on the manufacturing or retail side that have devices out there in the environment. Instead of having them route log data to their on-premises monitoring systems, they’re now having those things shipped to the OMS, and now they get to see the overall view of how those systems are performing.

We just GA’d the custom log work last week, and you can now bring in custom logs from whatever you define. You can create custom fields that you can then query and search on with our search language, and then you can create these easy-to-use solutions. So you can have a dashboard that asks, “How’s my manufacturing device running?” And you can have that view.

Your traditional enterprise IT needs to know these aspects, and even your new DevOps world needs to know this. It’s what I call enterprise-grade management.

OMS is already in the cloud. So if someone’s like, “I don’t want this old infrastructure. I don’t want to have to continue to think about all this hardware and gear and pay for people to manage the management solutions. I just want to leverage the cloud.” And we do that with just a normal SaaS solution.

So in your traditional Microsoft way, you’d think, “Hey, you’re going to give me Windows or System Center 2016 and it’s a few years after you delivered on System Center 2012 and I’m used to these every-three-year cadences.” But we’re real-time here. We treat it as a startup. We work directly with customers, and when we feel the feature is ready, we release, so it’s real-time.

Customers will say, “Hey, I need a log search API, and I want it to talk to an API.” Forty-five days later, we delivered it. In one six-month span, we did over 350 deployments to the service that brought in new capabilities or tweaks to features. These are ready-made solutions or ready-made intelligence centered around management things.

TNS: The system updates are OS updates, or are they app updates?

Winter: Right now, we’re focused on the OS updates because that’s where we hear the most pain from customers. A lot of times with the apps, users have their own solutions that they’re using: Puppet, Chef, Ansible.

We also have security. This is a big area that we’re starting to see. Security is more than just a security team’s problem now; it’s everyone’s problem. You may be the developer managing your own system out on one of these clouds, and you still need to see where you are at with your security, your patches, what’s changed.

I can query right into my critical updates. OMS gives you a visual perspective of what’s going on, but underneath it is a full-on analytics stack. You have the search capabilities that you’d expect out of any analytics system, whether you’re using ELK, Splunk, Loggly, or SumoLogic.

The OMS dashboard

TNS: What’s that query language?

Winter: This is a derivative that we’ve kind of mashed up from PowerShell and Solr. So when Lucene and Solr start to support say, Joins, we do a roll out in our environment and the Joins show up.

I want to be able to have some simplified approach, and we’ve seen that it makes it a hell of a lot easier for customers because if they want to add a new data type, they can just add it in. “All right, I want to go look at the security events and what the event ID is.” I don’t have to go think about the query. I just choose it and it shows up. But you don’t have to go learn the query language out of the gate. You still have a visualization view.

TNS: Excellent. Yeah. It’s not like we’re not interested in visualization, but we’re hearing a lot about automation. Now can you take any actions based on this log data coming in?

Winter: Yeah. This is something that’s very powerful compared to a lot of the other systems out there. We have automation behind the scenes.

Say an IT shop has a malicious IP in its network, an IP with a botnet in the environment. I can take my wire data, things that are going over the network. We can identify it and tell them.

So if someone, for example, wants to schedule VMs to shut down, they can build it right here. It’s not just, “Hey, I’m going to go stop machines.” You need to actually connect up to Azure, look at what VMs you’re going to touch, look at the logic and the health around those, and then start to turn them off or do other specific actions.
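The steps Winter lists (decide which VMs to touch, check the logic and health around them, then act) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VM` record, the CPU threshold, the quiet-hours window and the choice to skip unhealthy machines are hypothetical, and `shut_down` is a stand-in that flips a flag instead of calling any Azure or OMS API.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class VM:
    name: str
    cpu_percent: float  # recent average CPU utilization (hypothetical metric)
    healthy: bool       # outcome of whatever health check applies
    running: bool = True


def in_quiet_window(now: time, start: time, end: time) -> bool:
    """True if `now` is inside the window, including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def select_idle_vms(vms, now, start=time(20, 0), end=time(6, 0), cpu_threshold=5.0):
    """Pick running, healthy VMs below the CPU threshold during the quiet window."""
    if not in_quiet_window(now, start, end):
        return []
    return [
        vm for vm in vms
        if vm.running and vm.healthy and vm.cpu_percent < cpu_threshold
    ]


def shut_down(vms):
    """Stand-in for a real stop call; here it only flips the running flag."""
    for vm in vms:
        vm.running = False
    return [vm.name for vm in vms]


fleet = [
    VM("web-01", cpu_percent=2.0, healthy=True),
    VM("web-02", cpu_percent=40.0, healthy=True),
    VM("db-01", cpu_percent=1.0, healthy=False),  # left alone for investigation
]
stopped = shut_down(select_idle_vms(fleet, now=time(23, 30)))
print(stopped)
```

Keeping selection separate from the stop action means the selection logic can be reviewed or dry-run before anything is actually turned off.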

We have seen, just like you, a huge spike in the number of jobs that are being created and the number of minutes that they’re now running. They’ve doubled last year and doubled again in the last three months.

Now not only do we have automation, but we have this capability of enforcing configuration. Through PowerShell or whatever scripting language you want, you can identify the policy or the configuration you want and we can enforce it. From an IT perspective, a developer or admin may want to define a gold master, and then tell if any of the deployments have drifted from this.

You can tell which ones are compliant, and which ones are running non-compliant. We can even drill down to the level that says, “All right, why is it not compliant?”

And as I look at why it’s not compliant, I can see it hasn’t been compliant for a while. Let me see what’s going on here. As you drill down in further, it can show that this one file is the one that’s not compliant. You can choose to set the policy that within 15 minutes, the system can take that file and enforce it back.

TNS: So the compliant file is reinstalled?

Winter: That’s right. So as we start to see updates and changes in the environment, we’ll see change tracking, automation and configuration policy all come together, which really helps provide a consistent approach of policy across those workloads that move to the cloud.

Again, underneath, it’s all powered by this data being merged into this really powerful analytic system.
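The gold-master comparison and enforcement described in this exchange can be sketched as follows. This is a minimal Python illustration, not how OMS implements it: the in-memory dictionaries standing in for file systems, the file names and the use of SHA-256 digests to detect drift are all assumptions made for the example.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint file content so two copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def find_drift(gold: dict, deployed: dict) -> list:
    """Return sorted paths whose deployed content is missing or differs from gold."""
    return sorted(
        path for path, content in gold.items()
        if path not in deployed or digest(deployed[path]) != digest(content)
    )


def enforce(gold: dict, deployed: dict) -> list:
    """Overwrite drifted files with the gold copy and return what was fixed."""
    drifted = find_drift(gold, deployed)
    for path in drifted:
        deployed[path] = gold[path]
    return drifted


# Hypothetical gold master and one server that has drifted from it.
gold_master = {"app.conf": b"mode=prod\n", "hosts": b"127.0.0.1 localhost\n"}
server = {"app.conf": b"mode=debug\n", "hosts": b"127.0.0.1 localhost\n"}

print(find_drift(gold_master, server))  # which file is non-compliant
print(enforce(gold_master, server))     # put the gold copy back
print(find_drift(gold_master, server))  # empty once the server is compliant
```

Running the check on a timer and calling `enforce` only when a policy allows it would correspond to the "within 15 minutes" option Winter mentions.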

TNS: When OMS runs an automated job, where is it run from?

Winter: You run it from the cloud. This is a cloud-based solution. When I have jobs, I can choose where I want each job to execute. We have a notion of a hybrid worker, which means that if I want to run a job on-premises, I can bring the hybrid worker down there and still control and see the status of all my jobs, but the job is executed on-premises. So you’re driving the automation from the cloud.

At the heart of what we’ve been building has always been ensuring that this can run on any cloud as well as on-premises. Because the things I’m hearing from IT are, “Man, we’re in a complex world. We’ve got a lot of different tools. I’m trying to get visibility across all those tools and I’m now starting to do heavy development or heavy migrations into these different cloud infrastructures. I just need an easier tool to be able to do it.” So this is what we’ve been building.

So when customers start to head to the cloud, I want to make sure we have an enterprise-proven system that is there for them. It’s there for Azure customers, but also for those outside of Azure as well. So it really is bringing all your data into one place.

TNS: What about containers?

Winter: So we’ve been watching the container space. I have a team that’s incubating on containers. We’ve spent some time with Docker. We’ve spent some time with Mesos and Mesosphere. What we hear from the IT side is that they’re relying on other teams to do the builds. Oftentimes, they’re even allowing the developer to deploy apps directly.

But often, IT may either be doing those deployments or they just need to understand the insights to what’s happening out in that microservices world. Whether they’re long lived or short lived, they still need to know how many of those containers are out there, what’s happening inside those containers, what the logs inside say.

We’re off working with a set of customers. What we’re trying to do is make sure we can use the Docker Swarm APIs and others to pull data, just like we do with custom logs, to see how many containers you have in your environment, which hosts those containers are running on, and how those containers are running. So, how many are running, how many are stopped, how many have failed?
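The questions Winter closes with (how many containers, on which hosts, in what state) amount to a simple aggregation once the data has been pulled. The Python sketch below assumes hypothetical container records with `name`, `host` and `state` fields; it does not call the Docker Swarm API or reflect any OMS schema.

```python
from collections import Counter, defaultdict


def summarize(containers):
    """Count containers by state and group container names by host."""
    by_state = Counter(c["state"] for c in containers)
    by_host = defaultdict(list)
    for c in containers:
        by_host[c["host"]].append(c["name"])
    return by_state, dict(by_host)


# Hypothetical records, standing in for data pulled from a container API.
records = [
    {"name": "web-1", "host": "node-a", "state": "running"},
    {"name": "web-2", "host": "node-b", "state": "running"},
    {"name": "job-7", "host": "node-a", "state": "stopped"},
    {"name": "job-8", "host": "node-b", "state": "failed"},
]

by_state, by_host = summarize(records)
for state in ("running", "stopped", "failed"):
    print(state, by_state[state])
for host, names in sorted(by_host.items()):
    print(host, names)
```

Because `Counter` returns zero for missing keys, a state with no containers still reports a count instead of raising an error.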

TNS: So, when it is completed, what could this container service offer developers?

Winter: Well, the way we look at it is that, first, we can guarantee that developers have the freedom to keep doing what they need with containers while IT has the visibility and the insights that they need. That’s one.

Number two, it gives you the historical reporting of what’s been going on in that environment because ultimately you may need it for compliance or security reasons. So you have that historical ingestion of the data.

The third advantage is that you can choose to validate the policy that the IT defines. You may have policy that is set by the container solution of choice. You now also have a way to enforce that policy.

Containers have their own automation, but if I need to start doing more orchestrated automation outside of those containers, I can start to leverage that as well.

Containers are early, but they’re really taking off. Right now our approach in OMS is to allow people to have the flexibility to keep using the tools that they really want and to help IT start to get a better handle on those multi-cloud and multi-tool environments.

TNS: So in a sense OMS is kind of agnostic to new technologies. As long as you’ve got the logs coming in, you can use them for the automation and the reporting mechanisms.

Winter: Yeah. That custom solution is a key piece because we also want to give customers the flexibility to build what they need, and this is where you should be able to do so. So you can save searches and create these custom views, and this is what we see a lot of.

This sets the stage for something down the road as we leverage Fluentd and the community around Fluentd. Fluentd is a new model that the Linux community has driven us towards as we think about how to surface data. This opens the door for many connections: the many systems that already support Fluentd can also start to flow through OMS very easily.

The thing with the open source community is that it’s a market of patterns. If I choose a certain technology, there’s a pattern of other technologies around it. One year, it may be one technology, and four years later, it may be another.

So we’ve watched this pattern, and we are experimenting with the Fluentd pattern right now, and we’re pretty positive on it. We think that they’ve got those that have really been pushing to have something, and so we’re watching it. We’re participating with it and yeah, that’s the approach we’re taking. If we see that the community shifts to a new pattern, we’ll shift to a new pattern.

Translation: Mara Kruk
