A Closer Look into Google Stackdriver
Last month at the GCP Next conference, Google announced the public beta of Stackdriver, its cloud monitoring and logging service. It is designed to be a hybrid monitoring service spanning both Amazon Web Services and Google Cloud Platform.
Stackdriver was founded in 2013 in Boston by former VMware employees Dan Belcher and Izzy Azeri. It was created as a unified monitoring engine for AWS and Rackspace. In a short time, Stackdriver managed to acquire customers from both the enterprise market and managed service providers. The service, which had a slick user interface, was known for its integration with a variety of workloads.
After launching Compute Engine in 2012, Google moved fast to add the infrastructure services required by ops teams. To add monitoring capabilities to its cloud platform, Google acquired Stackdriver in May 2014. A year later, it surfaced as a preview of the Google Cloud Monitoring service for Compute Engine, App Engine, Cloud Pub/Sub, and Cloud SQL. As expected, Google conveniently dropped support for AWS. Like most GCP services, Cloud Monitoring had its own set of APIs.
Instead of announcing the general availability of the service, Google surprised everyone with the Stackdriver announcement at GCP Next. The service regained not only its original name but also its support for AWS. With enterprises adopting multi-cloud strategies, Google realized the need for cross-cloud support in its monitoring service. The product marketing teams at Google wanted to emphasize the cross-cloud capability, so they retained the original name, Stackdriver, instead of Google Cloud Monitoring, a name closely aligned with GCP branding. Though support for Rackspace cloud has been dropped, the speakers at GCP Next mentioned that it’s possible to monitor infrastructure deployed in other clouds.
During the last two years, Google added multiple DevOps-related capabilities to its cloud platform. Services like Cloud Logging, Cloud Trace, and Cloud Debugger were integrated with App Engine to enable developers and DevOps teams to analyze and troubleshoot deployments. With the announcement of Stackdriver, Google consolidated the monitoring, logging, tracing, and debugging services under it. Google Stackdriver has now become a one-stop shop for all the DevOps services on Google Cloud Platform.
Let’s take a closer look at the features of Stackdriver.
Stackdriver Monitoring is one of the key features of the service. Customers can create dashboards to track various metrics related to the VMs and the services deployed with them. It also supports alerts that are delivered via email, SMS, PagerDuty, and HipChat. The service can also track metrics from other GCP services such as Pub/Sub and Cloud SQL.
Though Stackdriver supports default metrics such as CPU utilization, disk I/O, memory utilization, network traffic, and uptime, it needs an agent to be installed on each VM for monitoring workload-specific metrics. Any VM deployed on GCP or AWS with outbound Internet access can report the metrics via the agent. There are additional plugins for monitoring common open source servers such as MongoDB, Apache, Nginx, Elasticsearch, and more.
The service also supports custom metrics that can be integrated with the dashboards and alerts. Applications can send custom metrics through the Stackdriver API.
Stackdriver Monitoring is based on collectd, an open source daemon that collects system performance metrics.
Though not as mature as Amazon CloudWatch, Stackdriver Monitoring offers the core capabilities required for monitoring VMs and GCP resources.
Its support for custom metrics enables customers to configure advanced scenarios.
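As a rough illustration of what sending a custom metric involves, the Python sketch below builds the JSON body for a single data point. It assumes the time series layout of the v3 Monitoring API (the `projects.timeSeries.create` method); the metric name `shop/queue_depth`, the instance ID, and the zone are made-up placeholders, and authentication and the HTTP call itself are left out. The API reference remains the authority on the exact fields.

```python
import json
from datetime import datetime, timezone


def build_custom_metric_body(metric_name, value, instance_id, zone, when=None):
    """Build a request body that writes one data point of a custom metric."""
    # Default to "now"; always express the timestamp in UTC.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timeSeries": [
            {
                # Custom metric types live under the custom.googleapis.com prefix.
                "metric": {"type": "custom.googleapis.com/" + metric_name},
                # The monitored resource the data point belongs to (a GCE VM here).
                "resource": {
                    "type": "gce_instance",
                    "labels": {"instance_id": instance_id, "zone": zone},
                },
                "points": [
                    {
                        "interval": {"endTime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
                        "value": {"doubleValue": float(value)},
                    }
                ],
            }
        ]
    }


# Hypothetical values, for illustration only.
example = build_custom_metric_body("shop/queue_depth", 42, "1234567890", "us-central1-a")
print(json.dumps(example, indent=2))
```

Once a metric like this is being written, it can be charted on a dashboard or used as the condition of an alert in the same way as the built-in metrics.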
Stackdriver Logging is designed to collect and store logs from workloads deployed in both Google Cloud Platform and Amazon Web Services. The service can gather logs from Compute Engine, App Engine, EC2, and Google Cloud Audit Logs.
Based on Fluentd, Stackdriver Logging relies on the google-fluentd logging agent installed on each VM. The logs streamed from multiple sources can be viewed in the Stackdriver Log Viewer. Applications can also use the Logging API to send logs programmatically.
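For applications that write logs through the API rather than the agent, a request might be shaped roughly as follows. This is a minimal sketch that assumes the `entries.write` body layout of the v2 Logging API; the project ID, log ID, and payload are hypothetical, and authentication and the HTTP call are omitted.

```python
import json


def build_log_write_body(project_id, log_id, payload, severity="INFO"):
    """Build a request body carrying one structured log entry."""
    return {
        "entries": [
            {
                # Fully qualified name of the log that receives the entry.
                "logName": "projects/{}/logs/{}".format(project_id, log_id),
                # "global" is used when the entry is not tied to a specific resource.
                "resource": {"type": "global"},
                "severity": severity,
                # Structured payload; it can be filtered on field by field later.
                "jsonPayload": payload,
            }
        ]
    }


# Hypothetical project, log, and payload.
example = build_log_write_body(
    "my-project", "checkout", {"order_id": "A-1001", "status": "failed"}, "ERROR"
)
print(json.dumps(example, indent=2))
```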
One of the powerful features of Stackdriver Logging is exporting logs to external services: a Google Cloud Storage bucket, a Google BigQuery dataset, a Google Cloud Pub/Sub topic, or any combination of the three. This enables long-term retention of logs by moving them to an inexpensive Google Cloud Storage bucket. Logs exported to BigQuery can be searched and analyzed. Cloud Pub/Sub can be used to export logs to third-party services or a REST endpoint.
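Each export is configured as a sink that names its destination. The helper below is a small sketch of the three destination strings, assuming the formats used by the v2 Logging API sinks; the function name, bucket, dataset, and topic names are invented for illustration.

```python
# Destination formats for the three export targets (assumed from the v2 sink API).
DESTINATION_TEMPLATES = {
    "storage": "storage.googleapis.com/{name}",
    "bigquery": "bigquery.googleapis.com/projects/{project}/datasets/{name}",
    "pubsub": "pubsub.googleapis.com/projects/{project}/topics/{name}",
}


def sink_destination(kind, name, project=None):
    """Return the destination string for a log export sink."""
    if kind not in DESTINATION_TEMPLATES:
        raise ValueError("unknown destination kind: {}".format(kind))
    # Bucket names are global; datasets and topics are scoped to a project.
    if kind != "storage" and project is None:
        raise ValueError("project is required for {} destinations".format(kind))
    return DESTINATION_TEMPLATES[kind].format(project=project, name=name)


# Hypothetical names: one sink per target gives "any combination" of the three.
for kind, name in [("storage", "log-archive"), ("bigquery", "app_logs"), ("pubsub", "log-stream")]:
    print(sink_destination(kind, name, project="my-project"))
```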
Stackdriver Error Reporting is a service that aggregates, stores and displays errors in a central location. It can show time charts, occurrences, affected user counts, first and last seen dates, and cleaned exception stack traces. The service supports applications deployed in App Engine and Compute Engine.
Customers can configure the service to send emails when a new error occurs. Errors can also be retrieved via a REST API.
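To make the aggregation idea concrete, the sketch below groups error events and tracks occurrences, affected users, and first and last seen times. It is purely illustrative: the grouping key (exception type plus source location) and the event fields are assumptions made here, not a description of how Stackdriver groups errors internally.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorGroup:
    occurrences: int = 0
    first_seen: int = 0
    last_seen: int = 0
    users: set = field(default_factory=set)


def aggregate_errors(events):
    """Group error events by (exception, location) and summarize each group."""
    groups = {}
    for event in events:
        key = (event["exception"], event["location"])
        group = groups.get(key)
        if group is None:
            # First time this error is seen: start a new group.
            group = groups[key] = ErrorGroup(
                first_seen=event["time"], last_seen=event["time"]
            )
        group.occurrences += 1
        group.first_seen = min(group.first_seen, event["time"])
        group.last_seen = max(group.last_seen, event["time"])
        group.users.add(event["user"])
    return groups


# Hypothetical events; "time" is seconds since some reference point.
events = [
    {"exception": "KeyError", "location": "cart.py:42", "time": 10, "user": "u1"},
    {"exception": "KeyError", "location": "cart.py:42", "time": 25, "user": "u2"},
    {"exception": "ValueError", "location": "pay.py:7", "time": 18, "user": "u1"},
]
for key, group in aggregate_errors(events).items():
    print(key, group.occurrences, len(group.users), group.first_seen, group.last_seen)
```

A notification for new errors then amounts to reacting whenever a key appears that has not been seen before.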
Formerly known as Cloud Debugger, Stackdriver Debugger lets developers inspect the state of a Java, Python, or Go application deployed in App Engine or Compute Engine. Google claims that the service doesn’t interfere with the performance of the application. The state of an application can be viewed without adding logging statements explicitly.
The service comes with a catch: it works only with applications whose source code is stored in Google Cloud Source Repository, GitHub, or Bitbucket. Developers need to configure a local git repo that’s connected to the cloud repository.
Developers integrating their applications with Stackdriver need to install the agent for the supported language. The agent can capture and inspect the call stack and local variables in the application.
The local variables and call stack at a specific location in the source code can be captured as a snapshot, which applies to all the running instances of the application. The snapshots can be shared with other team members through a URL.
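The idea of a snapshot can be approximated in plain Python with the standard `inspect` module. The sketch below is only an analogy: the real debugger agent attaches snapshot locations from outside the code, whereas this hypothetical `take_snapshot` helper has to be called explicitly, and `checkout` is an invented example function.

```python
import inspect


def take_snapshot():
    """Capture the caller's local variables and call stack without pausing it."""
    caller = inspect.currentframe().f_back
    try:
        stack = []
        frame = caller
        # Walk outward from the caller to the top of the call stack.
        while frame is not None:
            stack.append("{}:{}".format(frame.f_code.co_name, frame.f_lineno))
            frame = frame.f_back
        # Store reprs so the snapshot stays a plain, shareable data structure.
        local_vars = {name: repr(value) for name, value in caller.f_locals.items()}
        return {"locals": local_vars, "stack": stack}
    finally:
        # Drop the frame reference to avoid a reference cycle.
        del caller


def checkout(cart_total, coupon):
    discount = 5 if coupon == "SPRING" else 0
    return take_snapshot()


snapshot = checkout(100, "SPRING")
print(snapshot["locals"])
print(snapshot["stack"][0])
```

Because the result is plain data, it could be serialized and stored behind a URL, which is the spirit of sharing a snapshot with teammates.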
The last feature of Stackdriver is the distributed tracing system for applications deployed in App Engine. It collects latency data from applications and displays it in near real time in the console. The service can be used to investigate latency that affects application performance. It can trace the latency data for requests to App Engine URIs and additional data for round-trip RPC calls to App Engine services like Datastore, URL Fetch, and Memcache.
With microservices architecture going mainstream, this service can be used for root cause analysis of latency-related issues.
Existing applications running on App Engine can be enabled for tracing. After the service collects sufficient data, a custom analysis report can be created. The report shows the overall latency for each request made by the application.
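The kind of summary such a report provides can be sketched in a few lines of Python. The example assumes latency samples are available as (URI, milliseconds) pairs; that input format, the function name, and the chosen statistics are assumptions for illustration, not the format of the actual analysis report.

```python
from collections import defaultdict
from statistics import median


def latency_report(samples):
    """Summarize request latency per URI from (uri, milliseconds) samples."""
    by_uri = defaultdict(list)
    for uri, millis in samples:
        by_uri[uri].append(millis)
    return {
        uri: {
            "requests": len(values),
            "median_ms": median(values),
            "max_ms": max(values),
        }
        for uri, values in by_uri.items()
    }


# Hypothetical samples collected for two request URIs.
samples = [("/cart", 120), ("/cart", 300), ("/cart", 180), ("/search", 45)]
for uri, stats in latency_report(samples).items():
    print(uri, stats)
```

Comparing the median with the maximum for a URI is a quick way to spot endpoints where a few slow requests, often caused by a slow downstream RPC, dominate the user experience.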
Stackdriver is Google’s answer to Amazon CloudWatch and CloudTrail. The service has the potential to become the core DevOps platform for applications and workloads deployed in Google Cloud Platform.
AWS customers with no investment in GCP may find little reason to adopt the service. But for multi-cloud and hybrid deployments, Google Stackdriver is a viable option for monitoring.