Streamlining Elasticsearch Deployments with ELK and the Cloud

23 Oct 2015 1:01pm, by

Earlier this month, Amazon Web Services launched its own hosted Elasticsearch service to support log analytics and real-time application monitoring, making the cloud giant the latest provider to offer hosted Elasticsearch.

The idea is a solid one. Elasticsearch is heavily used for website searches, application searches and log management, among other duties. Using Elasticsearch and the full ELK (Elasticsearch, Logstash, Kibana) stack in large-scale deployments can be fraught with complexities. A substantial amount of knowledge and effort is required to maintain and support such a stack, so companies are increasingly choosing hosted or as-a-service solutions.

A managed or self-service cloud solution for ELK brings many advantages, including an exceedingly important one: peace of mind. Without spending unnecessary effort, you can rely on the provider for high availability, scalability and security.

Your Own ELK … Just Doesn’t Make Sense

While running your own Elasticsearch stack may seem like a clear choice, this approach has complications, particularly with demanding workloads. Managing a private stack involves deployment and setup, getting started, integration, UI, scalability, stability and maintenance. It’s in the latter three areas that matters can get sticky within a private ELK environment.

Proper scaling can require engineers to invest many hours of work. Stability suffers when heavy queries cause exceptions and leave the Kibana dashboard unresponsive. Authentication and authorization can also be a challenge when members of different teams need access to the dashboard and you have to implement and track each person’s restrictions. Finally, every ELK component has to be kept up to date with its latest release.

Overall, it’s better for the team to focus on developing the company’s core competencies than to spend valuable time wrangling an ELK deployment.

It’s important to first validate your use case and determine whether you need Elasticsearch alone or the full ELK stack. Plenty of guides walk through deploying either one, but the right choice depends on your specific requirements. Here are some common Elasticsearch and ELK use cases and corresponding solutions to consider:

Use Case: Search Engine

As a search engine, Elasticsearch is very versatile. Because it is fast and highly scalable, it can power an internal search feature for your website or application, either as a traditional search engine or as the basis for a more sophisticated recommendation engine. Elasticsearch can sort documents, score them by relevance and rank them by popularity, and plugins can extend its functionality even further.
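To make "relevance plus popularity" concrete, here is a minimal sketch that builds (but does not send) an Elasticsearch search body using the query DSL's `function_score` query with a `field_value_factor` function. The document fields `title` and `popularity` are hypothetical. The weighting choices (a `log1p` modifier whose result is summed with the relevance score) are illustrative assumptions, not a recommended tuning.

```python
import json


def build_search_query(text: str, popularity_field: str = "popularity") -> dict:
    """Build an Elasticsearch query body that ranks full-text matches by
    relevance and boosts them by a numeric popularity field.

    Field names ("title", "popularity") are hypothetical placeholders.
    """
    return {
        "query": {
            "function_score": {
                # Relevance: a standard full-text match on a hypothetical "title" field.
                "query": {"match": {"title": text}},
                # Popularity: log1p(popularity) dampens very large counts.
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                },
                # Add the popularity factor to the relevance score.
                "boost_mode": "sum",
            }
        },
        # Highest combined score first; popularity breaks ties.
        "sort": [
            {"_score": {"order": "desc"}},
            {popularity_field: {"order": "desc"}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_search_query("wireless headphones"), indent=2))
```

The resulting JSON could be sent as the body of a `_search` request. The sketch assumes every document carries a numeric popularity value. Summing rather than multiplying keeps documents with zero popularity (where `log1p` yields 0) from dropping to a score of zero.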

With Elasticsearch, you can store customer information for easy access, run basic CRM analytics searches, or even store medical research data so that it can be correlated and retrieved quickly.

For a highly scalable, highly available, production-grade search engine, consider the leading hosted solutions: compose.io (acquired by IBM), qbox.io, found.no (acquired by Elastic) and Amazon’s recently announced hosted Elasticsearch service. These solutions also enrich the basic Elasticsearch APIs to help you customize and enhance users’ search experiences.

Feature | Amazon Elasticsearch | Found.no (Elastic) | qbox
--- | --- | --- | ---
Pre-installed Elasticsearch | Yes | Yes | Yes
Elasticsearch version | 1.5 | Latest | Latest
Premium Elasticsearch plugins | No | Yes | Yes
Access to Elasticsearch API | Yes | Yes | Yes
Runs within your VPC | Yes | No | No

Use Case: Log Analytics

Log analytics is one of the main use cases of the complete ELK stack. The combination of Elasticsearch, Logstash and Kibana makes log analysis more intuitive for most users. However, the challenges associated with log analytics differ from those related to search engine capabilities:

  • Mapping: Elasticsearch is very sensitive to mapping (schema) conflicts. Documents whose fields conflict with an existing mapping are rejected at indexing time, which usually causes one to two percent of logs to be lost. Various mapping-adaptation techniques can resolve this.
  • Burst management: Logs are “bursty” by nature. A database log being purged, a spike in traffic or a failure in one of the services can generate excessive logs, requiring the ELK stack to double or triple its capacity across Logstash and Elasticsearch within a couple of minutes.
  • Parsing: Proprietary log solutions commonly offer log parsing and enrichment; with ELK, this otherwise requires the mundane, error-prone work of parsing logs yourself with complex Grok patterns in Logstash.
  • Log sources: Logs arrive from different sources, geo-locations and types. They are either shipped by an agent or pulled by the ELK stack, which requires Logstash input plugins that can periodically and reliably pull data from S3, Heroku and other sources.
  • Scaling Logstash: This can be challenging and may require tight load balancing, monitoring and error correction.
  • Index management and data curation: Log data is a continuous stream of structured and unstructured data that requires careful index management and automatic purging of old, irrelevant data.
  • Queuing: At scale, an ELK stack needs additional components, such as a robust, highly available queuing system that allows scaling, absorbs bursts and controls input.
  • Access control: Logs can include sensitive data, yet they are often shared among R&D, support and DevOps teams to enable collaboration. Authentication and authorization are not part of the ELK stack.
  • Compliance: Archiving logs for a longer retention period (months or years) is mandatory for many organizations.
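In practice, index management and data curation usually rely on time-based indices. The sketch below assumes daily indices named with Logstash’s default `logstash-YYYY.MM.dd` pattern and a hypothetical retention period. It only decides which indices have expired. The actual deletion, and any archiving needed for compliance, is left to your tooling, such as Elastic’s Curator.

```python
from datetime import date, datetime, timedelta


def daily_index_name(day: date, prefix: str = "logstash") -> str:
    """Name a daily index the way Logstash does by default, e.g. logstash-2015.10.23."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_to_purge(index_names, today: date, retention_days: int,
                     prefix: str = "logstash") -> list:
    """Return the daily indices whose date falls before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not a time-based log index (e.g. .kibana); never touch it
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; skip rather than guess
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    existing = ["logstash-2015.10.23", "logstash-2015.09.23",
                "logstash-2015.09.22", "logstash-2015.08.01", ".kibana"]
    print(indices_to_purge(existing, today=date(2015, 10, 23), retention_days=30))
    # ['logstash-2015.08.01', 'logstash-2015.09.22']
```

Dropping a whole daily index is far cheaper for Elasticsearch than deleting old documents one by one. That is one reason log deployments tend to favor time-based indices.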

Logz.io is one of the vendors driving competition between ELK and proprietary software. As a new player in the log analytics market, Logz.io appears to be the only vendor that provides the complete ELK stack as a cloud service for enterprises. Other proprietary log analytics solutions, such as Splunk and Sumo Logic, are also available.

Feature | Logz.io (ELK-as-a-Service) | Amazon Elasticsearch (hosted servers) | Found.no (hosted servers)
--- | --- | --- | ---
Pre-installed ELK | Yes | No Logstash | No Logstash
Direct access to Elasticsearch API | Proprietary API | Yes | Yes
Auto-scale | Yes | No | No
Elasticsearch version | Latest | 1.5 | Latest
Resolves mapping conflicts | Yes | No | No
Automatically parses logs | Yes | No | No
Alerts | User interface to set up alerts | No | Watcher (JSON)
Data curation (purges old logs) | Yes | No | No
Log spike protection | Yes | No | No
Kibana role-based access | Yes | No | Shield (partial)
Archives logs to S3 | Yes | No | No
Integrates with AWS log sources | Yes | No | No
Automatic index management and curation | Yes | No | No

Final Note

It’s imperative to first understand your use case. Do you need Elasticsearch or a complete log management solution? Understand the gaps and challenges of every alternative and decide which one is right for you. With the success of Elasticsearch and the ELK stack, companies now offer hosted or as-a-service versions of this open source software, which can relieve much of the burden of maintaining Elasticsearch.

IBM is a sponsor of The New Stack.

Feature image: A Rocky Mountain Bull Elk, photographed by Mongo, licensed under the public domain.

