Tips for Optimizing Elasticsearch on Azure
Elasticsearch is one of my favorite platforms. It’s an open-source, RESTful search platform built on Apache Lucene.
Elasticsearch provides amazing performance, incredible scale, easy management, and virtually every feature you’d expect from a search index. It’s no surprise that over the past few years, it has become the de facto standard for building web search applications.
The core Elasticsearch product is free, so it can be hosted in several ways:
A. Infrastructure-as-a-Service: On virtual machines or Docker / Kubernetes containers.
B. Managed Infrastructure-as-a-Service: Through Elastic themselves or a partner like AWS or Qbox. This provides the same control as hosting it yourself without the overhead of patch management, backups, and monitoring.
C. Platform-as-a-Service (Simplified): Microsoft’s Azure Search PaaS offering is built upon Elasticsearch and offers most, but not all, features.
After deploying multiple Elasticsearch instances in each scenario, I’ve put together a list of tips.
Tips
-
When possible, stick with Azure Search. It supports the most common scenarios: indexing data, analyzing data, querying results quickly, and organizing results as aggregations. Elasticsearch can be complex, so this is the easiest and cheapest way to leverage its core features.
-
If hosting Elasticsearch infrastructure yourself, use Docker containers when possible. Elasticsearch is offered as a Docker image, which can be deployed in minutes with minimal configuration. Adding nodes to clusters is easiest with containers.
-
If hosting Elasticsearch as virtual machines, research the templates. There are several flavors of Elasticsearch on the Azure Marketplace. Versions, pricing, and support vary. When in doubt, stick with the official image from Elastic. This is also available as an Azure Resource Manager template.
-
If deploying a multi-node Elasticsearch cluster in one resource group, consider centralizing indexes using Azure Files. Azure Files fully supports Elasticsearch, allowing us to decouple compute from storage and take advantage of fast, low-cost Azure storage.
-
Ensure that no indexes are stored on temporary drives: This seems obvious, but I’ve heard of multiple clients losing data after choosing a VM’s temporary drive for index or configuration storage.
-
Understand X-Pack, its features, and pricing: While the core Elasticsearch product is free and fully-featured, Elastic offers a premium add-on product called X-Pack which adds security, compliance, and monitoring capabilities. X-Pack pricing can dwarf cloud fees, so factor that requirement in before choosing a deployment model.
-
Harden network access: Too many Elasticsearch services are deployed with the default ports of 9200 and 9300 accessible without authentication. Add authentication, add TLS encryption (using X-Pack or a third-party library), and use network security groups to limit access to ports. When appropriate, host your search-driven application close to Elasticsearch and handle access through your own business logic on a closed virtual network.
-
Plan for business continuity: For VMs or containers, plan to use Azure Backup and Azure Site Recovery.
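
Two of the tips above lend themselves to a quick scripted check: keeping indexes off temporary drives, and limiting access to ports 9200 and 9300. The Python sketch below is one possible way to do that, not a definitive audit. It assumes the temporary disk on an Azure Linux VM is mounted at /mnt or /mnt/resource (confirm this on your own image; on Windows VMs the temporary disk is typically D:). It also assumes a simplified, hypothetical rule format for inbound rules rather than the real network security group schema. The function names are illustrative only.

```python
from pathlib import PurePosixPath

# Assumption: typical temporary-disk mount points on Azure Linux VMs.
# Confirm on your own image (for example with `df -h`) before relying on this.
TEMP_MOUNTS = ("/mnt", "/mnt/resource")

# Default Elasticsearch HTTP and transport ports.
ES_PORTS = {9200, 9300}

# Source values treated as "anyone" in the simplified rule format below.
ANY_SOURCE = {"*", "0.0.0.0/0", "Internet"}


def _is_under(path, mount):
    """True if `path` equals `mount` or sits somewhere below it."""
    p, m = PurePosixPath(path), PurePosixPath(mount)
    return p == m or m in p.parents


def paths_on_temp_disk(settings, temp_mounts=TEMP_MOUNTS):
    """Return the path.data / path.logs entries that sit under a temp mount.

    `settings` is a dict holding values taken from elasticsearch.yml.
    This is a heuristic: a data disk mounted below /mnt would also be flagged.
    """
    flagged = {}
    for key in ("path.data", "path.logs"):
        value = settings.get(key)
        if value is None:
            continue
        # A setting may hold one path or a list of paths.
        candidates = [value] if isinstance(value, str) else list(value)
        bad = [p for p in candidates if any(_is_under(p, m) for m in temp_mounts)]
        if bad:
            flagged[key] = bad
    return flagged


def world_open_es_ports(rules):
    """Return Elasticsearch ports that an inbound allow rule opens to any source.

    `rules` is a hypothetical, simplified list of dicts with `source` and
    `port` keys; it is not the Azure network security group schema.
    """
    return sorted(
        r["port"] for r in rules
        if r["port"] in ES_PORTS and r["source"] in ANY_SOURCE
    )
```

To use it, load the relevant settings from elasticsearch.yml into a dict, export your inbound allow rules into the simplified format, and treat any non-empty result from either function as something to review before going live.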