Elasticsearch Index Rollover with Timestamps
If you use Elasticsearch to store system or application logs, your cluster can quickly get very big.
Fortunately, Elasticsearch provides functionality for automatic rollover and deletion of indices.
Here is how to configure it.
Note: all configurations and examples here are based on ELK 7.x (the sample responses come from version 7.10.0). Some features are not available in earlier versions. If you are using an older version, consider upgrading.
Create lifecycle policies
Elasticsearch provides Index Lifecycle Management (ILM), which can roll over indices automatically. It is configured by defining a policy that describes the phases of an index (hot, warm, cold, delete).
When handling logs, we can store them in different indices, which are rolled over automatically when certain conditions are met (size or age). Additionally, old log indices can be deleted to free up space.
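In the example below we create a lifecycle policy named "test". The "hot" phase, in which the index is written to and not yet rolled over, ends when the index grows beyond 500MB or becomes older than 1 minute. The "delete" phase (in which the index is deleted completely) starts 1 minute after the rollover, so at most 2 minutes after creation when the rollover is triggered by age. Note that ILM checks these conditions only periodically (every 10 minutes by default, controlled by the "indices.lifecycle.poll_interval" cluster setting), so intervals this short take effect promptly only if that setting is lowered.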
PUT _ilm/policy/test
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "500mb",
            "max_age": "1m"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "1m",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
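The timing arithmetic for this policy can be checked with a small sketch. This is not part of Elasticsearch. The helper names `parse_duration_ms` and `latest_delete_age_ms` are made up for illustration, only a handful of time units are handled, and the result ignores the ILM poll interval.

```python
import re

# Milliseconds per unit. Assumption: only these units are supported here.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(value: str) -> int:
    """Convert a duration string such as "1m" or "0ms" into milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit]


def latest_delete_age_ms(policy: dict) -> int:
    """Upper bound on the index age at which the delete phase starts,
    assuming the rollover is triggered by max_age (not by size)."""
    phases = policy["policy"]["phases"]
    rollover_age = parse_duration_ms(phases["hot"]["actions"]["rollover"]["max_age"])
    delete_after_rollover = parse_duration_ms(phases["delete"]["min_age"])
    return rollover_age + delete_after_rollover


# The relevant parts of the "test" policy from this article.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {"rollover": {"max_size": "500mb", "max_age": "1m"}},
            },
            "delete": {"min_age": "1m", "actions": {"delete": {}}},
        }
    }
}

print(latest_delete_age_ms(policy) // 60_000, "minutes")  # 2 minutes
```

If the index reaches 500MB before it is 1 minute old, the rollover and therefore the deletion happen earlier than this bound.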
Create index template
The index template makes sure that newly created and rolled-over indices have the lifecycle policy applied.
The template below is applied to every newly created index whose name matches the pattern "test-*".
PUT _index_template/test
{
  "index_patterns": ["test-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "index": {
        "lifecycle": {
          "name": "test",
          "rollover_alias": "test"
        }
      }
    }
  }
}
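To get a feeling for which index names the "test-*" pattern covers, here is a rough approximation in Python. It assumes that shell-style wildcard matching from the standard library is close enough to the way Elasticsearch evaluates "*" in index patterns; the function name `template_applies` is hypothetical.

```python
from fnmatch import fnmatchcase


def template_applies(index_name: str, index_patterns: list) -> bool:
    """Return True if the index name matches at least one pattern.

    Approximation: fnmatch also gives "?" and "[...]" a special meaning,
    which is not the case for Elasticsearch index patterns.
    """
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


patterns = ["test-*"]
for name in ["test-2021.02.05-000001", "test", "mytest-000001"]:
    print(name, template_applies(name, patterns))
```

Only the first name matches. An index called just "test" would not receive the template, which is one more reason to let the alias, not an index, carry that name.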
Create the first index and the alias
There are two ways to use the alias: with the "is_write_index" property or without it.
If you use the "is_write_index" property (which is recommended and used in the example below), the alias points to all indices, and the latest one is the "write" index. This allows you to get data from all indices when searching, while new documents are written to the latest index.
If you do not use the "is_write_index" property, the alias always points to the latest index only. Even if older indices are still present, searches through the alias will not return their data (you have to search them directly).
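The difference between the two modes can be illustrated with a toy in-memory model. This is only a sketch of the behaviour described above, not Elasticsearch code; the class `ToyAlias` and its methods are invented for this example.

```python
class ToyAlias:
    """Toy model of a rollover alias (illustration only)."""

    def __init__(self, use_write_index: bool):
        self.use_write_index = use_write_index
        self.indices = []        # indices the alias points to, oldest first
        self.write_index = None  # index that receives new documents

    def point_to_new_index(self, new_index: str) -> None:
        """Simulate creating the first index or rolling over to a new one."""
        if self.use_write_index:
            # Older indices stay behind the alias and remain searchable.
            self.indices.append(new_index)
        else:
            # The alias is moved: it now points to the new index only.
            self.indices = [new_index]
        self.write_index = new_index

    def search_targets(self) -> list:
        """Indices that a search through the alias would cover."""
        return list(self.indices)


with_flag = ToyAlias(use_write_index=True)
without_flag = ToyAlias(use_write_index=False)
for alias in (with_flag, without_flag):
    alias.point_to_new_index("test-000001")
    alias.point_to_new_index("test-000002")

print(with_flag.search_targets())     # ['test-000001', 'test-000002']
print(without_flag.search_targets())  # ['test-000002']
```

In both cases new documents go to "test-000002"; only the set of indices visible to searches differs.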
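The only requirement for the index is that its name ends with a hyphen followed by a number (for example "-000001"), so that the rollover can increment it. In this example, I also include the date of the rollover in the name by using date math.
In the example below, we create an index "test-2021.02.05-000001" with the alias "test". The index name is a date math expression, which has to be URL-encoded in the request path. Note that in the date formats of Elasticsearch 7.x "YYYY" denotes the week-based year, which can differ from the calendar year in the days around New Year; "yyyy" is the safer choice if you want the calendar year.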
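The naming rule can be sketched as follows: on rollover the trailing number is incremented and zero-padded to six digits. The function `next_rollover_name` is a hypothetical helper for illustration. It deliberately ignores date math, whereas Elasticsearch also re-evaluates the date part of a date math name at rollover time.

```python
import re


def next_rollover_name(current: str) -> str:
    """Derive the next index name from one that ends in "-<number>"."""
    match = re.fullmatch(r"(.*-)(\d+)", current)
    if match is None:
        raise ValueError(f"name does not end with '-<number>': {current!r}")
    prefix, counter = match.groups()
    # The new counter is always six digits wide, zero-padded.
    return f"{prefix}{int(counter) + 1:06d}"


print(next_rollover_name("test-2021.02.05-000001"))  # test-2021.02.05-000002
print(next_rollover_name("logs-1"))                  # logs-000002
```

A name such as "test" or "test-latest" raises a ValueError in this sketch, mirroring the fact that such an index cannot be rolled over without an explicitly supplied target name.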
PUT /%3Ctest-%7Bnow%2Fd%7BYYYY.MM.dd%7D%7D-000001%3E
{
  "aliases": {
    "test": {
      "is_write_index": true
    }
  }
}
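The request path above is simply the percent-encoded form of the date math expression. A minimal way to produce it with the Python standard library (the wrapper name `encode_index_name` is made up for this example):

```python
from urllib.parse import quote


def encode_index_name(date_math_name: str) -> str:
    """Percent-encode a date math index name for use in a request path.

    safe="" makes sure that "/" is encoded as well.
    """
    return quote(date_math_name, safe="")


# The same expression as in the article, including its "YYYY" format.
print(encode_index_name("<test-{now/d{YYYY.MM.dd}}-000001>"))
# %3Ctest-%7Bnow%2Fd%7BYYYY.MM.dd%7D%7D-000001%3E
```

The unencoded expression is what later appears as "provided_name" in the index settings.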
Check the index by alias
Now the alias "test" points to the only index available - "test-2021.02.05-000001".
GET /test

(Response)
{
  "test-2021.02.05-000001" : {
    "aliases" : {
      "test" : {
        "is_write_index" : true
      }
    },
    "mappings" : { },
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "test",
          "rollover_alias" : "test"
        },
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "<test-{now/d{YYYY.MM.dd}}-000001>",
        "creation_date" : "1612536374323",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "XpsCXK7gRcyoFZf71FfIag",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  }
}
Put some data
Let's put 3 documents (3 log lines) in the index.
PUT test/_doc/1
{
  "message": "a dummy log 1"
}

PUT test/_doc/2
{
  "message": "a dummy log 2"
}

PUT test/_doc/3
{
  "message": "a dummy log 3"
}
Check the count of documents in the index by the alias
The count of documents is now 3.
GET /test/_count

(Response)
{
  "count" : 3,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
Test (rollover manually)
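We can trigger a rollover manually to test the configuration. With the condition below, the rollover happens only if the current write index is at least 1 minute old.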
POST /test/_rollover
{
  "conditions": {
    "max_age": "1m"
  }
}
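How the conditions of such a request are evaluated can be sketched like this: the rollover happens if any of the given conditions is met, and a request without conditions rolls over unconditionally. The function `should_roll_over` and the pre-parsed keys "max_age_seconds" and "max_docs" are assumptions made for this illustration; the exact comparison Elasticsearch performs at the boundary is not modelled.

```python
def should_roll_over(age_seconds: float, doc_count: int, conditions: dict) -> bool:
    """Decide whether a rollover would take place (illustration only)."""
    if not conditions:
        return True  # no conditions given: roll over unconditionally
    checks = []
    if "max_age_seconds" in conditions:
        checks.append(age_seconds >= conditions["max_age_seconds"])
    if "max_docs" in conditions:
        checks.append(doc_count >= conditions["max_docs"])
    # One satisfied condition is enough.
    return any(checks)


# The first index in this article was about 344 seconds old at rollover
# (difference between the two "creation_date" values in the responses).
print(should_roll_over(344, 3, {"max_age_seconds": 60}))  # True
print(should_roll_over(30, 3, {"max_age_seconds": 60}))   # False
```

If the request is sent less than a minute after the index was created, no new index is created and the alias keeps pointing to the first index only.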
Check the indices by the alias again
Now the alias "test" points to both indices "test-2021.02.05-000001" and "test-2021.02.05-000002". The new one (000002) has the property "is_write_index" set to true, the old one (000001) - to false. All new records, passed to the alias, are saved in the new index, but searching with the alias finds documents from both indices.
GET /test

(Response)
{
  "test-2021.02.05-000001" : {
    "aliases" : {
      "test" : {
        "is_write_index" : false
      }
    },
    "mappings" : {
      "properties" : {
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "test",
          "rollover_alias" : "test"
        },
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "<test-{now/d{YYYY.MM.dd}}-000001>",
        "creation_date" : "1612536374323",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "XpsCXK7gRcyoFZf71FfIag",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  },
  "test-2021.02.05-000002" : {
    "aliases" : {
      "test" : {
        "is_write_index" : true
      }
    },
    "mappings" : { },
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "test",
          "rollover_alias" : "test"
        },
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "<test-{now/d{YYYY.MM.dd}}-000002>",
        "creation_date" : "1612536718290",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "TEAHJYsNTAWZzF8p5DpfRA",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  }
}
Put one more doc and check the count again
If we add one more document (log line), the count shows 4: three from the first index and one from the new one.
PUT test/_doc/4
{
  "message": "a dummy log 4"
}

GET /test/_count

(Response)
{
  "count" : 4,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  }
}
Elastic recommends rolling over indices based on size rather than time. This avoids creating many small indices, which keeps the number of indices down and makes searches faster.