Log Management Best Practices

Searching over a large number of logs can lead to long query times. The more you reduce the number of rows scanned, the faster search results are returned.

There are several ways to address this:

1. Datasets

Datasets offer a streamlined approach to organizing log data, enhancing query speed and efficiency.

Why use datasets?

Using datasets, you can define rules to group your logs into separate tables. When querying logs from a single dataset, less irrelevant data is scanned, so results are retrieved faster.

What is a dataset?

A dataset consists of the following:

  • Dataset rule: defines a log filter using Lumigo Search Syntax. Each log that matches the filter is saved to the dataset
  • Retention period: the available periods depend on your active log plan(s)

Planning your dataset grouping strategy

  • Create datasets that match your organization's structure or business questions (e.g., a dataset per service, per environment, etc.)
  • Define dataset rules using filters that are static or change infrequently (e.g., filtering by team or business unit); see the example below
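
For example, a dataset rule that groups all logs from a given environment could look like the following. This is a sketch: the env field name is borrowed from the example log later on this page, and the value is hypothetical.

env:"production"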

📘

Upon ingestion of a new log, it is checked against all dataset rules and stored in each dataset whose rule it satisfies.

If no dataset rule is matched, the log is stored in the default dataset, named default.

Create datasets using the UI

  1. Navigate to the Datasets management page in the Lumigo platform
  2. Click the Create a dataset button
  3. Define the dataset name
  4. Set your dataset rule using Lumigo Search Syntax
  5. Select the retention period
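
For example, a dataset created through these steps might be defined as follows. The values are hypothetical: the rule reuses the field:value syntax shown on this page, and the retention periods actually available depend on your log plan.

Name: staging-logs
Rule: env:"staging"
Retention period: 30 days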

📘

Dataset rules apply to newly ingested logs and do not affect previously ingested logs.

2. Optimized Fields

Lumigo leverages a powerful relational database, where logs are stored in tables and divided into columns according to the log fields.

Searching on columns provides a much faster search experience. Therefore, Lumigo automatically indexes your fields according to the logic below:

  1. When a significant volume of logs is ingested, the most frequent log fields are indexed into columns. The rest of the log fields are stored as attributes.
  2. The first 100 fields with INT/BOOL types are stored as indexes

Sometimes you may search on a field that wasn't indexed due to its frequency. In these cases, you can manually index such fields via the Optimized Fields page in the Lumigo platform.

On the Optimized Fields page, you can edit or delete indexes automatically created by Lumigo, or create your own indexes to streamline searches on fields that are useful for your organization, as in the example below.
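
For example, if your team frequently filters on a field such as customer_id that did not make the auto-indexing cut, adding it as an optimized field turns a query like the following into a scan over a column rather than over attributes. The field name and value are taken from the example log later on this page; whether a given field was auto-indexed depends on your ingestion volume.

customer_id:"c_5ab98f20a3ad4"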

3. Field Search

Log query

Using Lumigo Search Syntax, you can filter your logs on an exact field value, or match a value prefix using wildcards.
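
For example, the following queries are a sketch using field names from the log line shown later in this section, assuming the wildcard character is * (as in Lucene-style syntax):

levelname:"INFO"
lambda_name:l-0331*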

Filters can be set with a single click using the search UI:

  1. Include/exclude field from query: clicking a specific log row opens the log viewer. When hovering over a specific log field, you can either include it in or exclude it from the query.
  2. Query autosuggest: when typing in the query bar, you are offered fields/values that match your input.
  3. Visual filters: available in the left panel of the UI Search page, used to narrow down your query with a click.

Field search example

Say we have the following log line:

{
  "asctime":"2024-03-31 15:02:55,774",
  "customer_id":"c_5ab98f20a3ad4",
  "duration":152,
  "env":"l-0331-16-15",
  "lambda_name":"l-0331-16-15_trc-inges-stsls3_get-single-transaction-async-v2",
  "levelname":"INFO",
  "message":"logz.io search stat",
  "query_type":"query_specific_invocation",
  "service_version":"1.0.1470",
  "stack_name":"trc-inges-stsls3"
 }

And we want to calculate the average of the duration field. Currently, there are 2B log lines to aggregate in the selected time range.

Querying the average over the entire 2B log lines may take up to 30 seconds.

Using filters, we can significantly reduce the number of logs scanned for the aggregation and make the search up to 10x faster (from ~30 seconds down to roughly 3 seconds in this example).

For this example, we could use the following filters to fine-tune our query (see the combined query sketch after the list):

  1. Filter out all logs that do not contain the took field (syntax: took:>0)
  2. Filter on a specific environment (syntax: env:"l-0331-16-15")
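
Combined, the fine-tuned query could look like the following sketch, assuming filters can be combined with AND (as in Lucene-style syntax); the average of duration is then computed over this much smaller result set:

took:>0 AND env:"l-0331-16-15"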