
Welcome to the third part of the Workshop. As usual, to keep each article as compact as possible, I will shorten the queries to snippets. If you want to see the complete code, please consult my GitHub page for this workshop.

In this first part of the scripting workshop, I will give you a brief overview of which languages are supported. You will learn what a basic script looks like, how it can be called, and how you store it. We will look at some fundamentals: how to test your scripts, how to increase their readability, and how and why you should parameterize them. With that said, let’s jump right in.


Currently, Elasticsearch offers three languages via built-in plugins, each for a specific purpose. In the coming parts of the workshop, the focus will be on the Painless scripting language. The expression language, as well as mustache, will be part of later workshops. Java itself will not be covered; however, since Painless can use a set of Java classes over an API, we will cover calling and using them in the following workshops.

What is the value of using scripts instead of predefined pipeline processors? While processors should be the preferred method to transform data, they are limited in their capabilities and might not be able to solve your specific problem. Painless scripts offer a whole new dimension of flexibility here.

Hello World

Let’s create the traditional “Hello World” with the _scripts/painless/_execute API. This is an inline script, since it is not stored:

POST /_scripts/painless/_execute
{
  "script": {
    "source": "return('Hello World')"
} }

Here are a few things to mention. First: the single quotes are fine for a one-liner, but as you will see, scripts grow quickly, and compressing everything into one line makes them hard to read. Second: the string ‘Hello World’ is hardcoded. If you use the same script for different strings, it has to be recompiled each time the code changes. Parameters can be injected into scripts instead, so the script itself never changes and is compiled only once. The use of parameters instead of literals is strongly recommended:

"source": """
  return(params.phrase)
""", 
"params": {
  "phrase": "Hello World"
}

The use of parameters does not make much sense in this tiny script, but later you will see that parameters can be passed to queries and pipelines. In longer scripts, you will want to add comments. If we use a triplet of double quotes, we can write our script over multiple lines:

"source": """
    // This is a one-line comment
    
    return(params.phrase)
    
    /* This is a
    multiline comment */
""",

Working with data

Let’s index a document and calculate the age of a person:

PUT persons/_doc/1
{ "name": "John",
  "sur_name": "Smith",
  "year_of_birth": 1925 }

We test our script first, and therefore use runtime mappings. Please note: runtime mappings do not store data and are useful for exploring and testing:

GET persons/_search
{
  "runtime_mappings": {
    "age": {
      "type": "long",
      "script": {
        "source": """
        long age = params.today - doc['year_of_birth'].value;
        emit(age)
        """,
        "params": { "today": 2022 }
      }
    }
  },
  "fields": [ "age" ] }

To read the value of a field, you access the “doc” map with the field name, and you use the “.value” notation to read fields in the runtime-mapping context. You will later use it for other APIs as well.
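
If you want to experiment with this “doc” access without running a search, recent Elasticsearch versions let the _execute API emulate a runtime-field context against a test document. A minimal sketch (check the painless _execute API documentation for the context names supported by your version):

```
POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      emit(params.today - doc['year_of_birth'].value)
    """,
    "params": { "today": 2022 }
  },
  "context": "long_field",
  "context_setup": {
    "index": "persons",
    "document": { "year_of_birth": 1925 }
  }
}
```

This evaluates the script against the supplied document without touching the stored data.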

Storing Scripts

In a later workshop, we will use a Java class to determine the actual year. For now, the year is set by a parameter. The age, in this case, is calculated as 97. You can store the script in a pipeline with a script processor:

PUT _ingest/pipeline/calc_age_pipeline
{
  "processors": [
    {
      "script": {
        "source": """
          ctx['age'] = params.today - ctx['year_of_birth'];
        """,
        "params": { "today": 2022 }
} } ] }

In pipelines, you no longer address the fields over the “doc” map, but over the “ctx” map. A map is a variable that contains key-value pairs, a bit like a dictionary in Python or a hash in Perl. The Painless contexts and their different variables and ways to access data can be confusing; we will dive into this in the future. Just remember: it matters from which API your script is called.
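
To see the “ctx” map in action without indexing anything, you can dry-run the pipeline with the _simulate API. A quick sketch, using the pipeline stored above:

```
POST _ingest/pipeline/calc_age_pipeline/_simulate
{
  "docs": [
    { "_source": { "name": "John", "year_of_birth": 1925 } }
  ]
}
```

The response shows the transformed document, including the calculated “age” field, without writing anything to an index.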

Important takeaways:

  • in pipelines, the “.value” notation is not valid anymore and will result in an error
  • if your script is used in a pipeline, access field values with “ctx[field-name]”
  • if your script is used in the _update or _update_by_query API, access field values with “ctx._source[field-name]”
  • if your script is used in the _search API, in a query or runtime-mapping statement, access field values with “doc[field-name].value”
  • parameters can be accessed by dot notation, for example params.today, or by bracket notation like params['today']
  • parameter values cannot be baked into a stored script; pass them when you call the script, as you see in the examples for calling scripts from the different APIs
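
Both parameter notations read the same value, which a quick _execute check can confirm. This sketch simply compares the two accesses:

```
POST /_scripts/painless/_execute
{
  "script": {
    "source": "return(params.today == params['today'])",
    "params": { "today": 2022 }
  }
}
```

Since both notations resolve to the same map entry, the comparison returns true.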

You can store the script under the ID “calc_age_script” in the cluster state and call it later by this ID:

PUT _scripts/calc_age_script
{
  "script": {
    "lang": "painless", 
    "source": """
      ctx._source['age'] = params['today'] - ctx._source['year_of_birth'];
    """
} }
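
If you want to verify what was stored, you can retrieve the script by its ID:

```
GET _scripts/calc_age_script
```

The response contains the script source and language exactly as stored in the cluster state.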

Calling scripts with the update_by_query API

The document fields are now accessed via “ctx._source[field-name]”. Let’s use the script in the _update_by_query API:

POST persons/_update_by_query
{
  "script": {
    "id": "calc_age_script",
    "params": { "today": 2022 }
  }, 
  "query": {
    "match_all": {}
} }

Calling scripts with the update API

Let’s use the same script with the _update API:

POST persons/_update/1
{
  "script": {
    "id": "calc_age_script",
    "params": { "today": 2022 }
 } }
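
A quick way to check the result of the update is to fetch the document again:

```
GET persons/_doc/1
```

The _source should now contain the calculated “age” field next to name, sur_name, and year_of_birth.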

Calling Scripts with the reindex API

Let’s have a look at how we can call the script via the _reindex API:

POST _reindex
{
  "source": { "index": "persons" },
  "dest": { "index": "persons_with_age" },
  "script": { "id": "calc_age_script", "params": { "today": 1995 } }
}

Calling scripts with the _search API

There are several ways to run scripts within the _search API; let’s call the script via the script_fields parameter. To get the stored script working in this context, we need to update the script and its variables: here we need the “doc” map and the “.value” notation to access the values:

PUT _scripts/calc_age_script
{
  "script": {
    "lang": "painless", 
    "source": """
      params['today'] - doc['year_of_birth'].value;
    """
} }

The call of the script looks like this:

GET persons/_search
{
  "script_fields": {
    "age": {
      "script": {"id": "calc_age_script", "params": { "today": 2022 } }
} } }

Calling scripts with a search-template

Stored Painless scripts are not supported for runtime mappings; there, the script has to be written inline. However, we can use search templates. Search templates are stored scripts too, but written in “mustache” – we will cover this topic in a later workshop.

PUT _scripts/calc_age_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "runtime_mappings": {
        "age": {
          "type": "long",
          "script": { "source": 
            """ 
             long age = {{act_year}} - doc['year_of_birth'].value;
             emit(age)
            """
          }
        }
      },
      "fields": [ "age" ]
    },
    "params": { "act_year": "today" }
  }
}

Let’s call the search template:

GET persons/_search/template
{
  "id": "calc_age_template",
  "params": {
    "act_year": 2022
} }

Conclusion

If you made it here: congratulations! You should now be able to write basic scripts, store them, and call them. Remember: the _update, _update_by_query, and _reindex APIs have a “source” parameter where you can define a script inline, instead of calling a stored script. You also know how to increase the readability of your scripts and how to test them.
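
As a reminder of that “source” parameter, an inline variant of the age calculation could look like this sketch:

```
POST persons/_update_by_query
{
  "script": {
    "source": "ctx._source['age'] = params['today'] - ctx._source['year_of_birth']",
    "params": { "today": 2022 }
  }
}
```

This behaves like the stored-script call above, but the script source travels with the request and is compiled on first use.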

This was a small outlook on what is coming in the next workshops. Please also consult the official documentation: elastic.co/guide/elasticsearch/scripting

If Google brought you here, you might also check out the start of the series, or the whole series.

And as always: the full queries and code are available on the GitHub page for this workshop.
