Set up authentication with X-Pack for Elasticsearch

To set up simple HTTP Basic authentication with user name and password for Elasticsearch, you first need to install X-Pack.
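
On Elasticsearch 5.x, for example, X-Pack can be installed with the plugin tool and a restart (a sketch; the Elasticsearch home directory /usr/share/elasticsearch is an assumption that depends on your installation):

# install the X-Pack plugin and restart the service (Elasticsearch 5.x)
cd /usr/share/elasticsearch
sudo bin/elasticsearch-plugin install x-pack
sudo service elasticsearch restart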

All endpoints are then automatically protected with Basic auth. After the installation, a default user is available so that you can continue working:

Name: elastic
Password: changeme

Thus, requests can be made successfully:

curl --user elastic:changeme -XGET 'localhost:9200'

Create your own user

Now you can add your own users:

curl --user elastic:changeme -XPOST 'localhost:9200/_xpack/security/user/SebastianViereck?pretty' -H 'Content-Type: application/json' -d'
{
  "password" : "thePassword",
  "roles" : [ "superuser"],
  "full_name" : "Sebastian Viereck",
  "email" : "%MINIFYHTML92af857af32ed133c54eae0c51f0cdec6%",
  "metadata" : {
    "intelligence" : 7
  },
  "enabled": true
}'

After that, requests can be made immediately with the user:

curl --user SebastianViereck:thePassword -XGET 'localhost:9200'

You should create your own role and use that, or use one of the built-in roles (superuser), as done here.
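
A sketch of creating a custom role with the X-Pack security API (the role name myshop_read and the index name myshop are made-up examples):

curl --user elastic:changeme -XPOST 'localhost:9200/_xpack/security/role/myshop_read?pretty' -H 'Content-Type: application/json' -d'
{
  "indices" : [
    {
      "names" : [ "myshop" ],
      "privileges" : [ "read" ]
    }
  ]
}'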

Disable the elastic user

Very important: of course, the default elastic user with the “changeme” password must be disabled again. The following parameter must be set in elasticsearch.yml:

xpack.security.authc.accept_default_password: false

Then the Elasticsearch service must be restarted:

 sudo service elasticsearch restart

As a check, the request should now return an error message:

curl --user elastic:changeme -XGET 'localhost:9200'
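
The response should now look roughly like this (abridged; the exact wording can differ between versions):

{
  "error" : {
    "type" : "security_exception",
    "reason" : "failed to authenticate user [elastic]"
  },
  "status" : 401
}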

More security enhancements

Data encryption with SSL/TLS should also be used.

In addition, the IP range that is allowed to communicate with the cluster at all should be restricted.
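
A minimal elasticsearch.yml sketch for SSL with X-Pack 5.x, assuming certificate and key files already exist under /etc/elasticsearch/ssl/ (the paths and file names are placeholders):

# encrypt node-to-node traffic
xpack.security.transport.ssl.enabled: true
# encrypt HTTP client traffic
xpack.security.http.ssl.enabled: true
xpack.ssl.key: /etc/elasticsearch/ssl/node.key
xpack.ssl.certificate: /etc/elasticsearch/ssl/node.crt
xpack.ssl.certificate_authorities: [ "/etc/elasticsearch/ssl/ca.crt" ]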

Install ELK stack on Amazon EC2

To install the ELK stack, consisting of:

  • Logstash
  • Elasticsearch
  • Kibana

for testing on a single Amazon EC2 instance on Amazon AWS, you can do the following:

Launch an EC2 instance that is not too small with regard to RAM: at least an m4.large with 8 GB RAM and 2 processors. Elasticsearch is already demanding on memory, and Logstash is very resource hungry too. As operating system I chose Ubuntu 16 (ami-1e339e71).

Then you can create an Elastic IP for the instance, so that you can easily replace the instance and still keep the same IP.

Then configure the security groups.

Elasticsearch completion suggest only finds results from the beginning of the string

If you want to build autocomplete with real-time “search-as-you-type” functionality, Elasticsearch offers a very fast completion suggester.

The problem is that it only finds results that match at the beginning of the string:

A search for “Jackson” does not find the entry “Michael Jackson”. Likewise, a search for “Thriller Michael Jackson” does not find “Michael Jackson Thriller”.

The solution: it is simply not possible to do this with the completion suggester, because its algorithm does not support it.

To get matching results anyway, I generate the data for the autocomplete with the same query that produces the search results. This performs somewhat worse, but the autocomplete suggestions and the search results after submitting then always match and do not irritate the user.
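
A sketch of such a shared query, assuming an index cds with a title field (both names are made up for illustration). A match query with operator and finds the terms regardless of their position, so “thriller michael jackson” also matches “Michael Jackson Thriller”:

curl -XGET 'localhost:9200/cds/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query" : {
    "match" : {
      "title" : {
        "query" : "thriller michael jackson",
        "operator" : "and"
      }
    }
  }
}'

For the last, still incomplete word, a prefix-capable variant such as match_phrase_prefix or an edge_ngram analyzer would additionally be needed.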

Introduction to Elasticsearch

Elasticsearch does not make it easy to get started; it is helpful to understand the following terms.

The analyzer

An analyzer computes the tokens for the index once, when the data is written or updated, and stores the result. From this set of tokens, the search can then determine results.

An analyzer consists of 3 parts, which are applied in this order:

1. Character filter
- html_strip: strips HTML tags and decodes HTML entities such as &amp;
- mapping: Replaces all occurrences of a string with another
- pattern_replace: replaces every regex match with a given replacement string
2. Tokenizer

- A tokenizer splits the string into tokens, i.e. individual words and phrases

3. Token filter

- A token filter can filter out tokens, e.g. ones that are very short, or specific stop words such as “the”, “this is”, “at the”

There are also a number of prebuilt analyzers to get you started.

If only one analyzer is specified, it is used both for indexing and for the query string. Different ones can be used with the help of the analyzer and search_analyzer mapping settings.
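
A sketch of a custom analyzer and a field that uses different analyzers for indexing and search (all names are made up for illustration; the syntax is for Elasticsearch 5.x):

curl -XPUT 'localhost:9200/myindex?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "my_index_analyzer" : {
          "char_filter" : [ "html_strip" ],
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "stop" ]
        }
      }
    }
  },
  "mappings" : {
    "mytype" : {
      "properties" : {
        "title" : {
          "type" : "text",
          "analyzer" : "my_index_analyzer",
          "search_analyzer" : "standard"
        }
      }
    }
  }
}'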


Project: Elasticsearch for XT commerce shop search

The last project was very exciting: it brought the search of the PHP shop system XT-Commerce, or rather its derivative SEO-Commerce, up to current standards for Zeedee Berlin.

Elasticsearch was hosted on its own Amazon AWS EC2 instance with 1 GB RAM and 1 CPU (very inexpensive).

All of the following functionality can be switched off again in one central place, so that if there are problems with Elasticsearch, the old MySQL search takes over again.

1. AutoComplete / suggest function while typing the search term

While the keyword is being typed, suggestions are already shown within milliseconds. This saves the customer a lot of time and also helps with the spelling.


Elasticsearch 5 cluster with Docker example

With the following docker-compose.yml, a container-based environment can be built quickly with Docker. The memory values for the Java VM (512 MB) and for the Docker container (1 GB) are adjustable.

To run the cluster with 2 nodes, you can uncomment the elasticsearch2 sections.

version: '2'
services:
  elasticsearch1:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    container_name: elasticsearch1
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
#  elasticsearch2:
#    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
#    environment:
#      - cluster.name=docker-cluster
#      - bootstrap.memory_lock=true
#      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
#      - "discovery.zen.ping.unicast.hosts=elasticsearch1"
#    ulimits:
#      memlock:
#        soft: -1
#        hard: -1
#    mem_limit: 1g
#    volumes:
#      - esdata2:/usr/share/elasticsearch/data
#    networks:
#      - esnet

volumes:
  esdata1:
    driver: local
#  esdata2:
#    driver: local

networks:
  esnet:
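
To start the cluster and check that the node is up (a sketch; the Elasticsearch 5.x image ships with X-Pack, so the default elastic/changeme credentials are used here):

# the host may additionally need: sudo sysctl -w vm.max_map_count=262144
docker-compose up -d
curl --user elastic:changeme -XGET 'localhost:9200/_cluster/health?pretty'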

Making Elasticsearch available outside of localhost

To make an Elasticsearch instance available to the outside via HTTP, do the following:

1. Make the server reachable from the outside, e.g. on Amazon AWS by configuring a security group

2. Change elasticsearch.yml (/etc/elasticsearch/elasticsearch.yml):

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 172.44.11.222

3. Restart the Elasticsearch service:

sudo service elasticsearch restart

4. Then you can query the status from the outside to check whether everything worked:

curl -XGET '172.44.11.222:9200/_cluster/health?pretty'

Response:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Elasticsearch Subquery Scoring Optimization

If you want to build a search query in Elasticsearch where you give documents a bonus score depending on how often a property occurs in other documents, you need a subquery. Subqueries are not supported by Elasticsearch, so you need another, better solution.

An example of the subquery problem:

Imagine a CD online shop. You want to score CDs (= documents) higher which

1. match a term query AND

2. whose artist has many other CDs in your shop database

You would need a field in your mapping that aggregates the artist count (artistCount, for example), and at query time you boost the score with the artistCount field, e.g. with a function_score query (custom_score in older Elasticsearch versions):

{
    "query" : {
        "function_score" : {
            "query" : {
                "match_all" : {}
            },
            "script_score" : {
                "script" : "_score + (1 * doc['artistCount'].value)"
            },
            "boost_mode" : "replace"
        }
    }
}

It would also be a good idea to keep the artistCount information in a second index, because the artistCount needs to be recalculated on every update of your shop's inventory.
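
A sketch of how the counts per artist could be recalculated with a terms aggregation (the index cds and the keyword field artist.keyword are assumptions for illustration):

curl -XGET 'localhost:9200/cds/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size" : 0,
  "aggs" : {
    "cds_per_artist" : {
      "terms" : { "field" : "artist.keyword" }
    }
  }
}'

The resulting bucket counts can then be written into the artistCount field (or the second index) whenever the inventory changes.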