Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search. Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near limitless promises of search technology.
Analysis is performed by an analyzer, which is built from one tokenizer, zero or more token filters, and optionally zero or more character mappers (character filters). In Lucene, the tokenizer splits the text into tokens, which are then passed through the token filters.
Token filters are applied sequentially, in the order they are declared. Character mappers run before the tokenizer, on the raw text; for example, you can use one to remove HTML tags.
Info: remove unnecessary content such as HTML tags before indexing, so that it does not skew scoring.
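The order of the analysis steps can be imitated with a small shell pipeline. This is only a sketch of the idea, not how Elasticsearch or Lucene implement it: the function name ‘analyze’ and the three stages chosen here (strip HTML tags, split on non-alphanumeric characters, lowercase) are assumptions made for illustration.

```shell
# analyze TEXT
# Imitates the analysis chain: character mapper -> tokenizer -> token filter.
analyze() {
  printf '%s\n' "$1" |
    sed 's/<[^>]*>//g' |               # character mapper: drop HTML tags
    tr -cs '[:alnum:]' '\n' |          # tokenizer: one token per line
    tr '[:upper:]' '[:lower:]' |       # token filter: lowercase every token
    sed '/^$/d'                        # drop empty tokens left by the split
}
```

Calling analyze '<p>Quick Brown Foxes</p>' prints quick, brown and foxes, one token per line. Swapping the order of the stages changes the result, which is why the character mappers have to run first.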
A query may or may not be analyzed, depending on the query type you choose. For example, the prefix and term queries are not analyzed, while the match query is.
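The difference can be seen by sending the same value through a term query and a match query. The helper below is a minimal sketch: the function name ‘search_query’, the index ‘cars’ and the field ‘brand’ are hypothetical, and the Content-Type header is only required by newer Elasticsearch releases.

```shell
# Base URL of any node; override it through the environment if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# search_query INDEX QUERY_TYPE FIELD VALUE
# Builds a one-clause query body and sends it to the _search endpoint.
search_query() {
  curl -s -XGET "$ES_URL/$1/_search" \
    -H 'Content-Type: application/json' \
    -d "{\"query\": {\"$2\": {\"$3\": \"$4\"}}}"
}

# term:  the value is compared as-is with the indexed terms.
# search_query cars term brand Renault
# match: the value goes through the field's analyzer first.
# search_query cars match brand Renault
```

Assuming the field was indexed with an analyzer that lowercases tokens, the term query for Renault would look for that exact capitalized term and may return nothing, while the match query would be lowercased the same way as the indexed text.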
In Elasticsearch, an index is comparable to a table in MariaDB. Each record is stored as a JSON object called a “document”.
Elasticsearch can run in standalone mode or as a cluster. A cluster implies sharding and replication:
When you send a new document to the cluster, you specify a target index and send it to any available node. In cluster mode, the node that receives the request forwards the document to the node holding the primary shard it belongs to, which then replicates it to the replica shards. Each shard has only one primary copy that accepts writes; if the node holding it goes down, a replica on another node is promoted to primary.
Regarding the JVM parameters, it’s recommended to use a 1G heap (Xmx) for small deployments. Check your logs for OutOfMemoryError exceptions and, if you find any, increase the heap through the ‘ES_HEAP_SIZE’ variable.
Info: avoid allocating more than 50% of your total system memory to the JVM; the operating system needs the rest for its file system cache, which Lucene relies on.
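As a rough sketch of the 50% rule, the helper below computes half of a given amount of memory in megabytes. The function name ‘half_of_ram_mb’ is hypothetical, reading /proc/meminfo assumes a Linux host, and ‘ES_HEAP_SIZE’ is the variable mentioned above (newer releases configure the heap differently).

```shell
# half_of_ram_mb TOTAL_KB
# Prints half of the given memory size (in kB) as a number of megabytes.
half_of_ram_mb() {
  echo $(( $1 / 1024 / 2 ))
}

# Possible usage on Linux, before starting Elasticsearch:
# total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
# export ES_HEAP_SIZE="$(half_of_ram_mb "$total_kb")m"
```

On a machine with 4 GB of RAM this gives 2048, so the heap would be set to 2048m. Treat the result as an upper bound rather than a target: a small deployment can stay at 1G.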
To get information about the nodes, you can use the ‘cat’ API:
> curl -XGET "http://127.0.0.1:9200/_cat/nodes?v&h=name,id,ip,port,v,m"
name  id   ip            port v     m
node1 YbCv 192.168.33.31 9300 1.2.2 m
node2 kXy7 192.168.33.32 9300 1.2.2 m
node3 VNK9 192.168.33.33 9300 1.2.2 *
The interesting part here is the last column (m): ‘*’ marks the elected master node, while ‘m’ marks nodes that are master-eligible but not currently elected.
If everything went well, the response has “created” set to true. Each time the document is updated, its version is incremented automatically. If you do not specify an id, one is generated automatically:
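A minimal sketch of indexing without an id, assuming the index/type URL layout of the 1.x release shown in the node list above. The function name ‘index_auto_id’, the index ‘cars’, the type ‘car’ and the document fields are hypothetical; newer releases replace the type with ‘_doc’ and require the Content-Type header.

```shell
# Base URL of any node; override it through the environment if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# index_auto_id INDEX TYPE JSON
# POST without an id in the path, so Elasticsearch generates one.
index_auto_id() {
  curl -s -XPOST "$ES_URL/$1/$2/" \
    -H 'Content-Type: application/json' \
    -d "$3"
}

# Example call:
# index_auto_id cars car '{"brand": "Renault", "model": "Clio"}'
```

The generated id is returned in the ‘_id’ field of the response, next to ‘_version’ and ‘created’.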
Lucene cannot update a document in place. So when you ask Elasticsearch to update a document, it actually deletes the current one and creates a new one. To modify a document (here the model value), you can do it like this:
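A minimal sketch of a partial update through the ‘_update’ endpoint, again assuming the 1.x URL layout. The function name ‘update_doc’, the index ‘cars’, the type ‘car’, the id 1 and the new model value are hypothetical; in newer releases the path becomes index/_update/id.

```shell
# Base URL of any node; override it through the environment if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# update_doc INDEX TYPE ID JSON_FIELDS
# Sends only the fields to change, wrapped in a "doc" object.
update_doc() {
  curl -s -XPOST "$ES_URL/$1/$2/$3/_update" \
    -H 'Content-Type: application/json' \
    -d "{\"doc\": $4}"
}

# Example call, changing only the model field:
# update_doc cars car 1 '{"model": "Megane"}'
```

The fields not listed in the request are kept as they were, and the version of the document goes up by one.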
Elasticsearch handles concurrency through document versions. If you want to be sure to delete a document only at a given version, you can specify that version; the request fails if the document has changed in the meantime:
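A minimal sketch of a versioned delete, assuming the ‘version’ query parameter of the 1.x release used in this article. The function name ‘delete_at_version’, the index ‘cars’, the type ‘car’, the id and the version number are hypothetical; newer releases use the ‘if_seq_no’ and ‘if_primary_term’ parameters for the same purpose.

```shell
# Base URL of any node; override it through the environment if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# delete_at_version INDEX TYPE ID VERSION
# Deletes the document only if its current version equals VERSION.
delete_at_version() {
  curl -s -XDELETE "$ES_URL/$1/$2/$3?version=$4"
}

# Example call, deleting document 1 only if it is still at version 2:
# delete_at_version cars car 1 2
```

If the document has moved past the given version, Elasticsearch answers with a version conflict error (HTTP 409) and the document is left untouched.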