Elasticsearch – A Basic Introduction

There are three main concepts behind Elasticsearch.

 

  • Document: In database terms, a document is like a row: it represents a given entity, the thing you are searching for. In Elasticsearch, any structured data can work. Every document has a unique ID. We can let Elasticsearch generate it for us, or we can create the ID ourselves, as we will see later in this document. A document also has a type, which describes what sort of data the document holds. We can have many documents that belong to a given type.
  • Type: A type is basically a schema, or mapping, shared by a group of documents. For example, we can have a type that defines what an Apache access log entry looks like, with a mapping that says an Apache access log entry contains things like request URL, status code, request time, etc. In the database analogy, a type is like a table, where you define the individual columns of a given row, or of a document in ES terminology.
  • Index (plural: indices): An index is basically a collection of types you can search across. To search across multiple types, all of those types must be contained in the same index. The index is the highest-level entity we can query in Elasticsearch. It contains a collection of types, each of which contains a collection of documents. So in database terminology, an index is a database, a type is a table, and a document is a row.
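
To make the index / type / document hierarchy concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch at all; the class `TinyIndex`, the index name `logs`, the type name `apache_access`, and the field names are all hypothetical and only illustrate how an index holds types, a type holds documents, and an ID is either supplied by us or generated for us.

```python
import uuid


class TinyIndex:
    """Toy model of the hierarchy: index -> type -> id -> document."""

    def __init__(self, name):
        self.name = name
        self.types = {}  # maps type name -> {document id -> document}

    def add(self, doc_type, document, doc_id=None):
        # If no ID is given, generate one, mimicking an auto-assigned ID.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        self.types.setdefault(doc_type, {})[doc_id] = document
        return doc_id

    def get(self, doc_type, doc_id):
        return self.types[doc_type][doc_id]


logs = TinyIndex("logs")

# A document with an ID we chose ourselves.
own_id = logs.add(
    "apache_access",
    {"request_url": "/index.html", "status_code": 200, "request_time": 0.012},
    doc_id="1",
)

# A document whose ID is generated for us.
auto_id = logs.add(
    "apache_access",
    {"request_url": "/missing", "status_code": 404, "request_time": 0.003},
)

print(logs.get("apache_access", own_id))
print(auto_id)
```

Both documents live under the same type, and the type lives under one index, which mirrors the database / table / row analogy above.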

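Overview: How It Works
TF-IDF = Term Frequency * Inverse Document Frequency
Term frequency is how often a term appears in a given document. Document frequency is how often a term appears across all documents.
Since inverse document frequency is 1 / Document Frequency, the product above equals Term Frequency / Document Frequency, which measures the relevance of a term in a document: a term that is frequent in one document but rare across the others scores high.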
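
The ratio above can be sketched in a few lines of Python. This is only the simplified Term Frequency / Document Frequency ratio from the text, not the scoring formula Lucene actually uses. It assumes whitespace tokenisation and counts document frequency as the number of documents that contain the term, which is one common reading of the definition; the function names and sample documents are made up for illustration.

```python
from collections import Counter


def term_frequency(term, document):
    """How often `term` occurs in a single document (whitespace tokens)."""
    return Counter(document.lower().split())[term.lower()]


def document_frequency(term, documents):
    """In how many documents `term` occurs at least once (assumed definition)."""
    return sum(1 for doc in documents if term.lower() in doc.lower().split())


def relevance(term, document, documents):
    """Simplified relevance: term frequency divided by document frequency."""
    df = document_frequency(term, documents)
    if df == 0:
        return 0.0  # the term appears nowhere, so it has no relevance
    return term_frequency(term, document) / df


docs = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps over the lazy dog",
]

for term in ("the", "fox", "quick"):
    print(term, relevance(term, docs[0], docs))
```

In this toy corpus "the" appears in every document, so it scores lower for the first document than "quick", which appears only there.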

How Elasticsearch Scales

An Elasticsearch index is split into shards, and every shard is a self-contained instance of Lucene. You cannot change the number of primary shards of an index later on; you have to define it up front, when you create the index. This is done in the REST request that creates the index.

We specify the PUT verb with the index name, followed by a settings structure in JSON that defines the number of primary shards and the number of replicas. For example, setting number_of_shards to 3 and number_of_replicas to 1 means we want 3 primary shards and 1 replica of each of them. That gives us a total of 6 shards.
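
As a rough sketch of such a request, the Python snippet below builds (but does not send) a PUT request with the shard settings in a JSON body, and works out the resulting shard count. The host `http://localhost:9200` and the index name `my_index` are assumptions made for illustration; substitute the values for your own cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local Elasticsearch node
INDEX_NAME = "my_index"           # hypothetical index name


def build_create_index_request(primaries, replicas):
    """Build a PUT request that would create the index with the given shard settings."""
    body = {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX_NAME}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def total_shards(primaries, replicas):
    """Each primary shard plus its replicas: primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


request = build_create_index_request(primaries=3, replicas=1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
print("total shards:", total_shards(3, 1))

# To actually send it against a running cluster:
# urllib.request.urlopen(request)
```

The request is only constructed here, so the snippet runs without a cluster; the commented-out last line is what would submit it.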
Risks and Checks: Check whether Java is installed on your system; if it is not, install it. If it is, check that the version is higher than 1.7.*. PHP 7.1 or above is also needed.