What is Autocomplete?

Let's take a very common example. Whenever you go to Google and start typing, a drop-down appears which lists suggestions. Those suggestions are related to the query and help the user complete their query.

[Figure: Suggestions when typing on Google]

Autocomplete, as Wikipedia says:

"Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing"

It is also known as Search as you type or Type Ahead Search. It guides users by prompting them with likely completions and alternatives to the text as they are typing it. It reduces the number of characters a user needs to type before executing a search, thereby enhancing the search experience.

Autocompletion can be implemented using any database. In this post, we will use Elasticsearch to build autocomplete functionality.

Elasticsearch is an open source, distributed, JSON-based search engine built on top of Lucene.

Approaches

There are various approaches to building autocomplete functionality in Elasticsearch. We will discuss the following:

- Prefix Query
- Edge Ngram
- Completion Suggester

Prefix Query

This approach involves using a prefix query against a custom field. The value for this field can be stored as a keyword so that multiple terms (words) are stored together as a single term. This can be accomplished by using the keyword tokeniser. This approach has some disadvantages:

- Since matching is supported only at the beginning of the term, one cannot match the query in the middle of the text.
- This type of query is not optimised for large datasets and may result in increased latency.
- Since this is a query, duplicate results won't be filtered out. One workaround is to use an aggregation query to group the results and then filter out the duplicates, though this involves some extra processing on the server side.

Edge Ngrams

This approach involves using different analysers at index and search time.
When indexing the document, a custom analyser with an edge n-gram filter can be applied. At search time, the standard analyser can be applied, which prevents the query itself from being split into n-grams.

The edge n-gram tokeniser first breaks the text down into words on custom characters (spaces, special characters, etc.) and then keeps only the n-grams that start at the beginning of each word.

Because every word contributes its own prefixes, this approach also works well for matching a query in the middle of the text. It is generally fast at query time but may result in slower indexing and a larger index.

Completion Suggester

Elasticsearch ships with an in-house solution called the Completion Suggester. It uses an in-memory data structure called a Finite State Transducer (FST). Elasticsearch stores the FST on a per-segment basis, which means suggestions scale horizontally as more nodes are added.

Some things to keep in mind when implementing the Completion Suggester:

- The field holding the autosuggest items should be mapped with the completion type.
- An input field can have various canonical or alias names for a single term.
- Weights can be defined for each document to control its ranking.
- Storing all the terms in lowercase helps with case-insensitive matching.
- Context suggesters can be enabled to support filtering or boosting by certain criteria.

This is the ideal approach for implementing autocomplete functionality; however, it also has certain disadvantages:

- Matching always starts at the beginning of the text, so a search for america in the Marvel movie dataset will not yield any result. One way to overcome this is to tokenise the input text on spaces and keep all the resulting phrases as canonical names. This way, Captain America: Civil War is stored under several input phrases rather than one.
- Highlighting of the matched words is not supported.
- No sorting mechanism is available. The only way to sort suggestions is via weights. This creates a problem when a custom ordering, such as alphabetical sort or sort by context, is required.
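The tokenising workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names suffix_phrases and completion_document, and the field name suggest, are hypothetical and do not come from the original post; only the input and weight keys are part of the completion field's document format.

```python
from typing import Dict, List


def suffix_phrases(title: str) -> List[str]:
    """Return every phrase of the title that starts at a word boundary."""
    words = title.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def completion_document(title: str, weight: int = 1) -> Dict:
    """Build a document body for a hypothetical completion field named 'suggest'.

    Each suffix phrase becomes one input, so a prefix search can also match
    a word that sits in the middle of the original title.
    """
    return {"suggest": {"input": suffix_phrases(title), "weight": weight}}


if __name__ == "__main__":
    for phrase in suffix_phrases("Captain America: Civil War"):
        print(phrase)
```

The trade-off is index size: a title with n words is stored as n inputs, so this is best reserved for short strings such as movie titles.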
Implementation

Let's implement the above approaches in Elasticsearch. We will be using Marvel movie data to build our sample index. For easy reference, here is the list of movies:

- Spider-Man: Homecoming
- Ant-man and the Wasp
- Avengers: Infinity War Part 2
- Captain Marvel
- Black Panther
- Avengers: Infinity War
- Thor: Ragnarok
- Guardians of the Galaxy Vol 2
- Doctor Strange
- Captain America: Civil War
- Ant-Man
- Avengers: Age of Ultron
- Guardians of the Galaxy
- Captain America: The Winter Soldier
- Thor: The Dark World
- Iron Man 3
- Marvel’s The Avengers
- Captain America: The First Avenger
- Thor
- Iron Man 2
- The Incredible Hulk
- Iron Man

We will create an index named movies with type marvels. Looking at the mapping, we can see that name contains several sub-fields, each analysed in a different way:

- name.keywordstring is analysed using the keyword tokenizer, hence it will be used for the Prefix Query approach.
- name.edgengram is analysed using the edge n-gram tokenizer, hence it will be used for the Edge Ngram approach.
- name.completion is stored as a completion type, hence it will be used for the Completion Suggester.

We will then index all the movies listed above.

Let's start with the Prefix Query approach and try finding movies beginning with th. The prefix query against name.keywordstring returns the following movies:

- Thor: The Dark World
- Thor: Ragnarok
- The Incredible Hulk
- Thor

The result is fair, but some movies, such as Captain America: The Winter Soldier and Guardians of the Galaxy, are missed because a prefix query only matches at the beginning of the text and not in the middle.

Let's try finding movies beginning with am. Here we do not get any results, although the Captain America movies contain a word that starts with am. This confirms that a prefix query cannot be used to match in the middle of the text.

Let's run the same search but with the Edge Ngram approach.
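To see why the edge n-gram field can match am where the prefix query could not, here is a rough Python sketch. It approximates what an edge n-gram analyser emits and builds a match request body against name.edgengram. The min_gram and max_gram values, the word-splitting rule, and the function names are assumptions made for illustration; the original mapping and query are not reproduced here.

```python
import re
from typing import Dict, List


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Approximate an edge n-gram analyser.

    Lowercases the text, splits it into alphanumeric words, and keeps every
    prefix of each word whose length lies between min_gram and max_gram.
    """
    tokens: List[str] = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size])
    return tokens


def edge_ngram_query(text: str) -> Dict:
    """Request body for a match query on the name.edgengram sub-field."""
    return {"query": {"match": {"name.edgengram": {"query": text}}}}


if __name__ == "__main__":
    # "america" contributes the prefixes a, am, ame, ... so "am" is indexed.
    print("am" in edge_ngrams("Captain America: Civil War"))
    print("am" in edge_ngrams("Thor"))
    print(edge_ngram_query("am"))
```

Because the prefixes are generated per word at index time, the search term only has to equal one of the stored tokens, which is why the query text itself should not be split into n-grams.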
Searching for am with the Edge Ngram approach, we get the following results:

- Captain America: The First Avenger
- Captain America: Civil War
- Captain America: The Winter Soldier

Let's search for Captain America again, but this time with a longer phrase: captain america the. Using the Edge Ngram approach, we get the following movies:

- Captain America: The Winter Soldier
- Captain America: The First Avenger
- Captain America: Civil War
- Thor: The Dark World
- Captain Marvel
- Guardians of the Galaxy
- The Incredible Hulk
- Guardians of the Galaxy Vol 2
- Ant-man and the Wasp
- Marvel’s The Avengers

Given our phrase, only the first two suggestions make sense. The reason so many movies match is the way the match clause works: match includes all the documents which contain captain OR america OR the. Since the field is analysed using n-grams, even more suggestions (if present) get included.

Let's try the suggestion query for the same phrase, captain america the. A suggestion query is written in a slightly different way. We get the following movies as a result:

- Captain America: The First Avenger
- Captain America: The Winter Soldier

Let's try the same query, but this time with a typo: captain amrica the. This query returns no results because it does not enable fuzziness. Once we update the query to enable fuzzy matching, it returns the following results:

- Captain America: The First Avenger
- Captain America: The Winter Soldier

Conclusion

Various approaches can be used to implement autocomplete functionality in Elasticsearch. The Completion Suggester covers most of the cases required for a fully functional and fast autocomplete.
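For reference, the suggestion requests discussed in the implementation section can be sketched as Python dictionaries ready to be serialised to JSON. This is a hedged reconstruction, not the post's original code: the suggester name movie-suggest and the field name.completion appear in the text, but their exact use here, the function name suggest_body, and the fuzziness value of 1 are assumptions.

```python
import json
from typing import Dict, Optional


def suggest_body(prefix: str, fuzziness: Optional[int] = None) -> Dict:
    """Build a completion suggester request body.

    When fuzziness is given, a 'fuzzy' block is added so that prefixes
    containing a typo can still produce suggestions.
    """
    completion: Dict[str, object] = {"field": "name.completion"}
    if fuzziness is not None:
        completion["fuzzy"] = {"fuzziness": fuzziness}
    return {
        "suggest": {
            # "movie-suggest" is the label under which results are returned.
            "movie-suggest": {"prefix": prefix, "completion": completion}
        }
    }


if __name__ == "__main__":
    # Exact-prefix request, then the fuzzy variant for the misspelt phrase.
    print(json.dumps(suggest_body("captain america the"), indent=2))
    print(json.dumps(suggest_body("captain amrica the", fuzziness=1), indent=2))
```

Leaving fuzziness off by default keeps suggestions strict; turning it on trades some precision for tolerance to typing mistakes.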