The first step was to choose a framework for storing all the text in a way that makes searching easy. It had to store millions of text records and search them quickly, so we chose Lucene. As the Apache Lucene website puts it, “Apache Lucene sets the standard for search and indexing performance.”
Once we started using Lucene, we were soon convinced that we had chosen the right framework and were heading in the right direction.
Now let’s talk about autocomplete. How do you build it? Autocomplete is essentially a search based on predictions: you need to predict what the user wants to type.
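One simple way to make this prediction concrete is prefix completion: treat whatever the user has typed so far as a prefix and return known words that start with it. The sketch below is plain Python, not Lucene. `PrefixCompleter`, its `complete` method and the small vocabulary are hypothetical, chosen only for illustration.

```python
import bisect


class PrefixCompleter:
    """Suggest completions for a typed prefix from a fixed vocabulary."""

    def __init__(self, words):
        # Deduplicate, lowercase and sort once. In a sorted list, all words
        # sharing a prefix form one contiguous run.
        self._words = sorted({w.lower() for w in words})

    def complete(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first word >= prefix, i.e. the start of the run.
        start = bisect.bisect_left(self._words, prefix)
        results = []
        for word in self._words[start:]:
            # Stop at the end of the run or once we have enough suggestions.
            if not word.startswith(prefix) or len(results) >= limit:
                break
            results.append(word)
        return results


completer = PrefixCompleter(["search", "searching", "season", "index", "seal"])
print(completer.complete("sea"))  # ['seal', 'search', 'searching', 'season']
```

Keeping the vocabulary sorted means each lookup is a binary search plus a short scan, rather than a pass over every known word.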
The first question was how users would like to receive suggestions. Here are some factors to consider when designing autocomplete:
- Freshness of suggestions
- Dependence on long-term data
- Regional dependence
- Language dependence
The first step was to remove from this list whatever we did not need. We did not need fresh suggestions, which eliminated a lot of complexity, and we did not need special regional matches. Language, however, was still an issue.
With this initial analysis done, we were left with two concerns: searching long-term data and handling language dependence.
We offered complete sentences directly as autocomplete suggestions, because the text was very specific to the search domain. When we talked to our users, however, we learned that they wanted word corrections as they started typing. So while the user is still typing a word (before any space), the suggestions should be only corrections or near matches of that word, plus predicted completions of it. These words must come from text that is already in the system. Lucene helps here too, since it already has a built-in suggestion builder.
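The word-level behaviour described above (predicted completions plus near matches, drawn only from words already in the system) can be sketched without Lucene. This is an illustration, not Lucene's suggester: it swaps in plain Levenshtein edit distance for near matches, and `edit_distance`, `suggest_words`, `max_distance` and the sample vocabulary are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest_words(partial, vocabulary, max_distance=1, limit=5):
    """Suggestions for a word still being typed (no space yet).

    Predicted completions (words starting with `partial`) come first,
    then near matches within `max_distance` edits. Every suggestion is
    taken from `vocabulary`, i.e. words already in the system.
    """
    partial = partial.lower()
    words = sorted({w.lower() for w in vocabulary})
    completions = [w for w in words if w.startswith(partial)]
    corrections = sorted(
        (w for w in words
         if w not in completions and edit_distance(partial, w) <= max_distance),
        key=lambda w: (edit_distance(partial, w), w),
    )
    return (completions + corrections)[:limit]


vocab = ["search", "seat", "index", "indexing", "lucene"]
print(suggest_words("indx", vocab))  # ['index']
```

Comparing a half-typed word against whole words penalises the part not yet typed, which is why exact completions are listed before corrections here; a production system would want a fuzzy prefix match instead.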
Once the user types a space, at least one complete word is known, and it can be used to search for complete sentences. So once the search text contained a space, the suggestions displayed were sentences. Suggestions were drawn first from the user’s language; if the entered text was not found in that language, the English dictionary was searched instead. Once the user chose a word from the English suggestions, all subsequent suggestions came only from English, until the user deleted all the characters in the text box, at which point the whole process started again.
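The sentence stage and the language rules above can be sketched as a small state machine. This is a hypothetical in-memory illustration, not the Lucene setup described here: `SentenceAutocomplete`, `suggest`, `choose` and the sample sentences are invented, and it assumes a sentence matches when it contains every complete word typed so far, using naive whitespace tokenisation.

```python
class SentenceAutocomplete:
    """Space-triggered sentence suggestions with an English fallback.

    Hypothetical sketch: sentences live in plain lists keyed by language
    code, and a sentence matches when it contains every complete word.
    """

    def __init__(self, sentences_by_lang, user_lang, fallback_lang="en"):
        self.sentences_by_lang = sentences_by_lang
        self.user_lang = user_lang
        self.fallback_lang = fallback_lang
        # Set once the user picks a suggestion from the fallback language.
        self.locked_to_fallback = False

    def _search(self, lang, words):
        # Naive whitespace tokenisation; a real index would use an analyzer.
        return [s for s in self.sentences_by_lang.get(lang, [])
                if all(w in s.lower().split() for w in words)]

    def suggest(self, text):
        """Return (language, sentences) for the current text box contents."""
        if text == "":
            # Clearing the text box restarts the whole process.
            self.locked_to_fallback = False
            return None, []
        if " " not in text:
            # No space yet: word-level suggestions apply instead (not shown).
            return None, []
        # Only words followed by a space count as complete.
        words = text.lower().rsplit(" ", 1)[0].split()
        if not words:
            return None, []
        if self.locked_to_fallback:
            langs = [self.fallback_lang]
        else:
            langs = [self.user_lang, self.fallback_lang]
        for lang in langs:
            matches = self._search(lang, words)
            if matches:
                return lang, matches
        return None, []

    def choose(self, lang):
        """Record that the user picked a suggestion from `lang`."""
        if lang == self.fallback_lang:
            self.locked_to_fallback = True


sentences = {
    "de": ["wie funktioniert die suche", "die suche ist schnell"],
    "en": ["how does search work", "search is fast"],
}
ac = SentenceAutocomplete(sentences, user_lang="de")
print(ac.suggest("search "))  # ('en', ['how does search work', 'search is fast'])
```

Keeping the English lock as explicit state makes the reset rule easy to follow: only an empty text box clears it.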
I think this is a very basic version of autocomplete that you can use when you don’t have resources like Google’s and don’t want to spend a lot of time on it.