This article by David Smiley, Eric Pugh, Kranti Parisa, and Matt Mitchell, authors of the book Apache Solr Enterprise Search Server – Third Edition, covers one of the most effective features of a search user interface: automatic/instant-search, or completion of query input in a search input box. It is typically displayed as a drop-down menu that appears automatically after typing. There are several ways this can work.
Monitor your users’ queries!
Even if you don’t plan to do query log completion, you should capture useful information about each request for ancillary usage analysis, especially to monitor which searches return no results. Capture the request parameters, the response time, the result count, and add a timestamp.
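As a minimal sketch of what such capture might look like (the helper name and the JSON-lines format are our own choices, not from the book):

```python
import json
import time

def log_search_request(params, result_count, elapsed_ms):
    """Serialize one search request as a JSON log line for later analysis.

    Captures the request parameters, result count, response time, and a
    timestamp, and flags zero-result searches so they are easy to monitor.
    """
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "params": params,
        "result_count": result_count,
        "elapsed_ms": elapsed_ms,
        "zero_results": result_count == 0,  # the searches worth investigating
    }
    return json.dumps(record)
```

Appending each line to a file makes it trivial to later filter for `"zero_results": true` entries.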
There are other interesting query completion concepts we’ve seen on sites too, and some of these can be combined effectively. First, we’ll cover a basic approach to instant-search using edge n-grams. Next, we’ll describe three approaches to implementing query term completion—it’s a popular type of query completion, and these approaches highlight different technologies within Solr. Lastly, we’ll cover an approach to implement field-value suggestions for one field at a time, using the Suggester search component.
As mentioned in the beginning of this section, instant-search is a technique in which a partial query is used to suggest a set of relevant documents, not terms. It’s great for quickly finding documents by name or title, skipping the search results page.
Here, we’ll briefly describe how you might implement this approach using edge n-grams, which you can think of as a set of token prefixes. This is much faster than the equivalent wildcard query because the prefixes are all indexed. The edge n-gram technique is arguably more flexible than other suggest approaches: it’s possible to do custom sorting or boosting, to use the highlighter easily to highlight the query, to offer infix suggestions (it isn’t limited to matching titles left-to-right), and it’s possible to filter the suggestions with a filter query, such as the current navigation filter state in the UI. It should be noted, though, that this technique is more complicated and increases indexing time and index size. It’s also not quite as fast as the Suggester component.
One of the key components to this approach is the EdgeNGramFilterFactory component, which creates a series of tokens for each input token for all possible prefix lengths. The field type definition should apply this filter to the index analyzer only, not the query analyzer. Enhancements to the field type could include adding filters such as LowerCaseFilterFactory, TrimFilterFactory, ASCIIFoldingFilterFactory, or even a PatternReplaceFilterFactory for normalizing repetitive spaces. Furthermore, you should set omitTermFreqAndPositions=true and omitNorms=true in the field type since these index features consume a lot of space and won’t be needed.
The Solr Admin Analysis tool can really help with the design of the perfect field type configuration. Don’t hesitate to use this tool!
A minimalist query for this approach is to simply query the n-grams field directly; since the field already contains prefixes, this just works. It’s even better to have only the last word in the query search this field while the other words search a field indexed normally for keyword search. Here’s an example, assuming a_name_wordedge is an n-gram-based field and the user’s search text box contains simple mi: http://localhost:8983/solr/mbartists/select?defType=edismax&qf=a_name&q.op=AND&q=simple a_name_wordedge:mi.
The search client here inserted a_name_wordedge: before the last word.
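That client-side rewrite might be sketched like this (a hypothetical helper, assuming whitespace-separated words):

```python
def rewrite_for_instant_search(user_input, edge_field="a_name_wordedge"):
    """Prefix the last word of the query with the edge n-gram field so the
    partial word matches indexed prefixes, while earlier words go through
    the normal qf fields (a_name in our example)."""
    words = user_input.split()
    if not words:
        return user_input
    words[-1] = edge_field + ":" + words[-1]
    return " ".join(words)
```

For the search box contents above, `rewrite_for_instant_search("simple mi")` produces `simple a_name_wordedge:mi`, the q value in the example URL.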
The combination of field type definition flexibility (custom filters and so on), and the ability to use features such as DisMax, custom boosting/sorting, and even highlighting, really make this approach worth exploring.
Most people don’t realize that faceting can be used to implement query term completion, but it can. This approach has the unique and valuable benefit of returning completions filtered by filter queries (such as faceted navigation state) and by query words prior to the last one being completed. This means the completion suggestions should yield matching results, which is not the case for the other techniques. However, this approach has scalability limits: it uses substantial memory, and it is unsuitable for real-time search applications.
Faceting on a tokenized field is going to use an entry in the field value cache (based on UnInvertedField) to hold all words in memory. It will use a hefty chunk of memory for many words, and it’s going to take a non-trivial amount of time to build this cache on every commit during the auto-warming phase. For a data point, consider MusicBrainz’s largest field: t_name (track name). It has nearly 700K words in it. It consumes nearly 100 MB of memory and it took 33 seconds to initialize on my machine. The mandatory initialization per commit makes this approach unsuitable for real-time search applications.
Measure this for yourself. Perform a trivial query to trigger its initialization and measure how long it takes. Then search Solr’s statistics page for fieldValueCache. The size is given in bytes next to memSize. This statistic is also logged quite clearly.
For this example, we have a search box searching track names and it contains the following:
michael ja
All of the words here except the last one become the main query for the term suggest; for our example, this is just michael. If there is no main query, we’d want to ensure that the request handler searches for all documents. The faceted field is a_spell, and we want to sort by frequency. We also want at least one occurrence, and no more than five suggestions. We don’t need the actual search results, either. That leaves the facet.prefix parameter to make this work; it filters the facet values to those starting with the given value.
Remember that facet values are the final result of text analysis, and therefore are probably lowercased for fields you might want to do term completion on. You’ll need to pre-process the prefix value similarly, or else nothing will be found.
We’re going to set this to ja, the last word that the user has partially typed. Here is the URL for such a search: http://localhost:8983/solr/mbartists/select?q=michael&df=a_spell&wt=json&omitHeader=true&indent=on&facet=on&rows=0&facet.limit=5&facet.mincount=1&facet.field=a_spell&facet.prefix=ja.
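Putting the pieces together, a client might build that URL like this (a sketch: the helper name and the fall-back to *:* when there is no main query are our choices, and the lowercasing of the prefix mirrors the analysis of a_spell, as discussed above):

```python
from urllib.parse import urlencode

def term_completion_url(user_input,
                        base="http://localhost:8983/solr/mbartists/select"):
    """Build the faceting-based term-completion URL for the user's input."""
    words = user_input.split()
    prefix = words[-1].lower() if words else ""   # pre-process like the index analysis
    main_query = " ".join(words[:-1]) or "*:*"    # no prior words: match all documents
    params = {
        "q": main_query,
        "df": "a_spell",
        "wt": "json",
        "omitHeader": "true",
        "facet": "on",
        "rows": 0,                 # we don't need the search results
        "facet.limit": 5,          # no more than five suggestions
        "facet.mincount": 1,       # at least one occurrence
        "facet.field": "a_spell",
        "facet.prefix": prefix,    # filter facet values to the typed prefix
    }
    return base + "?" + urlencode(params)
```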
When setting this up for real, we recommend creating a request handler just for term completion with many of these parameters defined there, so that they can be configured separately from your application.
In this example, we’re going to use Solr’s JSON response format. Here is the result:
{
  "response":{"numFound":1919,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "a_spell":[
        "jackson",17,
        "james",15,
        "jason",4,
        "jay",4,
        "jacobs",2]},
    "facet_dates":{},
    "facet_ranges":{}}}
This is exactly the information needed to populate a pop-up menu of choices that the user can conveniently choose from.
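Note that Solr returns each facet field as a flat list that alternates between value and count. A small parsing sketch (the helper name is ours):

```python
import json

def facet_pairs(response_json, field="a_spell"):
    """Turn Solr's flat ["jackson",17,"james",15,...] facet array into
    (value, count) pairs, ready to populate a pop-up menu."""
    data = json.loads(response_json)
    flat = data["facet_counts"]["facet_fields"][field]
    # Even positions are values, odd positions are their counts.
    return list(zip(flat[0::2], flat[1::2]))
```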
However, there are some issues to be aware of with this feature, chiefly the memory consumption and per-commit cache initialization described earlier.
A high-speed approach to implement term completion, called the Suggester, was introduced in Version 3 of Solr. Until Solr 4.7, the Suggester was an extension of the spellcheck component. It can still be used that way, but it now has its own search component, which is how you should use it. Similar to spellcheck, it’s not necessarily as up to date as your index, and it needs to be built. However, this build usually takes only a couple of seconds, and unlike with faceting, you are not forced to do it per commit. The Suggester is generally very fast: a handful of milliseconds per search at most for common setups. The performance characteristics are largely determined by a configuration choice (shown later) called lookupImpl, for which we recommend WFSTLookupFactory for query term completion (but not for other suggestion types). Additionally, the Suggester uniquely includes a method of loading its dictionary from a file that optionally includes a sorting weight.
We’re going to use it for MusicBrainz’s artist name completion. The following is in our solrconfig.xml:
<requestHandler name="/a_term_suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">a_term_suggest</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>aTermSuggester</str>
  </arr>
</requestHandler>

<searchComponent name="aTermSuggester" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">a_term_suggest</str>
    <str name="lookupImpl">WFSTLookupFactory</str>
    <str name="field">a_spell</str>
    <!-- <float name="threshold">0.005</float> -->
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
The first part of this is a request handler definition just for using the Suggester. The second part of this is an instantiation of the SuggestComponent search component. The dictionary here is loaded from the a_spell field in the main index, but if a file is desired, then you can provide the sourceLocation parameter. The document frequency threshold for suggestions is commented here because MusicBrainz has unique names that we don’t want filtered out. However, in common scenarios, this threshold is advised.
The Suggester needs to be built, which is the process of building the dictionary from its source into an optimized memory structure. If you set storeDir, it will also save it such that the next time Solr starts, it will load automatically and be ready. If you try to get suggestions before it’s built, there will be no results. The Suggester only takes a couple of seconds or so to build and so we recommend building it automatically on startup via a firstSearcher warming query in solrconfig.xml. If you are using Solr 5.0, then this is simplified by adding a buildOnStartup Boolean to the Suggester’s configuration.
To be kept up to date, it needs to be rebuilt from time to time. If commits are infrequent, you should use the buildOnCommit setting. We’ve chosen the buildOnOptimize setting as the dataset is optimized after it’s completely indexed; and then, it’s never modified. Realistically, you may need to schedule a URL fetch to trigger the build, as well as incorporate it into any bulk data loading scripts you develop.
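For instance, the URL such a scheduled fetch or bulk-load script would request looks like this, using the SuggestComponent’s suggest.build parameter (the helper name is ours):

```python
from urllib.parse import urlencode

def suggester_build_url(core="mbartists", handler="a_term_suggest",
                        base="http://localhost:8983/solr"):
    """URL that, when fetched, tells the SuggestComponent to rebuild its
    dictionary (e.g. at the end of a bulk data loading script)."""
    return "%s/%s/%s?%s" % (base, core, handler,
                            urlencode({"suggest.build": "true"}))
```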
Now, let’s issue a request to the Suggester. Here’s a completion for the incomplete query string sma: http://localhost:8983/solr/mbartists/a_term_suggest?q=sma&wt=json.
And here is the output, indented:
{
  "responseHeader":{"status":0, "QTime":1},
  "suggest":{"a_term_suggest":{
    "sma":{
      "numFound":5,
      "suggestions":[
        {"term":"sma",     "weight":3,   "payload":""},
        {"term":"small",   "weight":110, "payload":""},
        {"term":"smart",   "weight":50,  "payload":""},
        {"term":"smash",   "weight":36,  "payload":""},
        {"term":"smalley", "weight":9,   "payload":""}]}}}}
If the input is found, it’s listed first; then suggestions are presented in weighted order. In the case of an index-based source, the weights are, by default, the document frequency of the value.
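Extracting the weighted suggestions from that response might be sketched as follows (the helper name is ours):

```python
import json

def parse_suggestions(response_json, dictionary, query):
    """Return (term, weight) pairs in the order Solr emits them: the input
    term first if it was found, then suggestions by descending weight."""
    data = json.loads(response_json)
    block = data["suggest"][dictionary][query]
    return [(s["term"], s["weight"]) for s in block["suggestions"]]
```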
For more information about the Suggester, see the Solr Reference Guide at https://cwiki.apache.org/confluence/display/solr/Suggester. You’ll find information on lookupImpl alternatives and other details. However, some secrets of the Suggester are still undocumented, buried in the code. Look at the factories for more configuration options.
The Terms component is used to expose raw indexed term information, including term frequency, for an indexed field. It has a lot of options for paging into this voluminous data and filtering out terms by term frequency.
The Terms component has the benefit of using no Java heap memory, and consequently, there is no initialization penalty. It’s always up to date with the indexed data, like faceting but unlike the Suggester. The performance is typically good, but for high query load on large indexes, it will suffer compared to the other approaches. An interesting feature unique to this approach is a regular expression term match option. This can be used for case-insensitive matching, but it probably doesn’t scale to many terms.
For more information about this component, visit the Solr Reference Guide at https://cwiki.apache.org/confluence/display/solr/The+Terms+Component.
In this example, we’ll show you how to suggest complete field values. This might be used for instant-search navigation by a document name or title, or it might be used to filter results by a field. It’s particularly useful for fields that you facet on, but it will take some work to integrate into the search user experience. This can even be used to complete multiple fields at once by specifying suggest.dictionary multiple times.
To complete values across many fields at once, you should consider an alternative approach to the one described here. For example, use a dedicated suggestion index of each name-value pair and apply an edge n-gram technique or shingling.
We’ll use the Suggester once again, but with a slightly different configuration. With AnalyzingLookupFactory as the lookupImpl, this Suggester can use one field type for query analysis and a different field as the source for suggestions. Any tokenizer or filter can be used in the analysis chain (lowercasing, stop words, and so on). We’re going to reuse the existing textSpell field type for this example. It will take care of lowercasing the tokens and throwing out stop words.
For the suggestion source field, we want to return complete field values, so a string field will be used; we can use the existing a_name_sort field for this, which is close enough.
Here’s the required configuration for the suggest component:
<searchComponent name="aNameSuggester" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">a_name_suggest</str>
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="field">a_name_sort</str>
    <str name="buildOnOptimize">true</str>
    <str name="storeDir">a_name_suggest</str>
    <str name="suggestAnalyzerFieldType">textSpell</str>
  </lst>
</searchComponent>
And here is the request handler that references it:
<requestHandler name="/a_name_suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">a_name_suggest</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>aNameSuggester</str>
  </arr>
</requestHandler>
We’ve set up the Suggester to build the index of suggestions after an optimize command. On a modestly powered laptop, the build time was about 5 seconds. Once the build is complete, the /a_name_suggest handler will return field values for any matching query. Here’s an example that will make use of this Suggester: http://localhost:8983/solr/mbartists/a_name_suggest?wt=json&omitHeader=true&q=The smashing,pum.
Here’s the response from that query:
{
  "spellcheck":{
    "suggestions":[
      "The smashing,pum",
      {
        "numFound":1,
        "startOffset":0,
        "endOffset":16,
        "suggestion":["Smashing Pumpkins, The"]},
      "collation",
      "(Smashing Pumpkins, The)"]}}
As you can see, the Suggester is able to handle the mixed case, ignore The (a stop word), and even ignore the comma we inserted, as this is how our analysis is configured. Impressive! It’s worth pointing out that there’s a lot more that can be done here, depending on your needs, of course. It’s entirely possible to add synonyms, additional stop words, and different tokenizers to the analysis chain.
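This spellcheck-style response takes a little care to parse, because the suggestions entry is a flat list alternating between keys and values. A sketch (the helper name is ours):

```python
import json

def field_value_suggestions(response_json):
    """Pull the suggested field values out of the Suggester's
    spellcheck-style response, whose "suggestions" list alternates
    between keys (the query, "collation") and their values."""
    items = json.loads(response_json)["spellcheck"]["suggestions"]
    for _, value in zip(items[0::2], items[1::2]):
        if isinstance(value, dict):       # the entry for the query itself
            return value["suggestion"]
    return []
```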
There are other interesting lookupImpl choices. FuzzyLookupFactory can suggest completions that are similarly typed to the input query; for example, words that are similar in spelling, or just typos. AnalyzingInfixLookupFactory is a Suggester that can provide completions from matching prefixes anywhere in the field value, not just the beginning. Others include BlendedInfixLookupFactory and FreeTextLookupFactory. See the Solr Reference Guide for further information.
In this article, we learned about the query completion/suggest feature and the different ways in which it can be implemented.