
Typeahead and autosuggest with pure Solr and Nginx

 3 years ago
source link: http://charlesnagy.info/it/typeahead-and-autosuggest-with-pure-solr-and-nginx

A long time ago I wrote about a very simple technique for quickly adding auto-suggest to websites with the help of Solr: Incredibly fast Solr autosuggest. It used the terms function of Solr which, surprisingly, lets us search for terms.

This solution works well if most searches are single-term, or if most of your searchable content is closely related.

But what happens if your website covers a lot of different domains? Let’s say you’re selling both electronics and clothes. Weird suggestions can arise: typing ‘widescreen t’ may in some cases return ‘widescreen top’ or ‘widescreen trousers’. Is this relevant or close to what you were looking for? Probably not, and most likely it won’t even produce a result. You want the previously typed full words to affect the suggestion, as in the image on the right.

Typeahead with relevancy match

So what we need is a typeahead that takes the previous words into consideration and only offers suggestions that make sense: ones that further filter the result set and are relevant to the already typed words, as in the example below.

[Figure: Autosuggest relevant to the already typed words]

To remedy the situation we can tune our previous query a bit without sacrificing any of the awesomeness of Solr.

We can treat the previous words (if any) as an existing search, and the last word (the one being typed at the moment) as one of the possible facets on the same full-text field. Think of it like this:

Search: widescreen

Facets on full-text terms:

  • tv (312)
  • monitor (27)
  • tablet (12)
  • dvd (3)
Query Solr for facets
$ curl 'http://localhost:8983/solr/select?q=widescreen&facet=on&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1' | pp_json
{
    "facet_counts": {
        "facet_dates": {},
        "facet_fields": {
            "text": [
                "tv",
                312,
                "monitor",
                27,
                "tablet",
                12,
                "dvd",
                3
            ]
        },
        "facet_queries": {},
        "facet_ranges": {}
    },
    "response": {
        "docs": [],
        "numFound": 2497,
        "start": 0
    }
}

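With Solr's default JSON output, each facet field comes back as one flat list that alternates term and count, so a client has to pair the entries up before showing them. A minimal sketch of that step follows; the function names pair_facets and suggestions are made up for illustration, and the field name "text" is taken from the query above.

```python
from typing import List, Tuple, Union


def pair_facets(flat: List[Union[str, int]]) -> List[Tuple[str, int]]:
    """Pair up a flat [term, count, term, count, ...] list."""
    terms = flat[0::2]   # even positions hold the terms
    counts = flat[1::2]  # odd positions hold the counts
    return [(str(term), int(count)) for term, count in zip(terms, counts)]


def suggestions(response: dict, field: str = "text") -> List[Tuple[str, int]]:
    """Extract (term, count) pairs for one facet field from a parsed response.

    Returns an empty list if the response carries no facets for that field.
    """
    facet_fields = response.get("facet_counts", {}).get("facet_fields", {})
    return pair_facets(facet_fields.get(field, []))
```

For example, pair_facets(["tv", 312, "tablet", 12]) returns [("tv", 312), ("tablet", 12)].
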
So all you have to do is keep the facets which match the prefix typed so far. Solr has this feature built in, with the so-called facet.prefix parameter.

Parameter       Value         Description
q               ‘widescreen’  The query string except the last word; if only one word was (partially) typed, this should be ‘*’
facet           ‘on’          Turns faceting on
facet.prefix    ‘t’           The fragment of the last word being typed
facet.field     ‘text’        The name of the full-text field that you’re querying against
wt              ‘json’        Makes the output JSON, which is easier to parse
omitHeader      ‘true’        Drops the response header, which we don’t need
facet.limit     5             Limits the facets (suggestions) to 5
rows            0             Limits the results to 0, because we are not interested in the results at the moment
facet.mincount  1             Only returns facets which will actually have results if searched for

As a URL, this looks like:

Query Solr with the facet.prefix parameter
$ curl 'http://localhost:8983/solr/select?q=widescreen&facet=on&facet.prefix=t&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1' | pp_json
{
    "facet_counts": {
        "facet_dates": {},
        "facet_fields": {
            "text": [
                "tv",
                312,
                "tablet",
                12
            ]
        },
        "facet_queries": {},
        "facet_ranges": {}
    },
    "response": {
        "docs": [],
        "numFound": 2497,
        "start": 0
    }
}

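The mapping from typed text to these parameters can be sketched in a few lines. This is only an illustration under some assumptions: the helper names split_typed, suggest_params and suggest_url are hypothetical, the base URL is the localhost one used in the curl examples, and leaving out facet.prefix when no fragment has been typed yet is a choice made here, not something the parameter list prescribes.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def split_typed(text: str) -> Tuple[str, str]:
    """Split typed input into (completed words, fragment being typed).

    The query falls back to '*' when no completed word exists yet.
    """
    words = text.split()
    if not words:
        return "*", ""
    if text[-1].isspace():
        # Trailing space: every word is complete, no fragment typed yet.
        return " ".join(words), ""
    head, fragment = words[:-1], words[-1]
    return (" ".join(head) if head else "*"), fragment


def suggest_params(text: str, field: str = "text", limit: int = 5) -> Dict[str, str]:
    """Build the Solr parameters for the typed text."""
    query, fragment = split_typed(text)
    params = {
        "q": query,
        "facet": "on",
        "facet.field": field,
        "wt": "json",
        "omitHeader": "true",
        "facet.limit": str(limit),
        "rows": "0",
        "facet.mincount": "1",
    }
    if fragment:
        # Assumption of this sketch: skip the prefix filter when nothing is typed.
        params["facet.prefix"] = fragment
    return params


def suggest_url(text: str, base: str = "http://localhost:8983/solr/select") -> str:
    """Return the full, URL-encoded Solr request for the typed text."""
    return base + "?" + urlencode(suggest_params(text))
```

Here split_typed("widescreen t") gives ("widescreen", "t"), and split_typed("wid") gives ("*", "wid"), matching the single-word case from the parameter list.
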
Setting up Nginx rules

As before, we can set up an nginx location to proxy our query to Solr.

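location ~ ^/suggest/ {
    # $1 = the already typed full words (query), $2 = the fragment being typed (prefix)
    rewrite /suggest/(.*)/(.*) /solr/select?q=$1&facet=on&facet.prefix=$2&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1 break;
    # Replace [SOLR_HOST] with the host running Solr
    proxy_pass        http://[SOLR_HOST]:8983;
}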

Please note that we have used two parameters here: the first is the query (the full words already typed), the second is the fragment, or prefix, of the word being typed.
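
On the calling side, the two parts have to be placed into the /suggest/<query>/<prefix> path that the location block expects. A small sketch of that, assuming a hypothetical helper named suggest_path; how nginx re-encodes the captured groups when it forwards the request to Solr is not covered here and is worth checking against your own setup.

```python
from urllib.parse import quote


def suggest_path(query: str, prefix: str) -> str:
    """Build the /suggest/<query>/<prefix> path for the nginx location."""
    # Fall back to '*' when no full word has been typed yet.
    query = query.strip() or "*"
    # "/" is percent-encoded in both parts so that neither can add an extra
    # path segment; '*' is left readable in the query part.
    return "/suggest/{}/{}".format(
        quote(query, safe="*"),
        quote(prefix.strip(), safe=""),
    )
```

For example, suggest_path("widescreen", "t") returns "/suggest/widescreen/t", and suggest_path("", "wid") returns "/suggest/*/wid".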
