stevenferrer.github.io

Vivamus, moriendum est.

Multi-Select Facet with Solr, Vue and Go

Posted at — Jun 30, 2020

[Image: multi-select facet (amz)]



Introduction

A multi-select facet feature gives users the ability to quickly filter a product catalog and helps them zero in on the product they need. The majority of e-commerce websites support this feature, and if you have shopped online before, you have probably used it.

I needed to support exactly the same feature for one of my projects, but I didn’t know where to start. I didn’t even know what this feature was called. I scoured the internet until, luckily, I found this amazing blog post. It’s a very detailed explanation of how to implement a multi-select facet query in Solr.

In this blog post, we’re going to take it further and implement a web API and a single-page application.

Indexing and querying

The data

We’re going to use smartphones and e-readers as our product data since they have similar attributes. Each record has a set of SKUs. For example, the first record in our data is an Apple iPhone 11 Pro Max, which has 2 variants: the first one is Black with a storage capacity of 256GB, and the other one is Space Grey with a storage capacity of 64GB.

[
  {
    "id": "1",
    "name": "Apple iPhone 11 Pro Max",
    "brand": "Apple",
    "category": "Electronic Devices",
    "productType": "Smartphones",
    "docType": "product",
    "skus": [
      {
        "id": "10",
        "docType": "sku",
        "colorFamily_s": "Black",
        "operatingSystem_s": "iOS",
        "storageCapacity_s": "256GB"
      },
      {
        "id": "11",
        "docType": "sku",
        "colorFamily_s": "Space Grey",
        "operatingSystem_s": "iOS",
        "storageCapacity_s": "64GB"
      }
    ]
  },
  ...
]
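To work with this data in Go, the records can be decoded into plain structs. A minimal sketch, assuming hypothetical type names (SKU, Product) and a hypothetical parseProducts helper; the repository may model the data differently. The JSON tags match the keys in the sample above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SKU mirrors one entry in a product's "skus" array (hypothetical type).
// The attribute keys use the "_s" dynamic-field suffix.
type SKU struct {
	ID              string `json:"id"`
	DocType         string `json:"docType"`
	ColorFamily     string `json:"colorFamily_s"`
	OperatingSystem string `json:"operatingSystem_s"`
	StorageCapacity string `json:"storageCapacity_s"`
}

// Product is a parent document; its SKUs become nested child documents.
type Product struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Brand       string `json:"brand"`
	Category    string `json:"category"`
	ProductType string `json:"productType"`
	DocType     string `json:"docType"`
	SKUs        []SKU  `json:"skus"`
}

// sample is the first record from the data above.
const sample = `[{"id":"1","name":"Apple iPhone 11 Pro Max","brand":"Apple",
"category":"Electronic Devices","productType":"Smartphones","docType":"product",
"skus":[{"id":"10","docType":"sku","colorFamily_s":"Black","operatingSystem_s":"iOS","storageCapacity_s":"256GB"},
{"id":"11","docType":"sku","colorFamily_s":"Space Grey","operatingSystem_s":"iOS","storageCapacity_s":"64GB"}]}]`

// parseProducts decodes a JSON array of products with their SKUs.
func parseProducts(data []byte) ([]Product, error) {
	var products []Product
	err := json.Unmarshal(data, &products)
	return products, err
}

func main() {
	products, err := parseProducts([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, p := range products {
		fmt.Printf("%s has %d SKUs\n", p.Name, len(p.SKUs))
	}
}
```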

Defining our schema

From the above, we have the following list of fields.

You may be asking: why are we only defining those fields? Great question.

The SKU fields colorFamily_s, operatingSystem_s and storageCapacity_s are dynamic fields, which means Solr assigns them a type based on their suffix (_s). This is really handy when you’re dealing with data that has varying attributes, like SKUs. Also, the skus do not need to be defined in the schema, since Solr indexes them as nested child documents. See the Solr documentation on nested child documents for more details.
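Conceptually, Solr resolves a dynamic field by matching the field name against patterns such as *_s, with longer patterns taking precedence. The sketch below only mimics that suffix matching; the rule table is an illustrative subset of Solr's default configset as I recall it, and resolveFieldType is a hypothetical helper, so check the managed-schema of your own configset for the real rules.

```go
package main

import (
	"fmt"
	"strings"
)

// dynamicFieldTypes is an illustrative subset of dynamic field rules
// (assumed from Solr's default configset). Longer suffixes come first
// so that "*_ss" wins over "*_s", mirroring Solr's precedence.
var dynamicFieldTypes = []struct {
	suffix    string
	fieldType string
}{
	{"_ss", "strings"},
	{"_s", "string"},
	{"_i", "pint"},
	{"_b", "boolean"},
}

// resolveFieldType returns the type a "*<suffix>" rule would assign
// to name, or "" if no rule matches.
func resolveFieldType(name string) string {
	for _, r := range dynamicFieldTypes {
		if strings.HasSuffix(name, r.suffix) {
			return r.fieldType
		}
	}
	return ""
}

func main() {
	for _, name := range []string{"colorFamily_s", "tags_ss", "brand"} {
		fmt.Printf("%s -> %q\n", name, resolveFieldType(name))
	}
}
```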

...
fields := []solr.Field{
  {
    Name:    "name",
    Type:    "text_general",
    Indexed: true,
    Stored:  true,
  },
  {
    Name:    "category",
    Type:    "text_gen_sort",
    Indexed: true,
    Stored:  true,
  },
  {
    Name:    "brand",
    Type:    "text_gen_sort",
    Indexed: true,
    Stored:  true,
  },
  {
    Name:    "productType",
    Type:    "string",
    Indexed: true,
    Stored:  true,
  },
  {
    Name:    "docType",
    Type:    "string",
    Indexed: true,
    Stored:  true,
  },
}

err := solrClient.AddFields(ctx, collection, fields...)
...

Indexing the data

Indexing is pretty straightforward: just open the data file, feed it to Solr, and commit. That’s it!

  // open the JSON file containing the data (read-only access is enough)
  f, err := os.Open(dataPath)
  ...
  defer f.Close()

  // send the JSON to Solr
  err = solrClient.Update(ctx, collection, solr.JSON, f)
  ...

  // commit so the new documents become searchable
  err = solrClient.Commit(ctx, collection)
  ...

Now we should be able to see in the Solr admin UI that we have successfully indexed our data. Notice that the fq parameter is set to docType:product, which filters out all non-product documents.

[Image: smartphones query from the Solr admin UI]

But where are the product SKUs? Another great question.

Nested child documents are indexed as regular documents. Internally, Solr keeps track of which SKU documents belong to which product.

To help us tell whether a document is a product or a SKU, we use the docType field. This is the suggested way to handle nested child documents in Solr.

[Image: products query from the Solr admin UI, showing both SKUs and products]

We can include the SKUs in each product by adding the [child] document transformer to the fl parameter.

[Image: smartphones query from the Solr admin UI, with SKUs]
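The admin UI query above boils down to a handful of request parameters. A minimal sketch, assuming a /select request; productQueryParams is a hypothetical helper, and depending on the Solr version and whether the schema defines _nest_path_, the [child] transformer may also need a parentFilter argument.

```go
package main

import (
	"fmt"
	"net/url"
)

// productQueryParams builds /select parameters that return only product
// documents, each with its nested SKUs attached.
func productQueryParams() url.Values {
	params := url.Values{}
	params.Set("q", "*:*")
	params.Set("fq", "docType:product") // keep parent documents only
	// [child] attaches each parent's child documents to the result;
	// some setups need [child parentFilter=docType:product] instead
	params.Set("fl", "*,[child]")
	return params
}

func main() {
	fmt.Println("/solr/products/select?" + productQueryParams().Encode())
}
```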

Query with facet

The query in this example is heavily inspired by the blog post mentioned earlier. I suggest reading it thoroughly before proceeding to the next section.

{
  "query": "{!parent tag=top filters=$skuFilters which=docType:product v=docType:sku}",
  "queries": {
    "skuFilters": ["{!tag=colorFamily_s}colorFamily_s:Black"]
  },
  "filter": ["{!tag=top}brand:Amazon"],
  "facet": {
    "Brand": {
      "facet": { "productCount": "uniqueBlock(_root_)" },
      "field": "brand",
      "limit": -1,
      "type": "terms"
    },
    "Color Family": {
      "domain": {
        "excludeTags": "top",
        "filter": [
          "{!filters param=$skuFilters excludeTags=colorFamily_s v=docType:sku}",
          "{!child of=docType:product filters=$filter v=docType:product}"
        ]
      },
      "facet": { "productCount": "uniqueBlock(_root_)" },
      "field": "colorFamily_s",
      "limit": -1,
      "type": "terms"
    }
  }
}

Let’s break it down a little bit so that we can better understand each part of the query.

Here, we’re using the block join parent query parser. This parser takes a query (v=docType:sku) and filters (filters=$skuFilters) that match the child documents (SKUs) and returns their parents (which=docType:product).

Note the $ (dollar sign) in filters=$skuFilters. This syntax references another parameter in the request body, in this case the skuFilters entry under queries.

Also, we’re specifying tag=top, which will be useful when we want to exclude the product filters in the child facet settings. See the Solr documentation on tagging and excluding filters for more details.

{
  "query": "{!parent tag=top filters=$skuFilters which=docType:product v=docType:sku}",
  "queries": {
    "skuFilters": ["{!tag=colorFamily_s}colorFamily_s:Black"]
  },
  "filter": ["{!tag=top}brand:Amazon"],
  ...
}
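The example only filters on a single color, but a multi-select facet usually ORs the checked values within one facet and ANDs across facets. A minimal sketch of building one such tagged SKU filter, assuming the tag name equals the field name (as with tag=colorFamily_s above); taggedFilter is a hypothetical helper, and values are quoted so multi-word values like "Space Grey" stay intact.

```go
package main

import (
	"fmt"
	"strings"
)

// queryEscaper escapes backslashes and double quotes inside a quoted term.
var queryEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// taggedFilter matches any of the selected values (OR within a facet) and
// tags the filter with the field name so that field's facet can exclude it.
func taggedFilter(field string, values []string) string {
	quoted := make([]string, len(values))
	for i, v := range values {
		quoted[i] = `"` + queryEscaper.Replace(v) + `"`
	}
	return fmt.Sprintf("{!tag=%s}%s:(%s)", field, field, strings.Join(quoted, " OR "))
}

func main() {
	fmt.Println(taggedFilter("colorFamily_s", []string{"Black", "Space Grey"}))
	// prints {!tag=colorFamily_s}colorFamily_s:("Black" OR "Space Grey")
}
```

Each string produced this way would go into the skuFilters array, one entry per facet field that has a selection.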

Next, we’re specifying a terms facet. It produces a list of values of the colorFamily_s field, computed over SKU documents only. We exclude the filters tagged top so they don’t get applied to this facet. Instead, the domain re-applies every SKU filter except the colorFamily_s filter itself, and keeps only the SKUs whose parent products match the product filters. We’ve also included "productCount": "uniqueBlock(_root_)" to count the unique products that match the child SKUs.

{
  ...
  "facet": {
    ...
    "Color Family": {
      "field": "colorFamily_s",
      "type": "terms",
      "limit": -1,
      "facet": {
        "productCount": "uniqueBlock(_root_)"
      },
      "domain": {
        "excludeTags": "top",
        "filter": [
          "{!filters param=$skuFilters excludeTags=colorFamily_s v=docType:sku}",
          "{!child of=docType:product filters=$filter v=docType:product}"
        ]
      }
    }
  }
}
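Since every SKU attribute needs the same facet shape, the definition can be generated per field. A minimal sketch, assuming each SKU filter is tagged with its field name; skuFacet is a hypothetical helper, and the "Operating System" and "Storage Capacity" facet labels are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// skuFacet returns a terms facet over a SKU-level field. The domain drops
// the top-tagged filters, re-applies all SKU filters except this field's
// own, and keeps only SKUs whose parents pass the product filters.
func skuFacet(field string) map[string]any {
	return map[string]any{
		"type":  "terms",
		"field": field,
		"limit": -1,
		"facet": map[string]any{"productCount": "uniqueBlock(_root_)"},
		"domain": map[string]any{
			"excludeTags": "top",
			"filter": []string{
				fmt.Sprintf("{!filters param=$skuFilters excludeTags=%s v=docType:sku}", field),
				"{!child of=docType:product filters=$filter v=docType:product}",
			},
		},
	}
}

func main() {
	facets := map[string]any{}
	for name, field := range map[string]string{
		"Color Family":     "colorFamily_s",
		"Operating System": "operatingSystem_s",
		"Storage Capacity": "storageCapacity_s",
	} {
		facets[name] = skuFacet(field)
	}
	out, err := json.Marshal(facets)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```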

The next facet setting is very similar to the one above, but a lot simpler, since it only applies to top-level documents (products).

{
  ...
  "facet": {
    "Brand": {
      "field": "brand",
      "limit": -1,
      "type": "terms",
      "facet": {
        "productCount": "uniqueBlock(_root_)"
      }
    }
    ...
  }
}

Let’s test our query and see the results.

{
  "Brand": {
    "buckets": [
      {
        "val": "Amazon",
        "count": 1,
        "productCount": 1
      }
    ]
  },
  "Color Family": {
    "buckets": [
      {
        "val": "Black",
        "count": 6,
        "productCount": 5
      },
      {
        "val": "Space Grey",
        "count": 2,
        "productCount": 2
      },
      {
        "val": "Blue",
        "count": 1,
        "productCount": 1
      },
      ...
    ]
  }
}

For each bucket, val is a unique value of the field, count is the number of SKUs that matched, and productCount is the number of unique products that matched. For example, if 2 SKUs matched the query but both belong to the same parent product, count is 2 while productCount is 1.
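Reading these buckets back in Go takes a little care, because the facets section of a Solr response also contains scalar entries such as the top-level count. A minimal sketch; Bucket and parseFacets are hypothetical names, and sampleFacets is an abbreviated input built from the results above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Bucket is one facet value with its SKU count and distinct product count.
type Bucket struct {
	Val          string `json:"val"`
	Count        int    `json:"count"`
	ProductCount int    `json:"productCount"`
}

// parseFacets decodes the "facets" section into buckets keyed by facet name.
func parseFacets(raw []byte) (map[string][]Bucket, error) {
	var facets map[string]json.RawMessage
	if err := json.Unmarshal(raw, &facets); err != nil {
		return nil, err
	}
	result := make(map[string][]Bucket)
	for name, msg := range facets {
		var f struct {
			Buckets []Bucket `json:"buckets"`
		}
		// scalar entries (e.g. "count") do not decode into a struct; skip them
		if err := json.Unmarshal(msg, &f); err != nil {
			continue
		}
		result[name] = f.Buckets
	}
	return result, nil
}

const sampleFacets = `{"count":1,
"Brand":{"buckets":[{"val":"Amazon","count":1,"productCount":1}]},
"Color Family":{"buckets":[{"val":"Black","count":6,"productCount":5}]}}`

func main() {
	facets, err := parseFacets([]byte(sampleFacets))
	if err != nil {
		panic(err)
	}
	for name, buckets := range facets {
		for _, b := range buckets {
			fmt.Printf("%s: %s (%d SKUs, %d products)\n", name, b.Val, b.Count, b.ProductCount)
		}
	}
}
```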

API implementation

Our API will have a /search endpoint that simplifies querying and processing the results for us. Optionally, we can add a /suggest endpoint for an autosuggest feature.

$ curl 'http://localhost:8081/search?q=Apple&colorFamilies=Black%2CGold&operatingSystems=iOS'

Response:

{
  "facets": [
    {
      "buckets": [
        {
          "productCount": 1,
          "skuCount": 1,
          "val": "Apple"
        }
      ],
      "name": "Brand",
      "param": "brands"
    },
    {
      "buckets": [
        {
          "productCount": 2,
          "skuCount": 2,
          "val": "Space Grey"
        },
        {
          "productCount": 1,
          "skuCount": 1,
          "val": "Black"
        },
        {
          "productCount": 1,
          "skuCount": 1,
          "val": "Gold"
        },
        {
          "productCount": 1,
          "skuCount": 1,
          "val": "Midnight Green"
        }
      ],
      "name": "Color Family",
      "param": "colorFamilies"
    },
    ...
  ],
  "products": [
    {
      "brand": "Apple",
      "category": "Electronic Devices",
      "id": "1",
      "name": "Apple iPhone 11 Pro Max",
      "productType": "Smartphones"
    }
  ]
}

Front-end implementation

Our web app will display the filters and the list of matching products. Users can change the filters by checking or unchecking the boxes on the left side of the screen.

[Image: the web app]

Conclusion

For beginners like me, implementing a multi-select facet in Solr is a very intimidating task when you don’t know where to start. Thankfully, Solr has very good documentation, and there are blog posts explaining how to implement it, which guided me in my implementation.

The complete implementation can be found in this repository. If you have any suggestions, you can open an issue in the repository.

References