Faceted search has become a critical feature for improving the user experience in all kinds of search applications.
This article gives you an introduction to faceted search with Apache Solr, and a hands-on example to get you started.
What Is Faceted Search?
Faceted search is the dynamic clustering of items, or search results, into categories that let users navigate those results.
Faceted search, also called faceted navigation or faceted browsing, gives users who run a search a high-level breakdown of their results based on one or more aspects, or facets, of the matching documents. Users can then select filters to drill into those results.
It’s easiest to understand what faceted search is with an example.
For instance, when checking a job website, you expect to see options to filter results by city, job category, industry, or even company name.
In this example, the job category is a facet of the search results, and the facet constraints, or facet values, for this facet include Developer, Administrator, and Designer.
In addition, these filtering options display not only the available values for each facet but also a count of the search results matching each value.
Because only a limited number of values can be displayed on the screen for each facet, search engines often sort the values by popularity, showing first the values that match the most documents. This gives users a bird’s-eye view of their result set without having to look through every single search result.
Selecting a facet value adds a constraint to the search. To narrow the results further, we can click any of the other displayed values. An applied constraint should be removable by clicking its checkbox again.
Faceted search therefore provides an effective way to let users refine search results, continually drilling down until the desired items are found. The benefits include:
- Search feedback – users can see at a glance a summary of the search results and how those results break down by different criteria.
- Reduce dead-end searches – users know how many results match before they click. Values with zero counts are normally removed to reduce visual noise, and eliminate the possibility of a user accidentally selecting a constraint that would lead to no results.
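To make the idea concrete before turning to Solr, here is a minimal sketch of facet counting in plain Python. It is not how Solr computes facets internally; the `jobs` list and the `facet_counts` helper are made up for illustration. It counts the values of one field across the matching documents and returns them sorted by popularity, so values with zero matches never appear.

```python
from collections import Counter

def facet_counts(docs, field, limit=10):
    """Count how many of the given documents carry each value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    # most_common sorts by descending count; values with no matches are absent
    return counts.most_common(limit)

# Hypothetical job postings, used only for this illustration
jobs = [
    {"title": "Backend engineer", "category": "Developer", "city": "Berlin"},
    {"title": "Frontend engineer", "category": "Developer", "city": "Paris"},
    {"title": "Sysadmin", "category": "Administrator", "city": "Berlin"},
    {"title": "UI designer", "category": "Designer", "city": "Paris"},
    {"title": "Data engineer", "category": "Developer", "city": "Berlin"},
]

# "Search" for jobs in Berlin, then facet the matches by category
matches = [job for job in jobs if job["city"] == "Berlin"]
print(facet_counts(matches, "category"))
# prints [('Developer', 2), ('Administrator', 1)]
```

Designer does not show up because no Berlin job has that category, which is exactly the dead-end protection described above.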
Implementing Faceting With Solr
With Solr, it’s relatively simple to get faceting information. Solr offers the following types of faceting, all of which can be requested with no prior configuration:
- Field faceting – Retrieve the counts for all terms or just the top terms in any given field. The field must be indexed.
- Query faceting – Although it’s great to be able to return the top values within any indexed field as a facet, it can also be extremely useful to bring back counts for arbitrary subqueries. This way, you know how many results would match a future search, and you can provide analytics based on that number.
- Range faceting – Return the number of documents that fall within certain ranges. This is particularly useful as a replacement for creating many different query facets to represent multiple ranges of values.
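As a rough sketch of how the three facet types map onto request parameters, the Python helper below builds a select URL using Solr's `facet.field`, `facet.query`, and `facet.range` parameters. The `facet_url` function and the `SELECT_URL` constant are illustrative names, and the host, port, and core name are assumptions that match the local setup used later in this article.

```python
from urllib.parse import urlencode

# Assumed local core; adjust host, port and core name to your setup
SELECT_URL = "http://localhost:8983/solr/tech_products/select"

def facet_url(query, fields=(), queries=(), ranges=()):
    """Build a select URL with field, query and range facet parameters.

    `ranges` is an iterable of (field, start, end, gap) tuples.
    """
    params = [("q", query), ("facet", "true")]
    # Field faceting: one facet.field parameter per field
    params += [("facet.field", f) for f in fields]
    # Query faceting: one facet.query parameter per arbitrary subquery
    params += [("facet.query", fq) for fq in queries]
    # Range faceting: per-field start, end and gap
    for field, start, end, gap in ranges:
        params += [
            ("facet.range", field),
            (f"f.{field}.facet.range.start", start),
            (f"f.{field}.facet.range.end", end),
            (f"f.{field}.facet.range.gap", gap),
        ]
    # urlencode percent-encodes brackets and colons and turns spaces into "+"
    return SELECT_URL + "?" + urlencode(params)

url = facet_url(
    "memory",
    fields=["manu"],
    queries=["price:[0 TO 100]"],
    ranges=[("price", 0, 500, 100)],
)
print(url)
```

Letting `urlencode` handle the escaping avoids hand-writing `%20` and escaped brackets in every request.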
To get started with faceted search, we need some sample data. We’ll use the “techproducts” example data that ships with Solr.
But first, we need to create a Solr core, or index, to hold our data. Assuming that Solr is installed on your local machine, run the following commands to start it and create a core.
# Start the Solr server
cd $SOLR_INSTALL
bin/solr start -p 8983

# Create a Solr core using the sample_techproducts_configs configset
curl -X GET 'http://localhost:8983/solr/admin/cores?action=CREATE&name=tech_products&instanceDir=configsets/sample_techproducts_configs'
The sample data is in the example/exampledocs directory under the Solr installation directory. We’ll use post.jar, a tool that ships with Solr, to index the sample documents.
cd $SOLR_INSTALL/example/exampledocs
java -Dc=tech_products -jar post.jar *.xml
The post.jar tool sends the XML documents to Solr using HTTP POST. To verify that the example documents were added successfully, open the Solr administration console, select the “tech_products” core from the dropdown box, and go to the Query tab. Execute the find-all-documents query (*:*) and you should see all the added documents.
Implementing Field Facet
First, let’s implement some field facets using the tech_products core.
For instance, to implement a manufacturer facet, I’ll send a field faceting command to Solr. This example assumes that a field named “manu” exists in the schema. Usually, “string” is an appropriate field type for faceting, since string fields are indexed as a single token.
Let’s assume for a moment that the user typed “memory” into the search box. The Solr query to retrieve the top “memory” matches would be:
curl -X GET "http://localhost:8983/solr/tech_products/select?q=memory"
For this query, we got back 5 results.
Now we would like to drill down further by manufacturer. Faceting commands can be added to any normal Solr query request, and the facet counts come back in the same query response.
To retrieve facet counts for the “manu” field, we simply add the following parameters to the query request:
curl -X GET "http://localhost:8983/solr/tech_products/select?q=memory&facet=true&facet.field=manu"
The query response will now contain facet count information for the given fields, in addition to the top matches for the query.
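With the default JSON response settings, Solr returns the counts for each faceted field as a flat list that alternates values and counts. The sketch below shows one way to turn that list into pairs. The `facet_field_counts` helper is an illustrative name, and the sample response is a trimmed, made-up stand-in with placeholder vendor names rather than real output from the techproducts data.

```python
def facet_field_counts(response, field, min_count=1):
    """Return (value, count) pairs for one faceted field of a Solr JSON response."""
    flat = response["facet_counts"]["facet_fields"][field]
    # flat looks like [value, count, value, count, ...]; pair even and odd slots
    pairs = zip(flat[0::2], flat[1::2])
    # Drop values below min_count so zero-result constraints are not shown
    return [(value, count) for value, count in pairs if count >= min_count]

# Made-up response fragment, for illustration only
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "manu": ["Vendor A", 2, "Vendor B", 1, "Vendor C", 0],
        }
    }
}

print(facet_field_counts(sample_response, "manu"))
# prints [('Vendor A', 2), ('Vendor B', 1)]
```

The same filtering can be requested from Solr itself by adding `facet.mincount=1` to the query, which keeps zero-count values out of the response altogether.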
Additionally, any number of facet commands can be added to a single query request. To facet on both the “manu” field and the “popularity” field, we would add the following parameters:
curl -X GET "http://localhost:8983/solr/tech_products/select?q=memory&facet=true&facet.field=manu&facet.field=popularity"
Facet counts are always returned in the context of the current query. For example, there may be 100 electronics products from the manufacturer Corsair in the index, but only 2 that match the current search.
Implementing Range Facet
If we request field faceting on the “price” field, we get back counts for individual prices. What we actually want is price ranges, not individual prices.
Since the “price” field is indexed, we can get the facet counts for the following price ranges: $0-$100, $100-$200, $200-$300, and so on up to $500. We simply add a facet.range command to our query request:
curl -X GET "http://localhost:8983/solr/tech_products/select?q=memory&facet=true&facet.field=manu&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=500&f.price.facet.range.gap=100"
We got 1 result between $0 and $100 and 2 results between $100 and $200.
In addition to the standard query results, and any field faceting counts requested, the query response will also contain a facet count for each range.
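In the JSON response, the range counts for each field arrive under `facet_ranges` as a flat list of lower bounds and counts, together with the gap that was requested. The sketch below converts that into explicit (lower, upper, count) buckets. The `range_buckets` helper is an illustrative name, and the sample response is a hand-written fragment shaped like Solr's output, with counts chosen to match the two results quoted above and the remaining buckets assumed to be empty.

```python
def range_buckets(response, field):
    """Return (lower, upper, count) tuples for one range facet of a Solr JSON response."""
    info = response["facet_counts"]["facet_ranges"][field]
    gap = info["gap"]
    flat = info["counts"]  # [lower_bound, count, lower_bound, count, ...]
    buckets = []
    for lower, count in zip(flat[0::2], flat[1::2]):
        lower = float(lower)  # lower bounds are serialized as strings
        buckets.append((lower, lower + gap, count))
    return buckets

# Hand-written fragment for illustration; empty buckets are an assumption
sample_response = {
    "facet_counts": {
        "facet_ranges": {
            "price": {
                "counts": ["0.0", 1, "100.0", 2, "200.0", 0, "300.0", 0, "400.0", 0],
                "gap": 100.0,
                "start": 0.0,
                "end": 500.0,
            }
        }
    }
}

for lower, upper, count in range_buckets(sample_response, "price"):
    if count:
        print(f"${lower:.0f}-${upper:.0f}: {count}")
# prints:
# $0-$100: 1
# $100-$200: 2
```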
Now that we’ve learned how to retrieve facet counts, the next question is how to let the user drill down and narrow the search results with any of those constraints.
The answer is the standard Solr filter query parameter, fq. Search results can be filtered by any number of arbitrary filter queries.
Let’s assume that the user wants to drill down on the $100-$200 constraint from the price facet, to get a new set of results that includes only memory products in that price range.
We pass that constraint in the fq parameter, which filters the results by a query. We also send the relevant faceting commands again, since we want the facet counts to be updated as well.
curl -X GET "http://localhost:8983/solr/tech_products/select?q=memory&facet=true&facet.field=manu&fq=price:\[100%20TO%20200\]"
This query returns only 2 results.
Note that the brackets are escaped with backslashes so that curl does not treat them as a URL globbing range, and that whitespace is URL-encoded as %20. The fq parameter can appear anywhere in the query request; parameter order does not matter.
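A client application typically keeps a list of the constraints the user has applied and rebuilds the request whenever one is clicked. The following sketch shows one possible way to do that; `toggle_filter`, `drill_down_url`, and `SELECT_URL` are illustrative names, and the URL assumes the local tech_products core used in this article.

```python
from urllib.parse import urlencode

# Assumed local core; adjust host, port and core name to your setup
SELECT_URL = "http://localhost:8983/solr/tech_products/select"

def toggle_filter(active, constraint):
    """Return a new filter list with `constraint` added, or removed if already applied."""
    if constraint in active:
        return [c for c in active if c != constraint]
    return active + [constraint]

def drill_down_url(query, facet_fields, filters):
    """Build a select URL that re-requests facets and applies every active filter."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    # Each active constraint becomes its own fq parameter
    params += [("fq", f) for f in filters]
    return SELECT_URL + "?" + urlencode(params)

# The user clicks the $100-$200 price constraint
filters = toggle_filter([], "price:[100 TO 200]")
url = drill_down_url("memory", ["manu"], filters)
print(url)
# prints http://localhost:8983/solr/tech_products/select?q=memory&facet=true&facet.field=manu&fq=price%3A%5B100+TO+200%5D

# Clicking the same constraint again removes it
filters = toggle_filter(filters, "price:[100 TO 200]")
print(filters)
# prints []
```

Here `urlencode` takes care of the escaping, encoding the brackets as `%5B` and `%5D` and the spaces as `+`, so no manual backslashes or `%20` are needed.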
As you’ve seen, faceting provides a fast way to let users see a high-level overview of the kinds of documents their queries match.
With Solr, you have the ability to bring back the top values within each field using field facets, to bring back bucketed ranges of numbers or date values using range facets, or to bring back the counts of any number of arbitrarily complex queries by using query faceting.