Searching for Meaning: What Real World Semantic Search Looks Like

by Marcelo Wiermann (@mwiermann), July 7th, 2023

Too Long; Didn't Read

Semantic Search is a search method for surfacing highly relevant results based on the meaning of the query, context, and content. It allows users to find things more naturally and with better support for nuance than highly sophisticated but rigid traditional relevancy methods. Examples of companies implementing some form of semantic search include eBay, Shopee, Ikea, Walmart, and many others. You can build semantic search experiences by mixing vector databases, traditional search engines, and LLMs.

It’s remarkable how many things are made better by great search. Google made it easy for ordinary people to find whatever they needed online, no matter how obscure. IDEA’s fuzzy matching and symbol search let programmers forget the directory structure of their code bases. AirTag added advanced spatial location to my cat. A well-crafted discovery feature can add the “wow” factor that iconic, habit-forming products have.

In Search of Meaning

Semantic Search is a search method for surfacing highly relevant results based on the meaning of the query, context, and content. It goes beyond simple keyword indexing or filtering.


It allows users to find things more naturally and with better support for nuance than highly sophisticated but rigid traditional relevancy methods.


In practice, it feels like the difference between asking a real person or talking to a machine.


Tech companies from all over the world are racing to add these capabilities to their existing products. Instacart published an extensive article on how they added semantic deduplication to their search experience.


Examples of companies implementing some form of semantic search include eBay, Shopee, Ikea, Walmart, and many others.


Source: Instacart

The reason for this rush towards semantic search is simple: more relevant results = happier customers = more money. Discovery, relevancy, and trustworthiness are some of the hardest problems to solve in e-commerce, and an entire ecosystem exists to help companies solve them.

Vectors to the Rescue


Algolia NeuralSearch. Source: Algolia

There is an emerging group of highly capable semantic search SaaS offerings. A prime example is Algolia’s NeuralSearch - if you want a top-notch, batteries-included system that will take care of most of the complications of implementing search right, this is a great place to look.


Sadly, you are going to pay - a lot. This might be OK for a low to medium-traffic site or a POC, but do your math before you fully commit to them.


Don’t worry though: you can still create an awesome semantic search experience even if you have a more down-to-earth budget. It will just require a bit of doing.


Many companies working on semantic search today are using document embeddings - a way of representing meaning as vectors.
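To make "meaning as vectors" concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-made, hypothetical values chosen for illustration; a real encoder produces vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to compare embeddings, though not the only one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-made "embeddings" (assumption: not produced by a real model).
toy_embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "sneakers for jogging": [0.8, 0.2, 0.1],
    "cast iron skillet": [0.0, 0.1, 0.9],
}


def most_similar(query_text, embeddings):
    """Return the other text whose vector is closest to the query's vector."""
    query_vec = embeddings[query_text]
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in embeddings.items()
        if text != query_text
    ]
    return max(scored)[1]
```

Note that "running shoes" and "sneakers for jogging" share no keywords, yet their vectors point in nearly the same direction. That is the property a keyword index cannot capture.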


Since semantic search alone may not return enough relevant hits, traditional full-text indexing is used as a backup method. A feedback loop tracks user interactions and uses them to re-rank results, pushing the most relevant ones to the top.


This is what the architecture looks like:


[Figure: Query and feedback loop]

This system has three key processes: indexing, querying, and tracking.


Indexing is done by converting a document’s content to an embedding vector through a text-to-vector encoder (ex: OpenAI’s Embeddings API) and inserting it into a vector database (ex: Qdrant, Milvus, Pinecone, etc.).


Documents are also indexed in a traditional full-text search engine (ex: Elasticsearch). This combination is usually referred to as “hybrid search.”
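The dual indexing step can be sketched as follows. This is an illustration only: `toy_encode` is a hypothetical stand-in for a real encoder (it hashes tokens into buckets and captures no meaning), and the two dictionaries stand in for a vector database and a full-text engine. None of these names come from a real library.

```python
import re
from collections import defaultdict

DIMENSIONS = 8  # arbitrary size for the toy vectors


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_encode(text):
    """Hypothetical encoder: counts tokens per hash bucket. Not semantic."""
    vec = [0.0] * DIMENSIONS
    for token in tokenize(text):
        bucket = sum(ord(c) for c in token) % DIMENSIONS  # deterministic hash
        vec[bucket] += 1.0
    return vec


class HybridIndex:
    """Writes every document to both a vector store and an inverted index."""

    def __init__(self, encode):
        self.encode = encode
        self.vectors = {}                  # doc_id -> embedding (vector DB stand-in)
        self.postings = defaultdict(set)   # token -> doc_ids (full-text stand-in)

    def index(self, doc_id, text):
        self.vectors[doc_id] = self.encode(text)
        for token in tokenize(text):
            self.postings[token].add(doc_id)
```

In a real system the two writes would go to separate services, so it is worth deciding up front how to handle the case where one write succeeds and the other fails.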


Querying relies on encoding incoming queries into vectors and querying the vector database using them. The encoder should be the same one used in the indexing step, since vectors produced by different encoders are generally not comparable. These results are then combined with traditional full-text results and re-ranked for relevancy.
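One simple way to combine the two result lists is reciprocal rank fusion (RRF). The article does not say which fusion method to use, so treat this as one possible choice, not the author's method. The document IDs are made up, and `k=60` is a conventional constant rather than a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of doc IDs; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the vector database and the full-text engine.
vector_hits = ["doc_b", "doc_a", "doc_c"]
keyword_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF only looks at rank positions, so it sidesteps the problem that vector similarity scores and full-text scores live on different scales. Documents found by both systems naturally rise above documents found by only one.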


Search re-ranking is usually a complex problem, and often relies on a mix of machine learning and heuristics.


Tracking involves capturing important user interactions - ex: clicking on results, liking items, etc. - and using these events to update the machine learning models involved in re-ranking.


This provides a feedback loop that uses user input to continuously improve relevancy. Snowplow is an example of a capable tracking system.
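The feedback loop can be sketched with a plain heuristic standing in for the machine learning models. Everything here is an assumption made for illustration: the class and function names are hypothetical, the smoothed click rate is a simple substitute for a learned ranking model, and the prior values and blending weight are arbitrary.

```python
from collections import defaultdict


class ClickFeedback:
    """Counts impressions and clicks per (query, doc_id) pair."""

    def __init__(self, prior_clicks=1.0, prior_impressions=10.0):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)
        # Priors keep rarely shown documents from getting extreme rates.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions

    def track_impression(self, query, doc_id):
        self.impressions[(query, doc_id)] += 1

    def track_click(self, query, doc_id):
        self.clicks[(query, doc_id)] += 1

    def click_rate(self, query, doc_id):
        """Smoothed click-through rate for this query and document."""
        key = (query, doc_id)
        clicks = self.clicks.get(key, 0) + self.prior_clicks
        shown = self.impressions.get(key, 0) + self.prior_impressions
        return clicks / shown


def rerank(query, scored_hits, feedback, weight=0.5):
    """Blend each hit's base relevance score with its observed click rate.

    scored_hits is a list of (doc_id, base_score) pairs.
    """
    def blended(hit):
        doc_id, base_score = hit
        return (1 - weight) * base_score + weight * feedback.click_rate(query, doc_id)

    return [doc_id for doc_id, _ in sorted(scored_hits, key=blended, reverse=True)]
```

With no recorded events, every document gets the same prior click rate, so the base scores decide the order. As clicks accumulate, documents that users actually choose move up, which is the feedback loop in miniature.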

What’s Next?

If you have the budget for a SaaS solution, then congratulations: you are well on your way to impressing your users with a spanking new search function. If, like most of us, you are not made of money, then it’s time to roll up your sleeves.


Implementation can be a daunting challenge. If you need any help, I wrote about the subject to help you get started. In either case, you should seriously consider whether your users could benefit from semantic search.


It’s a hard problem to solve, but the upside is definitely there and users are getting more used to this raised bar every day. Happy searching!