my-notes

This project is maintained by spoddutur

Information Retrieval via SemanticSearch on LinkedData

This blog is a continuation of part 1. So far, we have covered:

  1. How traditional keyword search works and how it falls short in capturing user intent
  2. The challenges in capturing the intent of the searcher

In this blog, I am going to talk about addressing one of those challenges, i.e., the Vocabulary Problem.

3. Solution1 - How to capture user intent

Let’s see how we can capture user intent by addressing the Vocabulary Problem. Note that query expansion, the technique described here, is considered the most natural and successful approach to this problem.

3.1 Concern with the Vocabulary Problem:

We should capture the semantics of the user query, rather than just its keywords.

3.2 Addressing Vocabulary Problem via Query Expansion:

There are 2 steps involved here:

  1. Allow for the fact that indexers and users often use related words and synonyms rather than exactly the same words.
  2. Expand the original query with suitable words that best capture the actual user intent.

3.3 How to expand the original query with suitable words?

  1. The user inputs a query in natural language.
  2. Use tools like StanfordParser to identify the noun phrases and other grammatical structure in the query.
  3. Obtain sets of related words and synonyms for the words in the query from an ontology and the WordNet API.
  4. Add these words to the original query to form the new query.
  5. Send the refined query to the Search API, which fetches the results related to the user query.

The following diagram depicts this flow: image
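The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used in this blog. The `SYNONYMS` table is a hypothetical, hand-written stand-in for a WordNet or ontology lookup. The `STOPWORDS` filter is a crude stand-in for the grammatical analysis a real parser would do. The function names `extract_terms` and `expand_query` are made up for this example.

```python
import re

# Hypothetical synonym table standing in for a WordNet / ontology lookup.
SYNONYMS = {
    "players": ["athletes", "sportsmen"],
    "team": ["club", "squad"],
    "basket": ["basketball"],
}

# Small stopword list standing in for a real parser's grammatical analysis.
STOPWORDS = {"in", "the", "of", "a", "an", "for"}


def extract_terms(query):
    """Step 2 (simplified): tokenize and keep only content words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]


def expand_query(query):
    """Steps 3-4: append related words for each content term."""
    terms = extract_terms(query)
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


# Step 5 would send this expanded term list to the search API.
print(expand_query("USA players in Catalan basket team"))
```

The original terms are kept at the front of the list so that a search backend could, if it supports it, weight them higher than the added synonyms.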

3.4 Example run

Traditional keyword search will not be able to understand the difference between "USA Players in Catalan basket team" and "Catalan Players in USA teams". Such cases are not a problem for semantic search.
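To see why the two queries look alike to a keyword matcher, here is a small sketch that reduces each query to an unordered set of keywords. The `keyword_set` helper and its naive plural stripping are assumptions made for illustration. Real keyword engines use proper tokenizers and stemmers, but they likewise ignore who plays the role of player and who the role of team.

```python
import re


def keyword_set(query):
    """Lowercase, tokenize, and naively strip a trailing plural 's'."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return {t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens}


q1 = keyword_set("USA Players in Catalan basket team")
q2 = keyword_set("Catalan Players in USA teams")

# Keywords the two queries have in common, regardless of word order.
print(sorted(q1 & q2))
# Keywords that appear in only one of the two queries.
print(sorted(q1 ^ q2))
```

Under these assumptions the two sets differ only by the word "basket", even though the queries ask for opposite things. The relationship between "USA", "Catalan", "players" and "team" lives in the word order and grammar, which a bag of keywords throws away.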

3.7 Disadvantages:

3.8 Solution2:

A more generic approach to address some of the above-mentioned disadvantages is here

References: