Information Retrieval via Semantic Search on Linked Data

This blog is a continuation of part 1. So far, we have covered:

How traditional keyword search works and where it falls short in capturing user intent
The challenges in capturing the intent of the searcher

In this blog, I am going to talk about addressing one of those challenges, i.e., the vocabulary problem.

3. Solution1 - How to capture user intent

Let’s see how we can capture user intent by addressing the vocabulary problem. Note that query expansion, the technique described here, is generally regarded as one of the most natural and successful techniques for this problem.

3.1 Concern with Vocabulary problem:

We should capture the semantics (rather than just the keywords) of the user query.

3.2 Addressing Vocabulary Problem via Query Expansion:

There are 2 steps involved here:

Account for the fact that indexers and users often use related words and synonyms rather than the same words.
Expand the original query with suitable words that best capture the actual user intent.

3.3 How to expand the original query with suitable words?

The user inputs a query in natural language.
Use a tool such as the Stanford Parser to identify the noun phrases and other grammatical structure in the query.
Obtain related synonym sets for the words in the query from an ontology and the WordNet API.
Add these words to the original query to form the new query.
Send the refined query to the Search API, which fetches the results related to the user query. The following diagram depicts the flow:

3.4 Example run

Step 1 - User Query: name of football clubs in EEFA
Step 2 - Parsed words for this user query using Stanford Parser:
Step 3 - WordNet and ontology synonym words: list, soccer
Step 4 - Expanded query: name or list the football soccer clubs in EEFA
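
The expansion steps from section 3.3, applied to the example query above, can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions: the SYNONYMS table is a hand-written, hypothetical stand-in for lookups against WordNet and an ontology, and the STOPWORDS filter is a crude substitute for the grammatical analysis that a parser such as the Stanford Parser would provide. Neither reflects the real APIs of those tools.

```python
# Hypothetical synonym table: a stand-in for WordNet / ontology lookups.
SYNONYMS = {
    "name": ["list"],
    "football": ["soccer"],
}

# Hypothetical stopword list: a crude substitute for real parsing.
STOPWORDS = {"of", "in", "the", "a", "an"}


def content_words(query):
    """Return the lowercased words of the query that are not stopwords."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


def expand_query(query):
    """Return the content words followed by any synonyms found for them."""
    terms = content_words(query)
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:  # avoid adding a word twice
                expanded.append(synonym)
    return expanded


print(expand_query("name of football clubs in EEFA"))
```

The returned list keeps the original terms first and appends the synonyms after them, so a search backend could, for instance, weight the original words higher than the added ones. That weighting is a design choice outside the scope of this sketch.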

3.5 Advantages of semantic search over traditional keyword search:

Traditional keyword search cannot tell the difference between “USA players in Catalan basketball team” and “Catalan players in USA teams”, because both queries contain almost the same keywords. Such cases are not a problem for semantic search.
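
To see why pure keyword matching struggles here, consider a small Python sketch that treats each query as an unordered bag of words. The two queries below are simplified versions of the ones above (the wording is adjusted so both use exactly the same words), and bag_of_words is a hypothetical helper written for this illustration, not part of any search library.

```python
from collections import Counter


def bag_of_words(query):
    """Represent a query as an unordered multiset of lowercased words."""
    return Counter(query.lower().split())


query_1 = "USA players in Catalan teams"
query_2 = "Catalan players in USA teams"

# A keyword-only view sees the two queries as identical...
same_keywords = bag_of_words(query_1) == bag_of_words(query_2)

# ...even though the word order, and therefore the meaning, differs.
same_order = query_1.lower().split() == query_2.lower().split()

print(same_keywords, same_order)
```

The keyword view reports the queries as equal while the ordered view does not, which is the distinction a semantic approach needs to preserve.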

3.6 Some sample queries to realise the potential of Semantic search:

Query-1 List the team names in EEFA
Query-2 Events of Olympics
Query-3 Persons who won medal in Olympics
Query-4 Persons who won medal in chess
Query-5 What are the upcoming sports events in Europe?
Query-6 What are sports concepts?
Query-7 200 meter players
Query-8 Martial arts sports

3.7 Disadvantages:

These approaches are typically based on a single knowledge source such as WordNet or Wikipedia.
They are bound to the specific structure of the knowledge source, which is assumed to be known a priori.
They are highly restricted by the scope of the knowledge source (WordNet or Wikipedia in this case).
This method may not detect the different senses of ambiguous keywords. For example, the word “bass” could mean “fish” or “music” depending on the user context.
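
The ambiguity issue in the last point can be illustrated with a short Python sketch. The SENSES table below is a hypothetical, hand-written sense inventory made up for this example (it is not real WordNet output), and naive_expand is an illustrative helper that expands a word without looking at context.

```python
# Hypothetical sense inventory for illustration only (not real WordNet data).
SENSES = {
    "bass": {
        "fish": ["perch", "sea bass"],
        "music": ["bass guitar", "low-pitched"],
    },
}


def naive_expand(word):
    """Collect the synonyms of every sense of the word, ignoring context."""
    senses = SENSES.get(word, {})
    return [synonym for synonyms in senses.values() for synonym in synonyms]


# Both the fish sense and the music sense end up in the expanded query.
print(naive_expand("bass"))
```

Because the expansion ignores the user context, a query about fishing would also pull in music-related terms, and vice versa.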

3.8 Solution2:

A more generic approach that addresses some of the above-mentioned disadvantages is described here.
