my-notes

This project is maintained by spoddutur

Information Retrieval via SemanticSearch on LinkedData

Semantic search tries to understand the intent of the searcher through the contextual meaning of the search keywords, which produces more relevant results and thereby improves the accuracy of the search. Two of the most fundamental “semantic” techniques are “named entity identification” and “semantic ambiguity resolution”. Good solutions to both are critical for high precision in information retrieval. In this blog, I’ll cover the following topics:

  1. Compare the traditional keyword search with semantic search
  2. Challenges in semantic search
  3. Solution1 - How to capture user intent
  4. Solution2 - A more generic approach for capturing user intent
  5. Conclusion

1. Traditional Method for Document Retrieval:

Searching is done in four stages in classical search engines:

  1. Document indexing: Simply index documents :)
  2. Term weighting: The importance of each term used within a document is calculated with the help of term frequency.
  3. Similarity coefficients: Documents and queries are represented as vectors of term weights.
  4. Retrieval: Documents are ranked by the cosine similarity between their vectors and the query vector.
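The four stages above can be sketched in a few lines of Python. This is only a minimal illustration, assuming raw term frequency as the weight and whitespace tokenization; the function names (`tf_vector`, `cosine`, `search`) and the sample documents are made up for this example and are not taken from any particular search engine.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    return text.lower().split()


def tf_vector(text):
    # Term weighting: raw term frequency of each token in the text.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    # Document indexing: one term-weight vector per document.
    index = {doc_id: tf_vector(text) for doc_id, text in docs.items()}
    # The query is represented in the same vector space.
    q = tf_vector(query)
    # Retrieval: score every document and sort by descending similarity.
    scores = [(doc_id, cosine(q, vec)) for doc_id, vec in index.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = {
    "d1": "apple releases a new phone",
    "d2": "apple pie recipe with fresh apple slices",
    "d3": "television sales rise",
}
ranking = search("apple phone", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Note that the ranking depends purely on shared tokens: a document that shares no exact term with the query scores zero, whatever it is actually about.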

Obvious disadvantages:

Term mismatch is the most concerning problem for effective information retrieval. It shows up in several forms, namely:

  1. Vocabulary problem:
    • The words with which the documents are indexed are not the same as the words in the user query
  2. Synonymy:
    • Different words with the same meaning (Ex: “television” and “tv”)
    • Synonymy may result in a failure to retrieve relevant documents
    • Decreases recall
  3. Polysemy:
    • The same word with different meanings (Ex: “apple” as company [vs] fruit)
    • Polysemy may cause retrieval of erroneous or irrelevant documents
    • Decreases precision of retrieval
  4. Hypernymy and Hyponymy:

image

  5. Meronymy and Holonymy:
    • A meronym is a word that denotes a part or member of something.
    • The opposite of a meronym is a holonym.
    • For example, finger is a meronym of hand and hand is the holonym of finger

The items listed above are some of the challenges that lead to term mismatch and thereby affect search efficiency.
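To make term mismatch concrete, here is a small sketch of exact-term matching, assuming whitespace tokenization. The `matches` helper and the sample documents are hypothetical and exist only for this illustration. It shows two of the failure modes: a query that uses a different word for the same thing (“tv” vs “television”) finds nothing, while a query with an ambiguous word (“apple”) pulls in documents about both senses.

```python
def matches(query, docs):
    # Return ids of documents that share at least one exact token with the query.
    q_terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if q_terms & set(text.lower().split())
    )


docs = {
    "d1": "television prices drop this year",
    "d2": "apple unveils a new laptop",
    "d3": "apple orchards report a record harvest",
}

# Different word, same meaning: the relevant document d1 is missed (recall suffers).
synonym_hits = matches("tv", docs)

# Same word, different meanings: both the company and the fruit document
# come back, even if the user meant only one of them (precision suffers).
polysemy_hits = matches("apple", docs)

print(synonym_hits)
print(polysemy_hits)
```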

Continuation in part2..

So far, we’ve seen how traditional search fails to retrieve documents with similar context, and the challenges in capturing user intent. Please find the continuation of this post, which covers solutions for performing contextual search, here

References: