Semantic search tries to understand the searcher's intent through the contextual meaning of the search keywords, so that it returns more relevant results and improves search accuracy. Two of the most fundamental “semantic” techniques are “named entity identification” and “semantic ambiguity resolution”. Good solutions to both are critical for high precision in information retrieval.
In this blog, I’ll cover the following topics:
- Traditional keyword search compared with semantic search
- Challenges in semantic search
- Solution 1 - How to capture user intent
- Solution 2 - A more generic approach for capturing user intent
- Conclusion
1. Traditional Method for Document Retrieval:
Searching is done in four stages in classical search engines:
- Document indexing: Simply index documents :)
- Term weighting: The importance of each term within a document is calculated from its term frequency.
- Similarity coefficients: Documents and queries are represented as vectors of term weights.
- Retrieval: Documents are ranked by the cosine similarity between the query vector and each document vector.
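The four stages above can be sketched in a few lines of Python. This is only a minimal illustration, assuming raw term frequency as the weight and whitespace tokenization; the function names (`build_index`, `cosine`, `search`) and the three sample documents are made up for this example and are not taken from any particular search engine.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real engines also strip punctuation, stem, etc.
    return text.lower().split()

def build_index(docs):
    # Stages 1 and 2: index each document as a {term: raw term frequency} mapping
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

def cosine(a, b):
    # Stage 3: treat both term-weight mappings as sparse vectors and compare them
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, index):
    # Stage 4: rank documents by cosine similarity, dropping those with no shared term
    query_vec = Counter(tokenize(query))
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents
docs = {
    "d1": "cheap television deals",
    "d2": "cheap tv deals and tv stands",
    "d3": "fresh apple juice",
}
index = build_index(docs)
results = search("tv deals", index)
```

With these sample documents, `search("tv deals", index)` ranks `d2` ahead of `d1`, while `search("tv", index)` returns only `d2`: the document about a "television" shares no term with the query, so it gets a score of zero.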
Obvious disadvantages:
- Search keywords must be precise
- Documents with a similar context but different terms won’t be retrieved
2. Challenges in Semantic Search:
Term mismatch is the biggest obstacle to effective information retrieval. It shows up in several forms:
- Vocabulary problem:
- The words on which the documents are indexed and the words in the user query are not the same
- Synonymy:
- Different words with the same meaning (Ex: “television” and “tv”)
- Synonymy may result in a failure to retrieve relevant documents
- Decreases recall
- Polysemy:
- The same word with different meanings (Ex: “apple” as company [vs] fruit)
- Polysemy may cause retrieval of erroneous or irrelevant documents
- Decreases precision
- Hypernymy and Hyponymy:
- A hypernym and its hyponyms can be placed in a hierarchy
- The more general hypernym sits higher in the hierarchy and the more specific hyponyms sit below it, as shown in the picture above
- Meronymy and Holonymy:
- A meronym is a word that denotes a part or member of something.
- The opposite of a meronym is a holonym.
- For example, finger is a meronym of hand and hand is the holonym of finger
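To make the effect on recall and precision concrete, here is a small sketch using exact term matching. Synonymy (different words, same meaning) costs recall, and polysemy (same word, different meanings) costs precision. The documents, the relevance judgments and the two lookup tables are hypothetical, chosen only to mirror the “tv”/“television”, “apple” and finger/hand examples; the sparrow/bird/animal hierarchy is an assumed illustration of hypernymy.

```python
def retrieve(query_term, docs):
    # Exact term matching: a document is returned only if it contains the term verbatim
    return {doc_id for doc_id, text in docs.items() if query_term in text.lower().split()}

def precision_recall(retrieved, relevant):
    # precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical sample documents
docs = {
    "d1": "wall mount for a 55 inch television",
    "d2": "tv remote control replacement",
    "d3": "apple unveils a new phone",
    "d4": "apple pie with cinnamon",
}

# Synonymy: the user wants everything about televisions but types "tv".
# d1 is relevant yet never retrieved, so recall drops.
tv_retrieved = retrieve("tv", docs)
tv_precision, tv_recall = precision_recall(tv_retrieved, relevant={"d1", "d2"})

# Polysemy: the user means Apple the company.
# d4 (the fruit) is retrieved too, so precision drops.
apple_retrieved = retrieve("apple", docs)
apple_precision, apple_recall = precision_recall(apple_retrieved, relevant={"d3"})

# Hypernymy and holonymy as plain lookup tables (assumed toy data)
HYPERNYM_OF = {"sparrow": "bird", "eagle": "bird", "bird": "animal"}
HOLONYM_OF = {"finger": "hand"}  # finger is a meronym of hand

def hypernym_chain(term):
    # Walk up the hierarchy from the specific term to its most general ancestor
    chain = []
    while term in HYPERNYM_OF:
        term = HYPERNYM_OF[term]
        chain.append(term)
    return chain
```

In this toy setup the “tv” query keeps perfect precision but finds only half of the relevant documents, while the “apple” query finds every relevant document but half of what it returns is about the fruit.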
The items listed above are some of the challenges that lead to term mismatch and thereby affect search efficiency.
Continued in part 2.
So far, we’ve seen how traditional search fails to retrieve documents with similar context, and the challenges in capturing user intent. Please find the continuation, which covers solutions for performing contextual search, here
References: