my-notes

This project is maintained by spoddutur

Information Retrieval via SemanticSearch on LinkedData

This blog is a continuation of part 2. So far, we have covered:

  1. How traditional keyword search works and how it falls short in capturing user intent
  2. The challenges in capturing the intent of the searcher
  3. Solution1 - How to capture user intent

In this blog, we discuss a more generic solution for capturing user intent.

4. Solution2 - A more generic approach for capturing user intent

4.1 Generic approach to get Semantic Synonyms

Generic Query Expansion via Mapping Keywords to LinkedData Resources: As the title suggests, this generic method expands a query by mapping the user's keywords to Linked Data resources and collecting their semantic synonyms from the owl:Class and rdf:Property labels of the dataset.

Goal: Given a keyword w, find representative resources in Linked Data.

Input Dataset: The dataset we operate on consists of a set of resources R stored as Linked Data, where every resource r ∈ R has a set of properties that denote specific relationships between resources.

Labelling Properties: We have chosen a set of labelling properties, i.e. properties whose values are expected to be literals that help identify distinct concepts, e.g. rdfs:label, foaf:name, dc:title, skos:prefLabel, skos:altLabel, fb:type.object.name.
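To make the Goal and the labelling properties concrete, here is a minimal Python sketch that models the dataset as a list of (subject, predicate, object) triples and finds candidate resources for a keyword w by matching it against the values of the labelling properties. The toy triples, the `dbr:` resource names and the function `find_resources` are hypothetical; a real system would query a triple store (for example with SPARQL) rather than scan a Python list.

```python
# Hypothetical in-memory dataset of (subject, predicate, object) triples.
# Literal values are plain strings; resources use prefixed names like "dbr:Honda".
TRIPLES = [
    ("dbr:Honda", "rdfs:label", "Honda"),
    ("dbr:Honda", "foaf:name", "Honda Motor Co., Ltd."),
    ("dbr:Honda_Civic", "rdfs:label", "Honda Civic"),
    ("dbr:Tokyo", "rdfs:label", "Tokyo"),
]

# The labelling properties listed in the text.
LABELLING_PROPERTIES = {
    "rdfs:label", "foaf:name", "dc:title",
    "skos:prefLabel", "skos:altLabel", "fb:type.object.name",
}


def find_resources(keyword, triples):
    """Return the resources whose labelling-property value matches `keyword`.

    Matching is a case-insensitive exact comparison; a real system would
    typically add tokenisation, stemming or fuzzy matching.
    """
    wanted = keyword.strip().lower()
    return sorted({
        subject
        for subject, predicate, value in triples
        if predicate in LABELLING_PROPERTIES and value.lower() == wanted
    })


print(find_resources("honda", TRIPLES))  # ['dbr:Honda']
```

Exact matching keeps the sketch short; "Honda Civic" is deliberately not returned for "honda", which shows why the expanded set E_w is needed to widen the net.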

Method: To find representative concepts, we construct from w (the given keyword) an expanded set of keywords, E_w, that improves the chance of finding the best-fitting concept in the target vocabulary according to its labels (i.e. the values of its Labelling Properties).

4.1.1 Algorithm to get semantic synonyms:

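For every keyword w in the user query, if a resource r exists for w in the dataset, we get its semantic synonyms by exploring the one-hop neighbours N of r such that:

  1. Condition1: the relation between r and N is one of the Labelling Properties.
  2. Condition2: N is of type owl:Class or rdf:Property.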
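The neighbour filter can be sketched in Python, using Condition1 (the connecting relation is a labelling property) and Condition2 (the neighbour is an owl:Class or rdf:Property) as stated in the walk-through. This is an illustrative sketch, assuming the dataset is available as (subject, predicate, object) triples plus a mapping from each resource to its rdf:type; the names `semantic_synonyms`, `local_name` and `expand_keyword`, the `ex:` toy data, the restriction to outgoing edges, and the use of a neighbour's local name as its label are all assumptions rather than part of the original method.

```python
LABELLING_PROPERTIES = {
    "rdfs:label", "foaf:name", "dc:title",
    "skos:prefLabel", "skos:altLabel", "fb:type.object.name",
}
ALLOWED_TYPES = {"owl:Class", "rdf:Property"}


def semantic_synonyms(resource, triples, types):
    """Return the one-hop neighbours of `resource` that satisfy both conditions."""
    synonyms = set()
    for subject, predicate, neighbour in triples:
        # Assumption: only outgoing edges of the resource are explored.
        if subject != resource:
            continue
        # Condition1: the connecting relation must be a labelling property.
        if predicate not in LABELLING_PROPERTIES:
            continue
        # Condition2: the neighbour must be typed owl:Class or rdf:Property.
        if types.get(neighbour) in ALLOWED_TYPES:
            synonyms.add(neighbour)
    return synonyms


def local_name(uri):
    """Hypothetical label helper: 'dbo:Automobile' -> 'Automobile'."""
    return uri.split(":", 1)[-1]


def expand_keyword(keyword, resource, triples, types):
    """Build E_w: the keyword plus the labels of its semantic synonyms."""
    found = semantic_synonyms(resource, triples, types)
    return {keyword} | {local_name(n) for n in found}


triples = [
    ("ex:Apple", "skos:altLabel", "ex:Fruit"),   # passes both conditions
    ("ex:Apple", "ex:grownIn", "ex:Orchard"),    # fails Condition1
    ("ex:Apple", "skos:altLabel", "Pomme"),      # untyped literal, fails Condition2
]
types = {"ex:Fruit": "owl:Class", "ex:Orchard": "ex:Place"}

print(sorted(expand_keyword("apple", "ex:Apple", triples, types)))  # ['Fruit', 'apple']
```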

Example walk-through:

For the user query “Honda”, if there is a resource named “Honda” in the dataset, as shown in the picture below, this is how we find its semantic synonyms:

  1. Check the one-hop neighbouring resources of “Honda” one by one, e.g. dbo:Automobile, dbo:engine, dbo:vehicle, dbo:organisation, Tokyo, 198561, 1.19E11.

[Figure: the resource “Honda” and its one-hop neighbours in the dataset]

  2. According to the algorithm, there are two conditions to check:
  3. Condition1: The relation between “Honda” and the neighbour must be one of the Labelling Properties.
  4. Condition2: The neighbouring resource must be of type owl:Class or rdf:Property.
  5. The neighbour Tokyo is a location, not an owl:Class or rdf:Property, so it fails Condition2 and is not a semantic synonym for Honda.
  6. 198561 is a literal (rdfs:Literal), so it also fails Condition2 and is not a semantic synonym for Honda.
  7. dbo:Automobile and dbo:organisation are of type owl:Class and are linked to Honda via Labelling Properties, so they are semantic synonyms of Honda.
  8. Similarly, dbo:engine and dbo:vehicle are of type rdf:Property and are linked to Honda via Labelling Properties, so they are also semantic synonyms.
  9. The resulting semantic synonyms for Honda are: Automobile, organisation, vehicle and engine. (Note: this is just a curated example to convey the idea.)
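The walk-through can be replayed in a few lines of Python that print, for each neighbour of “Honda”, whether it passes Condition1 and Condition2. The post does not say which predicates connect Honda to its neighbours, so the predicates here (skos:altLabel for the accepted neighbours, hypothetical `ex:` predicates for the rest), the `ex:Location` type for Tokyo, and the helper `check_neighbour` are illustrative assumptions. Because Tokyo and the numeric literals are attached through non-labelling predicates in this toy data, they fail both conditions; either failure alone is enough to reject them.

```python
LABELLING_PROPERTIES = {
    "rdfs:label", "foaf:name", "dc:title",
    "skos:prefLabel", "skos:altLabel", "fb:type.object.name",
}
ALLOWED_TYPES = {"owl:Class", "rdf:Property"}

# Hypothetical one-hop neighbourhood of "Honda": (predicate, neighbour, neighbour type).
# The predicates are invented for illustration; the post does not name them.
HONDA_NEIGHBOURS = [
    ("skos:altLabel", "dbo:Automobile", "owl:Class"),
    ("skos:altLabel", "dbo:organisation", "owl:Class"),
    ("skos:altLabel", "dbo:engine", "rdf:Property"),
    ("skos:altLabel", "dbo:vehicle", "rdf:Property"),
    ("ex:locatedIn", "Tokyo", "ex:Location"),
    ("ex:hasValue", "198561", "rdfs:Literal"),
    ("ex:hasValue", "1.19E11", "rdfs:Literal"),
]


def check_neighbour(predicate, node_type):
    """Evaluate (Condition1, Condition2) for a single neighbour."""
    condition1 = predicate in LABELLING_PROPERTIES
    condition2 = node_type in ALLOWED_TYPES
    return condition1, condition2


synonyms = []
for predicate, neighbour, node_type in HONDA_NEIGHBOURS:
    c1, c2 = check_neighbour(predicate, node_type)
    print(f"{neighbour:<18} Condition1={c1!s:<5} Condition2={c2}")
    if c1 and c2:
        # Use the local name (text after the prefix) as the synonym label.
        synonyms.append(neighbour.split(":", 1)[-1])

print(synonyms)  # ['Automobile', 'organisation', 'engine', 'vehicle']
```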

4.1.2 Examples:

User queries and the corresponding semantic synonyms found by this approach:

4.1.3 Advantages:

5. Conclusion:

This generic approach uses semantic similarity to expand the query. The expanded sets are more general than ‘synsets’ (sets of synonyms within dictionary-oriented terms) in two ways: they cover a much larger potential range of named entities, and they allow more flexible semantic relationships.

Hybrid approach: In general, a multi-strategy approach is recommended, in which this generic approach is used only after lexical expansion with WordNet has failed to give the desired result.
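A minimal sketch of that multi-strategy ordering, assuming each strategy is a function from a keyword to a set of expansion terms: the lexical expander (which in practice could be backed by WordNet, for example through NLTK) runs first, and the Linked Data expander is consulted only when the lexical result falls short. Both expanders below are hypothetical stubs with hard-coded data, and the `min_terms` threshold standing in for "the desired result" is an assumption.

```python
from typing import Callable, Set

Expander = Callable[[str], Set[str]]


def hybrid_expand(keyword: str, lexical: Expander, linked_data: Expander,
                  min_terms: int = 1) -> Set[str]:
    """Try lexical expansion first; fall back to Linked Data expansion.

    `min_terms` is a hypothetical stand-in for "the desired result": if the
    lexical expander yields fewer than `min_terms` new terms, the generic
    Linked Data approach is used instead.
    """
    lexical_terms = lexical(keyword) - {keyword}
    if len(lexical_terms) >= min_terms:
        return {keyword} | lexical_terms
    return {keyword} | (linked_data(keyword) - {keyword})


# Hypothetical stubs standing in for WordNet and the Linked Data lookup.
def wordnet_stub(word: str) -> Set[str]:
    return {"car": {"auto", "automobile"}}.get(word, set())


def linked_data_stub(word: str) -> Set[str]:
    return {"honda": {"Automobile", "organisation", "vehicle", "engine"}}.get(word, set())


print(sorted(hybrid_expand("car", wordnet_stub, linked_data_stub)))
# ['auto', 'automobile', 'car']
print(sorted(hybrid_expand("honda", wordnet_stub, linked_data_stub)))
# ['Automobile', 'engine', 'honda', 'organisation', 'vehicle']
```

Passing the expanders in as functions keeps the ordering policy separate from how each strategy is implemented, so a real WordNet lookup or SPARQL-backed resolver can replace the stubs without touching `hybrid_expand`.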

References: