Context-Aware Semantic Association Ranking

A second-generation “Semantic Web” is being realised as a scalable, ontology-driven information system in which heterogeneous data content is linked through meaningful semantic metadata. A search for associations between entities often surfaces too many relationships, so it is equally important to rank them and find the interesting, meaningful ones before presenting them to the end user.

To rank semantic associations by relevance, it is necessary to capture the context in which they will be interpreted and used. This is a different problem from ranking documents in traditional search engines, where ranking is typically based on term statistics (e.g., TF-IDF) or on the number of references to a document.

In this blog post, we formulate a custom ranking function that incorporates user-defined semantics (e.g., context) and universal semantics (e.g., associations conveying more information) via weight assignments.

1. Weight Assignments

A semantic association, i.e., a path connecting two entities, can contain many entities and properties in between. To rank these paths, we define a rank function built from several intermediate weights assigned to the entities and edges along the path. The weights fall into two categories:

Universal Weights
User-Defined Weights

Let’s look at these two weights in detail:

2. Universal Weights

Certain weights influence a path's rank regardless of the query or context of interest. We call them Universal Weights. Let’s discuss one such universal weight that contributes to the overall path rank:

2.1 Subsumption Weight

Intuition: Assign more weight to more specific semantic associations because they convey more meaning than general associations.

The following figure depicts a class Organization and its subclasses. Organization, being the highest class in the hierarchy, is the most general, while Political Organization is more specific.

Computing the subsumption weight of a path P: we first have to compute a component weight based on the class hierarchy, where a component is any entity or property contained in path P.

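The component weight of the i-th component c_i in path P is defined as its position in the hierarchy divided by the height of the hierarchy:

$$\text{component weight}_i = \frac{H_{c_i}}{H_{height}}$$

where $H_{c_i}$ is the position of component $c_i$ in the hierarchy H (the topmost class has position 1) and $H_{height}$ is the total height of the hierarchy.

According to this formula, in a hierarchy of height 3, Democratic Political Organisation (c3) has a component weight of 3/3 = 1 and Political Organisation (c2) has a component weight of 2/3 ≈ 0.67.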
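To make the component weight concrete, here is a minimal Python sketch. It assumes the weight is the level of a class in the hierarchy (root = 1) divided by the height of the hierarchy, and it uses a hypothetical three-level hierarchy stored as a child-to-parent dictionary. The names `PARENT`, `level` and `component_weight` are made up for illustration.

```python
# Hypothetical class hierarchy: each class maps to its parent (None for the root).
PARENT = {
    "Organization": None,
    "Political Organization": "Organization",
    "Democratic Political Organization": "Political Organization",
}


def level(cls, parent=PARENT):
    """Position of cls in the hierarchy; the topmost class has level 1."""
    depth = 1
    while parent[cls] is not None:
        cls = parent[cls]
        depth += 1
    return depth


def component_weight(cls, parent=PARENT):
    """Assumed definition: level of the class divided by the hierarchy height."""
    height = max(level(c, parent) for c in parent)
    return level(cls, parent) / height


for name in PARENT:
    print(name, round(component_weight(name), 3))
```

Under these assumptions the most specific class gets weight 1, and the weight shrinks as we move up towards the root.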

Given the component weights, the subsumption weight of a path P is computed as:

$$S_P = \frac{1}{|c|} \prod_{i=1}^{|c|} \text{component weight}_i$$

where |c| is the total number of components in the path P, excluding the starting and ending entities. Thus $S_P$, i.e., the Subsumption Weight of a path P, is the product of all the component weights within P, normalised by the number of components in the path (to avoid a bias towards path length).

Example:

Subsumption weight of Path e1 -> e2 -> e5 = 1/3 * (1/2 * 1/2 * 1/1) = 0.083
Subsumption weight of Path e1 -> e3 -> e5 = 1/3 * (1/2 * 2/2 * 1/1) = 0.167
Subsumption weight of Path e1 -> e4 -> e5 = 1/3 * (1/1 * 2/2 * 1/1) = 0.333

Thus, Subsumption Weight will assign higher weights to paths which have more specific meaning.
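The three example paths can be checked with a few lines of Python. This is only a sketch: the function name `subsumption_weight` is made up, and the component weights are copied from the fractions in the example above rather than derived from an actual ontology.

```python
from math import prod


def subsumption_weight(component_weights):
    """Product of the component weights, divided by the number of components.

    Expects a non-empty list of weights for the components between the
    starting and ending entities.
    """
    return prod(component_weights) / len(component_weights)


# Component weights taken from the worked example (three components per path).
paths = {
    "e1 -> e2 -> e5": [1 / 2, 1 / 2, 1 / 1],
    "e1 -> e3 -> e5": [1 / 2, 2 / 2, 1 / 1],
    "e1 -> e4 -> e5": [1 / 1, 2 / 2, 1 / 1],
}

# Most specific (highest-weight) path first.
ranked = sorted(paths, key=lambda p: subsumption_weight(paths[p]), reverse=True)
for p in ranked:
    print(p, round(subsumption_weight(paths[p]), 3))
```

Sorting by this weight puts e1 -> e4 -> e5 first, since all of its components sit at the most specific level of their hierarchies.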

3. User-Defined Weights

Unlike Universal Weights, User-Defined Weights are specific to the query (or context).

3.1 Path Length Weight

Intuition: Let the length of a path influence its rank.

Computing the rank of a path based on path length: if the user wants to rank shorter paths higher, (a) is used; if the user wants to favor longer paths, (b) is used.

(a) $L_P = \frac{1}{|c|}$
(b) $L_P = 1 - \frac{1}{|c|}$

Here, |c| is the number of components in the path P (excluding the first and last nodes).

Example:

In the example, the longer path (nine components) has a path length weight of 1/9 under (a), and the direct path has a weight of 1/1 = 1, so the shorter path gets the higher weight, as expected. Alternatively, if we want to favor longer paths, (b) gives a path length weight of 1 - (1/9) = 0.889 for the longer path and 1 - (1/1) = 0 for the direct path.
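Both variants fit in one small function. The sketch below assumes the weight is 1/|c| when shorter paths are preferred and 1 - 1/|c| otherwise, which matches the numbers in the example. The name `path_length_weight` and the `favor_short` flag are made up for illustration.

```python
def path_length_weight(num_components, favor_short=True):
    """Path length weight for a path with num_components components.

    favor_short=True  -> 1 / |c|      (shorter paths score higher)
    favor_short=False -> 1 - 1 / |c|  (longer paths score higher)
    """
    if num_components < 1:
        raise ValueError("a path needs at least one component")
    weight = 1 / num_components
    return weight if favor_short else 1 - weight


print(path_length_weight(9), path_length_weight(1))
print(round(path_length_weight(9, favor_short=False), 3),
      path_length_weight(1, favor_short=False))
```

Exposing the preference as a flag keeps the choice between immediate and distant relationships in the hands of the user, which is the point of a user-defined weight.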

3.2 Trust Weight

Intuition: Rank trusted sources higher (e.g., Reuters could be regarded as a more trusted source of international news than some other news organizations).

Computing the rank of a path based on source trust weight: the trust weight of an overall path P is defined as the product of the trust weights of all properties in P:

$$T_P = \prod_{i} t_{p_i}$$

where $t_{p_i}$ is the trust weight of the i-th property in the path P.
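As a rough illustration, the product of per-property trust weights could be computed as below. Everything concrete here is an assumption: the trust scores are invented numbers, `example_blog` is a hypothetical source, and each property in a path is represented simply by the name of the source it came from.

```python
from math import prod

# Hypothetical trust scores per source, in [0, 1]; not real measurements.
TRUST = {
    "Reuters": 0.9,
    "example_blog": 0.5,
}


def trust_weight(property_sources, trust=TRUST):
    """Product of the trust weights of the sources behind each property."""
    return prod(trust[source] for source in property_sources)


# A path whose two properties both come from the more trusted source...
print(trust_weight(["Reuters", "Reuters"]))
# ...versus a path that mixes in a less trusted source.
print(trust_weight(["Reuters", "example_blog"]))
```

Because the weights are multiplied, a single poorly trusted property pulls the whole path down, and longer paths tend to score lower than short ones built from the same sources.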

4. Final Ranking Criterion

We now define the overall path rank using the weights discussed above:

$$W_P = k_1 \times S_P + k_2 \times L_P + k_3 \times T_P$$

where all the $k_i$ add up to 1.0 and are intended to allow fine-tuning of the different ranking criteria (e.g., trust can be given more weight than path length).
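Putting the pieces together, here is a minimal sketch of the overall rank as a weighted sum of the three weights. The function name `overall_rank` and the default coefficients are assumptions chosen for illustration. The only constraint taken from the text is that the coefficients add up to 1.0.

```python
def overall_rank(s_p, l_p, t_p, k=(0.4, 0.2, 0.4)):
    """Weighted sum of subsumption (s_p), path length (l_p) and trust (t_p).

    k holds the coefficients (k1, k2, k3), which must add up to 1.0.
    The default values are arbitrary and only give trust more say than length.
    """
    if abs(sum(k) - 1.0) > 1e-9:
        raise ValueError("coefficients must add up to 1.0")
    k1, k2, k3 = k
    return k1 * s_p + k2 * l_p + k3 * t_p


# Example inputs: a specific, direct path built from fairly trusted properties.
print(round(overall_rank(s_p=0.333, l_p=1.0, t_p=0.81), 4))
```

Changing `k` is how one criterion is made to matter more than another, for example raising k3 when provenance is the main concern.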

5. Conclusion

We have seen how different ranking criteria can be plugged in to score semantic associations according to the user’s interest, namely:

Subsumption Weight (Sp): How much meaning a semantic association conveys depending on the places of its components in the ontology
Path Length Weight (Lp): Allows preference of either immediate or distant relationships
Trust Weight (Tp): Determines how reliable a relationship is according to its provenance

One can tweak these criteria or add more to come up with a custom formula. This post is meant to give an idea of how to go about devising a ranking strategy. There are many other, more advanced ways to rank results depending on your domain and use case. Hope it helps!