my-notes

This project is maintained by spoddutur

RDF Overview

1. What is RDF?

RDF (Resource Description Framework) is used to represent information about remote resources, in particular the semantic information of the data. The effort to capture semantic information was fuelled by the need to exchange knowledge on the web. RDF defines a model for describing relationships among resources in terms of uniquely identified properties (attributes) and values.

2. <S,P,O> - DataModel of RDF:

RDF expresses knowledge as a tuple <S,P,O>, where subject S has property P with value O. S and P are resource URIs, and O is either a URI or a literal value.
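The <S,P,O> model can be sketched in a few lines of Python. This is only an illustration: the `URI`, `Literal` and `Triple` types, the `objects_of` helper and the `http://example.org/` names are all assumptions made for this example, not part of any RDF library.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier (hypothetical helper type for this sketch)."""


class Literal(NamedTuple):
    """A plain literal value (hypothetical helper type for this sketch)."""
    value: str


class Triple(NamedTuple):
    subject: URI                 # S: always a resource URI
    predicate: URI               # P: always a resource URI
    obj: Union[URI, Literal]     # O: a URI or a literal value


EX = "http://example.org/"       # made-up namespace

triples = [
    Triple(URI(EX + "alice"), URI(EX + "name"), Literal("Alice")),
    Triple(URI(EX + "alice"), URI(EX + "knows"), URI(EX + "bob")),
]


def objects_of(subject: URI, predicate: URI):
    """Return every O such that <subject, predicate, O> is in the graph."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Here `objects_of(URI(EX + "alice"), URI(EX + "name"))` yields the literal, while the `knows` property yields another resource URI.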

2.1 RDF is a flexible and dynamic DataModel:

Any resource can be associated with any property or type, irrespective of the resource's own type. This flexibility makes RDF a good match for representing metadata, because resource descriptions cannot necessarily be bound by fixed schemas.

2.2 Comparison of RDF with OO (object-oriented):

Common points between an object and RDF:

3. Persistence

3.1 XML storage

A naïve approach might be to map the RDF data to XML and rely on the efficient storage of XML.

3.2. Relational/Object DB

Many RDF systems have used relational or object databases for persistent storage and retrieval. For relational databases, the schema consisted of three tables: a statement table, a literals table and a resources table.

The literals table contained all literal values and the resources table contained all resource URIs in the graph. To distinguish literal objects from resource URIs, the statement table used two columns for the object.
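A minimal sketch of such a normalized three-table layout, using Python's built-in `sqlite3` module. The table and column names (`obj_resource`, `obj_literal`, and so on) and the helper functions are assumptions chosen for illustration; the original systems' exact schemas are not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE literals  (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE statements (
    subject      INTEGER NOT NULL REFERENCES resources(id),
    predicate    INTEGER NOT NULL REFERENCES resources(id),
    obj_resource INTEGER REFERENCES resources(id),  -- set when O is a URI
    obj_literal  INTEGER REFERENCES literals(id),   -- set when O is a literal
    CHECK ((obj_resource IS NULL) != (obj_literal IS NULL))
);
""")


def resource_id(uri):
    """Return the id of a URI, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    return conn.execute(
        "SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()[0]


def literal_id(value):
    """Insert a literal value and return its id."""
    return conn.execute(
        "INSERT INTO literals (value) VALUES (?)", (value,)).lastrowid


def add_statement(s, p, o, is_literal):
    """Store one <S,P,O>; exactly one of the two object columns is filled."""
    obj_res = None if is_literal else resource_id(o)
    obj_lit = literal_id(o) if is_literal else None
    conn.execute("INSERT INTO statements VALUES (?, ?, ?, ?)",
                 (resource_id(s), resource_id(p), obj_res, obj_lit))


add_statement("http://example.org/alice", "http://example.org/name",
              "Alice", is_literal=True)
add_statement("http://example.org/alice", "http://example.org/knows",
              "http://example.org/bob", is_literal=False)

# Reading a statement back requires joins against both lookup tables.
rows = conn.execute("""
    SELECT s.uri, p.uri, COALESCE(r.uri, l.value)
    FROM statements st
    JOIN resources s ON s.id = st.subject
    JOIN resources p ON p.id = st.predicate
    LEFT JOIN resources r ON r.id = st.obj_resource
    LEFT JOIN literals  l ON l.id = st.obj_literal
    ORDER BY p.uri
""").fetchall()
```

The joins in the final query hint at the cost of this layout: every retrieval has to look up subjects, predicates and objects in the side tables.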

3.3 BerkeleyDB format

In this approach, each statement (i.e., each SPO tuple) was stored three times: once indexed by subject, once by predicate and once by object.
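The triple-indexing idea can be illustrated with plain Python dictionaries standing in for the on-disk key-value store. This sketch does not use the BerkeleyDB API; the `TripleIndex` class and its methods are hypothetical names for this example.

```python
from collections import defaultdict


class TripleIndex:
    """Keeps three copies of each statement, one per index key."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        # The same statement is written once under each of its three terms.
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def find(self, s=None, p=None, o=None):
        """Match a pattern; None means 'any value' for that position."""
        # Start from an index keyed on one of the bound terms, if any.
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif o is not None:
            candidates = self.by_object.get(o, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        else:
            candidates = set().union(*self.by_subject.values())
        # Filter the candidates on the remaining bound terms.
        return {t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}


index = TripleIndex()
index.add("ex:alice", "ex:knows", "ex:bob")
index.add("ex:carol", "ex:knows", "ex:bob")
index.add("ex:alice", "ex:name", "Alice")
```

With this layout, `index.find(o="ex:bob")` is answered from the object index without scanning every statement, at the price of storing each statement three times.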

3.4 Hybrid approach

Drawing on experience from the earlier approaches, the hybrid approach uses a denormalized schema in which resource URIs and simple literal values are stored directly in the statement table. A separate literals table is only used to store literal values whose length exceeds a threshold, such as blobs. Similarly, a separate resources table is used to store long URIs.
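A rough sketch of the threshold idea, again with `sqlite3`. The threshold value, the `lit:` / `ref:` / `uri:` prefixes and the function names are all assumptions made for this example; only literals are handled, and long URIs would be treated the same way with a separate resources table.

```python
import sqlite3

MAX_INLINE = 32  # hypothetical length threshold for inline storage

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE long_literals (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE statements (
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL   -- inline value, or a reference to long_literals
);
""")


def encode_literal(value):
    """Store short literals inline; move long ones to the side table."""
    if len(value) <= MAX_INLINE:
        return "lit:" + value
    cur = conn.execute(
        "INSERT INTO long_literals (value) VALUES (?)", (value,))
    return f"ref:{cur.lastrowid}"


def decode_object(stored):
    """Turn a stored object column back into its original value."""
    kind, _, rest = stored.partition(":")
    if kind == "ref":
        return conn.execute("SELECT value FROM long_literals WHERE id = ?",
                            (int(rest),)).fetchone()[0]
    return rest  # "lit:" and "uri:" values are stored inline


def add_statement(s, p, o, is_literal):
    obj = encode_literal(o) if is_literal else "uri:" + o
    conn.execute("INSERT INTO statements VALUES (?, ?, ?)", (s, p, obj))


add_statement("ex:doc1", "ex:title", "Short title", is_literal=True)
add_statement("ex:doc1", "ex:body", "x" * 1000, is_literal=True)
add_statement("ex:doc1", "ex:author", "http://example.org/alice",
              is_literal=False)

stored = [row[0] for row in
          conn.execute("SELECT object FROM statements ORDER BY rowid")]
```

Most statements can then be read from a single table with no join; only the occasional long value costs an extra lookup.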

4. Optimizations for common statement patterns

Applications typically have access patterns in which certain subjects and/or properties are accessed together. For example, a graph of data about persons might have many occurrences of objects with properties name, address, phone, gender that are referenced together. Using knowledge of these access patterns to influence the underlying database storage structures can provide a performance benefit.

4.1 Property-class table:

Properties that are commonly accessed together can be clustered and stored in a separate table, with one row per subject and one column per clustered property.
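As a sketch of a property-class table, the person example from above could be laid out as follows. The table name `person`, the `add` helper and the assumption that each clustered property has at most one value per subject are choices made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per person, one column per clustered property.
CREATE TABLE person (
    subject TEXT PRIMARY KEY,
    name    TEXT,
    address TEXT,
    phone   TEXT,
    gender  TEXT
);
-- Everything else stays in the generic statement table.
CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
""")

PERSON_COLUMNS = {"name", "address", "phone", "gender"}


def add(subject, predicate, obj):
    """Route a statement to the property-class table when it fits."""
    if predicate in PERSON_COLUMNS:
        conn.execute("INSERT OR IGNORE INTO person (subject) VALUES (?)",
                     (subject,))
        # The column name comes from the fixed set above, not from user input.
        conn.execute(f"UPDATE person SET {predicate} = ? WHERE subject = ?",
                     (obj, subject))
    else:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?)",
                     (subject, predicate, obj))


add("ex:alice", "name", "Alice")
add("ex:alice", "phone", "555-0100")
add("ex:alice", "knows", "ex:bob")

# All clustered properties of a subject come back in a single row.
person_row = conn.execute(
    "SELECT name, address, phone, gender FROM person WHERE subject = ?",
    ("ex:alice",)).fetchone()
```

Fetching a person's name and phone now touches one row rather than one row per property.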

Advantages:

5. Performance evaluation

A synthetic database of 10,000 reified RDF statements was generated and stored in two different formats. In the first case, the reified statement was stored in an optimised form as a property-class table. In the second case, the reified statement was stored unoptimised as RDF triples, i.e., each reified statement was stored as four RDF statements. Consequently, the first table contained 10,000 rows while the second table contained 40,000 rows.

Each test was run four times with different random number seeds, and three test sizes were used: 200, 1000 and 5000 retrievals.

For a small number of retrievals, the optimised format shows a large improvement between the first and fourth run. We attribute this to caching effects that decrease with larger numbers of retrievals. The speed-up for large numbers of retrievals exceeds our expectations. This may be due to database caching effects: since the optimised table is smaller, a larger percentage of the entire table can be cached, which reduces the number of relatively slow disk seek operations.
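The row counts follow from how reification works: a reified statement is described by four triples (its `rdf:type`, `rdf:subject`, `rdf:predicate` and `rdf:object`). The sketch below reproduces only that 4:1 ratio; it does not measure retrieval time, and the `reify` and `as_row` helpers and the `ex:` names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, s, p, o):
    """Unoptimised form: one reified statement becomes four triples."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


def as_row(statement_id, s, p, o):
    """Optimised form: one property-class table row per reified statement."""
    return (statement_id, s, p, o)


n = 10_000
triple_rows = [t
               for i in range(n)
               for t in reify(f"_:stmt{i}", f"ex:s{i}", "ex:p", f"ex:o{i}")]
table_rows = [as_row(f"_:stmt{i}", f"ex:s{i}", "ex:p", f"ex:o{i}")
              for i in range(n)]

print(len(table_rows), len(triple_rows))
```

The optimised table holds a quarter as many rows, which is consistent with the caching explanation given above.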


References:

https://www.cs.uic.edu/~ifc/SWDB/proceedings.pdf