Semantic Web

RDF

  • RDF Schema describes how to use RDF to describe RDF vocabularies. It provides mechanisms for describing groups of related resources and the relationships between these resources.
  • The RDFizers project is a directory of tools for converting various data formats (JPEG, EML, TEX, DEB, JAVA, ICAL) into RDF.

To read:

Where not to use RDF:

Highly granular data (like absolute expression-level changes for microarrays) might not be appropriate for conversion into RDF, because the conversion inflates the size of the dataset in circumstances where:

  1. the dataset is generally going to be used as a whole anyway
  2. there are completely adequate parsers for existing file-formats
  3. the benefit of being able to reason over an RDF representation of the data is limited, or absent
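
The size argument above can be made concrete with a small sketch. It serializes the same values once as a tab-separated table and once as N-Triples, modelling each measurement as one triple. The matrix contents, the `http://example.org/` namespace and the one-triple-per-cell modelling are assumptions chosen for illustration, not taken from any real dataset or vocabulary.

```python
# Hypothetical expression matrix: rows are probes, columns are samples.
samples = ["s1", "s2", "s3"]
matrix = {
    "probe_1": [0.12, -1.40, 2.05],
    "probe_2": [1.33, 0.07, -0.51],
}

BASE = "http://example.org/"  # assumed namespace, for illustration only
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def as_tsv(matrix, samples):
    """Serialize the matrix as a plain tab-separated table."""
    lines = ["probe\t" + "\t".join(samples)]
    for probe, values in matrix.items():
        lines.append(probe + "\t" + "\t".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


def as_ntriples(matrix, samples):
    """Serialize every cell of the matrix as one N-Triples line."""
    lines = []
    for probe, values in matrix.items():
        for sample, value in zip(samples, values):
            lines.append(
                f'<{BASE}{probe}> <{BASE}level/{sample}> '
                f'"{value}"^^<{XSD_DOUBLE}> .'
            )
    return "\n".join(lines) + "\n"


tsv = as_tsv(matrix, samples)
nt = as_ntriples(matrix, samples)
triple_count = nt.count("\n")  # one line per triple

print("TSV characters:      ", len(tsv))
print("N-Triples characters:", len(nt))
print("Triples:             ", triple_count)
```

Every cell repeats the full subject and predicate IRIs, so the triple form grows with rows times columns while adding little that a table parser does not already provide.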

Implementation Bootcamp

Mapping data that has a natural horizontal representation (records in a table) into a vertical representation (triples) makes sense only if all of the following are true:

  1. Many heterogeneous objects of similar classes need to be stored in the database.
  2. These classes may have some common properties, but the share of common properties is low. That is, if the objects of these classes were put into one table, a large fraction of the table cells would be NULL.
  3. It is not known which classes/properties will appear in the future (but we know some certainly will).
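
As a rough illustration of the three conditions, the sketch below maps records from one wide, sparse table into triples. The records, property names and the helper functions `to_triples` and `null_fraction` are all hypothetical.

```python
# Hypothetical heterogeneous records forced into one wide table.
# None stands for a NULL cell.
records = [
    {"id": "gene1", "type": "Gene", "symbol": "TP53",
     "chromosome": "17", "mass": None, "journal": None},
    {"id": "prot1", "type": "Protein", "symbol": None,
     "chromosome": None, "mass": "43.7", "journal": None},
    {"id": "pub1", "type": "Publication", "symbol": None,
     "chromosome": None, "mass": None, "journal": "Nature"},
]


def to_triples(record, key="id"):
    """Turn one row into (subject, property, value) triples, skipping NULLs."""
    subject = record[key]
    return [
        (subject, prop, value)
        for prop, value in record.items()
        if prop != key and value is not None
    ]


def null_fraction(records, key="id"):
    """Fraction of non-key cells that are NULL in the horizontal layout."""
    cells = [v for r in records for p, v in r.items() if p != key]
    return sum(v is None for v in cells) / len(cells)


triples = [t for r in records for t in to_triples(r)]

# A property nobody anticipated needs no schema change in the vertical layout.
triples_extended = triples + [("gene1", "organism", "Homo sapiens")]

print(f"NULL cells in the table: {null_fraction(records):.0%}")
for triple in triples_extended:
    print(triple)
```

In the vertical layout NULL cells simply produce no triple, and a new property is just another row. When the table is dense and its columns are stable, the same mapping only multiplies the number of rows.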

Other Triple Formats

  • N3 (Notation 3) – a compact and readable alternative to RDF's XML syntax.
  • N-Triples – a line-based, plain text format for encoding an RDF graph. It was designed to be a fixed subset of N3.
  • Turtle (Terse RDF Triple Language) – an extension of N-Triples carefully taking the most useful and appropriate things added from N3. Turtle is intended to be compatible with, and a subset of, N3.
  • TriG – a plain text format for serializing Named Graphs and RDF Datasets (an extension of Turtle).
  • TriX – an experimental alternative serialization for expressing RDF triples in XML, which aims to provide a highly normalized, consistent XML representation for RDF graphs.
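
To show what "line-based" means for N-Triples, here is a minimal serializer sketch. The tagged-tuple term representation and the function names are assumptions made for this example. It handles only IRIs, blank nodes and plain string literals, and escapes only the most common characters, so it should not be read as a complete implementation of the format.

```python
def escape_literal(text):
    """Escape backslash, double quote, newline and carriage return."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def term(value):
    """Render a (kind, text) pair as an N-Triples term."""
    kind, text = value
    if kind == "iri":
        return f"<{text}>"
    if kind == "bnode":
        return f"_:{text}"
    if kind == "literal":
        return f'"{escape_literal(text)}"'
    raise ValueError(f"unknown term kind: {kind}")


def to_ntriples(triples):
    """One triple per line, each terminated by ' .'"""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


example = [
    (("iri", "http://example.org/gene1"),
     ("iri", "http://www.w3.org/2000/01/rdf-schema#label"),
     ("literal", "tumor protein p53")),
    (("bnode", "b0"),
     ("iri", "http://example.org/mentions"),
     ("iri", "http://example.org/gene1")),
]

print(to_ntriples(example), end="")
```

Because every line is a complete statement with full IRIs, such files can be split, concatenated and processed with line-oriented tools, at the cost of being more verbose than Turtle or N3.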

RDF Storage Engines / Libraries

  • Sesame Triple Store
  • Jena1) is a Java framework that provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. It supports reading and writing RDF in RDF/XML, N3 and N-Triples and provides in-memory and persistent storage implementations.
  • Mulgara is a scalable RDF database (triplestore), written entirely in Java, and a fork of the original Kowari project. It can be queried via the iTQL and SPARQL query languages.
  • JRDF2) is an attempt to create a standard set of APIs and base implementations for RDF, which includes a graph API (e.g. graph comparison, manipulating graph objects), IoC support, RDF datatypes and query handling (SPARQL support). It does not currently provide support for OWL.
  • 4store is an efficient, scalable and stable RDF database3).
  • Redland RDF Libraries is a set of free software C libraries that provide support for RDF.
  • OpenLink Virtuoso is a universal server that implements Web, file and database server functionality alongside native XML storage and universal data access middleware as a single server solution. It includes support for key Internet, Web, and data access standards such as XML, XPATH, XSLT, SOAP, WSDL, UDDI, WebDAV, SMTP, SQL, ODBC, JDBC, and OLE-DB. It has native connectors to the following frameworks: Jena, Sesame and Redland.
  • Aperture is an open source Java framework for extracting full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems. Data exchange is based on Semantic Web standards (e.g. RDF).
  • AllegroGraph is a database and application framework for building Semantic Web applications. It provides RDFS reasoning, SPARQL and Sesame 2.0 HTTP interfaces.
  • Bigdata is a high-performance RDF store supporting RDFS and OWL Lite inference. The Bigdata Cluster Setup Guide contains notes on how to optimize Linux nodes to build a cluster.

RDF Mapping

Hadoop MapReduce

Benchmarks

SPARQL

Available endpoints

  • Biogateway – an integrated system offering an interface (via SPARQL) to the entire set of the OBO foundry candidate ontologies, the whole set of GOA files, SwissProt, the NCBI taxonomy as well as in-house ontologies.
  • Cell Cycle Ontology (CCO) extends existing ontologies for cell cycle knowledge. CCO integrates and manages knowledge about the cell cycle components and regulatory aspects in OBO, OWL, RDF and other commonly used ontology representations. This knowledge is assembled from a diverse set of already existing resources (GO, UniProt, IntAct, GOA, NCBI taxonomy, and so forth): the combination of the knowledge gives an overall picture of the cell division process.
  • Linked Life Data – search and explore over 5 billion triples from various sources including UniProt, PubMed, EntrezGene and more.
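
Endpoints like these are usually queried over HTTP following the SPARQL Protocol, which passes the query text in a `query` parameter. The sketch below only builds such a request and does not send it. The endpoint URL is a placeholder rather than the address of any service listed above, and the query is a generic example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. Whether a given endpoint returns JSON, XML or something else depends on that endpoint.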

Federation:

OWL

OWL is based on the description logic (DL) formalism. It provides a set of rich data modeling constructs such as classes, class hierarchies and property hierarchies. Such features are used to define data schemata or ontologies for a domain, which describe the entities in the domain, their properties and relationships, and the constraints between them.

  • OWLIM is a scalable semantic repository which has full RDFS and limited OWL Lite support. It is available as SAIL4) for Sesame.
  • Pellet is an open source reasoner for OWL 2 DL in Java which provides standard and cutting-edge reasoning services for OWL ontologies. Free for non-commercial use.
  • RACER stands for Renamed ABox and Concept Expression Reasoner. RacerPro can process OWL Lite as well as OWL DL documents (knowledge bases) with some restrictions. An implementation of SWRL is provided. Commercial.
  • OntoBroker is scalable Semantic Web middleware that supports OWL, RDF, RDFS, SPARQL and F-logic. It provides a Java API for programmatic management of OWL DL and SWRL ontologies, and an inference engine for answering conjunctive queries using SPARQL. Commercial.
  • Oracle RDF management platform. Features of Oracle Spatial 11g Option for Oracle Database 11g Enterprise Edition (requires Partitioning and Advanced Compression options):

Converting Natural Language to RDF:

  • AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input and returns answers drawn from one or more knowledge bases, which instantiate the input ontology with domain-specific information.
  • ThoughtTreasure is a commonsense knowledge base and architecture for natural language processing.
  • ACE View is an ontology and rule editor that uses Attempto Controlled English (ACE) in order to create, view and edit OWL 2 ontologies and SWRL rulesets (project page, Protege page).

Online tools:

To read:

SKOS

To read:

Semantic desktop

Vocabulary

reification – a form of RDF in which an RDF statement can itself be the subject or object of a triple. This means graphs can be nested as well as chained. On the Web this allows us, for example, to express doubt about or support for statements created by other people. A description of an RDF statement using the RDF reification vocabulary is called a reification of the statement. The examples are given here and here. SeRQL example for Sesame is here.
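
A minimal sketch of the reification vocabulary in use: a statement is described by four triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), after which other triples can refer to it. The `ex:` names and the `reify` helper are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, triple):
    """Describe `triple` with the four triples of the RDF reification vocabulary."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


# Hypothetical claim made by someone else.
claim = ("ex:gene1", "ex:regulates", "ex:gene2")

graph = reify("ex:stmt1", claim)
# The statement is now a resource, so it can be the object of another triple.
graph.append(("ex:alice", "ex:doubts", "ex:stmt1"))

for triple in graph:
    print(triple)
```

Note that the reified description does not assert the original claim. The graph above says that Alice doubts the statement without saying that gene1 regulates gene2.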

query statements inferencer – the ability of the query processor to intercept and preprocess new statements as needed to provide semantic support for the data (e.g. RDF), which in Sesame is implemented in the SAIL7).

ontology reasoner – basically checks that an ontology makes sense (consistency checking, concept satisfiability). A reasoner creates an entailment of the RDF graph.

The terms 'class' and 'subclass' also appear in the context of XML Schemas, and more generally in object-oriented programming. There is an analogy between the use of the terms in those contexts and in this one, but it is a loose analogy: the use of types in the XSchema and O-O contexts is broadly to constrain behaviour and help identify errors, whereas the corresponding assertions in the context of RDF allow a reasoner to deduce a larger volume of implicit information. In particular, RDF schemata do not function as constraints, and mistakes made when defining concepts in an ontology, or when asserting information about resources, do not manifest themselves as 'schema violations', but instead more indirectly, when a reasoner finds it is able to deduce contradictory information, for example being able to prove that some resource urn:example#X is simultaneously a Person and not a Person8).
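
The difference between constraining and deducing can be sketched with the RDFS domain rule. A schema validator would reject the last triple below, whereas a reasoner infers a new type from it, and the mistake only surfaces once a disjointness axiom makes the inferred types clash. The `ex:` names are hypothetical and the two functions are a toy illustration, not a real reasoner.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
OWL_DISJOINT = "owl:disjointWith"

graph = {
    ("ex:hasSpouse", RDFS_DOMAIN, "ex:Person"),
    ("ex:Person", OWL_DISJOINT, "ex:Company"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:acme", "ex:hasSpouse", "ex:bob"),  # probably a data-entry mistake
}


def apply_domain_rule(graph):
    """If p has domain C and (x p y) holds, infer (x rdf:type C)."""
    domains = {(s, o) for s, p, o in graph if p == RDFS_DOMAIN}
    inferred = {
        (x, RDF_TYPE, cls)
        for x, p, _ in graph
        for prop, cls in domains
        if p == prop
    }
    return graph | inferred


def find_contradictions(graph):
    """Report resources typed with two classes declared disjoint."""
    types = {(s, o) for s, p, o in graph if p == RDF_TYPE}
    clashes = []
    for a, p, b in graph:
        if p == OWL_DISJOINT:
            for x, cls in types:
                if cls == a and (x, b) in types:
                    clashes.append((x, a, b))
    return clashes


before = find_contradictions(graph)
closed = apply_domain_rule(graph)
clashes = find_contradictions(closed)

print("clashes before inference:", before)
print("clashes after inference: ", clashes)
```

Nothing is rejected when the suspicious triple is added. The problem becomes visible only after the domain rule has been applied.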

Reasoning can be performed either when the data is loaded into the knowledge base or when a query is issued. Knowledge bases of the former class, which perform reasoning when data is loaded, are called materialized knowledge bases. Materialized knowledge bases trade space and increased loading time for shorter query times. This approach is suited to application domains where data is added much less frequently than queries are issued. Examples of such applications are data warehouses and (for the most part) web search. Moreover, since the worst case for OWL reasoning is exponential in time and memory, this approach is often considered to be a good way to store and query OWL knowledge bases.

Most reasoning engines for OWL are implemented using either tableau algorithms or rule-based (logic programming) engines. OWL reasoners implemented using rule-based engines have been suggested as a practical alternative to the more correct and complete tableau algorithms. In rule-based reasoners, the OWL ontology definitions are first compiled into a set of rules, which are then applied to the presented dataset to create the new inferred triples. The main advantages of this class of reasoners are that rule engines are well studied and many robust implementations exist. The disadvantage is that only a subset of the OWL specification can be implemented using them. Many popular open source (Jena) and commercial (OWLIM, Oracle) OWL toolkits are implemented using rule-based reasoners.9)
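
Load-time materialization with a rule-based engine can be sketched as forward chaining to a fixpoint. The example applies only two RDFS-style rules (subclass transitivity and type propagation along `rdfs:subClassOf`), far less than any real OWL toolkit covers. The `ex:` data and the `materialize` function are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Apply two rules repeatedly until no new triples appear.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closed:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= closed:  # fixpoint reached
            return closed
        closed |= new


asserted = {
    ("ex:Kinase", SUBCLASS, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:cdk1", TYPE, "ex:Kinase"),
}

# Reasoning happens once, at load time.
store = materialize(asserted)

# Afterwards a query is a plain lookup with no inference involved.
is_protein = ("ex:cdk1", TYPE, "ex:Protein") in store

print(len(asserted), "asserted triples,", len(store), "stored after materialization")
print("cdk1 is a Protein:", is_protein)
```

The store is larger than the asserted data and loading is slower, which is the price paid for the cheap lookup at query time.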

entailment – the process of extending the RDF graph to its transitive closure by applying DL rules through unification/resolution. The process of deriving new information is sometimes called reasoning.





4), 7) SAIL stands for Storage And Inference Layer.
5) SPARQL-like capability is not full SPARQL because the standard wasn't finalized at the time of Oracle Database 11g release. SPARQL support in the database is planned for the next major release.
8) This text was taken from here
programming/semantic_web/start.txt · Last modified: 2010/08/19 16:04 by dmitry
 
 