====== Semantic Web ======

  * [[wp>Semantic Web]]
  * [[http://www.w3.org/2001/sw/]]
  * [[esw>FrontPage|Semantic Web wiki]] on W3C.
  * [[http://semanticweb.org/wiki/Tools|SemanticWeb Software Tools]]
  * [[dropbox>7c6cq0mttcld7xo/a_semantic_web_primer.pdf|A Semantic Web Primer (Grigoris Antoniou, Frank van Harmelen) [2004]]]
  * [[dropbox>pvqgbgjas2qcs7g/ontoknowledge_ontology_based_tools_for_knowledge_management.pdf|OnToKnowledge -- Ontology-based Tools for Knowledge Management (Dieter Fensel)]]
  * [[dropbox>p32vsj3zaz8nyhu/semantic_web_for_the_working_ontologist.pdf|Semantic Web for the Working Ontologist (Morgan Kaufmann) [2008]]]
  * [[dropbox>gduh7zewnlmlnzy/tutorial_on_the_semantic_web.pdf|Tutorial on the Semantic Web (Ivan Herman)]] [2009] ([[http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/|online]])
  * [[http://hackathon3.dbcls.jp/wiki/ImplementationBootcamp|Implementation Bootcamp]]
  * [[http://www.jbiomedsem.com/|Journal of Biomedical Semantics]] encompasses all aspects of semantic resources and their use in data integration, mining, modeling, interpretation and exploitation in biomedical research.

LinkedData:

  * [[http://linkeddata.org/|Linked Data community]]
  * [[http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html|Tim Berners-Lee on the next Web]]
  * [[http://esw.w3.org/DutchSemanticWebMeetups/LinkedDataResources|Dutch Linked DataSources]]
  * [[http://docs.google.com/View?id=dgrhqnj8_213krspgsns|Publiek data in Nederland]]
  * [[http://twitter.com/search?q=%23ODNL|Open Data in the Netherlands on Twitter]]
  * [[http://inkdroid.org/journal/2011/04/25/dois-as-linked-data/|Using DOIs for linking the data]]

===== RDF =====

  * [[http://www.w3.org/RDF/|Resource Description Framework]]
  * [[http://www.w3.org/TR/rdf-schema/|RDF Schema]] describes how to use RDF to describe RDF vocabularies. It provides mechanisms for describing groups of related resources and the relationships between these resources.
  * [[http://simile.mit.edu/wiki/RDFizers|RDFizers project]] is a directory of tools for converting various data formats (JPEG, EML, TEX, DEB, JAVA, ICAL) into RDF.
  * [[http://www.ninebynine.org/RDFNotes/RDF-Datatype-inference.html|Using datatype-aware inferences with RDF]] by [[GK@ninebynine.org|Graham Klyne]].

To read:

  * [[http://www.w3.org/TR/rdf-primer/|RDF Primer]]
  * http://www.w3.org/TR/rdf-syntax-grammar/
  * http://www.w3.org/TR/REC-rdf-syntax/
  * http://www.w3.org/TR/rdf-concepts/
  * http://www.w3.org/TR/rdf-mt/
  * Embedding RDF in XHTML: [[http://www.w3.org/2006/07/SWD/wiki/RDFa.html|RDF/A Task Force]], [[http://rdfa.info/wiki/RDFa_Wiki|RDFa Wiki]], [[http://infomesh.net/2002/rdfinhtml/|RDF in HTML: Approaches]]
  * http://gearon.blogspot.com/

Where not to use RDF:

Highly granular data (such as absolute expression-level changes for microarrays) might not be appropriate for conversion into RDF, because the conversion explodes the size of the dataset in circumstances where:

  - the dataset is generally going to be used as a whole anyway;
  - there are completely adequate parsers for the existing file formats;
  - the benefit of being able to reason over an RDF representation of the data is limited or absent.

Source: [[http://hackathon3.dbcls.jp/wiki/ImplementationBootcamp|Implementation Bootcamp]]

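The size blow-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe and sample names, the ''http://example.org/'' namespace and the choice of one measurement node per table cell are all made up, and a real conversion could model the data differently.

```python
# Hypothetical microarray-style table: one row per probe, one column per sample.
samples = ["s1", "s2", "s3", "s4"]
rows = {
    "probe_001": [0.12, -1.40, 2.31, 0.05],
    "probe_002": [1.07, 0.33, -0.58, 0.91],
}

EX = "http://example.org/"  # made-up namespace, used only for illustration


def row_to_triples(probe, values):
    """Turn one horizontal record into vertical (subject, predicate, object) triples."""
    triples = []
    for sample, value in zip(samples, values):
        # Assumed modelling choice: each cell gets its own measurement node
        # so that probe, sample and value stay tied together.
        node = f"{EX}measurement/{probe}/{sample}"
        triples.append((node, EX + "probe", EX + probe))
        triples.append((node, EX + "sample", EX + sample))
        triples.append((node, EX + "value", value))
    return triples


all_triples = [t for probe, values in rows.items() for t in row_to_triples(probe, values)]
cells = sum(len(values) for values in rows.values())
print(f"{cells} table cells -> {len(all_triples)} triples")
```

With this modelling every numeric cell becomes three triples, each carrying full URIs, which is why a dataset that is always consumed as a whole gains little from the conversion.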
Mapping data that has a natural horizontal representation (records in a table) into a vertical representation (triples) makes sense only if all of the following are true:

  - Many heterogeneous objects of similar classes need to be stored in the database.
  - These classes might have some common properties, but the share of common properties is low. That is, if the objects of these classes were put into one table, the share of table cells with a NULL value would be high.
  - It is not known which classes/properties will appear in the future (but it is certain that some will).

==== Other Triple Formats ====

  * [[http://www.w3.org/DesignIssues/Notation3|N3 (Notation 3)]] -- a compact and readable alternative to RDF's XML syntax.
  * [[http://www.w3.org/TR/rdf-testcases/#ntriples|N-Triples]] -- a line-based, plain text format for encoding an RDF graph. It was designed to be a fixed subset of N3.
  * [[http://www.w3.org/TeamSubmission/turtle/|Turtle (Terse RDF Triple Language)]] -- an extension of N-Triples that carefully takes the most useful and appropriate additions from N3. Turtle is intended to be compatible with, and a subset of, N3.
  * [[http://www4.wiwiss.fu-berlin.de/bizer/TriG/|TriG]] -- a plain text format for serializing Named Graphs and RDF Datasets (an extension of Turtle).
  * [[http://sw.nokia.com/trix/TriX.html|TriX]] -- an experimental alternative serialization for expressing RDF triples in XML, which aims to provide a highly normalized, consistent XML representation for RDF graphs.

==== RDF Storage Engines / Libraries ====

  * [[Sesame]] Triple Store
  * [[http://jena.sourceforge.net/|Jena]]((See also [[wp>Jena (framework)]])) is a Java framework that provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. It supports reading and writing RDF in RDF/XML, N3 and N-Triples and provides in-memory and persistent storage implementations.
  * [[http://mulgara.org/|Mulgara]] is a scalable RDF database (triplestore), written entirely in Java, and a fork of the original Kowari project. It can be queried via the iTQL and SPARQL query languages.
  * [[http://jrdf.sourceforge.net/|JRDF]]((See also [[wp>JRDF (framework)]])) is an attempt to create a standard set of APIs and base implementations for RDF, which includes a graph API (e.g. graph comparison, manipulating graph objects), IoC support, RDF datatypes, and query handling (SPARQL support). It does not currently provide support for OWL.
  * [[http://4store.org/|4store]] is an efficient, scalable and stable RDF database((See [[http://cloudofdata.com/2009/07/garlik-releases-open-source-rdf-triple-store-claims-capacity-for-60-billion-triples/|release of this triple store under GNU GPL]])).
  * [[http://librdf.org/|Redland RDF Libraries]] is a set of free software C libraries that provide support for RDF.
  * [[http://docs.openlinksw.com/virtuoso/whatisnewto2x.html|OpenLink Virtuoso]] is a universal server that implements Web, file, and database server functionality alongside native XML storage and universal data access middleware as a single-server solution. It includes support for key Internet, Web, and data access standards such as XML, XPATH, XSLT, SOAP, WSDL, UDDI, WebDAV, SMTP, SQL, ODBC, JDBC, and OLE-DB. It has [[http://docs.openlinksw.com/virtuoso/rdfnativestorageproviders.html|native connectors]] to the following frameworks: [[http://jena.sourceforge.net/|Jena]], [[Sesame]] and [[http://librdf.org/|Redland]].
  * [[http://www.aduna-software.com/technology/aperture|Aperture]] is an open source Java framework for extracting full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems. Data exchange is based on Semantic Web standards (e.g. RDF).
  * [[http://franz.com/support/documentation/3.0/agraph-introduction.html#header3-110|AllegroGraph]] is a database and application framework for building Semantic Web applications. It provides RDFS reasoning, SPARQL and [[Sesame]] 2.0 HTTP interfaces.
  * [[http://www.systap.com/bigdata.htm|Bigdata]] is a high-performance RDF store supporting RDFS and OWL Lite inference. The [[http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=ClusterGuide|Bigdata Cluster Setup Guide]] contains notes on how to optimize Linux nodes to build a cluster.

==== RDF Mapping ====

  * RDF/XML -> XSL/[[http://simile.mit.edu/mail/ReadMsg?listId=9&msgId=3438|Fresnel lens]]/JSON.
  * [[Sesame]] [[http://www.openrdf.org/doc/elmo/1.5/|Elmo]] POJO mapping tool module.
  * [[http://blogs.sun.com/bblfish/entry/java_annotations_the_semantic_web|Java Annotations & the Semantic Web]] by Henry Story describes the possibilities of RDF-to-Java mapping.
  * {{extending_relational_databases_to_support_semantic_web_queries.pdf|Extending Relational Databases to Support Semantic Web Queries (Zhengxiang Pan)}} ([[http://swat.cse.lehigh.edu/pubs/pan04a.pdf|online]])
  * {{storage_and_querying_of_e-commerce_data.pdf|Storage and Querying of E-Commerce Data (Rakesh Agrawal) [2001]}} -- compares horizontal, vertical, and binary (table per attribute = predicate) representations of XML data ([[http://www.almaden.ibm.com/cs/projects/iis/hdb/Publications/papers/vldb01_ecom.ps.gz|online]])
  * A comparison of different approaches to mapping RDF to SQL can be found in [[http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/|Mapping Semantic Web Data with RDBMSes]].

==== Hadoop MapReduce ====

  * [[dropbox>g6htlcvfye16zr6/rdfs_owl_reasoning_using_the_mapreduce_framework.pdf|RDFS/OWL reasoning using the MapReduce framework (Jacopo Urbani) [2009]]] ([[http://www.few.vu.nl/~jui200/thesis.pdf|online]], [[http://data.semanticweb.org/conference/iswc/2009/paper/research/374/html|short article]]) -- The introduction gives a very good description of the basic principles of the Semantic Web, the relation between RDF, RDFS and OWL, the different OWL fragments (Full, DL, Lite and Horst) and the reasoning problems for each of them. It also gives good background on the Hadoop programming model.
  * [[http://halcyon.usc.edu/~pk/prasannawebsite/papers/ram_icpp08.pdf|Parallel Inferencing for OWL Knowledge Bases (Ramakrishna Soma) [2008]]] -- provides algorithms for the data partitioning and rule partitioning approaches.
  * [[http://www.springerlink.com/content/l805560670136163/|Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce [2009]]] by [[mfh062000@utdallas.edu|Mohammad Farhan Husain]]

==== Benchmarks ====

  * [[esw>RdfStoreBenchmarking|RDF Store Benchmarking]] provides some testing material, and [[esw>LargeTripleStores|Large Triple Stores]] lists the stores that scale well.
  * [[http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html|Berlin SPARQL Benchmark Results - 09/17/2008]]
  * {{triple_stores_evaluation.pdf|RDF Triple Stores Evaluations}} ([[http://www.bioontology.org/wiki/index.php/RDF_Triple_Stores|online]])
  * [[dropbox>tnk5wbmo0ik3o1o/an_evaluation_of_triple-store_technologies_for_large_data_stores.pdf|An Evaluation of Triple-Store Technologies for Large Data Stores (Kurt Rohloff) [2007]]] ([[http://www.springerlink.com/content/m14k476lr726x1g2/fulltext.pdf|online]])
  * [[dropbox>jwtawjnmm64vhjs/an_evaluation_of_knowledge_base_systems_for_large_owl_datasets.pdf|An Evaluation of Knowledge Base Systems for Large OWL Datasets (Yuanbo Guo) [2004]]] ([[http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA451855&Location=U2&doc=GetTRDoc.pdf|online]])
  * [[http://simile.mit.edu/reports/stores/|Scalability Report on Triple Store Applications]]
  * [[w3>TripleStoreLoadTesting|Triple store load testing]]
  * [[w3>TripleStoreScalability|Triple store scalability]]
  * [[w3>LargeTripleStores|Large triple stores]]
  * [[w3>RdfStoreBenchmarking|RDF store benchmarking]]
  * (outdated) [[http://www.w3.org/2001/sw/Europe/reports/rdf_scalable_storage_report/|Scalability and Storage: Survey of Free Software / Open Source RDF storage systems]]
  * (outdated) [[http://www.w3.org/2001/05/rdf-ds/DataStore?request=help|Survey of RDF/Triple Data Stores]]

===== SPARQL =====

To read:

  * http://www.w3.org/TR/rdf-sparql-query/
  * http://www.w3.org/blog/SW/2009/10/23/first_drafts_for_sparql_1_1_published
  * http://www.thefigtrees.net/lee/sw/sparql-faq#what-is
  * http://xsparql.deri.org/spec/xsparql-language-specification.html
  * http://articles.techrepublic.com.com/5100-10878_11-6096519.html
  * http://lambda-the-ultimate.org/node/549
  * http://nunolopes.org/publications/2008LopesA-XATA.pdf

==== Available endpoints ====

  * [[http://www.semantic-systems-biology.org/biogateway|Biogateway]] -- an integrated system offering an interface (via [[http://www.semantic-systems-biology.org/biogateway/sparql-viewer/|SPARQL]]) to the entire set of the OBO foundry candidate ontologies, the whole set of GOA files, SwissProt, the NCBI taxonomy as well as in-house ontologies.
  * [[http://www.semantic-systems-biology.org/cco|Cell Cycle Ontology]] (CCO) extends existing ontologies for cell cycle knowledge. CCO integrates and manages knowledge about the cell cycle components and regulatory aspects in OBO, OWL, RDF and other commonly used ontology representations. This knowledge is assembled from a diverse set of existing resources (GO, UniProt, IntAct, GOA, NCBI taxonomy, and so forth); combined, it gives an overall picture of the cell division process.
  * [[http://linkedlifedata.com/sparql|Linked Life Data]] -- search and explore over 5 billion triples from various sources including UniProt, PubMed, EntrezGene and more.

Federation:

  * [[esw>HCLSIG_BioRDF_Subgroup/QueryFederation2|Query Federation Task]] on ESW wiki.
  * [[http://www.biomedcentral.com/1471-2105/10/S10/S10|A journey to Semantic Web query federation in the life sciences]] ([[http://www.w3.org/2009/08/7tmdemo|7tm Receptor Demo]])
  * [[http://www.semanticuniverse.com/blogs-common-sparql-extension.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SemanticUniverse+%28Semantic+Universe%29|Creating federated queries using GRAPH URI]]

===== [[wp>Web Ontology Language|OWL]] =====

OWL is based on the description logic (DL) formalism. It provides a rich set of data modeling constructs such as classes, class hierarchies and property hierarchies. These features are used to define data schemata or ontologies for a domain, which describe the entities in the domain, their properties and relationships, and the constraints between them.

  * [[http://www.ontotext.com/inference/rdfs_rules_owl.html#owl_fragments|OWLIM]] is a scalable semantic repository which has full RDFS and limited OWL Lite support. It is available as a SAIL((SAIL stands for //Storage And Inference Layer//)) for [[Sesame]].
  * [[http://clarkparsia.com/pellet/|Pellet]] is an open source reasoner for OWL 2 DL in Java which provides standard and cutting-edge reasoning services for OWL ontologies. Free for non-commercial use.
  * [[http://www.racer-systems.com/products/racerpro/index.phtml|RACER]] stands for **R**enamed **A**Box and **C**oncept **E**xpression **R**easoner. RacerPro can process OWL Lite as well as OWL DL documents (knowledge bases) with some restrictions. An implementation of SWRL is provided. Commercial.
  * [[http://www.ontoprise.de/en/home/products/ontobroker/|OntoBroker]] is scalable Semantic Web middleware that supports OWL, RDF, RDFS, SPARQL and F-logic. It provides a Java API for programmatic management of OWL DL and SWRL ontologies, and an inference engine for answering conjunctive queries using SPARQL. Commercial.
  * [[http://www.oracle.com/technology/tech/semantic_technologies/index.html|Oracle]] RDF management platform. Features of Oracle Spatial 11g Option for Oracle Database 11g Enterprise Edition (requires Partitioning and Advanced Compression options):
    * An RDF Data Model with inferencing (RDFS, OWL DL and user-defined rules)
    * SQL-based access to triples and inferred data; SQL queries over relational data can be combined with RDF graphs and ontologies
    * SPARQL-like queries((SPARQL-like capability is not full SPARQL because the standard wasn't finalized at the time of the Oracle Database 11g release. SPARQL support in the database is planned for the next major release.))
    * A Jena plug-in for Oracle, which includes a full SPARQL API, can be used
    * SKOS inference support
    * See also: [[http://ontolog.cim3.net/file/work/DatabaseAndOntology/2007-10-18_AlanWu/RDBMS-RDFS-OWL-InferenceEngine--AlanWu_20071018.pdf|A Scalable RDBMS-Based Inference Engine for RDFS/OWL]], [[http://www.oracle.com/technology/tech/semantic_technologies/pdf/oracle%20db%20semantics%20tech%20talk%2020080722.pdf|Oracle Database 11g Semantics Technical Talk]], [[http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_infer_bestprac_wp.pdf|Oracle Semantic Technologies Inference Best Practices with RDFS/OWL]]

Converting Natural Language to RDF:

  * [[http://technologies.kmi.open.ac.uk/aqualog/|AquaLog]] is a portable question-answering system which takes queries expressed in natural language and an ontology as input and returns answers drawn from one or more knowledge bases, which instantiate the input ontology with domain-specific information.
  * [[http://alumni.media.mit.edu/~mueller/papers/tt.html|ThoughtTreasure]] is a commonsense knowledge base and architecture for natural language processing.
  * [[http://attempto.ifi.uzh.ch/aceview/|ACE View]] is an ontology and rule editor that uses Attempto Controlled English (ACE) to create, view and edit OWL 2 ontologies and SWRL rulesets ([[googlecode>p/aceview/|project page]], [[http://protegewiki.stanford.edu/wiki/ACE_View|Protege page]]).
  * [[googlecode>p/lucene-skos/|A SKOS analyzer module for Apache Lucene and Solr]]

Online tools:

  * [[http://www.mygrid.org.uk/OWL/Validator|WonderWeb OWL Ontology Validator]]

To read:

  * http://www.w3.org/TR/owl-features/
  * http://www.w3.org/TR/owl-guide/
  * http://www.w3.org/TR/owl-ref/

===== SKOS =====

  * [[http://www.w3.org/2004/02/skos/|SKOS on W3C]] ((See also [[wp>Simple Knowledge Organization System]]))
  * [[http://www.w3.org/TR/skos-primer/|SKOS Primer]] provides introductory examples and guidance in the use of the SKOS vocabulary.
  * [[http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#sechidden|SKOS Core Guide]]
  * [[http://skosapi.sourceforge.net/|JAVA SKOS API]]
  * [[esw>SkosDev/ToolShed|SKOS tools]]
  * [[http://protege.stanford.edu/download/protege/4.0/installanywhere/|Protege]]
  * {{skos_core-simple_knowledge_organisation_for_the_web.pdf|SKOS Core - Simple Knowledge Organisation for the Web (Alistair Miles) [2005]}} ([[http://epubs.cclrc.ac.uk/work-details?w=33977|online]])
  * [[http://conceptweblog.wordpress.com/about/|Concept Web]] -- a dynamic, interactive fabric of concepts and their relationships. The Concept Web is constructed from research literature, Internet databases and other web sites together with off-line resources. The aim of creating the Concept Web is to remove both redundancy and ambiguity from available knowledge in order to help deal with information overload, to semantically "connect" concepts, and so to maximize the potential for knowledge discovery.

To read:

  * http://www.w3.org/TR/skos-reference/

===== Semantic desktop =====

  * [[http://blog.ibeentoubuntu.com/2009/04/gnome-30-please-get-rid-of-file.html|Gnome 3.0: Get rid of the file hierarchy]]

===== Vocabulary =====

**reification** -- a feature of RDF in which any RDF statement can itself be the subject or object of a triple. This means graphs can be nested as well as chained.
On the Web this allows us, for example, to express doubt about, or support for, statements created by other people. A description of an RDF statement using the [[http://www.w3.org/TR/rdf-schema/#ch_reificationvocab|RDF reification vocabulary]] is called a //reification of the statement//. Examples are given [[http://www.w3.org/TR/rdf-primer/#reification|here]] and [[wp>Reification_%28computer_science%29#RDF_and_OWL|here]]. A SeRQL example for [[Sesame]] is [[http://www.openrdf.org/doc/sesame2/2.2/users/ch09.html#d0e1540|here]].

**query statements inferencer** -- the ability of the query processor to intercept and preprocess new statements as needed to enable data semantic support (e.g. RDF), which in [[Sesame]] is implemented in the SAIL((SAIL stands for //Storage And Inference Layer//)).

**ontology reasoner** -- basically checks that an ontology makes sense (consistency checking, concept satisfiability). A reasoner creates an entailment of the RDF graph.

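As a minimal sketch of the reification entry above, the following Python snippet describes one statement with the four terms of the RDF reification vocabulary (''rdf:Statement'', ''rdf:subject'', ''rdf:predicate'', ''rdf:object'') and then makes a claim about that statement. The ''http://example.org/'' namespace, the resource names and the ''doubts'' property are assumptions made up for the example; triples are modelled as plain tuples rather than with any RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for the example


def reify(statement_id, subject, predicate, obj):
    """Describe a triple with the four standard reification triples."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


# The statement being talked about: "alice knows bob".
claim = (EX + "alice", EX + "knows", EX + "bob")
stmt = EX + "statement1"

graph = reify(stmt, *claim)
# The statement itself is now a resource, so it can be the object of another
# triple; here a (made-up) property expresses doubt about it.
graph.append((EX + "carol", EX + "doubts", stmt))

for triple in graph:
    print(triple)
```

Note that reifying a statement does not assert it: the graph above says that Carol doubts the statement, not that Alice knows Bob.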
The terms 'class' and 'subclass' also appear in the context of XML Schemas, and more generally in object-oriented programming. There is an analogy between the use of the terms in those contexts and in this one, but it is a loose analogy: the use of types in the XSchema and O-O contexts is broadly to constrain behaviour and help identify errors, whereas the corresponding assertions in the context of RDF allow a reasoner to deduce a larger volume of implicit information. In particular, RDF schemata do not function as constraints, and mistakes made when defining concepts in an ontology, or when asserting information about resources, do not manifest themselves as 'schema violations', but instead more indirectly, when a reasoner finds it is able to deduce contradictory information, for example being able to prove that some resource ''urn:example#X'' is simultaneously a ''Person'' and not a ''Person''((This text was taken from [[http://labserv.nesc.gla.ac.uk/projects/agast/architecture.html#rdfintro|here]])).
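The difference between a constraint and an inference can be illustrated with a small Python sketch. It assumes a toy vocabulary (''ex:employs'', ''ex:Organization'', ''ex:Person'') and implements only two things by hand: the RDFS domain rule and a check for membership in two classes declared disjoint. It is not a real reasoner, and the prefixed names are plain strings.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
OWL_DISJOINT = "owl:disjointWith"

# Hypothetical schema: only organizations employ, and nothing is both
# an organization and a person.
schema = {
    ("ex:employs", RDFS_DOMAIN, "ex:Organization"),
    ("ex:Organization", OWL_DISJOINT, "ex:Person"),
}
data = {
    ("ex:X", RDF_TYPE, "ex:Person"),
    ("ex:X", "ex:employs", "ex:Y"),  # a schema validator might reject this; a reasoner does not
}


def infer_domain_types(schema, data):
    """Domain rule: if P has domain C and (s P o) holds, then s has type C."""
    domains = {(p, c) for (p, pred, c) in schema if pred == RDFS_DOMAIN}
    return {(s, RDF_TYPE, c) for (s, p, o) in data for (prop, c) in domains if p == prop}


def find_clashes(schema, facts):
    """Report resources that are typed with two classes declared disjoint."""
    disjoint = {(a, b) for (a, pred, b) in schema if pred == OWL_DISJOINT}
    types = {}
    for (s, p, o) in facts:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, a, b)
        for s, classes in types.items()
        for (a, b) in disjoint
        if a in classes and b in classes
    )


facts = data | infer_domain_types(schema, data)
clashes = find_clashes(schema, facts)
print(clashes)
```

The modelling mistake is never flagged as a schema violation. The domain assertion first adds the fact that ''ex:X'' is an ''ex:Organization'', and the problem only surfaces indirectly, when that inferred fact contradicts the asserted ''ex:Person'' type.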
Reasoning can be performed either when the data is loaded into the knowledge base or when a query is issued. The former class of knowledge bases, which perform reasoning when data is loaded, are called //materialized knowledge bases//. Materialized knowledge bases trade off space and increased loading time for shorter query times. This approach is suited to application domains where data is added much less frequently than queries are issued. Examples of such applications are data warehouses and (for the most part) web search. Moreover, since the worst case for OWL reasoning is exponential in time and memory, this approach is often considered to be a good way to store and query OWL knowledge bases. Most reasoning engines for OWL are implemented using either tableau algorithms or rule-based/logic-programming-based engines. The OWL reasoners that are implemented using rule-based engines have been suggested as a practical alternative to the more correct and complete tableau algorithms. In rule-based reasoners, the OWL ontology definitions are first compiled into a set of rules, which are then applied to the presented data set to create the new inferred triples. The main advantages of this class of reasoners are that they are well studied and many robust implementations exist. The disadvantage is that only a subset of the OWL specification can be implemented using them. Many popular open source (Jena) and commercial (OWLIM, Oracle) OWL toolkits are implemented using rule-based reasoners.((Quoted from [[http://halcyon.usc.edu/~pk/prasannawebsite/papers/ram_icpp08.pdf|Parallel Inferencing for OWL Knowledge Bases]]))
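The load-time (materialized) approach described above can be sketched as a tiny forward-chaining loop in Python. This is an illustration only: it applies just two RDFS rules (subclass transitivity and type inheritance), uses a naive fixpoint iteration, and the ''ex:'' class and instance names are invented. Production rule engines cover far more rules and use much better algorithms.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Forward-chain two RDFS rules until no new triples appear (a fixpoint)."""
    closure = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        for (a, b) in subclass_pairs:
            # Subclass transitivity: A subClassOf B, B subClassOf C => A subClassOf C
            for (b2, c) in subclass_pairs:
                if b == b2:
                    new.add((a, SUBCLASS, c))
            # Type inheritance: x type A, A subClassOf B => x type B
            for (s, p, o) in closure:
                if p == TYPE and o == a:
                    new.add((s, TYPE, b))
        if new <= closure:
            return closure
        closure |= new


# Hypothetical asserted data.
loaded = {
    ("ex:Cyclin", SUBCLASS, "ex:Protein"),
    ("ex:Protein", SUBCLASS, "ex:Molecule"),
    ("ex:cycB", TYPE, "ex:Cyclin"),
}

store = materialize(loaded)  # reasoning happens once, at load time
is_molecule = ("ex:cycB", TYPE, "ex:Molecule") in store  # the query is a plain lookup
print(len(loaded), "asserted ->", len(store), "stored; cycB is a Molecule:", is_molecule)
```

The trade-off from the quoted passage is visible even at this scale: the store holds more triples than were asserted and loading costs extra work, but answering the query needs no reasoning at all.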
**entailment** -- the process of transforming the RDF graph into its transitive closure by applying DL rules through unification/resolution. The process of deriving new information is sometimes called //reasoning//.

\\ \\ \\
{{rdf-meta.gif|RDF Meta}} {{rdf-sesame.png|RDF Sesame}} {{w3c-rdf.png|W3C RDF}} {{w3c-sparql.png|W3C SPARQL}}

{{tag>semantic_web RDF SPARQL linked_data}}