Semantic Web/Linked Data

Many of you who have been around computers for as long as I have will remember the Semantic Web. It was touted in the late 1990s and early 2000s as a way to organise and reason over the Web. This was before the rise of Google and the modern Web as we know it today. Mostly, the Semantic Web died because of the Metacrap problem: people cannot be relied on to supply good metadata. However, the idea was resurrected in 2006 as Linked Data. It has since been relegated to academia, a few government projects such as the UK’s data.gov.uk, and, interestingly, the biomedical field.

However, I have since had a reason to come back to the Semantic Web. I needed a graph database for a project on early medieval Irish genealogies. I wanted an easy-to-use, text-based format for storing the data that was also well known enough to be easily consumed by anyone who might want to use it. In this case, much to my surprise, Semantic Web technologies were the easiest to use and were well suited to the task.

One of the reasons for this is that I constrained the problem to something I could deal with. I was not attempting to model the entire Web, which is what, I believe, the Semantic Web community was attempting to do. This made the problem tractable, the solutions much clearer, and their benefits manifest. Additionally, unlike a random website that did not really want to invest much in metadata, I had a reason for using this particular technology.

Format

So, I had a reason to use the technology. How does this all fit together?

The Resource Description Framework (RDF) is the basis for all the other technologies in the Semantic Web/Linked Data stack. It models data as a graph of subject-predicate-object statements called triples. However, RDF had a rather rough beginning because it was created right around the same time as XML, which was all the rage back in the early 2000s. The result was RDF/XML, which became the standard way in which RDF was expected to be created and consumed. Unfortunately, RDF/XML is verbose and does not fit the graph model all that well. People really did not like it. Thankfully, there are now several formats available; the one I chose was Turtle, which is both terse and human-readable.
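To give a feel for why Turtle is pleasant to write by hand, here is a minimal Python sketch that holds a few triples and prints them in a small subset of Turtle. The `ex:` namespace, the property names, and the people are placeholders invented for this illustration and are not taken from the actual genealogy dataset. The serialiser only handles prefixed names; a real project would use a proper RDF library.

```python
from collections import defaultdict

# Hypothetical namespace and terms, invented for this illustration.
PREFIXES = {
    "ex": "http://example.org/genealogy#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# An RDF graph is just a collection of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:Niall", "rdf:type", "ex:Person"),
    ("ex:Niall", "ex:hasChild", "ex:Conall"),
    ("ex:Niall", "ex:hasChild", "ex:Eogan"),
    ("ex:Conall", "rdf:type", "ex:Person"),
]


def to_turtle(triples, prefixes):
    """Serialise prefixed-name triples to a small subset of Turtle."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in sorted(prefixes.items())]
    lines.append("")
    # Group by subject, then by predicate, keeping first-seen order.
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    for s, preds in grouped.items():
        # "a" is Turtle shorthand for rdf:type; "," repeats the predicate
        # and ";" repeats the subject.
        parts = [
            f"{'a' if p == 'rdf:type' else p} {', '.join(objs)}"
            for p, objs in preds.items()
        ]
        lines.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(TRIPLES, PREFIXES))
```

The two `ex:hasChild` statements about `ex:Niall` collapse into a single line with a comma-separated object list, which is most of what makes Turtle so much terser than RDF/XML.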

Ontologies

Since I was basically creating this database by hand, I wanted to input as little information as possible. One of the benefits of Semantic Web technologies is that you can create logical schemas and so apply logic to your data. To do this, you create what is called an Ontology using the Web Ontology Language (OWL). An Ontology serves two purposes. First, it describes the logical implications of your data. Second, it describes some of the kinds of data that are allowed in certain positions. Unlike a SQL schema, an Ontology does not restrict how data is stored or what types the data have. It just records the logical relations in the data and what implications can be drawn from a given set of data. This is both a strength and a weakness of OWL: it will not stop you from storing data in a way that is confusing (or even incorrect). The recently finished Shapes Constraint Language (SHACL) specification fills part of this gap by letting you validate data against a set of constraints.
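The idea of entering less data and letting the ontology derive the rest can be sketched in plain Python. The three rule tables below loosely mirror `owl:inverseOf`, `rdfs:subPropertyOf` and `owl:TransitiveProperty`; the property names and people are hypothetical, and a real OWL reasoner does far more than this naive loop. For brevity the inverse rule is applied in one direction only.

```python
# Hypothetical axioms, loosely mirroring owl:inverseOf,
# rdfs:subPropertyOf and owl:TransitiveProperty.
INVERSE_OF = {"ex:hasChild": "ex:hasParent"}
SUB_PROPERTY_OF = {"ex:hasParent": "ex:hasAncestor"}
TRANSITIVE = {"ex:hasAncestor"}

# The only facts entered by hand (placeholder data).
ASSERTED = {
    ("ex:Niall", "ex:hasChild", "ex:Conall"),
    ("ex:Conall", "ex:hasChild", "ex:Fergus"),
}


def infer(asserted):
    """Apply the three rules repeatedly until no new triples appear."""
    facts = set(asserted)
    while True:
        new = set()
        for s, p, o in facts:
            if p in INVERSE_OF:
                # x hasChild y  =>  y hasParent x
                new.add((o, INVERSE_OF[p], s))
            if p in SUB_PROPERTY_OF:
                # x hasParent y  =>  x hasAncestor y
                new.add((s, SUB_PROPERTY_OF[p], o))
            if p in TRANSITIVE:
                # x P y and y P z  =>  x P z
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    for triple in sorted(infer(ASSERTED) - ASSERTED):
        print(triple)
```

From two hand-entered `ex:hasChild` statements, the loop derives the parent links and the ancestor links, including the indirect one from `ex:Fergus` to `ex:Niall`. Note that nothing here rejects bad input, which matches the point above that an ontology draws implications rather than enforcing constraints.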

Storage and Tooling

The main problem with OWL is the lack of tooling. For SQL, there are a couple of well-known and battle-tested open source systems (MySQL/MariaDB and PostgreSQL). For RDF, the only open source database (these are often called Triplestores) that I know of is Blazegraph, which has the limitation of not supporting all of the OWL 2 specification. While this is annoying, there is also Stardog. It is closed source, but it supports all of the OWL 2 specification and has a limited Community Edition which you can use. I use Stardog because my database will probably never grow to the point of needing anything other than the Community Edition, but you will need to take this into consideration when choosing your own tooling.

Querying

For a long time, RDF was, to be honest, rather inert. It existed but you had to figure out for yourself how to find it, store it, and search it. In 2008, the SPARQL specification fixed that. There is now a standard query language for RDF and Triplestores. Honestly, I found the query language specification one of the easiest to read that I have ever encountered. You can pretty much understand it from the examples given in the text of the specification.

One of the strengths here is that you do not find yourself having to use ORMs and other mechanisms to paper over syntax differences between commercial databases. If your Triplestore supports SPARQL, you can rest easy that it will work in mostly the same way across different Triplestores.
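To give a flavour of how a SPARQL query reads, the sketch below evaluates a two-pattern query in plain Python. The query in the comment is ordinary SPARQL, but the matcher is a toy stand-in and not how any Triplestore actually evaluates queries; the `ex:` terms and the data are hypothetical.

```python
# The SPARQL query being imitated (hypothetical ex: terms):
#
#   PREFIX ex: <http://example.org/genealogy#>
#   SELECT ?grandchild WHERE {
#     ex:Niall ex:hasChild ?child .
#     ?child   ex:hasChild ?grandchild .
#   }

TRIPLES = [
    ("ex:Niall", "ex:hasChild", "ex:Conall"),
    ("ex:Niall", "ex:hasChild", "ex:Eogan"),
    ("ex:Conall", "ex:hasChild", "ex:Fergus"),
]

# Terms starting with "?" are variables, as in SPARQL.
QUERY = [
    ("ex:Niall", "ex:hasChild", "?child"),
    ("?child", "ex:hasChild", "?grandchild"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # Every position matched; carry on with the remaining patterns.
            yield from match(rest, triples, candidate)


if __name__ == "__main__":
    for row in match(QUERY, TRIPLES):
        print(row["?grandchild"])
```

The shared variable `?child` is what joins the two patterns, which is why queries over a graph tend to read like a description of the shape you are looking for.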

Conclusion

I have found working with Linked Data and Semantic Web technologies rather fun. This probably has to do with the fact that I am working on a personal project that I find interesting. Additionally, I am working on something that is much more constrained than attempting to model the entire Web. However, I think that over the years the Linked Data/Semantic Web story has developed rather nicely, if slowly. There is still a general lack of open source tooling, especially a Triplestore that fully supports all the standards.

Would I use this for production services? If you have lots of heterogeneous data that you want to integrate, yes, I would use it. For your normal everyday CRUD, maybe not, but I would not dismiss it out of hand.

About cyocum

Celticist, Computer Scientist, Nerd, sometimes a poet…