Semantic Web/Linked Data

Many of you who have been around computers for as long as I have will remember the Semantic Web. It was touted in the late 1990s and early 2000s as a way to organise and reason over the Web. This was before the rise of Google and the modern web as we know it today. Mostly, the Semantic Web died because of the problem of Metacrap. However, the idea was resurrected in 2006 as Linked Data. It has since been relegated to academia, a few government projects, like the UK’s data.gov.uk, and, interestingly, biomedical research.

However, I have since had a reason to come back to the Semantic Web. I wanted a graph database for a project on early medieval Irish genealogies. I wanted an easy-to-use, text-based format for storing the data that was also well known enough to be easily consumed by anyone who might want to use the data. In this case, much to my surprise, Semantic Web technologies were the easiest to use and were well suited to the task.

One of the reasons for this is that I constrained the problem to something I could deal with. I was not attempting to model the entire Web, which is what, I believe, the Semantic Web community was attempting to do. This made the problem tractable, the solutions much clearer, and their benefits manifest. Additionally, unlike a random website that did not really want to invest much in metadata, I had a reason for using this particular technology.

Format

So, I had a reason to use the technology. How does this all fit together?

The Resource Description Framework (RDF) is the basis for all the other technologies in the Semantic Web/Linked Data. However, RDF had a rather rough beginning because it was created right around the same time as XML, which was all the rage back in the early 2000s. This produced RDF/XML, which became the standard way in which RDF was expected to be created and consumed. Unfortunately, RDF/XML is verbose and does not actually fit the graph model all that well. People really did not like it. Thankfully, there are now several formats available. The one I chose was Turtle, which is both terse and human-readable.
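To give a feel for how terse Turtle is, here is a minimal Python sketch that writes a handful of triples out in Turtle syntax. The `ex:` vocabulary, the `example.org` namespace, and the people named are placeholders I made up for illustration; they are not the vocabulary of my actual project, and a real project would more likely use an RDF library than hand-rolled string formatting.

```python
# Hypothetical prefixes and vocabulary, for illustration only.
PREFIXES = {
    "ex": "http://example.org/genealogy/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Each triple is (subject, predicate, object). "a" is Turtle shorthand
# for rdf:type; objects wrapped in double quotes are string literals.
TRIPLES = [
    ("ex:niall", "a", "ex:Person"),
    ("ex:niall", "rdfs:label", '"Niall"'),
    ("ex:loegaire", "a", "ex:Person"),
    ("ex:loegaire", "ex:hasFather", "ex:niall"),
]


def to_turtle(prefixes, triples):
    """Serialise triples as Turtle, grouping statements by subject."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        # ";" repeats the subject, "." ends the group of statements.
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{s} {body} .")
    return "\n".join(lines)


print(to_turtle(PREFIXES, TRIPLES))
```

Each person ends up as a short block of predicate and object pairs under a single subject, which is much of what makes Turtle pleasant to write by hand.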

Ontologies

Since I was basically creating this database by hand, I wanted to input the least amount of information possible. One of the benefits of Semantic Web technologies is that you can create logical schemas. In this way, you can apply logic to your data. To do this, you will need to create what is called an Ontology using the Web Ontology Language (OWL). An Ontology serves two purposes. First, it describes the logical implications of your data. Second, it describes some of the kinds of data that are allowed in certain positions. Unlike a SQL schema, an Ontology does not restrict how data is stored or what types the data are. It just stores the logical relations of the data and what implications can be drawn from a certain set of data. This is both a strength and a weakness of OWL: unlike a SQL schema, it will not stop you from storing data in a way that may be confusing (or even incorrect). There is a newly finished specification called the Shapes Constraint Language (SHACL) which provides some of this validation.
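As a rough illustration of what "logical implications" means here, the following Python sketch hand-rolls two rules of the kind an OWL ontology can declare: a sub-property (every `ex:hasFather` link is also an `ex:hasAncestor` link) and a transitive property (`ex:hasAncestor`). This is not an OWL reasoner and not how a Triplestore implements reasoning; the property names and people are placeholders assumed for the example.

```python
# Asserted facts: only direct fatherhood is typed in by hand.
FACTS = {
    ("ex:loegaire", "ex:hasFather", "ex:niall"),
    ("ex:niall", "ex:hasFather", "ex:eochaid"),
}

# Hypothetical ontology axioms, written as plain Python data.
SUB_PROPERTY_OF = {"ex:hasFather": "ex:hasAncestor"}
TRANSITIVE = {"ex:hasAncestor"}


def infer(facts):
    """Apply both rules repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            # Sub-property rule: a hasFather link implies a hasAncestor link.
            if p in SUB_PROPERTY_OF:
                new.add((s, SUB_PROPERTY_OF[p], o))
            # Transitivity rule: chain A -> B and B -> C into A -> C.
            if p in TRANSITIVE:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


inferred = infer(FACTS) - FACTS
for triple in sorted(inferred):
    print(triple)
```

Two asserted facts yield three inferred ancestor links, including the grandfather relation that was never entered. That is the saving in data entry I was after.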

Storage and Tooling

The main problem with OWL is the lack of tooling. For SQL, there are a couple of well-known and battle-tested open source systems (MySQL/MariaDB and PostgreSQL). For RDF, I know of only one open source store (these are often called Triplestores), Blazegraph, which has the limitation of not supporting all of the OWL 2 specification. While this is annoying, there is Stardog: although it is closed source, it supports all of the OWL 2 specification and has a limited Community Edition which you can use. I use Stardog because my database will probably never grow to the point of needing anything other than the Community Edition, but you will need to take this into consideration when choosing your tooling.

Querying

For a long time, RDF was, to be honest, rather inert. It existed but you had to figure out for yourself how to find it, store it, and search it. In 2008, the SPARQL specification fixed that. There is now a standard query language for RDF and Triplestores. Honestly, I found the query language specification one of the easiest to read that I have ever encountered. You can pretty much understand it from the examples given in the text of the specification.

One of the strengths here is that you do not find yourself having to use ORMs and other mechanisms to paper over syntax differences between commercial databases. If your Triplestore supports SPARQL, you can rest easy that it will work in mostly the same way across different Triplestores.
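To make the portability point concrete, here is a small Python sketch that builds the same SPARQL query request for two different endpoints, following the SPARQL Protocol convention of passing the query text in a `query` parameter. The endpoint URLs and the `ex:` vocabulary are placeholders, and the requests are only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical query over a hypothetical ex: vocabulary.
QUERY = """
PREFIX ex: <http://example.org/genealogy/>
SELECT ?child WHERE {
  ?child ex:hasFather ex:niall .
}
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Both endpoint URLs are placeholders, not real services.
ENDPOINTS = (
    "http://triplestore-a.example/sparql",
    "http://triplestore-b.example/sparql",
)

for endpoint in ENDPOINTS:
    req = build_request(endpoint, QUERY)
    print(req.get_method(), req.full_url.split("?")[0], req.get_header("Accept"))
```

Only the endpoint URL changes between stores; the query text and the request shape stay the same.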

Conclusion

I have found working with Linked Data and Semantic Web technologies rather fun. This probably has to do with the fact that I am working on a personal project that I find interesting. Additionally, I am working on something that is much more constrained than attempting to model the entire Web. However, I think over the years the Linked Data/Semantic Web story has developed rather nicely, if slowly. There is still a general lack of open source tooling, especially a Triplestore that fully supports all the standards.

Would I use this for production services? If you have lots of heterogeneous data that you want to integrate together, yes, I would use it. For your normal everyday CRUD, maybe not but I would not dismiss it out of hand.

Jedi Anchorites and Early Ireland

I went to see the new Star Wars film on my birthday. Now that the movie has been out for a few weeks, I feel that I can discuss the striking final scene. This scene is of great interest to anyone who wants to understand and appreciate its early Irish subtext and how this may play out in the future.

Spoilers Ahead

Citations In the Humanities (Update)

This is a follow-up to this post. You will want to read that first before continuing.

I just wanted to let everyone know that the British Library now has a means, like that of the Library of Congress, to link to specific books. This is called the “British National Bibliography” and is available at http://bnb.data.bl.uk. This should ease the problem of some books not being available from the Library of Congress. For instance, Bechbretha has a URL of http://bnb.data.bl.uk/id/resource/012025232. You can place extensions at the end to get various formats: for example, http://bnb.data.bl.uk/id/resource/012025232.rdf will get you the RDF version of the document.
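The URL pattern above is simple enough to capture in a few lines of Python. This is only a sketch of the string handling: the helper name `bnb_url` is my own, `rdf` is the only extension confirmed above, and nothing here contacts the British Library.

```python
BNB_BASE = "http://bnb.data.bl.uk/id/resource/"


def bnb_url(record_id, extension=None):
    """Return the BNB URL for a record, optionally with a format extension."""
    url = BNB_BASE + record_id
    return f"{url}.{extension}" if extension else url


# The record identifier for Bechbretha, as given above.
print(bnb_url("012025232"))
print(bnb_url("012025232", "rdf"))
```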

Things are now to the point where we can really drop most citation frameworks and go straight with something that looks like my citation proposal.

Book Review: The Origins of the Irish

The Origins of the Irish, J. P. Mallory, Thames & Hudson: London, 2013, ISBN:9780500051757

To give away the game before starting, I would like to say that this book would be an excellent addition to the library of anyone who has an interest in the pre-history of Ireland. While it is not my area of expertise, it illuminated much which I had already gleaned from the writings of others, and it did so in a highly engaging and effective way. One of the main advantages of this book is the way in which the key points are gathered at the end of each chapter, which reminds the reader of all the foregoing material. This alone would earn the book a place in any Celtic Studies course. The additional factor which makes this book even more valuable is the easy style and humour of one who has an absolute command of the primary and secondary material. However, this is not a book without its flaws, which will be discussed anon.

Open Access: Second Thoughts

I have always been a proponent of Open Access scholarship. The days where dissemination of scholarship cost a significant amount of money are over. However, I am having some second thoughts. Most of these lie in the fact that, while I like open access, I like academic freedom even more. It is this juncture that bothers me the most.

Open Access began mostly in the sciences as a reaction to the fact that science publishers were continuing to mark up the price of journals without regard for stagnating and declining library budgets. This has led to a confrontation between libraries and publishers in the sciences. The outcome of this continuing debate is two forms of Open Access called “Green”, preferred by libraries and university administrators, and “Gold”, preferred by the UK government and publishers. A good discussion of the pros and cons can be found here.

My main concern comes from the fact that, whatever kind of Open Access you choose, they are backed by mandates from funders and university administrators. This is the most problematic part of Open Access from my point of view. The tradition is that scholars knew their audiences and were free to write and research for them in whatever venue they best knew. Now, however, there is a thick layer of “research managers” who are ever more insistent that they know scholarship better than those who actually do the scholarship or research. This, coupled with the statistically dubious impact factor, is now the driving narrative around Open Access.

For the Humanities, all of these debates are being foisted upon them while, in their reality, nothing much has changed. Monographs and journals are still reasonably priced. Scholarship continues just as it has for many, many years. The reality is that most readers who may be interested in the output of Humanities scholars prefer physical books to ebooks. This means that there just is no market for, or interest in, Open Access online monographs or books.

The point is that university administrators are now using Open Access as a tool of control over the scholastic process, which was always managed by the academics themselves. This is causing a slow-moving power struggle between them, with Open Access getting a bad name in the Humanities in the process. This is mainly an administrative over-reach and a dismissive attitude towards the Humanities, whether justified or not, by those in positions of influence or power. I will maintain, though, that Humanities scholars have remained myopic in the face of the rise of new communications technologies.

I guess my new stance is that Open Access is a good idea but the implementation is awful and seriously needs re-evaluation in light of the principles of Academic Freedom and a respect for those who actually engage in research and scholarship, which seems missing from the current climate of debate.