Database Wikis

As some of you may know, I have been involved in a project at the University of Edinburgh’s School of Informatics. This project has been attempting to solve the problem of curated databases. These databases are used mainly in the biological sciences and are very expensive to maintain because experts are needed to enter the data.

How do I fit into this? Well, I got on the project because I have friends in the Celtic Studies department who study and research placenames in Scotland. Now, placenames have a lot of solid data associated with them: for instance, Ordnance Survey northings and eastings, height above sea level, etc. This data would fit nicely into a database, and I know that you are thinking GIS, but that is not really what this is about. The hard data about a placename is not what matters most. What is important about a placename is the scholarly discussion and research that surrounds it, and this kind of knowledge cannot easily be captured in a user-friendly way.

A note about users is in order here. The most sophisticated of them know about relational databases but do not use them. I have spent some time teaching a few scholars in the Edinburgh area about SQL and how to use it. The typical user in this case thinks that Word is difficult to use. Just as an aside, I also think Word is difficult to use, but for vastly different reasons.

So, Peter Buneman had been thinking about curated databases, infinite annotation on them, and how to achieve these goals. We have been working on this for a while now and finally have a paper based on our work (the paper is forthcoming and should be available soon; I will update this post when it is).

One reason to read the paper is that it gives an introduction to the Links system, which I am sure some of you will be interested in looking at.

At this point, though, I will be getting together with a friend of mine who works on placenames full time to let him have a poke around the system as it stands.

2 Comments

There are many defined schemas available for the life sciences. I’m curious (being a BioPerl developer) as to your take on GMOD and Chado, which is meant to be an extensible database schema. Or, for a more sequence-centric one, BioSQL.

From the Perl perspective, have you looked at CouchDB? Or, maybe some Perl-based solutions such as KiokuDB?

All in all, it sounds like a very interesting paper, so I would definitely be interested in more details.


About cyocum

Celticist, Computer Scientist, Nerd, sometimes a poet…