September 2010 Archives

Database Wikis

As some of you may know, I have been involved in a project at the University of Edinburgh’s School of Informatics. This project has attempted to solve the problem of curated databases. These databases are used mainly in the biological sciences and are very expensive because you need experts to enter the data.

How do I fit into this? Well, I got on the project because I have friends in the Celtic Studies department who study and research placenames in Scotland. Now, placenames have plenty of solid data attached to them: for instance, Ordnance Survey Northings and Eastings, height above sea level, etc. This data would fit nicely into a database, and I know that you are thinking GIS, but that is not really what this is about. It is not the strict data about a placename that is important. What is important about a placename is the scholarly discussion and research that surrounds it. This kind of knowledge cannot be easily captured in a user-friendly way.
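To make the distinction concrete, here is a minimal sketch of a placename record in Python. It is only an illustration under assumptions: the class names, field names, and sample values are invented for this example and are not the schema used by the project. The point is that the hard data is a handful of fixed fields, while the scholarly discussion is an open-ended tree of annotations that can themselves be annotated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """A piece of scholarly discussion; replies make it arbitrarily deep."""
    author: str
    text: str
    replies: List["Annotation"] = field(default_factory=list)


@dataclass
class Placename:
    """The 'strict' data, plus the discussion that surrounds it."""
    name: str
    easting: int                       # Ordnance Survey easting
    northing: int                      # Ordnance Survey northing
    height_m: Optional[float] = None   # height above sea level, if known
    annotations: List[Annotation] = field(default_factory=list)


def count_annotations(notes: List[Annotation]) -> int:
    """Count every annotation, including nested replies."""
    return sum(1 + count_annotations(note.replies) for note in notes)


if __name__ == "__main__":
    # All values below are placeholders, not real survey data.
    place = Placename(name="Example Hill", easting=300000, northing=700000)
    first = Annotation(author="Scholar A", text="Proposed derivation.")
    first.replies.append(Annotation(author="Scholar B", text="Counter-argument."))
    place.annotations.append(first)
    print(count_annotations(place.annotations))
```

The fixed fields would sit comfortably in a relational table; it is the nested `replies` that do not.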

A note about users is in order here. The most sophisticated of them know about relational databases but do not use them. I have spent some time teaching a few scholars in the Edinburgh area about SQL and how to use it. The normal user in this instance thinks that Word is difficult and hard to use. Just as an aside, I also think Word is difficult and hard to use but for vastly different reasons.

So, Peter Buneman was thinking of curated databases and infinite annotation on them and how to achieve these goals. Thus we have been working on this for a while now and finally we have a paper based on our work (the paper is forthcoming and should be available soon; I will update this when it is).

One reason to read the paper is to get an introduction to the Links system, which I think you might be interested in looking at.

At this point though, I will be getting together with a friend of mine who does placenames full time and give him a poke around the system as it stands.

Writing Large Systems in Perl is a Privilege Not A Right

I have had an experience which has caused me to reevaluate my relationship with Perl. Members of my local PM have also had to deal with this for the last six months or so. After a post-pub conversation with pozorvlak the other night, I have come to the conclusion that while writing large systems in Perl is possible and sometimes even desirable, it is a privilege earned by showing that you can be disciplined, not a right.

Perl 5, at least, allows you to have many different ways of attacking a programming problem. This can be a benefit but it comes with two large problems. First, if your programmer is not disciplined, Perl is almost guaranteed to give you spaghetti code. Second, if your programmer is inexperienced, Perl will give you something that works but will be unmaintainable by either the programmer themselves or an experienced programmer. Perl neither teaches nor requires good programming practice.

When the programming problem is small, this does not usually cause unsolvable problems. However, as the problem or system grows and these systems get into production, Perl's failure to enforce or instill proper large-scale programming techniques leaves it in the dust compared with other programming languages such as Java or OCaml.

Now, yes, I do realize that you can write spaghetti code in any language. If you let an inexperienced programmer loose on any language without proper mentoring, you will get an unmanageable mess.

I guess what I am trying to say is Perl is great for people who “just want it to work” and do not describe themselves as programmers. Perl is also great for those who are experienced and disciplined programmers who want to build large systems. Where Perl breaks down is the “squishy middle” of people. Remember most people are lazy and undisciplined. Believe me, I am very lazy and undisciplined so I know whereof I speak. Perl allows this to an extreme when building large systems and, if this is not tackled directly at a management level, it causes them to become unmanageable messes of code that “works” but to which it becomes exponentially more dangerous to add anything. Even when you get disciplined coders in to work on it, the mess left behind by previous generations of code often cannot be improved.

The idea of test driven development is a community driven answer to this problem. “Write tests!”, they say. Again, this assumes that the programmer is disciplined. If not, the tests can become as bad as the code itself. Test driven development is a virtuous activity and should be integrated into any development situation in any language. However, this will not solve the inherent problems within the language itself. This also assumes that the programmer’s management cares about cleaning up code messes. For an example of this, see all the COBOL still infesting the world. Once something is written, management generally could not give a toss whether it is good or not and they only care when something breaks. One interesting research project would be to have a code analysis system that could give management a percentage chance of code failing. Basically, you cannot talk to management without some kind of “metric”. Logic does not seem to penetrate their collective skulls.

A secondary thing that I want to bring up is that Perl is slow. It is very, very slow. Now that may not bother many but it does bother me from time to time. I would like something that makes more efficient use of resources, mostly because using fewer clock cycles is bound to use less power and is thus more “green”. My grandparents lived through the last depression and learned the valuable lesson of “waste not; want not” and I think we should seriously consider applying that to programming.

So, I guess what I want is this: if I were doing programming interviews for Perl jobs, I would want to see code examples that were not in Perl. I would want to see proper use of modularity, function/class use, and other examples that would show me that the person under consideration is disciplined in their use of all the systems available in a programming language to write good, large systems.

Anyway, I guess that is my rant for the day. A programming language should enforce certain kinds of discipline on the programmer. Perl 5 does not do this well enough for me to feel comfortable writing large systems in it.

Translating LaTeX to Word: Pandoc

As some of you might have noticed from my old tech blog, I often have problems with my colleagues in the Humanities because they use Word and I use LaTeX. This recently caused me to lose a day, as I had to translate a PDF by hand into OpenOffice so I could send it in Word to my editor.

In my search to make this process much less arduous, I believe that I have found the panacea: pandoc. I tried it on my XeLaTeX file for my current article and it worked very, very well. It outputs in OpenOffice format and from there, it is easy to translate into Word.
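A conversion like the one described can be scripted. The following is a minimal sketch, assuming pandoc is installed and on the PATH. The file names `article.tex` and `article.odt` are placeholders, and the `-f latex`, `-t odt`, and `-o` options are standard pandoc flags rather than necessarily the exact invocation used for the article.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(source: str, target: str) -> List[str]:
    """Build the pandoc argument list for a LaTeX-to-ODT conversion."""
    return ["pandoc", source, "-f", "latex", "-t", "odt", "-o", target]


def convert(source: str, target: str) -> bool:
    """Run pandoc if it is available; return False if it is not installed."""
    if shutil.which("pandoc") is None:
        return False
    # check=True raises CalledProcessError if pandoc reports a failure.
    subprocess.run(pandoc_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Placeholder file names; this only prints the command it would run.
    print(" ".join(pandoc_command("article.tex", "article.odt")))
```

The resulting `.odt` file can then be opened in OpenOffice and saved as a Word document. Recent pandoc releases can also write `docx` directly, which removes that last step.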

There are two problems (as there always are). First, it does not handle BibTeX at all, so you must copy and paste that information by hand from your PDF. Second, it mangled the Greek that I had in the file, which suggests that pandoc does not handle UTF-8 very well at some point in the process of producing the OpenOffice file. I will need to file a bug report. Other than that, however, I am very impressed.
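For readers who have not seen this kind of mangling before, here is a small Python sketch of one common way it happens: UTF-8 bytes being read back as if they were Latin-1. This is an assumption for illustration only. It has not been confirmed as the cause of the problem in pandoc, and the sample word is just an example.

```python
def mangle(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(mangled: str) -> str:
    """Undo the mistake by reversing the two steps."""
    return mangled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    # The Greek word "logos", written with escapes so the source stays ASCII.
    greek = "\u03bb\u03cc\u03b3\u03bf\u03c2"
    broken = mangle(greek)
    # Each Greek letter is two bytes in UTF-8, so it turns into two characters.
    print(len(greek), len(broken))
    print(repair(broken) == greek)
```

If the damage is of this kind, it is reversible as shown. If characters were dropped or replaced with question marks instead, the original text cannot be recovered from the output.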

use.perl refugee

While things over at use.perl get sorted out, I have decided to move my technical blogging over to this space. I know I am not the most prolific tech blogger out there but I like to post a few things about tech stuff from time to time.

About cyocum

Celticist, Computer Scientist, Nerd, sometimes a poet…