Perl NLP: Stemming and Lemmatizing

Tom Christiansen will give a talk at YAPC::NA 2012 described as:

Perl is used in the NLP (natural language processing) community for a variety of tasks. In biomedical texts, words derived from Latin and Greek pose a big problem for English-language stemmers, because existing standard algorithms like Porter and Snowball fail to produce the base lemmas when faced with irregular plurals.

This talk reviews the problems with existing tools and presents the new Lingua::EN::Biolemmatizer module, which interfaces with the University of Colorado’s “BioLemmatizer” code to produce much more accurate results than were previously available.

[From the YAPC::NA Blog.]

App::ArchiveDevelCover 1.000

You can read about my latest module, which helps you archive coverage reports generated with Devel::Cover, in this post about App::ArchiveDevelCover on my blog. There is even a screenshot!

The new ORLite 2.0 learns some amazing SQLite tricks

ORLite is a lightweight SQLite-specific ORM which is particularly handy for working with ad-hoc SQLite databases and for creating internal database APIs for large applications, most prominently the database API inside of Padre.

Aligning so closely with the features of a single database engine keeps the implementation size down to a minimum, at less than 1000 lines of code, and allows ORLite to do things that would be completely impossible in more general ORMs.

This is particularly true in the upcoming 2.0 release, a preview of which is available now.

http://svn.ali.as/cpan/releases/ORLite-1.90.tar.gz

This new major revision embraces SQLite's somewhat unusual rowid mechanism, allowing it to accurately distinguish between different copies of identical data.

Let's start with the following database in the file adam.sqlite

Divide and Conquer

At least that's the goal. I've split Catalyst::Model::REST in two, so there's a new Role::REST::Client as well as the original distribution.

Originally I started CMR because I needed to access some REST services from a Catalyst application. There was nothing useful around, so I wrote a simple Catalyst Model and wrapped it up on CPAN.

It has been developed further as new requirements came up, and a couple of days ago the nice people on #catalyst suggested splitting the functionality into a distribution of its own. The use case is that a role can be applied to an already existing Catalyst Model, or to any other class that needs to connect to a REST server.

So here it is. Use Role::REST::Client to access REST services from your Moose-based class.
Use Catalyst::Model::REST if you just want to start a new Catalyst Model Class,

Conference Hotel SOLD OUT!

Unfortunately, due to the huge number of attendees at YAPC::NA 2012 this year, we’ve sold out the original conference hotel. We still have lots of rooms at the dorms, which you can book for only $42 per night for a single or $63 per night for a double.

However, we know many of you want to stay in a hotel. So we’ve arranged for an additional block of rooms at another hotel. The Hilton DoubleTree is only five blocks away from the conference facilities, and rooms under our group rate are going for $159 per night. Click here to make a reservation. The group code is TPF. If you want one of these rooms, book fast, or you’ll have to either stay in the dorm or get a hotel that is farther away. Also, this block of rooms dematerializes on May 10th if it is not already sold out by then.

[From the YAPC::NA Blog.]

How I roll - choosing the epigraph for 5.15.8

I started thinking about the epigraph for my Perl release about 10 days before the release. Certainly, I had been unconsciously mulling over killer quotes from books or other media for longer than that. About 10 days ago, I made a short list of two books that could give interesting quotes.

One was "Friday" by R.A. Heinlein. I've always liked the book, and it opens with the introduction of the protagonist as she kills a pursuer and stuffs him into a cabinet, reacting only on a hunch. Heinlein's laconic writing style should have made for an interesting quote from that scene.

RFC: Single or multiple instances of ORM objects?

In our homegrown ORM we have an in-memory cache, which enables us to ensure that only one instance of any object is live in memory at any one time.

In other words:


    $one = MyObject->get(123);
    $two = MyObject->get(123);

    refaddr($one) == refaddr($two)

I find this setup useful because:

  • if you update one copy of the object, all other copies automatically update
  • get’ing the object again is cheap
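The single-instance behaviour described above can be sketched roughly as follows. This is only an illustration, not the ORM's actual code: the `_load_from_db` method is a hypothetical stand-in for a real database fetch, and the use of weak references (so that the cache does not keep unused objects alive) is an assumption on my part.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr weaken);

{
    package MyObject;

    my %CACHE;    # id => weak reference to the live instance

    # Hypothetical stand-in for the real database lookup.
    sub _load_from_db {
        my ($class, $id) = @_;
        return bless { id => $id }, $class;
    }

    sub get {
        my ($class, $id) = @_;

        # Hand back the live instance if one is still in memory.
        return $CACHE{$id} if defined $CACHE{$id};

        my $obj = $class->_load_from_db($id);
        $CACHE{$id} = $obj;

        # Weaken the cached copy so the cache alone does not keep
        # the object alive once every caller has dropped it.
        Scalar::Util::weaken($CACHE{$id});
        return $obj;
    }
}

my $one = MyObject->get(123);
my $two = MyObject->get(123);

print "same instance\n" if refaddr($one) == refaddr($two);
```

Because both variables point at the same hash, a change made through `$one` is immediately visible through `$two`, which is the first benefit listed above.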

When I do a search against the DB, it returns a list of objects, which I can then retrieve (in bulk) from:

Delhi.pm Perl Monger user group: Need suggestions

Hello,
Please visit my blog article at http://pradeeppant.com/2012/02/20/delhi-pm-perl-monger-user-group-need-suggestions/

I am looking for suggestions to revive Delhi.pm, an inactive Perl Mongers user group in New Delhi, India.

I have already received some valuable suggestions in the comments section of my blog.

More ideas are welcome.

Thanks

Hardware Hackathon Talks

The hackathon and hardware hackathon have proven so popular that we’ve already sold out. However, we’ve acquired an additional room that will be available for hacking throughout the entire YAPC::NA 2012 conference. We did this to ensure there’s always a space to spread out and collaborate on projects.

Some people are planning on giving talks or small demonstrations at the already sold out Hardware Hackathon. For example, Robert Blackwell will give a talk called "Moving Servo Motors with Perl" and Andrew Rodland will give a talk called "A Man With Two Watches is Never Sure".

The hardware hackathon will be freeform for the most part, but if you would like to give a talk or a demonstration on the official schedule, go ahead and submit it. We’ll get it on the schedule. Also, indicate in the notes on which of the 5 days of the hardware hackathon you’d like to give your talk. You’ll likely get a bigger audience if you do it on one of the first two days. However, you’d have to have already purchased your badge for the Hackathon, since it’s already sold out.

YAPC is nothing if not about collaboration and sharing ideas. We want to make sure everybody has that opportunity, which is why we’ve extended the Hackathon to all 5 days.

[From the YAPC::NA Blog.]

Using WebKit to generate PDF slides

Today we can find a lot of presentations written in HTML or one of its variants (XHTML, HTML5, etc.) using JavaScript frameworks such as S5 or deck.js.

These frameworks allow the creator to build advanced presentations with simple HTML. This has a lot of advantages, as the user

  • is in full control of the layout
  • can easily embed images, links, code, etc
  • can use a revision control tool

One major drawback of such frameworks, though, is sharing the slides. Of course the slides can be put online on any HTTP server and be easily read from a browser, but not everyone has access to a public web server.

Date arithmetic can be dangerous

Every time I review code others have written, I blame people for doing date arithmetic of their own. However, some time ago, I received a pull request for a module that had some date arithmetic inside. As all tests passed, I could not see anything dangerous in it and merged the pull request. Today, I found the date tests failing. Why? Why today? Well, this is worth some investigation.

The main part of the module generates an HTTP header using this construct ($c is the mocked Catalyst context, expire_in is a method returning the number of seconds to expire in):

$c->response->headers->expires(time() + $self->expire_in)
    if ....some_condition...

Well, adding a number of seconds to an epoch value cannot hurt. Can it? The test looked like this:

Some new releases

I released Test::File 1.33 yesterday, which fixed a minor MANIFEST glitch with 1.32, which I released three days ago.  (I know it’s been discussed before, but I guess I never really appreciated it: it sure would be nice if CPAN Testers could report stuff like that).  Version 1.32 fixes a number of CPAN RT tickets (in fact, it pretty much closes out all the open bugs), most of which you won’t care about.  If you happen to be using Windows, this may fix a number of test failures, although there are still a few left that I’m working with schwern to fix.  (If you are running Windows and you happen to see some mysterious errors which boil down to the fact that “skip” isn’t the same as “SKIP”, it’s definitely safe to ignore those.)

Why You Hate Writing Documentation (and What I'm Doing About It)

Rocco Caputo will give a talk at YAPC::NA 2012 described as:

Documentation is anathema to hackers. Releasing early and often is much harder when every code change requires an editorial pass to an ever-growing body of documentation. The common solutions are to either not document anything, to let the documentation fall into disrepair, or to release late and not so often. What’s a fun-loving but conscientious hacker to do?

[From the YAPC::NA Blog.]

Learning Perl Challenges

I'm starting a series of Learning Perl Challenges at www.learning-perl.com, the blog I maintain for Learning Perl. While I was posting about the Student Workbook for Learning Perl, I started thinking about the difference between exercises based on a particular chapter or feature and capstone exercises that would use anything or everything in the book.

The first challenge is to reimplement the which command so that it finds programs based on a pattern instead of an exact name.
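One possible starting point for this challenge is sketched below. It is not an official solution: the function name `which_matching` is made up for illustration, and the sketch assumes that "pattern" means a Perl regular expression matched against the file name.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of every executable file in the given
# directories whose name matches the pattern.
sub which_matching {
    my ($pattern, @dirs) = @_;
    my @found;

    for my $dir (@dirs) {
        # Skip directories that are missing or unreadable.
        opendir(my $dh, $dir) or next;

        for my $name (sort readdir $dh) {
            next unless $name =~ $pattern;
            my $path = File::Spec->catfile($dir, $name);
            push @found, $path if -f $path && -x $path;
        }
        closedir $dh;
    }
    return @found;
}

# Search the directories listed in PATH for programs starting with "perl".
print "$_\n" for which_matching(qr/^perl/, File::Spec->path);
```

Unlike the real which, this version reports every match in every directory rather than stopping at the first hit, which seems more useful when searching by pattern.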

A milestone for Alien::Base

I have been working on a set of base classes intended to make creating a new Alien:: distribution for some library as easy as making a simple Module::Build based distro. The code isn’t on CPAN yet, but you can follow its progress on GitHub.

I haven’t been feeling so well today, so I have been sitting around watching movies (which I own on DVD) on TV. Of course I can’t sit still that long without doing anything so Alien::Base saw a burst of activity today.

Along with testing, I am also keeping an Alien::Base-based Alien::GSL (which provides the GNU Scientific Library) in the examples folder. The big news today is that this example distro can now query the GNU FTP server and pick the newest version of the library. It then downloads, extracts and builds the library in a temporary folder. Finally it “installs” the library in a File::ShareDir directory in the Alien::GSL root/share directory. Even this isn’t as cool as how it does this:

cpXXXan is moving ...

... although you probably won't notice.

Executive summary: your disks hate you

Until about 20 minutes ago, cpXXXan ran in a virtual machine on a box that I rent. That box also hosts VMs for CPANdeps, for some of my own CPAN-testing activities, and a few other things. I did it that way because it was cheap and convenient. However, over the three years that it's been running (gosh, is it really that long?!?) this has become a rather, umm, "sub-optimal" solution.

That's because the CPAN has got much larger, as has the number of CPAN-testers reports. Even worse, the rate of increase of both has been consistently increasing. This means that the amount of work to be done for the daily imports of new data, both for cpXXXan and for CPANdeps, has increased dramatically. This means that the jobs take longer, and scheduling them has become a Hard Problem.

Hackathon SOLD OUT!

The Hackathon & pre-conference Hardware Hackathon at YAPC::NA 2012 have sold out! Now all the pre-conference activities have completely sold out.

We have fewer than 50 tickets remaining for YAPC::NA 2012 before all 400 of those are sold out as well. If you’ve been procrastinating about whether to buy your ticket now or later, don’t wait. They’ll be gone soon. Buy your badge today!

[From the YAPC::NA Blog.]

The Perl Learning Environment

For YAPC::NA, I'm creating a new course called "From Zero to Perl" (although I'll probably actually call it "0..Perl"). JT Smith wants to create not only new Perlers, but new programmers, and he wants to start them with Perl. I'm up for the challenge. However, there are some things that you might have opinions and suggestions on.

The Learning Perl course I teach assumes that you already know how to program, just not in Perl. Some non-programmers do alright, many struggle, and a few outright fail. Most of those struggles have nothing to do with Perl as a language. Programming as a way of thinking is hard, especially for the complex things people want to do right away with Perl. It's easy to make a turtle draw geometric shapes; it's not conceptually easy to design a blogging platform.

Why is this "use" a syntax error?

Many Perl developers are unaware that they can assert a module version and supply an import list at the same time. For example:

use Test::More 0.96 tests => 13;

However, the following is a syntax error:

use Test::More  .96 tests => 13;

Frankly, I don't know why. Here's a program which demonstrates my confusion. It exhibits more or less the same behavior on 5.8.9, 5.10.1, 5.12.4 and 5.14.2.

And the output is:

Video Recordings for YAPC::Europe 2012

We are looking at recording some or all talks at YAPC::Europe. The most promising option for recording and publishing the talks seems to be to hire a professional team. We don't want to hire that team just to find out that posting of the talk material is unwanted, like Andrew did in Riga.

As a way forward, we will likely ask for the (audio/video) publishing rights on your talk if you submit one. This will not mean that your talk will necessarily get recorded and published, because we don't know whether we will record all rooms on all days. But all other things being equal, we will give preference to submitted talks that allow us to publish the video afterwards.

If you think that recording talks is a waste of money and time, as nobody will watch them anyway, please also comment below. It would save us a great deal of organization if there is consensus that videos are undesirable anyway.

About blogs.perl.org

blogs.perl.org is a common blogging platform for the Perl community. It is written in Perl, with a graphic design donated by Six Apart, Ltd.