Perl NLP: Stemming and Lemmatizing

Tom Christiansen will give a talk at YAPC::NA 2012 described as:

Perl is used in the NLP (natural language processing) community for a variety of tasks. In biomedical texts, words derived from Latin and Greek pose a big problem for English-language stemmers, because existing standard algorithms like Porter and Snowball fail to produce the base lemmas when faced with irregular plurals.

This talk reviews the problems with existing tools and presents the new Lingua::EN::BioLemmatizer module, which interfaces with the University of Colorado’s “BioLemmatizer” code to produce much more accurate results than were previously available.

[From the YAPC::NA Blog.]

About JT Smith

My little part in the greater Perl world.