December 2011 Archives

A new web site for Marpa

By Jeffrey Kegler on December 26, 2011 5:35 PM

official web site for Marpa. Marpa is attracting new users, to the point where I thought it might be useful to have a site to act as a central directory. The official web site won't have much in the way of new content. With new content, I plan to continue to do what I've been doing -- post it to this blog.

I've started the site with an annotated list of the most important Marpa-related posts in this blog. I hope this will help people newly interested in Marpa figure out where they want to start. Those who've been fol…

Marpa::XS is now 1.000000

By Jeffrey Kegler on December 20, 2011 9:34 PM

Marpa::XS is now 1.000000. Marpa::XS is the current lead implementation of Marpa, an algorithm that I hope will become standard for parsing problems that are too complex for regular expressions. Apparently quite a number of people have put the beta to use. Feedback has been positive -- often extremely so.

What is Marpa?

Marpa is a general BNF parser -- it parses anything you can write in BNF, no exceptions. Left-recursion, right-recursion, ambiguity and even infinite ambiguity, you name it, Marpa parses it. If the …

How to parse HTML, part 3

By Jeffrey Kegler on December 14, 2011 5:44 PM

as the problem, it is a very good thing, and not just because it looks pretty. In previous posts, I have described Marpa::HTML, a Marpa-based, "Ruby Slippers" approach to parsing liberal and defective HTML. A major advantage of Marpa::HTML is that it looks like the problem it solves.

HTML parsing: the problem

The problem of parsing an HTML document is essentially the problem of finding the hierarchy of i…

Marpa::XS release candidate now available

By Jeffrey Kegler on December 11, 2011 9:17 PM

the latest release of Marpa::XS is a release candidate for the first full release, Marpa::XS 1.000000. Most users' experience with the previous beta releases seems to have been trouble-free. The one significant issue that was identified was a failure to properly evaluate null symbols under an unusual combination of circumstances. This problem (a one-line error in the C rewrite of the parse engine) is fixed in this release. Unusual as the issue is, when it does occ…

How to parse HTML, part 2

By Jeffrey Kegler on December 7, 2011 11:10 AM

a Marpa-based, "Ruby Slippers" approach to parsing liberal and defective HTML. This post assumes you have read the first post.

First, reduce the HTML to a token stream

Most computer languages can be viewed as a token stream. HTML is not an exception. HTML tokens can be blocks of text; comments and various other SGML entities; HTML element start tags; and HTML element end tags. The HTML token stream is unusual in that some of its toke…