official web site
for Marpa.
Marpa is attracting new users,
to the point where I thought it might be useful to have a site to act as
a central directory.
The official web site won't have much in the way of new content.
For new content,
I plan to keep doing
what I've been doing: posting it to this blog.
I've started the site with an annotated list of the
most important Marpa-related posts in this blog.
I hope this will help people newly interested in
Marpa figure out where they want to start.
Those who've been fol…
Marpa::XS is now
1.000000.
Marpa::XS is the current lead implementation of Marpa,
an algorithm that I hope will become
standard for
those parsing problems which are too
complex for regular expressions.
Apparently quite a number of people have put
the beta to use.
Feedback has been positive -- often extremely so.
What is Marpa?
Marpa is a general BNF parser --
it parses anything you can write in BNF, no exceptions.
Left-recursion, right-recursion, ambiguity,
even infinite ambiguity: you name it, Marpa parses it.
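To make "left-recursive and ambiguous" concrete, here is a minimal sketch of a textbook Earley-style recognizer run on the grammar `E -> E + E | E * E | n`. It is not Marpa's implementation or API; the names `Item` and `recognize` are hypothetical, and the sketch assumes no rule has an empty right-hand side. It only answers yes or no and builds no parse tree.

```python
from collections import namedtuple

# An Earley item: the rule `lhs -> rhs`, how far the dot has advanced,
# and the input position where recognition of this rule began.
Item = namedtuple("Item", "lhs rhs dot origin")


def recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Simplifying assumption: no right-hand side is empty.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for pos in range(len(tokens) + 1):
        worklist = list(chart[pos])
        while worklist:
            item = worklist.pop()
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: start every rule for the expected nonterminal.
                    new_items = [Item(symbol, rhs, 0, pos)
                                 for rhs in grammar[symbol]]
                    target = pos
                elif pos < len(tokens) and tokens[pos] == symbol:
                    # Scan: the expected terminal matches the next token.
                    new_items = [item._replace(dot=item.dot + 1)]
                    target = pos + 1
                else:
                    continue
            else:
                # Complete: advance every item that was waiting for item.lhs
                # at the position where this rule started.
                new_items = [parent._replace(dot=parent.dot + 1)
                             for parent in chart[item.origin]
                             if parent.dot < len(parent.rhs)
                             and parent.rhs[parent.dot] == item.lhs]
                target = pos
            for new in new_items:
                if new not in chart[target]:
                    chart[target].add(new)
                    if target == pos:
                        worklist.append(new)

    return any(i.lhs == start and i.dot == len(i.rhs) and i.origin == 0
               for i in chart[len(tokens)])


# Left-recursive, right-recursive and ambiguous, all in one grammar.
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}
print(recognize(GRAMMAR, "E", ["n", "+", "n", "*", "n"]))
```

A recursive-descent parser would loop forever on the left-recursive rules here, and many parser generators would reject the grammar as ambiguous; a chart-based recognizer simply records each item once and moves on.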
If the …
as the problem,
it is a very good thing,
and not just because it looks pretty.
In
previous posts,
I have described
Marpa::HTML,
a Marpa-based, "Ruby Slippers"
approach to parsing liberal
and defective HTML.
A major advantage
of
Marpa::HTML
is that it looks like
the problem it solves.
HTML parsing: the problem
The problem of parsing an HTML document
is essentially
the problem of finding
the hierarchy of i…
the latest release of Marpa::XS
is a release candidate for the first full release,
Marpa::XS 1.000000.
Most users' experience with the previous beta releases
seems to have been trouble-free.
The one significant issue that was identified
was a failure to properly evaluate null symbols under
an unusual combination of circumstances.
This problem
(a one-line error in the C rewrite of the parse engine)
is fixed in this release.
Unusual as the issue is,
when it does occ…
a Marpa-based, "Ruby Slippers"
approach to parsing liberal
and defective HTML.
This post assumes you have
read
the first post.
First, reduce the HTML to a token stream
Most computer languages can be viewed
as a token stream.
HTML is not an exception.
HTML tokens can be blocks of text;
comments and various other SGML entities;
HTML element start tags;
and HTML element end tags.
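As a rough illustration of that token stream, here is a minimal sketch using Python's standard `html.parser` module. This is not how Marpa::HTML tokenizes its input; the class name `TokenCollector`, the function `tokenize`, and the token labels (`START`, `END`, `TEXT`, `COMMENT`, `DECL`) are assumptions made up for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect a flat list of (kind, value) tokens from an HTML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped to keep the sketch short.
        self.tokens.append(("START", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("END", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if data.strip():
            self.tokens.append(("TEXT", data))

    def handle_comment(self, data):
        self.tokens.append(("COMMENT", data))

    def handle_decl(self, decl):
        # Declarations such as <!DOCTYPE html>.
        self.tokens.append(("DECL", decl))


def tokenize(html):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return collector.tokens


# The <p> element is never closed: the stream has no ("END", "p") token.
for token in tokenize("<p>Hello<!-- note --><b>world</b>"):
    print(token)
```

Note that the tokenizer does nothing about the missing `</p>`. It reports only what is physically in the document, and supplying the structure is left to the parser.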
The HTML token stream is unusual in that
some of its toke…