Mark Leighton Fisher [blogs.perl.org]

Stupid Lucene Tricks: Storing Non-Documents

By Mark Leighton Fisher on February 28, 2014 6:00 AM

Lucene's search capabilities are so powerful that it is tempting to store more than just documents in a Lucene index, and that is OK. Here are some hints to make storing non-documents easier:

Do you want to allow phrase searches on your fields? Phrase searches have a drawback when you keep the synonyms for a field value in that same field for ease of searching (which may well be the right strategy for the Lucene default field). For example, if you are indexing information about sugar beets, you could end up with many synonyms about the "sugariness" of the beets when you care mostly about …


1-line Endianness Detection in the C Preprocessor

By Mark Leighton Fisher on February 21, 2014 6:00 AM

Yeah, it's evil (or at least chaotic), but...

Go see 1-line Endianness Detection in the C Preprocessor.

(As someone who had to write a C preprocessor, because we needed a consistent preprocessor across several architectures, I appreciate this trick.)
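
The linked trick answers the question at compile time in C. As a rough Perl-side analogue (a different, run-time technique, not the trick from the linked post), the sketch below packs the integer 1 in the machine's native 16-bit byte order and checks which byte comes first, then cross-checks that against the byteorder value perl was built with, from the core Config module. It assumes a plain big- or little-endian machine and ignores mixed-endian layouts.

```perl
use strict;
use warnings;
use Config;

# pack('S', 1) writes the unsigned 16-bit value 1 in native byte order.
# On a little-endian machine the low-order byte (1) comes first.
my $first_byte = unpack('C', pack('S', 1));
my $endianness = $first_byte == 1 ? 'little' : 'big';

# Cross-check against the byte order recorded when perl was built:
# '1234' or '12345678' means little-endian, '4321' or '87654321' big-endian.
# (Assumes no mixed-endian layout.)
my $config_says = $Config{byteorder} =~ /^1/ ? 'little' : 'big';

print "pack/unpack says $endianness-endian; \%Config says $config_says-endian\n";
```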


Xerces-C++ for Validating Against Multiple Schemas

By Mark Leighton Fisher on February 14, 2014 6:00 AM

Xerces-C++ is Apache's C++ implementation of the Xerces XML parser. It turns out that it ships with a simple example program, stdinparse, that can validate your XML (which many tools do) against multiple schemas simultaneously (which few Open Source tools do).

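A sample command line could be (stdinparse reads the XML from standard input; -n turns on namespace processing and -s turns on schema processing):

    $ ./stdinparse -n -s < /tmp/10.5072__FK250925-xerces.xml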


POE::Session object_states: handlers are sub names, not CODEREFs

By Mark Leighton Fisher on February 12, 2014 8:01 PM

This works as expected:

    sub _poll_start {
        my $self = $_[OBJECT];
    [...]
    POE::Session->create(
        'object_states' => [
            $self => {
                '_start' => '_poll_start',
                'Work'   => '_poll_work',
                '_stop'  => '_poll_stop',
            }
        ],
    );

This, on the other hand, calls the handlers but without filling the @_ array:

    sub _poll_start {
        my $self = $_[OBJECT];    # $self will be undef
    [...]
    POE::Session->create(
        'object_states' => [
    …
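
The point of the post is that each value in an object_states hash is a method name (a string) that POE invokes on the object, so the handler receives the object in $_[OBJECT]. Below is a minimal, self-contained sketch of the working pattern, assuming POE is installed from CPAN. The Poller class, its counter, and the three-round loop are hypothetical, invented for illustration; only the handler names come from the excerpt above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Poller;    # hypothetical class, for illustration only
use POE;           # loads POE::Kernel and POE::Session; exports OBJECT, KERNEL, ...

sub new {
    my ($class) = @_;
    return bless { count => 0 }, $class;
}

# Handlers are ordinary methods; POE calls them on the object,
# so the object arrives in $_[OBJECT].
sub _poll_start {
    my ($self, $kernel) = @_[OBJECT, KERNEL];
    print "start: got a ", ref($self), " object\n";
    $kernel->yield('Work');    # queue the first Work event
}

sub _poll_work {
    my ($self, $kernel) = @_[OBJECT, KERNEL];
    $self->{count}++;
    print "work: count = $self->{count}\n";
    $kernel->yield('Work') if $self->{count} < 3;    # stop after 3 rounds
}

sub _poll_stop {
    my $self = $_[OBJECT];
    print "stop: did $self->{count} rounds of work\n";
}

sub spawn {
    my ($self) = @_;
    POE::Session->create(
        object_states => [
            $self => {
                _start => '_poll_start',    # method NAME (a string) ...
                Work   => '_poll_work',     # ... not \&_poll_work
                _stop  => '_poll_stop',
            },
        ],
    );
    return $self;
}

package main;

my $poller = Poller->new->spawn;
POE::Kernel->run();    # returns once the session has no more events
```

The POE::Session documentation also describes an array-ref form, $self => [ 'Work', ... ], in which each event is handled by the method of the same name; check those docs before relying on it.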


pmtools v2.0.0 - Now with pmtools::new_pod_iterator()!

By Mark Leighton Fisher on February 7, 2014 9:56 PM

pmtools (Perl Module Tools) v2.0.0 has been unleashed upon the unsuspecting world.

v2.0.0 handles the case where the POD (.pod) file is separate from the module (.pm) file. (I gather that this is the case in the upcoming Debian release.) As I had to modify both pman and podpath for this change, it was easier to just push that functionality into an iterator-generator routine in the pmtools module itself (the first time the pmtools module has contained any useful code). I only have 1 data point (my con…
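
The iterator-generator idea can be sketched in plain Perl. The routine below, make_pod_iterator, is hypothetical and is not the actual pmtools::new_pod_iterator(), whose signature and behavior this post does not show. Given a module name, it returns a closure that hands back, one per call, the next existing file that might hold that module's POD, checking for a separate .pod before the .pm in each directory, and returns undef when done. Searching @INC in this order is an assumption made for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical sketch, NOT pmtools' real new_pod_iterator().
# Returns a closure; each call yields the next existing candidate file
# for $module's POD, or undef once the candidates are exhausted.
sub make_pod_iterator {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;                 # default: search @INC
    my @parts   = split /::/, $module;         # Foo::Bar -> ('Foo', 'Bar')
    my @subdirs = @parts[0 .. $#parts - 1];    # ('Foo')
    my $base    = $parts[-1];                  # 'Bar'

    # Work queue of [directory, extension] pairs, .pod before .pm in
    # each directory. @INC may contain code refs, so skip those.
    my @queue;
    for my $dir (grep { !ref } @dirs) {
        push @queue, [ $dir, $_ ] for qw(pod pm);
    }

    return sub {
        while (my $next = shift @queue) {
            my ($dir, $ext) = @$next;
            my $file = File::Spec->catfile($dir, @subdirs, "$base.$ext");
            return $file if -f $file;          # found one: hand it back
        }
        return;                                # exhausted: undef
    };
}

# Usage: list every file that might document File::Spec.
my $next_pod = make_pod_iterator('File::Spec');
while (defined(my $path = $next_pod->())) {
    print "$path\n";
}
```

Because the closure keeps its remaining queue private, each caller gets an independent walk over the candidates, which is one reason a shared iterator generator suits several command-line tools at once.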