Stupid Lucene Tricks: Full-Text Search Without the Full Text

This is a really stupid (and kludgy) Stupid Lucene Trick: how do you search the full text of your document and its metadata without the full text? Simple: you create a separate field (call it "fulltext", for example), then populate it with the text of all of your other fields. Voilà! You can now search against "fulltext" for all of the text in your Lucene document without having to know whether, for example, "physical" is in the document body, the abstract, or only in the keywords.
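Here is a minimal sketch of the idea in Python, using a plain dict as a stand-in for a Lucene document. Nothing here is Lucene API: the helper names (`tokenize`, `add_fulltext`, `field_contains`) and the sample field values are all made up for illustration, and the tokenizer is a crude substitute for a real analyzer.

```python
import re

FULLTEXT = "fulltext"  # assumed name for the catch-all field


def tokenize(text):
    """Lowercase and split into word tokens (a crude stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())


def add_fulltext(doc):
    """Return a copy of doc with a catch-all field holding every other field's text."""
    combined = " ".join(str(value) for name, value in doc.items() if name != FULLTEXT)
    return {**doc, FULLTEXT: combined}


def field_contains(doc, field, term):
    """True if term appears as a whole token in the given field."""
    return term.lower() in tokenize(doc.get(field, ""))


# Hypothetical document: "physical" appears only in the keywords.
doc = add_fulltext({
    "body": "Notes on indexing strategies.",
    "abstract": "A short overview of search tricks.",
    "keywords": "lucene physical storage",
})

found_anywhere = field_contains(doc, FULLTEXT, "physical")
found_in_body = field_contains(doc, "body", "physical")
```

Searching "fulltext" finds "physical" even though a search against "body" alone would miss it, which is the whole point of the trick.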

Note that "fulltext" should be created dynamically from whatever fields the particular document actually has, rather than from a fixed set of named fields, so that you maximize the amount of text you can search against.
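To see why the dynamic approach matters, here is a small Python comparison, again with a plain dict standing in for a document. The function names and the `reviewer_notes` field are hypothetical; the point is only that a fixed list of field names silently skips any field it does not know about.

```python
def fulltext_from_fixed(doc, names=("title", "body")):
    """Build the catch-all text from a hard-coded list of field names."""
    return " ".join(doc[name] for name in names if name in doc)


def fulltext_from_all(doc):
    """Build the catch-all text from whatever fields this document has."""
    return " ".join(str(value) for name, value in doc.items() if name != "fulltext")


# Hypothetical document with a field the fixed list does not know about.
doc = {
    "title": "Disk layout",
    "body": "How segments are merged.",
    "reviewer_notes": "Mentions physical storage.",
}

fixed = fulltext_from_fixed(doc)
dynamic = fulltext_from_all(doc)
```

With the fixed list, "physical" never makes it into the catch-all text; building from the document's own fields picks it up without anyone having to remember to update a list.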

And yes, you are duplicating the text of all of the other fields in this field :(.

About Mark Leighton Fisher

Perl/CPAN user since 1992.