Automating On-Call Jury Instructions

Early in February, I received a jury summons for the United States District Court, Southern District of California. Prospective jurors for federal jury service (at least in this court) are placed on call for a period of about 30 days. I was to call for instructions on April 1 and, unless instructed to report, continue checking periodically until May 4.

Since my initial instruction date was nearly two months away, I created an entry for it in Google Calendar and promptly forgot about it. On Monday, April 2, I was riding the train to work when I realized that I had forgotten to check my instructions. Fortunately, when I checked them after arriving at my office, I found that I had been deferred to the next day.

So I added a new entry in Google Calendar, this time with an SMS reminder. I repeated this for most of April, checking my instructions and then duplicating the calendar entry with another SMS reminder.

I’m embarrassed to admit that it wasn’t until the last week of April that it occurred to me that I could automate the whole process. After all, isn’t automating drudgery the whole reason I ended up programming Perl in an engineering support group at my day job?

In addition to a telephone recording, jury instructions can be obtained online; in fact, this is the method I used all month. The form uses the HTTP POST method, so it wasn’t a simple matter of constructing a URL to fetch my instructions. While I could construct a POST request with curl(1) or the LWP module, it’s much easier to do with the WWW::Mechanize module.

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
# Fill in the visible fields; Mechanize carries over the form's hidden fields
$mech->submit_form(
    form_name => 'Form1',
    fields    => {
        'ctl02$txtPart' => 'PARTICIPANT_ID',
        'ctl02$txtZip'  => 'ZIP_CODE',
    },
    button => 'ctl02$btnInstructions',
);
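Because the form is submitted with POST, the participant ID and ZIP code travel in the request body rather than in the URL. As a rough illustration only, the sketch below uses HTTP::Tiny (a core module the script does not use) to show what an application/x-www-form-urlencoded body for these fields looks like. The `Reporting Instructions` button value is taken from the Mojo::UserAgent version later in this post. The real submission also has to carry the page's hidden ASP.NET fields, which WWW::Mechanize copies from the form it just fetched, so this body alone is likely not enough to get a response.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Field names from the form; the values are placeholders, as in the script.
my %fields = (
    'ctl02$txtPart'         => 'PARTICIPANT_ID',
    'ctl02$txtZip'          => 'ZIP_CODE',
    'ctl02$btnInstructions' => 'Reporting Instructions',
);

# Percent-encode the pairs into a POST body ('$' becomes %24, for example).
my $body = HTTP::Tiny->new->www_form_urlencode( \%fields );
print "$body\n";
```

Note that the dollar signs in the ASP.NET control names must be escaped, which is one more reason to let a library build the request.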

When I’m not supposed to report, the following message appears in the returned content:

<span id="ctl02_lblMsg">Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.</span>

Given how simple this is, I could parse it with a regular expression. But I figured it was worth trying to do it right, so I searched CPAN and found the HTML::DOM module. I’ve worked a bit with the DOM in JavaScript, so the module appealed to me. Annoyingly, the parse_file method only accepts file names or file handles, not a string. Fortunately, this isn’t difficult to work around by wrapping the content in an IO::Scalar handle, and the whole thing isn’t much more verbose than using a regular expression.

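use HTML::DOM;
use IO::Scalar;

my $dom = HTML::DOM->new;
# parse_file wants a file name or handle, so wrap the page content in an in-memory handle
$dom->parse_file( IO::Scalar->new( do { my $c = $mech->content; \$c } ) );
my $message = $dom->getElementById('ctl02_lblMsg')->innerHTML;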
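For comparison, the regular-expression route mentioned above might look like the core-Perl sketch below. It runs against the response fragment quoted earlier and is more brittle than the DOM approach: it assumes `id` is the first attribute of the span and that the span never contains a nested `<span>`.

```perl
use strict;
use warnings;

# Response fragment copied from the page text quoted earlier.
my $content = '<span id="ctl02_lblMsg">Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.</span>';

# Capture everything between the span with this id and the next </span>.
# Assumes no nested <span> and that id is the span's first attribute.
my ($message) = $content =~ m{<span\s+id="ctl02_lblMsg"[^>]*>(.*?)</span>}s
    or die "ctl02_lblMsg not found\n";

print "$message\n";
```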

Now that I have the message, what does it say? Thus far my instructions have always been to check again on another day, so I’ll need to work with what I know and code defensively for the exceptions.

if ( $message !~ /Do NOT report at this time/ ) {
    # We didn't see the message we wanted to see, so we'd better alert...
}

If I don’t see the known message, I send myself an alert (I happened to use the Email::Sender module in the script) and exit. If this happens, I’ll need to address it as it probably means I need to report (or I’m no longer on call).
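The check-then-alert flow can be sketched as a small function. This is not the script's code: the `alert` sub here is a hypothetical stand-in that writes to STDERR, where the script sends email with Email::Sender, so the sketch runs without any mail setup. The point it illustrates is that anything other than the known "stay home" text, including a missing span, is treated as a reason to alert.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's Email::Sender alert. It only writes
# to STDERR so this sketch runs without any mail configuration.
sub alert {
    my ( $subject, $body ) = @_;
    print STDERR "ALERT: $subject\n$body\n";
    return;
}

# True when the page still shows the familiar "stay home" text. Anything
# else (new wording, an empty span, no span at all) raises an alert and
# returns false so the caller can exit.
sub still_deferred {
    my ($message) = @_;
    return 1
        if defined $message && $message =~ /Do NOT report at this time/;
    alert( 'Check jury instructions now', $message // '(no message found)' );
    return 0;
}

my $deferred = still_deferred(
    'Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.'
);
my $changed = still_deferred(undef);    # e.g. the span was missing
```

In a script like the one described here, a false return would be followed by `exit`, so nothing gets rescheduled until a human looks at the page.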

However, if I do see the above message, I need to figure out when I’m supposed to check again. If this fails for some reason (e.g., I don’t know what the format looks like if the day is a single digit), I go through the alert process again. It’s rather important that this script be noisy, given the nature of what I’m doing and the limited knowledge I’m working with.

if ( $message !~ /Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d+)/ ) {
    # We couldn't parse the next date to check, so we'd better alert...
}

use DateTime::Format::DateParse;

my $dt = DateTime::Format::DateParse->parse_datetime("$+{'weekday'}, $+{'day'} $+{'month'} 18:15");

I’ve hard-coded the time to check as 6:15 PM, because the instructions are always updated at 6:00 PM.
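As a core-only alternative to DateTime::Format::DateParse (not what the script does), the parse-and-schedule step can be sketched with Time::Local and POSIX::strftime. It reuses the same named-capture pattern, builds 18:15 on the announced day, and returns the time and date strings in the same shape the script passes to at(1) (`%R` is `%H:%M`, `%F` is `%Y-%m-%d`). Two assumptions are flagged in the comments: the message carries no year, so the caller supplies one, and the single-digit-day message is hypothetical because its real wording was never observed.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Month names as they appear in the message, mapped to 0-based month numbers.
my %month_index = do {
    my $i = 0;
    map { ( $_, $i++ ) } qw(January February March April May June
                            July August September October November December);
};

# Same pattern as the script; \d+ matches one- or two-digit days.
my $pattern = qr/Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d+)/;

# Returns the at(1) arguments (time, date) for 18:15 on the announced day,
# or dies so the caller can raise an alert. The year is passed in because
# the message has none (an assumption of this sketch).
sub next_check_args {
    my ( $message, $year ) = @_;
    $message =~ $pattern or die "Could not parse next check date\n";
    my %when = %+;    # copy the named captures before any other match runs
    exists $month_index{ $when{month} }
        or die "Unknown month '$when{month}'\n";
    my $epoch = timelocal( 0, 15, 18, $when{day}, $month_index{ $when{month} }, $year );
    my @lt = localtime $epoch;
    return ( strftime( '%H:%M', @lt ), strftime( '%Y-%m-%d', @lt ) );
}

# The message seen on the site, plus a hypothetical single-digit-day variant.
my @seen  = next_check_args( 'Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.', 2012 );
my @guess = next_check_args( 'Please check again Friday, May 4, after 6:00pm for further reporting instructions. Do NOT report at this time.', 2012 );
print "at @seen\nat @guess\n";
```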

Finally, the script schedules itself to run again at the indicated time. Here I’ve broken out of Perl to use the at(1) command. Since I’m running the script on my Linode VPS, this seemed an easy way to handle the rescheduling.

use feature 'say';

open my $at, '|-', 'at', $dt->strftime('%R'), $dt->strftime('%F')
    or die "Cannot run at: $!";
say {$at} "$0 2>/dev/null"; # $0 must be fully qualified or in PATH
close $at or die "at failed (status $?)";
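The comment on the `say` line matters because at(1) runs the job later, from its own environment rather than the shell that launched the script. One way to make the path fully qualified, sketched here with the core File::Spec module (not part of the original script), is to resolve `$0` up front and single-quote it for the shell that at(1) uses.

```perl
use strict;
use warnings;
use File::Spec;

# Resolve the script path now, relative to the current directory.
my $script = File::Spec->rel2abs($0);

# Single-quote the path for the shell; an embedded ' becomes '\'' .
( my $quoted = $script ) =~ s/'/'\\''/g;
my $job = "'$quoted' 2>/dev/null";

print "$job\n";
```

The resulting `$job` string is what would be written to the `at` pipe in place of `"$0 2>/dev/null"`.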

Running this script once sets the rescheduling process in motion, relieving me of the need to run it again. If I’d thought of this at the beginning of April, I could have forgotten about checking for instructions several times per week. Oh well, live and learn.

I’ve posted the full script as a Gist on GitHub.

After going to all of this effort, I thought about outsourcing the work and perhaps offering this type of service to a wider audience. I looked at IFTTT and Yahoo! Pipes. Unfortunately, the former doesn’t appear to offer a way to trigger on the contents of an arbitrary web page, and the latter doesn’t appear to support the HTTP POST method. If anyone knows of an approach using existing services, I’m open to suggestions.

Mojo::UserAgent

After Joel Berger mentioned Mojo::UserAgent in the comments, I gave it a go.

My initial stab wasn’t particularly successful. Unlike WWW::Mechanize, Mojo::UserAgent was simply returning the main form after performing the HTTP POST. After a while, I realized that I needed to manually do some of the work that WWW::Mechanize was doing for me. Namely, fetch the page, extract the hidden fields, and submit the form with these fields included (there’s a cookie involved, but it’s taken care of behind the scenes by both modules).

Because of this, the Mojo::UserAgent version is a bit more annoying to write, but I think this is more than made up for with the built-in access to the DOM.

use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new;
my $url = 'http://jury.casd.uscourts.gov/AppearWeb/Default.aspx';
my $dom = $ua->get($url)->result->dom;  # initial fetch to get cookie and form fields
my $tx  = $ua->max_redirects(3)->post(
    $url => form => {
        '__VIEWSTATE'           => $dom->at('form#Form1 > input#__VIEWSTATE')->attr('value'),
        '__EVENTVALIDATION'     => $dom->at('form#Form1 > input#__EVENTVALIDATION')->attr('value'),
        'ctl02$txtPart'         => 'PARTICIPANT_ID',
        'ctl02$txtZip'          => 'ZIP_CODE',
        'ctl02$btnInstructions' => 'Reporting Instructions',
    }
);

my $res = $tx->result;  # dies on connection errors
die $res->message unless $res->is_success;
my $message = $res->dom->at('span#ctl02_lblMsg')->text;
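The Mojo::UserAgent code above pulls the two hidden fields out with CSS selectors. As a dependency-free illustration of the same "extract the hidden fields" step, the core-Perl sketch below gathers every `<input type="hidden">` into a hash. It is a regex approximation, not an HTML parser: it assumes double-quoted attribute values with no `>` inside them. The sample markup and its values are made up in the shape of an ASP.NET page.

```perl
use strict;
use warnings;

# Collect name => value for every hidden input in the page.
# Regex sketch only: assumes double-quoted attributes and no '>' in values.
sub hidden_fields {
    my ($html) = @_;
    my %fields;
    while ( $html =~ /<input\b([^>]*)>/gi ) {
        my $attrs = $1;
        my %attr;
        while ( $attrs =~ /(\w+)="([^"]*)"/g ) {
            $attr{ lc $1 } = $2;    # attribute names are case-insensitive
        }
        next unless ( $attr{type} // '' ) eq 'hidden' && defined $attr{name};
        $fields{ $attr{name} } = $attr{value} // '';
    }
    return \%fields;
}

# Hypothetical sample markup; the real values are long opaque strings.
my $html = <<'HTML';
<form name="Form1" method="post" action="Default.aspx" id="Form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4MzE0MjEwNTs7Pg==" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="abc123" />
<input name="ctl02$txtPart" type="text" id="ctl02_txtPart" />
</form>
HTML

my $hidden = hidden_fields($html);
print "$_ = $hidden->{$_}\n" for sort keys %$hidden;
```

The resulting hash could be merged with the visible fields before posting, which is roughly what WWW::Mechanize does on its own.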

WWW::Scripter

As I was working on the Mojo::UserAgent version, I kept thinking how perfect my script would be if WWW::Mechanize gave me access to the DOM in the same way. Then, as I was updating my Gist with my new jury-mojo.pl script, cpansprout left a comment that not only told me how I could drop IO::Scalar but also pointed out that WWW::Scripter does exactly what I was wishing for.

use WWW::Scripter;

# WWW::Scripter is a WWW::Mechanize subclass, so the form code is unchanged
my $mech = WWW::Scripter->new();
$mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
$mech->submit_form(
    form_name => 'Form1',
    fields    => {
        'ctl02$txtPart' => 'PARTICIPANT_ID',
        'ctl02$txtZip'  => 'ZIP_CODE',
    },
    button => 'ctl02$btnInstructions',
);

# document returns the HTML::DOM tree for the current page
my $message = $mech->document->getElementById('ctl02_lblMsg')->innerHTML;

I like this last version the most and have updated my Gist accordingly. Also, my automation worked and emailed me tonight (30 April 2012) to inform me that my jury service has concluded.

8 Comments

Without meaning to start the flamewar that I know will follow: you might want to look at Mojo::UserAgent for these tasks. It has a built-in DOM and it all works nicely together.

The following is a rant:

Here's the thing that really, really bugs me: Somewhere there is some government IT guy sitting, building retirement funds putting together web pages like this. Looking at the page, I see:

<noscript>
Javascript is disabled. You must use a javascript-enabled browser in order to view this page.
<br /><br />
<a href="http://www.microsoft.com/windows/ie/downloads/critical/ie6sp1/default.asp" target="_blank">Click here to download the latest Internet Explorer updates</a><br />

and other crap.

1. Why would one need JavaScript for this?

Presumably because the drone who put the page together did not know about <input type="password" ...

2. Why, oh why, do you have a link to download IE6SP1? I know, I know, it was the state of the art when the page was written.

In any case, if I really wanted to tinker, I would have tried out Twilio's transcription service.

More ranting:

The best part is the following gem in the source:

<a href="http://bobby.watchfire.com/" title="Bobby's Home Page"></a>

JavaScript required and Bobby (via Archive.org)!

Chris: If you do, please post a link. I would be curious to see the difference!

All that effort just to get a single “it’s over, you can go home” mail… :-)

This makes for a very interesting comparison. I had never heard of WWW::Scripter, so now I have something new to look at. Well done sir!


About Chris Grau

I use Perl to support an EDA engineering organization. Occasionally I write about the things I do with Perl.