Automating On Call Jury Instructions
Early in February, I received a jury summons for the United States District Court, Southern District of California. Prospective jurors for federal jury service (at least in this court) are placed on call for a period of about 30 days. I was to call for instructions on April 1 and potentially proceed to do so periodically until May 4 (assuming I wasn’t instructed to report).
Since my initial instruction date was nearly two months away, I created an entry for it in Google Calendar and promptly forgot about it. On Monday, April 2, I was riding the train to work when I realized that I had forgotten to check my instructions. Fortunately, when I checked them after arriving at my office, I found that I had been deferred to the next day.
So I added a new entry in Google Calendar, this time with an SMS reminder. I proceeded to do this for most of April, checking my instructions and duplicating the calendar entry with another SMS reminder.
I’m embarrassed to admit that it wasn’t until the last week of April that it occurred to me that I could automate the whole process. After all, isn’t automating drudgery the whole reason I ended up programming Perl in an engineering support group at my day job?
In addition to a telephone recording, jury instructions can be obtained online. In fact, this is the method I used all month. The form uses the HTTP POST method, so it wasn’t a simple matter of constructing a URL to fetch my instructions. While I could construct a POST request with curl(1) or the LWP module, it’s so much easier to do with the WWW::Mechanize module.
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
$mech->submit_form(
    form_name => 'Form1',
    fields    => {
        'ctl02$txtPart' => 'PARTICIPANT_ID',
        'ctl02$txtZip'  => 'ZIP_CODE',
    },
    button => 'ctl02$btnInstructions',
);
When I’m not supposed to report, the following message appears in the returned content:
<span id="ctl02_lblMsg">Please check again Sunday, April 29, after 6:00pm for further reporting instructions. Do NOT report at this time.</span>
Given how simple this is, I could parse it with a regular expression. But I figured it was worth trying to do it right, so I searched CPAN and found the HTML::DOM module. I’ve worked a bit with DOM in JavaScript, so the module appealed to me. Annoyingly, the parse method only supports file names or file handles. Fortunately, this isn’t terribly difficult to work around and the whole thing isn’t much more verbose than using a regular expression.
use HTML::DOM;
use IO::Scalar;

my $dom = HTML::DOM->new;
$dom->parse_file( IO::Scalar->new( do { my $c = $mech->content; \$c } ) );
my $message = $dom->getElementById('ctl02_lblMsg')->innerHTML;
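For comparison, the regular-expression route I decided against might look something like the sketch below. The helper name extract_message is made up for illustration, and the pattern assumes the span always carries exactly the id shown above and never contains nested tags.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): pull the text of
# the ctl02_lblMsg span out of raw HTML. Returns undef if the span is absent.
sub extract_message {
    my ($html) = @_;
    my ($text) = $html =~ m{<span\s+id="ctl02_lblMsg"[^>]*>(.*?)</span>}s;
    return $text;
}

my $sample = '<div><span id="ctl02_lblMsg">Please check again Sunday, '
           . 'April 29, after 6:00pm for further reporting instructions. '
           . 'Do NOT report at this time.</span></div>';

my $found = extract_message($sample);
print defined $found ? "$found\n" : "message span not found\n";
```

It works for this one page, but it breaks as soon as the markup changes shape, which is the main reason a DOM parser is the safer choice.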
Now that I have the message, what does it say? Thus far my instructions have always been to check again on another day, so I’ll need to work with what I know and defensively code for the exceptions.
if ( $message !~ /Do NOT report at this time/ ) {
# We didn't see the message we wanted to see, so we'd better alert...
}
If I don’t see the known message, I send myself an alert (I happened to use the Email::Sender module in the script) and exit. If this happens, I’ll need to address it as it probably means I need to report (or I’m no longer on call).
However, if I do see the above message, I need to figure out when I’m supposed to check again. If this fails for some reason (e.g., I don’t know what the format looks like if the day is a single digit), I go through the alert process again. It’s rather important that this script be noisy, given the nature of what I’m doing and the limited knowledge I’m working with.
if ( $message !~ /Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d+)/ ) {
# We couldn't parse the next date to check, so we'd better alert...
}
my $dt = DateTime::Format::DateParse->parse_datetime("$+{'weekday'}, $+{'day'} $+{'month'} 18:15");
I’ve hard-coded the time to check as 6:15 PM, because the instructions are always updated at 6:00 PM.
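As a rough illustration of the same parsing step using only core modules, here is a sketch built on Time::Piece instead of DateTime::Format::DateParse. The helper name next_check_time and the explicit year argument are assumptions of this sketch (the message itself carries no year), and accepting a one-digit day is a guess about a format I never actually saw.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: return a Time::Piece for 18:15 on the date named in
# the message, or undef if the message is not in the expected format.
sub next_check_time {
    my ($message, $year) = @_;
    return unless $message =~ /Please check again (?<weekday>\w+), (?<month>\w+) (?<day>\d{1,2})/;
    my $stamp = sprintf '%s %02d %d 18:15', $+{month}, $+{day}, $year;
    # strptime croaks on input it cannot parse (e.g. an unknown month name),
    # so trap that and report failure as undef instead.
    my $t = eval { Time::Piece->strptime($stamp, '%B %d %Y %H:%M') };
    return $t;
}

my $sample = 'Please check again Sunday, April 29, after 6:00pm for further '
           . 'reporting instructions. Do NOT report at this time.';

my $t = next_check_time($sample, 2012)
    or die "Could not parse the next check date; time to send an alert\n";
printf "%s %s\n", $t->ymd, $t->hms;    # prints 2012-04-29 18:15:00
```

Returning undef on any failure keeps the caller's job simple: anything other than a usable time means an alert goes out.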
Finally, the script schedules itself to run again at the time indicated. Here I’ve broken out of Perl to use the at(1) command. Since I’m running the script on my Linode VPS, this seemed an easy way to accomplish the task of rescheduling.
open my $at, '|-', 'at', $dt->strftime('%R'), $dt->strftime('%F');
say {$at} "$0 2>/dev/null"; # $0 must be fully qualified or in PATH
close $at;
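Since this script is supposed to be noisy about failures, a slightly more defensive take on the rescheduling step could check both the open and the close of the pipe. The sketch below is only that, a sketch: the helper names at_command and schedule are invented here, and it works from an epoch time with POSIX strftime rather than from the DateTime object used in the script.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build the at(1) argument list (HH:MM YYYY-MM-DD)
# for an epoch time, interpreted in the local time zone.
sub at_command {
    my ($epoch) = @_;
    my @lt = localtime $epoch;
    return ('at', strftime('%H:%M', @lt), strftime('%Y-%m-%d', @lt));
}

# Hypothetical helper: queue $job with at(1), dying on any failure so the
# caller can raise an alert instead of silently missing a check.
sub schedule {
    my ($epoch, $job) = @_;
    my @cmd = at_command($epoch);
    open my $at, '|-', @cmd or die "Cannot run @cmd: $!";
    print {$at} "$job\n";
    close $at or die "@cmd exited with status $?";
    return;
}

# Show the command that would be run for one hour from now; actually
# queueing a job would be schedule(time + 3600, "$0 2>/dev/null").
print join(' ', at_command(time + 3600)), "\n";
```

Checking the return value of close matters with a piped open, because that is where a non-zero exit status from at(1) is reported.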
Running this script once will set the rescheduling process in motion, relieving me of the need to run it again. If I’d thought of this at the beginning of April, I could have forgotten about the whole bother of checking for instructions several times per week. Oh well, live and learn.
I’ve posted the full script as a Gist on GitHub.
After going to all of this effort, I thought about outsourcing the work and perhaps offering this type of service to a wider audience. I looked at ifttt and Yahoo! Pipes. Unfortunately, the former doesn’t appear to offer a way of triggering by scraping an arbitrary web page, and the latter doesn’t appear to support the HTTP POST method. If anyone knows of an approach using existing services, I’m open to suggestions.
Mojo::UserAgent
After Joel Berger mentioned Mojo::UserAgent in the comments, I gave it a go.
My initial stab wasn’t particularly successful. Unlike WWW::Mechanize, Mojo::UserAgent was simply returning the main form after performing the HTTP POST. After a while, I realized that I needed to manually do some of the work that WWW::Mechanize was doing for me. Namely, fetch the page, extract the hidden fields, and submit the form with these fields included (there’s a cookie involved, but it’s taken care of behind the scenes by both modules). Because of this, the Mojo::UserAgent version is a bit more annoying to write, but I think this is more than made up for with the built-in access to the DOM.
my $ua = Mojo::UserAgent->new;
my $url = 'http://jury.casd.uscourts.gov/AppearWeb/Default.aspx';
my $res = $ua->get($url)->res; # initial fetch to get cookie and form fields
my $tx = $ua->max_redirects(3)->post_form(
    $url => {
        '__VIEWSTATE'           => $res->dom('form#Form1 > input#__VIEWSTATE')->[0]->attrs('value'),
        '__EVENTVALIDATION'     => $res->dom('form#Form1 > input#__EVENTVALIDATION')->[0]->attrs('value'),
        'ctl02$txtPart'         => 'PARTICIPANT_ID',
        'ctl02$txtZip'          => 'ZIP_CODE',
        'ctl02$btnInstructions' => 'Reporting Instructions',
    }
);
$res = $tx->success or die $tx->error;
my $message = $res->dom('span#ctl02_lblMsg')->[0]->text;
WWW::Scripter
As I was working on the Mojo::UserAgent version, I kept thinking how perfect my script would be if WWW::Mechanize gave me access to the DOM in the same way. Well, as I was updating my Gist with my new jury-mojo.pl script, cpansprout left a comment to tell me not only how I could remove my use of IO::Scalar but also that WWW::Scripter does exactly what I was wishing for.
use WWW::Scripter;

my $mech = WWW::Scripter->new();
$mech->get('http://jury.casd.uscourts.gov/AppearWeb/Default.aspx');
$mech->submit_form(
    form_name => 'Form1',
    fields    => {
        'ctl02$txtPart' => 'PARTICIPANT_ID',
        'ctl02$txtZip'  => 'ZIP_CODE',
    },
    button => 'ctl02$btnInstructions',
);
my $message = $mech->document->getElementById('ctl02_lblMsg')->innerHTML;
I like this last version the most and have updated my Gist accordingly. Also, my automation worked and emailed me tonight (30 April 2012) to inform me that my jury service has concluded.
Without meaning to start the flamewar that I know will follow: you might want to look at Mojo::UserAgent for these tasks. It has a built-in DOM and it all works nicely together.
Mojo::UserAgent looks neat, thanks for the pointer. Maybe, as a demonstration, I'll write a version of my script using it. I suspect my local Perl Mongers group would enjoy seeing different approaches to solving this problem.
The following is a rant:
Here's the thing that really, really bugs me: Somewhere there is some government IT guy sitting, building retirement funds putting together web pages like this. Looking at the page, I see:
<noscript>
Javascript is disabled. You must use a javascript-enabled browser in order to view this page.
<br /><br />
<a href="http://www.microsoft.com/windows/ie/downloads/critical/ie6sp1/default.asp" target="_blank">Click here to download the latest Internet Explorer updates</a><br />
and other crap.
1. Why would one need JavaScript for this?
Presumably because the drone who put the page together did not know about
<input type="password" ...
2. Why, oh why, do you have a link to download IE6SP1? I know, I know, it was the state of the art when the page was written.
In any case, if I really wanted to tinker, I would have tried out Twilio's transcription service.
More ranting:
The best part is the following gem in the source:
<a href="http://bobby.watchfire.com/" title="Bobby's Home Page"></a>
JavaScript required and Bobby (via Archive.org)!
Chris: If you do please post a link, I would be curious to see the difference!
As requested, I've posted a version using Mojo::UserAgent. As a bonus, I've updated the original to use WWW::Scripter.
All that effort just to get a single “it’s over, you can go home” mail… :-)
This makes for a very interesting comparison. I had never heard of WWW::Scripter, so now I have something new to look at. Well done sir!