The original site at bkmrx.com was mainly built in PHP & MySQL. Since finishing that site, however, I've taught myself how to use Mojolicious and done a fairly comprehensive rewrite of the website on a Mojolicious and MongoDB stack. The rewrite is not a full replication of the features available on bkmrx.com (see the about page for a comparison).
Now with a new job on the horizon, I want to spend less time on building out a better bookmarking service, but at the same time don't want it to stagnate.
To that end, I've released the code for bkmrx.org on GitHub, and uploaded a live version of the site at bkmrx.org.
This is the first time I've released a relatively big project on GitHub, and since I'm by no means a full-time developer, go easy on me :) However, I'm hoping there are others in the Perl community who might be interested in contributing to, forking or otherwise using the code.
You can read more details on my blog if you're interested.
However, I will definitely investigate those alternatives for future projects. Thanks for sharing!
The problem with PhantomJS (up until the v1.8 release on 23 December 2012) was that it wasn't very easy to understand or control if you were unfamiliar with JavaScript, CoffeeScript or Node.js (if you were using the Casper.js fork). Since the v1.8 release in December, PhantomJS supports WebDriver, which basically means you can control it from pretty much any language you like (although Perl isn't explicitly mentioned).
Since I like Perl, I decided to give it a go after trying WWW::Mechanize::Firefox + MozRepl, which is great, but doesn't work if you're going double-headless and are running it on a GUI-less server.
I was previously using Mojo::UserAgent as the scraping agent for this project. However, it was ridiculously simple to plug in Selenium::Remote::Driver to perform the GET request and hand the fully rendered HTML back to the awesome Mojo::DOM parser for easy manipulation of the data. (I found out about Wight, which offers more native support for PhantomJS, after working on the project, but the below still applies if you just want to use the PhantomJS API.)
All you need to do to get PhantomJS up & running for your scraper is:
1. Install it
2. Run the command `phantomjs --webdriver=9134 &` to start PhantomJS in the background as a WebDriver server on port 9134, ready to handle your requests
3. Combine with Mojolicious:
#!/usr/bin/env perl
use Modern::Perl;
use Mojo::DOM;
use Mojo::URL;
use Selenium::Remote::Driver;

my $url = 'http://www.google.co.uk';

# fetch the fully rendered web page, already parsed into a Mojo::DOM object
my $dom = _fetch_page($url);

# store the URL as a Mojo::URL object (useful for making links absolute etc)
my $mojo_uri = Mojo::URL->new($url);

# check for success of request
if ($dom) {
    # print the text of the page's <title> element
    say $dom->at('title')->text;
}

sub _fetch_page {
    my $url = shift;
    # connect to the PhantomJS WebDriver server started in step 2
    my $driver = Selenium::Remote::Driver->new(
        'remote_server_addr' => 'localhost',
        'port'               => '9134',
        'browser_name'       => 'chrome',
        'platform'           => 'VISTA',
    );
    $driver->get($url);
    my $dom = Mojo::DOM->new( $driver->get_page_source() );
    $driver->quit();
    return $dom;
}
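The script above stores the page URL as a Mojo::URL object "for making links absolute" but never gets round to using it. As a rough sketch of what that could look like, the snippet below resolves every link in a chunk of HTML against a base URL. The `absolute_links` helper and the example.com URLs are made up for illustration, the HTML string stands in for whatever `get_page_source()` returns, and it assumes a reasonably recent Mojolicious (where `attr` is the attribute accessor).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::URL;

# Hypothetical helper: return every link found in $html as an absolute URL
# string, resolved against $base (the URL the page was fetched from).
sub absolute_links {
    my ($html, $base) = @_;
    my $base_url = Mojo::URL->new($base);
    my $dom      = Mojo::DOM->new($html);

    my @links;
    for my $link ($dom->find('a[href]')->each) {
        my $href = $link->attr('href');
        # to_abs leaves already-absolute URLs alone and resolves the rest
        push @links, Mojo::URL->new($href)->to_abs($base_url)->to_string;
    }
    return @links;
}

# Stand-in for the rendered page source returned by the driver
my $html = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>';
print "$_\n" for absolute_links($html, 'http://www.example.com/index.html');
```

In the scraper itself you would pass in the page source from the driver and the original `$url` instead of the hard-coded strings.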
It's also stupidly easy to walk through a document's DOM, or even serve up a screengrab of the web page:
# needs "use MIME::Base64;" at the top of the controller
sub screengrab {
    my $self = shift;
    my $url  = $self->param('url');
    my $driver = Selenium::Remote::Driver->new(
        'remote_server_addr' => 'localhost',
        'port'               => '9134',
        'browser_name'       => 'chrome',
        'platform'           => 'VISTA',
    );
    $driver->get($url);
    # screenshot() returns the PNG as a base64-encoded string
    my $png_base64 = $driver->screenshot();
    $driver->quit();
    $self->render( data => MIME::Base64::decode_base64($png_base64), format => 'png' );
}
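One thing the action above takes on trust is that the driver really handed back a PNG. A minimal sketch of a guard, assuming you would rather return an error than serve garbage: decode the base64 string and check for the standard 8-byte PNG signature before rendering. The `decode_png` helper is hypothetical and not part of Selenium::Remote::Driver or Mojolicious, and the fake "screenshot" is only there so the snippet runs without PhantomJS.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical helper: decode a base64 screenshot and return the raw bytes,
# or undef if the result does not start with the PNG file signature.
sub decode_png {
    my ($png_base64) = @_;
    my $bytes = decode_base64(defined $png_base64 ? $png_base64 : '');
    return undef unless substr($bytes, 0, 8) eq "\x89PNG\r\n\x1a\n";
    return $bytes;
}

# Fake data standing in for $driver->screenshot()
my $fake_screenshot = encode_base64("\x89PNG\r\n\x1a\n" . 'image data here');

print defined decode_png($fake_screenshot)           ? "looks like a PNG\n" : "not a PNG\n";
print defined decode_png(encode_base64('not a png')) ? "looks like a PNG\n" : "not a PNG\n";
```

In the controller you could call this in place of the bare `decode_base64` and render an error response when it returns undef.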
However, since getting into the Mojolicious framework and a few other Perl modules that require a newer Perl than the 5.8 that cPanel is currently tied to, it's become a bit of a nightmare trying to run any Modern Perl apps on it.
While there is some Mojolicious documentation around running an app on Apache, I thought I'd document the exact steps I took to get a non-lite app up & running on my Hostgator server (although I'm sure it would work equally well on Dreamhost or other shared hosting solutions).
The steps I took were:
source ~/perl5/perlbrew/etc/bashrc

# set apache handler to treat your specified script name(s) as a CGI program
Options +ExecCGI
<Files ~ "(appname)$">
    SetHandler cgi-script
</Files>
# rewrite any requests into the app
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ script/appname/$1 [L]

#!/home/username/user-perl-symlink
# in my case the above points to:
# /home/username/perl5/perlbrew/perls/perl-5.16.2/bin/perl
# set env variable to use root for pretty URLs
$ENV{SCRIPT_NAME} = '/';
# cpanm in perlbrew appears to install into the following directory by default:
# /home/username/perl5/lib/perl5/
# this isn't in perlbrew's @INC path it seems so I set up a symlink and added a 'use lib' statement
use lib qw(/home/username/user-perl-lib-symlink);
All being well, this should give you a fully functioning Mojolicious app behind a default Apache install, on a limited-access cPanel shared or VPS server.
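If all is not well, the usual suspects are the wrong perl binary or a missing library path. As a debugging aid (my own suggestion rather than part of the steps above), a tiny core-Perl CGI script along these lines can show which perl Apache is actually running and what is in @INC. The `perl_report` helper is hypothetical, and on the server you would swap the shebang for the same symlink your app script uses.

```perl
#!/usr/bin/env perl
# On the server, replace the line above with your app script's shebang,
# e.g. the perlbrew symlink, so this reports on the same perl binary.
use strict;
use warnings;

# Hypothetical helper: build a plain-text report of the running perl
sub perl_report {
    my @lines = (
        "perl binary:  $^X",
        sprintf('perl version: %vd', $^V),
        '@INC:',
    );
    push @lines, "  $_" for @INC;
    return join("\n", @lines) . "\n";
}

# Minimal CGI response: header, blank line, then the body
print "Content-Type: text/plain\r\n\r\n";
print perl_report();
```

Drop it next to your app script, make it executable and request it in a browser. If the version shown is still 5.8, the shebang isn't pointing at your perlbrew perl.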
If you've got any improvements to this process I'd love to hear them!
* Credit to the thread on Google Groups for a couple of these fixes: https://groups.google.com/d/topic/mojolicious/bxdlP-MKuIQ/discussion