<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Rob Hammond</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/robhammond/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/robhammond/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/robhammond//1726</id>
    <updated>2013-03-15T14:08:58Z</updated>
    <subtitle>A blog about the Perl programming language</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Social bookmarking in Mojolicious</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/robhammond/2013/03/social-bookmarking-in-mojolicious.html" />
    <id>tag:blogs.perl.org,2013:/users/robhammond//1726.4436</id>

    <published>2013-03-15T14:05:45Z</published>
    <updated>2013-03-15T14:08:58Z</updated>

    <summary>I&apos;ve spent the last year or so of my daily commute building a social bookmarking site that for me has now replaced my use of delicious, and I think offers more than other services such as diigo, Google Bookmarks et...</summary>
    <author>
        <name>Rob Hammond</name>
        <uri>http://robhammond.co/</uri>
    </author>
    
    <category term="bookmarking" label="bookmarking" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mojolicious" label="mojolicious" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="social" label="social" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/robhammond/">
        <![CDATA[<p>I've spent the last year or so of my daily commute building a social bookmarking site that for me has now replaced my use of delicious, and I think offers more than other services such as diigo, Google Bookmarks et al.</p>

<p>The original site at <a href="https://bkmrx.com/">bkmrx.com</a> was mainly built in PHP & MySQL. Since finishing that site I've taught myself how to use Mojolicious, and have done a fairly comprehensive rewrite of the website on a Mojolicious and MongoDB stack. As such, the rewrite is not a full replication of the features available on bkmrx.com (see the <a href="http://bkmrx.org/about">about page</a> for a comparison).</p>

<p>Now with a new job on the horizon, I want to spend less time on building out a better bookmarking service, but at the same time don't want it to stagnate.</p>

<p>To that end, I've released the code for <a href="https://github.com/robhammond/bkmrx">bkmrx.org on GitHub</a>, and uploaded a live version of the site at <a href="http://bkmrx.org/">bkmrx.org</a>.</p>

<p>This is the first time I've released a relatively big project onto GitHub, and since I'm by no means a full-time developer, go easy on me :) However, I'm hoping there are others in the Perl community who might be interested in contributing to, forking or otherwise using the code.</p>

<p>You can read more details on <a href="http://robhammond.co/blog/open-sourcing-bkmrx/">my blog</a> if you're interested.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Web Scraping with Perl &amp; PhantomJS</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/robhammond/2013/02/web-scraping-with-perl-phantomjs.html" />
    <id>tag:blogs.perl.org,2013:/users/robhammond//1726.4254</id>

    <published>2013-02-13T19:27:27Z</published>
    <updated>2013-02-13T19:39:18Z</updated>

    <summary>PhantomJS is a &apos;headless&apos; WebKit browser, mainly intended for use as a web testing framework, and is controlled by a JavaScript API. The &apos;headless&apos; aspect of that also makes the framework extremely useful for scraping JavaScript heavy websites. The problem...</summary>
    <author>
        <name>Rob Hammond</name>
        <uri>http://robhammond.co/</uri>
    </author>
    
        <category term="phantomjs" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="scraping" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="phantomjs" label="phantomjs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="scraping" label="scraping" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="webdriver" label="webdriver" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="webkit" label="webkit" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/robhammond/">
        <![CDATA[<p><a href="http://phantomjs.org/">PhantomJS</a> is a 'headless' WebKit browser, mainly intended for use as a web testing framework, and is controlled by a JavaScript API. The 'headless' aspect of that also makes the framework extremely useful for scraping JavaScript heavy websites.</p>

<p>The problem with PhantomJS, up until the v1.8 release on 23 December 2012, was that it wasn't very easy to understand or control if you were unfamiliar with JavaScript, CoffeeScript or Node.js (if you were using the Casper.js fork). Since the v1.8 release, PhantomJS supports <a href="http://www.w3.org/TR/2013/WD-webdriver-20130117/">WebDriver</a>, which basically means you can control it from pretty much any language you like (although Perl <a href="http://phantomjs.org/release-1.8.html">isn't explicitly mentioned</a>).</p>

<p>Since I like Perl, I decided to give it a go after trying <a href="https://metacpan.org/release/WWW-Mechanize-Firefox">WWW::Mechanize::Firefox</a> + <a href="https://addons.mozilla.org/en-us/firefox/addon/mozrepl/">MozRepl</a>, which is great, but doesn't work if you're going double-headless and are running it on a GUI-less server.</p>

<p>I was previously using <a href="http://mojolicio.us/perldoc/Mojo/UserAgent">Mojo::UserAgent</a> as the scraping agent for this project, but it was ridiculously simple to plug in <a href="https://metacpan.org/module/Selenium::Remote::Driver">Selenium::Remote::Driver</a> to perform the GET request and pass the fully rendered HTML back into the awesome <a href="http://mojolicio.us/perldoc/Mojo/DOM">Mojo::DOM</a> parser for easy manipulation of the data. (After working on the project I found out about <a href="https://github.com/motemen/Wight">Wight</a>, which offers more native support for PhantomJS, but the below still applies if you just want to use the PhantomJS API.)</p>

<p>All you need to do to get PhantomJS up & running for your scraper is:</p>

<p>1. <a href="http://phantomjs.org/download.html">Install it</a></p>

<p>2. Run the command `phantomjs --webdriver=9134 &` to send PhantomJS into the background as a proxy for your requests</p>

<p>3. Combine with Mojolicious:</p>

<pre><code class="prettyprint">#!/usr/bin/env perl
use Modern::Perl;
use Mojo::DOM;
use Mojo::URL;
use Selenium::Remote::Driver;

my $url = 'http://www.google.co.uk';
# fetch the fully rendered page source via PhantomJS
my $html = _fetch_page($url);
# store the URL as a Mojo::URL object (useful for making links absolute etc)
my $mojo_uri = Mojo::URL->new($url);

# check for success of request
if ($html) {
	# parse the rendered HTML and print the page title
	my $dom = Mojo::DOM->new($html);
	say $dom->at('title')->text;
}

sub _fetch_page {
	my $url = shift;
	# connect to the PhantomJS WebDriver server started in step 2
	my $driver = Selenium::Remote::Driver->new(
		'remote_server_addr' => 'localhost',
		'port'               => '9134',
		'browser_name'       => 'chrome',
		'platform'           => 'VISTA',
	);
	$driver->get($url);
	# grab the HTML after JavaScript has run, then shut the session down
	my $source = $driver->get_page_source();
	$driver->quit();
	return $source;
}</code></pre>
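<p>If the script fails to connect, one thing worth ruling out is that PhantomJS isn't actually listening on the port from step 2. The following is only a minimal sketch using the core IO::Socket::INET module; the <code>port_is_open</code> helper is a hypothetical name of my own, and it only confirms that <i>something</i> accepts TCP connections on that port, not that it is PhantomJS.</p>

<pre><code class="prettyprint">#!/usr/bin/env perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: return true if something accepts TCP
# connections on $host:$port, false otherwise.
sub port_is_open {
	my ($host, $port) = @_;
	my $sock = IO::Socket::INET->new(
		PeerAddr => $host,
		PeerPort => $port,
		Proto    => 'tcp',
		Timeout  => 2,
	);
	return 0 unless $sock;
	close $sock;
	return 1;
}

# 9134 matches the --webdriver port used in step 2
if (port_is_open('localhost', 9134)) {
	print "Port 9134 is accepting connections\n";
}
else {
	print "Nothing is listening on port 9134; start PhantomJS first\n";
}</code></pre>

<p>Calling a check like this before creating the driver gives a clearer error message than a failed session request would.</p>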

<p>It's also stupidly easy to walk through a document's DOM, or even serve up a screengrab of the web page:</p>

<pre><code class="prettyprint"># needed for decode_base64 below
use MIME::Base64 ();

sub screengrab {
	my $self = shift;
	my $url = $self->param('url');
	my $driver = Selenium::Remote::Driver->new(
		'remote_server_addr' => 'localhost',
		'port'               => '9134',
		'browser_name'       => 'chrome',
		'platform'           => 'VISTA',
	);
	$driver->get($url);
	# screenshot() returns the PNG as a base64-encoded string
	my $png_base64 = $driver->screenshot();
	$driver->quit();
	$self->render( data => MIME::Base64::decode_base64($png_base64), format => 'png' );
}</code></pre>]]>
        
    </content>
</entry>

<entry>
    <title>Running a Mojolicious non-lite app on a cPanel VPS server</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/robhammond/2013/01/mojolicious-on-cpanel.html" />
    <id>tag:blogs.perl.org,2013:/users/robhammond//1726.4192</id>

    <published>2013-01-10T12:31:56Z</published>
    <updated>2013-01-21T19:47:09Z</updated>

    <summary>I&apos;ve used cPanel/WHM for a very long time as a personal server manager, and up until recently it&apos;s served my purposes pretty well. However since getting into the Mojolicious framework and a few other Perl modules that require a Perl...</summary>
    <author>
        <name>Rob Hammond</name>
        <uri>http://robhammond.co/</uri>
    </author>
    
        <category term="cpanel" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="mojolicious" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cpanel" label="cpanel" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mojolicious" label="mojolicious" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perlbrew" label="perlbrew" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sharedhosting" label="shared-hosting" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="vps" label="vps" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/robhammond/">
        <![CDATA[<p>I've used cPanel/WHM for a very long time as a personal server manager, and up until recently it's served my purposes pretty well. </p>

<p>However, since I got into the Mojolicious framework and a few other Perl modules that require a Perl version newer than the 5.8 that cPanel is currently tied to, it's become a bit of a nightmare trying to run any Modern Perl apps on it.</p>

<p>While there is some Mojolicious documentation around running an app on Apache, I thought I'd document the exact steps I took to get a non-lite app up & running on my Hostgator server (although I'm sure it would work equally well on Dreamhost or other shared hosting solutions).</p>

<p>The steps I took were:</p>

<ol>
	<li>Install <a href="http://perlbrew.pl/">perlbrew</a> in the user account (not root) you want to run the Mojolicious app on
<ul><li>Append the following to your <i>~/.bash_profile</i> file to add the perlbrew command to your path (then exit & reload your shell):<br>
<pre>source ~/perl5/perlbrew/etc/bashrc</pre></li></ul></li>
	<li>Install Perl 5.10+ using perlbrew (you may need to use the --force option). Then switch to that version by default (as per the instructions on the perlbrew website)</li>
	<li>Install <a href="http://mojolicio.us/">Mojolicious</a> and any other required modules using <a href="http://search.cpan.org/dist/App-cpanminus/lib/App/cpanminus.pm#Installing_to_local_perl_%28perlbrew%29">cpanminus</a> (the main CPAN app crashed on my VPS)</li>
	<li>Change the shebang line in your main Mojolicious Perl script (i.e. <b>script/appname</b>) to point to the perlbrew version in use. I set up a symlink in the user's home directory for this as the path was quite long (see below).
<ul>
<li>You may also want to add a symlink to the relevant lib directory if you find it's not in the @INC path (also see below).</li>
</ul></li>
	<li>Edit your <b>.htaccess</b> file. This required a lot of fiddling around, but the following worked for me:</li>
</ol>

<pre><code class="prettyprint"># set apache handler to treat your specified script name(s) as a CGI program
Options +ExecCGI
&lt;Files ~ "(appname)$"&gt;
  SetHandler cgi-script
&lt;/Files&gt;

# rewrite any requests into the app
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ script/appname/$1 [L]</code></pre>

<ol start=6>
<li>Then you need to alter the main script's shebang line and add 1-2 more lines to the top of your script:</li>
</ol>

<pre><code class="prettyprint">#!/home/username/user-perl-symlink
# in my case the above points to:
# /home/username/perl5/perlbrew/perls/perl-5.16.2/bin/perl

# set env variable to use root for pretty URLs
$ENV{SCRIPT_NAME} = '/';

# cpanm in perlbrew appears to install into the following directory by default:
# /home/username/perl5/lib/perl5/
# this isn't in perlbrew's @INC path it seems so I set up a symlink and added a 'use lib' statement
use lib qw(/home/username/user-perl-lib-symlink);
</code></pre>
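<p>To find out whether you need the 'use lib' line at all, it can help to check which Perl is running and whether the lib directory is already in @INC. This is just a rough sketch using core Perl only; the <code>in_inc</code> helper is a hypothetical name of my own, and the path is the placeholder from the example above, so substitute your own. Run it with the same interpreter your shebang line points to.</p>

<pre><code class="prettyprint">#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return true if the given directory
# is already one of the entries in @INC.
sub in_inc {
	my ($dir) = @_;
	return scalar grep { $_ eq $dir } @INC;
}

# Placeholder path; substitute your own symlink or lib directory.
my $lib = '/home/username/user-perl-lib-symlink';

# $^V is the running Perl version, $^X the path to the interpreter
printf "Running perl %vd from %s\n", $^V, $^X;
print in_inc($lib)
	? "$lib is already in \@INC\n"
	: "$lib is not in \@INC, so a 'use lib' line is needed\n";</code></pre>

<p>Note that the comparison is a plain string match, so a symlink and the directory it points to count as different entries.</p>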

<p>All being well this <i>should</i> give you a fully functioning Mojolicious app behind a default Apache install, on a limited access cPanel shared or VPS server.</p>

<p>If you've got any improvements to this process I'd love to hear them!</p>

<p>* Credit to the thread on Google Groups for a couple of these fixes: <a href="https://groups.google.com/d/topic/mojolicious/bxdlP-MKuIQ/discussion">https://groups.google.com/d/topic/mojolicious/bxdlP-MKuIQ/discussion</a></p>]]>
        
    </content>
</entry>

</feed>
