<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Holy Zarquon&apos;s Singing Fish</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/holy_zarquons_singing_fish//23</id>
    <updated>2012-12-27T02:32:36Z</updated>
    <subtitle>A blog about the Perl programming language</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Zotero/Perl integration</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2012/12/zoteroperl-integration.html" />
    <id>tag:blogs.perl.org,2012:/users/holy_zarquons_singing_fish//23.4156</id>

    <published>2012-12-27T02:21:30Z</published>
    <updated>2012-12-27T02:32:36Z</updated>

    <summary>Zotero is, in my opinion, the best citation management solution available. It&apos;s a Firefox/XULRunner-based tool for collecting and maintaining bibliographic data and its associated full text. Zotero has a cloud-based API but also an internal (largely undocumented)...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlzoterobibliography" label="perl zotero bibliography" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p><a href="http://zotero.org">Zotero</a> is, in my opinion, the best citation management solution available.  It's a Firefox/XULRunner-based tool for collecting and maintaining bibliographic data and its associated full text.  Zotero has a cloud-based API but also an internal (largely undocumented) JavaScript API.  I find the JS API much more interesting than the cloud API because it gives me total control over my data (and it works offline, and so on).</p>

<p>I've had two frustrations: the lack of a good alternative to the MS Word integration code, and problems interacting with the internals of a running Firefox/XULRunner.  I finally cracked open MozRepl and the CPAN module <a href="https://metacpan.org/module/MozRepl">MozRepl</a>, and prototyped a bidirectional bridge between Zotero and Perl.</p>

<p>The code is <a href="https://github.com/singingfish/Citeproc-Markdown">here</a>, and should be a useful start for anyone who wants to do any bibliographic data mangling in perl.  I'm not going to publish it to CPAN until I work out how to test it properly.</p>]]>
        
    </content>
</entry>

<entry>
    <title>... I like to push the pramalot ...</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2012/10/-i-like-to-push-the-pramalot.html" />
    <id>tag:blogs.perl.org,2012:/users/holy_zarquons_singing_fish//23.3955</id>

    <published>2012-10-14T21:09:17Z</published>
    <updated>2012-10-14T21:18:35Z</updated>

    <summary>During a recent trip to Sydney I visited the Camelot Lounge. It has nothing to do with small children or Arthurian legends; it contains much camel memorabilia. As it is a very short train ride (two stops) from the OSDC venue, I thought it...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="camel" label="camel" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[During a recent trip to Sydney I visited the <a href="http://camelotlounge.wordpress.com/">Camelot Lounge</a>.  It has nothing to do with small children or Arthurian legends; it contains much camel memorabilia.  As it is a very short train ride (two stops) from the <a href="http://www.osdc.com.au/">OSDC venue</a>, I thought it might be a good place to have a Perl gathering as part of the conference.  Apologies for the poor quality of the photos below, but you get the idea.  By the way, their food (pizza and pides) is reasonably priced and tasty.

<object type="text/html" data="http://www.flickr.com/slideShow/index.gne?group_id=&user_id=62903259@N00&set_id=72157631770180482&tags=Camel,lot" width="500" height="500"></object><br/><small>Created with <a href="http://www.admarket.se" title="Admarket.se">Admarket's</a> <a href="http://flickrslidr.com" title="flickrSLiDR">flickrSLiDR</a>.</small>]]>
        
    </content>
</entry>

<entry>
    <title>Unusual uses for perl</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2011/01/unusual-uses-for-perl.html" />
    <id>tag:blogs.perl.org,2011:/users/holy_zarquons_singing_fish//23.1325</id>

    <published>2011-01-09T11:02:32Z</published>
    <updated>2011-01-09T11:44:46Z</updated>

    <summary>The alternative title for this post would be Best Holiday Ever where I got to show my family some of the parts of Indonesia I visited as a child, and where we got to visit a hotel in a national...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="fingers" label="fingers" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>The alternative title for this post would be <em><strong>Best Holiday Ever</strong></em> where I got to show my family some of the parts of Indonesia I visited as a child, and where we got to visit a <a href="http://www.ecolodgesindonesia.com/rimba/index.html">hotel in a national park in Borneo</a> in which my family has a business interest.  In case you think this is a shameless plug, the vast majority of the income generated by the hotel goes back to the local people -- the investors are facilitators, for whom there <i>may be a long term return</i>. </p>

<p>Anyway, the point of my post was that the organisers of German Perl Workshop were good enough to invite me to talk to them earlier this year, and at that time I obtained one of the wooden <a href="https://www.socialtext.net/perl5/index.cgi?tuit">round tuits</a> being distributed by <a href="">The Perl Foundation</a>.  As a part of our trip to Borneo, we had to take some anti-malarial drugs.  My wife and I had to take a whole tablet each. My daughter 3/4 of a tablet, and my son, 1/2.  This meant some chopping up.  Well TPF's mini-chopping board came in handy:</p>

<p><a href="http://www.flickr.com/photos/singingfish42/5337484220/" title="DSCN0063 by singingfish42, on Flickr"><img src="http://farm6.static.flickr.com/5207/5337484220_d1766ac32f.jpg" width="500" height="375" alt="DSCN0063" /></a></p>

<p>A nice portable mini-chopping board, which, combined with my habit of shaving with a disposable razor, meant that I could prepare medication daily for the two weeks that we needed it with minimal risk of lacerating myself.  Big thanks to The Perl Foundation for helping me keep my programming fingers :)</p>]]>
        
    </content>
</entry>

<entry>
    <title>Final Perl Survey Grant Report</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/11/final-perl-survey-grant-report.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.1203</id>

    <published>2010-11-25T20:22:27Z</published>
    <updated>2010-11-25T21:27:33Z</updated>

    <summary>The final report for the Perl Survey is now available, after many delays. The report is fairly bare bones, but it should be sufficient for you to get a handle on the structure of the Perl community (or at least...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlsurvey2010" label="perl survey 2010" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>The <a href="http://survey.perlfoundation.org/Data-PerlSurvey-2010/R">final report</a> for the Perl Survey is now available, after many delays.  The report is fairly bare bones, but it should be sufficient for you to get a handle on the structure of the Perl community (or at least the sample who responded to the survey).  I've gone for the approach that I'm presenting salient findings, rather than overwhelming you with detail.</p>

<p>There is a lot of data summarised in this report, and rather than producing a long turgid document with every possible analysis that I can think of, I thought that the better way to approach things would be to make a fairly short summary report so that people can ask questions, or request any additional information via the comments here, <a href="mailto:zarquon@cpan.org">by email</a>, or by grabbing me (kd) on irc.perl.org.</p>

<p>The most important outcome from this grant is that all of the R code I've written is solid enough to ensure that we can run the survey again in a couple of years' time, and quickly get longitudinal comparative data out of it (so long as the questions remain reasonably similar).  R isn't the easiest language in the world to work with, and I'm pretty pleased with some of the data management gymnastics I performed in getting the survey data management tools working reasonably well.</p>

<p>Oh yes, the data and analysis are freely redistributable under the same conditions as Perl itself :)</p>]]>
        
    </content>
</entry>

<entry>
    <title>Fun with recursive anonymous subroutines</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/10/fun-with-recursive-anonymous-subroutines.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.1117</id>

    <published>2010-10-18T23:59:17Z</published>
    <updated>2010-10-19T04:41:49Z</updated>

    <summary>I&apos;m doing lots of work with representing stuff stored in the file system as trees at the moment as part of my toolkit for open source qualitative research software. One of the things I need to do (for making reports)...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="subrefscoperecursive" label="subref scope recursive" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>I'm doing lots of work with representing stuff stored in the file system as trees at the moment as part of my <a href="http://github.com/singingfish/Text-TranscriptMiner">toolkit for open source qualitative research software</a>.

<p>One of the things I need to do (for making reports) is to transform this:

<pre><code
> [ [qw/foo bar/],
   [qw/a b /],
   [qw/x y/], ];</code></pre>

<p>into this tree structure:

<pre><code
> {
   'foo' => {
       'some_data' => 'lvl0',
       'children' => {'a' => {
           'some_data' => 'lvl1',
           'children' => { 'y' => 'leaf', 'x' => 'leaf' } },
                      'b' => {
                          'some_data' => 'lvl1',
                          'children' => {
                              'y' => 'leaf', 'x' => 'leaf' }}}}};</code></pre>

<p>Being a nice golf problem I thought I'd ask on irc if there was a  hacker better than me who felt like taking a look at this.  ribasushi++ obviously had a little procrastination time available and  wrote me a nice solution which I needed to make into a closure via a recursive subref:

<script src="http://gist.github.com/633326.js?file=gistfile1.pl"></script>

<p>There are two gotchas with this code.  Firstly, we need to weaken() the coderef before calling it in order to prevent a memory leak: the closure captures a reference to itself, and that reference cycle would otherwise never be freed.  The second problem is that we have to jump through some hoops to bring the coderef into scope when $visit->(@args) is called inside itself.  There are three ways of dealing with this.  I could use <a href="http://search.cpan.org/perldoc?Sub::Recursive">Sub::Recursive</a>, but the implementation is apparently non-ideal as it uses local, which is expensive.  A more robust way via the CPAN is to use <a href="http://search.cpan.org/perldoc?Sub::Current">Sub::Current</a>, which provides a special function, ROUTINE, that returns a code reference pointing at the currently executing subroutine.  This is probably the correct way to do things.  But the solution I plumped for (for now) is a little scope trick:

<pre><code
>use Scalar::Util qw(weaken);
my ($visit, $v);
$v = $visit = sub { ... $visit->(@args) ... };
weaken($visit); # $v now holds the only strong reference to the sub
my $data = $visit->();</code></pre>

<p>This is more fragile in terms of memory leak protection than using Sub::Current, but for my purposes is adequate for the present day.  I have of course added a comment to the effect that I may need to upgrade to Sub::Current at some point in the future. (Thanks to mst++ for explaining this stuff to me in words that I could understand).
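<p>To make the scope trick concrete, here is a minimal sketch of the whole transformation.  It is not the code from the gist: <code>build_tree</code> and <code>$keep</code> are names made up for illustration, and it assumes the last inner list holds the leaves.  Unlike the abbreviated output above, it builds a subtree for every top-level name, <code>bar</code> as well as <code>foo</code>.</p>

<pre><code>use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical helper (not the gist code): build a nested tree from a
# list of levels, where the names in the last level become the leaves.
sub build_tree {
    my @levels = @_;
    my ($visit, $keep);
    $keep = $visit = sub {
        my ($depth) = @_;
        my %node;
        for my $name (@{ $levels[$depth] }) {
            $node{$name} = $depth == $#levels
                ? 'leaf'
                : { some_data => "lvl$depth",
                    children  => $visit->($depth + 1) };
        }
        return \%node;
    };
    weaken($visit);      # the copy captured by the closure is now weak
    return $keep->(0);   # $keep keeps the sub alive while it runs
}

my $tree = build_tree([qw/foo bar/], [qw/a b/], [qw/x y/]);
print $tree->{foo}{children}{a}{children}{x}, "\n";   # leaf
</code></pre>

<p>Once <code>build_tree</code> returns, <code>$keep</code> goes out of scope and the sub can be freed, because the only reference left inside the closure is the weak one.</p>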

<p>Now I have to work out how to get the real data into the leaf nodes :)
]]>
        
    </content>
</entry>

<entry>
    <title>Telling you about my not-yet-acute-enough itches.</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/09/telling-you-about-my-not-yet-acute-enough-itches.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.1008</id>

    <published>2010-09-14T00:05:47Z</published>
    <updated>2010-09-14T00:40:50Z</updated>

    <summary>This was going to be a reply to cyocum, but I&apos;ve promoted it to a full post in its own right. Given my desire to eliminate Microsoft Word from my life as much as possible, pandoc the round trip markdown...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>This was going to be a reply to <a href="http://blogs.perl.org/users/cyocum/2010/09/translating-latex-to-word-pandoc.html">cyocum</a>, but I've promoted it to a full post in its own right.  </p>

<p>Given my desire to eliminate Microsoft Word from my life as much as possible, <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>, the round-trip converter between markdown and pretty much any other markup format, written in Haskell, is of great interest to me.  However, I haven't got around to using it with <a href="http://code.haskell.org/citeproc-hs/">citeproc-hs</a>, the Haskell implementation of the emerging standard <a href="http://xbiblio.sourceforge.net/csl/">Citation Style Language</a>, due to the lack of comprehensive enough documentation - and the <a href="http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar">itch</a> isn't strong enough for me to scratch it yet, and likely won't be for the next eighteen months or so.</p>

<p>Here are two further related itches that are also not going to be scratched by me in the near future:</p>

<ul> 
<li>A round trip arbitrary-markup-formatter in perl like pandoc is for haskell.  
<li> Possibly of interest to the <a href="http://news.open-bio.org/news/category/obf-projects/bioperl/">bioperl</a> guys would be perl bindings for the emerging standard <a href="http://xbiblio.sourceforge.net/csl/">Citation Style Language</a>
</ul>

<p>Just putting it out there in case anyone else is feeling itchy ;)</p>]]>
        
    </content>
</entry>

<entry>
    <title>Making Microsoft Word less painful</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/09/making-microsoft-word-less-painful.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.975</id>

    <published>2010-09-06T13:08:06Z</published>
    <updated>2010-09-06T13:40:52Z</updated>

    <summary>Personally I think that Microsoft Word is the worst, most widely deployed piece of software ever, and I despise it with a vengeance. Over the years I&apos;ve learned to use it in such a way that it doesn&apos;t egregiously waste...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="gitmswordcommithook" label="git msword commit-hook" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>Personally I think that Microsoft Word is the worst, most widely deployed piece of software ever, and I despise it with a vengeance.  Over the years I've learned to use it in such a way that it doesn't egregiously waste my time, but I'm never going to like it.</p>

<p>One of my many many annoyances is that track changes sucks, and it doesn't work well with the <a href="http://zotero.org">bibliographic software that I use</a>.  Unfortunately all my colleagues use word, so I don't really have a choice.  But I've just started a new major piece of writing, and I want to be able to diff it properly.</p>

<p>So here's a little pre-commit hook that I made to commit plain text versions of my word docs in a separate tree off the root dir of the git repository.  The only downside is that it's OS X specific because I use the built in textutil command to do the .doc to .txt conversion, but <a href="http://www.winfield.demon.nl/">antiword</a> or <a href="http://freshmeat.net/projects/catdoc/">catdoc</a> would probably be better.</p>

<p><b>Update:</b> I decided I was not terribly happy with the output of textutil, so I changed it over to antiword, which works much better, both for layout, and with the field codes that my bibliographic manager uses.  I also fixed a rather nasty bug that stopped it working with modified files.</p>  

<script src="http://gist.github.com/567045.js"> </script>
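<p>For readers who can't load the gist, the general shape of such a hook can be sketched in Perl.  This is an illustration under assumptions, not the gist itself: the <code>txt/</code> mirror directory and the helper names are made up, it assumes <code>antiword</code> is on the PATH, and it converts the working-tree copy of each staged file.</p>

<pre><code>#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical layout: notes/draft.doc is mirrored as txt/notes/draft.doc.txt
sub text_path_for {
    my ($doc) = @_;
    return "txt/$doc.txt";
}

# Convert one Word file by reading antiword's standard output
sub doc_to_text {
    my ($doc) = @_;
    open my $in, '-|', 'antiword', $doc or die "Cannot run antiword: $!";
    my $text = do { local $/; readline $in };
    close $in or die "antiword failed on $doc\n";
    return $text;
}

sub main {
    # Staged files that were added, copied or modified; -z copes with odd names
    my @staged = split /\0/, `git diff --cached --name-only -z --diff-filter=ACM`;
    for my $doc (grep { /\.doc\z/i } @staged) {
        my $txt = text_path_for($doc);
        make_path(dirname($txt));
        open my $out, '>', $txt or die "Cannot write $txt: $!";
        print {$out} doc_to_text($doc);
        close $out or die "Cannot close $txt: $!";
        # Stage the text version so it goes into the same commit
        system('git', 'add', '--', $txt) == 0 or die "git add failed for $txt\n";
    }
    return 0;
}

exit main() unless caller;
</code></pre>

<p>Saved as <code>.git/hooks/pre-commit</code> and made executable, a script like this runs before each commit; a non-zero exit (including any <code>die</code>) aborts the commit.</p>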

]]>
        
    </content>
</entry>

<entry>
    <title>Well that was painful</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/08/well-that-was-painful.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.872</id>

    <published>2010-08-11T06:16:13Z</published>
    <updated>2010-08-11T06:51:56Z</updated>

    <summary>I just blew my time budget on the Perl survey stuff for looking at programming language info, and I want to offload what I had to do to get to this point. Firstly the tl;dr: You can see the complete...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlsurveyrresults" label="perl survey R results" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>I just blew my time budget on the Perl survey stuff for looking at programming language info, and I want to offload what I had to do to get to this point.</p>

<p>Firstly the tl;dr:  You can see the complete results for language usage <a href="http://survey.perlfoundation.org/Data-PerlSurvey-2010/R/10_language_info/">at the survey website</a>.</p>

<p>Here are the gory details (just a note: the <code>&lt;-</code> operator is a bit like the <code>=</code> operator in other languages, and there's also a rightward <code>-&gt;</code> assignment operator, which means you can do all sorts of clever things in one line of code):</p>

<p>What we asked for was the five programming languages that you use most, roughly in order of how much they're used.  Then we asked where Perl came in that list.  </p>

<p>I'm doing this in R because it's optimised for statistical calculations.  In this case, though, the operations required were a mixture of the kind of thing that R is good at and the kind of thing that Perl is good at.  I decided that dropping out to Perl wherever it suited wasn't going to be a terribly good option, as I wanted to have the analysis file self-contained.  Anyway, <a href="http://github.com/singingfish/Data-PerlSurvey-2010/raw/final_report/R/10_language_info/analysis.R">here's the source code</a>; let me talk you through it.  The code is in <a href="http://r-project.org">R</a>.</p>

<p>So if we think about the original data file as if it's a database, then we have the language name stored in its rank column. So the most used language is in column <code>language$most.used</code>, the second most is in the column <code>language$second.most</code> and so on through to the fifth.  Finally, the position of Perl is stored in a separate column, so that eventually we can do a shuffle-along-one procedure to splice Perl into the list correctly.</p>

<p>Source code time:</p>

<pre><code>load("10_language_info.RData") # grab data we sliced out of the raw data earlier
for (i in 1:5) data[,i] &lt;- tolower(data[,i]) # make the text all lower case
</code></pre>

<p>Here we're normalising the list of languages.  The raw data has about 350 languages.  In fact this is actually around 67 unique languages, but people are inconsistent:</p>

<pre><code>## languages &lt;- sort(unique(unlist(c(sapply(data[1:5],levels)),use.names=FALSE))) 
# write.csv(languages,file="lang.csv")
#  some manual processing to normalise the languages list a bit 
languages &lt;- read.csv(file="lang.csv")


for (i in 1:length(languages[,1])) data &lt;- replace(data, data==toString(languages[i,1]), toString(languages[i,2]))
data &lt;- replace(data, data=="rubh", "ruby") # a typo we missed in normalisation process
</code></pre>

<p>Now rather than having one column per rank, we need one column per language.  For each rank, the following code explodes the column into as many columns as there are languages mentioned in that column (a standard statistical procedure called creating dummy variables):</p>

<pre><code>library(dummies) # like use Whatever; in perl

# it's like a bomb in a mannequin factory!
l1 &lt;- dummy.data.frame(as.data.frame(data$language_1))
l2 &lt;- dummy.data.frame(as.data.frame(data$language_2))
l3 &lt;- dummy.data.frame(as.data.frame(data$language_3))
l4 &lt;- dummy.data.frame(as.data.frame(data$language_4))
l5 &lt;- dummy.data.frame(as.data.frame(data$language_5))

# this is just to make the column names prettier for display
names(l1) &lt;- sub("^data.language_.","",names(l1))
names(l2) &lt;- sub("^data.language_.","",names(l2))
names(l3) &lt;- sub("^data.language_.","",names(l3))
names(l4) &lt;- sub("^data.language_.","",names(l4))
names(l5) &lt;- sub("^data.language_.","",names(l5))
</code></pre>

<p>At this point each new data frame (the container we put the exploded variables in if you like) is just numbers 1 and 0 - we want to make this number the actual rank if it isn't zero:</p>

<pre><code>l1 &lt;- replace(l1,l1==1,1)
l2 &lt;- replace(l2,l2==1,2)
l3 &lt;- replace(l3,l3==1,3)
l4 &lt;- replace(l4,l4==1,4)
l5 &lt;- replace(l5,l5==1,5)
</code></pre>

<p>And we need a complete list of all languages mentioned.</p>

<pre><code>languages.list &lt;- unique(names(c(l1,l2,l3,l4,l5)))
</code></pre>

<p>This is the variable that's the number of responses:</p>

<pre><code>cases &lt;- rep(0,length(l1[,1]))
</code></pre>

<p>Then we need to make sure there are the same number of columns for each data frame:</p>

<pre><code>for ( i in 1:length(languages.list) ) {
    if ( length (l1[[languages.list[i]]]) == 0 ) {
        l1[[languages.list[i]]] &lt;- cases
    }
    if ( length (l2[[languages.list[i]]]) == 0 ) {
        l2[[languages.list[i]]] &lt;- cases
    }
    if ( length (l3[[languages.list[i]]]) == 0 ) {
        l3[[languages.list[i]]] &lt;- cases
    }
    if ( length (l4[[languages.list[i]]]) == 0 ) {
        l4[[languages.list[i]]] &lt;- cases
    }
    if ( length (l5[[languages.list[i]]]) == 0 ) {
        l5[[languages.list[i]]] &lt;- cases
    }
 }
</code></pre>

<p>And then we add the 5 data frames together in a matrix addition operation (a data frame is basically a special matrix).  Note that we need to make sure that each data frame returns the columns in the same order.</p>

<pre><code>all.langs &lt;- l1[names(l1)] +l2[names(l1)] +l3[names(l1)] +l4[names(l1)] + l5[names(l1)]

perl &lt;- data$where_perl_belongs_in_list

insert.perl.order &lt;- function(row.idx) {
    x &lt;- all.langs[row.idx,]
    y &lt;- perl[row.idx]
    change.logical &lt;- which(x &gt;= y);
    all.langs[row.idx,change.logical] &lt;- x[change.logical]+1
}

all.langs$perl &lt;- perl #append it to the data frame
</code></pre>
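<p>For comparison, the shuffle-along step for a single respondent is the kind of list surgery that Perl is good at.  A minimal sketch, not part of the analysis script: <code>insert_perl</code> and the example languages are made up for illustration, and the position is assumed to be 1-based and no more than one past the end of the list.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: insert 'perl' at the given 1-based position,
# pushing every language at or after that position one place down.
sub insert_perl {
    my ($position, @ranked) = @_;
    splice @ranked, $position - 1, 0, 'perl';
    return @ranked;
}

my @with_perl = insert_perl(2, qw/c python ruby/);
print join(', ', @with_perl), "\n";   # c, perl, python, ruby
</code></pre>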

<p>We're at an important point here, because <code>all.langs</code> can be glued to other R data structures created in other parts of the survey, so at some point we can work out which programmers are the smartest perl programmers (or something like that).</p>

<p>Next up there's some jiggery-pokery to make sure that R knows that there are six levels of variable in the data frame, otherwise it will only report on the actual counts, and won't report zeros:</p>

<pre><code>for (i in 1:length(names(all.langs)) ) all.langs[,i] &lt;- factor(all.langs[,i],levels=c(1:6) )
</code></pre>

<p>This single line of code generates counts for all languages:</p>

<pre><code>lang.summary &lt;- sapply(all.langs, summary)
</code></pre>

<p>In case you hadn't realised, R is a functional language whose semantics owe a lot to Scheme, a Lisp dialect.</p>

<p>Finally we want a report, so we make a new data frame that contains the summary statistics:</p>

<pre><code>report.df &lt;- data.frame('Most used'=integer(),
                    'Second most'=integer(),
                    'Third most'=integer(),
                    'Fourth most'=integer(),
                    'Fifth most'=integer(),
                    'Sixth most'=integer(),
                    'Non-users'=integer(),
                    'Total-users'=integer(),
                    'Percent users'=numeric(),
                    'Mean Rank'=numeric()
                    )
</code></pre>

<p>And we have to iterate through the <code>lang.summary</code> matrix to append lines to the report data frame:</p>

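<pre><code>report.rows &lt;- ncol(lang.summary) # assumed: one report row per language (not defined in this excerpt)
for ( i in 1:report.rows ) {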
    this.lang &lt;- names(lang.summary[1,])[i]
    this.counts &lt;- lang.summary[,i]

    # UGLY and possibly FRAGILE hack to remove non-perl users from counts
    this.counts[7] &lt;- this.counts[7] - lang.summary[7,dim(lang.summary)[2]]

    this.counts[8] &lt;- sum(this.counts[1:6])
    this.counts[9] &lt;- round(this.counts[8]/sum(this.counts[1:7]) * 100,2)
    this.counts[10] &lt;- round(sum(this.counts[1:6]*c(1:6))/sum(this.counts[1:6]),2)
    report.df[i,] &lt;- this.counts
    rownames(report.df[1,]) &lt;- this.lang
}

rownames(report.df) &lt;- names(lang.summary[1,])
</code></pre>

<p>Note the comment in the above code.  Getting all of this working just right was very fiddly, and over the last three days it has taken more than three times my allocated time budget.  You'll also notice that the code is very procedural for a functional language.  This is basically because statistical computing often involves long periods of exploration on the command line (or GUI), followed by consolidating what you've done into a script.  The script is basically a duplicate of the procedure you went through on the command line.  On the other hand, it's done now, it's reasonably robust, and it's replicable for future runs of the survey.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>A thousand mile journey starts with one step</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/08/a-thousand-mile-journey-starts-with-one-step.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.830</id>

    <published>2010-08-04T23:14:37Z</published>
    <updated>2010-08-04T23:20:08Z</updated>

    <summary>I&apos;m slowly (one section per day, on top of my other responsibilities) getting through the questions for the Perl Survey data. My initial step is to convert as much as possible of the quick and dirty code I wrote for German Perl...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>I'm slowly (one section per day, on top of my other responsibilities) getting through the questions for the Perl Survey data. My initial step is to convert as much as possible of the quick and dirty code I wrote for <a href="http://conferences.yapceurope.org/gpw2010/talk/2711">German Perl Workshop</a> into replicable <a href="http://www.r-project.org">R</a> code, so that we have similar graphics and reports.  Once I'm through that I'll start writing some text to explain the graphs, and make some key cross comparisons between different aspects of the results.  You can see the results as they are put together on the <a href="http://github.com/singingfish/Data-PerlSurvey-2010/tree/final_report">final_report branch on github</a>.</p>

<p>tl;dr: <i>The end is in sight!</i></p>]]>
        
    </content>
</entry>

<entry>
    <title>Oops Hiatus</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/07/oops-hiatus.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.800</id>

    <published>2010-07-28T07:57:20Z</published>
    <updated>2010-07-28T08:01:11Z</updated>

    <summary>I have done very little to the Perl Survey since I got back from Germany. However, finally I&apos;ve decided how to format and present the report. Mostly with the help of R2HTML and a Makefile. Now that I&apos;ve worked out how...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlsurvey" label="perl survey" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>I have done very little to the <a href="http://survey.perlfoundation.org">Perl Survey</a> since I got back from Germany.  However, finally I've decided how to format and present the report.  Mostly with the help of <a href="http://cran.r-project.org/web/packages/R2HTML/index.html">R2HTML</a> and a Makefile.</p>

<p>Now that I've worked out how to write the report in a reasonably replicable way (i.e. minimum code differences between revisions of the survey - so running and analysing the 2012 survey should be much much quicker and easier), I should be able to get through a section every day between now and when it's done, then I can call the grant finished!</p>]]>
        
    </content>
</entry>

<entry>
    <title>Unicode abuse</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/07/unicode-abuse.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.772</id>

    <published>2010-07-20T21:19:47Z</published>
    <updated>2010-07-20T21:29:24Z</updated>

    <summary>I was looking at doing a little bit of political activism on twitter, and as part of this, thought about maximising the amount of information in each tweet a la Tweet Compressor which is an abuse of unicode to increase...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>I was looking at doing a little bit of political activism on twitter, and as part of this, thought about maximising the amount of information in each tweet a la <a href="http://tweetcompressor.com/">Tweet Compressor</a>, which is an abuse of unicode to get more into the 140 character (not byte!) limit for tweets.</p>

<p>Here's the implementation:</p>

<p><code><br />
use utf8;<br />
sub tweet_compress {<br />
    my $tweet = shift;<br />
    $tweet =~ s/\. ?$//; # we don't need no end of sentence punctuation<br />
    # longer ligatures (ffl, ffi) come first so that fi/fl don't pre-empt them<br />
    my @orig = ( qw/cc ms ns ps in ls ffl ffi fi fl iv ix vi oy ii xi nj/, ". ", ", ");<br />
    my @new = qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ ﬄ ﬃ ﬁ ﬂ ⅳ ⅸ ⅵ ѹ ⅱ ⅺ ǌ ． ，/;<br />
    $tweet =~ s/\Q$orig[$_]\E/$new[$_]/g for 0 .. $#orig;<br />
    return  $tweet;<br />
}<br />
</code></p>
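
<p>The saving is in characters rather than bytes, which a minimal sketch like the following can confirm.  The sample strings are made up for illustration, and it assumes the core Encode module:</p>

<p><code><br />
use strict;<br />
use warnings;<br />
use utf8;    # the source itself contains UTF-8 literals<br />
use Encode qw(encode);<br />
<br />
binmode STDOUT, ':encoding(UTF-8)';<br />
<br />
# hypothetical sample text, before and after ligature substitution<br />
my $plain  = 'office fish';<br />
my $packed = 'oﬃce ﬁsh';<br />
<br />
for my $text ($plain, $packed) {<br />
    my $chars = length $text;                    # characters<br />
    my $bytes = length encode('UTF-8', $text);   # bytes once encoded<br />
    print "$text: $chars characters, $bytes bytes\n";<br />
}<br />
</code></p>

<p>The ligature version is shorter by character count (8 against 11) even though it is a byte longer once encoded (12 against 11).</p>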

<p>Doing the rest of the right thing with unicode is a bit annoying (e.g. <pre>binmode STDOUT, ':utf8';</pre> to output the tweet correctly to stdout), and I really wish there were better unicode docs that didn't have high <a href="http://en.wikipedia.org/wiki/Cognitive_load">cognitive load</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Super easy editable html pages.</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/07/super-easy-editable-html-pages.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.685</id>

    <published>2010-07-06T00:04:57Z</published>
    <updated>2010-06-30T00:25:39Z</updated>

    <summary>As part of an ongoing project to improve the woeful state of Qualitative data analysis software, I&apos;ve already put together some tools to meet my needs. The tools are very much at the prototype stage, and will remain so for...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="browsereditingtextanalysis" label="browser editing text analysis" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>As part of an ongoing project to improve the woeful state of <a href="http://en.wikipedia.org/wiki/Computer_Assisted_Qualitative_Data_Analysis_Software">Qualitative data analysis software</a>, I've already put together <a href="http://github.com/singingfish/Text-TranscriptMiner">some tools</a> to <a href="http://github.com/singingfish/Text-TranscriptMiner-Web">meet my needs</a>.   The tools are very much at the prototype stage, and will remain so for the foreseeable future, but I'm using them to prepare publishable work.  </p>

<p>For the data management end of things I use a super-simple SGML format to tag chunks of text that I'm interested in, and use my <a href="http://github.com/singingfish/Text-TranscriptMiner">Text::TranscriptMiner</a> library to retrieve data.  I use Git for version control, as a kind of high granularity lab book.  I also have a <a href="http://github.com/singingfish/Text-TranscriptMiner-Web">web application</a> which deals with the data visualisation side of things.</p>

<p>One of the things about qualitative analysis software is that there's a lot of post-processing of the data, and a need to write summaries (technically called memos) of the raw data.  To that end I need to think about improving the visualisation tools in the web application, and that suggests to me editing data directly in the browser.  This is super easy with the following line of JavaScript: </p>

<p><code>document.getElementById("my_id").contentEditable = "true"; </code>.</p>

<p>So here's how to make a crudely editable section of a web page (the content div inside the form), and return it to the back-end web application:</p>

<p><code><br />
&lt;script type=&quot;text/javascript&quot;&gt;<br/>$(document).ready(function() {<br/>    document.getElementById(&quot;content&quot;).contentEditable = &quot;true&quot;;<br/>    $('#data_container').submit(function() {<br/>        $('[name=data]').val( $('#content').html() );<br/>    });<br/>});<br/>&lt;/script&gt;<br/>&lt;form id=&quot;data_container&quot; action=&quot;doit&quot; method=&quot;post&quot;&gt;<br/>&lt;div id=&quot;content&quot;&gt;<br/>&lt;h1&gt;hi there&lt;/h1&gt;<br/>&lt;/div&gt;<br/>&lt;input type=&quot;hidden&quot; name=&quot;data&quot; value=&quot;default&quot; /&gt;<br/>&lt;input type=&quot;submit&quot; value=&quot;submit&quot; /&gt;<br/>&lt;/form&gt;<br />
</code></p>

<p>At which point, we have well-structured HTML returned to the web app, with which I can use a tool like <a href="http://search.cpan.org/perldoc?Web::Scraper">Web::Scraper</a> to extract the data and put it somewhere sensible for the back end to retrieve at a later date.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Authenticating Proxy Pain</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/06/authenticating-proxy-pain.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.676</id>

    <published>2010-06-28T07:25:43Z</published>
    <updated>2010-06-28T07:36:26Z</updated>

    <summary>Having just finished an enormous pile of marking, but not having enough time to get on with something more substantial I thought I&apos;d work out (again) how to use LWP::UserAgent and WWW::Mechanize behind an authenticating proxy. Sometimes you&apos;re lucky and...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="scraperproxyauthentication" label="scraper proxy authentication" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>Having just finished an enormous pile of marking, but not having enough time to get on with something <a href="http://survey.perlfoundation.org">more substantial</a> I thought I'd work out (again) how to use LWP::UserAgent and WWW::Mechanize behind an authenticating proxy.  </p>

<p>Sometimes you're lucky and a proxy url of the form <pre>http://user:pass@my.proxy.server:8080</pre> will work nicely.  No such luck for the proxy I usually use.  </p>

<p>Anyway, here's the way to use LWP::UserAgent behind an authenticating proxy:</p>

<p><code><br />
use LWP::UserAgent;<br />
use HTTP::Request;<br />
my $ua = LWP::UserAgent->new;<br />
my ($user, $pass) = qw/user pass/;<br />
# route http, ftp and https requests through the proxy<br />
$ua->proxy(['http', 'ftp', 'https'], 'http://my.proxy.server:8080/');<br />
my $req = HTTP::Request->new('GET', "http://www.google.com");<br />
# send the proxy credentials with the request (Proxy-Authorization header)<br />
$req->proxy_authorization_basic($user, $pass);<br />
my $res = $ua->request($req);<br />
print $res->content;<br />
</code></p>
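
<p>For what it's worth, <code>proxy_authorization_basic</code> just sets a <code>Proxy-Authorization</code> header holding the Base64 of <code>user:pass</code>.  Here's a small sketch that shows this without touching the network, assuming HTTP::Request and MIME::Base64 are installed; the URL and credentials are placeholders:</p>

<p><code><br />
use strict;<br />
use warnings;<br />
use HTTP::Request;<br />
use MIME::Base64 qw(encode_base64);<br />
<br />
my ($user, $pass) = qw/user pass/;<br />
my $req = HTTP::Request->new(GET => 'http://www.example.com/');<br />
$req->proxy_authorization_basic($user, $pass);<br />
<br />
# the header LWP will send to the proxy<br />
my $header = $req->header('Proxy-Authorization');<br />
# the same value built by hand: "Basic " plus the Base64 of "user:pass"<br />
my $by_hand = 'Basic ' . encode_base64("$user:$pass", '');<br />
<br />
print "$header\n";<br />
print $header eq $by_hand ? "headers match\n" : "headers differ\n";<br />
</code></p>

<p>Basic auth is only an encoding, not encryption, so the password is readable by anything sitting between you and the proxy.</p>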

<p>And here's how to do it with WWW::Mechanize:</p>

<p><code><br />
 use WWW::Mechanize;<br />
 my $mech = WWW::Mechanize->new();<br />
 my ($user, $pass, $proxy) = qw(user pass http://my.proxy.server:8080 );<br />
 # two-argument form: Mechanize offers these credentials for any auth challenge<br />
 $mech->credentials( $user, $pass);<br />
 $mech->proxy('http',$proxy);<br />
 my $url = 'http://whatsmyip.net/';<br />
 my $response = $mech->get($url);<br />
 print $mech->content;<br />
</code></p>

<p>I'm not really sure how you'd go about accessing a basic auth protected page with the WWW::Mechanize method, mind you.  It'd also be nice if, in both cases, the choice of whether to use a proxy, along with the username and password, could be handled during the call to <pre>->new</pre>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Perl Survey - Initial data analysis and presentation</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/06/perl-survey---initial-data-analysis-and-presentation.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.651</id>

    <published>2010-06-20T01:31:13Z</published>
    <updated>2010-06-20T04:22:02Z</updated>

    <summary>Thanks to Stuttgart.pm I presented the initial results from the Perl Survey at German Perl Workshop last week. Unfortunately due to a combination of jet-lag (after a 36 hour journey) and stupidity, I didn&apos;t record the talk. There appears to...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlsurveyresults" label="perl survey results" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>Thanks to Stuttgart.pm I presented the initial results from the Perl Survey at the German Perl Workshop last week.  Unfortunately, due to a combination of jet-lag (after a 36 hour journey) and stupidity, I didn't record the talk.   </p>

<p>There appears to be some very good news in that people use and are enthusiastic about new versions of perl, and the cutting edge perl modules like Moose, Catalyst, DBIx::Class and so on.</p>

<p>Most of the analysis and graphs were done with <a href="http://www.r-project.org/">R</a> and a little Perl.  Unfortunately, for the CPAN and programming languages questions I had to resort to a mix of Perl and manual munging with a spreadsheet.</p>

<p>The survey has generated a very large amount of information, and now that I've got an initial cut of the data analysis, I'll be writing a formal report, and investigating any interesting patterns I find on the way.</p>

<p>All the data is available on <a href="http://github.com/singingfish/Data-PerlSurvey-2010">GitHub</a>.  The presentation slides are available from <a href="http://github.com/singingfish/Data-PerlSurvey-2010/raw/master/report/perl_survey.pdf">here</a> (sorry about the small font size in some of the legends).</p>]]>
        
    </content>
</entry>

<entry>
    <title>The Perl Survey is Closed</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/holy_zarquons_singing_fish/2010/06/the-perl-survey-is-closed.html" />
    <id>tag:blogs.perl.org,2010:/users/holy_zarquons_singing_fish//23.607</id>

    <published>2010-06-04T07:51:05Z</published>
    <updated>2010-06-04T09:01:42Z</updated>

    <summary>The Perl Survey 2010 is now closed. A quick preliminary look at the data set indicates that of the 4847 responses, 3256 were complete responses (by looking at who answered the final question). I&apos;m on my way to German Perl...</summary>
    <author>
        <name>Holy Zarquon&apos;s Singing Fish</name>
        <uri>http://www.uow.edu.au/~kd21</uri>
    </author>
    
    <category term="perlsurveyclosed2010" label="perl survey closed 2010" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/holy_zarquons_singing_fish/">
        <![CDATA[<p>The Perl Survey 2010 is now closed.  A quick preliminary look at the data set indicates that of the 4847 responses, 3256 were complete responses (by looking at who answered the final question).  I'm on my way to the German Perl Workshop tomorrow, and as I'm coming from Australia, it's going to take me over 24 hours to get there, so I'll have plenty of time to prepare the data and get a preliminary analysis going on the way.</p>

<p>Most importantly I'll be able to release the (lightly processed) raw data for you by that time as well.</p>]]>
        
    </content>
</entry>

</feed>
