iCPAN: Now Bigger, Faster and with Syntax Highlighting

Since we've been porting CPAN to the iPhone via iCPAN, we've had the chance to re-imagine CPAN a little bit. We've been asking ourselves: what would we change about search.cpan.org if we were able? We've added bookmarking, we save searches of recently viewed modules, and now we've got syntax highlighting.

A lot of us see our code highlighted in our editors. That's how we're accustomed to viewing code. So why not view it that way in our module documentation? (Now, we're fully aware that this can be done with GreaseMonkey, but it would be nice to have this available for everyone without needing to use a special extension). A great example is perldoc.perl.org, which does have syntax highlighting, but it only (as far as I can tell) does so for core modules.

I'm happy to say that as of today, if you download iCPAN to your iPhone you can view the docs for over 61,000 modules with syntax highlighting. We've done so by using Alex Gorbatchev's excellent SyntaxHighlighter. The docs now take a little longer to render on my 1st generation iPhone, but it's well worth the wait. They look great and are much more enjoyable to read. Of course, the newer your phone is, the faster the docs will render.

Faster Search

The biggest complaint about the first version of the app was that searches were painfully slow. We released with a slow search because we felt it was more important to release a slow app than no app at all. As part of our iterative approach, we've disabled the "search-as-you-type" feature, and it turns out that this greatly improves the search speed. We hope to reintroduce the "as-you-type" functionality once we've got it working at a good speed.

More Modules

This latest release contains over 4,000 more modules than the previous version. We've still got more to include in the next version (such as cookbook docs), but this is a big improvement as well.

Reporting Bugs

We've gotten some really positive feedback since we first announced the project and I've opened issues for the bug reports folks have sent to me. If you have issues or wishlist items, please feel free to create and tag your issues at GitHub.

Switching to Dist::Zilla - a usage report

For a while there has been a problem in one of my modules - the tests for Postscript::TextDecode make use of Test::Most which it turned out most CPAN testers don't use by default. That's fair. It was on my list of things to fix, but never with high priority.

Shortly after YAPC::EU::2010 I received an RT request from Andreas Koenig alerting me to the issue. At YAPC I had heard about a utility called Dist::Zilla, which was supposed to make managing distributions a lot easier.

Off to CPAN. D::Z took a while to install but the install itself had no problems. I started reading the CPAN page and read Dist::Zilla::Tutorial. Since this was however limited, I directed my attention to the tutorial on dzil.org.

This tutorial has the style of a 'make your own adventure' children's book, which was amusing. I found out that I could either start anew or switch my current distribution over.

I decided to start anew and ran

$ dzil setup

to configure dzil (the Dist::Zilla::App command-line utility); it creates ~/dzil.ini. I followed that with

$ dzil new Postscript-TextDecode

which created the Postscript-TextDecode directory for me, containing dist.ini and lib/Postscript/TextDecode.pm.

After that I pasted my existing code into TextDecode.pm and created t/Postscript-TextDecode.t. I could remove $VERSION since I now define that in dist.ini.

After some fiddling, $ dzil test did the testing and $ dzil release uploaded a gzipped tarball onto the PAUSE server.

In between all this, I encountered the following hurdles:

  • dzil test and dzil build won't work without [@Classic] in the dist.ini. I do not know why.
  • the documentation on dzil.org seems...incomplete, or at least appears to assume the reader has in-depth knowledge of the distribution creation process, while I am relatively new to this

Still, it did not cost me huge amounts of time and in the long run this utility should really simplify the creation and maintenance of new modules. Not bad at all. And from what I've seen on CPAN, it's very easy to extend.

If I've missed obvious things, please direct me so I may learn.

Git-backed wikis, Gollum, and simple installation experiences

bric-wiki-in-gollum.png

Last night I upgraded the Bricolage wiki on Github to the new git-backed wiki that Github rolled out last week. May sound like a trivial thing and not worth a blog post, but it’s quite the opposite, actually — the changes are (almost) revolutionary.

The first really interesting thing about the upgrade is that all of a project’s wiki pages are now simple text files in their own git repository. Now I can update these pages any way I like, in any one of several markup languages, including POD. On its own, that’s pretty useful: now I can clone a project’s wiki along with the project itself and submit changes back as I would any other changes via Git.

The second interesting thing about the upgrade is the offline viewing and editing tool that Github released called Gollum. This is a small Ruby on Sinatra application that — when run in the git-backed wiki repository — runs a local copy of the Github wiki that can be used to view and edit those wiki pages offline (see screenshot above).

On a final note: I remarked to Theory last night that I hadn’t played with RubyGems for a while and I was impressed at how painless and easy the Gollum installation was. He pointed out that ‘gem install’ almost always runs without any tests (unlike the cpan client). It did make me wonder about the best way to distribute a “mini app” like Gollum within the current Perl ecosystem of mini tools and micro frameworks… perhaps cpanm plus Mojolicious::Lite could create a similar “no brain required” installation?

Food for thought.

Benchmarking string trimming

Clever Regexps vs Multiple Simple Regexps:

In reading some code I ran across the expression s/^\s*|\s*$//g which is a trim function. It is not the optimal way to write this. The optimal way is two simpler expressions: s/^\s+//; s/\s+$//. Justification follows.

Conclusion:

  • Use of + instead of * means regexps that would do no effective work will also fail to match. Failing to match when the work would be useless yielded a 3x to 4x improvement.

  • Use of multiple simpler patterns like s/^...//;s/...$// instead of compound patterns like s/^...|...$//g enabled boundary checking optimizations.

Testing:

String length:

long:  +80 chars
short: -80 chars

Pre/postfixes:

pre/post: "  string  "
pre:      "  string"
post:       "string  "
base:       "string"

Coding styles:

*g: s/^\s*|\s*$//g
+g: s/^\s+|\s+$//g
2*: s/^\s*//
    s/\s*$//
2+: s/^\s+//
    s/\s+$//

Calculated results:

>>  short pre 2+      1638810/s
>>  short base 2+     1622457/s
>>  short post 2+     1351812/s
>>  short pre/post 2+ 1152253/s
>>  long base 2+       564477/s
>>  long pre 2+        534890/s
    short base +g      532709/s
    short post +g      502626/s
>>  long post 2+       501015/s
    short pre +g       479683/s
    short pre/post +g  465137/s
>>  long pre/post 2+   463741/s
    short base 2*      462448/s
    short pre 2*       456719/s
    short pre/post 2*  450081/s
    short post 2*      449661/s
    short base *g      394226/s
    short pre *g       384360/s
    short post *g      367736/s
    short pre/post *g  367624/s
    long post 2*       114832/s
    long base 2*       113787/s
    long pre 2*        110305/s
    long pre/post 2*   110169/s
    long post +g       100847/s
    long base +g        99830/s
    long pre +g         98871/s
    long pre/post +g    98331/s
    long base *g        87066/s
    long post *g        86520/s
    long pre *g         84080/s
    long pre/post *g    81429/s
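
A comparison like the one above could be reproduced with a short script. This is only a minimal sketch, not the script that produced the figures: it assumes the core Benchmark module, and the two sample inputs (one well under 80 characters, one well over) are made up for illustration, so the absolute rates will differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The four trimming styles from the "Coding styles" list; each one trims a
# copy of its argument and returns the result.
my %trim = (
    '*g' => sub { my ($s) = @_; $s =~ s/^\s*|\s*$//g; return $s },
    '+g' => sub { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s },
    '2*' => sub { my ($s) = @_; $s =~ s/^\s*//; $s =~ s/\s*$//; return $s },
    '2+' => sub { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s },
);

# Assumed sample inputs (not the original test data), both with
# leading and trailing whitespace.
my %input = (
    short => "  string  ",
    long  => "  " . join(' ', ('string') x 20) . "  ",
);

for my $size (sort keys %input) {
    my $text = $input{$size};
    my %case;
    for my $style (keys %trim) {
        my $code = $trim{$style};
        $case{"$size pre/post $style"} = sub { $code->($text) };
    }
    # A negative count means "run each case for at least that many CPU seconds".
    cmpthese(-1, \%case);
}
```

cmpthese prints one comparison table per input size; adding the pre, post and base variants is a matter of extending %input.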

YAPC::Europe 2010 - Thoughts Pt 2/3 - Promoting A YAPC

This year, YAPC::Europe was reasonably well attended, with roughly 240 people. However, a few weeks prior to the event, the number of officially registered attendees for YAPC::Europe 2010 was considerably lower. Although every year it seems that many register in the last 2 weeks, there is usually a higher number registered before then. So why did we have such low numbers registering until just before the conference this year? I'm sure there are several factors involved, but 2 strike me as significant.

The first is the current dates for the event. As mentioned in my previous post, the Perl community attending YAPCs is getting older, and many of us now have young families. August is notoriously bad for anyone with a family, as the school holidays govern a lot of what you're able to do. Those that can take time out to attend the conferences also have to juggle that with family holidays. Employers are often reluctant to have staff away during August, as too easily they can become short-staffed due to others taking holiday. Having said that, the attendances haven't fluctuated that much in recent times, regardless of whether early/mid-August is chosen or late-August/early-September. Although, the exception does seem to be Vienna in 2007 which attracted 340 attendees. As such, when deciding dates for a YAPC, bear in mind that some of your potential attendees may find it difficult to attend, or only be able to decide almost at the last moment.

The second factor was a pitfall that this year's organisers fell into too: lack of communication. Immediately prior to the conference and during it, there was lots of news and promotion. However, 6 months ago there was largely nothing. Although we finally had about 240 attendees, it is possible that there could have been many more. Big splashes across the Perl community with significant updates (website launch, call for papers, opening registration and unveiling the schedule) are a great way to make people aware of what is happening and can generate a buzz about the event long before it begins.

This year I noticed that a twitter search for 'yapc' in the weeks before YAPC::Europe featured mostly posts about YAPC::Brasil, and I'm currently seeing several posts for YAPC::Asia. Last year, José and Alberto kept a constant feed of news, snippets, and talk link posts onto twitter and other social network micro-blogging services, which helped to generate posts from others attending or thinking of attending. This year the potential audience attracted via the marketing efforts seems to have been lower than in previous years. The results of the Conference Surveys will hopefully give a better picture of this.

In recent times the Perl community has talked about marketing Perl in various ways. However, promoting our own events seems largely left to the organisers. While the organisers can certainly add fuel for the fire, it's the rest of the community that is needed to fan the flames. In the past YAPCs and Workshops have been promoted across various Perl sites, and in various Linux and OpenSource channels, which in turn generated a lot of interest from attendees and sponsors. The latter target audience is just as important as the former. While we want more people to attend the events, the sponsors are the people who fund them to make them happen. But not marketing the events to get maximum exposure likely means there are potential sponsors who either never get to hear of our events, or are turned off by the lack of exposure the event is generating.

Although the events do manage to get sponsors, for the organisers it can often be a very traumatic process getting sponsors involved. Once you've made initial contact, you'll need to persuade them that sponsoring the event is a good way to market their company. If they're able to see photos online of the events (possibly including sponsor branding), or read blog posts that direct people to the conference website (with all the event sponsors listed), it gives potential sponsors a feeling that it may be a worthwhile investment. Some sponsors are strong supporters of OpenSource and want to give back, but a large number are looking to promote their own brand. They're looking to make maximum revenue for a minimum outlay. They want to see that funding events is going to generate further interest and brand recognition to their target audience. Exposure through blogs and other online sources all helps.

As I've implied, much of this exposure is down to the community. If you attended YAPC::Europe (or YAPC::NA or any other Perl event, including Workshops) have you written a blog post about it? Did you tweet about the event before you went, during or even after? Have you posted photos online and tagged them with the event, in a way that others can find them? YAPC::Brasil and YAPC::Asia attendees seem to be doing this rather well, and there is a lot we can learn from them. In the last week, there have been several posts by attendees of YAPC::Europe 2010, but of the 240 people attending, it really is a small percentage. And likewise I saw a similar kind of percentage posting about YAPC::NA this year too. Several years ago use.perl and personal blogs were full of reports of the event. What did you learn at the event, who did you meet, what aspects of Perl are you going to take away with you from the event? There is a lot you can talk about, even if it was to mention one specific talk that you felt deserved comment.

With aggregators, such as Iron Man, Planet Perl and Perlsphere, whether you post via use.perl, Perl Blogs or your own personal site, you can get the message out. Next year, anyone wondering whether attending a YAPC is worthwhile is likely to search for blog posts about it. Are they going to find enough reasons to attend, or persuade their manager that they should attend? I hope so. YAPCs and Workshops are a great way to promote what is happening in Perl, and by talking about them we can keep that interest going long after the event itself.

In Gabor's lightning talk, looking at Perl::Staff and the events group, he highlighted the differences in attendance between the conferences. Typically a YAPC::Europe has 200-300 attendees, YAPC::NA has 300-400 and YAPC::Asia has around 500 attendees. However, FOSDEM (5,000), LinuxTag (10,000) and CeBit (400,000) all attract much higher numbers. It's a fair point that we should try and provide a presence at these other OpenSource events, but a dedicated language interest event is unlikely to attain those attendances. The hope, though, is that we may have a knock-on effect: people who see Perl talks and a good Perl presence at those other events might just take more of an interest in Perl, the community and the various Perl-specific events.

I'd be very interested to see attendance figures for other dedicated language conferences, particularly for Europe, as I think Perl is probably about average. The EuroPython guys certainly attract similar numbers to Birmingham. In the past I've done a fair amount of pitching Perl at Linux, OpenSource and Security Conferences in Europe and to Linux User Groups around the UK. Birmingham Perl Mongers undertook 3 "world" tours in 2006, 2007 & 2008 doing exactly that. It was great fun, and we got to meet a lot of great people. If you have a local non-Perl group, such as a LUG, would they be interested in a Perl topic? Are you able to promote Perl, the Perl community or Perl events to them? Sometimes even just attending is enough, as you'll get to talk to plenty of other interesting people. The initial 2006 tour was primarily used to promote YAPC::Europe 2006, which Birmingham Perl Mongers were hosting that year, and it did help to raise the profile of the event, and eventually got sponsors interested too.

One thing that the Pisa organisers did, specifically osfameron, was to broadcast Radio YAPC podcasts (Episodes 0, 1, 2 & 3). Genius. I got to listen to them after each day, but I can imagine many weren't able to hear them until they returned home. It would have been great to have something before the conference too, even just the news updates and some of the highlights to look forward to. Interviews with the organisers and any registered attendees would have been great too. It was a nice touch to the event, and its promotion, to be able to feature interviews with speakers and attendees to get their experiences. I hope future organisers can try something similar too.

There are several people trying to raise the profile of Perl at the moment, but it takes the whole community to support their efforts by blogging, talking beyond our community and promoting events to those who might not have considered treating the conference as part of their training. We have a great community, and one that I'm pleased to be a part of. I want the community and the events to continue for many years to come, and talking about them can only help that. It's why Matt Trout shouted at many of us to blog about Perl and promoted the Iron Man aggregation competition.

The Perl community and events are very healthy at the moment; we just don't seem to be talking about them enough. As the business cards state, we do suck at marketing. If we want to avoid the mistakes of O'Reilly at OSCON last month, and the badly named tags, then promoting YAPCs, and your experiences at them, is a good way to show how it can be done right.

In my next post I'll be looking more at the YAPC event itself.

Cross-posted from Calling All The Heroes.

Perl 6, Surely but Slowly

Last month I wrote a post about Rakudo being ready for release. I ported a substantial framework from Perl 5 to Perl 6, and it just works. Surely, but sloooooowly!

I'd like to make the case that Rakudo is now ready for something else... A serious performance boost!

Last night I timed the basically equivalent TestML test suites in Perl 5 and Perl 6 (Rakudo). Here are the results: http://gist.github.com/525796.

Perl 5 took just over a second. Rakudo snuck in just under 2...

Minutes!

Let's add Git userdiff defaults for Perl and Perl 6

Git allows you to define a custom hunk-header which'll be used by git diff as the context line in diff hunks. Git includes presets for several languages but no presets for Perl and Perl 6. I'd like to change that.

If you have no idea what these are, consider a file that contains this code:

sub foo {
    my $x = "a";
    my $y = "b";
    my $z = "c";
    my $poem = <<"POEM";
This is a
Long string
In a heredoc
POEM
    I'm::On::A::Horse();
}

Now, if you change the last statement in that subroutine to something more clever and run git diff you'll get this:

diff --git a/file.pl b/file.pl
index 7ed4207..ffb1ff9 100644
--- a/file.pl
+++ b/file.pl
@@ -7,5 +7,5 @@ This is a
 Long string
 In a heredoc
 POEM
-    I'm::On::A::Horse();
+    The::Tickets::Are::Now::Diamonds();
 }

In that diff this is the context line:

@@ -7,5 +7,5 @@ This is a

Having "This is a" there doesn't provide very useful context, but that can be changed with userdiff. Just add this to .gitattributes:

*.pl diff=perl

And this to .git/config:

[diff "perl"]
      xfuncname = "^\\s*(sub.*)"

And the hunk context is now more useful, and shows the name of the subroutine that's being changed:

@@ -7,5 +7,5 @@ sub foo {

I'd like to extend the Git defaults to include Perl, but I'm probably forgetting some cases where something is subroutine-ish that doesn't match a simple "^\\s*(sub.*)". Other cases I can think of are:

  • The package statement
  • my $x = sub { ... } (needs a complex regex to match my/our ...)
  • The BEGIN/INIT/END etc. routines
  • Maybe "method ..." from MooseX::Declare and friends? It shouldn't hurt to include this
  • Something else?
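
To get a feel for how a broader pattern would behave, the cases listed above can be tried against a few sample lines in Perl itself. This is only a sketch under some assumptions: the pattern is a hypothetical candidate and not anything Git ships, hunk_header is a made-up helper name, and the pattern is written as a Perl regex for easy testing, whereas Git's xfuncname expects a POSIX extended regex, so it would need translating before going into a config file.

```perl
use strict;
use warnings;

# Hypothetical candidate pattern covering the cases listed above.
my $funcname = qr/
    ^ \s*
    (
        (?: package | sub | method ) \s .*                       # named declarations
      | (?: BEGIN | UNITCHECK | CHECK | INIT | END ) \s* \{ .*   # phase blocks
      | (?: my | our ) \s+ \$\w+ \s* = \s* sub \b .*             # anonymous sub stored in a variable
    )
/x;

# Return the text that would be used as the hunk header, or undef if the
# line would not be picked as context.
sub hunk_header {
    my ($line) = @_;
    return $line =~ $funcname ? $1 : undef;
}

my @lines = (
    'sub foo {',
    '    my $poem = <<"POEM";',
    'This is a',
    'package My::Module;',
    'my $handler = sub {',
    'BEGIN {',
    '    method render ($self) {',
);

for my $line (@lines) {
    my $header = hunk_header($line);
    printf "%-28s => %s\n", $line, defined $header ? "header: $header" : 'skipped';
}
```

Lines inside a heredoc that happen to start with "sub " or "package " would still be picked up, which is one of the limits of a purely line-based approach.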

Then there's the issue of Perl 6. I'm completely unfamiliar with it, but I can add it while I'm at it if I'm given some examples.

The userdiff facility also has support for defining a "word" for the --word-diff option to git diff. I don't use this option, but I wouldn't be surprised if it did the wrong thing for Perl code.

Begin at the BEGIN and go on till you come to the END: then stop.

In Perl we can run user defined code blocks at different stages when running a program.

  1. BEGIN blocks are run as soon as Perl has finished compiling them, while the rest of the file is still being compiled. If there is more than one BEGIN block, they are executed in the order they are found.
  2. CHECK blocks are run as soon as Perl finishes compiling. If there is more than one CHECK block, they are executed in the reverse of the order they are found.
  3. INIT blocks are executed after the CHECK blocks, just before the program starts running. If more than one exists, they are executed in the order they appear.
  4. END blocks are executed just before the program finishes running, even if it is exiting because of a die. If more than one END block is found, they are executed in the reverse of the order they appear.

END { print "Twelve\n"; }
BEGIN { print "One\n"; }
CHECK { print "Six\n"; }
INIT { print "Seven\n"; }
BEGIN { print "Two\n"; }
END { print "Eleven\n"; }
CHECK { print "Five\n"; }
INIT { print "Eight\n"; }
BEGIN { print "Three\n"; }
INIT { print "Nine\n"; }
CHECK { print "Four\n"; }
END { print "Ten\n"; }


When we run the script it prints something like:

$ perl blocks.pl
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Ten
Eleven
Twelve
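
The stage each kind of block runs in can also be observed directly. The following is a small sketch that assumes Perl 5.14 or later, where the special variable ${^GLOBAL_PHASE} reports the interpreter's current phase; the @seen array and the labels are only illustrative. It has to be run as a script, since CHECK and INIT blocks only fire for code compiled during the main program's compile phase.

```perl
use strict;
use warnings;

# Collect "block:phase" pairs as each block runs.
# ${^GLOBAL_PHASE} requires Perl 5.14 or later.
our @seen;

BEGIN { push @seen, "BEGIN:${^GLOBAL_PHASE}" }
CHECK { push @seen, "CHECK:${^GLOBAL_PHASE}" }
INIT  { push @seen, "INIT:${^GLOBAL_PHASE}" }
END   { print "END ran during ${^GLOBAL_PHASE}\n" }

# Ordinary code runs after all BEGIN, CHECK and INIT blocks.
push @seen, "main:${^GLOBAL_PHASE}";
print "$_\n" for @seen;
```

BEGIN blocks report the compile-time phase (START), while the main code reports RUN, which is why a BEGIN block can affect how the rest of the file is compiled.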

About blogs.perl.org

blogs.perl.org is a common blogging platform for the Perl community. Written in Perl and offering the modern features you’ve come to expect in blog platforms, the site is run by Dave Cross and Aaron Crane, with a design donated by Six Apart, Ltd.