Sharding Your Database

Yesterday I was in a class taught by one of the folks at Percona, the MySQL and LAMP stack performance experts (they're the ones behind mysqlperformanceblog.com). Sharding was covered, and though I learned a lot, it also reinforced my opinion that this is a dubious technique which fanbois like to reach for waaaaay too soon.

First post

This is the obligatory first post on the site... and I'm back in the Perl world. I've been away for a while (long story for another blog), but I am back and excited to continue the trek!

Will write to you soon.

perlmv: Renaming files with Perl code

perlmv is a script which I have been using all the time for years, but which I only uploaded to CPAN today. The concept is very simple: rename files by manipulating $_ with the specified Perl code. For example, to rename all .avi files to lowercase:

$ perlmv -de '$_=lc' *.avi

The -d option is for dry-run, so that we can test our code before actually renaming the files. If you are sure that the code is correct, remove the -d (or replace it with -v, for verbose).
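
Under the hood the idea boils down to something like this (a rough sketch of the concept, not the actual perlmv code; option handling and scriptlet lookup are omitted):

use strict;
use warnings;

my $code    = '$_ = lc';   # the code you pass via -e
my $dry_run = 1;           # the -d switch
for my $old (glob '*.avi') {
    local $_ = $old;
    eval $code;            # let the user code manipulate $_
    die "Code failed: $@" if $@;
    my $new = $_;
    next if $new eq $old;
    print "$old -> $new\n";
    unless ($dry_run) {
        rename $old, $new or warn "Can't rename $old: $!";
    }
}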

perlmv can also save your code into scriptlets (files in ~/.perlmv/scriptlets/), so if you do:

$ perlmv -e 's/\.(jpe?g|jpe)$/.jpg/i' -W normalize-jpeg

You can later do this:

$ perlmv -v normalize-jpeg *.JPG *.jpeg

In fact, perlmv comes with several scriptlets you can use (more useful scriptlets will be added in the future):

$ perlmv -L
lc
pinyin
remove-common-prefix
remove-common-suffix
uc
with-numbers

Let me know if you have tried out the script.

Word counting

There is a Perl module (by coincidence, also Portuguese-authored) named Text::ExtractWords that does more or less the same as the Unix command toilet^H^H^H^H^H^Hwc. It returns a hash mapping words to their occurrence counts.

The module is not bad. It is written in C, which makes it quite fast compared with Perl code on big strings. Unfortunately, it has one main limitation: Unicode. Although it supports a 'locale' configuration parameter, that parameter seems not to affect its behavior with Unicode characters; the module still treats them as sequences of single ASCII characters.

I do not have any experience dealing with Unicode from C. I remember looking at some of the 'w' functions (wchar.h) but not getting really good results. When I have more time I will probably look into it.

But for now, I need a way to compute a word histogram from a Unicode Perl variable. I am doing it by splitting the string on whitespace and, for each element, incrementing its count in a hash.

It works. But it is slow.
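
In essence the approach is just this (a minimal sketch, ignoring case folding and punctuation):

use strict;
use warnings;
use utf8;

my $text = "coração bom coração bom dia";   # an already-decoded Unicode string
my %count;
$count{$_}++ for grep { length } split /\s+/, $text;
# %count is now ( 'coração' => 2, 'bom' => 2, 'dia' => 1 )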

This raises two different questions:

  • is there any faster way to do this from Perl?
  • is there any other module that I can use to perform this task?

C::DynaLib progress

I took over the first Perl FFI, C::DynaLib, some years ago, when gcc still defaulted to the simple cdecl calling convention: push the args onto the stack with alloca, then call the function with no explicit arguments; the alloca'ed args are picked up by the called function. Simple and fast.

For some time now this cdecl trick has not worked OOTB, and I thought it was the new gcc-4 stack-protector, TLS or some such that broke the ABI. It was not: it also fails with gcc-3.4, which uses the same new layout. But at least I now understand the new gcc ABI and have found a way to continue with cdecl. Interestingly, some platforms such as FreeBSD work OOTB with gcc and cdecl, and MSVC does too.
The fallback to cdecl is the hack30 scheme, which is ugly; hack30 can handle neither floats nor doubles.
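
For anyone who has not seen the module, calling a C function through C::DynaLib looks roughly like this (adapted from memory of the module's synopsis; check the current docs for the exact interface):

use C::DynaLib;

my $libm = C::DynaLib->new("-lm");                  # load the math library
my $pow  = $libm->DeclareSub("pow", "d", "d", "d"); # name, return type, arg types
print "2 ** 0.5 = ", $pow->(2, 0.5), "\n";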

test page

This is a test page

Perl Survey and German Perl Workshop

The Perl Survey is reaching its final stages and will be going live within the next week or so. While I had planned to host it myself, the autocomplete functions that I added to the survey weren't fast enough for the speed at which your typical programmer types, and I got a very kind offer from Strategic Data to host it using their existing infrastructure (and to continue to run the survey every couple of years).

The German Perl Workshop is kindly sponsoring me to come and speak at their conference, where I will be giving two talks. The first will be on the preliminary results of the Perl Survey, and I hope to finish up the grant shortly after that. The second is titled "Don't RTFM, WTFM!", where I hope to go over some of the approaches to documentation we've used in the Catalyst project over the years, and then go through how to document a distribution with many moving parts in a way that a developer with basic skills can then use to write their own software. I'm hoping to use WWW::Facebook::API as the example, pending my ability to work out how to use it.

What is Kephra about?

Someone responded to my first post and asked a question: how does Kephra differ from Padre?

In many ways:
* Kephra has far fewer dependencies (just Wx and a config parser)
* different feature set
* much older project
* smaller dev team (currently)

Of course we build on the same core technologies, but I think we have different goals. As far as I understand Padre, they have the well-known IDEs and larger editors (UltraEdit, jEdit, and so on) in mind and want to slowly grow in that direction. That's good for Perl, but it's not what I want. I started Kephra to do things differently from how I saw them done elsewhere, and the editors I had seen didn't provide the degree of freedom I needed to make the changes I wanted. So you need to build a new "platform". Yes, I was only one person, but brave. Today we have an editor with the

Threaded Forums Made Easy

Not too much about Perl, but just wanted to say that sometimes I have a lovely feeling of accomplishment after struggling to find a good approach. After considering my options about forums, I decided to continue on my own. With the proper tool selection, they're incredibly easy. Below is sample output of my alpha code.

Threaded Forum

20,000+ distributions on CPAN!

JJ just mentioned to me that the 18,000 figure on Perl.org was now out of date...

http://search.cpan.org/ - check out the bottom left corner...

  • 20127 Distributions
  • 56650 Uploads
  • 81688 Modules
  • 8176 Uploaders

Perl.org is now updated; I wonder which distribution was the 20,000th.

CPAN is an amazing growing resource - Perl really does help get the job done.

CPAN Testers Summary - April 2010 - Close To The Edge

A late and very short summary this month, as I was hoping to have a bit more news this time around. As David notes in his update, the work on the Metabase has been continuing, with some stress testing to ensure we can handle a high volume of report submissions. There have been some upgrades to the code during the last month, as we have refined the APIs. Unfortunately we've hit a hurdle with the search API at the current time. Once we overcome this, we should be in a good position to make the switch to fully support the Metabase submission system. It's been a long time coming, but we are getting close.

This month has also seen minor fixes to the Reports and Statistics websites, mostly to correctly reference the newer GUID identification system. Unfortunately a lot of attention has been taken up by the release of version 2 of the CPAN Meta Spec, in which several of us involved with CT2.0 have also been involved, particularly David Golden. Hopefully we'll have a longer summary next month, with many more details of progress.

Cross-posted from the CPAN Testers Blog.

Onion and CPAN

Onion CPAN Logo

PL/Parrot, a DSL construction kit for PostgreSQL


What do you get when you cross a parrot with an elephant? Find out! PL/Parrot is a DSL construction kit for PostgreSQL, and much, much more. Stay tuned for more details. David Fetter will give us the ins and outs of PL/Parrot.

This meeting will take place on Tuesday, May 25th at 7pm at Six Apart World Headquarters.

Parrot home page: http://www.parrot.org/

PL/Parrot on GitHub: http://github.com/leto/plparrot

David Fetter's home page: http://fetter.org/

Announcement posted via App::PM::Announce

RSVP at Meetup - http://www.meetup.com/San-Francisco-Perl-Mongers/calendar/13415730/

Spot the error

Spent 10 minutes the other day scratching my head after my Perl code stopped working following a single added line (can you guess which one?):

LABEL1:
my @var;
for (...) {
    next LABEL1;
}

The error message is "Label not found for 'next LABEL1'". I think Perl could be better at handling this kind of mistake.
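
For what it's worth, the culprit is the added my @var; line: the label now labels the my statement rather than the loop, so the loop itself ends up unlabeled. Keeping the label immediately before the loop avoids the problem, along these lines (illustrative code only):

my @var;
LABEL1:
for my $i (1 .. 5) {
    next LABEL1 if $i == 2;   # the label really labels the loop now
    push @var, $i;
}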

A Tiny Affordance

What's wrong with this picture?

use parent qw/DBIx::Class::Core/;

On screen names and real names

Is this the obligatory "hello blogs.perl.org" post? I guess so. I will no longer be updating my old use.perl.org journal. Not that I updated it very much to begin with.

Which leads to my point. For a long time, I engaged the Perl community under my screen name "revdiablo", but I've been slowly migrating away from that in favor of simply using my real name. I haven't really thought about a way to smoothly transition, so I'm leaving my screen name on old accounts and using my real name for new accounts. Hopefully, at some point, my screen name will fade into obscurity.

Unfortunately, there are probably still some people who recognize me only by the screen name. Maybe they will see this and take notice. Are any of you one of them?

I want Perl Testing Best Practices

[I actually wrote this a long time ago and it's been stuck in the draft status. I don't have answers for these yet.]

I've been swamped with work lately, and despite perl5-porters giving me and everyone else plenty of time to update all of our modules for the next major release, I basically ignored Perl 5.11. Life sucks sometimes, then they release anyway. This isn't really a big deal because all the CPAN Testers FAILs go to a folder that I look at all at once. It's a big deal for other people when they try to install a broken dependency and cpan(1) blows up.

However, my negligence in updating my CPAN modules reminded me of a possible best practice that has been on my mind for a long time, and which I've casually brought up at a couple Perl QA workshops since I've written several Test modules. Don't rush to say Test::Class just yet.

So is wantarray() bad or not?

The style of returning different things in list vs. scalar context has been debated for a long time (see, for example, this thread on PerlMonks).

A few months ago I made a decision that all API functions in one of my projects should return this:

return wantarray ? ($status, $errmsg, $result) : $result;

That is, we can skip error checking when we don't want to do it.

Now, in the spirit of Fatal and autodie, I am changing the above to:

return wantarray
    ? ($status, $errmsg, $result)
    : do { die "$status - $errmsg" unless $status == SUCCESS; $result };
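
From the caller's side the difference looks like this (a sketch, using a hypothetical get_user() API function that follows the pattern above):

# List context: the caller checks the status itself.
my ($status, $errmsg, $user) = get_user($id);
warn "lookup failed: $errmsg" unless $status == SUCCESS;

# Scalar context: failures now die automatically, autodie-style.
my $user = get_user($id);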

But somehow I can still see myself and others tripping over this in the future, as I already have several times. It's bad enough that for each API function one already has to remember the arguments and their types, plus one kind of return value and its type.

Maybe I should just bite the bullet and admit the misadventure into wantarray(), and that context-sensitive return should be left to @foo, localtime(), and a few other classical Perl 5 builtins that have been ingrained in every Perl programmer's mind.

Centralized versus decentralized version control

While this is often framed as a Subversion versus Mercurial/Git argument, I want to look beyond that. Can anyone offer me any compelling reasons to choose centralized over decentralized version control? I'm not even talking about weighing pros and cons (because if you did, centralized source control would lose badly). I'm just curious about any good, solid reasons to choose a non-decentralized source control system.

New unrestricted license for Perl software

Since someone saw fit to write a Wikipedia article about my DWTFYWWI license, I thought I'd write a Software::License module for it. Now you can specify license = DWTFYWWI in Dist::Zilla, or in anything else that uses Software::License.
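
In a dist.ini that looks something like this (the distribution name and author are just placeholders):

name             = Acme-Example
author           = A. U. Thor <author@example.org>
license          = DWTFYWWI
copyright_holder = A. U. Thor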

It's the first Software::License::* module not shipped in the core Software::License distribution (RJBS refused my patch to integrate it). It might not hold that distinction for long, though: FLORA has threatened to release WTFPL support.

About blogs.perl.org

blogs.perl.org is a common blogging platform for the Perl community. It is written in Perl, with a graphic design donated by Six Apart, Ltd.