I was playing around with the Slope One collaborative filtering algorithm. Collaborative filtering is a way of guessing what users may like based on their past choices. There are roughly two approaches. One is the "neighbor-to-neighbor" approach: we find people who have expressed preferences similar to your own, and we use preferences that they've expressed but you haven't to guess what you might like in the future. This has a few problems. One, it tends to be computationally expensive. Two, your tastes might be identical to many other people's, but if you've expressed no preferences, or the ones you have expressed don't overlap with theirs, you lose.
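The other approach, which Slope One takes, is item-to-item: precompute the average rating difference between each pair of items, then predict a user's unknown rating from the items they did rate plus those deviations. Here is a minimal sketch of the unweighted scheme in Perl; the toy data and the predict helper are mine, purely for illustration:

use strict;
use warnings;

# Toy ratings (invented for illustration): user => { item => rating }
my %ratings = (
    alice => { squid => 1.0, octopus => 0.2 },
    bob   => { squid => 1.0, cuttlefish => 0.5 },
    carol => { squid => 0.2, octopus => 1.0, cuttlefish => 0.4 },
);

# Precompute the average pairwise deviation dev{j}{i} = mean(r_j - r_i)
# over every user who rated both item i and item j.
my ( %dev, %count );
for my $user ( values %ratings ) {
    for my $i ( keys %$user ) {
        for my $j ( keys %$user ) {
            next if $i eq $j;
            $dev{$j}{$i} += $user->{$j} - $user->{$i};
            $count{$j}{$i}++;
        }
    }
}
for my $j ( keys %dev ) {
    $dev{$j}{$i} /= $count{$j}{$i} for my @dummy; # (see loop below)
}

# divide each accumulated deviation by its pair count
for my $j ( keys %dev ) {
    for my $i ( keys %{ $dev{$j} } ) {
        $dev{$j}{$i} /= $count{$j}{$i};
    }
}

# Predict a user's rating for item $j: average, over the items $i they
# did rate, of (their rating for $i) + dev{j}{i}.  This is the plain,
# unweighted variant of Slope One.
sub predict {
    my ( $user, $j ) = @_;
    my ( $sum, $n ) = ( 0, 0 );
    for my $i ( keys %{ $ratings{$user} } ) {
        next unless exists $dev{$j} && exists $dev{$j}{$i};
        $sum += $ratings{$user}{$i} + $dev{$j}{$i};
        $n++;
    }
    return $n ? $sum / $n : undef;
}

printf "Predicted rating of octopus for bob: %.2f\n",
    predict( 'bob', 'octopus' );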
I'm contemplating breaking a very well-established standard: Version arguments to Perl's "use" statement. The module writer is free to change the semantics of these, but despite the endless eccentricities you find on CPAN in other things, they rarely (never?) do.
So I'm having guilt feelings. Insecurities are coming out. I'm getting second thoughts. In this post I will handle these things the way many people do. Preach.
Consider a default module use like
use Marpa;
The standard (and almost universal) semantics is to load whatever version is out there. This works if you can assume that the modules you're using are well-behaved and upwardly compatible.
Or if you have strict controls on the libraries in the environment in which you are running. But what if you are dealing with software which is avowedly alpha?
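As it happens, Perl gives the module writer a clean hook for this: use Marpa VERSION compiles down to a call to Marpa->VERSION($wanted), and that method can be overridden. A sketch of the heresy under discussion, demanding an exact version match instead of a minimum (the exact-match policy here is my illustration of the idea, not necessarily Marpa's actual behavior):

package Marpa;
use strict;
use warnings;
use version;

our $VERSION = '0.001_019';

# "use Marpa 0.001_019;" makes Perl call Marpa->VERSION(0.001019).
# The default, inherited from UNIVERSAL, means "at least this version";
# overriding it changes the semantics to "exactly this version", which
# is arguably safer while every alpha release may break compatibility.
sub VERSION {
    my ( $class, $wanted ) = @_;
    return $VERSION unless defined $wanted;
    die "$class is version $VERSION, but exactly $wanted was requested\n"
        unless version->parse($wanted) == version->parse($VERSION);
    return $VERSION;
}

1;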
The Common Gateway Interface was revolutionary. It gave us, for the first time, an extremely simple way to provide dynamic content via HTTP. It was one of a combination of technologies that led to the explosive growth of the Web. For anyone writing an application that runs once per HTTP request, there is no other practical option. And for such applications, CGI is almost always adequate.
But modern web applications typically run in persistent environments. For anything with more than a small trickle of traffic, we don't want the overhead of launching a new process for every hit. Even in low-traffic environments, the startup costs of loading modern Perl tools like Moose and DBIx::Class can make non-persistent applications prohibitively slow.
We have things like mod_perl and FastCGI for easily creating persistent applications. But these applications are generally built upon emulating aspects of the stateless, non-persistent CGI protocol within a persistent environment. Even pure mod_perl applications typically receive much of their input via environment variables specified in the CGI standard, often by instantiating CGI.pm or one of its clones.
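To make the complaint concrete, here is roughly what that emulation looks like inside a persistent mod_perl 2 handler. This is a hedged sketch -- the package name and the 'name' parameter are invented -- but the shape is the point: per-request data arrives via process-global %ENV and is re-parsed by CGI.pm on every hit, even though the native request object already carries it.

package MyApp::Handler;    # hypothetical handler package
use strict;
use warnings;

use Apache2::RequestRec ();    # for $r->method, $r->content_type
use Apache2::RequestIO  ();    # for $r->print
use Apache2::Const -compile => qw(OK);
use CGI ();

sub handler {
    my $r = shift;    # the native, persistent Apache2::RequestRec object

    # The CGI-emulation path: the server copies request data into
    # process-global environment variables, and CGI.pm re-parses them
    # from scratch on every single request.
    my $q      = CGI->new;
    my $method = $ENV{REQUEST_METHOD};
    my $name   = $q->param('name');    # 'name' is an invented parameter

    # ...versus asking the persistent API directly:
    my $native_method = $r->method;

    $r->content_type('text/plain');
    $r->print( "Hello, " . ( defined $name ? $name : 'world' ) . "\n" );
    return Apache2::Const::OK;
}

1;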
This model is fundamentally broken. Read on for my list of reasons why CGI should not be used in persistent applications.
I tried to update a live website with some changes. Generally, I run both a production and a testing environment. Recently, however, I moved the code from SQLite to MySQL and did not create a testing DB, so changes that require changing text on the site are done in production. Not good? I know!
So the website is built in Catalyst. It originally used SQLite and was then migrated to MySQL (a migration that had to be done manually). It uses HTML::FormHandler to display the forms, with a generic CRUD layer I added.
When trying to load the form, I get weird characters on parts of the page. From what I gathered, the data in MySQL isn't kept in UTF-8 but in latin1, while we declare the page as UTF-8. The form itself wasn't emitted as UTF-8 either (which I changed using "use utf8;" in the form's .pm file, and then with Encode::Guess, which yielded a better result). David Wheeler has a really interesting article on UTF-8 in Perl here.
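The usual repair for this mismatch, assuming the bytes really are latin1, is to decode them into Perl's internal character form on the way out of the database, and let the view layer encode UTF-8 toward the browser. A small sketch (the sample byte string stands in for a database value):

use strict;
use warnings;
use Encode qw(decode);

# "café" as latin1 bytes, standing in for a value fetched from MySQL.
my $bytes = "caf\xE9";

# If the column really holds latin1, decode it into Perl characters;
# the template/encoding layer can then emit proper UTF-8.
my $text = decode( 'latin1', $bytes );

# Encode::Guess, mentioned above, can arbitrate when the source
# encoding is uncertain -- though latin1 vs UTF-8 is inherently
# ambiguous, since any byte sequence is valid latin1.
use Encode::Guess qw(latin1);
my $decoder = Encode::Guess->guess($bytes);
$text = $decoder->decode($bytes) if ref $decoder;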
I got some failing test reports for the latest version of one of my modules. The problem turned out to be outside my own module.
A dependency of a dependency of mine (Sub::Uplevel) requires Module::Build 0.35. My module first tries to build using the installed Module::Build (in this case 0.28). When the dependencies are installed, Module::Build 0.35 comes along, but it chokes on the configuration data written by the older version of M::B.
Ouch.
I suspect Sub::Uplevel doesn't really need version 0.35 of M::B, but that this was a case of the auto_configure_requires option gone astray, though it certainly would be nice if M::B could handle old configuration files in a useful way.
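If that diagnosis is right, the fix on the distribution's side would be to stop letting Module::Build pin the author's own version. A hedged sketch of a Build.PL doing that (the module name is a placeholder, and the auto_configure_requires option exists only in newer Module::Build releases):

use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    module_name => 'My::Module',    # placeholder

    # Don't auto-pin configure_requires to the author's own
    # (possibly very new) Module::Build version...
    auto_configure_requires => 0,

    # ...declare the oldest version you actually need instead.
    configure_requires => { 'Module::Build' => '0.28' },
);

$build->create_build_script;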
After typing ack 'sub foo' lib for approximately the thousandth time during some refactoring sessions, I couldn't be bothered anymore and added the following snippet to my realias (after some googling on how to pass parameters to an alias, which bash simply doesn't support, so I had to solve it via a bash function):
sack () {
    ack "sub $1" lib
}
To find a given method in some of our labyrinthine code, I now say
~/projects/Foo-Bar$ sack annoying_method
and get a list of all occurrences.
yay!
P.S.: The name sack has nothing to do with subroutine ack, but of course comes from the Austrian saying "Gemma ned am Sack, oida!" (roughly: "Don't bust my balls, dude!")
P.P.S.: Cross-posted from use.perl, because I haven't made up my mind yet if/when/how I migrate my blog from there to here...
As you may know, Perl is the second most popular language on GitHub. Well, that's what the page says, and that page is wrong for a variety of reasons, but first I'm going to talk about an unexpected problem at work.
In recent times we have seen a dramatic increase in the number of testers, smokers and reports. So much so that we are seeing over 400,000 reports each month. This in turn is putting a strain on the Perl NOC, especially the email and NNTP parts of the system.
I'd like to add my praise to the heap of it already piled onto NYTProf. This is a Perl profiler available on CPAN, with a very attractive HTML interface.
If you wait for your next efficiency issue before using NYTProf, you're making a mistake. For me, optimizing is no longer NYTProf's primary purpose: it is a powerful debugging tool. The count of how many times each line was executed yields marvelous insights quickly. Consider an example: a script that processes a file is acting strangely, and you don't know where to begin. Your test file is 1000 lines long, yet you notice certain lines in the per-line logic are not being executed 1000 times. Hmmm.
Simply checking for lines which are not executed at all is a surprisingly powerful technique. The HTML format allows you to skim the code, looking for these. This is particularly useful when the question is not localized or some matter of detail, but whether your overall logic makes sense, and whether your code actually implements the logic/algorithm you intended.
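Getting those per-line counts is a two-step affair. Assuming a script called process_file.pl (the name is a stand-in):

perl -d:NYTProf process_file.pl   # profiles the run, writes ./nytprof.out
nytprofhtml                       # turns nytprof.out into the HTML report
# then open nytprof/index.html in a browser and skim the counts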
Adam Kaplan, Tim Bunce and Steve Peters, thank you.
Perlbal is something I always wanted to learn. A recent DDoS made sure I learned it in an hour or so. Apparently the regular setup takes about 4-5 minutes with it; this post will try to make it even shorter.
Suppose you have three servers:
Web1 - webserver number 1 - 10.2.3.1.
Web2 - webserver number 2 - 10.2.3.2.
GW - your gateway server, which you want to use as a reverse proxy for Web1 and Web2.
What you basically need is two things:
- Perlbal configured for Web1 and Web2 (a sample configuration follows this list).
- Apache on Web1 and Web2 (Apache is what I'm using) set up to handle the forwarded headers correctly, so the backends see the real client IP. This is optional, but most people will want it. It might also be supported within Perlbal itself, but I haven't found that yet.
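Here's a minimal Perlbal configuration for GW along those lines. It's a hedged sketch: the file path, port choices, and the persist/verify options are my assumptions, not requirements.

# /etc/perlbal/perlbal.conf -- hypothetical path
CREATE POOL webservers
    POOL webservers ADD 10.2.3.1:80
    POOL webservers ADD 10.2.3.2:80

CREATE SERVICE balancer
    SET listen          = 0.0.0.0:80
    SET role            = reverse_proxy
    SET pool            = webservers
    SET persist_client  = on
    SET persist_backend = on
    SET verify_backend  = on
ENABLE balancer

# Perlbal's management console, handy for inspecting state at runtime
CREATE SERVICE mgmt
    SET role   = management
    SET listen = 127.0.0.1:60000
ENABLE mgmt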
For Apache, on Web1 and Web2 you just download and compile mod_rpaf using: apxs -i -c -n mod_rpaf-2.0.so mod_rpaf-2.0.c
Then you follow the simple five config lines available at the mod_rpaf page.
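For reference, those lines look roughly like this. The module path depends on your build, and RPAFproxy_ips must be GW's address; 10.2.3.254 below is an invented placeholder, since the post doesn't give GW's IP.

LoadModule rpaf_module modules/mod_rpaf-2.0.so

RPAFenable      On
RPAFsethostname On
RPAFproxy_ips   10.2.3.254
RPAFheader      X-Forwarded-For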
I wouldn't say that many of these count in the "best" category; some of the tasks can be accomplished with system utilities, and some are obfuscated or golfed. But I find it useful to keep these around to remind me of solutions that I might otherwise forget or have to re-invent.
It's more than a touch frustrating for me, but I need help processing an Atom feed (having never done this before). Specifically, I need help with the gitpan Atom feed. GitHub has a useful API, but it can't handle the huge number of repos which gitpan has, nor does it appear that the GitHub API offers any paging facilities.
I've already seen modules like XML::Atom, but what I'd like to see is something which allows me to pull past Atom entries (I know this is possible because Google Reader can read past entries). Heck, even reading the HTTP headers hasn't allowed me to decipher the exact incantation needed. Basically, I'm looking at the following (pseudo-code):
my $atom = Some::Atom::Module->new($atom_url);
my ( $limit, $offset ) = ( 100, 0 );
while ( my $results = $atom->fetch(
    { limit => $limit, offset => $offset } ) )
{
    process($results);
    $offset += $limit;
}
I see a number of Atom modules on the CPAN, but I've not found one which offers paging. Have I missed one? Is there a clear resource online to explain how I can at least fetch past Atom results via curl?
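For what it's worth, there is a standard for exactly this: RFC 5005 (Feed Paging and Archiving), under which a feed links to its older pages via rel="next" or rel="prev-archive" links. Whether the gitpan feed implements it is something to verify, but if it does, a sketch with XML::Atom would look like this (the feed URL is my guess):

use strict;
use warnings;
use URI;
use XML::Atom::Feed;

my $url = 'http://github.com/gitpan.atom';    # guessed feed address

while ($url) {
    my $feed = XML::Atom::Feed->new( URI->new($url) ) or last;
    process($_) for $feed->entries;

    # Follow an RFC 5005 paging/archiving link to the next-older
    # page, if the feed advertises one.
    ($url) = map  { $_->href }
             grep { ( $_->rel || '' ) =~ /\A(?:next|prev-archive)\z/ }
             $feed->link;
}

sub process {
    my ($entry) = @_;
    print $entry->title, "\n";    # stand-in for real processing
}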
Josh McAdams and I are about to finish Effective Perl Programming, which is already on Amazon for pre-order. We just have to complete the last item: a list of really cool Perl one-liners. We have the stuff that we use ourselves, but we know there is a lot more out there.
What's in your login files? Have you written a really cool Perl one-liner?
We're especially interested in one-liners that do interesting Windows things. Do you have something that interacts with Office (or maybe plays MineSweeper for you)?
Maybe you have something that you type in the SSH shell on your iPhone? Are you doing something with Perl and Android?
Show us what you have, give us a couple of sentences about what it does, and how you'd like your name to appear in the book.
Like many projects, Plumage wants as many active contributors as it can get. In order to bring in new contributors, any project needs to do a few things:
Get users interested in contributing (more on this another time)
Let them know how to contribute
Make it as easy as possible for them to do so
One big way to address that list is to invest in contributor (AKA "hacking") documentation. The word "invest" is intentional -- you will need to put quite a bit of time into writing the documentation, and you hope that it will bring enough additional contributions (that you wouldn't have otherwise gotten) to more than make up for the time you invested.
Hacking docs go over and above the low-level API documentation that covers object attributes, method signatures, exception types, and so forth. Those API docs will certainly make development easier for your contributors, but they don't capture the workflows, mental checklists, and rules of thumb that guide you when implementing new features or tracking down bugs.
I like playing around with 3D software such as Autodesk Maya and Blender. I've given Blender much greater focus over the last few months, as I've developed a tendency to prefer open-source solutions whenever I can.
At the beginning I had some annoying problems with it which, surprisingly or not, aren't Blender's fault at all. Most of them are Nvidia's fault.
I have a graphics card based on the Nvidia GeForce 8600GT chipset.
On Ubuntu 9.04, my OpenGL didn't work at all, so I couldn't even get Blender to start.
Happily, with the upgrade to 9.10 and the new version 185 Nvidia drivers, that problem was solved and I could finally use Blender on a Linux platform.
All was well for a couple of months, and I greatly enjoyed the amazing speed at which Blender loads and runs on my 'Buntu.
Ever since Moose::Role came to life, I have been dying to refactor the default CRUD implementation in Reaction to use roles. Over the last couple of months I slowly did that work as I had tuits, and today, after months of it being finished, I finally got the chance to merge the branch and push it back to trunk. As of today, all of the CRUD functionality in Reaction is supplied by independent roles, which means one can do something like the following:
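(A sketch of the idea -- the role and class names below are illustrative guesses rather than necessarily Reaction's real ones, so check the Reaction source for the actual names.)

package MyApp::Controller::Foo;
use Moose;

# Hypothetical base class and roles, for illustration only: compose
# just the CRUD actions you want onto a collection controller.
BEGIN { extends 'Reaction::UI::Controller::Collection' }

with qw(
    Reaction::UI::Controller::Role::Action::Create
    Reaction::UI::Controller::Role::Action::View
    Reaction::UI::Controller::Role::Action::List
);

1;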
All in all, that's a pretty easy way to create Create / View / List functionality for one of the Collections in your Model. Look for more actions to be added in the near future.
At present, the web interface to the Reaction repo is undergoing maintenance and possibly migration, so I don't have a better link for you, but I hopefully will soon.
When Brian mentioned the sale on Learning Perl and Mastering Perl, I went and purchased two of each -- ostensibly one for me and one for my girlfriend. I haven't read mine; instead I gave it to my younger brother, who decided to learn Perl. I've been helping him almost daily with the material and the homework.
Yesterday I introduced him to the idea of tests and showed him Test::More. You see, until now he's been manually running each exercise in the book and comparing the output with what the book says it should be.
Now he can instead write what the book expects as a test, run the code, and just see whether it's "ok" or "not ok". Much simpler. He likes it very much.
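For flavor, here's the shape of that shift, with a made-up exercise function standing in for whatever the book asks for:

use strict;
use warnings;
use Test::More tests => 2;

# A hypothetical exercise solution: sum a list of numbers.
sub total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# Instead of eyeballing printed output against the book, encode the
# book's expected answers as assertions:
is( total( 1 .. 5 ),  15, 'total of 1..5' );
is( total( 3, 3, 3 ),  9, 'total of three threes' );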