Unicode abuse
I was looking at doing a little bit of political activism on Twitter and, as part of this, thought about maximising the amount of information in each tweet à la Tweet Compressor, which abuses Unicode to squeeze more text into the 140-character (not byte!) limit for tweets.
Here's the implementation:
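use utf8;    # the source contains literal UTF-8 characters

sub tweet_compress {
    my $tweet = shift;
    $tweet =~ s/\. ?$//;    # we don't need no end of sentence punctuation
    # Longer sequences come first, otherwise "fi" would eat part of "ffi"
    my @orig = ( qw/cc ms ns ps in ls ffl ffi fi fl iv ix vi oy ii xi nj/, ". ", ", " );
    my @new  = ( qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ ﬄ ﬃ ﬁ ﬂ ⅳ ⅸ ⅵ ѹ ⅱ ⅺ ǌ/, ".", "," );
    # Replace each sequence with its single-character lookalike
    $tweet =~ s/\Q$orig[$_]\E/$new[$_]/g for 0 .. $#orig;
    return $tweet;
}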
Note that you also need binmode STDOUT, ':utf8'; to output the tweet correctly to stdout, and I really wish there were better Unicode docs that didn't have such a high cognitive load.
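To make the character-versus-byte point concrete, here is a minimal sketch. The strings and variable names are made up for illustration, and it assumes the core Encode module is available:

```perl
use strict;
use warnings;
use utf8;                     # the source below contains literal UTF-8 characters
use Encode qw(encode);

binmode STDOUT, ':utf8';      # encode characters as UTF-8 on output

my $plain      = 'fifi';      # four ASCII characters
my $compressed = 'ﬁﬁ';        # two ligature characters (U+FB01 twice)

# length() counts characters, which is what the tweet limit counts
my $chars = length $compressed;
# encoding to UTF-8 first gives the size in bytes instead
my $bytes = length encode('UTF-8', $compressed);

print "$plain: ", length($plain), " characters\n";
print "$compressed: $chars characters, $bytes bytes\n";
```

The compressed string is shorter in characters but longer in bytes, which is exactly why the trick only helps against a character limit.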
I had done this for fun a while back after seeing Tweet Compressor. It's important not to compress URLs in order for them to be useful without someone retyping them manually. TC detects them and leaves them unchanged.
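A rough sketch of that idea in Perl, assuming that anything starting with http:// or https:// and running up to the next whitespace is a URL. The helper name and the stand-in compressor are hypothetical, and this is not how TC itself detects URLs:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: apply $compress to everything except URLs.
# Assumption: "http://" or "https://" followed by non-whitespace is a URL.
sub compress_except_urls {
    my ($tweet, $compress) = @_;
    # split with a capturing group keeps the URLs in the list:
    # they land at the odd indices, the text between them at the even ones
    my @parts = split m{(https?://\S+)}, $tweet;
    for my $i (0 .. $#parts) {
        next if $i % 2;                       # odd index: a URL, leave it alone
        $parts[$i] = $compress->($parts[$i]);
    }
    return join '', @parts;
}

# Stand-in compressor for the demo: only the "fi" ligature
my $demo = sub { my $s = shift; $s =~ s/fi/ﬁ/g; return $s };

binmode STDOUT, ':utf8';
print compress_except_urls('final files at http://example.com/files fine', $demo), "\n";
```

With the function from the post you would pass \&tweet_compress instead of the stand-in, though its end-of-sentence stripping would then run on each piece of text separately.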
Nick:
Good point. If this had been relevant to my problem space, I'd have realised this of course :)