Perl Startups: Lokku/Nestoria
In the first part of the "Perl Startups" intermittent series of blog posts, I interviewed JT Smith about the Lacuna Expanse. For the next post, I was very interested in Lokku/Nestoria. Many of you probably don't know much about them, but I learned about them when I was living in London and found them to be a great company and nice people. Recently I interviewed Alex Balhatchet (CPAN account), the CTO of Lokku/Nestoria, about his company's love of Perl and the Perl community.
Ovid: Tell us a bit about Lokku/Nestoria.
Alex: Lokku was founded in 2006 by Javier Etxebeste and Ed Freyfogle, two ex-Yahoo! employees. The first product, which remains the largest and most successful, was Nestoria, though we've also just launched a new brand, OpenCage Data.
Nestoria is a property search engine, helping millions of house hunters easily search through millions of listings to find the best house, flat, cottage, bungalow, maisonette, or villa. We operate in eight countries: the UK, Spain, Germany, Italy, France, Australia, Brazil and India. From a technical point of view we face a lot of the same challenges as a larger search engine: geocoding, de-duplication, search relevancy, site speed, localization, and metrics.
We regularly sponsor, attend and speak at Perl events. I'm happy to say the whole Lokku engineering team will be at YAPC::EU 2013 in Kiev this August, and two of us will be speakers. We have also already signed up to sponsor the 10th London Perl Workshop in November, which we've done for 6 or 7 years now. Besides Perl stuff we sponsor OpenStreetMap events and actually run our own quarterly event focused on location-based services.
Ovid: Can you describe your technology stack/architecture?
Alex: I usually break down our architecture into three big pieces: the listings tier, the interactive tier, and the metrics tier.
The listings tier is where listings come in (usually in XML directly from our commercial partners, property portals such as Immonet, Fotocasa, and Domain) and we munge the data. This is where we clean up the data, geocode the listings, fetch images for thumbnailing, categorize the listings for our reporting infrastructure, de-duplicate the listings in the case where we have two or more listings (adverts) which are for the same property (building that is for sale or rent), perform keyword extraction based on natural language processing, and a host of other transformations.
The listings tier is 100% Perl, except of course where we use CPAN modules that have some C in them. Shout out to XML::Simple, Encode, and of course DBI and DBD::mysql :-)
Once the listings have gone through all that they are ready for display on the website, which is the interactive tier. We use HTML::Mason with Apache and mod_perl - a bit old school these days with technologies like Plack and Starman around, but with a small team sometimes you have to follow the tenet of "if it ain't broke, don't fix it." Where we have focussed our attention in the last couple of years has been on the client side with jQuery, sliders, maps, and responsive design for mobile and tablet devices, and on the search itself which is written in C. The interactive tier is used for all the localization and for doing our location lookup which does a lot of fancy things such as spell-checking/fuzzy matching.
The interactive tier writes a log line for every pageview or click that happens on the website, which then feeds into our final tier, which we call metrics. Metrics is used by everyone at Nestoria - the engineering team can use it to measure speed, the product team can keep an eye on user behaviour, and the commercial team use it to make sure we are hitting targets and to invoice our customers. This system is 100% Perl and once again makes use of lots of great CPAN modules.
Ovid: Why did you choose Perl?
Alex: When we started we already had the three tier approach in mind and we knew that we had three big challenges ahead of us: munging text data in a variety of languages, making a website that's super fast and reliable, and processing lots of structured log data. Perl is absolutely the best language for all three, and on top of that choosing a single language for all three tiers allowed us to keep doing what we do with a lean team of 4-5 developers.
Naturally the CPAN was and still is a huge motivator for using Perl. It is such a great resource to have available, especially for a startup that is aiming to get out a minimum viable product in a short space of time. There is so much good stuff on CPAN whether you need encryption, encoding, thumbnailing, geocoding, emailing, profiling or parsing. If there's a web service you need to interact with, somebody will have written the CPAN module for you - and most of the time it will have tests and be actively maintained.
Finally I definitely want to give a huge shout out to the London Perl community, and the wider European Perl community too. The London Perl Workshop, YAPC::EU, the London.pm technical meetings, and Dave Cross' Perl School - every single one is incredibly well organised and well worth attending.
Ovid: What Perl-specific technologies were used?
Alex: Massive shout out to Devel::NYTProf, which is simply the best tool available in any language for profiling code execution and finding inefficiencies. We've found that the speed of the website is critical to our ability to generate revenues and so once a quarter we have all members of the development team drop their current projects and spend a week purely focussed on the speed of the interactive tier. We use Devel::NYTProf::Apache and nytprofmerge to produce NYTProf output from our production servers.
We aim to release code every day, so of course we have a nice big test suite. That wouldn't be possible without the TAP and Test namespaces, and specifically the fact that Test::Builder makes every testing module work together. Lately we've been making good use of Test::FailWarnings and Test::MockModule/MockObject to get the most out of our unit tests. One day I hope we'll get around to using Test::WWW::Selenium (probably using a service such as Sauce Labs) to test our client-side functionality a bit more rigorously.
Finally, I really cannot oversell the importance of Perl's outstanding unicode support. We have to deal with all kinds of weird input and produce clean well-localized output, and Perl lets us handle that very nicely. Modules like Unicode::CaseFold, Unicode::Normalize and Encode are important, but much more critical is the support baked into the language itself - having regular expressions and built-ins such as lc() and length() be completely unicode-safe is fantastic.
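(An aside for readers who have not seen this in practice: the sketch below is an illustration written for this post, not code from Nestoria. It assumes the input arrives as UTF-8 bytes and uses only the core Encode module; the sample word is made up. It shows how length() and lc() behave once bytes have been decoded into a character string.)

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for all strings (Perl 5.12+)
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative input: the UTF-8 bytes for the word "ÉCOLE".
# "É" (U+00C9) takes two bytes in UTF-8, so this is 6 bytes long.
my $bytes = "\xC3\x89COLE";

# Decode once at the boundary; from here on we work with characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# lc() lowercases the accented letter along with the ASCII ones.
my $lower = lc $chars;
print "$lower\n";

# Regular expressions also treat the accented letter as a word character.
print "all word characters\n" if $chars =~ /\A\w+\z/;
```

The design point is the single decode step: byte counts and character counts only disagree when a string is used before it has been decoded.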
Ovid: What was good and bad about the Perl language?
Alex: Well I think I've already gone a lot into what's good about Perl. As you may be able to tell from my answers above I'm a huge advocate for the CPAN, the language itself, and the Perl community especially here in London. For that reason I'm going to take a slightly different angle on this question.
What's bad about the Perl language is that it's not well-known enough. It isn't taught in universities, and sadly it isn't well publicised outside of universities either. Python and Ruby were much more recently the "hot new thing" and have gotten a lot more attention in the last decade or so as a result. For that and other reasons, I would say that Perl companies have a hard time hiring developers because there are simply more jobs than there are developers out there.
The counterpoint to that is that as a language Perl is incredibly easy to learn. It reminds me a lot of English: it's easy to grasp the basics and get started, it can be quite forgiving of mistakes, and the people who really love it are more than willing to teach you. At Nestoria we often bring in interns who are still in university and very few of them come in knowing Perl, but every one of them has picked up enough Perl to become productive in a few days and has started making important changes to our codebase in a few weeks.
Ovid: What would you have done differently with Perl, had you known?
Alex: Name and shame time - yes, there are sub-systems at Lokku that have not been refactored away from byte strings to character strings. I look forward to the day when this is no longer the case, but in the meantime we deal with it by having crystal clear documentation about which systems, packages and methods expect byte strings and which expect character strings. I'm happy to say that in a recent "Speed Week" I switched the website itself over to character strings which has made a big difference to the performance, and as far as I can tell I didn't miss anything :-) If you see an encoding error on Nestoria please do get in touch via our feedback form!
Secondly I wish that we had moved away from the system Perl already. Currently we're stuck on 5.14.2 and while I know that others are stuck on much much older versions I keep eyeing up the new features in 5.16 and 5.18. I'm happy to say we're going to be making some changes this summer that should make upgrading Perl and other dependencies including CPAN modules much easier, so I'm looking forward to that. I have to say the change to yearly release cycles for Perl has been really encouraging and I feel confident that Perl is going to keep improving for years to come.
Finally there is one personal regret that I will always have. I wish that I had been aware of YAPC back in 2006 when it was held in Birmingham, England. I have yet to have the pleasure of attending a YAPC in my home country but I have a feeling it would be a completely different experience, and I hope one day I get another chance.
Ovid: Did the availability of Perl developers impact your business?
Alex: Yes! I already mentioned that our approach to this has been around hiring interns and training them up in Perl (which also happens to be the subject of my talk at YAPC::EU this coming August.) However when it comes to finding more experienced developers, or developers with specific skills such as web development, it can be a real challenge. Sadly I don't have a short-term solution, but long term I hope that other startups take on non-Perl people and bring them over to the light, and that courses like Perl School bring in new talent to our thriving but ultimately quite small community.
If anybody wants to come work with us on Nestoria then please check out our jobs page! We're currently looking for a web developer and a commercial director.
Ovid: Did the company have a chance to give back to the community?
Alex: Absolutely, in our first few months of being a company in 2006 we already open-sourced a couple of modules and we haven't stopped since. We have a Lokku GitHub page that has most of them (although actually our most popular module Geo::Coder::Many is missing from there because I haven't had a chance to extract it from our Subversion repo yet!)
We are always on the lookout for modules we can release to the CPAN. For example when we were working towards launching Nestoria India we had to support the South Asian numbering system which uses different digit separation ("1,00,00,000" rather than "10,000,000") and different naming ("one crore" rather than "10 million".) To me this is a quintessential CPAN module, and to be honest I was surprised when I didn't find it on CPAN already. Because of that when I wrote it I released it under the Perl 5 license as Number::Format::SouthAsian, and I hope others will find it useful.
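(An aside: the digit grouping described above can be sketched in a few lines of Perl. This is a minimal illustration written for this post. group_south_asian is a hypothetical helper, not the interface of Number::Format::SouthAsian, and it assumes a non-negative integer as input.)

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; NOT the API of
# Number::Format::SouthAsian. Groups the last three digits together,
# then every two digits to the left of them.
sub group_south_asian {
    my ($n) = @_;
    die "expected a non-negative integer\n" unless $n =~ /\A[0-9]+\z/;

    return "$n" if length($n) <= 3;

    my $tail = substr($n, -3);      # final three digits stay together
    my $head = substr($n, 0, -3);   # everything before them

    # Put a comma wherever a digit is followed by an even, non-zero
    # number of digits up to the end of $head (pairs from the right).
    $head =~ s/(?<=[0-9])(?=(?:[0-9]{2})+\z)/,/g;

    return "$head,$tail";
}

print group_south_asian(10_000_000), "\n";   # prints 1,00,00,000
```

Naming the amounts ("lakh", "crore") is a separate step that this sketch does not attempt.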
On top of releasing some of our modules we also regularly sponsor and speak at Perl events.
Finally, while not Perl-specific, we also run a quarterly event called #geomob which as the name suggests is related to the intersection of geography and mobile technologies. Maps are a big part of what we do at Nestoria and we like to give back to the OpenStreetMap and larger geo community as well as the Perl community. The talks at geomob tend to be interesting to a wide audience so readers of this post may want to check out the next one at UCL in October.
Ovid: Do you have any Perl-related plans for the future?
Alex: Well personally I have a medium-sized stack of pull requests and bug reports for HTTP::Async that I inherited when I took over maintenance of that module from EVDB :-) I need to read up on HTTPS/SSL a bit more I think!
As for Nestoria, we will continue to use Perl, we will continue to teach Perl to pre-graduates and new graduates, and we will continue to be a part of the London and European Perl communities.
Ovid: Are there any questions you wish I asked, but didn't?
Alex: Hmm, how about: "Is it true that you have a framed photo of Larry Wall hanging on the wall of your office?"
Why yes Ovid, it is true that we have a framed photo of Larry Wall hanging on the wall of our office :-)
They ought to get that photo signed.
There is a typo in the link to Number::Format::SouthAsian.
Wow, i must say i'm impressed.
I'm moving pretty soon, so i checked out Nestoria, to see what is available in my neighborhood (i want to stay in the same area).
This website has bluffed me. It's pretty much what i imagined a "perfect" Property search engine should be. I used to surf through various search engines and think "there ought to be a better search engine somewhere". Well, I think I've found it :-)
It's extremely simple and intuitive to use. Blazing fast (thanks to jQuery in the right places i guess). And it puts the properties on a map, where i can click to view a property's description. Definitely not your common slow, complex, and un-informative property search engine.
Thank you very much for this. I'll thank you again later if i find a place that i like ;-)
Hi @mascip
many thanks for the kind words, I'm one of the founders of Lokku, the company behind Nestoria. Great that you appreciate what we're doing.
Making something simple is very hard, even more so in markets like India and Brazil where data quality isn't always what we might want it to be.
As Alex mentioned we're hiring, please get in touch if you'd like to help. And for those who would like to keep up with what we're up to the best way is via our twitter account @nestoria.
Thanks again, Ed
I definitely want to add my 2 cents to those of mascip.
It's impressive what Nestoria has managed, especially in terms of interface speed and geocoding. The latter simply baffles me - I got to see it at work when testing Nestoria's API.
Definitely the best property search engine in the UK, for everyday users as well as developers.