CPAN modules for parsing User-Agent strings

This review is now hosted elsewhere.

15 Comments

Apart from the modules mentioned here, there is extensive parsing of the user agent string in the program "awstats", although I am not sure how modular it is (I don't know how easy it would be to use that in another program).

The main thing I would be interested in from a module like this would be detecting if a user agent was a robot or not.


Thanks for putting this together. I was surprised at how HTTP::BrowserDetect stacks up against the others in your breakdown. It will be 12 years old in February (according to BackPAN), so it may well be the crustiest of all of them. There are some design decisions from over a decade ago which are still firmly entrenched in the code.

It can be a lot of work to stay up to date with UA strings, so I would think everyone would stand to benefit from some sort of pooling of efforts.

Hi Neil

Congratulations. I can see a huge effort has gone into this.

If you have the time, I'd like to see this released as a module under the Benchmark::Featureset::* namespace.

Ah, I was about to advise you not to look at the awstats code. It is some of the most insane Perl code I've ever seen in my life.

Thanks for the great article. It was very timely, as I was recently trying to find a better mobile user-agent detection package. I have been using HTTP::BrowserDetect, which I find pretty good, and it is actively being updated. After I read your article, I tried the others on your list, but I ended up staying with HTTP::BrowserDetect because my app needs robot and mobile detection, and this module is by far the best in these categories at this point. It's still not perfect, but it's much better than anything else I've found so far. I'd love to find a more reliable way of detecting robots and mobile browsers (as well as browser type, though that turns out to be third in importance for my needs).

Thanks again, and please post updates.

By the way, for mobile detection, I experimented with the 3 most promising modules for this purpose. I wasn't as diligent as you, but I did some experiments on some of our traffic that's mostly mobile: HTTP::BrowserDetect detected over 90% of unique user-agents as mobile, which isn't bad at all; HTTP::MobileAgent detected less than 5% (it may work well with Japanese carriers, but it doesn't seem very useful for general mobile detection); HTTP::DetectUserAgent detected none as mobile (or robot). I didn't try Mobile::UserAgent because it hasn't been updated since 2005, so it's unlikely to be useful at this point.

HTTP::BrowserDetect is also the best I've found for robot detection, as your section on robot detection also indicates. If anyone knows a more reliable robot/crawler detection scheme, I'd love to know about it because this is another important requirement I tend to have in traffic analysis.

Thanks again!

@Al, please feel free to get in touch with me about improving robot detection in HTTP::BrowserDetect. It is far from perfect, but I do my best to keep up with user contributions etc. You can open issues at https://github.com/oalders/http-browserdetect/ or contact me directly. My contact info is at https://metacpan.org/author/OALDERS

Thank you for this comprehensive and well-written review, I have found it most useful. You may have set a standard here.

Thank you. This is very usable information for me :-)

The executive summary mentions that HTTP::DetectUserAgent is recommended if robots are important.

The conclusion, however, seems to recommend HTTP::BrowserDetect.

Is this a mistake?

Any chance you’ll revisit this to include HTTP::UA::Parser in the line-up?

About Neil Bowers

Perl hacker since 1992.