Atom Feed Help

By Ovid on December 17, 2009 10:16 AM

It's more than a touch frustrating for me, but I need help processing an Atom feed (having never done this before). Specifically, I need help with the gitpan Atom feed. Github has a useful API, but it can't handle the huge number of repos which gitpan has, nor does it appear that the Github API offers any paging facilities.

I've already seen modules like XML::Atom, but what I'd like to see is something which allows me to pull past Atom entries (I know this is available because Google Reader can read the past entries). Heck, even reading the HTTP headers hasn't allowed me to decipher the exact incantation needed. Basically, I'm looking at the following (pseudo-code):

    my $atom = Some::Atom::Module->new($atom_url);
    my ( $limit, $offset ) = ( 100, 0 );
    while ( my $results = $atom->fetch( { limit => $limit, offset => $offset } ) ) {
        process($results);
        $offset += $limit;
    }

I see a number of Atom modules on the CPAN, but I've not found one which offers paging. Have I missed one? Is there a clear resource online to explain how I can at least fetch past Atom results via curl?

Tagged as: atom, git, github, gitpan, perl

6 Comments

Dave Cross | December 17, 2009 11:01 AM

Your problem is at a conceptual level I think :-/

It looks like the github atom feed contains 35 entries. So you can only ever get the most recent 35 entries from parsing the atom feed.

I know that Google Reader looks like it can get older stuff. But I'm pretty sure that's only because it downloaded the atom feed when those entries were there and then cached the information in a database.

All of which means that for a huge upload like gitpan, the atom feed is pretty much useless and you'll have to start digging around in the API - perhaps doing stuff a few repos at a time.

Let me know if I can be any more help.
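The caching behaviour described above (poll the feed while entries are still in it, and keep whatever has been seen) can be sketched roughly as follows. This is only an illustration, not how Google Reader actually works: the `remember_entries` helper and the entry hashes with an `id` key are hypothetical, and a real archive would live in a database rather than an in-memory hash.

```perl
use strict;
use warnings;

# Add entries that are not yet in the archive, keyed by their Atom id.
# $archive is a hash ref; each entry is a hash ref with at least an 'id' key.
# Returns the number of entries that were new in this poll.
sub remember_entries {
    my ( $archive, @entries ) = @_;
    my $added = 0;
    for my $entry (@entries) {
        next if exists $archive->{ $entry->{id} };    # already seen in an earlier poll
        $archive->{ $entry->{id} } = $entry;
        $added++;
    }
    return $added;
}

# Two polls of the same feed, some time apart; the second overlaps the first.
my %archive;
my $first  = remember_entries( \%archive, { id => 'urn:1' }, { id => 'urn:2' } );
my $second = remember_entries( \%archive, { id => 'urn:2' }, { id => 'urn:3' } );
printf "new on first poll: %d, new on second poll: %d, archived: %d\n",
    $first, $second, scalar keys %archive;
```

The catch is the one already noted: this only helps for entries that were in the feed at the moment it was polled, so it cannot recover anything that scrolled off before polling started.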

Ovid replied to comment from Dave Cross | December 17, 2009 11:10 AM

I was beginning to worry that this might be the case. RFC 5005 explains how feeds and archives should be handled, but clearly Github does not present anything like that, so it sort of looks like I may be stuck. I may have to fall back to HTML scraping. At least that's available :/
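For a feed that does follow RFC 5005, paging amounts to following link elements: `rel="next"` in a paged feed, or `rel="prev-archive"` in an archived feed. Below is a minimal sketch of that idea, under several assumptions: it uses the non-core XML::LibXML module, `parse_page` and `walk_feed` are hypothetical helper names, link `href` values are assumed to be absolute URLs, and fetching is left to a caller-supplied code ref. It would not help with the Github feed, which carries no such links.

```perl
use strict;
use warnings;
use XML::LibXML;

my $ATOM_NS = 'http://www.w3.org/2005/Atom';

# Parse one Atom document. Returns an array ref of entry ids and the URL
# of the next page to fetch (undef when the document has no paging link).
sub parse_page {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( atom => $ATOM_NS );

    my @ids = map { $_->textContent }
        $xpc->findnodes('/atom:feed/atom:entry/atom:id');

    # RFC 5005: paged feeds use rel="next", archived feeds use rel="prev-archive".
    my ($link) = $xpc->findnodes(
        '/atom:feed/atom:link[@rel="next" or @rel="prev-archive"]');
    my $next = $link ? $link->getAttribute('href') : undef;

    return ( \@ids, $next );
}

# Follow paging links from $url, collecting entry ids in the order seen.
# $fetch is a code ref mapping a URL to an XML string. The walk stops when
# there is no further link, a URL repeats, or $max_pages have been read.
sub walk_feed {
    my ( $url, $fetch, $max_pages ) = @_;
    $max_pages //= 50;
    my ( @all, %seen );
    while ( defined $url && !$seen{$url}++ && $max_pages-- > 0 ) {
        my ( $ids, $next ) = parse_page( $fetch->($url) );
        push @all, @$ids;
        $url = $next;
    }
    return @all;
}
```

In real use, `$fetch` could be something like `sub { HTTP::Tiny->new->get( $_[0] )->{content} }`, with error handling added.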

Dave Cross | December 17, 2009 11:16 AM

I don't think I've ever seen an atom feed that follows those standards.

Ovid replied to comment from Dave Cross | December 17, 2009 11:22 AM

You're not making me feel better, Dave :)

I've raised a support request with Github to deal with the original source of my problem.

Brian Cassidy | December 17, 2009 1:05 PM

FWIW, your pseudo-code is kind of like OpenSearch.

Aristotle replied to comment from Dave Cross | December 21, 2009 2:26 AM

Dave: all Blogger feeds have paging links per RFC 5005. (Hardly much help to Ovid, though.)

Ovid: there is no automagical paging mechanism for feeds. A feed is no more special than a web page. The https://blogs.perl.org front page doesn’t have dynamic paging either, for example, so unless the publisher provides paging links, there’s simply no way you can page backward.


About Ovid

Freelance Perl/Testing/Agile consultant and trainer. See http://www.allaroundtheworld.fr/ for our services. If you have a problem with Perl, we will solve it for you. And don't forget to buy my book! http://www.amazon.com/Beginning-Perl-Curtis-Poe/dp/1118013840/
