Atom Feed Help
It's more than a touch frustrating for me, but I need help processing an Atom feed (having never done this before). Specifically, I need help with the gitpan Atom feed. Github has a useful API, but it can't handle the huge number of repos which gitpan has, nor does it appear that the Github API offers any paging facilities.
I've already seen modules like XML::Atom, but what I'd like to see is something which allows me to pull past Atom entries (I know this is available because Google Reader can read the past entries). Heck, even reading the HTTP headers hasn't allowed me to decipher the exact incantation needed. Basically, I'm looking for something like the following (pseudo-code):
my $atom = Some::Atom::Module->new($atom_url);
my ( $limit, $offset ) = ( 100, 0 );
while ( my $results = $atom->fetch( { limit => $limit, offset => $offset } ) ) {
    process($results);
    $offset += $limit;
}
I see a number of Atom modules on the CPAN, but I've not found one which offers paging. Have I missed one? Is there a clear resource online to explain how I can at least fetch past Atom results via curl?
Your problem is at a conceptual level I think :-/
It looks like the github atom feed contains 35 entries. So you can only ever get the most recent 35 entries from parsing the atom feed.
I know that Google Reader looks like it can get older stuff. But I'm pretty sure that's only because it downloaded the atom feed when those entries were there and then cached the information in a database.
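If that guess is right, the same trick works for anyone who polls the feed regularly, though only from the day you start polling. A minimal sketch, assuming XML::LibXML is installed; the remember_entries helper and the plain hash standing in for the database are hypothetical names of mine, not anything Google Reader or a CPAN module provides:

```perl
use strict;
use warnings;
use XML::LibXML;

my $ATOM_NS = 'http://www.w3.org/2005/Atom';

# Hypothetical helper: merge the entries of one downloaded feed document
# into %$seen, keyed by each entry's atom:id.
# Returns the number of entries that had not been seen before.
sub remember_entries {
    my ( $feed_xml, $seen ) = @_;

    my $doc = XML::LibXML->load_xml( string => $feed_xml );
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( a => $ATOM_NS );

    my $new = 0;
    for my $entry ( $xpc->findnodes('/a:feed/a:entry') ) {
        my $id = $xpc->findvalue( 'a:id', $entry );

        # Skip entries without an id and entries already stored.
        next if $id eq '' || exists $seen->{$id};

        $seen->{$id} = {
            title   => $xpc->findvalue( 'a:title',   $entry ),
            updated => $xpc->findvalue( 'a:updated', $entry ),
        };
        $new++;
    }
    return $new;
}
```

Called on every fresh download of the feed, this accumulates a history longer than the feed itself ever shows at one time. It cannot recover entries that had already dropped off before the first download.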
All of which means that for a huge upload like gitpan, the atom feed is pretty much useless and you'll have to start digging around in the API - perhaps doing stuff a few repos at a time.
Let me know if I can be any more help.
I was beginning to worry that this might be the case. RFC 5005 explains how feeds and archives should be handled, but clearly Github does not present anything like that, so it sort of looks like I may be stuck. I may have to fall back to HTML scraping. At least that's available :/
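For a feed that does publish RFC 5005 links, walking backwards would look roughly like this. It is only a sketch, assuming XML::LibXML is installed and that the link hrefs are absolute URLs; feed_link and walk_archives are hypothetical names of mine, and the caller supplies the code that actually downloads a URL:

```perl
use strict;
use warnings;
use XML::LibXML;

# Return the href of the feed-level <link rel="$rel">, or undef if absent.
sub feed_link {
    my ( $feed_xml, $rel ) = @_;

    my $doc = XML::LibXML->load_xml( string => $feed_xml );
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( a => 'http://www.w3.org/2005/Atom' );

    my ($link) = $xpc->findnodes(qq{/a:feed/a:link[\@rel="$rel"]});
    return $link ? $link->getAttribute('href') : undef;
}

# Follow "prev-archive" (archived feeds) or "next" (paged feeds) links
# until none is left. $fetch maps a URL to the feed XML as a string;
# $process is called once per document. Returns the number of documents visited.
sub walk_archives {
    my ( $url, $fetch, $process ) = @_;

    my %visited;    # guards against link cycles
    while ( defined $url and not $visited{$url}++ ) {
        my $xml = $fetch->($url);
        $process->($xml);
        $url = feed_link( $xml, 'prev-archive' ) // feed_link( $xml, 'next' );
    }
    return scalar keys %visited;
}
```

For a live feed, $fetch could be a thin wrapper around HTTP::Tiny's get method, returning the content field of the response. Against a feed with no such links, the loop processes the first document and stops, which is exactly where I am stuck.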
I don't think I've ever seen an atom feed that follows those standards.
You're not making me feel better, Dave :)
I've raised a support request with Github to deal with the original source of my problem.
FWIW, your pseudo-code is kind of like OpenSearch.
Dave: all Blogger feeds have paging links per RFC 5005. (Hardly much help to Ovid, though.)
Ovid: there is no automagical paging mechanism for feeds. A feed is no more special than a web page. The https://blogs.perl.org front page doesn’t have dynamic paging either, f.ex., so there’s simply no way you can page backward.