MyCPAN indexes 97% of BackPAN
My goal a long time ago was to index about 90 to 95% of BackPAN, thinking that if I didn't get some ancient distributions that would be just fine and no one would miss them. There are about 140,000 distributions to index, and I'm figuring out why I can't get the last 4,200. That means I'm indexing about 97% of BackPAN.
I've been back on my BackPAN indexing project since I stopped traveling to all of these conferences. Now I'm looking at the edge cases. Here's a breakdown of what I can't index just yet. Most of this is my fault. That is, my goal is to index all of BackPAN, trying many methods to get the right answer, and most of the missing 3% are edge cases I don't handle yet. Some of that missing 3% I'll never be able to index, like the 0-byte distros.
Error | Count | Comments |
---|---|---|
Could not find distro files | 1707 | I'm probably doing something wrong. |
Could not unpack dist | 749 | Some tarballs don't seem to like my tar, or aren't even valid. |
Could not find file list | 735 | Some things don't unpack normally, or aren't actually Perl dists |
Could not find distribution directory | 307 | I expect everything to be in a directory. Some distros unpack to the current dir. |
Could not find module list | 276 | Some distros don't have modules. |
Could not parse META.yml | 163 | Haven't figured this out. Not all YAML formats and parsers are compatible |
No idea | 139 | This is a catch-all for things I couldn't classify. |
Could not run build file | 91 | Some of these are missing a library, etc., so the build file dies. |
Unparseable YAML files | 45 | Same as the META.yml, but for the report I created. Something didn't store correctly. |
Other YAML errors | 13 | A parser started to parse something then gave up. |
Dist has 0 size | 8 | Some tarballs are 0 bytes. Two of them are mine. |
Permission denied | 8 | I can't read some files in some dists because their permissions are wacky. |
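
Several of these categories can be spotted before any real indexing work starts. As a rough illustration only, here is a minimal Python sketch that uses the standard `tarfile` module to flag 0-byte files, archives that won't open, and archives that don't unpack into a single top-level directory. This is not the MyCPAN code; the function name `precheck` is hypothetical, and the returned labels are simply borrowed from the table above.

```python
import os
import tarfile


def precheck(path):
    """Return a failure label for a tarball, or None if it passes these checks.

    Hypothetical helper: the labels mirror the table's error names, but the
    checks themselves are assumptions, not the indexer's real logic.
    """
    # A 0-byte file can never be unpacked, so flag it before opening it.
    if os.path.getsize(path) == 0:
        return "Dist has 0 size"

    try:
        # Default mode "r" auto-detects gzip, bzip2, xz, or plain tar.
        with tarfile.open(path) as tar:
            members = tar.getmembers()
    except (tarfile.TarError, EOFError, OSError):
        return "Could not unpack dist"

    if not members:
        return "Could not find file list"

    tops = set()
    for member in members:
        # Drop empty and "." components so "./Foo-1.0/x" and "Foo-1.0/x" agree.
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if not parts:
            continue
        # A non-directory at depth 1 means the archive unpacks into the
        # current directory instead of its own distribution directory.
        if len(parts) == 1 and not member.isdir():
            return "Could not find distribution directory"
        tops.add(parts[0])

    # Anything other than exactly one top-level directory is suspicious.
    if len(tops) != 1:
        return "Could not find distribution directory"
    return None
```

A check like this only sorts the easy cases. Distributions that unpack cleanly but have no modules, a broken META.yml, or a build file that dies would still need their own handling.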