Trying to make confluence usable.

=head1 RESTfluence

I've tried to make this blog post copy/pastable as valid perl and valid markdown. So with luck it can be copy/pasted into an editor if you want to use this.

Confluence. I don't really like it, but the major thing it's got going for it is that it's not Sharepoint. As I am spending the summer holidays doing some documentation at work, one of the things I wanted to do was to make confluence less hateful. So I cracked open the REST API to see how far I could get.

There used to be good tools, but Atlassian got rid of the XML-RPC API they relied on not that long ago.

Progress I made was:

  • Got a list of all spaces, and all pages in each space.
  • Worked out how to obtain the content of a page.
  • Worked out how to change the content of a page (for when the time comes).

Where I got stuck:

  • Working out how to round-trip the confluence markup to/from markdown.

The rest of this post describes the script I put together. It's not useful enough for me to put on the CPAN but it's worth putting up somewhere.

=cut

package RESTfluence;

=for post

This is a script at the moment, but it still has its own package name. I'm using Moo because it makes everything easy. You'll notice that I use lazy attributes (built on demand). Making lazy attributes so easy is the #1 thing I get out of modern perl. Remember, Moo turns on warnings and strict automatically.
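
As a minimal sketch of what "built on demand" means (the Counter package and the $builds counter are made up for illustration and are not part of the script): the default sub should only run the first time the attribute is read, and the value is cached after that.

```perl
package Counter;
use Moo;

my $builds = 0;    # how many times the default sub has run

has value => (
    is      => 'lazy',
    default => sub { $builds++; return 42 },
);

package main;

my $c      = Counter->new;   # nothing is built at construction time
my $before = $builds;        # still 0 here
my $first  = $c->value;      # first read triggers the default sub
my $second = $c->value;      # second read reuses the stored value
```

The same pattern is what keeps the netrc and rest attributes below from doing any work until something actually asks for them.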

=cut

use Moo; # provides ->new, $self and easy to declare lazy attributes

=for post

There's a small list of CPAN modules needed:

=cut

use URI;                    # I hate URI string manipulation
use Net::Netrc;             # Credentials store
use REST::Client;           # Some convenience wrappers on REST
use JSON::MaybeXS qw(JSON); # the only worthwhile CPAN json module ;)
use feature 'say';          # modern perl print $string . "\n";

=for post

We're going to want storage for the wiki url and its credentials. For authentication, we're using a ~/.netrc file (chmod 0600) with an entry in this format:

machine https://myco.atlassian.net/wiki
  login me@myco.net
  password mypassword

=cut

has wiki => (
    is => 'ro',
    default => $ENV{MY_CONFLUENCE}, # e.g. https://myco.atlassian.net/wiki
);

has netrc => (
    is => 'lazy',
    default => sub {
        return Net::Netrc->lookup($_[0]->wiki);
    }
);

=for post

The other component is to provide a JSON parser, and an attribute that contains the REST::Client instance.

=cut

has json => (
    is => 'lazy',
    default => sub {
        JSON->new->pretty(1);
    }
);

has rest => (
    is => 'lazy',
    default => sub {
        my ($self) = @_;
        my $c = REST::Client->new;
        $c->getUseragent->default_headers->authorization_basic(
            $self->netrc->login, $self->netrc->password);
        $c->setHost($self->wiki . '/rest/api');
        return $c;
    }
);

=for post

run if ! caller is one of the best things ever. It means run only fires when the file is executed as a script, not when it's loaded as a module.
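
For anyone who hasn't met the idiom, here is a stripped-down sketch of it (the Greeter package is hypothetical). caller is false at the top level of a file that is being run directly, and true when the file is pulled in with require or use.

```perl
package Greeter;
use strict;
use warnings;

# The real work lives in a method, so other code and tests can call it directly.
sub run {
    my ($class, $name) = @_;
    $name = 'world' unless defined $name;
    return "Hello, $name";
}

# Reached only when the file is executed as a script, e.g. perl Greeter.pm kd
if (!caller) {
    my $greeting = __PACKAGE__->run(@ARGV);
    print "$greeting\n";
}

1;
```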

=cut

__PACKAGE__->run() if ! caller;

=for post

As I said I got to the point where I could download content, so that's what run currently does.

=cut

sub run {
    my ($self, %args) = @_;
    $self = $self->new(%args);

    # my $all_pages = $self->get_all_pages();
    my $data = $self->get_page_content(title => q/Page Title HERE/, spaceKey => 'DKB');
    return $data;
}

=for post

I was going to add post, put and delete methods when necessary.
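
A put needs a request body as well as a path. As far as I can tell from the REST API documentation, updating a page means sending the page back with its version number bumped by one. A rough sketch of building that body follows; update_payload and its argument names are my own invention, and it should be treated as untested against a live wiki.

```perl
use strict;
use warnings;
use JSON::PP;

# Build the body for a PUT to /content/{id}.
# $page is a hashref holding the page's id, title and current version number.
sub update_payload {
    my ($page, $new_storage) = @_;
    return {
        id      => $page->{id},
        type    => 'page',
        title   => $page->{title},
        version => { number => $page->{version} + 1 },
        body    => {
            storage => {
                value          => $new_storage,
                representation => 'storage',
            },
        },
    };
}

my $payload = update_payload(
    { id => '12345', title => 'Page Title HERE', version => 3 },
    '<p>Updated text</p>',
);
my $json = JSON::PP->new->canonical(1)->encode($payload);
```

With REST::Client the encoded string would then go out as the second argument to PUT, along with a Content-Type header of application/json.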

=cut

sub get {
    my ($self, $path, $args) = @_;
    $self->request('GET', $path, $args);
}

=for post

This is the bit that drives the request. It does some very gentle path mangling and returns the parsed json if status is 200, otherwise returns a data structure with the response content.
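
The two halves of that can be sketched without touching the network. normalise_path and parse_response are illustrative names only, and JSON::PP stands in for the parser attribute so the snippet has no CPAN dependencies.

```perl
use strict;
use warnings;
use JSON::PP;

# Accept either a string or an array ref of path segments,
# and make sure the result starts with a slash.
sub normalise_path {
    my ($path) = @_;
    $path = join '/', @$path if ref($path) eq 'ARRAY';
    $path = "/$path" unless $path =~ m{^/};
    return $path;
}

# Decode the body on a 200, otherwise wrap the raw body with its status code.
sub parse_response {
    my ($code, $body) = @_;
    return JSON::PP->new->decode($body) if $code == 200;
    return { response_code => $code, message => $body };
}

my $from_list   = normalise_path([ 'content', '12345' ]);   # "/content/12345"
my $from_string = normalise_path('space');                  # "/space"
my $ok          = parse_response(200, '{"size":1}');
my $failed      = parse_response(404, 'No such page');
```

Note that callers have to check for the response_code key to tell a failure from a successful result, since both come back as hash refs.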

=cut

sub request {
    my ($self, $method, $path, $args) = @_;
    $args ||= [];
    $path = join '/', @$path if ref($path); # ARRAY
    $path = "/$path" unless $path =~ m{^/};
    $self->rest->$method($path, @$args);

    my $rc = $self->rest->responseCode;
    say STDERR "Response: $rc";
    my $res;
    if ($rc == 200) {
        $res = $self->json->decode($self->rest->responseContent);

    }
    else {
        $res = { response_code => $rc,
                 message => $self->rest->responseContent,
             };
    }
    return $res;
}

=for post

The query subroutine is a convenience wrapper around REST::Client's buildQuery, which turns a hash into a query string.

=cut

sub query {
    my ($self, %query) = @_;
    return $self->rest->buildQuery(%query);
}

=for post

This is where I start working with the API. This method just lists all spaces in the wiki, and returns them.
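
Results come back a page at a time, so this method (and get_pages_for further down) keeps asking with a growing start until a page comes back with fewer than limit results. Here is that loop on its own, as a sketch with a fake fetcher standing in for the REST call; fetch_all and $fake are illustrative names, not part of the script.

```perl
use strict;
use warnings;

# Call $fetch_page->($start, $limit) repeatedly and collect every result.
# Stops on a short page, or on anything without a results array (e.g. an error).
sub fetch_all {
    my ($fetch_page, $limit) = @_;
    my @all;
    my $start = 0;
    while (1) {
        my $page    = $fetch_page->($start, $limit);
        my $results = ref($page) eq 'HASH' ? $page->{results} : undef;
        last unless ref($results) eq 'ARRAY';
        push @all, @$results;
        last if @$results < $limit;
        $start += $limit;
    }
    return \@all;
}

# Fake API: serves slices of a seven-item list.
my @items = (1 .. 7);
my $fake  = sub {
    my ($start, $limit) = @_;
    my $end = $start + $limit - 1;
    $end = $#items if $end > $#items;
    return { results => [ $start > $#items ? () : @items[ $start .. $end ] ] };
};

my $all = fetch_all($fake, 3);   # three requests: 3 items, 3 items, then 1
```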

=cut

sub list_spaces {
    my ($self) = @_;
    my $orig = URI->new($self->rest->getHost);
    my $new = $orig->clone;

    my $offset = 0;
    my $all_spaces;
    my $spaces = {};
    my $limit = 50;

    while (! $spaces->{results} || @{$spaces->{results}} == $limit ) {
        $spaces = $self->get("/space?limit=$limit&start=$offset");
        last unless ref($spaces->{results}) eq 'ARRAY'; # stop if the request failed
        push @$all_spaces,  map {
            {   name => $_->{name},
                key => $_->{key},
                id => $_->{id},
                api => $_->{_links}->{self},
                web => $self->wiki . $_->{_links}->{webui}   }
        } @{$spaces->{results}};
        $offset = $offset + $limit;
    }
    return $all_spaces;
}

=for post

get_pages_for returns all the pages in a space. From this result we can get the page space and the page title, which is important for obtaining content.

=cut

sub get_pages_for {
    my ($self, $space) = @_;
    my $key = $space->{key};
    my $limit = 25;
    my $offset = 0;
    my $info;
    my $pages;

    while (! $pages || @{$pages->{results}} == $limit ) {
        $pages = $self->get("content?spaceKey=$key&type=page&limit=$limit&start=$offset");
        last unless ref($pages->{results}) eq 'ARRAY'; # stop if the request failed
        say STDERR "Limit: $limit Offset: $offset Key: $key";
        my $res = $pages->{results};
        push @$info, map { {
              title => $_->{title},
              id => $_->{id},
              web => $self->wiki . $_->{_links}->{webui},
              edit => $self->wiki . $_->{_links}->{editui},
              api => $_->{_links}->{self} }
                       } @$res;
        $offset = $offset + $limit;
    }
    return $info;
}

=for post

get_all_pages just gets all pages in all spaces.

=cut

sub get_all_pages {
    my ($self) = @_;
    my $spaces = $self->list_spaces();
    foreach my $s (@$spaces) {
        $s->{pages} = $self->get_pages_for($s);
    }
    return $spaces;
}

=for post

get_page_content requires a hash:

( title => 'Page Title', spaceKey => 'whatever' )

Currently this returns Atlassian storage format, which seems to be a mix of XHTML and XML. I'd really like to be able to round trip this to markdown so I can edit and create pages easily. But I'm currently at a loss as to how to proceed.
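
To make the nesting concrete, here is a cut-down, hand-written sample of the kind of JSON that comes back when body.storage is expanded, and how the markup is dug out of it. The values are made up, and a real response carries many more fields.

```perl
use strict;
use warnings;
use JSON::PP;

my $sample = <<'JSON';
{
  "results": [
    {
      "id": "12345",
      "title": "Page Title HERE",
      "body": {
        "storage": {
          "value": "<h1>Intro</h1><p>Hello</p>",
          "representation": "storage"
        }
      }
    }
  ],
  "size": 1
}
JSON

my $res = JSON::PP->new->decode($sample);

# Take the first match; an empty results list means no such page.
my $first  = $res->{results}[0];
my $markup = $first ? $first->{body}{storage}{value} : undef;
```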

=cut

sub get_page_content {
    my ($self, %location) = @_;
    my $q = $self->query(%location, expand => 'body.storage');
    my $res = $self->get("/content$q");
    return ref $res ?  $res->{results}->[0]->{body}->{storage}->{value} : undef;
}


1; # end package on truth :)

About kd

Australian perl hacker. Lead author of the Definitive Guide to Catalyst. Dabbles in javascript, social science and statistical analysis. Seems to have been sucked into the world of cloud and devops.