PTS 2024 - Day 4 - here comes the sun... it's all right!

Following on from The bad days

We made the decision that our problems in Kubernetes were exactly the sort of thing that should not be distractions to the project. We had been trying to save costs when we chose Hetzner for hosting... especially as we did not know where our Elasticsearch cluster (needing 3 x 32 GB of RAM) was going to live. The great news is that last week Elasticsearch agreed to host this for us, which really is a game changer.

With this in mind, we reviewed hostin…

PTS 2024 - day 2 and 3... the bad days

Following on from day 1

Joel and I spent some more time working out disk provisioning and then decided to upgrade the nodes in the cluster... this is where the problems started...

I shut down a node to resize it... and the site went down. Fastly (our CDN) then displayed "no healthy backends" to all users for any content that wasn't in its cache. This is not meant to happen!

We also couldn't connect to Argo (web UI for Kubernetes deployment and a view on the K8s API status) or…

PTS 2024 - day 1

I am always flattered to be invited to the Perl Toolchain Summit, and reinvigorated in working on MetaCPAN each time.

Currently I am focused on building on the work I and others did last year in setting up Kubernetes for more of MetaCPAN (and other projects) to host on.

Last week I organised the roadmap, which was the first thing we ran through this morning. I was very fortunate to spend the day with Joel, and between us we managed to set up:

- Hetzner (hosting company) v…

PTS 2018 - Day 1

I'm at the Perl Toolchain Summit 2018 in Oslo for a few days working with the MetaCPAN team. This is the 10th year of the summit (although confusingly the 11th actual summit!), and the 3rd year I've been able to attend.

My focus for day 1 has been making the MetaCPAN front end and API more resilient, and putting together a "what to do if the site is down" guide and a Disaster Recovery plan (on day 2 I will be testing that DR plan).

I've set up our 2nd d…

Meta::Hack 2

Meta::Hack 2 - 2017

What?

Meta::Hack is about getting the core MetaCPAN team together for a few days to work on improving... well, as much as possible! Last year we focused on deploying to a new infrastructure, with a new version of Elasticsearch. This year there was a much more varied set of things we wanted to achieve.

Why get together?

Whilst Olaf couldn't attend in person, we had him up on the big screen in the offices of ServerCentral (who kindly hosted us and bought us lunch), so it was almost as good as him being physically there. Having us together meant we could really support each other as questions arose. A fun one was tracking down that the plain string JSON::PP::Boolean is incorrectly identified as a boolean by is_bool in JSON::PP. There is a pull request, though that's not released yet. We also found bugs in our own code!

Achieved...

I spent a lot of my time with Brad, who has been setting up logging and visualisation. I set up Kibana ready for when we get the data, and I reviewed some of Graham's changes that will make the logs from our applications easier to control. Brad got most of it working and hopes to finish it off in the next couple of weeks. This should give us much better visibility of any errors, allow us to load balance better and also give us an overview of our infrastructure in a way we don't currently have.

Panopta, a monitoring service who kindly donate us an account, sent along one of their engineers, Shabbir, who talked us through some of the features that would be useful to us.

The biggest thing that is visible so far is that the autocomplete on the site is now SO much better, thanks to the work of Joel and Mickey. This was something Mickey and I started a year or so ago, but they've taken it much further and included user favourites to help boost what you are most likely to want in an autocomplete.

At Graham's request I've converted all Plack apps to run under Gazelle which is faster.

I also spent some time cleaning up our infrastructure, moving some sites off an old box (still running our old code from 2+ years ago as a backup) which has then been brought up to using the same versions of everything as the other boxes.

LiquidWeb, who host most of our production hardware, came to our rescue, not once, but twice. With some reindexing, it looks like we heated up the servers to the point that over the 4 days 2 of them rebooted themselves at various points! LiquidWeb responded quickly each time, replacing the power supply in both and a fan in one of them.

Some of the other smaller bits I worked on included automating the purging of our Elasticsearch snapshots (which I set running last year). I also created some S3 buckets for Travis CI artifact storage (our cpanfile.snapshot built from PR runs).
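
For anyone curious what snapshot purging involves, here is a rough sketch of the retention logic in Python. It is not our actual script: the function name and the keep_days and keep_min parameters are made up for illustration, and it assumes the snapshot list has the shape Elasticsearch returns from GET /_snapshot/<repository>/_all (each entry carrying snapshot, state and start_time_in_millis). It only decides which snapshots to remove; each returned name would then be deleted with DELETE /_snapshot/<repository>/<name>.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_purge(snapshots, now, keep_days=30, keep_min=3):
    """Return the names of snapshots older than keep_days, newest first.

    Hypothetical helper: the keep_min most recent SUCCESS snapshots are
    never returned, so a stalled backup job cannot leave us with nothing.
    """
    # Elasticsearch reports snapshot start times in epoch milliseconds.
    cutoff_ms = int((now - timedelta(days=keep_days)).timestamp() * 1000)

    # Sort newest first so the slice below picks the most recent ones.
    ordered = sorted(
        snapshots, key=lambda s: s["start_time_in_millis"], reverse=True
    )

    # Protect the newest successful snapshots regardless of their age.
    successful = [s for s in ordered if s.get("state") == "SUCCESS"]
    protected = {s["snapshot"] for s in successful[:keep_min]}

    return [
        s["snapshot"]
        for s in ordered
        if s["start_time_in_millis"] < cutoff_ms
        and s["snapshot"] not in protected
    ]
```

Keeping a minimum number of good snapshots is a design choice rather than anything Elasticsearch requires; it just means an age-based purge cannot remove the last usable backups.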

Tired...

Jetlag has its own sort of fun, so my days have been starting at 4:30, with 3 hours of hacking from the hotel, before heading out for breakfast and then to the ServerCentral offices... for a productive morning. Usually by 3pm though my brain just freezes and my fingers are tired... but I'm just starting to get used to it... which is because I'm heading home tonight!

Thanks!

This wouldn't have been possible without our sponsors:

Booking.com, cPanel, ServerCentral, Kritika