Skip to main content

Analytical Platform Compute Maintenance

On the first day of the month the workflow schedule-issue-compute-infrastructure.yml will automatically raise a ticket for example Maintenance - Analytical Platform Compute.

This maintenance ticket includes EKS Cluster Upgrade if applicable and/or patching to ensure all components are up to date.

Check for new release

Check if a new release of Amazon EKS has been made available here.

Upgrade and patch the EKS Control Plane, EKS Nodes, EKS add-ons and all components where new releases are available.

The Approach

  1. Create a new branch / Pull Request.
  2. Make changes to the code.
  3. Create one Pull Request for all the changes.
  4. Check the Terraform plan is as expected for each environment.
  5. Request approval.
  6. Once approved, release the apply workflow gate for each environment in turn, testing before proceeding to the next environment.

Order

Apply in the Development, Test and Production environments, resolving any issues before progressing to the next.

Working on Analytical Platform Compute in Modernisation platform

Be aware that due to restrictions with the state file multiple people cannot work on the environment at once. Terraform plan is fine but once you have released the apply workflow the state file cannot be used by anyone until the changes are complete. This only affects the environment you are carrying out the apply.

Make the team aware and check before starting the work to avoid conflicts.

Workflows

Once you create a Pull Request in the Modernisation Platform Environments repository, a workflow run will be started that carries out the required checks etc. If you then subsequently push any changes up to the branch, you will need to go into GitHub Actions and cancel the previous workkflow so the new one can start.

Assumptions

  • You are operating in the modernisation-platform-environments repository Development Container.
  • To interrogate the cluster, you are exec‘d into the same account as the cluster you are operating on aws-sso exec --profile analytical-platform-compute-test:modernisation-platform-developer.
  • Use account modernisation-platform-developer for Test and Production and modernisation-platform-sandbox for Development.
  • If necessary update ~/kube/config as follows aws eks update-kubeconfig --region eu-west-2 --name analytical-platform-compute-test.
  • Set context as follows kubectl config use-context arn:aws:eks:eu-west-2:767397661611:cluster/analytical-platform-compute-test.

Note: amend above appropriately for the environment you are working in.

Impact on Users

As this is a live service there could be an impact on users so this will have to be taken into consideration when planning the work.

The impact on users depends on what is planned to be upgraded/patched.

For Example

If upgrading cloudwatch logs agent, the user impact is minimal, applications will run, logs might be delayed and you will not require a maintenance window.

If upgrading karpenter, the user impact is potentially higher because jobs might not schedule as expected so you will have to agree when to schedule a maintenance window.

Schedule a Maintenance Window

To schedule a maintenance window for Test and Production go to Pagerduty Maintenance Page and use the Post Maintenance button.

Example Pull Requests

Upgrade the EKS Control Plane

  1. Update the eks_cluster_version to the new version in terraform/environments/analytical-platform-compute/environment-configuration.tf.
  2. Commit and push your results to the branch.

Upgrade the EKS Nodes

  1. Use the script here to identify the latest version appropriate for the cluster.
  2. Update eks_node_version in environment-configuration.tf with the value from above.
  3. Commit and push your results to the branch.

Upgrade the EKS add-ons

  1. Use the script here to identify the latest version appropriate for the cluster.
  2. Commit and push your results to the branch.

Source: Describe EKS Add-on versions

Patch Terraform modules

Patching is a manual process. This means you will have to check each module in each file as follows.

  1. Open each .tf file in the terraform/environments/analytical-platform-compute directory.
  2. Check each module i.e source = "terraform-aws-modules/eks/aws//modules/karpenter" in eks-custer.tf and cmd + click to follow the link.
  3. Also check any helm_release for example in helm-charts-system.tf for any new versions.
  4. Amend the version if appropriate.
  5. Commit and push your results to the branch.

Applying/Releasing the Changes

Once the Terraform plan is checked and as expected, the changes can then be applied by the workflow. This needs approving via the Review pending deployments of the apply job for the environment.

  • Development - The changes can be applied prior to the Pull Request approval.

  • Test - If the apply in the Development environment has completed as expected the changes can be applied and this can also be carried out prior to pull request approval. This should be carried out in the agreed maintenance window for Test.

  • Production - If the apply in the Test environment has completed as expected seek approval for the pull request and merge into main. The changes can then be applied by aproving the workflow apply process. This should be carried out in the agreed maintenance window for Production.

PagerDuty Maintenance Scheduling

Before performing any upgrades, a maintenance window must be scheduled on PagerDuty.

Please follow the notice periods below for different environments:

  • Development: No prior notice required. Work may proceed as needed.

  • Test: Requires a minimum of 1 working day’s notice before scheduling.

  • Production: Requires a minimum of 2 working days’ notice before scheduling.

This page was last reviewed on 17 February 2025. It needs to be reviewed again on 17 May 2025 by the page owner #analytical-platform-notifications .