Driving Quality and Efficiency with the DevOps Checklist

12 May 2024

Software Development

The WorkingMouse DevOps Report

This article showcases the WorkingMouse DevOps Report. Within the ideals of agile, DevOps and good software development, a project has a set of features that maximise the chance of success for both business outcomes and software functionality.

These features are detailed in this checklist. By investing the time and resources to enable them in all projects, we empower projects to be successful within the development and support phases of the software development lifecycle.

How to use

This checklist is used as a best-practice guide to what a project needs. Ideally, all items are addressed at every stage of a project; however, early on this may not be possible.

The goal is to have it as complete as possible at all times. The traffic light system detailed below will outline how to track and measure the completeness of this checklist.

Traffic light system

Each item in the checklist may have varying degrees of completeness based upon the current stage of the project.

For example, at the start of a project, while environments are still being set up, the system may not be complete enough to ensure full compliance, but that does not necessarily mean a project fails a particular criterion.

  • 🔴 Not addressed: Red indicates failure to meet a particular criterion.

  • 🟠 Partially complete: Orange indicates that a criterion has partial completion, and as such, still needs more work.

  • 🟢 The ideal state: Green indicates full compliance with the criterion.

Checklist criteria

The following sections detail the requirements for each item on the checklist: what is required and what will result in a particular rating (🔴, 🟠, or 🟢) for that item.

1. Pipeline state

A pipeline is a hugely valuable tool for measuring the quality and readiness of a project. In order to maximise the value a pipeline offers, a few key items need to be addressed.

Some of these include:

  • A failing pipeline indicates a legitimate problem that needs to be addressed - No false negatives
  • A pipeline runs on the latest code available
  • Intermediate and deployment artifacts are created by the pipeline
  • The pipeline is passing, and a passing pipeline indicates a healthy application
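
As a loose illustration of the last two points, a final pipeline step can verify that the expected deployment artifact was actually produced before the job is allowed to pass. This is only a sketch; the artifact path and script name are hypothetical, not a prescribed layout.

  # verify_artifact.py - hypothetical final pipeline step (illustrative sketch only)
  import pathlib
  import sys

  # The artifact path is an assumption for illustration; substitute the project's real build output.
  ARTIFACT = pathlib.Path("dist/app-release.zip")

  def main() -> int:
      if not ARTIFACT.exists() or ARTIFACT.stat().st_size == 0:
          print(f"Deployment artifact missing or empty: {ARTIFACT}", file=sys.stderr)
          return 1  # a non-zero exit fails the pipeline, so a red run signals a real problem
      print(f"Artifact ready: {ARTIFACT} ({ARTIFACT.stat().st_size} bytes)")
      return 0

  if __name__ == "__main__":
      raise SystemExit(main())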

Criterion

  • 🔴 Pipeline is incomplete or cannot be used to measure application health/quality. Some automated tooling is run external to the pipeline, i.e. some tests need to be run locally.
  • 🟠 Pipeline is complete but may have some known, documented defects (i.e. a test that always fails), and can still be used to measure health and quality. All automated tooling is running on the pipeline, i.e. no tests need to be run manually.
  • 🟢 Pipeline is complete and passing with green ticks across the board.

2. Pipeline tools

The pipeline that has been set up for the project has all the tools appropriate for that project.

The key types of tools required for a complete pipeline include:

  • Code quality
  • Testing
  • Security
  • Performance

Please note: This criterion requires that the tools be set up and configured, not that the application itself passes any requirements they set, i.e. the pipeline can still be failing if the application does not meet the requirements set by the tools.
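
As a rough sketch of what "all tools on the pipeline" can look like, the script below runs one command per tool category. It assumes a Python codebase and commonly used open-source tools (flake8 for code quality, pytest for testing, bandit for security, and a timed pytest run for performance); the actual tool choices and paths will differ per project.

  # run_pipeline_tools.py - illustrative only; tool choices and paths are assumptions
  import subprocess
  import sys

  TOOLS = {
      "code quality": ["flake8", "."],
      "testing": ["pytest", "tests"],
      "security": ["bandit", "-r", "src"],
      "performance": ["pytest", "tests/perf", "--durations=10"],
  }

  failed = []
  for category, command in TOOLS.items():
      print(f"Running {category}: {' '.join(command)}")
      if subprocess.run(command).returncode != 0:
          failed.append(category)

  if failed:
      print(f"Tool categories reporting failures: {', '.join(failed)}", file=sys.stderr)
      sys.exit(1)  # the pipeline may still fail here; this criterion only asks that the tools be wired in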

Criterion

  • 🔴 Fewer than 3 of the tools have been added to the pipeline. For example, a pipeline with only testing and security tools would result in a rating of RED.
  • 🟠 All tools are set up on the pipeline but are not passing, or have some small defects in their configuration.
  • 🟢 All the tools are on the pipeline and are configured and working as intended.

3. Testing coverage

Is there adequate testing coverage, and are all main user flows addressed?

HINT: the traceability report is a powerful tool for identifying coverage and, if used properly, can greatly ease the push from RED to GREEN.
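
One way to keep coverage from being "unknown" is to gate it in the pipeline. The sketch below assumes a pytest suite and the coverage package, with an 80% threshold chosen to mirror the orange rating below; the package name "app" is a placeholder.

  # coverage_gate.py - minimal sketch, assuming pytest and the "coverage" package
  import sys

  import coverage
  import pytest

  cov = coverage.Coverage(source=["app"])  # "app" is a placeholder package name
  cov.start()
  exit_code = pytest.main(["tests"])
  cov.stop()
  cov.save()

  percent = cov.report()  # prints a summary table and returns total coverage as a float
  if exit_code != 0:
      sys.exit(int(exit_code))
  sys.exit(0 if percent >= 80 else 1)  # fail the job when coverage drops below 80%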

Criterion

  • 🔴 Testing coverage is unknown or known to be insufficient.
  • 🟠 >80% of main user flows are addressed, with test scripts for the areas that are not covered.
  • 🟢 100% of main user flows are addressed and automated.

4. Production environment

The existence of the environment in which the application will run in production is a crucial step for project readiness. Without a production environment, there is no way to predict the problems that may occur once this environment is set up and deployed to. To manage this risk early, it is important to have this environment ready as soon as possible.

NOTE: A production environment does not have to be available to the public.

ASSUMPTION: The existence of a production environment implies the existence of a beta environment; this criterion will fail if beta does not exist.
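
A lightweight way to confirm the application is actually deployed and reachable in production is a scripted smoke check that can be run after each deployment. The URL and health endpoint below are hypothetical placeholders, not part of the checklist.

  # smoke_check.py - hedged sketch; replace the URL with the project's real production host
  import sys

  import requests

  PRODUCTION_URL = "https://example-project.internal/api/health"  # hypothetical endpoint

  try:
      response = requests.get(PRODUCTION_URL, timeout=10)
  except requests.RequestException as error:
      print(f"Production endpoint unreachable: {error}", file=sys.stderr)
      sys.exit(1)

  if response.status_code != 200:
      print(f"Unexpected status {response.status_code} from production", file=sys.stderr)
      sys.exit(1)

  print("Production smoke check passed")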

Criterion

  • 🔴 Production environment is not set up; the application has not been deployed to the production environment; production is not a replica of beta; or there is no beta environment / the application has not been deployed to beta.
  • 🟠 The customer has not been actively using the production environment with realistic workloads, or some parts of the application remain untested (i.e. a project with multiple clients where only some have been used).
  • 🟢 Production environment is set up; it is a replica of beta; the app is deployed and in use by the customer (passing customer UAT).

5. Infrastructure monitoring and logging

The observability of a system is critical to managing the state of an application. Without monitoring, defects and failures can go unnoticed, or can be difficult to diagnose and resolve.

Managing the risk of a running environment requires that the environment can be observed.
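
To illustrate the difference between purely textual and structured logging, the sketch below emits JSON log records that a structured log store can index. It uses only the Python standard library and is a minimal example, not a recommended logging stack.

  # json_logging.py - minimal structured-logging sketch using only the standard library
  import json
  import logging

  class JsonFormatter(logging.Formatter):
      def format(self, record: logging.LogRecord) -> str:
          payload = {
              "timestamp": self.formatTime(record),
              "level": record.levelname,
              "logger": record.name,
              "message": record.getMessage(),
          }
          return json.dumps(payload)

  handler = logging.StreamHandler()
  handler.setFormatter(JsonFormatter())
  logging.basicConfig(level=logging.INFO, handlers=[handler])

  logging.getLogger("app").info("user signed in")  # emits one JSON object per log line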

Criterion

  • 🔴 Logging is either off or only textual; no alarms have been configured; resource monitoring has not been set up.
  • 🟠 Alarms/monitoring have been set up but not verified.
  • 🟢 Alarms/monitoring have been tested and verified; application and environment logging has been configured to output to a structured log store.

6. Risk register

The risk register is up to date and the client is aware of the outstanding risks for the project.

Criterion

  • 🔴 There is no risk register, or it does not represent reality; no mitigations are in place.
  • 🟠 The risk register is mostly up to date; the client may not be fully aware, but mitigations are in place for high-priority risks.
  • 🟢 The risk register is up to date, mitigations are in place for the required risks, and the client is aware of its content.

7. Contributing / quick start guide

Different stakeholders will contribute to a project over its life. New team members will be onboarded, and different teams may step in to resolve issues.

A key factor of the quick start is that a project needs to be able to be set up quickly. Tools exist to make this easier, such as Docker, VS Code development environments, etc.

An example of what this might look like is:

  ## Environment Setup
  1. 
  ## Start the Application
  1. 
  ## Test
  ## Debug
  ## Build
  ## Release
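
Because the 5-minute target below is easy to drift away from, it can help to time the setup steps themselves. The sketch below assumes a containerised project with Docker Compose; the commands are placeholders for whatever the quick start guide actually lists.

  # setup_timer.py - sketch only; the setup commands are placeholders for the project's real steps
  import subprocess
  import sys
  import time

  SETUP_STEPS = [
      ["docker", "compose", "up", "-d", "--build"],          # assumed containerised setup
      ["docker", "compose", "exec", "app", "pytest", "-q"],  # assumed local smoke tests
  ]

  start = time.monotonic()
  for step in SETUP_STEPS:
      subprocess.run(step, check=True)
  elapsed = time.monotonic() - start

  print(f"Setup took {elapsed / 60:.1f} minutes")
  sys.exit(0 if elapsed <= 300 else 1)  # 5-minute target from the criterion below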

Criterion

  • 🔴 No contributing/quick start documentation exists, or the existing documentation is incomplete or defective.
  • 🟠 Contributing/quick start documentation exists but is separate from the source code; it addresses only the details needed to get the application running; the project takes longer than 5 minutes to set up and get running on a new computer; testing can be done locally but is slow/unreliable.
  • 🟢 Contributing/quick start documentation exists with the source code; it addresses how to get the application running and how to use its key features (i.e. default accounts); the project can be set up within 5 minutes on a new computer; tests can easily be run locally or in a pipeline with clear instructions for a newcomer.

8. Bot version

Codebots provides many benefits; however, a project running on an old version of a bot carries risks (performance, security, etc.) that newer versions of the bot have resolved. As such, maintenance and development costs will typically increase, slowing down progress and costing the client more.

Resources

  • Bot Upgrades
  • Bot Upgrade Support process
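
To make the rating below mechanical, the gap between the project's bot version and the latest release can be computed from the major version numbers. The version strings in the example are hypothetical.

  # bot_version_rating.py - illustrative sketch; version values are placeholder assumptions
  def major(version: str) -> int:
      return int(version.split(".")[0])

  def rating(project_version: str, latest_version: str) -> str:
      gap = major(latest_version) - major(project_version)
      if gap <= 0:
          return "GREEN"   # on the latest major version
      if gap <= 2:
          return "ORANGE"  # no more than 2 major versions behind
      return "RED"         # more than 2 major versions behind

  print(rating("3.1.0", "6.0.2"))  # hypothetical versions -> RED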

Criterion

  • 🔴 Project is >2 major versions behind the latest version of their bot.
  • 🟠 Project is no more than 2 major versions behind the latest version of their bot.
  • 🟢 Project is on the latest major version of their bot.

9. Project analytics

Analytics are set up and active within the application for tracking the usage of the app.
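
As a minimal sketch of in-app usage analytics, the example below posts a usage event to a collector endpoint. The endpoint, event shape, and function name are hypothetical; most projects would use an established analytics SDK instead.

  # track_usage.py - illustrative only; the collector endpoint and event shape are assumptions
  import requests

  ANALYTICS_ENDPOINT = "https://analytics.example.internal/events"  # hypothetical collector

  def track(event_name: str, user_id: str) -> None:
      # Fire-and-forget usage event; a real implementation would batch and retry.
      try:
          requests.post(
              ANALYTICS_ENDPOINT,
              json={"event": event_name, "user": user_id},
              timeout=2,
          )
      except requests.RequestException:
          pass  # analytics failures should never break the user flow

  track("invoice_created", "user-123")  # hypothetical event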

Criterion

  • 🔴 There are no analytics.
  • 🟠 Some analytics are set up, but they have not been tested or are not being reported on.
  • 🟢 Analytics are set up, tested, and being reported on.

10. Documentation

Documentation is present and accurate for main user flows. This includes:

  • Smoke testing scripts,
  • Developer documentation,
  • Videos,
  • Release process,
  • Key decisions,
  • Etc

Criterion

  • 🔴 There is little to no documentation, or the documentation is completely out of date; documentation cannot be easily followed/found.
  • 🟠 There is documentation for the main flows, and it is mostly up to date/mostly complete; documentation can easily be followed; all special instructions/details unique to the application have been accurately documented.
  • 🟢 All main user flows are covered; the documentation is accurate, up to date, and can easily be followed; all special instructions/details unique to the application have been accurately documented.

11. Project state

A project has many elements: code base, version control system, release checklists, risk register. To ensure that a project is going well, these all need to be kept up to date.
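
Parts of this can be checked automatically. For example, the sketch below lists remote branches whose last commit is older than a chosen staleness threshold; the 30-day cut-off is an assumption for illustration, not a policy defined by this checklist.

  # stale_branches.py - hedged sketch using plain git; the threshold is an assumed definition of "stale"
  import subprocess
  from datetime import datetime, timedelta, timezone

  THRESHOLD = timedelta(days=30)

  output = subprocess.run(
      ["git", "for-each-ref",
       "--format=%(refname:short) %(committerdate:iso8601-strict)",
       "refs/remotes/origin"],
      capture_output=True, text=True, check=True,
  ).stdout

  now = datetime.now(timezone.utc)
  for line in output.splitlines():
      branch, _, committed = line.partition(" ")
      if now - datetime.fromisoformat(committed) > THRESHOLD:
          print(f"Stale branch: {branch} (last commit {committed})")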

Criterion

  • 🔴 Version tagging is inconsistent; there are outstanding stale merge requests or branches; master/main does not represent the latest production code (or as otherwise dictated by the versioning and branching policy); if defects exist, they are not documented with steps to reproduce, or not documented at all; much work remains to be released and UAT'd by the client; there are unaddressed failed UATs.
  • 🟠 There are some stale branches but no stale merge requests; if defects exist, they are documented in tickets but without detailed steps to reproduce or priorities; no work older than one milestone remains unreleased; there are some outstanding UATs, and all failed UATs have been addressed.
  • 🟢 There are no outstanding stale merge requests or branches; the repository is up to date as per the versioning and branching policy; if defects exist, they are documented in tickets with priorities and detailed steps to reproduce; all released work has been UAT'd and passed by the client; no work older than one iteration remains unreleased.

Other comments

While not part of the checklist, the following are highly valued features of a good project:

  • Speed of pipeline --> Fast feedback is important as it allows the team to move quickly. All tests should ideally run within 2 minutes max, with a preference for less time.
  • Delivery team can release their own work to both beta and production

Summary

In conclusion, the DevOps Report encapsulates the ethos of WorkingMouse—a relentless pursuit of quality, a dedication to best practices, and an unwavering commitment to delivering exceptional software solutions. It is a narrative that not only informs but inspires, urging stakeholders to embrace the DevOps culture and reap the myriad benefits it offers.

ABOUT THE AUTHOR

David Burkett

Growth enthusiast and resident pom
