The Missing Part of GitHub Actions Workflows: Monitoring

Clear visibility into a continuous integration (CI) pipeline is more important than ever to sustain health and performance in the development life cycle.
Development teams tend to work more confidently and comfortably when they know the success/failure rate, duration, and cost of their CI workflows, both at a high level and at a granular level.
When issues are solved in preproduction, fewer defects and regressions reach production. This improves success metrics such as mean time to respond (MTTR), change failure rate, deployment frequency and more.
Achieving this is not easy. Teams require visibility into continuous integration pipelines to know where the bottlenecks are and how to optimize them. And they need to know immediately. Without that visibility, when a CI workflow fails, software teams rely on guesswork and try to reproduce the error on their local machines to understand its root cause.
Why Does a CI Pipeline Fail?
The existing CI/CD providers do not focus on making your CI processes visible in appealing user interfaces. To understand the status of your builds, jobs and workflow runs, you must travel through many tabs or just rely on guesswork.
The GitHub UI is not designed to troubleshoot failures or examine the reasons for high latencies. It is your job to dig through CI workflows and jobs to understand which workflow fails the most and why, what the failure trend is, how long runs take and why costs are jumping.
In this article, we’ve gathered the visibility needs of some GitHub users for their workflows. Our inference is that it is tough to stay in control of your GitHub Actions workflows by means of monitoring. Let’s take a look at what people want to monitor.
Workflow Performance and Status
When you run workflows for your continuous integration pipeline, two of the most essential things to see are the performance and the success/failure status of the workflow runs.
You would want to understand how many times your workflows ran and how many of those runs succeeded or failed. You may also want the same information at a more granular level, for the jobs and steps of your workflow runs.
The tweets below illustrate the need for visibility in the GitHub Actions workflows. Understanding and even preventing CI failures, or optimizing workflow run durations, becomes possible with a bird’s-eye view of all your workflow runs.
Some timeline of job duration, success etc. for each GitHub actions workflow.
— Ashhar Hasan (@hashhar) May 5, 2022
GitHub actions statistics. Like duration, fail rate, etc.
— Nicolai Antiferov 🇺🇦 (@Nklya_) May 5, 2022
More granular GitHub actions metrics: # workflows runs, # failures/ success (general & by repo), mean time to prod, etc
— Nahuel batista (@nahuelbatista_) May 5, 2022
GitHub Actions: runtime, success / error status and a simple count of job invocations.
— Maz (@mazin_power) May 5, 2022
Having across-the-board visibility into the software delivery process is a common challenge for most software organizations. You need workflow analytics to identify issues and enhance organization-wide visibility into your CI workflows.
If you want to optimize your CI pipelines, then you should keep an eye on successes and failures, and also track how the workflows perform.
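As a rough illustration of the counts requested in the tweets above, the sketch below tallies successes and failures per workflow and derives a failure rate. It assumes each run is a dictionary with `name`, `status` and `conclusion` keys, shaped like the workflow run objects returned by the GitHub REST API; the function name `summarize_runs` and the sample data are hypothetical, and no network call is made.

```python
from collections import defaultdict


def summarize_runs(runs):
    """Count completed runs per workflow and compute a failure rate.

    Assumes each run dict has "name", "status" and "conclusion" keys.
    """
    stats = defaultdict(lambda: {"total": 0, "success": 0, "failure": 0})
    for run in runs:
        if run.get("status") != "completed":
            continue  # skip queued or in-progress runs
        entry = stats[run["name"]]
        entry["total"] += 1
        # Other conclusions (e.g. "cancelled") count toward the total only.
        if run.get("conclusion") == "success":
            entry["success"] += 1
        elif run.get("conclusion") == "failure":
            entry["failure"] += 1

    summary = {}
    for name, counts in stats.items():
        summary[name] = dict(counts)
        summary[name]["failure_rate"] = counts["failure"] / counts["total"]
    return summary


# Hypothetical sample data, for illustration only.
sample_runs = [
    {"name": "build", "status": "completed", "conclusion": "success"},
    {"name": "build", "status": "completed", "conclusion": "failure"},
    {"name": "build", "status": "in_progress", "conclusion": None},
    {"name": "deploy", "status": "completed", "conclusion": "success"},
]

if __name__ == "__main__":
    for workflow, counts in summarize_runs(sample_runs).items():
        print(workflow, counts)
```

The same grouping could be applied to jobs or steps instead of workflows by changing the key used for `stats`.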
Workflow Cost
Organizations try to keep their costs at the optimum level, and that’s only possible by watching the trends of the cost of resources. Since GitHub Actions does not provide granular insights into the cost of workflows, we have trouble detecting which workflows are burning more money and why.
The tweet below raises the cost-monitoring challenge for GitHub Actions workflows.
Better billing metrics without having to download CSV files and do my own pivot tables.
— Ken Collins (@metaskills) May 4, 2022
Any production defect is more costly than even the biggest preproduction failure. That’s why proactively monitoring the cost peaks and dips of CI workflows is as important as monitoring the failures, duration, etc.
A monitoring approach that shows the breakdown of the workflow costs will help speed things up and optimize costs for CI workflows on GitHub Actions.
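A cost breakdown of this kind can be sketched in a few lines. The example below is only an approximation under stated assumptions: the record fields (`workflow`, `runner_os`, `duration_seconds`) are hypothetical, the per-minute rates are placeholder values rather than current GitHub pricing, and each job is assumed to be billed in whole minutes, rounded up.

```python
import math
from collections import defaultdict

# Placeholder per-minute rates in USD. These are assumptions for
# illustration; check your own billing plan for real values.
ASSUMED_RATE_PER_MINUTE = {"linux": 0.008, "windows": 0.016, "macos": 0.08}


def estimate_cost_by_workflow(jobs, rates=ASSUMED_RATE_PER_MINUTE):
    """Sum an estimated cost per workflow from hypothetical job records."""
    costs = defaultdict(float)
    for job in jobs:
        # Assumption: each job is billed in whole minutes, rounded up.
        minutes = math.ceil(job["duration_seconds"] / 60)
        costs[job["workflow"]] += minutes * rates[job["runner_os"]]
    return dict(costs)


def rank_by_cost(costs):
    """Return (workflow, cost) pairs, most expensive first."""
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data, for illustration only.
sample_jobs = [
    {"workflow": "build", "runner_os": "linux", "duration_seconds": 125},
    {"workflow": "build", "runner_os": "macos", "duration_seconds": 60},
    {"workflow": "deploy", "runner_os": "windows", "duration_seconds": 30},
]

if __name__ == "__main__":
    for workflow, cost in rank_by_cost(estimate_cost_by_workflow(sample_jobs)):
        print(f"{workflow}: ${cost:.3f}")
```

Ranking the totals makes it easier to spot which workflows are burning the most money, which is the question a billing CSV alone does not answer directly.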
Workflow Duration
Software teams can significantly reduce the time spent sustaining iterative software development by proactively monitoring their CI pipelines. When using GitHub Actions for continuous integration automation, it is not possible to know the duration breakdown of each workflow run, the duration trends and the reasons why workflow runs are failing or taking longer than expected.
In the tweets below, we clearly see why having visibility across the duration of GitHub Actions workflows is needed.
I want to know how long my actions are waiting for runners.
— Olivia Montgomery (@longcommonname) May 4, 2022
Roughly:
+ Duration per workflow
+ Duration per step
Questions I want to ask:
+ Where are people waiting for feedback?
+ Where are the minutes per month being spent?
— Jason Wieringa (@jwieringa) May 5, 2022
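The questions in these tweets (time spent waiting for a runner, duration per job and per step) come down to subtracting timestamps. The sketch below assumes each job record carries `created_at`, `started_at` and `completed_at` fields plus a `steps` list with per-step `started_at` and `completed_at`, similar in shape to the job objects in the GitHub REST API. The timestamp format, the helper names and the sample record are assumptions for illustration.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a UTC timestamp such as '2022-05-05T10:00:00Z' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def job_timing(job):
    """Return queue wait, run time and per-step durations in seconds."""
    created = parse_ts(job["created_at"])
    started = parse_ts(job["started_at"])
    completed = parse_ts(job["completed_at"])

    step_seconds = {}
    for step in job.get("steps", []):
        elapsed = parse_ts(step["completed_at"]) - parse_ts(step["started_at"])
        step_seconds[step["name"]] = elapsed.total_seconds()

    return {
        # Time between the job being created and a runner picking it up.
        "queue_seconds": (started - created).total_seconds(),
        "run_seconds": (completed - started).total_seconds(),
        "step_seconds": step_seconds,
    }


# Hypothetical sample record, for illustration only.
sample_job = {
    "created_at": "2022-05-05T10:00:00Z",
    "started_at": "2022-05-05T10:00:45Z",
    "completed_at": "2022-05-05T10:05:45Z",
    "steps": [
        {
            "name": "checkout",
            "started_at": "2022-05-05T10:00:45Z",
            "completed_at": "2022-05-05T10:00:55Z",
        },
        {
            "name": "test",
            "started_at": "2022-05-05T10:00:55Z",
            "completed_at": "2022-05-05T10:05:45Z",
        },
    ],
}

if __name__ == "__main__":
    print(job_timing(sample_job))
```

Collecting these numbers across many runs would show where people are waiting for feedback and which steps consume most of the minutes.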
DevOps, site reliability and development teams hate wasting time on troubleshooting failures. But with clear visibility into their CI pipelines, they can keep their master branch green without spending as much time.
To understand and troubleshoot GitHub Actions workflow activity easily, developers should have a monitoring tool in place. A comprehensive view into CI activity makes it easier to resolve bottlenecks, reduce CI costs and deliver better software.
Debugging failed CI workflows today requires an iterative, non-agile process. Thundra’s Foresight is platform-agnostic and works on premises, in the cloud, on containers and on serverless code, making it possible to boost productivity and have successful production delivery.