Saif Gunja at software intelligence firm Dynatrace describes how automation helps site reliability engineers to combat the relentless tide of vulnerabilities
As the global appetite for digital services continues to grow, building reliable applications is more important than ever. Every second of downtime can lead to lost revenue, declining share prices, and lasting reputational damage.
As a result, site reliability engineering (SRE) is indispensable to application development, helping organisations to maintain service availability and performance.
SRE scope broadens to include security
As SRE matures, organisations are realising it goes beyond maintaining application performance and availability. Security is increasingly a factor in reliability, as vulnerability disclosures force businesses to take apps offline while they investigate and tackle their exposure.
Therefore, SRE professionals are often tasked with helping security teams find, prioritise, and resolve vulnerabilities as quickly as possible to minimise the financial and reputational impact of applications and services being offline.
However, as the number of security flaws rises, this work becomes a significant drain on their time. In 2021, the National Vulnerability Database logged a record 21,957 new vulnerabilities.
To combat this relentless tide, SRE is evolving into a more strategic role, where engineers bridge the gap between security, development, and operations teams. SRE equips these teams with the solutions, data, and capabilities they need to deliver services that are reliable and secure by default.
However, driving this DevSecOps collaboration places a heavy burden on SRE professionals, who have a growing range of tasks to perform. This highlights the need for a new approach to vulnerability management that enables teams to get apps back online faster, with less manual effort.
Vulnerability whack-a-mole
While third-party code libraries can accelerate application development, they also carry significant security risks, as we saw with Log4Shell and the more recent Spring4Shell. According to the State of SRE Report: 2022 Edition, 68% of site reliability engineers expect their role in security to become even more central in the future, particularly as the use of open-source code libraries increases.
This increased usage results in an intense game of whack-a-mole for SRE and security teams, as dynamic cloud and Kubernetes environments make it more difficult to quickly locate, prioritise, and patch vulnerable open-source code. By the time teams manually track down all instances of a vulnerability and identify which pose the biggest risk, myriad new instances have popped up elsewhere. This increases an app’s downtime, as finding, prioritising, and resolving the security flaw becomes more time-consuming.
As a result, engineers are persistently tied up in vulnerability management, which distracts them from higher-value tasks that are more specific to their role. Hiring more SRE professionals isn’t an option due to their scarcity. So organisations must find another way to reduce their engineers’ burden.
Automating the future
The State of SRE Report found that detecting and eliminating vulnerabilities quickly was SREs’ third-most time-consuming task. This highlights the need for improved efficiency and automated vulnerability management processes to lessen the burden on SRE teams.
Therefore, converging security with real-time, continuous observability is critical, as it shows teams what code is running in production and where vulnerabilities are within their environment. This means teams can quickly access the context needed to understand their attack surface and evaluate the risk whenever a new vulnerability is disclosed — for example, by identifying whether sensitive data is exposed.
Real-time observability can also power artificial intelligence that provides precise, data-driven answers to support self-healing automation. As a result, SRE teams can deliver applications that are reliable and secure by default, and move from a reactive to a proactive posture on vulnerability management.
If SRE professionals, developers, and security teams can all access these precise answers from a single source of truth, they can act much faster, driving reliability and ensuring apps and services remain available. There is a clear desire for this collaborative approach, with 85% of SREs saying they want to standardise on the same observability platform from development to operations and security by 2025.
A more reliable future
While reliability is a core pillar of the modern digital business, security is a critical part of reliability. To ensure apps and services remain available and uphold customer expectations, organisations need to drive DevSecOps adoption to improve collaboration and relieve some of the manual burden their SRE teams face.
By shifting from a reactive stance on vulnerability management to a proactive one, SREs can focus on the tasks that are core to their role — maximising performance, building resiliency, and delivering better business outcomes.
Saif Gunja is Director of Product Marketing at software intelligence firm Dynatrace
© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543