Building a Security Program That Works
2026.04.05
Security programs often fail in a predictable pattern. They accumulate tools, policies, and compliance obligations until the overhead of managing them crowds out the actual work. Teams spend their time chasing audit findings and maintaining dashboards while the systems that would genuinely end the business if compromised sit underprotected, underinventoried, and inadequately monitored.
Security Brutalism is a response to that. Every control, tool, process, or policy in a security program has to justify its existence against three questions. Does it reduce susceptibility, meaning how easily an attacker can reach the systems that matter? Does it limit damage, meaning the blast radius when compromise happens? Does it reduce recovery time, meaning how long the organization stays down after an incident? If something cannot answer yes to at least one of those three questions with evidence, it is adding complexity without improving security posture. And complexity is not free. Every tool, integration, and access grant that cannot justify its presence is attack surface. The starting point is not a new tool. It is knowing what you already have.
You cannot evaluate susceptibility for a system you do not know exists. You cannot scope blast radius for a system that appears in no inventory. You cannot recover quickly from a compromise when you first learn about the affected system during the incident. The fastest path to a working inventory is pulling from sources that already track what exists. Your identity provider is the highest-signal starting point: query it for every application with an SSO or SAML integration and you get every system employees authenticate to centrally. Follow that with your cloud provider's resource inventory across every account and region, covering running compute, managed databases, storage buckets, serverless functions, and load balancers. Those two sources cover the majority of what matters in most environments. DNS reveals the rest of the external surface: pull your full zone, enumerate subdomains, and check certificate transparency logs for any certificate issued against your domains, including ones you did not authorize. Finance records are underused for this: accounts payable has a more accurate list of SaaS tools than IT does in most organizations, because payment approval creates a paper trail that adoption does not. Pull SaaS subscriptions from expense records and compare against what IT knows about. The gap is your shadow IT surface.
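The finance-versus-IT comparison can be sketched in a few lines. Everything here, the vendor names and the crude normalization heuristic, is a hypothetical illustration of the diff, not a reference to any real tooling:

```python
# Sketch: diff SaaS vendors seen in accounts-payable records against what IT
# tracks. The leftover set is the shadow IT surface. Names are placeholders.

def normalize(name: str) -> str:
    """Crude vendor-name cleanup so 'Figma Inc.' and 'Figma' compare equal."""
    return name.lower().replace(",", "").replace(".", "").removesuffix(" inc").strip()

def shadow_it_gap(ap_vendors, it_inventory):
    """Vendors paid for in expense records but absent from IT's inventory."""
    known = {normalize(v) for v in it_inventory}
    return sorted({normalize(v) for v in ap_vendors} - known)

ap_vendors = ["Figma Inc.", "Notion", "RandomAnalyticsTool", "GitHub"]
it_inventory = ["GitHub", "Notion", "Figma"]
print(shadow_it_gap(ap_vendors, it_inventory))  # the shadow IT surface
```

Real vendor-name matching is messier than a suffix strip, but the shape of the exercise, two lists and a set difference, does not change.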
Then ask five people: the engineering lead, the IT lead, the product lead, the operations lead, and the engineer who has been at the organization the longest. One prompt each, thirty minutes each: list the systems your team owns or depends on that, if compromised, would cause the most damage. This surfaces institutional knowledge about legacy systems and dependencies that appear in none of the automated sources.
The identity inventory is its own problem and usually the larger one. Service accounts, API keys, OAuth tokens, CI/CD credentials, and machine identities outnumber human users in most environments by a factor of three to ten. They accumulate through integrations, automation, and team turnover and are almost never revoked. Query your IdP for non-interactive and service accounts. Pull IAM roles and service principals from every cloud account. Check your CI/CD system for stored credentials. Run secret scanning against your code repositories. For each non-human identity, record what it has access to, when it was last used, and whether it has a documented current owner. Anything with no recent activity and no documented owner gets revoked before anything else happens. Do not wait for a consequence map to justify this. An unused credential with no owner is unambiguous attack surface regardless of what it touches.
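The revocation filter is mechanical once the records exist. A minimal sketch, assuming hypothetical field names and a 90-day inactivity window (the threshold is a judgment call, not something the text prescribes):

```python
# Sketch: flag non-human identities for immediate revocation when they have
# no recent activity AND no documented owner. Field names and the 90-day
# window are assumptions, not prescriptions from any particular IdP.
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # assumed inactivity threshold

def revocation_candidates(identities, now):
    flagged = []
    for ident in identities:
        stale = ident["last_used"] is None or now - ident["last_used"] > STALE_AFTER
        unowned = not ident.get("owner")  # missing or empty owner field
        if stale and unowned:
            flagged.append(ident["name"])
    return flagged

now = datetime(2026, 4, 5, tzinfo=timezone.utc)
identities = [
    {"name": "svc-legacy-etl", "last_used": None, "owner": None},
    {"name": "svc-deploy", "last_used": datetime(2026, 4, 1, tzinfo=timezone.utc), "owner": "platform"},
    {"name": "svc-old-report", "last_used": datetime(2025, 1, 10, tzinfo=timezone.utc), "owner": ""},
]
print(revocation_candidates(identities, now))  # ['svc-legacy-etl', 'svc-old-report']
```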
With a working system list and identity inventory, you have enough to begin the work that drives every prioritization decision in the program: mapping what it actually costs when each system fails. The consequence map is a ranked list of systems ordered by what failure actually costs the business. Top entries are systems whose compromise produces outcomes the organization does not recover from. Bottom entries are painful but contained. No probability estimates. No maturity scores. Honest answers to one question per system: if this fails or is compromised, what does the business lose, and is that loss recoverable?
Three classifications. Existential means the realistic worst case produces an unrecoverable outcome: regulatory action that shuts down operations, permanent loss of data the business cannot reconstruct, financial loss that exceeds survival capacity, or compromise that permanently destroys customer trust. High-recoverable means costly but survivable: revenue loss measured in days or weeks, absorbable fines, attrition that can be addressed over time. Low-impact means contained, readily restored, minimal business effect. When uncertain between existential and high-recoverable, classify as existential. The cost of over-protecting something less critical is lower than the cost of under-protecting something that turns out to be more critical.
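The ranked list itself is simple once the classifications exist. A sketch with hypothetical system names; the tie-break of classifying up when uncertain is encoded directly:

```python
# Sketch: the consequence map as a ranked list, worst-first. The three labels
# come from the text; the rank encoding and system names are illustrative.
RANK = {"existential": 0, "high-recoverable": 1, "low-impact": 2}

def classify(worst_case_unrecoverable: bool, costly_but_survivable: bool,
             uncertain: bool = False) -> str:
    # When uncertain between the top two classes, classify up:
    # over-protection is the cheaper error.
    if worst_case_unrecoverable or uncertain:
        return "existential"
    return "high-recoverable" if costly_but_survivable else "low-impact"

def consequence_map(systems):
    """systems: list of (name, classification) -> names ordered worst-first."""
    return [name for name, cls in sorted(systems, key=lambda s: RANK[s[1]])]

systems = [
    ("internal-wiki", "low-impact"),
    ("customer-db", classify(True, True)),
    ("billing", classify(False, True, uncertain=True)),
    ("status-page", "high-recoverable"),
]
print(consequence_map(systems))
```

Note what is absent: no probability field anywhere. The sort key is consequence only, exactly as the map demands.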
To build the map, run sessions. Thirty to forty-five minutes per system. Security facilitates; the technical owner and business owner do the talking. For each system, work through the same questions in the same order: what does it do in one sentence? What does it connect to and what depends on it? What data does it hold? What happens if it is unavailable for an hour, a day, a week? What happens if the data is corrupted rather than just the system being down? What happens if the data is exfiltrated silently with no disruption to the service? And finally: is there any realistic scenario where compromise of this system produces an outcome the organization does not recover from? Ask that last question last. People give more honest answers to the existential question after they have described specific consequences than when it is asked cold at the start. Owners consistently underestimate the consequence of their own systems failing. Counter it by grounding the question in what has already been said in the session: "You told me an attacker with admin credentials could reach your billing system and your customer database from here. What does that look like on day three?" Do not add likelihood scores. The map uses consequence only. Likelihood estimates are too easy to adjust downward and too hard to defend.
The inventory and the consequence map are not sequential steps. A rough inventory gets you into the first mapping sessions. Those sessions surface gaps in the inventory. The consequence map tells you which gaps to close first. The top of the existential list tells you where to go deep: full trust relationship mapping, complete data flow documentation, tested recovery procedures. You do not do that work for every system. You do it depth-first, starting where the consequences are worst.
Once you have the consequence map, you have a prioritization order. Everything that follows runs against that list, highest-consequence first.
Start by revoking everything that cannot justify its presence. Go through every standing permission, long-lived credential, and service account grant to systems at the top of your consequence map. Every access grant needs a documented current business need. Anything that cannot be justified gets revoked. Permissions accumulate silently through normal operations: integrations added without being mapped, accounts created for a project that ended, credentials shared across teams that have since changed. You cannot know what exposure you are carrying until you enumerate what access actually exists, compare it against documented business need, and close the gap. This single step does more to reduce susceptibility than most tool purchases. The cost of removing something legitimate by mistake is recoverable. The cost of leaving unreviewed access on your most critical systems is not bounded in the same way.
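The review reduces to a set difference: access that exists minus access with a documented need. The grant records here are hypothetical:

```python
# Sketch: compare access that actually exists against documented business
# need and emit the revocation list. Grant tuples are hypothetical.

def revoke_list(actual_grants, documented_need):
    """Both args: sets of (identity, system) pairs.
    Anything granted but not documented gets revoked."""
    return sorted(actual_grants - documented_need)

actual = {("svc-etl", "customer-db"), ("alice", "billing"), ("svc-old", "billing")}
documented = {("svc-etl", "customer-db"), ("alice", "billing")}
print(revoke_list(actual, documented))  # [('svc-old', 'billing')]
```

The hard part is not the diff; it is producing an honest `documented` set in the first place. The code only makes the gap visible.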
Structural hardening follows. No standing access to consequential systems: access is granted for specific tasks, scoped to minimum necessary permissions, with a defined expiration. Separation of duties for any irreversible action, meaning deleting production data, modifying access controls, deploying to production, or moving significant funds, requires two humans or a mandatory review step. The security reason is not compliance: requiring two parties for a destructive action slows attacker progression and creates a detection window that would not otherwise exist. Every consequential system should be segmented such that full compromise of a neighboring system does not automatically yield access to it. The architecture question to ask about every high-consequence system: if everything with a trust relationship to this system is fully owned by an attacker, what can they do? If the answer is "everything the system can do", the segmentation needs to change.
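The two-person rule can be expressed as a gate: an irreversible action needs at least one approver who is not the requester. The action names follow the text; everything else is an illustrative assumption:

```python
# Sketch: separation of duties as a gate on irreversible actions.
# Self-approval of a destructive action is never sufficient.
IRREVERSIBLE = {
    "delete-production-data",
    "modify-access-controls",
    "deploy-production",
    "move-funds",
}

def authorized(action: str, requester: str, approvers: set) -> bool:
    if action not in IRREVERSIBLE:
        return True
    # At least one approver distinct from the requester: two humans total.
    return bool(approvers - {requester})

print(authorized("delete-production-data", "alice", {"alice"}))  # False: self-approval
print(authorized("delete-production-data", "alice", {"bob"}))    # True: two humans
```

The gate is also the detection window the text describes: the second human is a chance to notice an attacker driving the first account.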
Hardening also means removal. Every security and infrastructure tool in the environment needs to answer the same three questions. If a tool cannot demonstrate that it reduces susceptibility, limits damage, or speeds recovery for the systems that matter, it is attack surface. Security tooling has itself been a primary attack vector in supply chain compromises. The burden of proof for keeping something in the environment is that it demonstrably moves one of those three numbers for what actually matters.
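The keep-or-remove decision is the same three-question gate every control faces. Tool names and answers below are hypothetical:

```python
# Sketch: the three-question gate applied to the tool inventory. In practice
# each boolean must be backed by evidence, not self-assessment.

def keep(tool: dict) -> bool:
    """A tool stays only if it demonstrably moves one of the three numbers."""
    return any([
        tool["reduces_susceptibility"],
        tool["limits_damage"],
        tool["speeds_recovery"],
    ])

tools = [
    {"name": "edr-agent", "reduces_susceptibility": False,
     "limits_damage": True, "speeds_recovery": False},
    {"name": "legacy-dashboard", "reduces_susceptibility": False,
     "limits_damage": False, "speeds_recovery": False},
]
print([t["name"] for t in tools if not keep(t)])  # candidates for removal
```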
Detection is built around the attack paths that lead to the systems at the top of the map. The standard is not whether a SIEM is deployed or whether EDR coverage numbers look good. The standard is whether you know when your consequential systems are being attacked before the attacker reaches the objective. Build behavioral baselines on the systems that matter: which identities access them, at what times, performing what operations. First-time access from any identity, access at unusual hours, access from unusual locations, unusual volume of data access, any changes to access controls or audit logging configuration on high-consequence systems — these are the anomalies worth alerting on immediately.
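A behavioral baseline check can be this small at its core. The baseline shape and the usual-hours window are assumptions for illustration:

```python
# Sketch: flag first-time and off-hours access against a per-system baseline.
# Baseline structure and the 07:00-19:00 window are illustrative assumptions.

def anomalies(event, baseline):
    """event: {"identity", "system", "hour"};
    baseline: {system: {"identities": set, "hours": range}}."""
    b = baseline.get(event["system"], {"identities": set(), "hours": range(0)})
    found = []
    if event["identity"] not in b["identities"]:
        found.append("first-time-access")
    if event["hour"] not in b["hours"]:
        found.append("unusual-hour")
    return found

baseline = {"customer-db": {"identities": {"svc-etl", "alice"}, "hours": range(7, 19)}}
print(anomalies({"identity": "mallory", "system": "customer-db", "hour": 3}, baseline))
```

A production baseline would also track locations, volumes, and operation types, but the logic stays the same: compare the event to what is normal for that system, not to a global signature set.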
This is where deception technology pays its highest return. Honeytokens and canary credentials placed where legitimate users would never look produce near-zero false positives and require almost no maintenance after deployment. A canary credential in an old backup directory fires only when someone is actively exploring. A fake API key in a configuration file that is no longer referenced by any active system fires only when someone has access they should not. A honeytoken in a build artifact fires only when someone with source code access is looking for credentials to use. When any of these activate, it is a priority investigation regardless of other workload. Deploy them at multiple layers of the environment: source repositories, internal documentation no longer actively referenced, decommissioned service account configurations, historical build artifacts. The signal they produce is among the most reliable in security operations, precisely because the only way to reach them is to be doing something an attacker does.
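The generate-and-match loop behind a canary credential is short. The key format and planted location below are hypothetical; a real deployment would use a purpose-built canary service or equivalent:

```python
# Sketch: mint a canary credential and recognize it at the auth layer.
# The token is shaped like a key but is valid nowhere; any use is a signal.
import secrets

def mint_canary(registry: dict, location: str) -> str:
    token = "AKIA" + secrets.token_hex(8).upper()  # hypothetical key shape
    registry[token] = location
    return token

def check_auth_attempt(token: str, registry: dict):
    """Returns an alert string if a canary was used, else None."""
    if token in registry:
        return f"CANARY FIRED: credential planted at {registry[token]} was used"
    return None

registry = {}
canary = mint_canary(registry, "backups/2021/env.bak")
print(check_auth_attempt(canary, registry))
print(check_auth_attempt("not-a-canary", registry))  # None
```

The near-zero false positive rate falls directly out of the design: there is no legitimate code path that presents the token, so every match is someone exploring.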
Detection of lateral movement requires instrumenting the paths between systems, not only the endpoints. If an attacker pivots from a compromised workstation to an internal service to a database, you want to detect the movement at each stage, not only at arrival. Alert volume discipline is part of detection engineering, not an operational afterthought. An alert that no one reads is noise with a logging cost. If the team has normalized skipping or ignoring alerts because volume is too high, prune alert rules until every alert gets investigated. Signature-based detection requires knowing what an attack looks like in advance. Behavioral anomaly detection surfaces deviations from normal regardless of the specific technique used. The investment should prioritize behavioral baselines and deviation alerting over expanding signature coverage.
Recovery is the most commonly fictional part of a security program. Backup systems exist but are untested. Procedures are documented but have never run under pressure. The standard is not whether a recovery procedure exists. The standard is: for each consequential system, how long does it actually take to detect compromise, contain it, and restore to a known-good state? Not from a runbook estimate. From evidence.
Test this now. Pick your top-consequence system. Assume it is compromised right now. Time how long it takes to detect, contain, and restore. What almost always surfaces is not a technical gap: it is unclear ownership of the response, slow escalation because nobody agreed on who makes the call, and communication failures that compound every other problem under real time pressure. These are what determine actual recovery time, not the quality of the runbook.
Kill switches are a concrete architectural requirement. For every consequential system, you need to be able to revoke all access within minutes. The implementation requires knowing every identity with access to the system and having tested revocation procedures for each one. The test is not reading the documentation. It is revoking access and measuring how long the process takes end to end. If revoking a compromised service account requires coordinating multiple teams and manual steps taking hours, that duration is the real blast radius window.
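The kill-switch test reduces to: enumerate, revoke, time it. The revoke callback and access index below are stand-ins for whatever your IdP and cloud APIs actually expose:

```python
# Sketch: a kill switch that revokes every identity with access to one system
# and reports how long the end-to-end revocation took. The access index and
# revoke() callback are hypothetical stand-ins for real API calls.
import time

def kill_switch(system: str, access_index: dict, revoke) -> float:
    """access_index: {system: [identity, ...]}. Returns elapsed seconds."""
    start = time.monotonic()
    for identity in access_index.get(system, []):
        revoke(identity)  # each identity type needs a pre-tested procedure
    return time.monotonic() - start

revoked = []
elapsed = kill_switch(
    "customer-db",
    {"customer-db": ["svc-etl", "alice", "ci-deployer"]},
    revoked.append,
)
print(revoked, f"{elapsed:.3f}s")
```

The number that matters is the measured `elapsed` against real APIs under real conditions, not this in-memory stub: if the honest figure is hours, that is your blast radius window.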
Restoration tests run on a quarterly schedule for consequential systems. Full restoration to a test environment, timed and documented, with every gap that surfaces noted. The gaps that appear in a scheduled quarterly test can be fixed before they matter. The gaps that appear during an actual incident under time pressure produce different outcomes. Incident response exercises should use the actual attack paths identified during inventory work, not generic scenarios. Run them with the actual people who would respond. The goal is finding coordination failures, escalation delays, and ownership confusion before an attacker does.
None of this has a finish line. Security degrades the moment a system goes live. Permissions accumulate. Integrations get added without being mapped. Credentials go unreviewed. Alert volumes grow until teams start ignoring them. The consequence map goes stale. This is not a failure of vigilance. It is the natural behavior of any live environment under continuous change pressure. The operational response is a predictable cadence: quarterly entitlement reviews against current business need, quarterly restoration tests on at least one consequential system, and an annual red team exercise scoped to the actual attack paths against the systems at the top of the map. Continuously, every proposed new tool, integration, or access grant gets evaluated against the three questions before it is approved. Anything that cannot justify its contribution does not get added.
A practical monthly check: pick one consequential system and walk through its current access list, its current network paths, and its current detection coverage. Ask what has changed since you last looked. Something always has. The discipline is catching drift while it is still small.
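The drift check is a diff of two snapshots of the same system's access list. Snapshot contents are hypothetical:

```python
# Sketch: monthly drift as a diff of two access-list snapshots for one system.

def drift(previous: set, current: set) -> dict:
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}

march = {"svc-etl", "alice", "bob"}
april = {"svc-etl", "alice", "new-integration"}
print(drift(march, april))  # {'added': ['new-integration'], 'removed': ['bob']}
```

Everything in `added` needs a documented business need; everything in `removed` should correspond to a deliberate revocation. Anything unexplained in either direction is the drift the check exists to catch.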
For programs that have the fundamentals in place (a working consequence map, a clean identity inventory, hardened crown jewels with tested recovery capability, and a functioning deception layer), the ceiling for the program's effectiveness rises significantly when you add an active disruption capability. The brutalist program limits blast radius and makes recovery deterministic. An offensive operational capability raises the cost of reaching your hardened systems before damage occurs. They answer different questions across the attack timeline and have almost no functional overlap, which is why they combine cleanly.
This is the role that Security Unconventional Warfare fills when deployed against a working brutalist foundation. Small specialist cells operating continuously with an offensive mindset: deception technology, active threat hunting, adversary war-gaming. Active disruption operates during reconnaissance and initial access. The brutalist program takes over at containment and recovery. The deception infrastructure a SUW cell operates is the advanced detection layer: behavioral hunting against the specific attack paths that lead to your consequential systems, adversary intelligence about who is probing and how. The continuous war-gaming it generates maps directly to the chaos engineering that brutalism requires and that most programs never consistently operationalize. Every assumption about detection and response times gets tested against actual adversary behavior rather than hypothetical scenarios.
One gap worth naming: active disruption raises the cost of attacking through the front. It does nothing about the back door that is already open. Most consequential breaches do not defeat perimeter controls or bypass deception infrastructure. They use credentials that were already stolen, supply chain access that was already trusted, or SaaS integrations with standing permissions that were never revoked. An attacker who buys access from an initial access broker does not need to touch deception infrastructure. They are already inside the trust boundary. This is why the brutalist foundation comes first. Deception technology deployed in a noisy, undocumented environment loses most of its signal value. In an environment full of shadow IT noise and stale service accounts, a canary credential that fires from routine churn is indistinguishable from one that fires because an attacker found it. The clean baseline that comes from the inventory work, revocation, and hardening is what makes the deception layer sharp.
The sequence matters. Build the consequence map. Harden the crown jewels. Establish tested recovery capability. Revoke everything that cannot justify its presence. Then deploy active disruption and deception against the specific attack paths that map to the top of your consequence list. Connect the threat intelligence those operations generate back into your restoration tests and red team exercises. Keep the programs in their respective lanes: active disruption owns deception and hunting; the brutalist program owns inventory, hardening, and recovery. When they stay separated, each is more effective than it would be alone.
If capacity is limited, start with five systems: authentication, customer data storage, payment processing if applicable, the production deployment pipeline, and source code. Get the inventory accurate for those five. Map the consequences. Understand the attack paths, the blast radius, and the current recovery capability for each one. Test recovery on the highest-consequence system. Deploy honeytokens on all five. Revoke every access grant touching any of them that cannot be justified today. An honest program built around five systems beats a fictional program built around fifty. The fictional one produces the appearance of coverage while leaving the things that matter exposed. The honest one tells you exactly where you are vulnerable and how bad it gets when those vulnerabilities are used.
Security programs that survive are the ones where someone can answer, for any consequential system: how does an attacker reach it, what can they do when they get there, and how long does it take to detect and recover? Those three questions, answered with evidence and tested under realistic conditions, are what the program is for.