Practice and strategy · Module 8
System ilities
Why this matters
Reliability, recoverability, auditability, and the ability to change safely decide whether a system survives a bad day.
What you will be able to do
1. Explain attacker economics and how defenders change the cost of attack
2. Run a simple failure analysis without blame
3. Turn lessons learned into a concrete system change
Before you begin
- You have read at least one incident or outage report
Common ways people get this wrong
- Optimising one quality only. A system can be secure and unusable, or fast and unsafe. Balance is the job.
- No owner for the system view. Without a system view, local optimisations create global risk.
System ilities are the properties that decide whether you survive a bad day. Reliability, recoverability, auditability, and the ability to change safely. They are easy to admire and hard to build.
Adversaries optimise for cost and probability of success, not elegance. Defence is the same. You rarely get perfect coverage, so you choose controls that change attacker economics and give you time to respond.
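Changing attacker economics can be sketched as a simple expected-value calculation. A hypothetical illustration in Python, where all numbers (success probability, payoff, cost per attempt) are invented for the sake of the example:

```python
# Hypothetical attacker-economics sketch: an attack is worth running when
# expected payoff exceeds expected cost. All numbers are illustrative.

def expected_value(p_success: float, payoff: float, cost: float) -> float:
    """Attacker's expected profit for one attempt."""
    return p_success * payoff - cost

# Baseline: credential stuffing against an unprotected login endpoint.
baseline = expected_value(p_success=0.02, payoff=5000.0, cost=10.0)

# After rate limiting and MFA: success is rarer and each attempt costs more.
defended = expected_value(p_success=0.0005, payoff=5000.0, cost=50.0)

print(f"baseline: {baseline:+.2f}")   # positive: attack is worth running
print(f"defended: {defended:+.2f}")   # negative: the economics no longer work
```

The point is not the precision of the numbers but the lever: a control does not need perfect coverage, it only needs to push the expected value below zero, or make attempts noisy enough that defenders get time to respond.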
Failure analysis is how you stop repeating the same incident with new branding. Look for the broken assumption, the missing guardrail, and the point where a human had to make a call without enough information.
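One way to keep the analysis blame-free and repeatable is to force every write-up into the same slots: the broken assumption, the missing guardrail, the human decision point, and one concrete change. A minimal sketch as a Python dataclass, with hypothetical field names and example values:

```python
from dataclasses import dataclass

@dataclass
class FailureAnalysis:
    """One-page, blame-free failure analysis. Field names are illustrative."""
    incident: str
    broken_assumption: str      # what the system believed that was false
    missing_guardrail: str      # the check that would have caught it
    human_decision_point: str   # where someone had to choose without enough information
    system_change: str          # the concrete fix
    owner: str
    target_date: str

note = FailureAnalysis(
    incident="Cache served stale prices to checkout",
    broken_assumption="Cache invalidation fires on every price update",
    missing_guardrail="No freshness check between cache and source of truth",
    human_decision_point="On-call had no view of cache age during the incident",
    system_change="Add a max-age guard that falls through to the database",
    owner="payments-platform",
    target_date="two weeks from review",
)
```

Structuring the note this way makes the gaps obvious: if you cannot fill a field, that is usually where the next incident is hiding.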
Mental model
Security is a quality attribute
Security competes with cost, speed, and usability. Good teams make the trade-offs explicit.
1. Quality attributes
2. Security
3. Performance
4. Usability
5. Cost
Assumptions to keep in mind
- Trade-offs are recorded. If you do not write down trade-offs, future changes break the system by accident.
- Constraints are real. Budget and latency are real constraints. Pretending otherwise produces fragile designs.
Check yourself
Quick check. System ilities
What do attackers typically optimise for?
Low cost and high probability of success, not perfect technique.
What does it mean to change attacker economics?
Make attacks more expensive, noisy, or slow, so they are less likely to succeed.
Scenario: a cache served stale prices and caused customer harm. What question do you start with in failure analysis?
Which assumption failed, such as cache invalidation or data freshness, and why the system was allowed to rely on it without detection.
What is one good post-incident outcome?
A concrete system change that prevents or detects the same class of failure next time.
Artefact and reflection
Artefact
A one-page failure analysis note with an owner and a target date
Reflection
Where in your work would an understanding of attacker economics, and of how defenders change the cost of attack, change a decision, and what evidence would make you trust that change?
Optional practice
Capture the failed assumption, evidence, and one concrete system change.