Dimension scores are derived from public data and listing fields, then weighted into the composite score. For reference only.
Chaotic Monkey is an Infrastructure Resilience Platform positioned around continuously running chaos experiments on production infrastructure. It emphasizes that “Resilience is a practice, not a prayer,” meaning it aims to turn resilience work from occasional manual Game Days into an automated, scheduled, and controlled engineering practice. The page claims 2M+ experiments, a 99.99% safe rollback rate, and adoption by 340 teams, but does not provide further supporting sources.
Its core features include automated Game Days, which can be scheduled weekly or monthly, with the platform selecting experiments, executing them, and generating reports. Smart Blast Radius uses dependency graphs for AI-assisted target selection, starting with a small scope and expanding the impact area as confidence increases. SLO-Aware Scheduling pauses experiments when the SLO burn rate exceeds a threshold, helping avoid additional risk when the error budget is already tight. The platform also provides a Resilience Score to track changes in system resilience with a single metric, and Team Insights to show experiment frequency, discovered failure modes, and MTTR improvements by team. The onboarding flow involves installing an agent or using a Kubernetes operator, after which it automatically discovers service dependencies, runs targeted fault injection, and drives remediation.
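The page does not explain how SLO-Aware Scheduling or Smart Blast Radius work internally. As a rough illustration of the general idea only, the sketch below uses the common SRE definition of burn rate (observed error rate divided by the error rate the SLO allows) to decide whether to pause, and grows the experiment scope step by step while things stay healthy. All function names, the 2.0 threshold, the doubling step, and the 25% cap are hypothetical choices for this example, not Chaotic Monkey's API or defaults.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly on pace;
    higher values mean it is being spent faster.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 0.0  # no traffic observed, so no budget consumed
    allowed_error_rate = 1.0 - slo_target
    return (errors / total) / allowed_error_rate


def should_pause(errors: int, total: int,
                 slo_target: float = 0.999,
                 threshold: float = 2.0) -> bool:
    """Pause experiments when the burn rate exceeds the threshold (hypothetical default)."""
    return burn_rate(errors, total, slo_target) > threshold


def next_blast_radius(current_pct: float, paused: bool,
                      step: float = 2.0,
                      floor_pct: float = 1.0,
                      cap_pct: float = 25.0) -> float:
    """Grow the targeted share of instances while healthy; reset to the floor on a pause."""
    if paused:
        return floor_pct
    return min(current_pct * step, cap_pct)


if __name__ == "__main__":
    # 5 failed requests out of 1000 against a 99.9% SLO burns budget about 5x too fast.
    paused = should_pause(errors=5, total=1000)
    print("pause experiments:", paused)
    print("next blast radius (%):", next_blast_radius(8.0, paused))
```

A real scheduler would typically evaluate burn rate over more than one time window and require human approval before widening scope; this sketch leaves both out for brevity.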
The captured text does not disclose the pricing model, plans, trial options, payment methods, or enterprise support terms. On integrations, it only explicitly mentions an agent, Kubernetes operator, dependency graphs, and SLO/error budget-related capabilities. It does not specify which monitoring systems, cloud platforms, CI/CD tools, alerting tools, or identity and access management systems are supported. There is also no public information about APIs/SDKs, auditing, security compliance, or self-hosting options.
The main advantage is that the product concept aligns well with mature SRE practices: automated experiments, limited blast radius, SLO-aware pauses, automatic rollback, and team-level metrics, all of which can reduce the burden of manual drills. The downside is that the public information is fairly marketing-oriented and lacks technical documentation depth. Chaos experiments in production are inherently risky, so adoption should be cautious if permissions, rollback, observability, and approval mechanisms are not clearly defined. It is better suited to mid-sized and large teams that already have Kubernetes, microservices, SLOs, and DevOps/SRE processes in place, rather than small teams whose reliability practices are still immature.
Access from mainland China, payment availability, and local compliance status are not reflected in the text, so they are assessed as unknown. If access or procurement is restricted, open-source alternatives such as Chaos Mesh and LitmusChaos, or cloud-provider fault injection services, may be worth evaluating.
⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the official chaotic-monkey.com site.
chaotic-monkey.com is an Unknown Dev Tools provider. TG4G tracks its product information, gives it an overall rating of 7.0/10, and rates its China accessibility as Workable.