Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET. It is releasing April 2019 and works with Polly v7.0.0 onwards.
Simmy allows you to introduce a chaos-injection policy or policies at any location where you execute code through Polly.
Chaos engineering raises many questions about whether a system is actually ready to face the worst possible scenarios.
Using Polly helps me introduce resilience to my project, but I don't want to have to wait for expected or unexpected failures to test it out. My resilience could be wrongly implemented; testing the scenarios is not straightforward; and mocking the failure of some dependencies (for example a cloud SaaS or PaaS service) is not always easy.
What do I need, to simulate chaotic scenarios in my production environment?
Simmy offers the following chaos-injection policies:
Policy | What does the policy do? |
---|---|
Exception | Injects exceptions in your system. |
Result | Substitutes results to fake faults in your system. |
Latency | Injects latency into executions before the calls are made. |
Behavior | Allows you to inject any extra behaviour, before a call is placed. |
var chaosPolicy = MonkeyPolicy.InjectException(Action<InjectOutcomeOptions<Exception>>);
For example:
// The following example causes the policy to throw a SocketException with a probability of 5%, if enabled
var fault = new SocketException(errorCode: 10013);
var chaosPolicy = MonkeyPolicy.InjectException(with =>
    with.Fault(fault)
        .InjectionRate(0.05)
        .Enabled()
    );
var chaosPolicy = MonkeyPolicy.InjectResult(Action<InjectOutcomeOptions<TResult>>);
For example:
// The following example causes the policy to return a bad-request HttpResponseMessage with a probability of 5%, if enabled
var result = new HttpResponseMessage(HttpStatusCode.BadRequest);
var chaosPolicy = MonkeyPolicy.InjectResult<HttpResponseMessage>(with =>
    with.Result(result)
        .InjectionRate(0.05)
        .Enabled()
    );
var chaosPolicy = MonkeyPolicy.InjectLatency(Action<InjectLatencyOptions>);
For example:
// The following example causes the policy to add 5 seconds of latency to a randomly selected 10% of calls
var isEnabled = true;
var chaosPolicy = MonkeyPolicy.InjectLatency(with =>
    with.Latency(TimeSpan.FromSeconds(5))
        .InjectionRate(0.1)
        .Enabled(isEnabled)
    );
var chaosPolicy = MonkeyPolicy.InjectBehaviour(Action<InjectBehaviourOptions>);
For example:
// The following example causes the policy to execute a method that restarts a virtual machine; the method runs with a probability of 1%, if enabled
var chaosPolicy = MonkeyPolicy.InjectBehaviour(with =>
    with.Behaviour(() => restartRedisVM())
        .InjectionRate(0.01)
        .EnabledWhen((ctx, ct) => isEnabled(ctx, ct))
    );
All parameters are expressed using a fluent-builder syntax.
Determines whether the policy is enabled or not.
PolicyOptions.Enabled();
PolicyOptions.Enabled(bool);
PolicyOptions.EnabledWhen(Func<Context, CancellationToken, bool>);
A double between 0 and 1 inclusive. The policy injects the fault at random in that proportion of calls. For example, 0.2 affects twenty percent of calls, 0.01 affects one percent, and 1 affects all calls.
PolicyOptions.InjectionRate(Double);
PolicyOptions.InjectionRate(Func<Context, CancellationToken, Double>);
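To make the meaning of the injection rate concrete, here is a minimal sketch of a rate-based decision applied over many simulated calls. It does not use Simmy and is not its implementation: `ShouldInject` and `CountInjected` are hypothetical helpers introduced only for illustration.

```csharp
using System;

// Hypothetical helper, not part of Simmy: decides whether a single call
// should have chaos injected, given an injection rate in [0, 1].
static bool ShouldInject(double injectionRate, Random random)
{
    if (injectionRate < 0 || injectionRate > 1)
        throw new ArgumentOutOfRangeException(nameof(injectionRate));

    // NextDouble() returns a value in [0, 1), so a rate of 0 never injects
    // and a rate of 1 always injects.
    return random.NextDouble() < injectionRate;
}

// Counts how many of `calls` simulated executions would be affected.
static int CountInjected(double injectionRate, int calls, Random random)
{
    var injected = 0;
    for (var i = 0; i < calls; i++)
    {
        if (ShouldInject(injectionRate, random)) injected++;
    }
    return injected;
}

var rng = new Random(12345); // fixed seed so repeated runs behave the same
Console.WriteLine($"rate 0.2: {CountInjected(0.2, 10_000, rng)} of 10000 calls affected");
Console.WriteLine($"rate 1.0: {CountInjected(1.0, 10_000, rng)} of 10000 calls affected");
```

With a rate of 0.2 the count should land near 2,000 of 10,000 calls, though the exact figure depends on the random sequence. A rate of 1 affects every call.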
The fault to inject. The Fault API has overloads to build the policy in a generic way: PolicyOptions.Fault<TResult>(...)
PolicyOptions.Fault(Exception);
PolicyOptions.Fault(Func<Context, CancellationToken, Exception>);
The result to inject.
PolicyOptions.Result<TResult>(TResult);
PolicyOptions.Result<TResult>(Func<Context, CancellationToken, TResult>);
The latency to inject.
PolicyOptions.Latency(TimeSpan);
PolicyOptions.Latency(Func<Context, CancellationToken, TimeSpan>);
The behaviour to inject.
PolicyOptions.Behaviour(Action);
PolicyOptions.Behaviour(Action<Context, CancellationToken>);
All parameters are available in a Func<Context, ...> form. This allows you to control the chaos injected dynamically, for example by driving it from external configuration, or in a targeted manner, by tagging executions with a Context.OperationKey and introducing chaos targeting particular tagged operations.
The example app demonstrates both these approaches in practice.
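The idea behind the Func<Context, ...> form can be pictured with a small sketch that needs no library. It assumes a plain string stands in for Context.OperationKey and a dictionary stands in for external configuration. `ratesByOperation` and `injectionRateFor` are hypothetical names, not Simmy or Polly APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for external configuration:
// an injection rate per tagged operation.
var ratesByOperation = new Dictionary<string, double>
{
    ["GetFoo"] = 0.5,
};

// Mirrors the shape of a Func<Context, ...> parameter: the value is computed
// on every execution from the caller's context (reduced here to an operation key).
Func<string, double> injectionRateFor = operationKey =>
    ratesByOperation.TryGetValue(operationKey, out var rate) ? rate : 0.0;

Console.WriteLine(injectionRateFor("GetFoo")); // targeted operation: configured rate
Console.WriteLine(injectionRateFor("GetBar")); // untagged operation: no chaos

// Because the delegate is evaluated per call, a configuration change
// takes effect without rebuilding anything.
ratesByOperation["GetBar"] = 1.0;
Console.WriteLine(injectionRateFor("GetBar")); // now always injects
```

In real use, a delegate of this kind would be passed to the Func-based overloads listed above, such as InjectionRate or EnabledWhen, and would read the operation key from the Context it receives.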
// Executes through the chaos policy directly
chaosPolicy.Execute(() => someMethod());
// Executes through the chaos policy using Context
chaosPolicy.Execute((ctx) => someMethod(), context);
// Wrap the chaos policy inside other Polly resilience policies, using PolicyWrap
var policyWrap = Policy
    .Wrap(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
policyWrap.Execute(() => someMethod());
// All policies are also available in async forms.
var chaosLatencyPolicy = MonkeyPolicy.InjectLatencyAsync(with =>
    with.Latency(TimeSpan.FromSeconds(5))
        .InjectionRate(0.1)
        .Enabled()
    );
var policyWrap = Policy
    .WrapAsync(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
var result = await policyWrap.ExecuteAsync(token => service.GetFoo(parametersBar, token), myCancellationToken);
// For general information on Polly policy syntax see: https://github.com/App-vNext/Polly
It is usual to place the Simmy policy innermost in a PolicyWrap. By placing the chaos policies innermost, they subvert the usual outbound call at the last minute, substituting their fault or adding extra latency. The existing Polly policies - further out in the PolicyWrap - still apply, so you can test how the Polly resilience you have configured handles the chaos/faults injected by Simmy.
Note: The above examples demonstrate how to execute through a Simmy policy directly, and how to include a Simmy policy in an individual PolicyWrap. If your policies are configured by .NET Core DI at StartUp, for example via HttpClientFactory, there are also patterns which can configure Simmy into your app as a whole, at StartUp. See the Simmy Sample App discussed below.
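The reason for placing the chaos policy innermost can be shown with a minimal model built from plain delegates. This is a sketch of the layering idea only, not Polly's PolicyWrap. `WithChaos` and `WithFallback` are hypothetical helpers, and TimeoutException is an arbitrary choice of injected fault.

```csharp
using System;

// Innermost "chaos" layer: throws instead of making the real call when injecting.
// Hypothetical helper; models the idea with plain delegates, not Polly's API.
static Func<string> WithChaos(Func<string> call, Func<bool> shouldInject) =>
    () => shouldInject()
        ? throw new TimeoutException("injected fault")
        : call();

// Outer "resilience" layer: substitutes a fallback value when the inner call throws.
static Func<string> WithFallback(Func<string> call, string fallbackValue) =>
    () =>
    {
        try { return call(); }
        catch (TimeoutException) { return fallbackValue; }
    };

Func<string> realCall = () => "real response";

// Chaos innermost, fallback outermost: the fallback gets to handle the injected fault.
var chaosOn = WithFallback(WithChaos(realCall, () => true), "fallback response");
var chaosOff = WithFallback(WithChaos(realCall, () => false), "fallback response");

Console.WriteLine(chaosOn());  // prints "fallback response"
Console.WriteLine(chaosOff()); // prints "real response"
```

If the layers were reversed, the fault would be thrown outside the fallback and would reach the caller unhandled, so the resilience code would never be exercised.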
This Simmy sample app shows different approaches/patterns for how you can configure Simmy to introduce chaos policies in a project. Patterns demonstrated include configuring StartUp so that Simmy chaos policies are only introduced in builds for certain environments (for instance, Dev but not Prod).
The patterns shown in the sample app are intended as starting points but are not mandatory. Simmy is very flexible, and we would love to hear how you use it!
All chaos policies (Monkey policies) are designed to inject behavior randomly (faults, latency or custom behavior), so each Monkey policy lets you specify an injection rate between 0 and 1 (0-100%): the higher the injection rate, the higher the probability of injection. You can also specify whether random injection is enabled, so you can release or hold (turn on or off) the monkeys regardless of the injection rate. For example, if you specify an injection rate of 100% but tell the policy that random injection is disabled, it does nothing.
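The interplay between the enabled switch and the injection rate can be written as a one-line gate. The sketch below is an assumption about the logic as described in the text, not Simmy's actual code. `ShouldInject` is a hypothetical function.

```csharp
using System;

// Hypothetical model of the gate described above, not Simmy's actual code:
// chaos is injected only when the policy is enabled AND the random sample
// falls below the injection rate.
static bool ShouldInject(bool enabled, double injectionRate, double sample) =>
    enabled && sample < injectionRate;

var random = new Random();
var sample = random.NextDouble(); // a value in [0, 1)

// Disabled wins over a 100% rate.
Console.WriteLine(ShouldInject(enabled: false, injectionRate: 1.0, sample: sample)); // prints "False"
// Enabled at 100% always injects, because the sample is always below 1.
Console.WriteLine(ShouldInject(enabled: true, injectionRate: 1.0, sample: sample));  // prints "True"
// Enabled at 0% never injects, because the sample is never below 0.
Console.WriteLine(ShouldInject(enabled: true, injectionRate: 0.0, sample: sample));  // prints "False"
```

Keeping the enabled flag separate from the rate means the monkeys can be held back instantly without losing the configured rate.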
See Issues for the latest discussions on taking Simmy forward!
Simmy was the brainchild of @mebjas and @reisenberger. The major part of the implementation was by @vany0114 and @mebjas, with contributions also from @reisenberger of the Polly team.
Dylan Reisenberger presents an intentionally simple example .NET Core WebAPI app demonstrating how we can set up Simmy chaos policies for certain environments, without changing any existing configuration code, and inject faults or chaos by modifying external configuration.
Geovanny Alzate Sandoval made a microservices-based sample application to demonstrate how chaos engineering works with Simmy using chaos policies in a distributed system, and how we can inject custom behavior to suit our needs or infrastructure; in this case, custom behavior that generates chaos in a Service Fabric cluster.
Bjørn Einar Bjartnes made a red-green load-testing resilience workshop to understand how errors and resiliency mechanisms affect a system under load. It has been used to run workshops at, for example, NDC Oslo, and there is a video from the workshop at DotNext.