Trixter: A Chaos Proxy for Simulating Network Faults
Github: https://github.com/brk0v/trixter
Contents
- Chaos Engineering and Network Fault Injection
- Introducing Trixter – A Chaos Monkey for TCP
- Why Trixter vs GNU/Linux tc netem (Kernel Network Emulator)
- Using Trixter: Examples of Injecting Chaos
- Documentation, recipes, reference
Break your network before production does
Chaos Engineering and Network Fault Injection #
Chaos engineering is the practice of deliberately injecting controlled failures into systems to observe their behavior and improve their resilience. One common chaos experiment is network-level fault injection: artificially introducing network problems like latency, packet loss, or corruption to test how services cope. This is valuable because most modern systems are distributed (at minimum, a client and a server); introducing network slowness or errors can reveal hidden bugs, timeout issues, or insufficient retry logic in microservices. By simulating unreliable networks in a controlled way, SREs can ensure their applications are resilient against real-world network chaos (like spikes in latency or occasional outages).
Introducing Trixter – A Chaos Monkey for TCP #

Trixter is a high-performance chaos proxy designed for injecting network faults at the TCP layer. In essence, it’s a TCP proxy that sits between a client and a server, forwarding traffic but intentionally sabotaging it according to your specifications.
Trixter is runtime-tunable too, so you can adjust fault parameters on the fly per connection without restarting the proxy.
How it works: You run Trixter as a proxy on a specified listening port, pointing it at an upstream service. Under normal conditions, it simply relays traffic. But you can configure Trixter to inject various network failures in transit. For example, it can add artificial delay to each packet, throttle the bandwidth, drop a connection with some probability, corrupt a percentage of the traffic by injecting garbage data, or terminate connections after a timeout or via a REST API call to simulate outages.
Note
Think of Trixter as a minimalist, blazing-fast layer written in Rust using the async Tokio framework for efficiency.
Internally, it leverages the tokio-netem library (netem = network emulator), which provides pluggable async I/O adaptors to simulate latency, bandwidth limits, packet slicing, connection termination, and data corruption at the stream level.
This user-space approach means no kernel modules or root access is needed – the chaos is applied in the application layer (within the proxy) rather than the OS network stack.
What makes Trixter interesting is its combination of performance, portability, and simplicity. Being written in Rust, it’s memory-safe and built for speed. Configuration is straightforward: Trixter uses simple CLI arguments plus a REST JSON API for per-connection runtime changes, making it easy to emulate chaos scenarios.
The binary size is also quite small (3.3M), making it easy to bootstrap in every test suite run:
$ ls -lh ./target/release/trixter
-rwxrwxr-x 2 user user 3.3M Oct 10 21:39 ./target/release/trixter
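If you prefer building from source instead of pulling the container image, a standard Rust toolchain is all you need (a minimal sketch, assuming cargo is installed):
$ git clone https://github.com/brk0v/trixter
$ cd trixter
$ cargo build --release    # optimized binary lands in ./target/release/trixter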
If you want to skip the comparison with the Linux kernel tooling, jump straight to the examples below.
Why Trixter vs GNU/Linux tc netem (Kernel Network Emulator) #
If you’ve ever simulated network faults on GNU/Linux, you might know tc netem, the traffic control tool’s network emulator. netem is powerful – it can impose delay, packet loss, duplication, reordering, and bandwidth limits at the kernel level. However, using tc netem in practice has some drawbacks for developers and SREs:
- Usability: netem must be configured via the tc command syntax, which can be arcane. For example, to add 100ms latency and 5% packet loss on interface eth0, you’d run something like: tc qdisc add dev eth0 root netem delay 100ms loss 5%. It’s a manual process, and applying it to specific traffic (say, only one service or container) often requires setting up traffic control filters or isolated networks. Trixter, on the other hand, is as easy as running a proxy and setting CLI arguments for the faults. You point your service or client to the proxy address, and only that traffic gets the chaos. No special networking setup needed (see the sketch right after this list).
- Root Privileges: Configuring tc netem requires root (CAP_NET_ADMIN) privileges on the host. This is fine in a local VM but problematic in environments where you can’t easily get root (Kubernetes pods, developer laptops on macOS/Windows, CI pipelines, etc.). Trixter requires no root – it runs in userland. Anyone can use it in a dev environment or CI job, and it can even be packaged as a container sidecar to inject faults for a specific application.
- Portability: netem is Linux-only (built into the kernel). If you’re developing on macOS or running tests on Windows, tc netem isn’t natively available. Trixter’s user-space proxy approach works across platforms, since it’s just a Rust binary. This makes it accessible to a wider range of use cases and teams.
- Dynamic Control: While you can adjust netem parameters by running more tc commands, it’s not designed for frequent on-the-fly changes (and each change affects the whole interface’s traffic globally). There are workarounds, but they aren’t ergonomic and usually require some iptables configuration. Trixter is designed to be dynamically tunable: it provides an optional API endpoint (a REST JSON API) to tweak fault settings at runtime. In chaos experiments, being able to script fault injection (start with 0ms latency, then gradually increase to 500ms, etc.) is very useful – Trixter was built with this in mind.
- Scope and Limitations: netem operates at the packet level and can affect any protocol (TCP, UDP, ICMP, etc.) on the chosen interface. Trixter focuses on TCP streams (it’s a TCP proxy), which covers common cases like HTTP calls, gRPC, database connections, etc. The benefit is that Trixter targets exactly the connection you care about, rather than messing with the entire network stack of a host.
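For a concrete feel of the difference in ergonomics, the latency half of that tc netem one-liner maps to a single flag on the proxy (a sketch mirroring the invocations used later in this post; there is no direct packet-loss flag, the closest analogues being the termination and corruption probabilities):
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
    --listen 0.0.0.0:8080 \
    --upstream 127.0.0.1:3000 \
    --api 127.0.0.1:8888 \
    --delay-ms 100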
In short, Trixter trades some of the breadth of kernel-level simulation for ease-of-use and flexibility. It’s an ideal choice for testing how your service handles network blips without the complexity of system-wide tools.
Using Trixter: Examples of Injecting Chaos #
Using Trixter is straightforward. First, you’ll run the proxy pointing to a real service. Then you define what network conditions to impose. Below are a couple of examples.
For testing and experimenting, you have two main options:
- set global failures with CLI arguments;
- and/or control failures per connection with the REST JSON API.
Example 1: Adding latency and packet loss #
Suppose you have a service running on localhost:3000 that you want to test with high latency and some connection loss. You could run Trixter with a config like this:
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
--listen 0.0.0.0:8080 \
--upstream 127.0.0.1:3000 \
--api 127.0.0.1:8888 \
--delay-ms 1000 \
--terminate-probability-rate 0.001 \
--connection-duration-ms 5000
Here the new port to connect to is 8080, every read and write is delayed by 1 second, any I/O operation fails with 0.1% probability, and every connection is terminated with a TCP RST packet after 5 seconds.
This could be used, for example, to see how your app’s retry logic handles a few lost responses or how a web UI behaves on a slow connection (does a loading spinner show up, do requests time out gracefully, etc.).
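To see the injected latency from the client side, point any HTTP client at the proxy instead of the service; a quick check with curl’s timing output might look like this (assuming the upstream on port 3000 speaks HTTP):
$ curl -s -o /dev/null -w 'direct:  %{time_total}s\n' http://127.0.0.1:3000/
$ curl -s -o /dev/null -w 'proxied: %{time_total}s\n' http://127.0.0.1:8080/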
Example 2: Throttling bandwidth #
Trixter can also emulate bandwidth constraints. Let’s say you want to simulate a slow network (congested mobile network). You can configure a throttle in bytes per second. For instance:
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
--listen 0.0.0.0:8080 \
--upstream 127.0.0.1:3000 \
--api 127.0.0.1:8888 \
--throttle-rate-bytes 1048576
This will make Trixter buffer and drip-feed data to achieve roughly 1 MB/s, regardless of how fast the real upstream is. It’s a great way to test video streaming or large file downloads under limited network conditions: your service might start queuing or compressing data differently once this bottleneck is introduced, and you get to observe how it behaves under network backpressure.
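A quick way to confirm the throttle from the client side is to time a sizable download through the proxy (a sketch; /large-file is just a placeholder for whatever big response your upstream can serve):
$ curl -s -o /dev/null \
    -w 'bytes: %{size_download}  time: %{time_total}s  speed: %{speed_download} B/s\n' \
    http://127.0.0.1:8080/large-file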
Example 3: Running in CI/CD with chaos #
A powerful way to integrate Trixter into your pipeline is to run it automatically during integration or E2E tests with periodic connection drops and random byte injection into reads and writes.
This lets you uncover non-deterministic failure patterns – but still reproduce them later.
For example, create an integration test with something like:
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
--listen 0.0.0.0:8080 \
--upstream 127.0.0.1:3000 \
--api 127.0.0.1:8888 \
--terminate-probability-rate 0.001 \
--corrupt-probability-rate 0.001
Each run picks a random seed for the chaos parameters. If a test fails, you can open stdout.log and look for a line like:
2025-10-10T20:38:43.925064Z INFO trixter: random seed: 10382352052268666911
Then, reproduce the exact same chaos locally:
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
--listen 0.0.0.0:8080 \
--upstream 127.0.0.1:3000 \
--api 127.0.0.1:8888 \
--terminate-probability-rate 0.001 \
--corrupt-probability-rate 0.001 \
--random-seed 10382352052268666911
This pattern makes chaos deterministic, reproducible, and CI-friendly – like property-based testing for your network.
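In a CI script you could automate the round trip by grepping the seed out of the captured proxy log and feeding it straight into a reproduction run (a rough sketch; stdout.log is an assumption about where your pipeline stores the proxy output):
$ SEED=$(grep -o 'random seed: [0-9]*' stdout.log | awk '{print $3}')
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
    --listen 0.0.0.0:8080 \
    --upstream 127.0.0.1:3000 \
    --api 127.0.0.1:8888 \
    --terminate-probability-rate 0.001 \
    --corrupt-probability-rate 0.001 \
    --random-seed "$SEED"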
Example 4: Controlling failures per connection at runtime #
Spin up the Trixter proxy without default failures:
$ docker run --network host -it --rm ghcr.io/brk0v/trixter \
--listen 0.0.0.0:8080 \
--upstream 127.0.0.1:3000 \
--api 127.0.0.1:8888
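If you don’t have a client handy, any TCP client works for creating a connection to inspect; netcat is a quick stand-in (just an illustration, not part of Trixter):
$ nc 127.0.0.1 8080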
Make a connection with your client, application, or service (or the netcat session above), then discover it via the REST JSON API:
$ curl -s http://127.0.0.1:8888/connections | jq
[
{
"conn_info": {
"id": "1J8UO7eCuqUMqdoP5KmvN",
"downstream": "127.0.0.1:45528",
"upstream": "127.0.0.1:3000"
},
"delay": {
"secs": 0,
"nanos": 0
},
"throttle_rate": 0,
"slice_size": 0,
"terminate_probability_rate": 0.0,
"corrupt_probability_rate": 0.0
}
]
Store the connection ID in a variable for future use:
$ ID=$(curl -s http://127.0.0.1:8888/connections | jq -r '.[0].conn_info.id')
Now we can easily inject failures – for instance, add a 1 second latency:
$ curl -i -X PATCH \
http://127.0.0.1:8888/connections/$ID/delay \
-H 'Content-Type: application/json' \
-d '{"delay_ms":1000}'
And check that it applied:
$ curl -s http://127.0.0.1:8888/connections | jq
[
{
"conn_info": {
"id": "1J8UO7eCuqUMqdoP5KmvN",
"downstream": "127.0.0.1:45528",
"upstream": "127.0.0.1:3000"
},
"delay": {
"secs": 1, # <-------------------------- Changed
"nanos": 0
},
"throttle_rate": 0,
"slice_size": 0,
"terminate_probability_rate": 0.0,
"corrupt_probability_rate": 0.0
}
]
This way, you can build a flexible test setup or gradually introduce latency, throttling, and other network conditions.
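For instance, a small loop could ramp latency up step by step on the same connection, reusing the PATCH endpoint shown above (a sketch; the step sizes and sleep interval are arbitrary):
$ for DELAY in 0 100 250 500; do
    curl -s -X PATCH http://127.0.0.1:8888/connections/$ID/delay \
      -H 'Content-Type: application/json' \
      -d "{\"delay_ms\":$DELAY}"
    sleep 10    # let the application run under each condition for a while
  done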
Documentation, recipes, reference #
For more examples, recipes, and the API reference for controlling per-connection settings at runtime, please go to https://github.com/brk0v/trixter.
Summary #
Trixter is a lightweight, blazing-fast chaos proxy for SREs and developers.
It bridges the gap between kernel tools like tc netem and high-level testing frameworks, letting you inject network chaos safely and precisely.
Run it locally, use it in CI with random seeds, and reproduce failures with one command – all without root privileges.
If you care about resilience and performance under adverse conditions – or just want to break things the smart way – give Trixter a try.