Testground Versions

πŸ§ͺ A platform for testing, benchmarking, and simulating distributed and p2p systems at scale.

v0.6.0

1 year ago

Introduction

As an outcome of the hard work done in 2022 revitalising Testground, we can now run large-scale tests in k8s on a cluster that auto-scales up and scales back down when idle. Read more about these achievements in this blog post: https://blog.ipfs.tech/testground-highlights-in-2022/

To use cluster:k8s with EKS, please install the latest release of testground/infra.

Full Changelog: https://github.com/testground/testground/compare/v0.5.3...v0.6.0

v0.5.3

3 years ago

πŸ“‘ Traffic Routing Policies

We now support traffic routing policies, i.e., you can decide whether or not your tests can have access to the external Internet. This feature is available on the local:docker and cluster:k8s runners and it's defined via a new option on the network.Config struct of our Go SDK:

cfg := &network.Config{
    Network:       "default",
    RoutingPolicy: network.AllowAll, // or network.DenyAll
}

By default, DenyAll is applied.

πŸ‘·β€β™€οΈ Builders

docker:go received a few improvements since the last release:

  • You can now configure the base image to build from (#1081).
  • We added an option to skip the runtime image (#1078).
  • The builder can now be terminated via testground terminate --builder docker:go (#1079).
  • It is now possible to specify cross-repository replacements for dependencies (#1099).
  • We now support Go dependency caching in order to speed up builds (#1115).

For all builders, you can now ship additional files with the test plan source code (#1085), using an extra_sources map, consisting of builder => []paths.
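As a sketch, the builder => []paths mapping might look like this in a plan's manifest.toml (the table name and paths shown here are illustrative, not verified against the sample plans):

```toml
# manifest.toml — ship extra files alongside the plan source (illustrative)
[extra_sources]
"docker:go" = ["./compose-files", "./test-vectors"]
```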

πŸ” Basic Authentication

The daemon now supports basic authentication through the tokens array in the configuration file. The client should, in turn, provide a single token on its configuration to connect to the daemon (#1113).
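A minimal sketch of what this could look like; the key names below (tokens under the daemon table, token under the client table) are assumptions based on this description, not verified against the sample configs:

```toml
# daemon side: list of accepted tokens (key names are illustrative)
[daemon]
tokens = ["s3cret-token"]

# client side (.env.toml): single token presented to the daemon
[client]
endpoint = "http://localhost:8042"
token = "s3cret-token"
```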

πŸ’§ Trickle default configs to all groups

Compositions can now specify [build] and [run] subtables in the [global] table. Build and run settings in groups will inherit the default configuration (1aaab).
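For example, a composition could hoist shared defaults into [global.build] and [global.run], which groups then inherit unless they override them (plan, case, and parameter names below are placeholders):

```toml
[global]
plan            = "example"
case            = "smoke"
builder         = "docker:go"
runner          = "local:docker"
total_instances = 3

# defaults trickled down to every group
[global.build]
selectors = ["v2"]

[global.run.test_params]
timeout = "60s"

[[groups]]
id        = "baseline"
instances = { count = 3 }
```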

🐞 Fixes and small improvements

  • Use gzip on all runners when collecting outputs.
  • Fixed misleading default values on some commands, such as testground run single help (#1072).
  • Fixed healthcheck deadlock on cluster:k8s (#1074).
  • Increased build timeout (#1075).
  • Users can now define custom exposed ports (#1093).
  • Port numbers are now injected as env variables (#1095).
  • Detect Redis port collisions (#1108).
  • Copy sources for each unique build to avoid sharing directories (#1117).

πŸ“¦ Upgrading?

A few breaking changes were made since the last release. When upgrading please take this into consideration:

  • You should update your .env.toml client.endpoint to include a scheme - for example http (#1118). See sample .env.toml for details.
  • By default, DenyAll is applied for the Traffic Routing Policies. In previous versions, AllowAll was applied to cluster:k8s and DenyAll to local:docker (#1082, #1084, #1107, #1119).

v0.5.2

3 years ago

This is a smaller release, but there are a few bug fixes which are important.

Bugfixes

  • Build on macOS works again.
  • Improved error reporting about configuration errors.

Improved testing

  • Sidecar runs with pprof enabled by default.

Laying the groundwork for TAAS

  • Testground users will not notice yet, but Testground-as-a-service is coming. This release includes some groundwork that will come in handy for asynchronous task-based execution, coming soon in a future release.

Upgrading from v0.5.1

Note that a few configuration parameters have been changed. Consult .env.toml and update your environment configuration if necessary.

v0.5.1

3 years ago

Testground v0.5.1 is a maintenance release, but also includes a few new features.

🌐 Network partitions

Testground now includes functionality for introducing network partitions. Users can now include network rules when configuring networks with the sidecar, specifying if they want to allow, drop or reject certain packets.

πŸ§ͺ Integration tests

We improved the test coverage of Testground and now run a lot more tests on every PR and commit to the master branch, such as integration tests for all our runners.

πŸ‘· Builders

docker:go builder extensions - we've included a few simple hooks to allow users to modify the default Dockerfile build steps, enabling functionality such as:

  • adding an extra binary to the final image.
  • adding static libraries that need to be present at runtime.
  • adding debugging tools.
  • copying input assets (e.g. test vectors).

docker:generic builder - users can now include their own Dockerfile along with their test plan.

Additional improvements

  • We use deterministic Docker image tags, which results in much faster test plan builds, runs and feedback loops when running a given test plan.
  • Users now see additional auxiliary outputs from the sidecar when running test plans with the local:docker runner. This enables faster debugging of more complex plans that take advantage of the network shaping functionality in Testground.

For a full list of every change in this release, click here.


If you are compiling Testground on macOS, use commit 277364c160882e6ac3c83106d17ce438dfdefdd9 rather than tag v0.5.1. When adding tests to the sidecar (which only works under Linux), we forgot to add a build directive, which results in failed compilation on macOS for v0.5.1.


Thanks πŸ™Œ

Huge shout out to all contributors and users of Testground. Your help is invaluable!

v0.5.0

4 years ago


>>> Read the announcement blog post! <<<

The v0.5.0 release of Testground comes packed with a complete overhaul of the codebase and lots of features, and stability, hardening and usability improvements!

πŸŽ“ This release also marks our graduation from the warm and friendly IPFS community into a top-level GitHub organisation of our own. πŸš€

The ambitious mission of building bulletproof, unstoppable networks and systems requires rock-solid testing platforms. We hope Testground brings about a quantum leap in the way weβ€”as a communityβ€”engineer p2p and distributed systems for the decentralized future.

Read on to discover all the features, improvements, and changes that this release brings about!


πŸ‘―β€β™€οΈ SDK: The sync service is now turbocharged and nicer to use (v2)

  • The API has been revamped almost entirely for ease of use, removal of friction, and performance.
  • Subscriptions no longer create a blocking connection each.
  • Barriers now use MGET, and are governed by a single Ticker.
  • No more painful pointer/value confusion when subscribing and publishing. The Topic can use pointer or value types.
    • Similarly, Publish() and Subscribe() are intelligent enough to recognise and convert between pointer and value types of the underlying type.
  • Usage of Redis TTLs has been dropped. The previous implementation was unreliable, as it made assumptions about test participant lifetime and duration.
  • Lots of API sugar with Must* variants that panic on error (useful for usage with runner.Invoke[Map]), and composite operations (e.g. SignalAndWait, PublishSubscribe)
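A sketch of the revamped API inside a test plan; the topic and state names are made up, and the snippet requires the sdk-go module:

```go
package main

import (
	"context"

	"github.com/testground/sdk-go/run"
	"github.com/testground/sdk-go/runtime"
	"github.com/testground/sdk-go/sync"
)

func main() { run.Invoke(runTest) }

func runTest(runenv *runtime.RunEnv) error {
	ctx := context.Background()
	client := sync.MustBoundClient(ctx, runenv)
	defer client.Close()

	// composite operation: signal entry into a state, then wait for all peers
	client.MustSignalAndWait(ctx, sync.State("ready"), runenv.TestInstanceCount)

	// topics accept pointer or value payload types
	topic := sync.NewTopic("greetings", "")
	client.MustPublish(ctx, topic, "hello from "+runenv.TestRun)
	return nil
}
```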

☎️ SDK: new network package to expose network operations such as traffic shaping

  • network.NewClient creates a network client, with which you can wait for the network to be initialized, apply traffic shaping, and obtain your data network IP easily.
  • Check out the full docs at pkg.go.dev.
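A sketch of typical usage inside a test plan (shape values are illustrative; assumes an existing sync client, runenv, and ctx, and requires sdk-go):

```go
netclient := network.NewClient(client, runenv)
netclient.MustWaitNetworkInitialized(ctx)

// apply traffic shaping to the data network (values are illustrative)
netclient.MustConfigureNetwork(ctx, &network.Config{
	Network: "default",
	Enable:  true,
	Default: network.LinkShape{
		Latency:   100 * time.Millisecond,
		Bandwidth: 1 << 20, // ~1 MiB/s
	},
	CallbackState: "network-configured",
})

ip := netclient.MustGetDataNetworkIP()
runenv.RecordMessage("data network IP: %s", ip)
```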

πŸ“Š Observability pipeline rearchitected, migrated to InfluxDB, with tight SDK integration

  • The Prometheus + Pushgateway setup was brittle and unstable.
  • Test plans now emit:
    • Life-cycle events: facilitate real-time progress monitoring, either by a human, or by the forthcoming watchtower/supervisor service.
      • runenv.RecordStart(), runenv.RecordFailure(), runenv.RecordSuccess(), etc.
    • Diagnostics: track insights about the test instance execution itself in real-time, e.g. sync service stats, stages entry/exit, network API events, runtime stats (e.g. go runtime metrics).
    • Results: observations about the subsystems and components under test. Conceptually speaking, results are a part of the test output. Results are the end goal of running a test plan, and feed comparative series over runs of a test plan, along time, across dependency sets.
      • runenv.R()
      • Results are batch-uploaded to InfluxDB when the test concludes.
  • See architecture diagram below for a complete view of the circuitry.
  • Healthchecks now instantiate InfluxDB.
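Inside a test plan, the three classes of emissions look roughly like this (the metric name is illustrative):

```go
runenv.RecordStart() // life-cycle: run began

// ... test logic ...
elapsed := time.Since(start).Seconds()

// result: recorded via the metrics recorder, batch-uploaded to InfluxDB
runenv.R().RecordPoint("time-to-complete-secs", elapsed)

if err != nil {
	runenv.RecordFailure(err) // life-cycle: run failed
} else {
	runenv.RecordSuccess() // life-cycle: run succeeded
}
```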

[architecture diagram]

🧩 Daemon no longer knows about test plans

  • We brought sanity to the client/daemon separation. The daemon is now a building and execution engine with no knowledge of concrete test plans.
  • Test plans can now live anywhere (public/private Git repos, or filesystem); they do not need to be co-located with the testground source.
  • Plans are managed on the client side, and we’ve introduced the testground plan command group to streamline management. More on this below.

🏠 $TESTGROUND_HOME directory is the homebase of everything

  • It hosts plans, SDKs, and data and work directories. The layout is as follows:
$TESTGROUND_HOME
 |
 |__ plans              >>> [c] contains test plans, can be git checkouts, symlinks to local dirs, or the source itself
 |    |__ suite-a       >>> test plans can be grouped in suites (which in turn can be nested); this enables you to host many test plans in a single repo / directory.
 |    |    |__ plan-1   >>> source of a test plan identified by suite-a/plan-1 (relative to $TESTGROUND_HOME/plans) 
 |    |    |__ plan-2
 |    |__ plan-3        >>> source of a test plan identified by plan-3 (relative to $TESTGROUND_HOME/plans)
 |
 |__ sdks               >>> [c] hosts the test development SDKs that the client knows about, so they can be used with the --link-sdk option.
 |    |__ sdk-go
 |
 |__ data               >>> [d] data directory  
      |__ outputs
      |__ work
 
[c] = used client-side // [d] = used mostly daemon-side.
  • If not set explicitly, we default to $HOME/testground, and the daemon creates the skeleton on first start.

🐣 Dedicated GitHub organisation + repo refactor

πŸ“’ New Testground documentation website

  • Our documentation website is now live at https://docs.testground.ai/
  • We have added a lot of articles documenting many parts of Testground, and how to use it.

πŸ’Ž cluster:k8s runner and local:docker runner and infrastructure hardening

  • CPU limits on test plan pods have been removed due to throttling when using the cluster:k8s runner.
  • The control network has been blackholed - communicating between test plans over the control network is no longer possible.
  • Testground daemon is now deployed remotely on Kubernetes when using the cluster:k8s runner.
  • Kernel parameters have been adjusted to support test plan runs with up to 10k libp2p nodes.

πŸ—‚ New testground plan CLI command group + CLI cleanup

  • testground plan command group:
    • testground plan import --git --from <git url>
    • testground plan create to quickly create a templated test plan. Templates are picked from https://github.com/testground/plan-templates.
    • testground plan rm removes a test plan from $TESTGROUND_HOME/plans.
    • testground plan list [--testcases] enumerates test plans and (optionally) test cases known to the client.
  • General CLI cleanup to reduce surprise factor and friction.

πŸš€ Testground CI has moved from Travis to CircleCI

  • Builds are much faster now, with quicker feedback cycles. Down from ~15 minutes to ~4 minutes!
  • CI builds now benefit from Docker layer caching.

Thanks πŸ™Œ

HUGE, HUGE thanks to everyone who made it possible to get Testground to this stage. Your continued code contributions, patience and debugging energy, organisational skills, technical writing abilities, comms kung-fu, etc. were all indispensable to hit this incredible milestone.

❀️ @raulk, @nonsense, @coryschwartz, @Robmat05, @hacdias, @daviddias, @momack2, @yusefnapora, @vyzo, @dirkmc, @aschmahmann, @Stebalien, @jimpick, @petar. ❀️

v0.4.0

4 years ago

Testground v0.4.0 is a bugfix and infrastructure hardening release.

Highlighted v0.4.0 changes/features

🌊 Testground daemon outputs stream

  • Testground now streams all output for the testground daemon to the testground client, making it easier to monitor ongoing testplan runs.

πŸ‘― Deduplication of testplan builds

  • Testground now runs only unique builds for a given Testground composition.

πŸ₯ Healthchecks for various runners (local:docker, cluster:k8s, etc.) have been improved

πŸ’ͺ Kubernetes cluster environment hardening

  • Testground dependencies, such as Redis and Prometheus, now run on separate host machines from testplan instances, which provides better isolation and resource management.

  • Resource requirements and limits are now enforced for all services running on Testground infrastructure.

  • Kubernetes setup scripts have been improved and are now more robust.

  • Kernel parameters, such as Netfilter's connection tracking limits, and file descriptor limits, have been adjusted to support testplan runs with up to 10k libp2p nodes.


We have merged a lot of bug fixes and improvements to all Testground components. For a full list of every change, click here.


Thanks πŸ™Œ

Huge thanks to the early adopters and users of Testground: @yusefnapora, @vyzo, @dirkmc, @aschmahmann, @stebalien, @jimpick, @petar . Your help is invaluable in making Testground a great platform for distributed testing at scale.

Thanks to all the contributors that made this milestone happen: @coryschwartz, @raulk, @Robmat05, @nonsense, @yusefnapora, @vyzo, @dirkmc, @aschmahmann, @petar, @stebalien, @jimpick, @daviddias, @hacdias and others. ❀️

v0.3.0

4 years ago

Observability and robustness are the key themes for this Testground release! Note: we have upgraded to Go 1.14, so make sure you have too if you plan on building from source.

Highlighted v0.3.0 changes/features

πŸ‘ Automatic exposition of pprof and Prometheus endpoints in test instances

  • All test instances automatically expose their pprof and Prometheus endpoint.
  • Each instance prints out the URL of its HTTP instrumentation endpoint (pprof/Prometheus) on startup.
    • In local:exec, instances run as system processes, and they use host ports. We start with 6060, and fall back to port 0 (random) on bind failure. Browse to the printed out address directly.
    • In local:docker, instances bind to port 6060, which is exposed and published on a random port on the host. Run docker ps or docker port <container_id> to find which host port to access.
    • In cluster:k8s, instances bind to port 6060, and users are encouraged to use kubectl port-forward to set up a local endpoint that forward to the pod's port.
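For example, to reach an instance's endpoint (container and pod names are placeholders):

```shell
# local:docker — find the host port published for container port 6060
docker port <container_id> 6060

# cluster:k8s — forward the pod's 6060 to localhost
kubectl port-forward <pod-name> 6060:6060
# then browse http://localhost:6060/debug/pprof/
```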

πŸ’ͺ Stability/maturity/evolutionary improvements

  • Add a set of baseline benchmarks to acquire accurate readings of the performance of infrastructural components.
  • Harden Redis and the sync service on various fronts.
  • Refactor sidecar to make it more maintainable.
  • Rate-limit container start and deletion in the local:docker runner, to avoid hammering poor, old Docker and making it trip over.
  • Improve terminal output of Docker messages.
  • Made output paths consistent.

πŸ’Ύ cluster:k8s runner: switched from S3 to EFS

  • For better performance.
  • testground collect --runner cluster:k8s has been adapted to fetch outputs from EFS.
  • If you're running a cluster, make sure to destroy it and re-create it. See Kubernetes README for more instructions.

πŸ’Š Prometheus and pushgateway logic in healthchecks + fixes

  • The healthchecks in the local:docker and cluster:k8s runner now start and expose Prometheus and pushgateway automatically.
  • We expect to merge the analogous checks for the local:exec runner by the next release.

πŸ“Š Preview: Grafana dashboard manager

  • Read dashboards/README.md.
  • Ready-made Grafana dashboards for infrastructural components: Weave and Redis.

Everything else

Check everything else that went into release v0.3.0 here: https://github.com/ipfs/testground/milestone/5?closed=1.

Thanks!

BIG THANK YOU to all the contributors that persevered while the outside world is taking an unprecedented, uncertain, bold turn to combat the pandemic that affects us all. We hope you found a little refuge in Testground to escape the mental loops, despair and boredom that self-isolation and other very necessary measures impose on our human condition. Keep it coming! We're moving boldly towards an epic v0.4.0 release!

❀️ @coryschwartz @nonsense @aschmahmann @Robmat05 @dirkmc @yusefnapora @petar @raulk @daviddias @hacdias ❀️

v0.2.0

4 years ago

This is another action-packed release from the Testground team. A solid step in our path towards creating a robust and delightful platform for testing distributed systems at all scales! Read on to learn what's new.

Highlighted v0.2.0 features

πŸ₯ FEATURE: Runner healthchecks + automatic fixing.

  • Testground now supports the testground healthcheck --runner <runner_id> [--fix] command, which automates the verification that all preconditions for the runner to operate properly are met.
  • This includes things like Redis processes/containers, sidecar containers, directories, etc.
  • The --fix switch will attempt automatic healing of healthchecks that fail.
  • testground run implicitly performs a healthcheck on the target runner, before scheduling the run.
  • For now, the local:exec and local:docker runners support healthchecks, with cluster:k8s joining the party soon.
  • Builders will also support healthchecks in the near future.

πŸ’£ FEATURE: Runner termination.

  • Testground now supports the testground terminate --runner <runner_id> command, which destroys the environment of a runner, including all started test jobs/containers, as well as precondition containers (Redis, sidecar, etc.)
  • In the future, this command will be more flexible, allowing the user to indicate the scope of the destruction (#611).
  • Supported runners: local:docker, cluster:k8s.

🧩 FEATURE: Manual build selectors.

  • Testground composition files can now specify selectors to be applied to each build. These translate to build tags in Go builds, and can be used to construct shims for funnel-shaped wildcard test plans, such that a single test plan can target a variety of upstream dependency versions with changing APIs.
  • Read more in the docs/EVOLVING_APIs.md design doc.
  • In the future, we will introduce automatic build selectors, in the manner described in the above doc.
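In a composition, selectors could be set per group; with docker:go they translate to Go build tags (the group id and selector name below are made up):

```toml
[[groups]]
id        = "against-v0.3"
instances = { count = 250 }

  [groups.build]
  selectors = ["compat_v0_3"]  # becomes a -tags value for the Go build
```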

πŸ“¦ FEATURE: --collect and --collect-into flags.

  • The testground run command now supports the --collect and --collect-into flags that automatically perform output collection (i.e. testground collect) into an archive with the run ID name (--collect), or a user-specified file (--collect-into).
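Illustrative usage (the plan, testcase, and output path are placeholders):

```shell
# collect outputs into an archive named after the run ID
testground run single --plan=example --testcase=smoke \
  --builder=docker:go --runner=local:docker --instances=2 --collect

# or choose the output file yourself
testground run single --plan=example --testcase=smoke \
  --builder=docker:go --runner=local:docker --instances=2 \
  --collect-into=./smoke-outputs
```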

πŸ“‘ FEATURE: Increased observability + Prometheus + pushgateway in cluster:k8s.

  • The Kubernetes run environment now bundles Prometheus and pushgateway support, for test plans and other infrastructural components to be able to push metrics proactively.
  • pprof enabled on sidecar.
  • More improvements coming in this area soon.

Fixes and improvements

We have merged TONS of bug fixes and improvements in the sidecar, sync service, Redis scaling, cluster:k8s runner, k8s networking setup, S3, etc.

  • IMPROVEMENTS (observability): measure when testground is ready and testplans are running (#560), sdk runtime metrics (#545), expose pprof port on sidecar (#558), use shared ssh key for kops hosts ; command to extract kube context (#551).
  • FIXES (sidecar): listen for more Docker events, not just stop (#580), free netlink handle (#575), sidecar errors: check for context canceled (#526), fix: remove a runenv hack in the sidecar (#503).
  • FIXES (sync service): increase redis max clients (#578), cancel all subscriptions when closing the watcher (#576), use a shared redis client (#574), test redis address resolution, resolve redis once, abort early when we get too many results, improve context abort error, don't panic on error if we're canceling anyways, add pprof port to pods (#553), adjust redis config (#595), avoid logging spurious errors when we shutdown the watcher, feat: use contexts for the sync service (#456), buffer barrier channel (#516).
  • FIXES (runtime): runenv: flush logger on close. (#518), feat: distinguish between runenv and run params.
  • FIXES (AWS): aws ecr repo must be unique across regions (#536), fix pushing image to remote (aws ecr) (#493), extract S3_ENDPOINT in var, so that we can change region for bucket (#541), fix some minor issues with the S3 collection logic (#533).
  • FIXES (k8s networking): wait for flannel initContainer (#563).
  • FIXES (exec:go builder): don't hide output from go commands.
  • FIXES (cluster:k8s runner): handle canceled context in cluster_k8s.go (#510), configurable pod resource requirements (#513), max allowed pods check in cluster_k8s.go (#509), extract outputs configs to toml configuration (#494).
  • FIXES (docker:go builder): docker builder volume (#591).
  • DOC: Proposal: dealing with upstream API evolution in test plans (#565).

v0.1.0

4 years ago

Highlighted v0.1.0 features

  • Ability to run test plans ranging from 1-1000 nodes locally (executables and Docker), and/or on a cluster (Kubernetes).
  • Ability to compile test plans into Docker images (compatible with the local:docker and cluster:k8s runners), and into executables (local:exec, for rapid prototyping and iteration).
  • Compiling test plans against specific upstream dependencies (e.g. kad-dht v0.3, or commit 1a2b3c).
  • Network traffic shaping: simulating latencies, bandwidths, and connectedness.
  • Redis-backed sync API to coordinate and choreograph distributed test workloads.
  • Compositions: executing test runs with groups of instances built against different upstreams (e.g. 500 instances against go-ipfs v0.4.23, 250 instances against master, 250 against commit 42a0b1, with params X=1, Y=2).
  • Storage of test outputs and metrics locally, and on S3.
  • With a single command, collect all outputs from all instances from a given run, no matter the runner (local:exec, local:docker, cluster:k8s), into a ZIP file for analysis.
  • One-click bootstrapping of your own experimental Kubernetes cluster on AWS.

Thanks πŸ™Œ

Thanks to all the contributors that made this huge milestone happen: @raulk, @nonsense, @daviddias, @hacdias, @stebalien, @dirkmc, @aschmahmann, @jimpick, as well as all the new folks who started helping out recently: @coryschwartz, @gmasgras, @yusefnapora, and others. ❀️