CRI Resource Manager Versions

Kubernetes Container Runtime Interface proxy service with hardware resource aware workload placement policies

v0.9.0

3 months ago

What's New since v0.8.4

New dynamic-pools policy

A new policy that allows applications to be assigned to dynamically resized CPU pools. The pools are non-overlapping and are resized based on the resource requests of the containers and the actual CPU utilization of the application.
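The resizing idea can be illustrated with a small Python sketch (CRI-RM itself is written in Go, and this is not its implementation). The weighting rule is an assumption made for illustration only: each pool first receives the CPUs its containers request, and any spare CPUs are split in proportion to measured utilization. The names resize_pools and assign_cpus are hypothetical.

```python
def resize_pools(total_cpus, pools):
    """Return pool name -> CPU count.

    pools maps a pool name to (requested_cpus, utilization).
    Assumed rule: requests are satisfied first, then spare CPUs are
    shared in proportion to utilization (largest-remainder rounding).
    """
    sizes = {name: requested for name, (requested, _) in pools.items()}
    spare = total_cpus - sum(sizes.values())
    if spare < 0:
        raise ValueError("requests exceed available CPUs")
    total_util = sum(util for _, util in pools.values())
    if spare == 0 or total_util <= 0:
        return sizes  # nothing to hand out, or no pool is busy
    shares = {name: spare * util / total_util for name, (_, util) in pools.items()}
    for name, share in shares.items():
        sizes[name] += int(share)
    # Hand the CPUs lost to rounding to the pools with the largest fractions.
    leftover = spare - sum(int(share) for share in shares.values())
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        sizes[name] += 1
    return sizes


def assign_cpus(sizes):
    """Give each pool a contiguous, non-overlapping range of CPU ids."""
    result, next_cpu = {}, 0
    for name in sorted(sizes):
        result[name] = list(range(next_cpu, next_cpu + sizes[name]))
        next_cpu += sizes[name]
    return result
```

Calling resize_pools again whenever requests or utilization change yields new sizes, and assign_cpus keeps the resulting pools disjoint.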

Enhancements to balloons policy

The balloons policy has a new PreferSpreadOnPhysicalCores configuration option. Enabling it for a balloon makes the CPU allocator prefer CPUs from separate physical CPU cores.

This release also fixes bugs in dynamic configuration updates.
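The effect of PreferSpreadOnPhysicalCores can be pictured with a short Python sketch (CRI-RM itself is written in Go, and this is not the allocator's code). The function pick_cpus and its topology argument are hypothetical. The packing branch, which fills one core before moving to the next, is an assumption added only for contrast.

```python
from collections import defaultdict


def pick_cpus(cpu_to_core, count, prefer_spread):
    """Pick `count` logical CPUs.

    cpu_to_core maps a logical CPU id to its physical core id.
    """
    if count > len(cpu_to_core):
        raise ValueError("not enough free CPUs")
    by_core = defaultdict(list)
    for cpu in sorted(cpu_to_core):
        by_core[cpu_to_core[cpu]].append(cpu)
    cores = [by_core[core] for core in sorted(by_core)]
    if not prefer_spread:
        # Pack: use every logical CPU of a core before touching the next core.
        flat = [cpu for core in cores for cpu in core]
        return sorted(flat[:count])
    # Spread: take one logical CPU per core in each round.
    picked, round_no = [], 0
    while len(picked) < count:
        for core in cores:
            if round_no < len(core) and len(picked) < count:
                picked.append(core[round_no])
        round_no += 1
    return sorted(picked)
```

With three two-thread cores, a request for three CPUs lands on three different cores when spreading is preferred, and on two cores otherwise.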

CRI v1alpha2 dropped

Support for deprecated CRI version v1alpha2 was removed. This means that CRI-RM v0.9 requires Kubernetes v1.23 (or later) and containerd v1.6 (or later) or CRI-O v1.20 (or later).
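As a quick pre-upgrade check, the minimum versions listed above can be encoded in a few lines of Python. This is an illustrative helper, not part of CRI-RM: the names MINIMUMS, parse_version and meets_requirements are hypothetical, and version strings are assumed to look like "v1.23.4" or "1.6".

```python
# Minimum (major, minor) versions stated in the v0.9.0 notes.
MINIMUMS = {
    "kubernetes": (1, 23),
    "containerd": (1, 6),
    "cri-o": (1, 20),
}


def parse_version(text):
    """Turn 'v1.23.4' or '1.6' into a (major, minor) tuple of ints."""
    parts = text.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def meets_requirements(kubernetes, runtime_name, runtime_version):
    """Check Kubernetes plus one runtime (containerd or cri-o)."""
    if runtime_name not in ("containerd", "cri-o"):
        raise ValueError("unknown runtime: " + runtime_name)
    return (
        parse_version(kubernetes) >= MINIMUMS["kubernetes"]
        and parse_version(runtime_version) >= MINIMUMS[runtime_name]
    )
```

Comparing integer tuples rather than strings matters here: as text, "1.9" sorts after "1.20", while (1, 9) correctly compares lower than (1, 20).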

List of all PRs since v0.8.0

New Contributors

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.8.0...v0.9.0

v0.8.4

7 months ago

This patch release updates dependencies and contains one small bug fix for the page demoter.

What's Changed

  • Bump golang to v1.19.11 by @marquiz in https://github.com/intel/cri-resource-manager/pull/1016
  • go.mod: update dependencies by @marquiz in https://github.com/intel/cri-resource-manager/pull/1015
  • Bump golang to v1.20.7 by @marquiz in https://github.com/intel/cri-resource-manager/pull/1019
  • backports from master by @marquiz in https://github.com/intel/cri-resource-manager/pull/1048
    • Merge pull request #1048 from marquiz/release-0.8
    • demoter: fix sudden cri-resmgr process exit on page demotion
    • e2e: relax dynamic demotion first detection round requirement
    • e2e: update default distro from Ubuntu 20.04 to 22.04
    • e2e: fix distro=opensuse to support k8s 1.27+
    • e2e: fix topology-aware/n4c16/test09-container-exit test
    • e2e: restore vm as last step in the static-pools test suite
    • pkg/topology: sync go.mod with the main module
    • go.mod: update goresctrl to v0.5.0
    • resmgr: stop importing kubernetes/kubelet internals.
    • all: switch to k8s.io/utils/cpuset.
    • Use golang builtin multierror
    • chore: remove refs to deprecated io/ioutil
    • chore: remove refs to deprecated io/ioutil
    • docs/deps: update pygments to v2.15.1
    • docs: use ADD in the dockerfile to fetch go tarball
    • github: use path filter for publishing docs
    • github: split code and docs CI into separate workflows
    • github: add trivy license scanning
    • github: fix the usage of github environments
    • github: add creation of vendored dist into release workflow
    • github: add release job for publishing binary packages
    • github: add workflows for image building
    • github: refactor verify workflow
    • github: use pinpointed ubuntu version on the runners
    • github: split security scanning into re-usable jobs
    • github: refactor docs building
    • github: drop containerized build of docs from verify workflow
    • github: drop the turnstyle plugin
    • github: take golang version from go.mod
    • github: add security scanning
    • Makefile: drop unwanted update-workflows target
    • Makefile: remove -it arg from docker run
    • Makefile: isolate image-push from image target
    • Makefile: prepare binary packages as release assets
    • cross-build: get golang binaries instead of compiling from source
    • scripts: drop unused stuff from docker-build-image
  • backport e2e test fixes from master by @marquiz in https://github.com/intel/cri-resource-manager/pull/1070
    • e2e: relax topology-aware coldstart test
    • e2e: don't use pyexec in tests directly
    • e2e: support for verify --retry N
    • e2e: drop vm-force-restart()
    • e2e: ignore terminated processes on fetching allowed resources
    • e2e: fix "ambiguous allowed resources" caused by race
    • e2e: add --wait to kubectl delete of pods and namespaces
    • Makefile: simplify packaging tests
    • e2e: check default serviceaccount in cluster readiness check
    • e2e: use bridge cni plugin by default
    • e2e: use Fedora 38 for the fedora test target
    • e2e: use latest image for debian-10 tests
    • e2e: fix opensuse-15.4 image URL, support distro=opensuse-15.5

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.8.3...v0.8.4

v0.8.3

11 months ago

This patch release fixes a few bugs and adds support for a config-status-based readiness probe to the agent.

What's Changed

Bug Fixes

  • backport #985: pkg/avx/collector: don't crash on no regexp match
  • backport #995: topology-aware: fix logging of cpuset changes
  • backport #975: rdt: stop trying to get container cgroup dir

Functional Changes

  • backport #986: agent: implement config status readiness probe

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.8.1...v0.8.3

v0.8.1

1 year ago

This patch release fixes bugs in CPU priority detection and metrics reporting.

What's Changed

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.8.0...v0.8.1

v0.8.0

1 year ago

Major improvements in this release:

  • The balloons policy becomes fully aware of hardware topology, able to utilize idle CPUs, and allows users to choose between packing or spreading workloads across CPUs.
  • CRI-RM becomes compatible with CRI v1 towards both kubelet and container runtimes. Backward compatibility with CRI v1alpha2 is also maintained in both directions.
  • Fixes a crash when pod status data is not available in the synchronization phase at cri-resmgr start-up.

What's Changed

Policies

CRI v1

Bug fixes

Build

Validation

Documentation

Misc

New Contributors

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.7.0...v0.8.0rc1

v0.7.2

1 year ago

This point release fixes issues with the cri-resmgr-agent and cri-resmgr-webhook container image builds. No functional changes since v0.7.0.

What's Changed since v0.7.0

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.7.0...v0.7.2

v0.7.0

1 year ago

This release introduces the new balloons policy. It also adds support for namespace-based allocation of reserved CPUs as well as pod- and namespace-based colocation of containers to the topology-aware policy. In addition to these, the release contains various other smaller functional improvements and a number of bug fixes. Dependencies are updated to more recent versions.

What's Changed

New Contributors

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.6.0...v0.7.0

v0.6.1rc1

2 years ago

What's Changed

New Contributors

Full Changelog: https://github.com/intel/cri-resource-manager/compare/v0.6.0...v0.6.1rc1

v0.6.0

2 years ago

This release brings dependencies up to date with recent versions. It contains a small number of functional improvements and fixes, and a large number of fixes and other improvements to the end-to-end tests.

Major Changes

  • build:
    • update K8s dependencies to v1.22.2
    • bump golang version to v1.16
  • fixes and improvements:
    • container cgroup directory discovery fixes
    • RDT pod QoS class discovery fixes in discovery mode
    • agent configuration: authorize access to adjustments
    • clean up cgroup and group control abstraction
    • remove SST code and pull it in from goresctrl
  • end-to-end test framework:
    • new distributions: sles, opensuse-tumbleweed, ubuntu-21.04
    • installing and debugging locally built CRI-O, containerd and runc
    • configurable CRI runtime pipe and Kubernetes version

Other improvements

  • testing, demos:
    • end-to-end tests: a large number of end-to-end test fixes and other test infra improvements
    • blockio demo: fix detecting already installed cri-resmgr
    • blockio demo: always drop caches before measuring blockio speed

List of Merged PRs

  • PR #731: e2e: more robust coldstart test
  • PR #730: 0.6.0 release preparation: always try to enable 'SystemdCgroup = true' for tests with containerd.
  • PR #728: 0.6.0 release preparation: use distinctive VM names for packaging tests.
  • PR #729: 0.6.0 release preparation: add support for testing with cross-built distro binaries.
  • PR #725: 0.6.0 release preparation: ubuntu-21.04 cross-build and tests.
  • PR #727: 0.6.0 release preparation: centos-7 test cluster bootstrapping fixes.
  • PR #726: 0.6.0 release preparation: use latest fedora image for cross-build.
  • PR #724: 0.6.0 release preparation: update sid image URL.
  • PR #721: e2e: add support for distro=ubuntu-21.04
  • PR #722: go.mod: update K8s deps to v1.22.2
  • PR #720: Bump to golang v1.16
  • PR #719: distro: force non-interactive 'apt-get install'.
  • PR #717: Drop travis CI support
  • PR #711: Integrate with goresctrl
  • PR #715: github: run tests before golangci-lint
  • PR #714: control/rdt: fix discovery of pod qos classes in discovery mode
  • PR #713: e2e: support distro=opensuse-tumbleweed
  • PR #712: e2e: add vm-put-pkg, install a package from host to vm
  • PR #710: e2e: fix cloud-init error on distro=debian-sid
  • PR #706: e2e: make sure tests have 'pidof' installed on fedora.
  • PR #707: e2e: fix sysctl settings that break cilium CNI on Fedora
  • PR #703: e2e: support running tests with CRI-O and cri-resmgr in NRI mode
  • PR #696: e2e: wait for cloud-init to finish during VM bootstrap.
  • PR #701: e2e: fix opensuse cloud-init and handle wrong containerd
  • PR #699: e2e: follow HTTP redirects when fetching apt repo keys.
  • PR #698: e2e: fix (EOL'd) Ubuntu Groovy image URL.
  • PR #697: e2e: allow installing cri-o from distro repos.
  • PR #694: scripts: add CRI-O support to kube-cgroups
  • PR #656: e2e: add support for k8s=X.Y.Z to set Kubernetes version
  • PR #660: docs: fix pkg urls in quick-start instructions
  • PR #690: e2e: distro=sles uses official package repositories
  • PR #689: e2e: enable reinstalling pretty much everything on VMs
  • PR #688: e2e: add support for distro=sles
  • PR #657: e2e: add an init container test
  • PR #687: edited e2e-test.md
  • PR #654: scripts: kube-cgroups prints cgroup entries per pod/container
  • PR #685: e2e: improve isolcpus test robustness
  • PR #684: e2e: clean up vm after successful reserved-resources test run
  • PR #683: e2e: blockio test for k8scri=crio and k8scri=containerd
  • PR #682: e2e: support CRI-O, containerd, and containerd + cri-resmgr as NRI
  • PR #681: e2e: cri-resource-manager configuration is optional in test suites
  • PR #680: e2e: allow templating in test suite variable files
  • PR #679: e2e: add function for checking if local binary is out-of-date
  • PR #678: e2e: change e2e test framework title
  • PR #677: e2e: support annotations in common pod templates
  • PR #676: e2e: add vm functions for dlv debugging
  • PR #675: e2e: add vm-install-runc
  • PR #674: e2e: add vm-put-docker-image to script API
  • PR #673: e2e: enable running without govm if VM_IP is set
  • PR #672: e2e: fix (remove) empty names from allowed resources printing
  • PR #671: e2e: switch k8s install source in opensuse
  • PR #670: e2e: fix reinstalling containerd on opensuse
  • PR #669: e2e: distro install crio
  • PR #668: e2e: distro: enable running fedora with cgroups=v2
  • PR #667: e2e: fix error message after installing golang from tar
  • PR #666: e2e: always install git-core with golang
  • PR #665: e2e: run apt-get install -y with default answers to dpkg
  • PR #662: e2e: Fix govm installation documentation
  • PR #663: e2e: lib: Use proper locale for bc to work
  • PR #661: e2e: require host dependencies jq and pv
  • PR #651: Basic edits to docs
  • PR #649: e2e: add goresctrl debugging support to "run.sh debug"
  • PR #648: blockio demo: fix detecting already installed cri-resmgr
  • PR #647: blockio demo: always drop caches before measuring blockio speed
  • PR #646: cache: add a directory to findContainerDir search path
  • PR #643: docs: a bunch of grammatical and stylistic fixes by DougTW.
  • PR #644: e2e: add tests for topology-aware mixed CPU allocations
  • PR #645: e2e: test topology-aware allocations with kernel isolcpus set
  • PR #642: fixes: fixes for fedora 33
  • PR #639: cgroups: add cleaned up cgroup, group control abstraction.
  • PR #641: docs: update Pygments requirements
  • PR #638: e2e: fix agent installation
  • PR #637: cri-resmgr-agent: authorize access to adjustments.
  • PR #621: e2e: fuzz topology-aware

v0.5.0

3 years ago

This release brings general stability and correctness improvements. It merges the memory tiering policy into the original topology-aware policy, with a number of important fixes for resource accounting and assignment.

Major Changes

  • policies:
    • Add new podpools policy for pod-granularity workload placement
    • topology-aware: merge topology-aware and memory tiering policies
    • topology-aware: honor CPU reservation/reserved CPU set in configuration
    • topology-aware: unify syntax for per container and pod annotated preferences
  • RDT:
    • split out RDT manipulation code to a self-contained package, https://github.com/intel/goresctrl
    • implement operating modes (Disabled, Discovery, Full)
    • add option to disable RDT monitoring
    • support L2 cache allocation
  • CPU allocator (used by topology-aware and podpools policies):
    • detect CPU priority levels with Intel Speed Select Technology (SST)

Bug Fixes

  • policies:
    • topology-aware: several significant cpu and memory accounting fixes
    • topology-aware: fixes in gradually relaxed memory pinning for OOM-prevention
    • topology-aware: better handling of bounding and reserved resources
    • topology-aware: fix assignment of CPU-less memory zones
    • topology-aware: fix building sparse topology trees
  • RDT:
    • use root class as a fallback for missing classes
    • empty class implies root class
    • do forceful rdt (re-)configuration
  • resource-manager:
    • force full reallocation when switching policies
    • run post-update hooks after reconfiguration
    • save cache at startup
  • config:
    • handle composite structs in Module.validate()
  • cache:
    • (over)write cache file atomically
  • testing:
    • e2e: fix clearing cri-resmgr cache on uninstall
    • e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
  • documentation:
    • fix static-pools debug logging instructions
    • sample-configs: sample configuration fixes
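The cache fix listed above, "(over)write cache file atomically", refers to a common pattern that a short Python sketch can illustrate (CRI-RM itself is written in Go). The notes do not say how the fix is implemented. The sketch assumes the usual technique of writing to a temporary file in the same directory and renaming it over the target. The function name and the JSON encoding are hypothetical.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Replace `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Leave no stray temporary file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies mid-write, the old cache file is still intact, and only a leftover temporary file may remain.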

Other Improvements

  • policies:
    • topology-aware: more regular annotation interpretation for CPU allocation preferences
  • resource-manager:
    • dump extra data for message disambiguation
    • flush logs after every request/event processed
  • cache:
    • log name on pod/container removal
  • cri-resmgr:
    • increase allowed service journal log bursts
  • logging:
    • switch logger to use klog
  • testing:
    • e2e: add tests for memset expansion in topology-aware policy
    • e2e: add vm-put-docker-image to the vm library
    • e2e: allow user override for VM_SSH_USER over distro-ssh-user
    • e2e: generalize templating any file with instantiate()
    • e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
    • e2e: set imagePullPolicy on every test pod
    • e2e: support namespaced kubectl create from templates
    • e2e: unified memory-type and cold-start annotation syntax
    • e2e: update dynamic page demotion tests
    • e2e: update podpools tests to pass with new cpuallocator
    • e2e: update tests on pinning reserved CPUs
    • benchmark: add memtier_benchmark for memcached/redis
  • documentation:
    • improve RDT documentation
    • fix static-pools debug logging instructions

List of Merged PRs

  • PR #528: build: include only cri-resmgr in binary dist tarballs
  • PR #529: docs: fix static-pools debug logging instructions
  • PR #530: memtier/c4pmem4/test03-coldstart: don't jump the gun
  • PR #536: .github: update issue template for new releases
  • PR #537: docs: minor fixes in html template customization
  • PR #538: docs: use 'release branch' as the current version in versions menu
  • PR #540: e2e: support namespaced kubectl create from templates
  • PR #541: e2e: fix clearing cri-resmgr cache on uninstall
  • PR #542: e2e: generalize templating any file with instantiate()
  • PR #543: memtier: implement reserved CPUs pool
  • PR #545: resource-manager: run post-update hooks after reconfiguration
  • PR #546: go.mod: update to Kubernetes v1.19.4
  • PR #547: scripts: helper for maintaining replace lines in go.mod
  • PR #549: benchmark: add memtier_benchmark for memcached/redis
  • PR #550: test/functional: prevent read/write data race in klog
  • PR #553: docs: quote text containing '<' and '>' using `` in affinity docs
  • PR #555: scripts/update-gh-pages: more intelligent http redirect
  • PR #556: e2e: allow user override for VM_SSH_USER over distro-ssh-user
  • PR #557: Improve CPU prioritization
  • PR #560: e2e: add vm-put-docker-image to the vm library
  • PR #561: memtier: rework building of pool tree by HW topology
  • PR #562: docs: improve rdt documentation
  • PR #563: memtier/pool test: fix fd leakage causing test panics with more data
  • PR #566: Kata container support
  • PR #567: config: handle composite structs in Module.validate()
  • PR #568: control/rdt: add option to disable rdt monitoring
  • PR #570: page-migrate: add cache-like container.GetPodID()
  • PR #571: config: fix typo in log message
  • PR #572: control/rdt: fix and simplify handling of implicit disabling
  • PR #573: control/rdt: empty class implies root class
  • PR #574: control/rdt: implement assignAll()
  • PR #575: control/rdt: do forceful rdt (re-)configuration
  • PR #576: control/rdt: correct usage of checkIdle() in configNotify()
  • PR #577: control/rdt: implement operating modes
  • PR #579: memtier: don't imply error by signature for functions that never fail
  • PR #580: docs: use an explicit version of recommonmark
  • PR #581: rdt: accept missing default classes in Discovery mode
  • PR #583: docs: refer to the latest release in the installation instructions
  • PR #586: rdt: use root class as a fallback to missing classes
  • PR #587: e2e: set imagePullPolicy on every test pod
  • PR #588: memtier: unify syntax for annotated preferences
  • PR #589: memtier: fix build error introduced by improper, unrebased merging of both #524 and #543
  • PR #590: memtier: more regular annotation interpretation for CPU allocation preferences
  • PR #591: fix: nil pointer dereference on updateSharedAllocations(nil)
  • PR #592: e2e: unified memory-type and cold-start annotation syntax
  • PR #594: policy/builtin/*: fix outdated comment about PolicyName
  • PR #595: docs: recognize/handle .md-links to element IDs
  • PR #596: server,resource-manager: flush logs after every request/event processed
  • PR #597: resource-manager: rename 'memtier' policy to 'topology-aware'
  • PR #598: podpools: policy for pod-granularity workload placement
  • PR #599: rdt: fix order of params passed to GetTasksInContainer()
  • PR #600: test: drop stale rdt testdata
  • PR #601: topology-aware: improved topology tree/node dump
  • PR #602: cpuallocator: add CPU priority levels
  • PR #604: e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
  • PR #606: Extended detection of Intel Speed Selection Technology (SST)
  • PR #607: klog: skip headers for journald by default
  • PR #608: cri-resmgr: increase allowed service journal log bursts
  • PR #609: fixes: topology-aware policy cpu/memory accounting fixes
  • PR #610: resource-manager: force full reallocation when switching policies
  • PR #612: topology-aware: force reserved/kube-system containers to the root
  • PR #613: e2e: add tests for memset expansion in topology-aware policy
  • PR #614: resource-manager,dump: dump extra data for message disambiguation
  • PR #615: topology-aware: better and more readable logs
  • PR #616: topology-aware: memory accounting and memset expansion fixes
  • PR #617: resource-manager: catch containers earlier when they are gone
  • PR #618: e2e: update podpools tests to pass with new cpuallocator
  • PR #622: topology-aware: use normal as fallback for reserved
  • PR #623: e2e: update tests on pinning reserved CPUs
  • PR #624: topology-aware: use prettyMem() in log messages
  • PR #625: cache: (over)write cache file atomically
  • PR #626: resource-manager: save cache at startup
  • PR #627: cache: log name on pod/container removal
  • PR #628: rdt: support L2 cache allocation
  • PR #629: topology-aware: fix filtering out nodes with insufficient memory
  • PR #630: topology-aware: fix moving up memory grant
  • PR #631: pkg/sysfs: clarifying comment on getCPUMapping()
  • PR #632: e2e: update dynamic page demotion tests
  • PR #634: sample-configs: make cri-resmgr-configmap.example.yaml usable
  • PR #636: podpools: fix reflect JSON tag typo