CRI Resource Manager Releases

Kubernetes Container Runtime Interface proxy service with hardware resource aware workload placement policies

v0.4.1

3 years ago

The documentation in this release has been overhauled, with significant structural improvements and additional content compared to previous releases. End-to-end test coverage has been vastly extended and the test framework significantly improved. This release also contains a number of important bug fixes and a few other functional improvements. Below is a non-exhaustive list of these changes.

Bug fixes

  • agent:
    • refuse to start if NODE_NAME environment variable is not specified
  • memtier policy:
    • fix updating containers after shared pool changes
    • honor CPU isolation opt-out preference
    • honor allowed CPUs in resource discovery
    • fix PMEM-only NUMA node assignment for weird topologies
  • static-pools policy:
    • make dynamic (re-)configuration work properly
    • look for 'cmk isolate' when parsing the container command line
    • re-load legacy config on config update
    • only take pools configuration from legacy config
    • improved sanity check on pool configuration
    • fix node tainting
  • cri-resmgr:
    • fill in defaults for unspecified values in configuration

Other Improvements

  • cri-resmgr:
    • dump outbound requests if debugging is enabled for the 'cri/relay' source
  • resource controllers:
    • page-migrate: split out page-migration into a controller of its own
  • e2e test framework:
    • vastly improved test coverage on multiple distros
  • builds:
    • build binary dist tarballs

Differences wrt. Rolling Master

With the exception of the PRs listed below, all PRs in the inclusive range #411 - #527 have been cherry-picked or back-ported from the rolling master branch to this release. The omitted PRs were excluded for backward compatibility or other similar reasons:

  • #525: cri-resmgr: reuse 'rdt' logger for the split out rdt package
  • #490: rdt: use goresctrl 
  • #497: pkg/log: switch logger to use klog
  • #472: e2e: add tests for static-pools
  • #489: static-pools: slight refactoring and renaming
  • #483: static-pools: lazier node updates
  • #475: static-pools: drop all cmdline flags

v0.4.0

3 years ago

Major changes

  • 'topology-aware' policy superseded by 'memtier'
  • support for cold start of containers
  • support for dynamic demotion of memory
  • support for limiting container top tier/DRAM memory usage (requires kernel support)
  • support for externally adjusting container resource assignments
  • multi-die aware resource allocation
  • binary distribution with packages for popular Linux distributions and images on Docker Hub

Detailed changelog

Policies

  • 'topology-aware' policy superseded by 'memtier', which
    • is a forked and improved version of 'topology-aware'
    • has the same basic functionality
    • has a number of improvements and extra functionality:
      • multi-die topology support
      • multi-tier (DRAM/PMEM) memory support
      • top tier/DRAM memory limiting
      • container 'cold start' support: force containers initially exclusively to PMEM
      • experimental dynamic page demotion: periodically move least-used pages from DRAM to PMEM
      • experimental support for dynamic external adjustments to container resource assignments
    • has a number of resource assignment/allocation fixes (which are no longer back-ported to 'topology-aware')
    • will in the next release replace 'topology-aware' altogether
  • static-pools:
    • compatibility back-ports from CMK: advertise CPUs in the 'shared' and 'infra' pools via the CMK_CPUS_SHARED and CMK_CPUS_INFRA environment variables
  • common:
    • support for new Pod annotation controls (see the example after this list):
      • opt out from automatic topology hint generation:
        • topologyhints.cri-resource-manager.intel.com/pod: false
        • topologyhints.cri-resource-manager.intel.com/container.$name: false
      • set DRAM/top tier memory limit:
        • toptierlimit.cri-resource-manager.intel.com/pod: $limit
        • toptierlimit.cri-resource-manager.intel.com/container.$name: $limit
    • make simple container affinities always implicitly symmetric
    • limit user-defined container affinity to [-1000,1000]
    • re-trigger pod cgroupfs parent directory and QoS class discovery if necessary
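
As an illustration, a minimal Pod sketch using these annotation controls might look like the following. The pod and container names, the image, and the 2G limit value are hypothetical placeholders; only the annotation keys come from this release.

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod                                            # hypothetical name
      annotations:
        # opt the whole pod out of automatic topology hint generation
        topologyhints.cri-resource-manager.intel.com/pod: "false"
        # limit top tier/DRAM memory usage of container 'app' (illustrative value)
        toptierlimit.cri-resource-manager.intel.com/container.app: "2G"
    spec:
      containers:
        - name: app                                                # hypothetical name
          image: busybox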

Resource controllers

  • RDT:
    • remove controller-level class name mapping
    • don't consider assignment to a default class an error if no classes are defined
    • fix crash/misplaced logging of group deletion
  • Block I/O:
    • remove controller-level class name mapping
    • don't consider assignment to a default class an error if no classes are defined
  • CRI:
    • properly send out generated/queued UpdateContainerResources requests

Data collectors

  • cgroupstats:
    • use/report container IDs
    • fix hugetlb size parsing
  • avx:
    • switch to cilium/ebpf from iovisor/gobpf

cri-resmgr

  • new command line options:
    • reset cached configuration: --reset-config
    • reset cached policy data: --reset-policy
  • always set up node agent connection, even when running with --force-config
  • allow switching policies during startup, unless started with --disable-policy-switch

Packaging

  • install the sample fallback config as a fallback, not as the real configuration file
  • use /etc/default for defaults on debian-based distros
  • support Ubuntu 20.04 and openSUSE 15.2

Documentation

Testing

  • end-to-end test framework added

v0.3.1

3 years ago

This v0.3.1 patch release adds packaging and build fixes on top of the v0.3.0 release.

Changes:

  • feature: add command line options for resetting the active policy in the cache and allow this to happen automatically during startup if necessary
  • fix: make NUMA CPU/memory attachment detection work with older kernels
  • fix: move from gobpf to a Cilium-based AVX eBPF implementation to address build issues on older kernels
  • fix: add targets for containerized cross-builds for distro packages

v0.3.0

3 years ago
  • added memory-tiering policy: topology-aware policy with support for DRAM, PMEM (Intel Optane DC), and HBM (High Bandwidth Memory) allocation
  • added blockio controller: class-based control over block I/O using the cgroupfs blkio controller
  • added support for metrics collection:
    • collection of raw metrics data, exporting to Prometheus
    • AVX512 usage: collect per container AVX512 instruction usage, tag containers accordingly
  • rdt controller improvements: disjoint partitioning, L3 and memory bandwidth monitoring, and Intel RDT metrics
  • new annotations (see the example after this list):
    • assign full pod or a container to block I/O or RDT class:
      • rdtclass.cri-resource-manager.intel.com/container.$container: class-name
      • rdtclass.cri-resource-manager.intel.com/pod: class-name
      • blockioclass.cri-resource-manager.intel.com/container.$container: class-name
      • blockioclass.cri-resource-manager.intel.com/pod: class-name
    • memtier policy preference for type of memory allocated to a container:
      • memory-type.cri-resource-manager.intel.com/container.$container: [dram,][pmem,][hbm]
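
For example, a Pod sketch combining these annotations might look as follows. The pod and container names, the image, and the class names are hypothetical placeholders that would have to match classes configured for the RDT and block I/O controllers; only the annotation keys come from this release.

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod                                              # hypothetical name
      annotations:
        # assign container 'app' to a pre-configured RDT class (hypothetical class name)
        rdtclass.cri-resource-manager.intel.com/container.app: highprio
        # assign the whole pod to a pre-configured block I/O class (hypothetical class name)
        blockioclass.cri-resource-manager.intel.com/pod: throttled
        # prefer DRAM and PMEM memory for container 'app'
        memory-type.cri-resource-manager.intel.com/container.app: dram,pmem
    spec:
      containers:
        - name: app                                                  # hypothetical name
          image: busybox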

v0.2.0

4 years ago

This release implements a more general, unified mechanism for handling runtime configuration.

v0.1.0

4 years ago

Initial release of the project, with major functionality available in an alpha state.

Note: this is a pre-production Alpha release. Not for production use!