Netutils Linux Versions

A suite of utilities simplifying Linux networking stack performance troubleshooting and tuning.

v2.7.9

5 years ago

RSS

The rss-ladder tool now supports PCI-slot-based queue naming. I mean interrupt names like this:

 30:  127355089          0          0          0   PCI-MSI-edge      mlx5_comp0@pci:0000:01:00.0
 31:  120112828    5482507          0          0   PCI-MSI-edge      mlx5_comp1@pci:0000:01:00.0
 32:  121978940          0    5524729          0   PCI-MSI-edge      mlx5_comp2@pci:0000:01:00.0
 33:  122736116          0          0    5465612   PCI-MSI-edge      mlx5_comp3@pci:0000:01:00.0

How to tune it?

$ rss-ladder pci:0000:01:00.0
- distribute interrupts of pci:0000:01:00.0 (mlx5_async_eq) on socket 0
- distribute interrupts of pci:0000:01:00.0 (mlx5_cmd_eq) on socket 0
- distribute interrupts of pci:0000:01:00.0 (mlx5_comp) on socket 0
  - pci:0000:01:00.0: queue mlx5_comp0@pci:0000:01:00.0 (irq 30) bound to CPU0
  - pci:0000:01:00.0: queue mlx5_comp1@pci:0000:01:00.0 (irq 31) bound to CPU1
  - pci:0000:01:00.0: queue mlx5_comp2@pci:0000:01:00.0 (irq 32) bound to CPU2
  - pci:0000:01:00.0: queue mlx5_comp3@pci:0000:01:00.0 (irq 33) bound to CPU3
- distribute interrupts of pci:0000:01:00.0 (mlx5_pages_eq) on socket 0

It may not be perfect, but it works. Well, at least for the mlx5 driver.
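A minimal sketch of the idea, not rss-ladder's actual code: pick the queue IRQs for a PCI slot out of /proc/interrupts-style text and hand them out to CPUs one by one. The names find_queue_irqs, ladder and affinity_mask are made up for illustration, and the default mlx5_comp prefix is simply taken from the output above.

```python
SAMPLE = """\
 30:  127355089          0          0          0   PCI-MSI-edge      mlx5_comp0@pci:0000:01:00.0
 31:  120112828    5482507          0          0   PCI-MSI-edge      mlx5_comp1@pci:0000:01:00.0
 32:  121978940          0    5524729          0   PCI-MSI-edge      mlx5_comp2@pci:0000:01:00.0
 33:  122736116          0          0    5465612   PCI-MSI-edge      mlx5_comp3@pci:0000:01:00.0
"""


def find_queue_irqs(interrupts_text, pci_slot, prefix="mlx5_comp"):
    """Return [(irq, queue_name)] for queues named <prefix>N@<pci_slot>."""
    queues = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip empty lines and non-numeric rows such as "NMI:" or the CPU header
        if len(fields) < 2 or not fields[0].rstrip(":").isdigit():
            continue
        name = fields[-1]
        if name.startswith(prefix) and name.endswith("@" + pci_slot):
            queues.append((int(fields[0].rstrip(":")), name))
    return queues


def ladder(queues, cpus):
    """Assign queues to CPUs in order, wrapping around if queues outnumber CPUs."""
    return [(irq, name, cpus[i % len(cpus)]) for i, (irq, name) in enumerate(queues)]


def affinity_mask(cpu):
    """Hex mask with only the bit for `cpu` set."""
    return "{0:x}".format(1 << cpu)


for irq, name, cpu in ladder(find_queue_irqs(SAMPLE, "pci:0000:01:00.0"), [0, 1, 2, 3]):
    print("queue {0} (irq {1}) -> CPU{2}, mask {3}".format(name, irq, cpu, affinity_mask(cpu)))
```

To actually apply such a plan you would write each mask to /proc/irq/<irq>/smp_affinity, which needs root.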

RPS

The autorps tool no longer yells at you with a dreadful exception if you try to tune a multiqueue NIC. It just says that this may be a bad idea and that you should use the -f flag to really change RPS settings.

Also, some processors have inverted CPU masks in the rps_cpus file, so you could end up putting all processing on a foreign NUMA node. I don't know how these masks work and don't want to know, so the default behaviour now is to copy the mask from /sys/class/net/$dev/device/local_cpus instead of calculating it.
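Roughly, "copy the mask instead of calculating it" could look like the sketch below. This is an illustration under assumptions, not autorps itself: the function name, the rx-0 default queue and the sysfs_root parameter (added so the demo can run against a throwaway directory instead of the real /sys) are all hypothetical.

```python
import os
import tempfile


def copy_local_cpus_to_rps(dev, queue="rx-0", sysfs_root="/sys/class/net"):
    """Copy the NIC's local_cpus mask into a queue's rps_cpus file, unchanged."""
    src = os.path.join(sysfs_root, dev, "device", "local_cpus")
    dst = os.path.join(sysfs_root, dev, "queues", queue, "rps_cpus")
    with open(src) as f:
        mask = f.read().strip()
    with open(dst, "w") as f:
        f.write(mask)
    return mask


# Demo against a fake sysfs tree, so no root access or real NIC is needed
fake_root = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_root, "eth0", "device"))
os.makedirs(os.path.join(fake_root, "eth0", "queues", "rx-0"))
with open(os.path.join(fake_root, "eth0", "device", "local_cpus"), "w") as f:
    f.write("00000000,0000003f\n")

applied = copy_local_cpus_to_rps("eth0", sysfs_root=fake_root)
print(applied)
```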

v2.7.5

6 years ago

New class structure

There is a new class structure in the server-info utility and the netutils_linux_hardware package:

The Server class manages five subsystems: CPU, Disk, Net, Memory and System.

Server (server-info) can collect (--collect), read (--show) and rate (--rate) data.

Before the refactoring there were 3 big classes: Reader, Parser (--show) and Assessor (--rate). They had duplicated data about subsystems. Well, there were no "subsystems", there were just a lot of functions with prefixes in those classes. Now all those functions live in their own subsystems, and all subsystems have a standardised API (that's very cool for the Server class, it can just iterate over subsystems).
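To give a feeling for the "Server just iterates over subsystems" idea, here is a small sketch. It is not the real netutils_linux_hardware code: the method names, the single Memory subsystem and its rating scale are assumptions made up for the example.

```python
class Subsystem(object):
    """The common API every subsystem is assumed to implement."""
    name = None

    def show(self, raw):
        """Turn collected raw data into a structured dict."""
        raise NotImplementedError

    def rate(self, info):
        """Turn structured data into ratings from 1 to 10."""
        raise NotImplementedError


class Memory(Subsystem):
    name = "memory"

    def show(self, raw):
        return {"MemTotal": int(raw["MemTotal"])}

    def rate(self, info):
        # Hypothetical scale: one point per GiB (MemTotal is in kB), clamped to 1..10
        return {"MemTotal": max(1, min(10, info["MemTotal"] // (1024 * 1024)))}


class Server(object):
    def __init__(self, subsystems):
        self.subsystems = subsystems

    def show(self, raw):
        return {s.name: s.show(raw[s.name]) for s in self.subsystems}

    def rate(self, raw):
        # No per-subsystem special cases: the shared API does all the work
        return {s.name: s.rate(s.show(raw[s.name])) for s in self.subsystems}


server = Server([Memory()])
print(server.rate({"memory": {"MemTotal": "500196"}}))
```

Adding Disk or Net would then only mean writing another Subsystem subclass and passing it to Server.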

Folding

There is a new Folding class with all the folding logic and constants. Also, the -f, -ff and -fff args are gone; use --device, --subsystem and --server instead.
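For illustration only, folding can be thought of as collapsing a nested dict of ratings down to a given level. The sketch below assumes folding is a plain arithmetic mean of everything underneath, which these notes do not actually state; the fold function, its depth argument and the sample numbers are all hypothetical and not the Folding class itself.

```python
def mean(values):
    values = list(values)
    return sum(values) / float(len(values))


def fold(ratings, depth):
    """Keep `depth` levels of keys and average everything below them."""
    if not isinstance(ratings, dict):
        return ratings
    if depth == 0:
        return mean(fold(value, 0) for value in ratings.values())
    return {key: fold(value, depth - 1) for key, value in ratings.items()}


ratings = {
    "cpu": {"BogoMIPS": 5, "L3 cache": 7},
    "net": {
        "eth0": {"driver": 2, "queues": 4},
        "eth1": {"driver": 10, "queues": 10},
    },
}

print(fold(ratings, 2))  # device level, like --device
print(fold(ratings, 1))  # subsystem level, like --subsystem
print(fold(ratings, 0))  # one number for the whole server, like --server
```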

Other things

Some code was simplified and I restored running the tests for server-info --rate. Also, I got rid of the six.iteritems dependency in a few places and just use .items(). There is no big data there, so I don't think that saving 2-3 kbits of RAM is worth more than code simplicity.

v2.7.4

6 years ago

New options are available:

  --cpu                 Show information about CPU
  --memory              Show information about RAM
  --net                 Show information about network devices
  --disk                Show information about disks
  --system              Show information about system overall (rate only)

Example:

# server-info --rate --device --net --cpu
cpu:
  BogoMIPS: 5
  CPU MHz: 5
  CPU(s): 5
  Core(s) per socket: 10
  L3 cache: 7
  Socket(s): 10
  Thread(s) per core: 10
  Vendor ID: 10
net:
  eth1: 3.6666666666666665
  eth6: 9.666666666666666
  eth7: 9.666666666666666

It also works with --show:

# server-info --show --memory --disk 
disk:
  vda:
    model: null
    size: 21474836480
    type: HDD
memory:
  devices:
    '0x1100':
      size: '512'
      speed: 0
      type: RAM
  size:
    MemFree: 78272
    MemTotal: 500196
    SwapFree: 0
    SwapTotal: 0

Also, if you run server-info without the necessary parameters, it shows a more human-oriented error and the --help output instead of a traceback with AssertionError.
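One way to get that kind of behaviour with argparse, as a rough sketch: the real server-info argument handling may look quite different, and the error message text is invented here. Only the --collect, --show and --rate flags come from these notes.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="server-info")
    parser.add_argument("--collect", action="store_true")
    parser.add_argument("--show", action="store_true")
    parser.add_argument("--rate", action="store_true")
    args = parser.parse_args(argv)
    if not (args.collect or args.show or args.rate):
        # Friendly failure: full help first, then a short error and exit code 2
        parser.print_help()
        parser.error("one of --collect, --show or --rate is required")
    return args


print(parse_args(["--rate"]))
```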

v2.7.2

6 years ago

Well, I failed the challenge to release it before the new year 2018. But better late than never!

Details are boring, you'd better look at the examples in the README!

  1. All the server-info-* utils have one entry point now.
  2. The old server-info-rate and server-info-show utils are deleted.
  3. server-info-collect is called via a wrapper until it is rewritten in Python. That's a separate issue.
  4. Added (and fixed) tests for the server-info --rate feature. It doesn't fail on any of the examples in ./tests/server-info-show.test/
  5. Yes, a utility call looks like this now: server-info --rate instead of server-info-rate
  6. You can collect data and optionally pack it into a tarball with server-info --directory <path-to-directory> --gzip. It will make <path-to-directory>.tar.gz with all the data, which you can take from the server for later analysis.
  7. New examples are already in the README.
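The packing step from item 6 boils down to something like the sketch below, written with the standard tarfile module. It is an assumption about the approach, not the actual server-info code, and pack_directory is a made-up name; the demo packs a temporary directory with one dummy file.

```python
import os
import tarfile
import tempfile


def pack_directory(directory):
    """Create <directory>.tar.gz next to the directory and return its path."""
    directory = directory.rstrip(os.sep)
    archive = directory + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. server/lscpu
        tar.add(directory, arcname=os.path.basename(directory))
    return archive


workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, "server")
os.makedirs(data_dir)
with open(os.path.join(data_dir, "lscpu"), "w") as f:
    f.write("CPU(s): 4\n")

archive = pack_directory(data_dir)
print(archive)
```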

v2.7.1

6 years ago

You can now skip details of your server's rating this way:

Usage

  • server-info-rate -f - shows the rating of each device as a whole
  • server-info-rate -ff - shows the rating of each subsystem as a whole
  • server-info-rate -fff - shows the rating of the entire server

Example

$ server-info-rate -fff
WARNING: why do you use 20 years old hardware, dude?

Just a joke. For example:

$ server-info-rate -ff
cpu: 4.5
disk: 1.0
memory: 1.0
net: 1.3333333333333333
system: 1.0

It can't be used directly via a server-info rate call; you have to go to /root/server/ before running server-info-rate. I know it's shitty usability, I'll fix it this week in 2.7.2.

v2.6.1

6 years ago

Good news first

There is optional dmidecode support, now you can see how good your RAM is:

[screenshot: data rate]

Bad news

Python 2.6 is deprecated so hard that it's probably impossible to use pytest to run tests in Travis CI anymore. I tried to dig into the details, and the problem is probably not in pytest or Python, but in pip: pip 8.0.1 installed by yum in CentOS 6 installed the latest version of pytest without any problems, while pip 9.0.1 refused. I don't know how to set the version of pip for the Python 2.6 environment in Travis, so I just removed it from .travis.yml. What does it mean? A higher probability of introducing bugs specific to Python 2.6: some modern syntax usage, etc. But is it a problem?

The main py2.6 users are CentOS 6 users. There is python3.4 in EPEL for CentOS 6, so it's still possible to use. Also, there are Carbon Reductor 7 users. Well, most of them have old versions installed already and everything works. In Carbon Reductor 8 I will move netutils-linux from the py2.6 env to 3.4 in January, when I upgrade it to a new version (from... 2.0?).

We (Carbon Soft) also offer help with migrating from Carbon Reductor 7 to Carbon Reductor 8. So probably no one will suffer from bugs (I hope), but feel free to create issues.

v2.5.0

6 years ago

Fixed:

  • Highlighting of CPU/NIC could break rss-ladder
  • NUMA and socket layouts were mixed up
  • lscpu output could not be parsed correctly in Python 3, so everything that needs CPU topology thought that you had only one logical CPU

And WOW, THE FEATURE OF THE YEAR:

  • you may pass a file with lscpu -p output to network-top for debugging purposes.
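For reference, lscpu -p output is easy to turn into a CPU layout. The sketch below is not the parser netutils-linux uses; parse_lscpu_p and the sample text are made up for illustration. It accepts both str and bytes, since subprocess output is bytes under Python 3 unless decoded.

```python
SAMPLE = b"""# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,1,0,0,,1,1,1,0
2,2,1,1,,2,2,2,1
3,3,1,1,,3,3,3,1
"""


def parse_lscpu_p(text):
    """Return one dict per logical CPU with its core, socket and NUMA node."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    layout = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cpu, core, socket, node = line.split(",")[:4]
        layout.append({
            "cpu": int(cpu),
            "core": int(core),
            "socket": int(socket),
            # The node column can be empty on machines without NUMA
            "node": int(node) if node else 0,
        })
    return layout


layout = parse_lscpu_p(SAMPLE)
print(len(layout), "logical CPUs on", len(set(c["socket"] for c in layout)), "sockets")
```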

v2.3.0

6 years ago

Basic /proc/net/snmp file watcher:

[screenshot: /proc/net/snmp watcher output]

Finished in just one day!

There is much more to do: https://github.com/strizhechenko/netutils-linux/issues/144
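The core of such a watcher is small: /proc/net/snmp comes as pairs of lines (a header line with field names, then a line with values), so you parse two snapshots and subtract. Below is a sketch under that assumption, not the tool's real code; parse_snmp, diff and the trimmed sample snapshots are illustrative.

```python
OLD = """Ip: InReceives InDiscards OutRequests
Ip: 100 0 90
Udp: InDatagrams NoPorts InErrors
Udp: 10 1 0
"""

NEW = """Ip: InReceives InDiscards OutRequests
Ip: 150 2 120
Udp: InDatagrams NoPorts InErrors
Udp: 25 1 3
"""


def parse_snmp(text):
    """Parse header/value line pairs into {protocol: {counter: value}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats


def diff(old, new):
    """Per-counter difference between two parsed snapshots."""
    return dict(
        (proto, dict((key, new[proto][key] - old[proto][key]) for key in new[proto]))
        for proto in new
    )


delta = diff(parse_snmp(OLD), parse_snmp(NEW))
print(delta)
```

In a real watcher the two snapshots would be reads of /proc/net/snmp taken an interval apart.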

v2.2.5

6 years ago

Well, the only difference between RPS and XPS tuning is the queue prefix (rx and tx)... So 25 lines were changed, and now you are able to distribute packet transmission between CPUs even with a single-queue NIC!

Example:

# autoxps eth0
Using mask 'ff' for eth0-tx-0
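The "only the prefix differs" point can be shown in a few lines. This is a sketch, not the actual autorps/autoxps code, and queue_mask_file is a hypothetical helper; the sysfs file names rps_cpus and xps_cpus are the standard kernel ones.

```python
def queue_mask_file(dev, index, mode):
    """Path of the CPU mask file for one queue; mode is 'rps' or 'xps'."""
    prefix, filename = {
        "rps": ("rx", "rps_cpus"),
        "xps": ("tx", "xps_cpus"),
    }[mode]
    return "/sys/class/net/{0}/queues/{1}-{2}/{3}".format(dev, prefix, index, filename)


print(queue_mask_file("eth0", 0, "rps"))
print(queue_mask_file("eth0", 0, "xps"))
```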

v2.2.0

6 years ago

Autorps didn't work on systems with multiple NUMA nodes or CPU sockets before, because it calculated the CPU mask from the total CPU count and wasn't aware of CPU/NUMA topology.

It has been rewritten in Python. Now you can:

  1. Use it on single-queue NICs in multi-NUMA systems. The CPU socket/NUMA node to bind network packet processing to will be chosen automatically (rss-ladder is able to do it too!) by reading /sys/class/net/$NIC/device/numa_node (fallback - 0).
  2. --force it to work with a multiqueue NIC.
  3. Pass a custom CPU mask: --cpu-mask=fe
  4. Pass a custom CPU list: --cpus 0 2 4 6 (at the end of the options).
  5. Test it before using with --dry-run: it will print something like Using mask 'fc0' for eth0-rx-0.
  6. Explicitly define the socket to bind queues to with --socket=1. Why would you ever need it? Because you may find out that moving this NIC to another NUMA node gives you better performance than putting all your NICs on the device's local NUMA node (and you can't put the NIC in that NUMA node's PCI slot right now).
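Two of the pieces above are simple enough to sketch: turning a CPU list into a hex mask, and reading numa_node with a fallback to 0. This is an illustration, not the autorps source; the function names and the sysfs_root parameter are assumptions.

```python
import os
import tempfile


def cpus_to_mask(cpus):
    """Hex mask with one bit set per CPU number, e.g. for rps_cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return "{0:x}".format(mask)


def detect_numa_node(dev, sysfs_root="/sys/class/net"):
    """NUMA node of the NIC; 0 if the file is missing, unreadable or negative."""
    path = os.path.join(sysfs_root, dev, "device", "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return 0
    # The kernel reports -1 when the device has no NUMA affinity
    return node if node >= 0 else 0


print(cpus_to_mask([0, 2, 4, 6]))   # the --cpus 0 2 4 6 example
print(cpus_to_mask(range(6, 12)))   # CPUs 6..11
print(detect_numa_node("eth0", sysfs_root=tempfile.mkdtemp()))  # no such file
```

With CPUs 6 to 11, the second call gives the same fc0 mask as the --dry-run example above.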

Also, I accidentally dropped a .pylintrc into the repo and fixed all the small pep8 violations and other code smells that landscape.io was hiding with default settings.