Standalone Linux Io Tracer Save

Standalone Linux IO Tracer (iotrace) is a tool for block device and filesystem I/O tracing

Project README

Standalone Linux IO Tracer

NEWS

Standalone Linux IO Tracer switched to using eBPF for capturing traces. Previously, the tracer ran custom loadable kernel module for that. eBPF tracing method is more secure for user. The version which uses the kernel module will be obsoleted. In case you want to run old version, switch to this branch: master-kernel

Description

Standalone Linux IO Tracer (iotrace) is a tool for block device I/O tracing and management of created traces

For each I/O to target device(s) basic metadata information is captured (IO operation type, address, size), supplemented with extended classification. Extended classification contains information about I/O type (direct / filesystem metadata / file) and target file attributes(e.g. file size).

iotrace is based on Open CAS Telemetry Framework (OCTF). Collected traces are stored in OCTF trace location. Traces can later be converted to JSON or CSV format.

The iotrace executable (iotrace command line application) includes an eBPF program which is loaded to the Linux kernel during tracing. The eBPF program captures trace information and shares them to the userspace iotrace application. This is serialized to the OCTF IO trace.

In this readme:

Supported OS

Right now the compilation of Standalone Linux IO Tracer is tested on the following OSes:

OS Version Kernel Version
Fedora 36 5.18.10

Source Code

Source code is available in the official Standalone Linux IO Tracer GitHub repository:

git clone https://github.com/open-cas/standalone-linux-io-tracer
cd standalone-linux-io-tracer

Deployment

Prerequisites

  • To build and use Standalone Linux IO Tracer, setup prerequisites first in the following way:

    git submodule update --init --recursive
    sudo ./setup_dependencies.sh
    

    Installed dependencies include OCTF, Google Protocol Buffers, CMake and Google Test. The dependencies are either installed with dnf/yum/apt or installed to a dedicated directory /opt/octf/ to avoid overwriting already installed ones.

Build

To build the iotrace executable invoke:

make

You can try to create rpm/deb installation package.

make package

For example in case of Fedora OS, the generated installation package is located in build/release/iotrace-XX.YY.ZZ-1.x86_64.rpm.

Installation

To install iotrace call:

sudo make install

Also you can try to install iotrace using rpm/deb package:

rpm -Uvh iotrace-XX.YY.ZZ-1.x86_64.rpm.

Theory of operation

Standalone Linux IO Tracer captures request data by registering to multiple trace points surfaced by the Linux kernel (e.g. BIO queueing, BIO splitting, BIO completion). This allows for gathering of IO metadata at the request level and passing it between kernel and userspace.

A perf buffer is allocated and shared between the eBPF program and the userspace application. The below example shows a recorded traces event.

struct iotrace_event_hdr {
    /** Event sequence ID */
    log_sid_t sid;

    /** Time stamp */
    uint64_t timestamp;

    /** Trace event type, iotrace_event_type enunerator */
    uint32_t type;

    /** Size of this event, including header */
    uint32_t size;
} __attribute__((packed, aligned(8)));

...

struct iotrace_event {
    /** Trace event header */
    struct iotrace_event_hdr hdr;
    /**
     * @brief IO ID
     *
     * This ID can be used by the tracing environment to assign an ID to the IO.
     *
     * @note Zero means not set.
     */
    uint64_t id;

    /** Address of IO in sectors */
    uint64_t lba;

    /** Size of IO in sectors */
    uint32_t len;

    /** IO class of IO */
    uint32_t io_class;

    /** Device ID */
    uint32_t dev_id;

    /** Operation flags: flush, fua, ... .
     * Values according to iotrace_event_flag_t enum
     * are summed (OR-ed) together. */
    uint32_t flags;

    /** Operation type: read, write, discard
     * (iotrace_event_operation_t enumerator) **/
    uint8_t operation;

    /** Write hint associated with IO */
    uint8_t write_hint;
} __attribute__((packed, aligned(8)));

The events declaration file can be found here.

The userspace part of the Standalone Linux IO Tracer reads the entries from the perf buffer and translates them into Google Protocol Buffer format (see example below), for easier portability. The data is then serialized in trace files in a per CPU basis (e.g. octf.trace.0).

message EventHeader {
    /** Event sequence ID */
    uint64 sid = 1;

    /** Time stamp */
    uint64 timestamp = 2;
}

...

enum IoType {
    UnknownIoType = 0;
    Read = 1;
    Write = 2;
    Discard = 3;
}

...

message EventIo {
    /** Address of IO in sectors */
    uint64 lba = 1;

    /** Size of IO in sectors */
    uint32 len = 2;

    /** IO class of IO */
    uint32 ioClass = 3;

    /** Device ID */
    uint64 deviceId = 4;

    /** Operation type: read, write, trim */
    IoType operation = 5;

    /** Flush flag */
    bool flush = 6;

    /** FUA flag */
    bool fua = 7;

    /** Write (lifetime) hint */
    uint32 writeHint = 8;

    /**
     * This ID can be used by the tracing environment to assign an ID to the IO.
     * Zero means not set.
     */
    uint64 id = 9;
}

...

message Event {
    /** Trace event header */
    EventHeader header = 1;

    oneof EventType {
        EventIo io = 2;
        EventDeviceDescription deviceDescription = 3;
        EventIoFilesystemMeta filesystemMeta = 4;
        EventIoCompletion ioCompletion = 5;
        EventIoFilesystemFileName filesystemFileName = 6;
        EventIoFilesystemFileEvent filesystemFileEvent = 7;
    }
}

The protobuf events declaration file can be found here.

You may see the results of translating into the above protobuf format, by executing the following command:

iotrace --trace-parser --io --path "kernel/2020-07-02_08:52:51" --raw

Output:

...
{"header":{"sid":"1","timestamp":"14193058940837"},"deviceDescription":{"id":"271581186","name":"nvme0n1","size":"732585168","model":"INTEL SSDPED1K375GA"}}
{"header":{"sid":"73","timestamp":"14196894550578"},"io":{"lba":"1652296","len":256,"ioClass":19,"deviceId":"271581186","operation":"Write","flush":false,"fua":false,"writeHint":0,"id":"110842991263647"}}
{"header":{"sid":"74","timestamp":"14196894550696"},"filesystemMeta":{"refSid":"110842991263647","fileId":{"partitionId":"271581186","id":"76","creationDate":"2020-07-02T06:52:55.712990641Z"},"fileOffset":"0","fileSize":"241960"}}
...

After tracing is complete these singular trace events may be parsed, combined and translated into different Google Protocol Buffer messages (or other formats, such as CSV) when executing Standalone Linux IO Tracer trace parser commands.

For example the --trace-parser --io command analyzes multiple submission, split and completion events to give a more complete view of a given IO request such as: its latency, queue depth, file size and path (if applicable) etc.

Examples

  • Start tracing two block devices for 1 hour, or until trace file is 1GiB:

    sudo iotrace --start-tracing --devices /dev/sda,/dev/sdb1 --time 3600 --size 1024
    
    • The below output example is based on sample traces found here. The traces were captured during YCSB workload type A benchmark on RocksDB.

    Traces can be unpacked using the following command:

    tar -xf sample_trace.tar.xz
    

    They can then be moved to the trace repository directory, the path of which can be extracted using:

    iotrace --trace-config --get-trace-repository-path
    
  • List created traces:

    iotrace --trace-management --list-traces
    

    Output:

    {
    "trace": [
      {
       "tracePath": "kernel/2020-07-02_08:52:51",
       "state": "COMPLETE"
       "tags": {}
      }
    ]
    }
    
  • Parse traces (note usage of path returned in --list-traces):

    iotrace --trace-parser --io --path "kernel/2020-07-02_08:52:51" --format json
    

    Output:

    {"header":{"sid":"1","timestamp":"3835590186"},"io":{"lba":"1652296","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"83797","qd":"1","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"0","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"2","timestamp":"3835625267"},"io":{"lba":"1652552","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"95069","qd":"2","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"256","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"3","timestamp":"3835638717"},"io":{"lba":"1652808","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"130094","qd":"3","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"512","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    {"header":{"sid":"4","timestamp":"3835652180"},"io":{"lba":"1653064","len":256,"ioClass":19,"operation":"Write","flush":false,"fua":false,"error":false,"latency":"203209","qd":"4","writeHint":0},"device":{"id":"271581186","name":"nvme0n1","partition":"271581186","model":"INTEL SSDPED1K375GA"},"file":{"id":"76","offset":"768","size":"241960","path":"/000014.sst","eventType":"Access","creationDate":"2020-07-02T06:52:55.712990641Z"}}
    ...
    

    NOTE: Any mention of LBA assumes a 512B sector size, even if the underlying drive is formatted to 4096B sectors. Similarly a 512B sector is the unit of length.

  • Show trace statistics:

    iotrace --trace-parser --statistics -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    "statistics": [
    {
     "desc": {
      "device": {
       "id": "271581186",
       "name": "nvme0n1",
       "size": "732585168" >  NOTE:  In sectors
      }
     },
     "duration": "24650525234",  NOTE:  In nanoseconds
     ...
     "write": {  NOTE:  Statistics for writes
      "size": {
       "unit": "sector",
       "average": "254",  NOTE:  Average write I/O size in sectors
       "min": "1",  NOTE:  Minimum write I/O size in sectors
       "max": "256",  NOTE:  Maximum write I/O size in sectors
       "total": "16338524",  NOTE:  Total write I/O size in sectors
       ...
      "latency": {
       "unit": "ns",
       "average": "15071365",  NOTE:  Average write I/O latency in ns
       "min": "9942",  NOTE:  Minimum write I/O latency in ns
       "max": "63161202",  NOTE:  Maximum write I/O latency in ns
      ...
      "count": "64188",  NOTE:  Number of write operations
      "metrics": {
       "throughput": {
        "unit": "IOPS",
        "value": 2603.9201757643164  NOTE:  Average write IOPS
       },
       "workset": {
        "unit": "sector",
        "value": 2237616  NOTE:  Number of distinct sectors written
       },
       "bandwidth": {
        "unit": "MiB/s",
        "value": 323.63590009317039  NOTE:  Average write bandwidth
       }
    ...
    

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

  • Show file system statistics:

    iotrace --trace-parser --fs-statistics -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    "entries": [
     {
      "deviceId": "271581186",
      "partitionId": "271581186",
      "statistics": {
     ...
     "write": {
       "size": {
        "unit": "sector",
        "average": "255",  NOTE:  Average write I/O size in sectors
        "min": "8",  NOTE:  Minimum write I/O size in sectors
        "max": "256",  NOTE:  Maximum write I/O size in sectors
        "total": "16336216",  NOTE:  Total write I/O size in sectors
        "percentiles": {}
       },
       "count": "63984",
       "metrics": {
        "throughput": {
         "unit": "IOPS",
         "value": 2602.9390810897676
        },
        "workset": {
         "unit": "sector",
         "value": 2237096  NOTE:  Number of distinct sectors written
        },
        "write invalidation factor": {
         "unit": "",
         "value": 7.3024206381845032  NOTE:  Average number of times a given sector in the workset is written to
        },
        "bandwidth": {
         "unit": "MiB/s",
         "value": 324.49957478019985  NOTE:  Average write bandwidth
        }
       },
       ...
       "directory": "/"  NOTE:  Directories are relative to the filesystem mountpoint
    

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

    NOTE: File system statistics are gathered for detected groups of related I/O requests with common attributes like directory, file extension, file name prefix or IO class.

  • Show latency histogram:

    iotrace --trace-parser --latency-histogram -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    ...
     "write": {
      "unit": "ns",
      "range": [
       {
        "begin": "0",
        "end": "0",
        "count": "0"
       },
       ...
       {
        "begin": "8192",  NOTE:  Minimum bound of latency histogram bucket in nanoseconds
        "end": "16383",  NOTE:  Maximum bound of latency histogram bucket in nanoseconds
        "count": "95"  NOTE:  Number of I/O requests in the bucket
       },
    ...
    

    NOTE: In the above example 95 write requests had latency between 8192 and 16383 ns.

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

  • Show latency histogram:

    iotrace --trace-parser --lba-histogram -p "kernel/2020-07-02_08:52:51/"
    

    Output:

    ...
     "write": {
      "unit": "sector",
      "range": [
       {
        "begin": "0",   NOTE:  Minimum disk LBA bound of latency histogram bucket
        "end": "20479",  NOTE:  Maximum disk LBA bound of latency histogram bucket
        "count": "177"  NOTE:  Number of I/O requests in the bucket
       },
    ...
    

    NOTE: In the above example 177 write requests were issued to disk LBAs 0 through 20479.

    NOTE: Similar statistics exist for any read, discard or flush operations. There's a section for the combined statistics as well (Total).

Tests

See our tests README

Contributing

Please refer to the OCTF contributing guide.

Please explore related projects:

Notice

NOTICE contains more information

Open Source Agenda is not affiliated with "Standalone Linux Io Tracer" Project. README Source: Open-CAS/standalone-linux-io-tracer
Stars
58
Open Issues
10
Last Commit
1 month ago
License

Open Source Agenda Badge

Open Source Agenda Rating