Demonstrations of biolatency, the Linux eBPF/bcc version.


biolatency traces block device I/O (disk I/O), and records the distribution
of I/O latency (time), printing this as a histogram when Ctrl-C is hit.
For example:

# ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 1        |                                      |
     128 -> 255      : 12       |********                              |
     256 -> 511      : 15       |**********                            |
     512 -> 1023     : 43       |*******************************       |
    1024 -> 2047     : 52       |**************************************|
    2048 -> 4095     : 47       |**********************************    |
    4096 -> 8191     : 52       |**************************************|
    8192 -> 16383    : 36       |**************************            |
   16384 -> 32767    : 15       |**********                            |
   32768 -> 65535    : 2        |*                                     |
   65536 -> 131071   : 2        |*                                     |

The latency of the disk I/O is measured from the issue to the device to its
completion. A -Q option can be used to include time queued in the kernel.

This example output shows a large mode of latency from about 128 microseconds
to about 32767 microseconds (33 milliseconds). The bulk of the I/O was
between 1 and 8 ms, which is the expected block device latency for
rotational storage devices.

The highest latency seen while tracing was between 65 and 131 milliseconds:
the last row printed, for which there were 2 I/O.

For efficiency, biolatency uses an in-kernel eBPF map to store timestamps
with requests, and another in-kernel map to store the histogram (the "count")
column, which is copied to user-space only when output is printed. These
methods lower the performance overhead when tracing is performed.


In the following example, the -m option is used to print a histogram using
milliseconds as the units (which eliminates the first several rows), -T to
print timestamps with the output, and to print 1 second summaries 5 times:

# ./biolatency -mT 1 5
Tracing block device I/O... Hit Ctrl-C to end.

06:20:16
     msecs           : count     distribution
       0 -> 1        : 36       |**************************************|
       2 -> 3        : 1        |*                                     |
       4 -> 7        : 3        |***                                   |
       8 -> 15       : 17       |*****************                     |
      16 -> 31       : 33       |**********************************    |
      32 -> 63       : 7        |*******                               |
      64 -> 127      : 6        |******                                |

06:20:17
     msecs           : count     distribution
       0 -> 1        : 96       |************************************  |
       2 -> 3        : 25       |*********                             |
       4 -> 7        : 29       |***********                           |
       8 -> 15       : 62       |***********************               |
      16 -> 31       : 100      |**************************************|
      32 -> 63       : 62       |***********************               |
      64 -> 127      : 18       |******                                |

06:20:18
     msecs           : count     distribution
       0 -> 1        : 68       |*************************             |
       2 -> 3        : 76       |****************************          |
       4 -> 7        : 20       |*******                               |
       8 -> 15       : 48       |*****************                     |
      16 -> 31       : 103      |**************************************|
      32 -> 63       : 49       |******************                    |
      64 -> 127      : 17       |******                                |

06:20:19
     msecs           : count     distribution
       0 -> 1        : 522      |*************************************+|
       2 -> 3        : 225      |****************                      |
       4 -> 7        : 38       |**                                    |
       8 -> 15       : 8        |                                      |
      16 -> 31       : 1        |                                      |

06:20:20
     msecs           : count     distribution
       0 -> 1        : 436      |**************************************|
       2 -> 3        : 106      |*********                             |
       4 -> 7        : 34       |**                                    |
       8 -> 15       : 19       |*                                     |
      16 -> 31       : 1        |                                      |

How the I/O latency distribution changes over time can be seen.



The -Q option begins measuring I/O latency from when the request was first
queued in the kernel, and includes queuing latency:

# ./biolatency -Q
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 3        |*                                     |
     256 -> 511      : 37       |**************                        |
     512 -> 1023     : 30       |***********                           |
    1024 -> 2047     : 18       |*******                               |
    2048 -> 4095     : 22       |********                              |
    4096 -> 8191     : 14       |*****                                 |
    8192 -> 16383    : 48       |*******************                   |
   16384 -> 32767    : 96       |**************************************|
   32768 -> 65535    : 31       |************                          |
   65536 -> 131071   : 26       |**********                            |
  131072 -> 262143   : 12       |****                                  |

This better reflects the latency suffered by the application (if it is
synchronous I/O), whereas the default mode without kernel queueing better
reflects the performance of the device.

Note that the storage device (and storage device controller) usually have
queues of their own, which are always included in the latency, with or
without -Q.


The -D option will print a histogram per disk. Eg:

# ./biolatency -D
Tracing block device I/O... Hit Ctrl-C to end.
^C

Bucket disk = 'xvdb'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 33       |**********************                  |
       512 -> 1023       : 36       |************************                |
      1024 -> 2047       : 58       |****************************************|
      2048 -> 4095       : 51       |***********************************     |
      4096 -> 8191       : 21       |**************                          |
      8192 -> 16383      : 2        |*                                       |

Bucket disk = 'xvdc'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 38       |***********************                 |
       512 -> 1023       : 42       |*************************               |
      1024 -> 2047       : 66       |****************************************|
      2048 -> 4095       : 40       |************************                |
      4096 -> 8191       : 14       |********                                |

Bucket disk = 'xvda1'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 18       |**********                              |
       512 -> 1023       : 67       |*************************************   |
      1024 -> 2047       : 35       |*******************                     |
      2048 -> 4095       : 71       |****************************************|
      4096 -> 8191       : 65       |************************************    |
      8192 -> 16383      : 65       |************************************    |
     16384 -> 32767      : 20       |***********                             |
     32768 -> 65535      : 7        |***                                     |

This output sows that xvda1 has much higher latency, usually between 0.5 ms
and 32 ms, whereas xvdc is usually between 0.2 ms and 4 ms.


The -F option prints a separate histogram for each unique set of request
flags. For example:

./biolatency.py -Fm
Tracing block device I/O... Hit Ctrl-C to end.
^C

flags = Read
     msecs               : count     distribution
         0 -> 1          : 180      |*************                           |
         2 -> 3          : 519      |****************************************|
         4 -> 7          : 60       |****                                    |
         8 -> 15         : 123      |*********                               |
        16 -> 31         : 68       |*****                                   |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 2        |                                        |
       128 -> 255        : 12       |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |                                        |

flags = Sync-Write
     msecs               : count     distribution
         0 -> 1          : 5        |****************************************|

flags = Flush
     msecs               : count     distribution
         0 -> 1          : 2        |****************************************|

flags = Metadata-Read
     msecs               : count     distribution
         0 -> 1          : 3        |****************************************|
         2 -> 3          : 2        |**************************              |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |*************                           |
        16 -> 31         : 1        |*************                           |

flags = Write
     msecs               : count     distribution
         0 -> 1          : 103      |*******************************         |
         2 -> 3          : 106      |********************************        |
         4 -> 7          : 130      |****************************************|
         8 -> 15         : 79       |************************                |
        16 -> 31         : 5        |*                                       |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |

flags = NoMerge-Read
     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 5        |****************************************|
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 1        |********                                |

flags = NoMerge-Write
     msecs               : count     distribution
         0 -> 1          : 30       |**                                      |
         2 -> 3          : 293      |********************                    |
         4 -> 7          : 564      |****************************************|
         8 -> 15         : 463      |********************************        |
        16 -> 31         : 21       |*                                       |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 5        |                                        |

flags = Priority-Metadata-Read
     msecs               : count     distribution
         0 -> 1          : 1        |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 1        |****************************************|
         8 -> 15         : 1        |****************************************|

flags = ForcedUnitAccess-Metadata-Sync-Write
     msecs               : count     distribution
         0 -> 1          : 2        |****************************************|

flags = ReadAhead-Read
     msecs               : count     distribution
         0 -> 1          : 15       |***************************             |
         2 -> 3          : 22       |****************************************|
         4 -> 7          : 14       |*************************               |
         8 -> 15         : 8        |**************                          |
        16 -> 31         : 1        |*                                       |

flags = Priority-Metadata-Write
     msecs               : count     distribution
         0 -> 1          : 9        |****************************************|

These can be handled differently by the storage device, and this mode lets us
examine their performance in isolation.


The -e option shows extension summary(total, average)
For example:
# ./biolatency.py -e
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 4        |***********                             |
       128 -> 255        : 2        |*****                                   |
       256 -> 511        : 4        |***********                             |
       512 -> 1023       : 14       |****************************************|
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |**                                      |

avg = 663 usecs, total: 16575 usecs, count: 25

Sometimes 512 -> 1023 usecs is not enough for throughput tuning.
Especially a little difference in performance downgrade.
By this extension, we know the value in log2 range is about 663 usecs.


The -j option prints a dictionary of the histogram.
For example:

# ./biolatency.py -j
^C
{'ts': '2020-12-30 14:33:03', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 2}, {'interval-start': 64, 'interval-end': 127, 'count': 75}, {'interval-start': 128, 'interval-end': 255, 'count': 7}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 6}, {'interval-start': 1024, 'interval-end': 2047, 'count': 3}, {'interval-start': 2048, 'interval-end': 4095, 'count': 31}]}

the key `data` is the list of the log2 histogram intervals. The `interval-start` and `interval-end` define the
latency bucket and `count` is the number of I/O's that lie in that latency range.

# ./biolatency.py -jF
^C
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 1}, {'interval-start': 32, 'interval-end': 63, 'count': 1}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 0}, {'interval-start': 1024, 'interval-end': 2047, 'count': 2}], 'flags': 'Sync-Write'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 2}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 2}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}], 'flags': 'Unknown'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 0}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}], 'flags': 'Write'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 4}], 'flags': 'Flush'}

The -j option used with -F prints a histogram dictionary per set of I/O flags.

# ./biolatency.py -jD
^C
{'ts': '2020-12-30 14:40:00', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 1}, {'interval-start': 64, 'interval-end': 127, 'count': 1}, {'interval-start': 128, 'interval-end': 255, 'count': 1}, {'interval-start': 256, 'interval-end': 511, 'count': 1}, {'interval-start': 512, 'interval-end': 1023, 'count': 6}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}, {'interval-start': 2048, 'interval-end': 4095, 'count': 3}], 'Bucket ptr': b'sda'}

The -j option used with -D prints a histogram dictionary per disk device.

# ./biolatency.py -jm
^C
{'ts': '2020-12-30 14:42:03', 'val_type': 'msecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 11}, {'interval-start': 2, 'interval-end': 3, 'count': 3}]}

The -j with -m prints a millisecond histogram dictionary. The `value_type` key is set to msecs.

USAGE message:

# ./biolatency -h
usage: biolatency.py [-h] [-T] [-Q] [-m] [-D] [-F] [-j]
                              [interval] [count]

Summarize block device I/O latency as a histogram

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -T, --timestamp     include timestamp on output
  -Q, --queued        include OS queued time in I/O time
  -m, --milliseconds  millisecond histogram
  -D, --disks         print a histogram per disk device
  -F, --flags         print a histogram per set of I/O flags
  -e, --extension     also show extension summary(total, average)
  -j, --json          json output

examples:
    ./biolatency                    # summarize block I/O latency as a histogram
    ./biolatency 1 10               # print 1 second summaries, 10 times
    ./biolatency -mT 1              # 1s summaries, milliseconds, and timestamps
    ./biolatency -Q                 # include OS queued time in I/O time
    ./biolatency -D                 # show each disk device separately
    ./biolatency -F                 # show I/O flags separately
    ./biolatency -j                 # print a dictionary
    ./biolatency -e                 # show extension summary(total, average)