    Linux NVM-Express

      Keith Busch
     Feb 7, 2015

    Storage Stack Comparison
      - SAS vs. NVMe device
      - Latency and CPU utilization reduced by 50+%*
      - NVMe: 2.8us, 9,100 cycles
      - SAS: 6.0us, 19,500 cycles

    SCSI-NVMe
      - SG_IO
      - Read/Write 6, 10, 12, 16
      - Inquiry, Mode Sense 10/16, Mode Select 10/16, Log Sense,
        Read Capacity 10/16, Report LUNS, Request Sense,
        Security Protocol In/Out, Start Stop Unit, Test Unit Ready,
        Write Buffer, Unmap

    BIO Splitting
      - Not all I/O vectors can be mapped to an NVMe command’s PRP list
      - Requires virtually contiguous buffers

    Disk Geometry
      - Prevent partitions that create this scenario

    Surprise Removal
      - Additional synchronization and reference counting are needed in
        software to make controller and storage removal safe without
        sacrificing performance

    Asynchronous Event Notification, Mismatched Host-Device Pages,
    PCI-e Advanced Error Reporting, Write Zeroes command

    Block Multi-Queue
      - Removes the software queue contention bottleneck of request-based
        block drivers
      - Moves contention closer to the h/w, reducing latency on
        multi-threaded IO
      - Merged into Linux 3.13; virtio-blk is the only mainline driver

    Block Multi-Queue
      - IO Merging and Scheduling
      - Removes duplicated code in bio-based drivers: Timeouts, Tracing,
        Tagging, Diskstats, Runtime Power Management, Device Removal
      - Latest nvme-mq tree is based on 3.15 and passes stability testing
      - Mainline merge date uncertain

    Future Work
      - Block Polling
      - NVMe 1.1 and 1.2 Support
      - SGL
      - Subsystem
      - Namespace List
      - Performance Tuning
      - Aggregate Doorbell writes
      - Asynchronous Start/Stop
      - Runtime Power Management
      - SR-IOV

    Block Polling
      - IO latency sources beyond NAND: for low-latency devices, context
        switches and interrupts dominate observed latency

    User-space Tooling
      - Provides examples using the NVMe IOCTL interface to send commands
        to the controller and parse the output

    Emulation
      - Useful if you just want to test the driver and user-space tools
      - Performs poorly at imitating NVMe performance characteristics,
        among other things
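The BIO Splitting constraint can be illustrated with a small sketch. In a PRP list only the first entry may carry an offset into its memory page, so every segment after the first must start on a page boundary and every segment before the last must end on one. The code below is an illustration under stated assumptions, not the driver's implementation: the names `prp_mappable` and `split_for_prp`, the `(address, length)` tuple format, and the 4096-byte page size are all hypothetical choices made for this example.

```python
PAGE_SIZE = 4096  # assumed memory page size; real controllers negotiate this


def prp_mappable(segments, page_size=PAGE_SIZE):
    """Return True if the (address, length) segments fit one PRP list.

    Only the first segment may start mid-page and only the last may
    end mid-page; any other misalignment leaves a gap PRPs cannot express.
    """
    last_index = len(segments) - 1
    for i, (addr, length) in enumerate(segments):
        if i != 0 and addr % page_size != 0:
            return False  # a later segment starts inside a page
        if i != last_index and (addr + length) % page_size != 0:
            return False  # an earlier segment ends inside a page
    return True


def split_for_prp(segments, page_size=PAGE_SIZE):
    """Split segments into consecutive runs that are each PRP-mappable.

    A new run starts wherever the previous segment ends mid-page or the
    next segment starts mid-page. Each run would become its own command.
    """
    runs = []
    current = []
    for addr, length in segments:
        if current:
            prev_addr, prev_len = current[-1]
            prev_end_unaligned = (prev_addr + prev_len) % page_size != 0
            start_unaligned = addr % page_size != 0
            if prev_end_unaligned or start_unaligned:
                runs.append(current)
                current = []
        current.append((addr, length))
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # First segment starts mid-page but ends on a boundary: mappable.
    ok = [(0x1200, 3584), (0x5000, 4096), (0x9000, 1024)]
    # First segment ends mid-page, so the vector must be split in two.
    gap = [(0x1000, 2048), (0x5000, 4096)]
    print(prp_mappable(ok), prp_mappable(gap))
    print(split_for_prp(gap))
```

Each run returned by `split_for_prp` satisfies `prp_mappable`, which is the property a driver needs before building one command per run. Partition alignment matters for the same reason: a partition that starts at an offset which is not a multiple of the page size can push otherwise well-formed I/O into the splitting case.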

