When selecting a new NVMe device for enterprise and cloud storage systems, companies often overlook the NVMe feature support that their applications and use cases might require. Technical discussions tend to focus on NVMe performance, endurance, power and thermal requirements. While those are without a doubt important, NVMe technology, much like SCSI and SAS, brings a lot of sophisticated features that might be useful, or even mandatory, for applications or projects. (My colleague Rohit Gupta has written about key NVMe features on this blog.)
With the availability of our latest Ultrastar® SN840 NVMe SSD, I’d like to share some of the features supported on this drive, including SGL and metadata. I’ll walk you through, step by step, how to check whether each feature is supported by your current NVMe SSD.
SGL Feature in NVMe Drives
There are two distinct mechanisms used by the NVMe protocol to transfer commands and data:
- PRP (Physical Region Page)
- SGL (Scatter Gather List)
PRP was the only mechanism supported by the first NVMe specification. SGL support was added later and allows more efficient large data transfers. SGL is mandatory for NVMe over Fabrics (NVMe-oF™), where NVMe commands are carried in capsules. Support for SGL with NVMe over PCIe was added in version 4.15 of the Linux kernel. The PRP or SGL descriptor is part of the Submission Queue Entry.
The extract below, taken from the NVMe PCIe driver documentation, explains how the driver chooses between PRP and SGL for NVMe SSDs on a PCIe bus:
The usage of SGLs is controlled by the sgl_threshold module parameter, which allows to conditionally use SGLs if average request segment size (avg_seg_size) is greater than sgl_threshold. In the original patch, the decision of using SGLs was dependent only on the IO size, with the new approach we consider not only IO size but also the number of physical segments present in the IO.
We calculate avg_seg_size based on request payload bytes and number of physical segments present in the request.
For e.g.:-
- blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 8k, avg_seg_size = 4K: use sgl if avg_seg_size >= sgl_threshold.
- blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 64k, avg_seg_size = 32K: use sgl if avg_seg_size >= sgl_threshold.
- blk_rq_nr_phys_segments = 16, blk_rq_payload_bytes = 64k, avg_seg_size = 4K: use sgl if avg_seg_size >= sgl_threshold.
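The rule in this extract can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the driver code: the function name, the rounding up of the average, the treatment of a zero threshold as "SGLs disabled", and the 32 KiB threshold used in the usage lines are all assumptions made for the example.

```python
def should_use_sgl(payload_bytes: int, nr_phys_segments: int, sgl_threshold: int) -> bool:
    """Return True when the average segment size reaches sgl_threshold.

    Hypothetical helper that mirrors the rule quoted above:
    avg_seg_size = payload bytes / number of physical segments.
    """
    if nr_phys_segments <= 0:
        raise ValueError("a request needs at least one physical segment")
    if sgl_threshold == 0:
        # Assumption: a threshold of 0 means SGLs are never used.
        return False
    # Assumption: the average is rounded up to a whole number of bytes.
    avg_seg_size = -(-payload_bytes // nr_phys_segments)
    return avg_seg_size >= sgl_threshold


KIB = 1024
THRESHOLD = 32 * KIB  # example value chosen for illustration only

# The three cases from the extract, evaluated against the example threshold.
print(should_use_sgl(8 * KIB, 2, THRESHOLD))    # avg_seg_size = 4K
print(should_use_sgl(64 * KIB, 2, THRESHOLD))   # avg_seg_size = 32K
print(should_use_sgl(64 * KIB, 16, THRESHOLD))  # avg_seg_size = 4K
```

With this example threshold, only the second request (two segments of 32K each) would use SGLs. The third request has the same 64k payload but is split into 16 small segments, so it stays on PRP.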
How to Check if Your NVMe SSD Supports SGL?
Other than reading the product manual, SGL support can be easily verified on any NVMe device by issuing the NVMe Identify Controller command and checking bytes 536 to 539 (SGLS).
The NVMe specification defines the first two bits of byte 536 as the field that indicates whether SGLs are supported.
To check these bytes on your NVMe device, you can use the generic nvme-cli tool: its id-ctrl command reports them in the sgls field.
The result will look like one of these:
- Drive 1: sgls : 0 (bit#0 set to 0 = not supported)
- Drive 2: sgls : 0x70001 (bit#0 set to 1 = SGLs are supported)
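If you want to script this check, the value can be decoded as in the minimal sketch below. It assumes you already have a line of `nvme id-ctrl` text output such as the two shown above; the function names are hypothetical, and the meaning of the two low bits (00b = not supported, 01b = supported, 10b = supported with a dword alignment requirement) reflects my reading of NVMe 1.3 and later, so check it against the specification revision your drive reports.

```python
def parse_sgls_line(line: str) -> int:
    """Extract the numeric value from a line such as 'sgls : 0x70001'."""
    name, _, value = line.partition(":")
    if name.strip() != "sgls":
        raise ValueError(f"not an sgls line: {line!r}")
    # int(..., 0) accepts both '0' and hexadecimal values such as '0x70001'.
    return int(value.strip(), 0)


def decode_sgl_support(sgls: int) -> str:
    """Interpret bits 1:0 of the SGLS field (assumed NVMe 1.3+ encoding)."""
    support = sgls & 0x3
    if support == 0:
        return "SGLs not supported"
    if support == 1:
        return "SGLs supported"
    if support == 2:
        return "SGLs supported (dword alignment required)"
    return "reserved value"


for line in ("sgls : 0", "sgls : 0x70001"):
    print(line, "->", decode_sgl_support(parse_sgls_line(line)))
```

The upper bits of the field (set in the 0x70001 example) describe additional SGL capabilities and are deliberately left undecoded here.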
Not all NVMe SSDs support SGL, so it should be a consideration when you evaluate which drives to deploy in your environment. How much it matters will depend on your applications and use cases.
NVMe Metadata Support
What is metadata for NVMe? Similar to SCSI / SAS devices, the NVMe standard supports the addition of 8 bytes (called metadata or protection information (PI)) to each data sector to ensure data integrity during data transfer.
NVMe end-to-end Data Protection is compatible with T10 DIF/DIX and provides data protection via the Data Integrity Field (DIF)/Data Integrity Extension (DIX). DIF support includes Type 1, Type 2, or Type 3 and is selected when a namespace is formatted.
Metadata support was added to NVMe specification 1.2.1, and there are two ways to transfer metadata (MD): through a distinct buffer (the metadata buffer), or as part of an extended LBA, in which each logical block carries its data followed immediately by its metadata.
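To make the difference between the two layouts concrete, here is a small sketch that turns the separate-buffer form (one data buffer plus one metadata buffer) into the extended-LBA form. The function name and the 512-byte block with 8 bytes of metadata are assumptions chosen for illustration; this only shows the byte layout and is not how a driver builds its buffers.

```python
def to_extended_lba(data: bytes, metadata: bytes, block_size: int, md_size: int) -> bytes:
    """Interleave per-block metadata after each data block (extended LBA layout)."""
    if block_size <= 0 or md_size <= 0:
        raise ValueError("block_size and md_size must be positive")
    if len(data) % block_size != 0:
        raise ValueError("data must be a whole number of blocks")
    blocks = len(data) // block_size
    if len(metadata) != blocks * md_size:
        raise ValueError("metadata must hold md_size bytes per block")

    out = bytearray()
    for i in range(blocks):
        # Each extended block is the data block followed by its own metadata.
        out += data[i * block_size:(i + 1) * block_size]
        out += metadata[i * md_size:(i + 1) * md_size]
    return bytes(out)


# Two 512-byte blocks, each with 8 bytes of metadata (illustrative sizes).
data = bytes([0xAA]) * 512 + bytes([0xBB]) * 512
metadata = bytes([0x01]) * 8 + bytes([0x02]) * 8
extended = to_extended_lba(data, metadata, block_size=512, md_size=8)
print(len(extended))  # 2 * (512 + 8) bytes
```

With a separate metadata buffer the host keeps `data` and `metadata` apart; with an extended LBA it transfers a single buffer in which every block is 520 bytes long in this example.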
Why is Metadata Support Needed for NVMe Devices?
Support of metadata could be critical and mandatory for some applications. As an example, while talking to a leading Telco customer, I learned they deploy a software-defined storage solution which requires support of metadata for erasure coding. In the case of the particular solution, the implementation required metadata to be stored in-line with data, as part of the sector data. This NVMe feature support was a key differentiator that determined which NVMe SSD they chose for their architecture.
How to Check Support of Metadata on an NVMe SSD?
This is rather simple, either with the generic NVMe command line tool (nvme-cli) or with our own tool, which can be made available to customers.
With the generic tool, simply use the id-ns command and look for the MC (Metadata Capabilities) and DPC (Data Protection Capabilities) fields.
Metadata Test Example on our Ultrastar SN840
Here is how this looks when testing our Ultrastar SN840. The mc field is 0x3, meaning bits 0 and 1 are both set to 1: the drive supports transferring metadata as part of an extended LBA (bit 0) and in a separate buffer (bit 1). The dpc field is 0x17, meaning all three PI types are supported and PI data is transferred as the last 8 bytes of the metadata.
root@marc-Asus-server:/home/marc# nvme id-ns /dev/nvme0n1
NVME Identify Namespace 1:
nsze : 0x1749a42b0
ncap : 0x1749a42b0
nuse : 0x5feb72c8
nsfeat : 0x2
nlbaf : 4
flbas : 0x1
mc : 0x3
dpc : 0x17
dps : 0
nmic : 0x1
rescap : 0xff
fpi : 0x80
dlfeat : 9
nawun : 7
nawupf : 7
nacwu : 7
nabsn : 7
nabo : 0
nabspf : 7
noiob : 0
nvmcap : 3200631791616
nsattr : 0
nvmsetid: 0
anagrpid: 0
endgid : 0
nguid : 01000000000000000014ee810005a982
eui64 : 0014ee810005a982
lbaf 0 : ms:0 lbads:12 rp:0
lbaf 1 : ms:0 lbads:9 rp:0 (in use)
lbaf 2 : ms:8 lbads:9 rp:0
lbaf 3 : ms:8 lbads:12 rp:0
lbaf 4 : ms:64 lbads:12 rp:0
Example with nvme-cli tool by using the command id-ns.
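The mc and dpc values from this output can also be decoded programmatically. The sketch below is a hypothetical helper, not part of nvme-cli; the bit assignments (MC bit 0 = extended LBA, bit 1 = separate buffer; DPC bits 0 to 2 = PI Type 1 to 3, bit 3 = PI in the first 8 bytes of metadata, bit 4 = PI in the last 8 bytes) follow my reading of the Identify Namespace data structure and should be checked against the specification revision you target.

```python
def decode_mc(mc: int) -> dict:
    """Decode the Metadata Capabilities (MC) field of Identify Namespace."""
    return {
        "extended_lba": bool(mc & 0x01),     # metadata as part of an extended LBA
        "separate_buffer": bool(mc & 0x02),  # metadata in a separate buffer
    }


def decode_dpc(dpc: int) -> dict:
    """Decode the Data Protection Capabilities (DPC) field of Identify Namespace."""
    return {
        "pi_type1": bool(dpc & 0x01),
        "pi_type2": bool(dpc & 0x02),
        "pi_type3": bool(dpc & 0x04),
        "pi_first_8_bytes": bool(dpc & 0x08),  # PI in the first 8 bytes of metadata
        "pi_last_8_bytes": bool(dpc & 0x10),   # PI in the last 8 bytes of metadata
    }


# Values taken from the id-ns output above.
print(decode_mc(0x3))
print(decode_dpc(0x17))
```

For the values above, both metadata transfer methods and all three PI types are reported as supported, with PI carried in the last 8 bytes of the metadata. A drive that reports 0 for both fields decodes to all-False.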
What if Metadata is Not Supported?
The example below, from the previous-generation Ultrastar SN620, shows the mc and dpc fields set to 0, meaning there is no support for MD and PI.
root@marc-Asus-server:/home/marc# nvme id-ns /dev/nvme0n1
NVME Identify Namespace 1:
nsze : 0x2e934856
ncap : 0x2e934856
nuse : 0x2e934856
nsfeat : 0x2
nlbaf : 1
flbas : 0x10
mc : 0
dpc : 0
dps : 0
nmic : 0
rescap : 0
fpi : 0x80
dlfeat : 1
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 3200631791616
nsattr : 0
nvmsetid: 0
anagrpid: 0
endgid : 0
nguid : 03000000000000000014ee830020c580
eui64 : 0014ee830020c580
lbaf 0 : ms:0 lbads:12 rp:0 (in use)
lbaf 1 : ms:0 lbads:9 rp:0
The Role of NVMe Features
The NVMe protocol and feature set are extremely rich and sophisticated. As field engineers, we work with customers upfront to review their technical requirements, system architecture, targeted applications and use cases, and to guide them to the product that best meets their needs from a performance, cost, endurance and application perspective.
I hope these examples on NVMe features illustrate the importance of having a technical discussion when evaluating a new project based on NVMe SSDs. We’re here to help you succeed.