
Conversation

@tomas-winkler-sndk
Contributor

Sprandom (SanDisk Pseudo Random) is a method developed to rapidly
precondition large SSDs by leveraging knowledge of physical
over-provisioning (OP). It uses fio to recreate the steady-state OP
distribution of a 100% random write workload by performing
invalidations at region-specific rates during writes.

This approach aims to:

  1. Fully populate the L2P table by writing every LBA once.
  2. Achieve a steady-state OP distribution across physical media.
  3. Prevent physical contiguity in L2P entries, avoiding compressibility.
  4. Limit the physical scatter of logically adjacent addresses.

Sprandom accomplishes the first three goals with a single physical
write pass, improving preconditioning efficiency and fidelity.

This patch series suggests a way to integrate sprandom into fio,
as discussed here: #1935

Signed-off-by: Tomas Winkler [email protected]

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 2 times, most recently from 1156c6d to 5f5ee4f Compare July 31, 2025 10:40
print_d_array("validity resampled:", validity_dist, spr_info->num_regions);

spr_info->validity_dist = validity_dist;
total_alloc += spr_info->num_regions * sizeof(spr_info->validity_dist[0]);
Collaborator


Is this value actually used?

Contributor Author


Needed something to track how much memory overhead this had. It can be removed eventually.

@vincentkfu
Collaborator

Also please resolve the build failures identified by our automated tests.

@vincentkfu
Collaborator

I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.

The two-phase circular buffer seems well done but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.

I have a simpler alternative that as far as I can tell accomplishes the same result:

  • Have each region and its invalidating writes be self-contained
  • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
  • While writing the above, save invalid_fraction of the offsets
  • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary

As far as I can tell this produces the same outcome as the two-phase ring buffer approach but is simpler and does not require a special data structure.
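The self-contained alternative above can be sketched in C; `region_state`, `save_offset`, and `replay_offset` are illustrative names for this sketch, not fio code, and `rand()` stands in for whatever RNG decides which offsets to keep:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the self-contained per-region scheme: during the first
 * (1 - invalid_fraction) * region_size writes, a fraction of offsets
 * is remembered; the remainder of the region replays them. */
struct region_state {
	uint64_t *saved;	/* offsets saved for invalidating rewrites */
	size_t nsaved;
	size_t replay_idx;
};

/* Probabilistically remember an offset for later rewriting. */
static int save_offset(struct region_state *rs, uint64_t off,
		       double invalid_fraction)
{
	if ((double)rand() / RAND_MAX < invalid_fraction) {
		rs->saved[rs->nsaved++] = off;
		return 1;
	}
	return 0;
}

/* Serve saved offsets for the rest of the region, wrapping (and thus
 * repeating offsets) if the saved list runs out. */
static uint64_t replay_offset(struct region_state *rs)
{
	uint64_t off = rs->saved[rs->replay_idx];
	rs->replay_idx = (rs->replay_idx + 1) % rs->nsaved;
	return off;
}
```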

@steven-sprouse-sndk

> I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.
>
> The two-phase circular buffer seems well done, but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.
>
> I have a simpler alternative that, as far as I can tell, accomplishes the same result:
>
>   • Have each region and its invalidating writes be self-contained
>   • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
>   • While writing the above, save invalid_fraction of the offsets
>   • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary
>
> As far as I can tell this produces the same outcome as the two-phase ring buffer approach, but it is simpler and does not require a special data structure.

We discussed a similar approach, but there were a couple of concerns:

  1. When writing the "remainder of the region...", we do not want to create a sub-region that has 100% validity. In other words, we need to ensure that all writes in a region have the same probability of becoming invalid.
  2. We could create invalidating writes within the second sub-region, but at some point the invalidations of a specific address would come close enough together in time that some drives might be able to perform the invalidation in their write caches. This might be overestimating how much of an effect this would have, but it is one reason we landed on the current approach.

I think if you can think of a way to address (1) and convince ourselves that (2) is not a significant issue, then we can look at this other approach.

@vincentkfu
Collaborator

> I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.
> The two-phase circular buffer seems well done, but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.
> I have a simpler alternative that, as far as I can tell, accomplishes the same result:
>
>   • Have each region and its invalidating writes be self-contained
>   • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
>   • While writing the above, save invalid_fraction of the offsets
>   • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary
>
> As far as I can tell this produces the same outcome as the two-phase ring buffer approach, but it is simpler and does not require a special data structure.
>
> We discussed a similar approach, but there were a couple of concerns:
>
>   1. When writing the "remainder of the region...", we do not want to create a sub-region that has 100% validity. In other words, we need to ensure that all writes in a region have the same probability of becoming invalid.
>   2. We could create invalidating writes within the second sub-region, but at some point the invalidations of a specific address would come close enough together in time that some drives might be able to perform the invalidation in their write caches. This might be overestimating how much of an effect this would have, but it is one reason we landed on the current approach.
>
> I think if you can think of a way to address (1) and convince ourselves that (2) is not a significant issue, then we can look at this other approach.

For item 1 it would be straightforward to sprinkle invalidating writes throughout the entire region instead of waiting until near the end. However, this would likely exacerbate item 2, especially for the region with the highest number of invalid blocks. I cannot think of a way to resolve item 2.

So I think this concern is sufficient to justify including the new ring buffer code. Please include this justification in the comment summarizing this feature.

Finally, please add some unit tests for the two-phase ring buffer. There are examples in https://github.com/axboe/fio/tree/master/unittests

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 2 times, most recently from d3f4176 to 43c5793 Compare August 11, 2025 20:50
@steven-sprouse-sndk

@vincentkfu There are still a few commits we'd like to make before this is merged. Can you provide feedback on the fixes that @tomas-winkler-sndk checked in over the weekend?

double *validity_distribution = NULL;
double *blocks_ratio = NULL;
double *acc_ratio = NULL;
double acc;
Collaborator


Can you explain what blocks_ratio and acc_ratio will be used for? It seems like they are involved in invalidating blocks from previous regions but it is not obvious how they are used.

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 6 times, most recently from 33550d1 to ed7e17a Compare August 18, 2025 18:57
Add sprandom command line options:
1. sprandom: boolean; enables the sprandom flow.
2. spr_num_regions: granularity of sprandom; defaults to 100.
3. spr_op: over-provisioning factor; defaults to 0.15.

Signed-off-by: Tomas Winkler <[email protected]>
Add FD_SPRANDOM debug facility

Signed-off-by: Tomas Winkler <[email protected]>
Add an example demonstrating sprandom preconditioning. It includes
sample commands for basic execution, enabling debug output,
setting over-provisioning, and tuning the region count for large devices.

Default job section:
[preconditioning]
sprandom=1
spr_op=0.15
spr_num_regions=100

Signed-off-by: Tomas Winkler <[email protected]>
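The job section above could also be driven from the command line. The option spellings below follow the names added in this series; the `--debug=sprandom` token is an assumption based on the FD_SPRANDOM facility, and the device path is a placeholder, so verify both against the merged documentation:

```shell
# Basic sprandom preconditioning run (illustrative; /dev/nvme0n1 is a
# placeholder target).
fio --name=preconditioning --filename=/dev/nvme0n1 --rw=randwrite \
    --sprandom=1 --spr_op=0.15 --spr_num_regions=100

# Same run with sprandom debug output enabled (debug token assumed
# from the FD_SPRANDOM facility added in this series).
fio --debug=sprandom --name=preconditioning --filename=/dev/nvme0n1 \
    --rw=randwrite --sprandom=1 --spr_op=0.15 --spr_num_regions=100
```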
Divide storage into equally sized regions and compute desired
invalidation percentage per region.

This model estimates the distribution of valid data across a flash drive
in a steady state. It is based on the key insight from Desnoyers'
research, which establishes a relationship between data validity
and the physical space it occupies.

This is based on P. Desnoyers, "Analytic Models of SSD Write
Performance", and SanDisk internal research using Markov chain
analysis to model write amplification as a function of
over-provisioning.

Signed-off-by: Tomas Winkler <[email protected]>
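The equal-size split can be sketched as below; the per-region invalidation percentages themselves come from the validity model, which this sketch does not reproduce, and the function names are illustrative rather than fio's:

```c
#include <stdint.h>

/* Split device capacity into num_regions equal slices; any remainder
 * is absorbed by the last region. */
static uint64_t region_size(uint64_t capacity, unsigned num_regions)
{
	return capacity / num_regions;
}

/* Map an offset to its region index, clamping remainder bytes into
 * the final region. */
static unsigned region_of(uint64_t offset, uint64_t capacity,
			  unsigned num_regions)
{
	unsigned r = (unsigned)(offset / region_size(capacity, num_regions));
	return r < num_regions ? r : num_regions - 1;
}
```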
Sprandom targets large storage devices; hence, use an LFSR generator
to ensure full storage coverage without repetition and without
extra memory.
Disable randommap, since it prevents repeated writes to the same
offset, which are required for invalidation.

Signed-off-by: Tomas Winkler <[email protected]>
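As an illustration of why an LFSR fits here: a maximal-length LFSR visits every nonzero state exactly once per period, with no memory beyond the state word itself. A minimal 16-bit Galois LFSR sketch (fio's LFSR engine is more general than this):

```c
#include <stdint.h>

/* 16-bit Galois LFSR with taps 16,14,13,11 (mask 0xB400), a
 * maximal-length polynomial: starting from any nonzero seed it cycles
 * through all 65535 nonzero 16-bit values before repeating. */
static uint16_t lfsr16_next(uint16_t state)
{
	uint16_t lsb = state & 1u;

	state >>= 1;
	if (lsb)
		state ^= 0xB400u;
	return state;
}
```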
The function only converts bytes into a human-readable form in a
caller-provided string. It doesn't allocate memory and can be used
directly in printf("%s", bytes2str_simple())

Signed-off-by: Tomas Winkler <[email protected]>
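A minimal sketch of such a helper, assuming the conventional caller-provided-buffer signature; the actual fio helper's name, units, and formatting may differ:

```c
#include <stdio.h>

/* Format a byte count into the caller's buffer and return that buffer,
 * so the result can be passed straight to printf("%s", ...).  No
 * allocation happens here. */
static const char *bytes2str_sketch(unsigned long long bytes,
				    char *buf, size_t buflen)
{
	static const char *units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
	double v = (double)bytes;
	size_t u = 0;

	while (v >= 1024.0 && u < sizeof(units) / sizeof(units[0]) - 1) {
		v /= 1024.0;
		u++;
	}
	snprintf(buf, buflen, "%.1f %s", v, units[u]);
	return buf;
}
```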
Implements a circular buffer with staged (write-ahead) and committed
(read-visible) regions using dual head pointers. Data is written to a
staging area and becomes visible only upon explicit commit.

Signed-off-by: Tomas Winkler <[email protected]>
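The staged/committed scheme described above could look roughly like this minimal sketch; it is not fio's actual implementation, and capacity checks are omitted for brevity:

```c
#include <stdint.h>

#define RING_CAP 8u	/* power of two for cheap wraparound */

/* Ring buffer with dual heads: entries staged past `committed` are
 * write-ahead only; a commit makes them visible to readers in one step. */
struct ring {
	uint64_t slots[RING_CAP];
	unsigned staged;	/* next slot to stage into */
	unsigned committed;	/* entries below this index are readable */
	unsigned tail;		/* next slot to read */
};

static void ring_stage(struct ring *r, uint64_t v)
{
	r->slots[r->staged % RING_CAP] = v;
	r->staged++;
}

/* Publish everything staged so far. */
static void ring_commit(struct ring *r)
{
	r->committed = r->staged;
}

/* Pop a committed entry; returns 0 if nothing committed is available,
 * even when entries are staged but not yet committed. */
static int ring_pop(struct ring *r, uint64_t *out)
{
	if (r->tail == r->committed)
		return 0;
	*out = r->slots[r->tail % RING_CAP];
	r->tail++;
	return 1;
}
```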
Add tests for basic functionality and wraparound behavior.

Signed-off-by: Tomas Winkler <[email protected]>
sprandom requires a random number generator
to randomly choose which offsets will be
rewritten.

Signed-off-by: Tomas Winkler <[email protected]>
Implement sprandom_get_next_offset(), which generates offsets for each
region using an LFSR. The function enforces an invalidation percentage
by randomly recycling a defined fraction of offsets back into the pool.

A two-phase cyclic buffer is used to manage this process:
one phase collects new offsets while the other serves recycled offsets.
When transitioning between regions, all stored offsets are exhausted
first, ensuring the target invalidation level is achieved.

Signed-off-by: Tomas Winkler <[email protected]>
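The flow described above can be modeled with a self-contained toy: a plain counter stands in for the LFSR, a flat array stands in for the two-phase buffer, and all names are illustrative rather than fio's:

```c
#include <stdint.h>
#include <stdlib.h>

#define REGION_SIZE 16
#define NUM_REGIONS 2

/* Toy state: pool[] collects recycled offsets (phase one); once a
 * region's writes are done the pool is committed and drained (phase
 * two) before the next region starts. */
struct spr_toy {
	uint64_t pool[REGION_SIZE];
	int staged, committed, tail;
	int draining, written, region;	/* caller must stay < NUM_REGIONS */
	double invalid_fraction[NUM_REGIONS];
	uint64_t next_lba;
};

static uint64_t toy_next_offset(struct spr_toy *s)
{
	if (s->draining) {
		/* Serve stored offsets first so the region reaches its
		 * target invalidation level. */
		if (s->tail < s->committed)
			return s->pool[s->tail++];
		s->draining = 0;
		s->tail = s->staged = s->committed = 0;
		s->region++;
	}

	uint64_t off = s->next_lba++;	/* stand-in for the LFSR */

	/* Recycle a region-specific fraction of offsets; rewriting them
	 * later invalidates the first write. */
	if ((double)rand() / RAND_MAX < s->invalid_fraction[s->region])
		s->pool[s->staged++] = off;

	if (++s->written == REGION_SIZE) {
		s->committed = s->staged;	/* make pool servable */
		s->draining = 1;
		s->written = 0;
	}
	return off;
}
```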
Plug sprandom into file initialization and ensure that sprandom
operates on a single file.

Signed-off-by: Tomas Winkler <[email protected]>
Invoke sprandom_get_next_offset() to generate offsets for sprandom
random-write operations.

Signed-off-by: Tomas Winkler <[email protected]>
@axboe axboe merged commit 5b6f59e into axboe:master Aug 22, 2025
17 checks passed