
Conversation

@tomas-winkler-sndk
Contributor

Sprandom (SanDisk Pseudo Random) is a method developed to rapidly
precondition large SSDs by leveraging knowledge of physical
over-provisioning (OP). It uses fio to recreate the steady-state OP
distribution of a 100% random write workload by performing
invalidations at region-specific rates during writes.

This approach aims to:

  1. Fully populate the L2P table by writing every LBA once.
  2. Achieve a steady-state OP distribution across physical media.
  3. Prevent physical contiguity in L2P entries, avoiding compressibility.
  4. Limit the physical scatter of logically adjacent addresses.

Sprandom accomplishes the first three goals with a single physical
write pass, improving preconditioning efficiency and fidelity.

This patch series suggests a way to integrate sprandom into fio,
as discussed here: #1935

Signed-off-by: Tomas Winkler [email protected]

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 2 times, most recently from 1156c6d to 5f5ee4f Compare July 31, 2025 10:40
print_d_array("validity resampled:", validity_dist, spr_info->num_regions);

spr_info->validity_dist = validity_dist;
total_alloc += spr_info->num_regions * sizeof(spr_info->validity_dist[0]);
Collaborator


Is this value actually used?

Contributor Author


Needed something to track how much memory overhead this had. It can be removed eventually.

@vincentkfu
Collaborator

Also please resolve the build failures identified by our automated tests.

@vincentkfu
Collaborator

I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.

The two-phase circular buffer seems well done but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.

I have a simpler alternative that as far as I can tell accomplishes the same result:

  • Have each region and its invalidating writes be self-contained
  • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
  • While writing the above, save invalid_fraction of the offsets
  • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary

As far as I can tell this produces the same outcome as the two-phase ring buffer approach but is simpler and does not require a special data structure.
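The self-contained alternative above can be sketched in C; `region_state`, `save_offset`, and `replay_offset` are illustrative names for this sketch, not fio code, and `rand()` stands in for whatever RNG decides which offsets to keep:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the self-contained per-region scheme: during the first
 * (1 - invalid_fraction) * region_size writes, a fraction of offsets
 * is remembered; the remainder of the region replays them. */
struct region_state {
	uint64_t *saved;	/* offsets saved for invalidating rewrites */
	size_t nsaved;
	size_t replay_idx;
};

/* Probabilistically remember an offset for later rewriting. */
static int save_offset(struct region_state *rs, uint64_t off,
		       double invalid_fraction)
{
	if ((double)rand() / RAND_MAX < invalid_fraction) {
		rs->saved[rs->nsaved++] = off;
		return 1;
	}
	return 0;
}

/* Serve saved offsets for the rest of the region, wrapping (and thus
 * repeating offsets) if the saved list runs out. */
static uint64_t replay_offset(struct region_state *rs)
{
	uint64_t off = rs->saved[rs->replay_idx];
	rs->replay_idx = (rs->replay_idx + 1) % rs->nsaved;
	return off;
}
```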

@steven-sprouse-sndk

> I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.
>
> The two-phase circular buffer seems well done, but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.
>
> I have a simpler alternative that, as far as I can tell, accomplishes the same result:
>
>   • Have each region and its invalidating writes be self-contained
>   • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
>   • While writing the above, save invalid_fraction of the offsets
>   • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary
>
> As far as I can tell this produces the same outcome as the two-phase ring buffer approach, but it is simpler and does not require a special data structure.

We discussed a similar approach, but there were a couple of concerns:

  1. When writing the "remainder of the region...", we do not want to create a sub-region that has 100% validity. In other words, we need to ensure that all writes in a region have the same probability of becoming invalid.
  2. We could create invalidating writes within the second sub-region, but at some point the invalidations of a specific address would come close enough together in time that some drives might be able to perform the invalidation in their write caches. This might be overestimating how much of an effect this would have, but it is one reason we landed on the current approach.

I think if you can think of a way to address (1) and convince ourselves that (2) is not a significant issue, then we can look at this other approach.

@vincentkfu
Collaborator

> I think the approach taken here is still overly influenced by the job file approach described in the OCP presentation, in which each region overwrites offsets written in the immediately preceding region.
> The two-phase circular buffer seems well done, but it is a considerable amount of additional code (that will have to be maintained in perpetuity) with only one user. Its use must be very advantageous to justify including it.
> I have a simpler alternative that, as far as I can tell, accomplishes the same result:
>
>   • Have each region and its invalidating writes be self-contained
>   • Write each region with offsets from the LFSR to fill (1 - invalidation_fraction) * region_size
>   • While writing the above, save invalid_fraction of the offsets
>   • For the remainder of the region, issue writes from the saved list, repeating offsets if necessary
>
> As far as I can tell this produces the same outcome as the two-phase ring buffer approach, but it is simpler and does not require a special data structure.
>
> We discussed a similar approach, but there were a couple of concerns:
>
>   1. When writing the "remainder of the region...", we do not want to create a sub-region that has 100% validity. In other words, we need to ensure that all writes in a region have the same probability of becoming invalid.
>   2. We could create invalidating writes within the second sub-region, but at some point the invalidations of a specific address would come close enough together in time that some drives might be able to perform the invalidation in their write caches. This might be overestimating how much of an effect this would have, but it is one reason we landed on the current approach.
>
> I think if you can think of a way to address (1) and convince ourselves that (2) is not a significant issue, then we can look at this other approach.

For item 1 it would be straightforward to sprinkle invalidating writes throughout the entire region instead of waiting until near the end. However, this would likely exacerbate item 2, especially for the region with the highest number of invalid blocks. I cannot think of a way to resolve item 2.

So I think this concern is sufficient to justify including the new ring buffer code. Please include this justification in the comment summarizing this feature.

Finally, please add some unit tests for the two-phase ring buffer. There are examples in https://github.com/axboe/fio/tree/master/unittests

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 2 times, most recently from d3f4176 to 43c5793 Compare August 11, 2025 20:50
@steven-sprouse-sndk

@vincentkfu There are still a few commits we'd like to make before this is merged. Can you provide feedback on the fixes that @tomas-winkler-sndk checked in over the weekend?

double *validity_distribution = NULL;
double *blocks_ratio = NULL;
double *acc_ratio = NULL;
double acc;
Collaborator


Can you explain what blocks_ratio and acc_ratio will be used for? It seems like they are involved in invalidating blocks from previous regions but it is not obvious how they are used.

@tomas-winkler-sndk tomas-winkler-sndk force-pushed the sprandom branch 6 times, most recently from 33550d1 to ed7e17a Compare August 18, 2025 18:57
Add sprandom command line options:
1. sprandom: boolean; enables the sprandom flow.
2. spr_num_regions: granularity of sprandom; defaults to 100.
3. spr_op: over-provisioning factor; defaults to 0.15.

Signed-off-by: Tomas Winkler <[email protected]>
Add FD_SPRANDOM debug facility

Signed-off-by: Tomas Winkler <[email protected]>
Add an example demonstrating sprandom preconditioning. It includes
sample commands for basic execution, enabling debug output,
setting over-provisioning, and tuning the region count for large devices.

Default job section:
[preconditioning]
sprandom=1
spr_op=0.15
spr_num_regions=100

Signed-off-by: Tomas Winkler <[email protected]>
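The job section above could also be driven from the command line. The option spellings below follow the names added in this series; the `--debug=sprandom` token is an assumption based on the FD_SPRANDOM facility, and the device path is a placeholder, so verify both against the merged documentation:

```shell
# Basic sprandom preconditioning run (illustrative; /dev/nvme0n1 is a
# placeholder target).
fio --name=preconditioning --filename=/dev/nvme0n1 --rw=randwrite \
    --sprandom=1 --spr_op=0.15 --spr_num_regions=100

# Same run with sprandom debug output enabled (debug token assumed
# from the FD_SPRANDOM facility added in this series).
fio --debug=sprandom --name=preconditioning --filename=/dev/nvme0n1 \
    --rw=randwrite --sprandom=1 --spr_op=0.15 --spr_num_regions=100
```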
Divide storage into equally sized regions and compute desired
invalidation percentage per region.

This model estimates the distribution of valid data across a flash drive
in a steady state. It is based on the key insight from Desnoyers'
research, which establishes a relationship between data validity
and the physical space it occupies.

This is based on P. Desnoyers, "Analytic Models of SSD Write
Performance", and SanDisk internal research using Markov chain
analysis to model write amplification as a function of
over-provisioning.

Signed-off-by: Tomas Winkler <[email protected]>
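The equal-size split can be sketched as below; the per-region invalidation percentages themselves come from the validity model, which this sketch does not reproduce, and the function names are illustrative rather than fio's:

```c
#include <stdint.h>

/* Split device capacity into num_regions equal slices; any remainder
 * is absorbed by the last region. */
static uint64_t region_size(uint64_t capacity, unsigned num_regions)
{
	return capacity / num_regions;
}

/* Map an offset to its region index, clamping remainder bytes into
 * the final region. */
static unsigned region_of(uint64_t offset, uint64_t capacity,
			  unsigned num_regions)
{
	unsigned r = (unsigned)(offset / region_size(capacity, num_regions));
	return r < num_regions ? r : num_regions - 1;
}
```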
Sprandom targets large storage devices; hence, use an LFSR generator
to ensure full storage coverage without repetition and without
extra memory.
Disable randommap, since it prevents repeated writes to the same
offset, which are required for invalidation.

Signed-off-by: Tomas Winkler <[email protected]>
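As an illustration of why an LFSR fits here: a maximal-length LFSR visits every nonzero state exactly once per period, with no memory beyond the state word itself. A minimal 16-bit Galois LFSR sketch (fio's LFSR engine is more general than this):

```c
#include <stdint.h>

/* 16-bit Galois LFSR with taps 16,14,13,11 (mask 0xB400), a
 * maximal-length polynomial: starting from any nonzero seed it cycles
 * through all 65535 nonzero 16-bit values before repeating. */
static uint16_t lfsr16_next(uint16_t state)
{
	uint16_t lsb = state & 1u;

	state >>= 1;
	if (lsb)
		state ^= 0xB400u;
	return state;
}
```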
The function only converts bytes into a human-readable form in a
caller-provided string. It doesn't allocate memory and can be used
directly in printf("%s", bytes2str_simple())

Signed-off-by: Tomas Winkler <[email protected]>
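A minimal sketch of such a helper, assuming the conventional caller-provided-buffer signature; the actual fio helper's name, units, and formatting may differ:

```c
#include <stdio.h>

/* Format a byte count into the caller's buffer and return that buffer,
 * so the result can be passed straight to printf("%s", ...).  No
 * allocation happens here. */
static const char *bytes2str_sketch(unsigned long long bytes,
				    char *buf, size_t buflen)
{
	static const char *units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
	double v = (double)bytes;
	size_t u = 0;

	while (v >= 1024.0 && u < sizeof(units) / sizeof(units[0]) - 1) {
		v /= 1024.0;
		u++;
	}
	snprintf(buf, buflen, "%.1f %s", v, units[u]);
	return buf;
}
```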
Implements a circular buffer with staged (write-ahead) and committed
(read-visible) regions using dual head pointers. Data is written to a
staging area and becomes visible only upon explicit commit.

Signed-off-by: Tomas Winkler <[email protected]>
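The staged/committed scheme described above could look roughly like this minimal sketch; it is not fio's actual implementation, and capacity checks are omitted for brevity:

```c
#include <stdint.h>

#define RING_CAP 8u	/* power of two for cheap wraparound */

/* Ring buffer with dual heads: entries staged past `committed` are
 * write-ahead only; a commit makes them visible to readers in one step. */
struct ring {
	uint64_t slots[RING_CAP];
	unsigned staged;	/* next slot to stage into */
	unsigned committed;	/* entries below this index are readable */
	unsigned tail;		/* next slot to read */
};

static void ring_stage(struct ring *r, uint64_t v)
{
	r->slots[r->staged % RING_CAP] = v;
	r->staged++;
}

/* Publish everything staged so far. */
static void ring_commit(struct ring *r)
{
	r->committed = r->staged;
}

/* Pop a committed entry; returns 0 if nothing committed is available,
 * even when entries are staged but not yet committed. */
static int ring_pop(struct ring *r, uint64_t *out)
{
	if (r->tail == r->committed)
		return 0;
	*out = r->slots[r->tail % RING_CAP];
	r->tail++;
	return 1;
}
```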
Add tests for basic functionality and wraparound behavior.

Signed-off-by: Tomas Winkler <[email protected]>
sprandom requires a random number generator
to randomly choose which offsets will be
rewritten.

Signed-off-by: Tomas Winkler <[email protected]>
Implement sprandom_get_next_offset(), which generates offsets for each
region using an LFSR. The function enforces an invalidation percentage
by randomly recycling a defined fraction of offsets back into the pool.

A two-phase cyclic buffer is used to manage this process:
one phase collects new offsets while the other serves recycled offsets.
When transitioning between regions, all stored offsets are exhausted
first, ensuring the target invalidation level is achieved.

Signed-off-by: Tomas Winkler <[email protected]>
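The flow described above can be modeled with a self-contained toy: a plain counter stands in for the LFSR, a flat array stands in for the two-phase buffer, and all names are illustrative rather than fio's:

```c
#include <stdint.h>
#include <stdlib.h>

#define REGION_SIZE 16
#define NUM_REGIONS 2

/* Toy state: pool[] collects recycled offsets (phase one); once a
 * region's writes are done the pool is committed and drained (phase
 * two) before the next region starts. */
struct spr_toy {
	uint64_t pool[REGION_SIZE];
	int staged, committed, tail;
	int draining, written, region;	/* caller must stay < NUM_REGIONS */
	double invalid_fraction[NUM_REGIONS];
	uint64_t next_lba;
};

static uint64_t toy_next_offset(struct spr_toy *s)
{
	if (s->draining) {
		/* Serve stored offsets first so the region reaches its
		 * target invalidation level. */
		if (s->tail < s->committed)
			return s->pool[s->tail++];
		s->draining = 0;
		s->tail = s->staged = s->committed = 0;
		s->region++;
	}

	uint64_t off = s->next_lba++;	/* stand-in for the LFSR */

	/* Recycle a region-specific fraction of offsets; rewriting them
	 * later invalidates the first write. */
	if ((double)rand() / RAND_MAX < s->invalid_fraction[s->region])
		s->pool[s->staged++] = off;

	if (++s->written == REGION_SIZE) {
		s->committed = s->staged;	/* make pool servable */
		s->draining = 1;
		s->written = 0;
	}
	return off;
}
```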
Plug sprandom into file initialization and ensure that sprandom
operates on a single file.

Signed-off-by: Tomas Winkler <[email protected]>
Invoke sprandom_get_next_offset() to generate offsets for sprandom
random-write operations.

Signed-off-by: Tomas Winkler <[email protected]>
@axboe axboe merged commit 5b6f59e into axboe:master Aug 22, 2025
17 checks passed