Hi, I'm a little confused about how the search window is adjusted for positions near the end of the sequence.
I know that these positions need to be kept in order to produce as many strobemers as k-mers. But in sequence mapping/searching scenarios, unlike k-mers, the strobemers near the end of a sequence would differ from those in the reference sequence, because the search window is incomplete.
In the function seq_to_randstrobes2_iter:
window_p_start = p + strobe_w_min_offset if p + strobe_w_max_offset <= len(hash_seq_list) else max( (p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p )
window_p_end = min(p + strobe_w_max_offset, len(hash_seq_list))
For positions near the end of the sequence (p + strobe_w_max_offset > len(hash_seq_list)),
max( (p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p )
is equal to
max( len(hash_seq_list) - (strobe_w_max_offset - strobe_w_min_offset), p )
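To double-check that simplification, here is a minimal sketch that recomputes the window bounds from the two lines quoted above and compares the start against the simplified form. The helper names, the short parameter names (n stands in for len(hash_seq_list)), and the values 100/20/30 are made up for illustration; this is not the library's code.

```python
def window_bounds(p, n, w_min, w_max):
    """Window start/end following the two quoted lines; n stands in for len(hash_seq_list)."""
    if p + w_max <= n:
        start = p + w_min
    else:
        # Shift the start left by the amount the window would overshoot, but never before p.
        start = max((p + w_min) - (p + w_max - n), p)
    end = min(p + w_max, n)
    return start, end


def simplified_start(p, n, w_min, w_max):
    """Simplified form of the start for positions where p + w_max > n."""
    return max(n - (w_max - w_min), p)


n, w_min, w_max = 100, 20, 30  # hypothetical values, chosen only for the check
for p in range(n):
    start, end = window_bounds(p, n, w_min, w_max)
    if p + w_max > n:
        assert start == simplified_start(p, n, w_min, w_max)
        if start > p:
            # The window keeps its full length and is only shifted left.
            assert end - start == w_max - w_min
```

With these values, the window keeps its full length of w_max - w_min for as long as its start stays to the right of p, and only shrinks once the start is clamped to p.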
As I understand it, this keeps the size of the search window and moves the window to the left (box A in the figure below). Am I right? Have you tried the approach in box B?

Also, for order 3, the windows of m2 and m3 would overlap in some regions. Is this OK?
