How to adjust the searching window for positions near the end of the sequence

Hi, I'm a little confused about how to adjust the searching window for positions near the end of the sequence.

I know that these positions need to be kept so that the number of strobemers equals the number of k-mers. But in sequence mapping/searching scenarios, unlike k-mers, the strobemers near the end of a sequence would differ from those in the reference sequence, because the searching window is incomplete.

In the function `seq_to_randstrobes2_iter`:

    window_p_start = p + strobe_w_min_offset if p + strobe_w_max_offset <= len(hash_seq_list) else max( (p + strobe_w_min_offset) -  (p + strobe_w_max_offset - len(hash_seq_list)), p )
    window_p_end = min(p + strobe_w_max_offset, len(hash_seq_list))

For positions near the end of the sequence (`p + strobe_w_max_offset > len(hash_seq_list)`),

    max( (p + strobe_w_min_offset) -  (p + strobe_w_max_offset - len(hash_seq_list)), p )

is equal to

    max( len(hash_seq_list) - (strobe_w_max_offset - strobe_w_min_offset), p)

As I understand it, this *keeps the size of the searching window and moves the window to the left* (box A in the figure below). Am I right? Have you tried the approach in box B?

![Screenshot_20210413_230241](https://user-images.githubusercontent.com/2655946/114575092-7075ac00-9cac-11eb-9336-065682e3ae15.png)
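To make the behaviour concrete, here is a minimal sketch that reproduces the two quoted lines as a standalone function and compares the start with the simplified expression. The names `window_bounds` and `simplified_start` and the parameter values are made up for illustration; only the arithmetic is taken from `seq_to_randstrobes2_iter`.

```python
def window_bounds(p, strobe_w_min_offset, strobe_w_max_offset, seq_len):
    """Return (window_p_start, window_p_end) using the arithmetic quoted above."""
    if p + strobe_w_max_offset <= seq_len:
        start = p + strobe_w_min_offset
    else:
        # How far the full window would run past the end of the sequence.
        overshoot = p + strobe_w_max_offset - seq_len
        start = max((p + strobe_w_min_offset) - overshoot, p)
    end = min(p + strobe_w_max_offset, seq_len)
    return start, end


def simplified_start(p, strobe_w_min_offset, strobe_w_max_offset, seq_len):
    """Simplified start, valid only when p + strobe_w_max_offset > seq_len."""
    return max(seq_len - (strobe_w_max_offset - strobe_w_min_offset), p)


# Illustrative values only, not taken from the library.
w_min, w_max, n = 3, 8, 20
for p in range(10, n):
    start, end = window_bounds(p, w_min, w_max, n)
    print(p, start, end, end - start)
```

With these values the window stays at `[15, 20)` for `p = 12` to `p = 15`, and it only starts shrinking once `p` itself passes the shifted start.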

Besides, for order 3, the windows of `m2` and `m3` would overlap in some regions. Is this OK?

![Screenshot_20210413_230706](https://user-images.githubusercontent.com/2655946/114575671-fc87d380-9cac-11eb-8252-6d6f1db3925b.png)
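A small sketch of the overlap I mean, applying the same shift-left rule to two windows. This rests on an assumption: I take `m2` to be sampled from offsets `[w_min, w_max)` and `m3` from `[w_max + w_min, 2 * w_max)`, which may not match the actual order-3 code. The helper names (`shifted_window`, `order3_windows`, `overlap`) and the values are hypothetical.

```python
def shifted_window(p, lo_offset, hi_offset, seq_len):
    """Apply the shift-left rule to one strobe window, returned as [start, end)."""
    if p + hi_offset <= seq_len:
        start = p + lo_offset
    else:
        overshoot = p + hi_offset - seq_len
        start = max((p + lo_offset) - overshoot, p)
    return start, min(p + hi_offset, seq_len)


def order3_windows(p, w_min, w_max, seq_len):
    # ASSUMPTION: these offset ranges for m2 and m3 are illustrative;
    # the real order-3 implementation may define them differently.
    m2 = shifted_window(p, w_min, w_max, seq_len)
    m3 = shifted_window(p, w_max + w_min, 2 * w_max, seq_len)
    return m2, m3


def overlap(a, b):
    """Return the overlapping half-open interval of a and b, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Illustrative values only.
w_min, w_max, n = 3, 8, 20
for p in (2, 10):
    m2, m3 = order3_windows(p, w_min, w_max, n)
    print(p, m2, m3, overlap(m2, m3))
```

Under these assumptions the two windows are disjoint at `p = 2` but share `[15, 18)` at `p = 10`, so both strobes could be picked from the same region.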




