Is your feature request related to a problem or challenge? Please describe what you are trying to do.
BitChunks and the associated BitChunkIterator allow iterating over a bitmask in u64-sized blocks. Unfortunately, this comes at the cost of unaligned reads and non-trivial bit-shuffling for every u64 block. That cost is worthwhile when the alignment of the output matters, for example when writing the data to another buffer, but it is unnecessary when computing bit counts or set-bit offsets.
Describe the solution you'd like
Add an UnalignedBitChunkIterator that iterates through already aligned u64 blocks, potentially with padding on either side. SlicesIterator should then be updated to use it.
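
To illustrate why padding is enough for bit counting, here is a minimal sketch, not the proposed implementation. It assumes the bitmask is already available as aligned u64 words with least-significant-bit-first numbering; the function name `count_set_bits` and its signature are hypothetical and are not part of any existing API. Only the first and last words need masking; every word in between is read as-is, with no shifting or merging across word boundaries.

```rust
/// Hypothetical helper: counts the set bits in the bit range
/// [offset, offset + len) of a bitmask stored as aligned u64 words.
/// Assumes LSB-first bit numbering within each word.
fn count_set_bits(words: &[u64], offset: usize, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    let end = offset + len;
    assert!(end <= words.len() * 64, "bit range exceeds buffer");

    let first = offset / 64;
    let last = (end - 1) / 64;

    // Leading padding: clear the bits below `offset` in the first word.
    let lead_mask = u64::MAX << (offset % 64);
    // Trailing padding: clear the bits at or above `end` in the last word.
    let trail_mask = u64::MAX >> (63 - (end - 1) % 64);

    if first == last {
        // The whole range lives in a single word, so apply both masks.
        return (words[first] & lead_mask & trail_mask).count_ones() as usize;
    }

    let mut count = (words[first] & lead_mask).count_ones() as usize;
    // Interior words are fully covered by the range: no masking, no shuffling.
    count += words[first + 1..last]
        .iter()
        .map(|w| w.count_ones() as usize)
        .sum::<usize>();
    count += (words[last] & trail_mask).count_ones() as usize;
    count
}
```

For example, with `words = [u64::MAX, 0b1011, u64::MAX]`, the call `count_set_bits(&words, 60, 8)` covers the top four bits of the first word and the low four bits of the second, and returns 7. A real implementation would start from a byte buffer and would also have to handle the unaligned bytes before and after the aligned u64 region.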