kmers

Version: v0.1.0
Published: Nov 8, 2021
License: MIT
Imports: 2
Imported by: 8

README

kmers

This package provides manipulations for bit-packed k-mers (k<=32, encoded in uint64).

Benchmark

CPU: AMD Ryzen 7 2700X Eight-Core Processor, 3.7 GHz

$ go test . -bench=Bench* -benchmem \
    | grep Bench \
    | perl -pe 's/\s\s+/\t/g' \
    | csvtk cut -Ht -f 1,3-5 \
    | csvtk add-header -t -n test,time,memory,allocs \
    | csvtk pretty -t -r

                                      test           time     memory        allocs
------------------------------------------   ------------   --------   -----------
                     BenchmarkEncodeK32-16    19.67 ns/op     0 B/op   0 allocs/op
       BenchmarkEncodeFromFormerKmerK32-16    7.692 ns/op     0 B/op   0 allocs/op
   BenchmarkMustEncodeFromFormerKmerK32-16    2.008 ns/op     0 B/op   0 allocs/op
                     BenchmarkDecodeK32-16    80.73 ns/op    32 B/op   1 allocs/op
                 BenchmarkMustDecodeK32-16    76.93 ns/op    32 B/op   1 allocs/op

                        BenchmarkRevK32-16    3.617 ns/op     0 B/op   0 allocs/op
                       BenchmarkCompK32-16   0.7999 ns/op     0 B/op   0 allocs/op
                    BenchmarkRevCompK32-16    3.814 ns/op     0 B/op   0 allocs/op
                   BenchmarkCannonalK32-16    4.147 ns/op     0 B/op   0 allocs/op

History

This package was originally maintained in unikmer.

Documentation

Constants

This section is empty.

Variables

var ErrCodeOverflow = errors.New("kmers: code value overflow")

ErrCodeOverflow means the encoded integer is too large for the given k (valid codes are smaller than 4^k).

var ErrIllegalBase = errors.New("kmers: illegal base")

ErrIllegalBase means that a base outside the IUPAC symbols was detected.

var ErrKMismatch = errors.New("kmers: K mismatch")

ErrKMismatch means the K sizes do not match.

var ErrKOverflow = errors.New("kmers: k-mer size (1-32) overflow")

ErrKOverflow means K is outside the supported range (1-32).

var ErrNotConsecutiveKmers = errors.New("kmers: not consecutive k-mers")

ErrNotConsecutiveKmers means the two k-mers are not consecutive.

var MaxCode []uint64

MaxCode holds the maximum integer code for each K.
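
To illustrate what such a table might contain, the sketch below builds a slice indexed by k in which entry k is 4^k - 1, the largest value that fits in 2*k bits. The helper name buildMaxCode is hypothetical, and this page does not show how the package actually initializes MaxCode.

```go
package main

import "fmt"

// buildMaxCode is a hypothetical helper, not part of the package.
// Entry k holds 4^k - 1, the largest code that fits in 2*k bits.
// Index 0 is left at zero and is not meaningful.
func buildMaxCode() []uint64 {
	table := make([]uint64, 33)
	for k := 1; k <= 32; k++ {
		// For k = 32 the shift count is 64, which yields 0 for a uint64,
		// and 0 - 1 wraps around to the all-ones value.
		table[k] = (uint64(1) << (uint(k) * 2)) - 1
	}
	return table
}

func main() {
	table := buildMaxCode()
	fmt.Println(table[1], table[2], table[3]) // 3 15 63
}
```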

Functions

func Canonical

func Canonical(code uint64, k int) uint64

Canonical returns the code of the canonical k-mer.
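
A common convention, assumed here rather than confirmed by this page, is that the canonical k-mer is whichever of a k-mer and its reverse complement has the smaller code. The sketch below follows that convention with hypothetical helper names (revCompSketch, canonicalSketch) and the 2-bit codes A=00, C=01, G=10, T=11.

```go
package main

import "fmt"

// revCompSketch (hypothetical) reverses the order of the 2-bit groups and
// complements each group by XOR with 0b11, in a single pass.
func revCompSketch(code uint64, k int) uint64 {
	var c uint64
	for i := 0; i < k; i++ {
		c = c<<2 | ((code & 3) ^ 3)
		code >>= 2
	}
	return c
}

// canonicalSketch (hypothetical) returns the smaller of the code and the
// code of its reverse complement. This is an assumed definition.
func canonicalSketch(code uint64, k int) uint64 {
	if rc := revCompSketch(code, k); rc < code {
		return rc
	}
	return code
}

func main() {
	// AACG = 0b00000110 = 6; its reverse complement CGTT = 0b01101111 = 111.
	fmt.Println(canonicalSketch(6, 4), canonicalSketch(111, 4)) // 6 6
}
```

Under this definition a k-mer and its reverse complement always map to the same canonical code, which is what makes the value strand-independent.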

func Complement

func Complement(code uint64, k int) uint64

Complement returns the code of the complement sequence.
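
With the 2-bit codes A=00, C=01, G=10, T=11, complementing a base flips both of its bits, so complementing a whole k-mer amounts to an XOR with a mask of 2*k ones. The following is a minimal sketch of that idea. complementSketch is a hypothetical name, and the package's own implementation may differ.

```go
package main

import "fmt"

// complementSketch (hypothetical) flips the lowest 2*k bits of code.
// A(00)<->T(11) and C(01)<->G(10) are each a two-bit flip.
func complementSketch(code uint64, k int) uint64 {
	mask := (uint64(1) << (uint(k) * 2)) - 1 // all ones when k = 32
	return code ^ mask
}

func main() {
	// ACGT = 0b00011011 = 27; its complement TGCA = 0b11100100 = 228.
	fmt.Println(complementSketch(27, 4)) // 228
}
```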

func Decode

func Decode(code uint64, k int) []byte

Decode converts the code back to the original sequence.
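
As a rough illustration of decoding, the sketch below reads 2 bits at a time from the low end of the code and fills the output from right to left. It assumes the first base occupies the most significant bits. decodeSketch is a hypothetical name and performs none of the checks that Decode is described as doing.

```go
package main

import "fmt"

// decodeSketch (hypothetical) turns a 2-bit packed code back into bases.
// It assumes 1 <= k <= 32 and that code fits in 2*k bits.
func decodeSketch(code uint64, k int) []byte {
	const bases = "ACGT" // index equals the 2-bit code of each base
	kmer := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		kmer[i] = bases[code&3]
		code >>= 2
	}
	return kmer
}

func main() {
	fmt.Println(string(decodeSketch(27, 4))) // ACGT
}
```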

func Encode

func Encode(kmer []byte) (code uint64, err error)

Encode converts byte slice to bits.

Codes:

A    0b00
C    0b01
G    0b10
T    0b11

For degenerate bases, only the first base is kept.

M       AC     A
V       ACG    A
H       ACT    A
R       AG     A
D       AGT    A
W       AT     A
S       CG     C
B       CGT    C
Y       CT     C
K       GT     G
N       ACGT   A
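
The two tables above can be sketched as a small encoder. This is an illustration under stated assumptions, not the package's code: encodeSketch and the two error variables are hypothetical names, only uppercase symbols are handled, and the first base is assumed to occupy the most significant bits.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the package's error values.
var (
	errKSize       = errors.New("sketch: k-mer size out of range (1-32)")
	errIllegalBase = errors.New("sketch: illegal base")
)

// encodeSketch (hypothetical) packs up to 32 bases into a uint64, 2 bits
// per base. Degenerate bases collapse to their first listed base.
func encodeSketch(kmer []byte) (uint64, error) {
	if len(kmer) == 0 || len(kmer) > 32 {
		return 0, errKSize
	}
	var code uint64
	for _, b := range kmer {
		var v uint64
		switch b {
		case 'A', 'M', 'V', 'H', 'R', 'D', 'W', 'N':
			v = 0 // 0b00
		case 'C', 'S', 'B', 'Y':
			v = 1 // 0b01
		case 'G', 'K':
			v = 2 // 0b10
		case 'T':
			v = 3 // 0b11
		default:
			return 0, errIllegalBase
		}
		code = code<<2 | v
	}
	return code, nil
}

func main() {
	code, err := encodeSketch([]byte("ACGT"))
	fmt.Printf("%d %08b %v\n", code, code, err) // 27 00011011 <nil>
}
```

Because N collapses to A in this scheme, NCGT and ACGT receive the same code, so the original degenerate symbol cannot be recovered after encoding.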

func EncodeFromFormerKmer

func EncodeFromFormerKmer(kmer []byte, leftKmer []byte, leftCode uint64) (uint64, error)

EncodeFromFormerKmer encodes from the former k-mer, inspired by ntHash
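
The idea behind encoding from the former k-mer can be sketched as a rolling update: shift the previous code left by one base, append the new last base, and mask off the base that fell out. nextCode is a hypothetical helper that takes the already-encoded last base. The package function instead takes both k-mers as byte slices, and its internals are not shown on this page.

```go
package main

import "fmt"

// nextCode (hypothetical) derives the code of the k-mer that starts one
// base to the right of the k-mer encoded as leftCode.
// lastBase is the 2-bit code (0-3) of the new last base.
func nextCode(leftCode uint64, k int, lastBase uint64) uint64 {
	mask := (uint64(1) << (uint(k) * 2)) - 1 // keep only the lowest 2*k bits
	return (leftCode<<2 | lastBase) & mask
}

func main() {
	// ACGT = 27. Sliding one base to the right over ACGTA gives
	// CGTA = 0b01101100 = 108.
	fmt.Println(nextCode(27, 4, 0)) // 108
}
```

Each step costs a shift, an OR and an AND regardless of k, whereas encoding from scratch visits all k bases.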

func EncodeFromLatterKmer

func EncodeFromLatterKmer(kmer []byte, rightKmer []byte, rightCode uint64) (uint64, error)

EncodeFromLatterKmer encodes from the latter k-mer.
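
Going from the latter k-mer to the one before it can be sketched as the mirror-image update: shift the right-hand code down by one base and place the new first base in the top 2-bit position. prevCode is a hypothetical helper that takes the already-encoded first base. It illustrates the bit manipulation only and is not the package's implementation.

```go
package main

import "fmt"

// prevCode (hypothetical) derives the code of the k-mer that starts one
// base to the left of the k-mer encoded as rightCode.
// firstBase is the 2-bit code (0-3) of the new first base.
func prevCode(rightCode uint64, k int, firstBase uint64) uint64 {
	return rightCode>>2 | firstBase<<(uint(k-1)*2)
}

func main() {
	// CGTA = 108. The k-mer one step to the left in TCGTA is
	// TCGT = 0b11011011 = 219.
	fmt.Println(prevCode(108, 4, 3)) // 219
}
```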

func MustCanonical

func MustCanonical(code uint64, k int) uint64

MustCanonical is similar to Canonical, but does not check k.

func MustComplement

func MustComplement(code uint64, k int) uint64

MustComplement is similar to Complement, but does not check k.

func MustDecode

func MustDecode(code uint64, k int) []byte

MustDecode is similar to Decode, but does not check k and code.

func MustEncodeFromFormerKmer

func MustEncodeFromFormerKmer(kmer []byte, leftKmer []byte, leftCode uint64) (uint64, error)

MustEncodeFromFormerKmer encodes from the former k-mer, assuming the k-mer and leftKmer are both OK.

func MustEncodeFromLatterKmer

func MustEncodeFromLatterKmer(kmer []byte, rightKmer []byte, rightCode uint64) (uint64, error)

MustEncodeFromLatterKmer encodes from the latter k-mer, assuming the k-mer and rightKmer are both OK.

func MustRevComp

func MustRevComp(code uint64, k int) (c uint64)

MustRevComp is similar to RevComp, but does not check k.

func MustReverse

func MustReverse(code uint64, k int) (c uint64)

MustReverse is similar to Reverse, but does not check k.

func RevComp

func RevComp(code uint64, k int) (c uint64)

RevComp returns code of reverse complement sequence.

func Reverse

func Reverse(code uint64, k int) (c uint64)

Reverse returns code of the reversed sequence.
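
Reversal and reverse complement can be illustrated with a simple loop over 2-bit groups. This is a readability-first sketch with hypothetical names (reverseSketch, revCompViaReverse). The package may well use faster bit tricks, and the 2-bit codes A=00, C=01, G=10, T=11 are assumed.

```go
package main

import "fmt"

// reverseSketch (hypothetical) reverses the order of the k 2-bit groups.
func reverseSketch(code uint64, k int) uint64 {
	var c uint64
	for i := 0; i < k; i++ {
		c = c<<2 | code&3 // move the lowest base to the end of c
		code >>= 2
	}
	return c
}

// revCompViaReverse (hypothetical) reverses, then complements by flipping
// the lowest 2*k bits.
func revCompViaReverse(code uint64, k int) uint64 {
	mask := (uint64(1) << (uint(k) * 2)) - 1
	return reverseSketch(code, k) ^ mask
}

func main() {
	// AACG = 6. Reversed: GCAA = 0b10010000 = 144.
	// Reverse complement: CGTT = 0b01101111 = 111.
	fmt.Println(reverseSketch(6, 4), revCompViaReverse(6, 4)) // 144 111
}
```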

Types

type CodeSlice

type CodeSlice []uint64

CodeSlice is a slice of k-mer codes (uint64), for sorting.

func (CodeSlice) Len

func (codes CodeSlice) Len() int

Len returns the length of the slice.

func (CodeSlice) Less

func (codes CodeSlice) Less(i, j int) bool

Less simply compares two codes.

func (CodeSlice) Swap

func (codes CodeSlice) Swap(i, j int)

Swap swaps two elements

type KmerCode

type KmerCode struct {
	Code uint64
	K    int
}

KmerCode is a struct representing a k-mer in 64 bits.

func NewKmerCode

func NewKmerCode(kmer []byte) (KmerCode, error)

NewKmerCode returns a new KmerCode struct from byte slice.

func NewKmerCodeFromFormerOne

func NewKmerCodeFromFormerOne(kmer []byte, leftKmer []byte, preKcode KmerCode) (KmerCode, error)

NewKmerCodeFromFormerOne computes KmerCode from the former consecutive k-mer.

func NewKmerCodeMustFromFormerOne

func NewKmerCodeMustFromFormerOne(kmer []byte, leftKmer []byte, preKcode KmerCode) (KmerCode, error)

NewKmerCodeMustFromFormerOne computes KmerCode from the former consecutive k-mer, assuming the k-mer and leftKmer are both OK.

func (KmerCode) BitsString

func (kcode KmerCode) BitsString() string

BitsString returns the code as a string of bits.

func (KmerCode) Bytes

func (kcode KmerCode) Bytes() []byte

Bytes returns the k-mer as a []byte.

func (KmerCode) Canonical

func (kcode KmerCode) Canonical() KmerCode

Canonical returns its canonical k-mer.

func (KmerCode) Comp

func (kcode KmerCode) Comp() KmerCode

Comp returns KmerCode of the complement sequence.

func (KmerCode) Equal

func (kcode KmerCode) Equal(kcode2 KmerCode) bool

Equal checks whether two KmerCodes are the same.

func (KmerCode) Rev

func (kcode KmerCode) Rev() KmerCode

Rev returns KmerCode of the reverse sequence.

func (KmerCode) RevComp

func (kcode KmerCode) RevComp() KmerCode

RevComp returns KmerCode of the reverse complement sequence.

func (KmerCode) String

func (kcode KmerCode) String() string

String returns the k-mer as a string.

type KmerCodeSlice

type KmerCodeSlice []KmerCode

KmerCodeSlice is a slice of KmerCode, for sorting

func (KmerCodeSlice) Len

func (codes KmerCodeSlice) Len() int

Len returns the length of the slice.

func (KmerCodeSlice) Less

func (codes KmerCodeSlice) Less(i, j int) bool

Less simply compares two KmerCodes.

func (KmerCodeSlice) Swap

func (codes KmerCodeSlice) Swap(i, j int)

Swap swaps two elements
