Segmentation fault when clustering MERC using easy-linclust #323

@arglog

Description

I tried to run mmseqs easy-linclust on the MERC dataset (from http://gwdu111.gwdg.de/~compbiol/plass/2018_08/) but got a segmentation fault.

Expected Behavior

Normal output of mmseqs easy-linclust

Current Behavior

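The run ends with a segmentation fault partway through the workflow: the crash comes right after the filterdb step finishes and is reported as "Ungapped alignment step died".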

Steps to Reproduce (for bugs)

> wget http://gwdu111.gwdg.de/~compbiol/plass/2018_08/MERC.fasta.gz
> mmseqs easy-linclust MERC.fasta.gz MERC /export/tmp/MERC -c 0.9 --cov-mode 1 --cluster-mode 2 --min-seq-id 0.5 --split-memory-limit 500G 

MMseqs Output (for bugs)

Tmp /export/tmp/MERC folder does not exist or is not a directory.
createdb ../MERC.fasta.gz /export/tmp/MERC/4233864688410091672/input --dbtype 0 --shuffle 1 --createdb-mode 1 --write-lookup 0 --id-offset 0 --compressed 0 -v 3 
Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
=================================================================================================== 292 Mio. sequences processed
=============
Time for merging to input_h: 0h 0m 40s 64ms
Time for merging to input: 0h 0m 40s 130ms
Database type: Aminoacid
Time for processing: 0h 12m 9s 179ms
Tmp /export/tmp/MERC/4233864688410091672/clu_tmp folder does not exist or is not a directory.
kmermatcher /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.5 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.9 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 500G --include-only-extendable 0 --ignore-multi-kmer 0 --threads 96 --compressed 0 -v 3 
kmermatcher /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.5 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.9 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 500G --include-only-extendable 0 --ignore-multi-kmer 0 --threads 96 --compressed 0 -v 3 
Database size: 292137902 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X) 
Generate k-mers list for 1 split
[=================================================================] 292.14M 36s 571ms
Sort kmer 0h 0m 3s 87ms
Sort by rep. sequence 0h 0m 2s 827ms
Time for fill: 0h 0m 16s 310ms
Time for merging to pref: 0h 0m 58s 394ms
Time for processing: 0h 3m 54s 379ms
rescorediagonal /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref_rescore1 --sub-mat nucl:nucleotide.out,aa:blosum62.out --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.9 -a 0 --cov-mode 1 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 96 --compressed 0 -v 3 
[=================================================================] 292.14M 2m 8s 805ms
Time for merging to pref_rescore1: 0h 2m 40s 361ms
Time for processing: 0h 5m 54s 815ms
clust /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref_rescore1 /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pre_clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 96 --compressed 0 -v 3 
Clustering mode: Greedy
Total time: 0h 1m 7s 208ms
Size of the sequence database: 292137902
Size of the alignment database: 292137902
Number of clusters: 245753321
Writing results 0h 1m 30s 550ms
Time for merging to pre_clust: 0h 1m 31s 28ms
Time for processing: 0h 5m 19s 116ms
createsubdb /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/order_redundancy /export/tmp/MERC/4233864688410091672/input /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/input_step_redundancy -v 3 --subdb-mode 1 
Time for merging to input_step_redundancy: 0h 0m 34s 71ms
Time for processing: 0h 1m 29s 221ms
createsubdb /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/order_redundancy /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref_filter1 -v 3 --subdb-mode 1 
Time for merging to pref_filter1: 0h 0m 45s 806ms
Time for processing: 0h 1m 48s 52ms
filterdb /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref_filter1 /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/pref_filter2 --filter-file /export/tmp/MERC/4233864688410091672/clu_tmp/16445679162920043634/order_redundancy --threads 96 --compressed 0 -v 3 
Filtering using file(s)
[=================================================================] 245.75M 2m 9s 682ms
Time for merging to pref_filter2: 0h 2m 9s 511ms
Time for processing: 0h 6m 15s 7ms
Segmentation fault (core dumped)
Error: Ungapped alignment step died
Error: Search died

Context

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): dc054792d1b1d091380638a712ee7566aba2bb38
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): self-compiled
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: cmake 3.10.2
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version: Ubuntu 18.04
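
The server specifications field above is blank. As a minimal sketch for collecting it on a Linux host (assuming the usual /proc/cpuinfo and /proc/meminfo layout; the helper names `cpu_simd_flags` and `mem_total` are made up for illustration and are not part of MMseqs2), something like this could be used:

```shell
# Hypothetical helpers, not part of MMseqs2.

# Print which of the SSE4.1/AVX2 flags the first CPU entry advertises.
# $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo).
cpu_simd_flags() {
  grep -m1 '^flags' "${1:-/proc/cpuinfo}" \
    | tr ' ' '\n' \
    | grep -E -x 'sse4_1|avx2' \
    | sort -u
}

# Print the MemTotal line (total system memory in kB).
# $1: path to a meminfo-style file (defaults to /proc/meminfo).
mem_total() {
  grep '^MemTotal' "${1:-/proc/meminfo}"
}
```

Calling `cpu_simd_flags` and `mem_total` without arguments reads the live system files. The optional path argument only exists so the helpers can be tried against a saved copy.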
