Description
Expected Behavior
The mmseqs unpackdb command writes the database files from the clustering result successfully.
Current Behavior
The mmseqs unpackdb command fails when a database entry name contains '/'.
The characters < > : " / | ? * are not allowed in file names on Windows (all of them) or on Linux (only /). Because unpackdb writes one file per entry and names it after the sequence, these characters should be replaced with others.
Steps to Reproduce (for bugs)
mmseqs cluster IPR/gfp IPR/cluster IPR/tmp
mmseqs createseqfiledb IPR/gfp IPR/cluster IPR/cluster_seq
mmseqs result2flat IPR/gfp IPR/gfp IPR/cluster_seq clu_seq.fasta
mmseqs createdb clu_seq.fasta clu_result
mkdir IPR/cluster_out
trash-put IPR/cluster_out/*
mmseqs unpackdb clu_result IPR/cluster_out/
MMseqs Output (for bugs)
(xxx) yyy@zzz:~/aaa/shell/predo$ mmseqs unpackdb clu_result IPR/cluster_out/
unpackdb clu_result IPR/cluster_out/
MMseqs Version: 15ace29a276be54fee6b9aedd7a1e814a3c7769b
Verbosity 3
Could not open IPR/cluster_out/Q04901|unreviewed|Entactin/nidogen|taxID:7729 for writing!
Context
Your Environment
- Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 15ace29
- Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
- For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: GNU Make 4.1/cmake version 3.10.2
- Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6230R CPU 256GB
- Operating system and version: Ubuntu 18.04.1 LTS
Solution
In src/util/unpack.cpp, replace these forbidden characters with others.
Because '|' appears frequently in sequence names, for example:
A0A348AT68|unreviewed|Fluorescent
We change '|' to '!':
A0A348AT68!unreviewed!Fluorescent
Other symbols are changed to '@':
W5UC41|unreviewed|Nidogen-1|taxID/7998 -> W5UC41!unreviewed!Nidogen-1!taxID@7998
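The substitution described above could be sketched roughly as follows. This is only an illustration of the mapping, not the actual change in src/util/unpack.cpp: the function name sanitizeFileName and the exact set of characters treated as forbidden are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper (not part of MMseqs2): returns a copy of `name`
// that is safe to use as a file name under the scheme proposed here.
std::string sanitizeFileName(const std::string &name) {
    // Characters other than '|' that are assumed to be forbidden in file names.
    static const std::string forbidden = "<>:\"/?*";
    std::string out = name;
    for (char &c : out) {
        if (c == '|') {
            c = '!';   // '|' is common in sequence names, so it gets its own replacement
        } else if (forbidden.find(c) != std::string::npos) {
            c = '@';   // every other forbidden character becomes '@'
        }
    }
    return out;
}
```

With this sketch, sanitizeFileName("W5UC41|unreviewed|Nidogen-1|taxID/7998") returns "W5UC41!unreviewed!Nidogen-1!taxID@7998". Note that the mapping is not reversible, and two different names could in principle collide after substitution.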
With this change, unpackdb completes successfully:
(xxx) yyy@zzz:~/aaa/shell/predo$ mmseqs unpackdb clu_result IPR/cluster_out/
unpackdb clu_result IPR/cluster_out/
MMseqs Version: 15ace29a276be54fee6b9aedd7a1e814a3c7769b
Verbosity 3
[=================================================================] 100.00% 3.19K 0s 81ms
Time for processing: 0h 0m 0s 90ms
I have already fixed this and will open a pull request soon, so there is no need to worry about it!
PR #467