Description
First of all, I must thank all the MMseqs contributors for the excellent documentation and support! The wiki has been extremely helpful.
Problem: Expected seqTaxDB .dmp files not created
I followed the instructions in the wiki for creating a seqTaxDB from an existing BLAST database. I am using the NR database.
The documentation says that the following files should be created:
seqTaxDB, seqTaxDB.index, seqTaxDB.dbtype, seqTaxDB.lookup, seqTaxDB_h, seqTaxDB_h.index, seqTaxDB_h.dbtype, seqTaxDB_mapping, seqTaxDB_nodes.dmp, seqTaxDB_names.dmp, seqTaxDB_merged.dmp
However, I am missing several of these files. Here are the files that were created. Notably, none of the .dmp files are being created.
nrDB.dbtype nrDB.idx.1 nrDB.idx.2 nrDB.idx.7 nrDB.index
nrDB_h nrDB.idx.10 nrDB.idx.3 nrDB.idx.8 nrDB.lookup
nrDB_h.dbtype nrDB.idx.11 nrDB.idx.4 nrDB.idx.9 nrDB_mapping
nrDB_h.index nrDB.idx.12 nrDB.idx.5 nrDB.idx.dbtype nrDB.source
nrDB nrDB.idx.0 nrDB.idx.13 nrDB.idx.6 nrDB.idx.index nrDB_taxonomy
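For anyone who wants to reproduce the comparison, here is a minimal sketch of how the documented file list can be checked against a database prefix. The suffix list is copied from the wiki list quoted above; the function name `missing_db_files` is made up for illustration and is not part of MMseqs.

```python
from pathlib import Path

# Suffixes taken from the wiki's list of expected seqTaxDB files (quoted above).
EXPECTED_SUFFIXES = [
    "", ".index", ".dbtype", ".lookup",
    "_h", "_h.index", "_h.dbtype",
    "_mapping", "_nodes.dmp", "_names.dmp", "_merged.dmp",
]


def missing_db_files(db_prefix):
    """Return the names of expected files that do not exist for db_prefix.

    Hypothetical helper: it only checks for file existence and knows
    nothing about the contents or type of the database.
    """
    prefix = Path(db_prefix)
    expected = [prefix.parent / (prefix.name + suffix) for suffix in EXPECTED_SUFFIXES]
    return [path.name for path in expected if not path.exists()]
```

Calling `missing_db_files("nrDB")` from the directory that holds the database returns the expected file names that are absent. It only reports what is missing relative to the wiki list and says nothing about why.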
I know that this is at least a valid amino acid database, since I can search against it and get hits. However, I cannot use the taxonomyreport command on the results, since it tells me that the result is an alignment database and not a taxonomy database. Similarly, when I run the taxonomyreport command with nrDB as both the seqTaxDB and the result, it tells me that nrDB is an amino acid database.
taxonomyreport ../nrDB ../nrDB report.html --report-mode 1
MMseqs Version: 6672bbc9de55e89b011c8a055982a2644d31a467
Report mode 1
Threads 20
Verbosity 3
Input database "../nrDB" has the wrong type (Aminoacid)
Allowed input:
- Taxonomy
I tried copying the .dmp files from the downloaded taxonomy into the same folder as my database and renaming them to nrDB_merged.dmp, nrDB_names.dmp, and nrDB_nodes.dmp. However, my database is still not recognized as a taxonomy database.
createdb log file
createdb ../test/nr.fsa nrDB
MMseqs Version: 6672bbc9de55e89b011c8a055982a2644d31a467
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Converting sequences
[=================================================================================================== 1 Mio. sequences processed
=================================================================================================== 340 Mio. sequences processed
==============================
Time for merging to nrDB_h: 0h 2m 37s 499ms
Time for merging to nrDB: 0h 3m 51s 292ms
Database type: Aminoacid
Time for processing: 0h 45m 44s 356ms
createtaxdb log file
createtaxdb nrDB tmp --ncbi-tax-dump ../test/taxonomy/ --tax-mapping-file ../test/nr.fsa.taxidmapping
MMseqs Version: 6672bbc9de55e89b011c8a055982a2644d31a467
NCBI tax dump directory ../test/taxonomy/
Taxonomy mapping file ../test/nr.fsa.taxidmapping
Taxonomy mapping mode 0
Taxonomy db mode 1
Threads 36
Verbosity 3
Loading nodes file ... Done, got 2304309 nodes
Loading merged file ... Done, added 61039 merged nodes.
Loading names file ... Done
Init RMQ ...Done
Thanks for taking the time to look at this!