dbcanlight
Dbcanlight is a lightweight rewrite of run_dbcan, a widely used CAZyme annotation tool. It uses pyhmmer, a Cython binding to HMMER3, in place of the HMMER3 CLI suite as the search backend, which improves multithreading performance. In addition, it removes a limitation in run_dbcan that required large sequence files to be split manually beforehand.
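To illustrate why manual splitting of large sequence files can be avoided, here is a minimal sketch of reading a FASTA stream lazily and handing it to a search step in fixed-size batches. It is not dbcanlight's actual implementation: the helper names read_fasta and batched, and the batch size, are hypothetical and chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Lazily yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` items (hypothetical helper)."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Usage: only one batch is held in memory at a time, so the input
# file never has to be split on disk beforehand.
example = [">seq1", "MKV", "LLT", ">seq2", "GGA", ">seq3", "PPQ"]
for batch in batched(read_fasta(example), size=2):
    print(len(batch), [header for header, _ in batch])
```

Because both functions are generators, a file object can be passed directly to read_fasta, and memory use is bounded by the batch size rather than the file size.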
The main program dbcanlight comprises three modules: build, search and conclude. The build module downloads the
required databases from the dbCAN website; the search module searches against the protein HMM, substrate HMM or diamond
databases and reports the hits separately; and the conclude module gathers all the results produced by each module and
provides a summary.
We benchmarked dbcanlight on a protein FASTA file containing 14,574 sequences. Three rounds of tests were run in CAZyme and
substrate detection modes (--tools hmmer dbcansub in run_dbcan, and -m cazyme and -m sub in dbcanlight). The performance
tests show that dbcanlight is approximately 3X faster than run_dbcan, with an acceptable 2 GB of RAM usage.
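For readers who want to run a comparable multi-round comparison, the sketch below times a task over several rounds and reports a speedup ratio. It is an assumption-laden illustration, not the procedure used for the numbers above: the function names time_rounds and speedup are hypothetical, and summarizing each tool by its median time is a choice made here, not one stated in the benchmark.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_rounds(task: Callable[[], object], rounds: int = 3) -> List[float]:
    """Run `task` `rounds` times and return wall-clock seconds per round."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return timings


def speedup(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Ratio of median baseline time to median candidate time (assumed metric)."""
    return statistics.median(baseline) / statistics.median(candidate)


# Usage with placeholder tasks; in practice each task would launch one
# tool on the same input, for example through subprocess.run.
slow = time_rounds(lambda: sum(range(300_000)))
fast = time_rounds(lambda: sum(range(100_000)))
print(f"speedup: {speedup(slow, fast):.1f}X")
```

The printed ratio depends on the machine, so no expected output is shown.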
If you’re interested in dbcanlight, please refer to the GitHub page for more details.
