Snakemake

Snakemake is a Python-based workflow management system that aims to create reproducible and scalable data-processing pipelines. A workflow is made of rules, each defining how to create output files from input files. When Snakemake runs, it automatically determines which rules must be executed to produce the requested target output(s).

Building a Snakemake workflow is very different from, and often more challenging than, the traditional approach. In the past, we usually built a workflow as a series of scripts and ran them step by step, so connecting the steps was easy: the output of the current step simply became the input of the next one. Snakemake does not work like that. It works backward: you decide on the final target first, Snakemake looks for a rule that can create that output, and it keeps going upstream until it reaches the input file(s) we already have. Developers should likewise think backward when defining rules (i.e. derive the input file names from the output file names). In other words, a Snakemake workflow relies heavily on file names and uses them to determine which rules to apply. Because the rule order is decided automatically, developers also need to be careful when defining rules. Ambiguity (for example, two rules that can both produce the same file) will cause the workflow to fail, and implicitly defined rules can even connect the workflow in an unexpected way and produce wrong output.
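To make the backward, file-name-driven resolution concrete, here is a toy model in plain Python. It is a simplified sketch, not Snakemake's implementation or API: the names `Rule`, `pattern_to_regex` and `resolve` are hypothetical, each rule has a single output pattern with `{wildcard}` placeholders, and files that already exist are treated as finished sources (real Snakemake additionally checks whether existing outputs are up to date). Starting from the target, it finds the rule whose output pattern matches the file name, fills the captured wildcards into that rule's input patterns, and recurses upstream. If more than one rule matches, it stops with an ambiguity error.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """Hypothetical toy rule: one output file pattern built from input patterns."""
    name: str
    output: str        # e.g. "sorted/{sample}.bam"
    inputs: List[str]  # e.g. ["mapped/{sample}.bam"]


def pattern_to_regex(pattern):
    """Turn 'sorted/{sample}.bam' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Even indices are literal text, odd indices are wildcard names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)


def resolve(target, rules, existing, plan=None):
    """Work backward from `target`; return (rule, output) steps in execution order."""
    if plan is None:
        plan = []
    if target in existing:
        return plan  # this file is already available: stop going upstream
    matches = []
    for rule in rules:
        m = pattern_to_regex(rule.output).fullmatch(target)
        if m:
            matches.append((rule, m.groupdict()))
    if not matches:
        raise ValueError(f"no rule can produce {target!r}")
    if len(matches) > 1:
        names = [r.name for r, _ in matches]
        raise ValueError(f"ambiguous: {names} can all produce {target!r}")
    rule, wildcards = matches[0]
    # The input file names are derived from the output file name via the wildcards.
    for pattern in rule.inputs:
        resolve(pattern.format(**wildcards), rules, existing, plan)
    step = (rule.name, target)
    if step not in plan:
        plan.append(step)
    return plan


rules = [
    Rule("map_reads", "mapped/{sample}.bam", ["reads/{sample}.fastq"]),
    Rule("sort_bam", "sorted/{sample}.bam", ["mapped/{sample}.bam"]),
]
existing = {"reads/A.fastq"}

print(resolve("sorted/A.bam", rules, existing))
# [('map_reads', 'mapped/A.bam'), ('sort_bam', 'sorted/A.bam')]

# A second rule whose output pattern also matches "sorted/A.bam" makes the target ambiguous.
try:
    resolve("sorted/A.bam", rules + [Rule("sort_bam_alt", "sorted/{name}.bam", ["other/{name}.bam"])], existing)
except ValueError as err:
    print(err)
# ambiguous: ['sort_bam', 'sort_bam_alt'] can all produce 'sorted/A.bam'
```

Note that the rules are listed in no particular order; the execution order comes entirely from matching file names. In real Snakemake, an ambiguity like the one above can be settled explicitly with the `ruleorder` directive, but it is usually safer to choose output names that only one rule can produce.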

Currently we are developing several workflows that can be used for: