cdhit.Rd
This function runs an Ubuntu Docker container that clusters MinION sequences using CD-HIT
cdhit(
group = c("sudo", "docker"),
scratch.folder,
data.folder,
identity.threshold = 0.9,
memory.limit = 30000,
threads = 0,
word.length = 7
)
group: a character string. Two options: sudo or docker, depending on which group the user belongs to
scratch.folder: a character string indicating the path of the scratch folder
data.folder: a character string indicating the folder where input data are located and where output will be written
identity.threshold: sequence identity threshold, default 0.9. This is CD-HIT's default global sequence identity, calculated as the number of identical bases in the alignment divided by the full length of the shorter sequence
memory.limit: memory limit in MB for the program, default 30000; 0 for unlimited
threads: number of threads, default 0; with 0, all CPUs will be used
word.length: word length, default 7. Use 7 for thresholds between 0.88 and 0.9; for other options see the CD-HIT user manual
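The global identity definition used by identity.threshold can be illustrated with a small sketch. The helper global_identity below is hypothetical (it is not part of this package or of CD-HIT) and assumes the two sequences are already aligned, with "-" marking gaps; CD-HIT computes its own alignments internally.

```r
# Hypothetical helper for illustration only; not part of the package or of CD-HIT.
# Assumes two pre-aligned sequences of equal length, with "-" marking gaps.
global_identity <- function(aln_a, aln_b) {
  a <- strsplit(aln_a, "")[[1]]
  b <- strsplit(aln_b, "")[[1]]
  stopifnot(length(a) == length(b))

  # Identical bases: aligned columns where both sequences carry the same base
  identical_bases <- sum(a == b & a != "-")

  # Full (ungapped) length of the shorter sequence
  shorter_length <- min(sum(a != "-"), sum(b != "-"))

  identical_bases / shorter_length
}

# 7 identical bases over a shorter sequence of length 8
global_identity("ACGTACGTAC", "ACGTTCGT--")
```

Under this definition the pair above has identity 7/8 = 0.875, so it would fall below the default threshold of 0.9.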
Returns two files: a FASTA file of representative sequences and a text file listing the clusters
if (FALSE) {
# running cdhit
cdhit(group="docker", scratch.folder="/data/scratch", data.folder=getwd(), identity.threshold=0.90, memory.limit=8000, threads=0, word.length=7)
}
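Because word.length should be chosen to match identity.threshold, a small lookup can help keep the two consistent. The sketch below is an assumption-laden illustration: suggest_word_length is a hypothetical helper, and the ranges other than 0.88 to 0.9 are taken from the word-size recommendations in the CD-HIT user guide for nucleotide clustering (cd-hit-est) as best recalled, so they should be checked against the manual and against the CD-HIT program actually run inside the container.

```r
# Hypothetical helper; not part of the package.
# Ranges follow the cd-hit-est word-size recommendations from the CD-HIT
# user guide (an assumption here; verify against the manual before relying on it).
suggest_word_length <- function(identity.threshold) {
  stopifnot(is.numeric(identity.threshold),
            length(identity.threshold) == 1,
            identity.threshold >= 0.75,
            identity.threshold <= 1)

  if (identity.threshold > 0.95) {
    10   # 0.95 to 1.0: 10 or 11
  } else if (identity.threshold > 0.90) {
    8    # 0.90 to 0.95: 8 or 9
  } else if (identity.threshold >= 0.88) {
    7    # 0.88 to 0.9, as stated for word.length
  } else if (identity.threshold >= 0.85) {
    6    # 0.85 to 0.88
  } else if (identity.threshold >= 0.80) {
    5    # 0.80 to 0.85
  } else {
    4    # 0.75 to 0.8
  }
}

if (FALSE) {
  # Same call style as the example above, with word.length derived from the threshold
  thr <- 0.90
  cdhit(group="docker", scratch.folder="/data/scratch", data.folder=getwd(),
        identity.threshold=thr, memory.limit=8000, threads=0,
        word.length=suggest_word_length(thr))
}
```

Thresholds that sit exactly on a boundary (such as 0.9) are assigned to the lower range here, which keeps the package defaults of 0.9 and 7 paired together.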