cdhit.Rd
This function runs an Ubuntu Docker container that clusters MinION sequences using CD-HIT.
cdhit( group = c("sudo", "docker"), scratch.folder, data.folder, identity.threshold = 0.9, memory.limit = 30000, threads = 0, word.length = 7 )
group | a character string. Two options: sudo or docker, depending on which group the user belongs to |
---|---|
scratch.folder | a character string indicating the path of the scratch folder |
data.folder | a character string indicating the folder where the input data are located and where the output will be written |
identity.threshold | sequence identity threshold, default 0.9. This is CD-HIT's global sequence identity, calculated as the number of identical bases in the alignment divided by the full length of the shorter sequence |
memory.limit | memory limit in MB for the program, default 30000. Use 0 for unlimited |
threads | number of threads, default 0. With 0, all CPUs will be used |
word.length | word length, default 7, which suits identity thresholds between 0.88 and 0.9. For other thresholds, see the CD-HIT user manual |
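To make the identity.threshold definition concrete, the following is a minimal sketch of the calculation described above (identical bases in the alignment divided by the full length of the shorter sequence). The function `global_identity` and its arguments are hypothetical and not part of this package; it assumes the two sequences are already aligned, have equal aligned length, and use "-" for gaps. CD-HIT computes its own alignment internally.

```r
# Hypothetical helper, for illustration only.
# aligned_a, aligned_b: already-aligned strings of equal length, "-" marks a gap.
global_identity <- function(aligned_a, aligned_b) {
  a <- strsplit(aligned_a, "")[[1]]
  b <- strsplit(aligned_b, "")[[1]]
  stopifnot(length(a) == length(b))
  # Count alignment columns where both sequences carry the same base
  identical_bases <- sum(a == b & a != "-")
  # Full (ungapped) length of the shorter of the two sequences
  shorter_length <- min(sum(a != "-"), sum(b != "-"))
  identical_bases / shorter_length
}

# 7 identical bases over a shorter sequence of 8 bases
global_identity("ACGTACGTAC", "ACGTTCGT--")
```

With the default identity.threshold of 0.9, a pair like this one would fall below the threshold and would not be clustered together.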
Returns two files: a FASTA file of representative sequences and a text file listing the clusters.
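The cluster list can be loaded back into R for inspection. The sketch below assumes the text file follows CD-HIT's standard `.clstr` layout (a `>Cluster N` header line followed by one line per member, with the representative marked by a trailing `*`). The helper `parse_clstr` is hypothetical and not part of this package, and the output file name is not stated here, so the example works on lines already read with `readLines()`.

```r
# Hypothetical helper, for illustration only.
# lines: character vector, e.g. from readLines() on the cluster file.
parse_clstr <- function(lines) {
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">Cluster")
  # Each header opens a new cluster; numbering starts at 0 as in the file
  cluster <- cumsum(is_header) - 1L
  members <- lines[!is_header]
  data.frame(
    cluster = cluster[!is_header],
    # Sequence name sits between ">" and the "..." that CD-HIT appends
    sequence = sub("^.*>(.*?)\\.\\.\\..*$", "\\1", members, perl = TRUE),
    # The representative sequence of a cluster ends with "*"
    representative = grepl("\\*$", members),
    stringsAsFactors = FALSE
  )
}

# Made-up example lines in the assumed layout
example_lines <- c(
  ">Cluster 0",
  "0\t120nt, >read_a... *",
  "1\t118nt, >read_b... at +/95.00%",
  ">Cluster 1",
  "0\t98nt, >read_c... *"
)
clusters <- parse_clstr(example_lines)
table(clusters$cluster)
```

Returning a data frame keeps one row per sequence, so cluster sizes and representatives can be summarised with base R functions such as `table()` or `subset()`.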
if (FALSE) {
  # running cdhit
  cdhit(group="docker", scratch.folder="/data/scratch", data.folder=getwd(),
        identity.threshold=0.90, memory.limit=8000, threads=0, word.length=7)
}
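Because word.length should match identity.threshold, a small lookup can help when the threshold is changed. The sketch below assumes the word-size ranges given in the CD-HIT user guide for nucleotide clustering (cd-hit-est): 4 for 0.75 to 0.8, 5 for 0.80 to 0.85, 6 for 0.85 to 0.88, 7 for 0.88 to 0.9, 8 or 9 for 0.90 to 0.95, and 10 or 11 for 0.95 to 1.0. The helper `suggest_word_length` is hypothetical, and assigning each boundary value to the lower range and picking the smaller word size in each range are choices made for this example. Check the values against the manual for the CD-HIT version in the container.

```r
# Hypothetical helper, for illustration only; ranges assumed from the
# CD-HIT user guide for cd-hit-est and should be verified before use.
suggest_word_length <- function(identity.threshold) {
  stopifnot(identity.threshold >= 0.75, identity.threshold <= 1)
  upper <- c(0.80, 0.85, 0.88, 0.90, 0.95, 1.00)  # upper bound of each range
  word  <- c(4, 5, 6, 7, 8, 10)                   # word size for that range
  # First range whose upper bound is not exceeded by the threshold
  word[which(identity.threshold <= upper)[1]]
}

suggest_word_length(0.90)
suggest_word_length(0.97)
```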