This function executes the Docker container bwa1, in which BWA is installed, to build the BWA index of a reference genome. Optionally, the index can also be created for the genome FASTA file of the GATK bundle.

bwaIndex(
  group = c("sudo", "docker"),
  genome.folder = getwd(),
  genome.url = NULL,
  gtf.url = NULL,
  dbsnp.file = NULL,
  g1000.file = NULL,
  mode = c("General", "GATK", "miRNA", "ncRNA"),
  mb.url.haripin,
  mb.url.mature,
  mb.species = NULL,
  rc.version = NULL,
  rc.species = NULL,
  length = NULL
)

Arguments

group

a character string, either "sudo" or "docker", depending on the group to which the user belongs.
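A minimal sketch of how the group value could change the way the container is launched. It assumes that users in the "docker" group can call docker directly, while other users need to prefix the command with sudo. The helper buildDockerCommand and the IMAGE placeholder are hypothetical and are not part of this package.

```r
# Hypothetical helper (not exported by this package): build a docker
# command line, prefixing it with sudo unless the user is in the docker group.
buildDockerCommand <- function(group = c("sudo", "docker"), docker.args) {
  group <- match.arg(group)  # defaults to "sudo" when not supplied
  prefix <- if (group == "sudo") "sudo docker" else "docker"
  paste(prefix, docker.args)
}

# "IMAGE" is a placeholder, not the real image name used by bwaIndex.
buildDockerCommand("sudo", "run --rm IMAGE")
# [1] "sudo docker run --rm IMAGE"
```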

genome.folder

a character string indicating the folder where the BWA-indexed reference genome will be stored.

genome.url

a character string indicating the download URL of the genome sequence (FASTA file) of interest.

gtf.url

a character string indicating the URL of the GTF file for the genome of interest on the ENSEMBL FTP site.

dbsnp.file

a character string indicating the name of the dbSNP VCF file located in the genome folder. The dbSNP VCF files, dbsnp_138.b37.vcf.gz and dbsnp_138.hg19.vcf.idx.gz, can be downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37

g1000.file

a character string indicating the name of the 1000 Genomes VCF file located in the genome folder. The 1000 Genomes VCF files, Mills_and_1000G_gold_standard.indels.b37.vcf.gz and Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.idx.gz, can be downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/

mode

a character string indicating the required type of analysis. Supported modes are "General", "GATK", "miRNA", and "ncRNA". In "General" mode, the URL of any online FASTA file ("genome.url" argument) can be provided and indexed; only canonical chromosomes are kept (see id.fa at the end of indexing). In "GATK" mode, the dbSNP ("dbsnp.file" argument) and 1000 Genomes ("g1000.file" argument) variant files are required in addition to the URL of the genome FASTA file ("genome.url" argument). In "miRNA" mode, the links to the miRBase hairpin and mature sequences ("mb.url.haripin" and "mb.url.mature" arguments) and the species prefix ("mb.species" argument) are required. In "ncRNA" mode, the version ("rc.version" argument) and species name ("rc.species" argument) of RNA Central are required, together with the desired maximum length of the studied RNA annotations ("length" argument).
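The argument requirements of each mode can be summarised as a lookup table. The sketch below is illustrative only: requiredArgs and missingArgs are hypothetical helpers, not functions of this package, and they treat an argument as missing when it is NULL or absent from the supplied list.

```r
# Hypothetical lookup table (not part of the package): arguments each
# mode needs, as described in the "mode" argument above.
requiredArgs <- list(
  General = c("genome.url"),
  GATK    = c("genome.url", "dbsnp.file", "g1000.file"),
  miRNA   = c("mb.url.haripin", "mb.url.mature", "mb.species"),
  ncRNA   = c("rc.version", "rc.species", "length")
)

# Return the names of the required arguments that are NULL (or absent)
# in 'args' for the given mode.
missingArgs <- function(mode, args) {
  mode <- match.arg(mode, names(requiredArgs))
  needed <- requiredArgs[[mode]]
  needed[vapply(needed, function(a) is.null(args[[a]]), logical(1))]
}

# The URL below is a placeholder.
missingArgs("GATK", list(genome.url = "http://example.org/genome.fa",
                         dbsnp.file = "dbsnp_138.b37.vcf.gz"))
# [1] "g1000.file"
```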

mb.url.haripin

a character string indicating the link to the hairpin miRNA sequences from the miRBase database. Visit http://www.mirbase.org to select the proper version number.

mb.url.mature

a character string indicating the link to the mature miRNA sequences from the miRBase database. Visit http://www.mirbase.org to select the proper version number.

mb.species

a character string indicating the name of a species annotated in miRBase (e.g. "hsa" for human miRNAs). Refer to http://www.mirbase.org/help/genome_summary.shtml for the proper species name.

rc.version

a character string indicating the required version of the RNA Central database. Visit ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/ to select the proper version number.

rc.species

a character string indicating the name of a species annotated in RNA Central (e.g. "Homo sapiens" for human ncRNAs). Refer to the NCBI Taxonomy annotations at https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi for the proper species name.

length

an integer corresponding to the maximum length threshold used to define the ncRNA reference from RNA Central.
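A minimal sketch of the length filter, assuming the threshold is an inclusive maximum on sequence length, so sequences longer than length are discarded. The function filterByLength, its max.length parameter (standing in for the length argument) and the toy sequences are hypothetical. The filtering actually performed inside the container may differ.

```r
# Hypothetical sketch: keep ncRNA sequences no longer than max.length.
# 'seqs' is a named character vector (sequence ID -> sequence).
# Assumption: the threshold is inclusive (<=).
filterByLength <- function(seqs, max.length) {
  seqs[nchar(seqs) <= max.length]
}

# Made-up toy sequences of 10, 24 and 4 nucleotides.
ncrna <- c(seq1 = "ACGUACGUAC",
           seq2 = "ACGUACGUACGUACGUACGUACGU",
           seq3 = "ACGU")
names(filterByLength(ncrna, 20))
# [1] "seq1" "seq3"
```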

Value

The BWA-indexed reference sequence, located in genome.folder.

Author

Giulio Ferrero

Examples