This function runs the docker container bwa1, in which BWA is installed. The index is built from the genome FASTA file of the GATK bundle. The user needs to download the file into the genome folder from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle

bwaIndexUcsc(
  group = c("sudo", "docker"),
  genome.folder = getwd(),
  uscs.urlgenome = NULL,
  dbsnp.file = NULL,
  g1000.file = NULL,
  gatk = FALSE
)

Arguments

group,

a character string. Two options: "sudo" or "docker", depending on which group the user belongs to

genome.folder,

a character string indicating the folder where the indexed reference genome for bwa will be located

uscs.urlgenome,

a character string indicating the URL, taken from the UCSC download web page, of the unmasked genome sequence of interest

dbsnp.file,

a character string indicating the name of the dbSNP VCF file located in the genome folder. The dbSNP VCF files, dbsnp_138.b37.vcf.gz and dbsnp_138.hg19.vcf.idx.gz, can be downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37

g1000.file,

a character string indicating the name of the 1000 Genomes VCF file located in the genome folder. The 1000 Genomes VCF files, Mills_and_1000G_gold_standard.indels.b37.vcf.gz and Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.idx.gz, can be downloaded from ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/

gatk,

a boolean, TRUE or FALSE, indicating whether the index will be used for GATK analysis
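
When gatk = TRUE, the files named in dbsnp.file and g1000.file are expected to be in genome.folder already. A minimal pre-flight sketch, assuming a hypothetical helper check_gatk_inputs() that is not part of the package, could confirm this before the container is started:

```r
# Hypothetical helper (not part of the package): return the names of the
# VCF files given in dbsnp.file and g1000.file that are absent from
# genome.folder, and warn when any are missing.
check_gatk_inputs <- function(genome.folder, dbsnp.file, g1000.file) {
  files <- c(dbsnp.file, g1000.file)
  paths <- file.path(genome.folder, files)
  absent <- files[!file.exists(paths)]
  if (length(absent) > 0) {
    warning("Missing in genome folder: ", paste(absent, collapse = ", "))
  }
  invisible(absent)
}
```

An empty result suggests that both files are in place; any returned name still has to be downloaded into the genome folder.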

Value

The indexed bwa genome reference sequence
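
One way to confirm that the index was produced is to look for the files that bwa index normally writes next to the FASTA file (extensions amb, ann, bwt, pac and sa). The sketch below assumes these files end up directly in genome.folder; has_bwa_index() is a hypothetical helper, not part of the package:

```r
# Hypothetical helper (not part of the package): TRUE when genome.folder
# holds at least one file for each extension written by `bwa index`.
# Assumes the index files are stored directly in genome.folder.
has_bwa_index <- function(genome.folder) {
  exts <- c("amb", "ann", "bwt", "pac", "sa")
  found <- vapply(
    exts,
    function(ext) {
      pattern <- paste0("\\.", ext, "$")
      length(list.files(genome.folder, pattern = pattern)) > 0
    },
    logical(1)
  )
  all(found)
}
```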

Author

Raffaele Calogero

Examples

if (FALSE) {
    # build a BWA index for hg19, to be used with GATK
    bwaIndexUcsc(
      group = "sudo",
      genome.folder = "data/genomes/hg19_bwa",
      uscs.urlgenome = "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz",
      dbsnp.file = "dbsnp_138.hg19.vcf.gz",
      g1000.file = "Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz",
      gatk = TRUE
    )

    # build a BWA index for mm10, not intended for GATK
    bwaIndexUcsc(
      group = "sudo",
      genome.folder = "/data/genomes/mm10bwa",
      uscs.urlgenome = "http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz",
      gatk = FALSE
    )
}