This function executes starchip on a set of folders containing the output of starChimeric. It requires a specific BED file, generated with starChipIndex, in the genome folder used by starChimeric.

starchipCircle(
  group = c("sudo", "docker"),
  scratch.folder,
  genome.folder,
  samples.folder,
  reads.cutoff,
  min.subject.limit,
  threads,
  do.splice = c(TRUE, FALSE),
  cpm.cutoff = 0,
  subjectCPM.cutoff = 0,
  annotation = c(TRUE, FALSE)
)

Arguments

group,

a character string. Two options: sudo or docker, depending on which group the user belongs to

scratch.folder,

a character string indicating the scratch folder where the docker container will be mounted

genome.folder,

a character string indicating the folder where the indexed reference genome for STAR is located.

samples.folder,

a character string indicating the folder that contains all the sample folders processed with starChimeric

reads.cutoff,

Integer. Minimum number of reads crossing the circRNA backsplice required.

min.subject.limit,

Integer. Minimum number of individuals with at least reads.cutoff reads required to carry forward a circRNA for analysis

threads,

Integer. Number of threads to use

do.splice,

TRUE/FALSE. Whether the splices within the circRNA should be detected and reported. Linear splices are searched within each circRNA in each individual. Any linear splice with >= 60% of the read count of the circRNA is considered a splice within the circRNA. Two files are then created: .consensus, with the most common splice pattern, and .allvariants, with all reported splice patterns.

cpm.cutoff,

Float. Read counts are loaded into R and log2(CountsPerMillion) is calculated using the limma package. With cpm.cutoff > 0, circRNAs with log2(CPM) below this value will be filtered from this analysis

subjectCPM.cutoff,

Integer. See cpm.cutoff above. This value is the lower limit for the number of individuals required to have the circRNA expressed at a value higher than cpm.cutoff.

annotation,

TRUE/FALSE. Whether circRNAs are provided with gene annotations
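The filtering arguments described above can be illustrated with a small base R sketch. This is not the code that starchip runs: the count matrix, the splice counts, and all object names are made up for illustration, the log2(CPM) is approximated in base R with an assumed 0.5 offset rather than computed with limma, and the large cpm.cutoff value only makes sense for this toy library size.

```r
# Hypothetical backsplice counts: rows are circRNAs, columns are individuals.
counts <- matrix(
  c(100, 200, 150,
      1,   0,   2,
     50,   3,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("circA", "circB", "circC"), c("s1", "s2", "s3"))
)

# reads.cutoff / min.subject.limit: keep circRNAs with at least
# reads.cutoff reads in at least min.subject.limit individuals.
reads.cutoff <- 2
min.subject.limit <- 2
kept <- counts[rowSums(counts >= reads.cutoff) >= min.subject.limit, , drop = FALSE]

# cpm.cutoff / subjectCPM.cutoff: keep circRNAs whose log2(CPM) exceeds
# cpm.cutoff in at least subjectCPM.cutoff individuals.
# Assumption: base R approximation of log2(CPM) with a 0.5 offset.
log2cpm <- log2(sweep(kept + 0.5, 2, colSums(kept) + 1, "/") * 1e6)
cpm.cutoff <- 15
subjectCPM.cutoff <- 2
expressed <- log2cpm[rowSums(log2cpm > cpm.cutoff) >= subjectCPM.cutoff, , drop = FALSE]

# do.splice: a linear splice with >= 60% of the circRNA read count
# is treated as a splice within the circRNA (one individual shown).
circ.reads <- 20
linear.splices <- c(sj1 = 15, sj2 = 13, sj3 = 4)
internal <- names(linear.splices)[linear.splices >= 0.6 * circ.reads]
```

In this toy data circB is dropped by the read filter, circC is dropped by the CPM filter, and sj3 falls below the 60% threshold.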

Value

1. Count matrices:
   raw cRNA backsplice counts: circRNA.cutoff[readthreshold]reads.[subjectthreshold]ind.countmatrix
   log2CPM of above: norm_log2_counts_circRNA.[readthreshold]reads.[subjectthreshold]ind.0cpm_0samples.txt
   Maximum Linear Splices at Circular Loci: rawdata/linear.[readthreshold]reads.[subjectthreshold]ind.sjmax

2. Info about each circRNA:
   Consensus Information about Internal Splicing: Circs[reads].[subjects].spliced.consensus
   Complete Gene Annotation: circRNA.[readthreshold]reads.[subjectthreshold]ind.annotated
   Concise Gene Annotation + Splice Type: circRNA.[readthreshold]reads.[subjectthreshold]ind.genes

3. Images:
   PCA plots: circRNA.[readthreshold]reads.[subjectthreshold]ind.0cpm_0samples_variance_PCA.pdf
   Heatmap: circRNA.[readthreshold]reads.[subjectthreshold]ind.heatmap.pdf
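As a rough guide to locating these files after a run, the sketch below builds a few of the expected names from the thresholds. It assumes that [readthreshold] and [subjectthreshold] are replaced directly by the values of reads.cutoff and min.subject.limit; the object names are illustrative, and the folder in which the files are written is not stated here, so only file names are constructed.

```r
# Thresholds used in the run (same values as in the example call).
reads.cutoff <- 1L
min.subject.limit <- 2L

# Assumption: placeholders are substituted by the plain integer values.
prefix <- sprintf("circRNA.%dreads.%dind", reads.cutoff, min.subject.limit)

expected <- c(
  annotated = paste0(prefix, ".annotated"),
  genes     = paste0(prefix, ".genes"),
  heatmap   = paste0(prefix, ".heatmap.pdf")
)
print(expected)
```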

Author

Raffaele Calogero, raffaele.calogero [at] unito [dot] it, Bioinformatics and Genomics unit, University of Torino Italy

Examples

if (FALSE) {
    # run starchipCircle on the starChimeric sample folders found in the
    # current working directory
    starchipCircle(group="docker", genome.folder="/data/genomes/hg38star",
                   scratch.folder="/data/scratch", samples.folder=getwd(),
                   reads.cutoff=1, min.subject.limit=2, threads=8,
                   do.splice=TRUE, cpm.cutoff=0, subjectCPM.cutoff=0,
                   annotation=TRUE)
}
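
Before launching a run, it may help to confirm that the folders passed to the function exist and that samples.folder actually contains sample subfolders. The helper below is a hypothetical sketch and is not part of the package; checkStarchipInputs and its checks are assumptions made for illustration, and it does not verify the BED file produced by starChipIndex.

```r
# Hypothetical pre-flight check (not part of the package).
checkStarchipInputs <- function(genome.folder, samples.folder, scratch.folder) {
  folders <- c(genome.folder = genome.folder,
               samples.folder = samples.folder,
               scratch.folder = scratch.folder)

  # Report every folder argument that does not point to an existing directory.
  missing <- names(folders)[!dir.exists(folders)]
  if (length(missing) > 0) {
    stop("Missing folder(s): ", paste(missing, collapse = ", "))
  }

  # Each sample is expected to live in its own subfolder of samples.folder.
  sample.dirs <- list.dirs(samples.folder, full.names = FALSE, recursive = FALSE)
  if (length(sample.dirs) == 0) {
    stop("No sample subfolders found in ", samples.folder)
  }
  sample.dirs
}
```

The function returns the names of the sample subfolders it found, which can be compared against the list of samples expected from the starChimeric step.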