This function runs the docker container annotate.1, in which refGenome is used to annotate a single-cell counts table, with Ensembl gene IDs in the first column, using an ENSEMBL GTF annotation

scannobyGtf(
  group = c("docker", "sudo"),
  file,
  gtf.name,
  biotype = NULL,
  mt = c(TRUE, FALSE),
  ribo.proteins = c(TRUE, FALSE),
  umiXgene = 3,
  riboStart.percentage = 20,
  riboEnd.percentage = 70,
  mitoStart.percentage = 1,
  mitoEnd.percentage = 100,
  thresholdGenes = 250
)

Arguments

group,

a character string. Two options: "sudo" or "docker", depending on which group the user belongs to

file,

a character string giving the full path of the counts matrix, e.g. "/bin/users/matrix.csv". The folder containing the matrix is where input data are located and where output will be written. The system automatically recognizes csv files as comma-separated and txt files as tab-separated

gtf.name,

a character string indicating the ENSEMBL gtf file

biotype,

a character string indicating the biotypes of interest

mt,

a boolean defining whether mitochondrial genes have to be removed; FALSE means that mt genes are removed

ribo.proteins,

a boolean defining whether ribosomal proteins have to be removed; FALSE means that ribosomal proteins (gene names starting with rpl or rps) are removed

umiXgene,

an integer defining how many UMIs are required to call a gene as present. default: 3

riboStart.percentage,

start range for ribosomal percentage, cells within the range are kept

riboEnd.percentage,

end range for ribosomal percentage, cells within the range are kept

mitoStart.percentage,

start range for mitochondrial percentage, cells within the range are retained

mitoEnd.percentage,

end range for mitochondrial percentage, cells within the range are retained

thresholdGenes,

parameter to filter cells according to the number of significant genes expressed
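
The way the percentage ranges and gene thresholds described above could combine into a per-cell filter can be sketched in plain R. This is only an illustration, not the code run inside the annotate.1 container: the helper name filter_cells_sketch, the toy matrix, the "mt-" prefix used to spot mitochondrial genes, the inclusive range boundaries, and the use of ">=" for umiXgene and thresholdGenes are all assumptions. The sketch also ignores the mt and ribo.proteins switches, which concern gene removal rather than cell filtering.

```r
# Hypothetical helper, NOT part of rCASC: illustrates the documented
# thresholds on a genes x cells counts matrix with gene symbols as row names.
filter_cells_sketch <- function(counts, umiXgene = 3, thresholdGenes = 250,
                                riboStart.percentage = 20,
                                riboEnd.percentage = 70,
                                mitoStart.percentage = 1,
                                mitoEnd.percentage = 100) {
  genes <- rownames(counts)
  # Ribosomal proteins: gene names starting with rpl or rps (from the docs).
  is_ribo <- grepl("^rp[ls]", genes, ignore.case = TRUE)
  # Mitochondrial genes: the "mt-" prefix is an assumption for this sketch.
  is_mito <- grepl("^mt-", genes, ignore.case = TRUE)

  total <- colSums(counts)
  ribo_pct <- 100 * colSums(counts[is_ribo, , drop = FALSE]) / total
  mito_pct <- 100 * colSums(counts[is_mito, , drop = FALSE]) / total

  # A gene is called present in a cell when it has at least umiXgene UMIs.
  genes_detected <- colSums(counts >= umiXgene)

  keep <- ribo_pct >= riboStart.percentage & ribo_pct <= riboEnd.percentage &
    mito_pct >= mitoStart.percentage & mito_pct <= mitoEnd.percentage &
    genes_detected >= thresholdGenes
  keep[is.na(keep)] <- FALSE  # cells with zero total counts are dropped
  counts[, keep, drop = FALSE]
}

# Toy data (made-up values), filled column by column: one column per cell.
counts <- matrix(
  c(5, 3, 1, 1,   # cell1: 30% ribo, 10% mito, 2 genes with >= 3 UMIs
    1, 0, 9, 0,   # cell2:  0% ribo, 90% mito, 1 gene  with >= 3 UMIs
    4, 4, 0, 2),  # cell3: 40% ribo,  0% mito, 2 genes with >= 3 UMIs
  nrow = 4,
  dimnames = list(c("Actb", "Rpl3", "mt-Co1", "Gapdh"),
                  c("cell1", "cell2", "cell3"))
)

filtered <- filter_cells_sketch(counts, thresholdGenes = 2)
all_cells <- filter_cells_sketch(counts, thresholdGenes = 1,
                                 riboStart.percentage = 0,
                                 riboEnd.percentage = 100,
                                 mitoStart.percentage = 0,
                                 mitoEnd.percentage = 100)
```

With the default ranges only cell1 survives in this toy case: cell2 falls below riboStart.percentage and cell3 below mitoStart.percentage. Opening both ranges to 0-100, as in the example at the end of this page, keeps every cell that reaches thresholdGenes.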

Value

Several files: an annotated_counts table, where Ensembl IDs are linked to gene symbols; a PDF showing the effect of ribo and mito gene removal; a filtered_annotated counts table containing only the cells and genes that pass the filtering thresholds; a PDF showing the effect of the filtering on gene counts; and a filteredStatistics.txt indicating how many cells and genes were filtered out

Author

Raffaele Calogero, Luca Alessandri

Examples

if (FALSE) {
     system("wget http://130.192.119.59/public/testSCumi_mm10.csv.zip")
     library(rCASC)
     system("unzip testSCumi_mm10.csv.zip")
     #filtering low quality cells
     lorenzFilter(group="docker",scratch.folder="/data/scratch/",
                  file=paste(getwd(),"testSCumi_mm10.csv",sep="/"),
                  p_value=0.05,separator=',')
     #running annotation and removal of mitochondrial and ribosomal protein genes
     #download mouse GTF for mm10
     system("wget ftp://ftp.ensembl.org/pub/release-92/gtf/mus_musculus/Mus_musculus.GRCm38.92.gtf.gz")
     system("gunzip Mus_musculus.GRCm38.92.gtf.gz")
     #gtf.name must match the file downloaded above (release 92)
     scannobyGtf(group="docker", file=paste(getwd(),"testSCumi_mm10.csv",sep="/"),
                 gtf.name="Mus_musculus.GRCm38.92.gtf", biotype="protein_coding",
                 mt=TRUE, ribo.proteins=TRUE, umiXgene=3, riboStart.percentage=0,
                 riboEnd.percentage=100, mitoStart.percentage=0, mitoEnd.percentage=100,
                 thresholdGenes=100)
}