Software

We have developed several packages for Bioconductor and the R project in collaboration with researchers from other institutions. Some of these libraries are related to genomics and others to survival analysis with recurrent events. Most of the packages are available at [Bioconductor](https://www.bioconductor.org/) or [CRAN](https://cran.r-project.org/). Development versions of our packages are available at our GitHub repository [BRGE](https://github.com/isglobal-brge).

## Genetics

### Package MultiDataSet

This package combines several sources of information (including different omic datasets) into a single convenient structure. A `MultiDataSet` can be conveniently manipulated (e.g., subsetted, copied, …) and can serve as the input to or the output from Bioconductor packages designed to integrate multi-omic data. This is joint work with Carlos Ruiz (ISGlobal) and Carles Hernandez-Ferrer (ISGlobal).

The package can be installed through Bioconductor:

[generic]
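# biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("MultiDataSet")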
[/generic]

The `MultiDataSet` vignette can be found here: [`MultiDataSet`](https://bioconductor.org/packages/release/bioc/html/MultiDataSet.html).
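To give a feel for what combining omic tables by sample involves, here is a minimal base-R sketch. It does not use the `MultiDataSet` class or its API; the object names (`expr`, `methy`, `omics`) and the toy values are hypothetical and serve only to illustrate the idea of holding several tables together and subsetting them consistently.

```r
# Toy data: features in rows, samples in columns (all values are made up)
set.seed(1)
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
methy <- matrix(runif(6), nrow = 2,
                dimnames = list(c("cpg1", "cpg2"), c("s2", "s3", "s5")))

# Keep the tables together in one named list
omics <- list(expression = expr, methylation = methy)

# Samples present in every table
common <- Reduce(intersect, lapply(omics, colnames))

# Restrict each table to the shared samples, in the same order
omics_common <- lapply(omics, function(m) m[, common, drop = FALSE])

print(common)
print(sapply(omics_common, dim))
```

Doing this by hand for every analysis is error-prone, which is the kind of bookkeeping a dedicated container is meant to take over.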

### Package MEAL

This package contains a set of tools to analyze and visualize methylation and gene expression data. This is the work of Carlos Ruiz (ISGlobal).

The package can be installed through Bioconductor:

[generic]
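# biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("MEAL")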
[/generic]

The vignette for MEAL can be found here: [`MEAL`](https://bioconductor.org/packages/release/bioc/html/MEAL.html).

### Package rasp (under development)

This is joint work with [Roderic Guigó’s group – Bioinformatics and Genomics program](http://www.crg.eu/roderic_guigo), Centre for Genomic Regulation (CRG). This R package is designed to compare the relative expression of transcripts across conditions in RNA-seq experiments. Our approach is based on a distance-based non-parametric multivariate ANOVA method.
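The general idea of a distance-based, permutation-tested multivariate ANOVA can be sketched in base R as follows. This is not the `rasp` implementation: the function names `pseudo_f` and `permutation_p`, the simulated counts, and the choice of distance (Euclidean distance on square-root proportions) are assumptions made here for illustration only.

```r
# Pseudo-F statistic computed directly from pairwise distances
pseudo_f <- function(d, group) {
  d2 <- as.matrix(d)^2
  group <- as.character(group)
  n <- nrow(d2)
  levels_g <- unique(group)
  a <- length(levels_g)
  # Total sum of squares: squared distances over all pairs, divided by n
  ss_total <- sum(d2[upper.tri(d2)]) / n
  # Within-group sum of squares: the same quantity inside each group
  ss_within <- sum(vapply(levels_g, function(g) {
    idx <- which(group == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))
  ss_between <- ss_total - ss_within
  (ss_between / (a - 1)) / (ss_within / (n - a))
}

# Permutation p-value: shuffle condition labels and recompute the statistic
permutation_p <- function(d, group, n_perm = 999) {
  observed <- pseudo_f(d, group)
  permuted <- replicate(n_perm, pseudo_f(d, sample(group)))
  (sum(permuted >= observed) + 1) / (n_perm + 1)
}

set.seed(42)
# Simulated read counts for 3 transcripts of one gene, 10 samples per condition
counts <- rbind(
  t(rmultinom(10, size = 200, prob = c(0.6, 0.3, 0.1))),
  t(rmultinom(10, size = 200, prob = c(0.2, 0.3, 0.5)))
)
condition <- rep(c("A", "B"), each = 10)

# Relative expression: each transcript's share of the gene's total
props <- counts / rowSums(counts)

# Euclidean distance on square-root proportions (an illustrative choice)
d <- dist(sqrt(props))

p_value <- permutation_p(d, condition)
print(p_value)
```

Because the test works on distances between whole transcript-proportion vectors and gets its null distribution from permutations, it needs no distributional assumption for the individual transcripts.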

The Linux version of the package is currently under development. To install a beta version of rasp, start R and enter:

[generic]
# Requires the devtools package: install.packages("devtools")
library(devtools)
install_github("isglobal-brge/rasp")
[/generic]

The performance of our approach has been compared with two existing R packages (`DEXSeq` and `EBSeq`) using data from The Cancer Genome Atlas (TCGA). Exon abundances from RNA-seq data were obtained for several individuals diagnosed with liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA). We have created two experimental data packages (`ExonCountDataLIHC` and `ExonCountDataBLCA`, respectively). For now, they can be installed by starting R and entering:

[generic]
source("http://www.creal.cat/media/upload/arxius/jr/CREAL_install/install.R")
creal.install("ExonCountDataLIHC")
creal.install("ExonCountDataBLCA")
[/generic]

### Package invClust

Together with Alejandro Cáceres (ISGlobal), we have developed a method for calling inversion genotypes that can be applied to common GWAS data and that accounts for population stratification when an appropriate reference population is not known. This method is particularly useful for inversion association studies in a GWAS context where population stratification can be present. To install `invClust`, start R and enter:

[generic]
source("http://www.creal.cat/media/upload/arxius/jr/CREAL_install/install.R")
creal.install("invClust")
[/generic]

If you experience problems during this process, the source code of the package can be downloaded from [here](http://www.creal.cat/media/upload/arxius/jr/inversions/invClust_1.0.tar.gz).

The methods are described in the manuscript:

> Cáceres A and González JR (2015) Following the footprints of polymorphic inversions on SNP data: from detection to association tests. NAR doi:10.1093. Freely available [here](http://nar.oxfordjournals.org/content/early/2015/02/05/nar.gkv073.full.pdf).

We have created a [vignette](http://www.creal.cat/media/upload/arxius/jr/inversions/invClust.pdf) that illustrates how to analyze real data.

### Package tweeDEseq

`tweeDEseq` is an R package for analyzing RNA-seq count data. It uses the Poisson-Tweedie family of distributions to model counts; this family includes the Poisson and negative binomial distributions as particular cases. The testPT test is used to detect genes that are differentially expressed (DE).
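To see why a family wider than the Poisson is useful for count data, the following base-R sketch compares simulated Poisson and negative binomial counts with the same mean. It does not call `tweeDEseq`; the helper `dispersion_index` and the parameter values are hypothetical and chosen only for illustration.

```r
set.seed(123)
n <- 10000
mu <- 10

# Poisson counts: the variance equals the mean
pois_counts <- rpois(n, lambda = mu)

# Negative binomial counts with the same mean: variance = mu + mu^2 / size
nb_counts <- rnbinom(n, mu = mu, size = 2)

# Dispersion index (variance / mean): about 1 for Poisson, larger when overdispersed
dispersion_index <- function(x) var(x) / mean(x)

print(dispersion_index(pois_counts))
print(dispersion_index(nb_counts))
```

A Poisson model would understate the variability of the second sample, while the negative binomial accommodates it; a family that contains both as special cases can adapt to either situation.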

The methods are described in the manuscript:

> Esnaola M, Puig P, Gonzalez D, Castelo R, Gonzalez JR. A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments. BMC Bioinformatics 2013, 14:254. Freely available [here](http://www.biomedcentral.com/1471-2105/14/254/abstract).

The manuscript illustrates the performance of our proposed method using a real RNA-seq data set comprising 69 Nigerian individuals. We have created an experimental data package (`tweeDEseqCountData`) that is available at [Bioconductor](https://bioconductor.org/packages/release/data/experiment/html/tweeDEseqCountData.html).

tweeDEseq is available from [Bioconductor](https://bioconductor.org/packages/release/bioc/html/tweeDEseq.html).

### Package inveRsion

`inveRsion` is an R package for detecting genetic inversions from SNP-array data. This is a collaboration with Alejandro Cáceres (ISGlobal) and Suzanne Sindi (Center for Computational Molecular Biology, Brown University). `inveRsion` is available at [Bioconductor](https://bioconductor.org/packages/release/bioc/html/inveRsion.html), and the method is described in a manuscript published in BMC Bioinformatics:

> Cáceres A, Sindi SS, Raphael BJ, Cáceres M, González JR. Identification of polymorphic inversions from genotypes. BMC Bioinformatics. 2012 Feb 9;13:28.

Our aim is to use SNP-array data from large cohorts with collected phenotype information to assess the association of inversions with disease. We also intend to use the tool to assist in the mapping of human inversions, a project headed by Mario Cáceres (Universitat Autònoma de Barcelona).

### MAD (Mosaic Alteration Detector)

This is joint work with Benjamin Rodriguez-Santiago (qGenomics) and Luis Pérez-Jurado (UPF). `MAD` is a software tool to detect mosaic events from SNP arrays using B-allele frequency (*BAF*) and log R ratio (*LRR*) values. The algorithm is based on a segmentation procedure that uses the main features of `GADA` (an R package to detect CNVs). To install MAD, start R and enter:

[generic]
source("http://www.creal.cat/media/upload/arxius/jr/CREAL_install/install.R")
creal.install("mad")
[/generic]
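As a rough illustration of the signal that BAF carries, the base-R sketch below simulates BAF at heterozygous probes. This is not MAD's segmentation algorithm; the simulated values, the shift size, and the `region` indices are hypothetical. In a non-mosaic region BAF clusters around 0.5, while in a mosaic region it splits into two bands, which raises the deviation of BAF from 0.5.

```r
set.seed(7)
n_probes <- 1000
region <- 401:600  # hypothetical mosaic segment

# Heterozygous probes in a normal sample: BAF close to 0.5
baf <- rnorm(n_probes, mean = 0.5, sd = 0.02)

# Inside the mosaic segment BAF shifts up or down by 0.1 at random
shift <- sample(c(-0.1, 0.1), length(region), replace = TRUE)
baf[region] <- baf[region] + shift

# Deviation of BAF from its expected heterozygous value
b_dev <- abs(baf - 0.5)

inside <- mean(b_dev[region])
outside <- mean(b_dev[-region])
print(c(inside = inside, outside = outside))
```

A segmentation procedure looks for contiguous stretches of probes where such a deviation is consistently elevated, rather than comparing a region that is known in advance as done here.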

The methodological paper is published in BMC Bioinformatics and can be found [here](http://www.biomedcentral.com/1471-2105/12/166). An example of how to use the software is given in the [vignette](http://www.creal.cat/media/upload/arxius/jr/GADA/mad_vignette.pdf). The algorithm has been used to discover mosaic alterations in a large collaborative study:

> Jacobs KB, Yeager M, Zhou W, … Gonzalez JR, … Rothman N, Pérez-Jurado LA, Chanock SJ. Detectable clonal mosaicism and its relationship to aging and cancer. Nat Genet. 2012 May 6;44(6):651-8