v1.0

Main

GStream is a method that combines SNP and CNV genotyping with unprecedented accuracy. This new method outperforms previous well-established SNP- and CNV-genotyping software for the Illumina platform.

With GStream, genetic researchers will have a fast and powerful tool to leverage Illumina genotyping microarrays and find new genetic variants and associations of interest.

The GStream method and its corresponding software were developed at the Grup de Recerca de Reumatologia (GRR), a research group of the Institut de Recerca de l'Hospital Universitari Vall d'Hebron (VHIR).

GStream software

Software that applies our genotyping method to a set of SNP probes can be freely downloaded from this link as precompiled binaries or as source code. To compile the code you need a C++ compiler, the standard libraries, and both the Armadillo and IT++ C++ libraries properly installed.

As input, the code uses a data file format that can be extracted directly from GenomeStudio projects (tab-separated columns):

Name       Chr  Position   Samp1.X  Samp1.Y  Samp2.X  Samp2.Y  ...
rs2106940  7    88560825   0.335    0.385    0.578    0.010    ...
rs2106943  7    24824508   0.030    0.915    0.508    0.509    ...
rs2106949  16   18032061   1.489    0.117    1.549    0.207    ...
rs2106966  7    117796542  1.032    0.063    0.650    0.778    ...
...        ...  ...        ...      ...      ...      ...      ...
GenomeStudio input
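
As a rough illustration of this layout, the following Python sketch reads such a table into per-probe, per-sample (X, Y) intensity pairs. The function name read_intensities and the returned structure are illustrative assumptions and are not part of GStream; the two data rows follow the example table above, with column boundaries as read here.

```python
import csv
import io

def read_intensities(handle):
    """Parse a tab-separated intensity table into a list of probe records."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Columns after Name/Chr/Position come in (X, Y) pairs, one pair per sample.
    samples = [col[:-2] for col in header[3::2]]  # drop the ".X" suffix
    probes = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[3:]]
        xy = dict(zip(samples, zip(values[0::2], values[1::2])))
        probes.append(
            {"name": row[0], "chr": row[1], "position": int(row[2]), "xy": xy}
        )
    return probes

example = (
    "Name\tChr\tPosition\tSamp1.X\tSamp1.Y\tSamp2.X\tSamp2.Y\n"
    "rs2106940\t7\t88560825\t0.335\t0.385\t0.578\t0.010\n"
    "rs2106943\t7\t24824508\t0.030\t0.915\t0.508\t0.509\n"
)
probes = read_intensities(io.StringIO(example))
print(probes[0]["name"], probes[0]["xy"]["Samp1"])
```

For a real file, the same function could be called on an open file handle instead of the in-memory example.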

The basic usage is:
GStream --input <file>
The available options are:
--goutput <file>    Defines the SNP genotypes output file (default=GStream.snp)
--cnvoutput <file>  Defines the CNV scores output file (default=GStream.cnv)
--outliers <file>   Defines the sample outliers file
--og                Computes only SNP genotypes
--noz               Turns off homozygous deletion detection (turned on automatically by --og)
--gqt X             Sets the quality score threshold for genotyping. X is a floating point number (default=0)
--stdcnp X          Minimum distance (in standard deviations) required to call amplifications/deletions under the one-component mode. X is a floating point number (default=8)
--res N             Sets the number of BAF bins used for density estimation. N is an integer (default=40)
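
When GStream is driven from a script, the options above can be assembled programmatically. The sketch below is only an illustration: the helper gstream_command and the output name run1.snp are hypothetical, and only flags listed above are used. It builds the argument list without running anything.

```python
import shlex

def gstream_command(input_file, executable="GStream", snp_out=None,
                    cnv_out=None, only_genotypes=False, quality_threshold=None):
    """Assemble a GStream argument list from the documented options."""
    args = [executable, "--input", input_file]
    if snp_out is not None:
        args += ["--goutput", snp_out]
    if cnv_out is not None:
        args += ["--cnvoutput", cnv_out]
    if only_genotypes:
        args.append("--og")
    if quality_threshold is not None:
        args += ["--gqt", str(quality_threshold)]
    return args

# "run1.snp" is a made-up output name used only for this example.
cmd = gstream_command("GStream.txt", snp_out="run1.snp", quality_threshold=0.8)
print(shlex.join(cmd))
```

The resulting list could then be passed to subprocess.run(cmd, check=True), assuming the GStream executable is on the search path.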
The output files have the following structures:

SNP genotypes output file

SNP         CHR  BP        QC     Samp1  Samp2  ...
rs12905389  15   20071673  0.871  3      2      ...
rs12909397  15   20071765  0.816  3      2      ...
rs10163108  15   20151610  0.888  3      3      ...
The first three columns identify the probe, followed by a fourth column holding the global SNP-genotyping quality score. The remaining columns give the genotype assigned to each sample (1="AA", 2="AB", 3="BB", 0="Homozygous deletion" and -1="Not called").
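
The genotype coding can be decoded with a small lookup table. The following Python sketch is an illustration only: it assumes the fields are whitespace-separated (the delimiter is not stated here), and the function name decode_snp_line is hypothetical.

```python
# Genotype codes as described above.
GENOTYPE_LABELS = {
    1: "AA",
    2: "AB",
    3: "BB",
    0: "Homozygous deletion",
    -1: "Not called",
}

def decode_snp_line(line, sample_names):
    """Turn one row of the SNP genotypes file into a labelled record."""
    fields = line.split()  # assumption: whitespace-separated columns
    calls = {
        name: GENOTYPE_LABELS[int(code)]
        for name, code in zip(sample_names, fields[4:])
    }
    return {
        "snp": fields[0],
        "chr": fields[1],
        "bp": int(fields[2]),
        "qc": float(fields[3]),
        "calls": calls,
    }

record = decode_snp_line("rs12905389 15 20071673 0.871 3 2", ["Samp1", "Samp2"])
print(record["calls"])
```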

SNP genotypes quality scores output file

SNP         Samp1  Samp2  ...
rs12905389  0.993  0.966  ...
rs12909397  0.950  0.885  ...
rs10163108  0.998  0.995  ...
The first column gives the probe identifier, followed by the individual quality score of each sample's SNP genotype.
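
One possible use of these per-sample scores is to apply a stricter cutoff after a run. The sketch below is a hypothetical post-processing step, not a GStream feature: it pairs the genotype codes of one probe with their quality scores and marks calls below a chosen threshold as -1 (not called). The threshold of 0.9 is arbitrary.

```python
def mask_low_quality(genotypes, scores, threshold):
    """Return genotype codes with calls scoring below threshold set to -1."""
    return [g if s >= threshold else -1 for g, s in zip(genotypes, scores)]

# rs12909397 in the example tables: genotypes 3 and 2, scores 0.950 and 0.885.
masked = mask_low_quality([3, 2], [0.950, 0.885], threshold=0.9)
print(masked)  # [3, -1]
```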

CNV scores output file

ID          CHR  BP        NC    MaxRat       Samp1  Samp2  ...
rs12905389  15   20071673  1/1   0.559/1.181  1.912  ...
rs12909397  15   20071765  2/2   0.612/0.679  1.777  2...
rs10163108  15   20151610  -1/1  -1/1         1.915  2.141  ...
The first three columns identify the probe. The fourth column gives the number of components found at each homozygote cluster, the fifth column gives the component ratios, and the remaining columns hold the copy number score of each sample.
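
The NC and MaxRat columns each pack two values separated by a slash. As a minimal sketch, assuming whitespace-separated fields, a row can be split into typed values as follows. The function name parse_cnv_line is hypothetical, and the code does not interpret what a value of -1 means in those columns.

```python
def parse_cnv_line(line, sample_names):
    """Split one row of the CNV scores file into typed fields."""
    fields = line.split()  # assumption: whitespace-separated columns
    record = {
        "id": fields[0],
        "chr": fields[1],
        "bp": int(fields[2]),
        # Two slash-separated values per field, kept in file order.
        "nc": tuple(int(v) for v in fields[3].split("/")),
        "max_rat": tuple(float(v) for v in fields[4].split("/")),
    }
    record["scores"] = dict(zip(sample_names, (float(v) for v in fields[5:])))
    return record

row = parse_cnv_line(
    "rs10163108 15 20151610 -1/1 -1/1 1.915 2.141", ["Samp1", "Samp2"]
)
print(row["nc"], row["max_rat"], row["scores"])
```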

GWAS-based CNV SCAN on human traits

To identify putative causal CNVs, we analyzed the linkage disequilibrium patterns between all the trait-associated SNPs reported in the NHGRI catalog of published genome-wide association studies and the CNV microarray markers detected on the HumanOmni1-Quad platform. Trait-associated SNP genotypes were extracted from the 1KGP data, and CNV genotypes were called with GStream. All HumanOmni1-Quad markers with a non-diploid frequency greater than 1% (CNV markers) were included in the analysis.
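
The 1% inclusion criterion can be illustrated with a short Python sketch. How a sample is declared non-diploid is not specified here, so the rule below (a copy number score under 1.5 or over 2.5) and both cutoff values are assumptions chosen only for illustration; the function names are hypothetical as well.

```python
def non_diploid_frequency(scores, del_threshold=1.5, amp_threshold=2.5):
    """Fraction of samples whose score falls outside the assumed diploid range."""
    flagged = sum(1 for s in scores if s < del_threshold or s > amp_threshold)
    return flagged / len(scores)

def is_cnv_marker(scores, min_frequency=0.01, **thresholds):
    """Apply the 'non-diploid frequency greater than 1%' criterion."""
    return non_diploid_frequency(scores, **thresholds) > min_frequency

# Made-up scores: 3 of 100 samples fall outside the assumed diploid range.
scores = [2.0] * 97 + [1.1, 0.9, 3.0]
print(non_diploid_frequency(scores))  # 0.03
print(is_cnv_marker(scores))          # True
print(is_cnv_marker([2.0] * 100))     # False
```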

The following link provides access to regularly updated results of new CNV associations within known human risk loci identified with this method:

NEW CANDIDATE CNVs

Downloads

GStream for Windows: Download
GStream source: Download
Visualization script: Download
Note: To compile the source you need both the Armadillo and IT++ C++ libraries properly installed. Before compiling, please edit the Makefile and make the appropriate changes.

Example case

Running GStream

An example dataset can be downloaded from this link. To run GStream with the default parameters, download it to the same directory as the executable gstream.exe and run:

gstream.exe --input GStream.txt

Visualizing the output

The output files created by GStream, together with the input file, can be visualized with the script plotGTCN.py. Since all these files share the same prefix (“GStream”), the option “-p GStream” selects them all:

plotGTCN.py -p GStream -d figures -f 0

which will create a plot per probe within the ./figures directory. The remaining options are available in the script help:

plotGTCN.py -h
Usage: plotGTCN.py [options]
This program generates SNP and CNV genotype plots from GStream output data.
Options:
-h        show this help message and exit
-p PREF   prefix for intensity, SNP, CNV and log files
-i FXY    intensity file (required if prefix not provided)
-s FSNP   SNP genotype file (required if prefix not provided)
-c FCNV   CNV genotype file (required if prefix not provided)
-L FLOG   GStream log file (required if prefix not provided)
-d DIR    directory to save figures (required)
-f F      lower frequency threshold for plotting SNP probe (optional)
-A TA     score threshold for amplifications (optional)
-D TD     score threshold for deletions (optional)