CNstream tutorial

 

  1. CNstream Input Files

  2. Running CNstream

  3. CNstream parameters

  4. CNstream Output Files

 

1.      CNstream Input Files

 

All input files for the CNstream R package are plain text files. Only one file, the signal intensity file, is required to run CNstream, and it can easily be exported from Illumina's GenomeStudio software. CNstream also accepts two optional input files: one specifying the status of each sample (case/control) and one giving the plate on which each sample was genotyped.

 

1.1.  CNstream input signal intensity file

 

Example file

 

The input signal intensity file is a text file that contains information about each probe (name, chromosome and base-pair position), together with the X and Y channel intensities of each sample. Each line corresponds to one microarray probe, and the lines must be sorted by chromosome and base-pair position.

 

Name        Chr  Position   SAMPLE1.X   SAMPLE1.Y   SAMPLE2.X   SAMPLE2.Y   SAMPLE3.X   SAMPLE3.Y
rs1545536   8    144714312  0.02148205  0.6833881   0.2203226   0.3893139   0.4117542   0.4549562
rs10099003  8    144716603  1.52746     1.342773    2.320188    0.08941685  2.463521    0.08725205
rs896946    8    144719785  0.02898269  2.140413    0.06641929  1.785213    0.04940656  2.085741
rs11775744  8    144720335  0.01787181  0.4492684   0.02162173  0.249042    0.01381066  0.3286062
rs1809148   8    144734218  0.7682863   0.6901037   0.5261341   0.4613593   0.02988189  1.034436
rs2123758   8    144734804  0.5999851   0.6547657   0.3712394   0.4563591   1.006618    0.114436
rs3793371   8    144735442  0.0316033   1.599119    0.0900638   1.305015    0.07485583  1.708853
rs3793368   8    144735737  1.244978    0.0511517   1.1884      0.04798501  1.241068    0.04858789
rs2382962   8    144740535  1.299303    0.06504522  0.5841687   0.6120613   1.179762    0.02839759
rs4874159   8    144742093  1.517084    0.02558302  0.5202222   0.9074748   1.531161    0.05225999
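Before running CNstream, it can help to check that a file follows this layout. The sketch below uses a hypothetical helper, check_intensity_table(), which is not part of CNstream. It assumes the column order shown above (Name, Chr, Position, then one <sample>.X / <sample>.Y pair per sample). It stops with an error if the probes are not grouped by chromosome and sorted by position within each chromosome.

```r
# Hypothetical helper (not part of CNstream): sanity-check an intensity table.
# Assumes columns Name, Chr, Position, then <sample>.X / <sample>.Y pairs.
check_intensity_table <- function(df) {
  if (!identical(names(df)[1:3], c("Name", "Chr", "Position")))
    stop("first three columns must be Name, Chr, Position")
  sig <- names(df)[-(1:3)]
  if (length(sig) == 0 || length(sig) %% 2 != 0)
    stop("expected one .X/.Y column pair per sample")
  x_cols <- sig[c(TRUE, FALSE)]  # 1st, 3rd, 5th ... signal column
  y_cols <- sig[c(FALSE, TRUE)]  # 2nd, 4th, 6th ... signal column
  if (!all(grepl("\\.X$", x_cols)) || !all(grepl("\\.Y$", y_cols)) ||
      !identical(sub("\\.X$", "", x_cols), sub("\\.Y$", "", y_cols)))
    stop("signal columns must alternate <sample>.X, <sample>.Y")
  chr <- as.character(df$Chr)
  # each chromosome must form a single contiguous block of rows
  if (anyDuplicated(rle(chr)$values) > 0)
    stop("probes are not grouped by chromosome")
  # within each chromosome, positions must be non-decreasing
  sorted <- vapply(split(df$Position, chr),
                   function(p) all(diff(p) >= 0), logical(1))
  if (!all(sorted))
    stop("probes are not sorted by Position within a chromosome")
  sub("\\.X$", "", x_cols)  # sample IDs, in file order
}
```

For example, `check_intensity_table(read.delim("ex.txt", check.names = FALSE))` would return the sample IDs in file order, assuming a tab-delimited file. That order is also the one the status and plate files must follow.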


This file can easily be exported from GenomeStudio once the project has been created and all the samples to be analyzed have been processed and included in it. The GenomeStudio project window should look like the figure below; from now on, we will be working in the “Full Data Table” tab:

 

 

The first step is to select the fields that will be shown in the table. To do this, click the “Column chooser” button in the “Full Data Table” tab. Add the fields “X” and “Y” to the “Displayed Subcolumns” box, and add the fields “Name”, “Chr” and “Position”, together with all the sample IDs to be analyzed, to the “Displayed Columns” box:

 

 

Then click “OK” to close the column chooser and click the “Sort by multiple columns” button. Select “Chr” as the first field and “Position” as the second field.

 

 

If we only want to analyze one genomic region, we can use the “Filter rows” button. In our example, we selected the region 15,400,000-15,500,000 on chromosome 8, as shown in the figure below.

 

 

Once we have selected the correct fields and sorted the probes by chromosome and base-pair position, we can create the input signal intensity file for CNstream. Click the “Export displayed data to a file” button and save the file in your working directory.

 

 

TIP: Depending on the number of samples and the array used in your project, GenomeStudio can take a long time to create this file. You can reduce this time considerably by sorting the probes by the “Index” field instead. Once the file has been created, sort the probes by chromosome and base-pair position with a Python or Perl script. To create the file sorted by “Index”, select the field “Index” in the “Column chooser” panel so that it is displayed in the table. Then use the “Sort by multiple columns” panel to sort the probes by “Index”. Once the probes have been sorted, go back to the “Column chooser” panel and remove “Index” from the “Displayed Columns” box so that this column is dropped from the table.
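The TIP above mentions Python or Perl. Since the rest of this tutorial works in R, the same re-sorting step can also be sketched in R. The helpers sort_probes() and sort_export_file() are hypothetical and not part of CNstream. The sketch assumes a tab-delimited export with a single header row containing the columns Name, Chr and Position. Numeric chromosomes are placed first in numeric order, followed by any others (such as X or Y) in alphabetical order. Whether CNstream expects this particular order for non-numeric chromosomes is an assumption.

```r
# Hypothetical helpers (not part of CNstream): re-sort an export saved in
# "Index" order by chromosome and base-pair position.
sort_probes <- function(df) {
  chr <- as.character(df$Chr)
  # numeric chromosomes become integers; others (e.g. "X") become NA
  chr_num <- suppressWarnings(as.integer(chr))
  # numeric chromosomes first, then non-numeric ones by name, then position
  ord <- order(is.na(chr_num), chr_num, chr, df$Position)
  df[ord, , drop = FALSE]
}

sort_export_file <- function(infile, outfile) {
  # assumes a tab-delimited file with one header row
  df <- read.delim(infile, check.names = FALSE, stringsAsFactors = FALSE)
  write.table(sort_probes(df), outfile, sep = "\t", quote = FALSE,
              row.names = FALSE)
  invisible(outfile)
}
```

For example, `sort_export_file("export_by_index.txt", "ex_sorted.txt")`, where both file names are placeholders, writes a sorted copy that can be passed to CN.stream().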

 

1.2.  CNstream input status file

 

Example file

 

This file is required only if we also want to perform a case/control analysis of disease-risk association. If it is provided, CNstream performs a chi-square test for each CNP segment and adds informative fields, such as the P-value, to the results file. The status file is a plain text file in which each line gives the status of one sample (0 for controls, 1 for cases). The samples must be listed in the same order as in the input signal intensity file: the first line of the status file corresponds to the first sample (the first X/Y column pair) of the signal intensity file, and so on.
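The status file carries no sample IDs, so its order must match the sample columns exactly. The following minimal sketch assumes two inputs: the header of the intensity file, as a character vector, and case_ids, a hypothetical vector of the sample IDs that are cases. Every other sample is written as a control (0). write_status_file() is not part of CNstream.

```r
# Hypothetical helper (not part of CNstream): write one 0/1 line per sample,
# in the order of the <sample>.X columns of the intensity file header.
write_status_file <- function(intensity_header, case_ids, outfile) {
  sig <- intensity_header[-(1:3)]                  # drop Name, Chr, Position
  samples <- sub("\\.X$", "", sig[c(TRUE, FALSE)]) # one ID per X/Y pair
  status <- as.integer(samples %in% case_ids)      # 1 = case, 0 = control
  writeLines(as.character(status), outfile)
  invisible(status)
}
```

The header can be read without loading the whole file, for example with `hdr <- names(read.delim("ex.txt", nrows = 1, check.names = FALSE))`, followed by `write_status_file(hdr, case_ids, "status.txt")`.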

 

1.3.  CNstream input plate file

 

Example file

 

This file is required only if we want to perform plate normalization. It follows the same rules as the status file, but here each line gives the plate number of the corresponding sample. Plate number “0” is reserved for outlier samples, which are excluded from the analysis.
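A similar sketch for the plate file assumes a named integer vector plate_of that maps each sample ID to its plate (hypothetical), plus an optional vector of outlier IDs. Outliers are written as plate 0 so that CNstream excludes them. The samples argument must be in the same order as in the intensity file. write_plate_file() is not part of CNstream.

```r
# Hypothetical helper (not part of CNstream): write one plate number per line.
write_plate_file <- function(samples, plate_of, outliers = character(0),
                             outfile) {
  plate <- unname(plate_of[samples])  # reorder plates to match sample order
  if (anyNA(plate))
    stop("missing plate number for: ",
         paste(samples[is.na(plate)], collapse = ", "))
  plate[samples %in% outliers] <- 0L  # plate 0 = outlier, excluded
  writeLines(as.character(plate), outfile)
  invisible(plate)
}
```

Looking plates up by name, rather than relying on the order of plate_of, guards against a sample sheet sorted differently from the intensity file.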

 

2.      Running CNstream

 

Once the CNstream package has been installed and loaded, we can start using it by calling its main function, CN.stream(). Typing help(CN.stream) displays detailed information about the function. The optional parameters of CNstream have been optimized for Illumina array data, so we can run it without modifying them. The basic analysis includes neither plate normalization nor an association test. It therefore only requires the input signal intensity file and the path where the results will be saved:

 

> data <- "http://www.urr.cat/cnv/data/ex.txt"

> output_dir <- "C:/"

> CN.stream(data, output_dir)

********************************************************

********************************************************

Analysis started at Thu Sep 1 13:36:06 2009

        Input data file (X, Y):  http://www.urr.cat/cnv/data/ex.txt

        Copy number scores:      C://CNstream_scores.txt

        Segment calls:           C://CNstream_CNPseg.txt

        Number of samples:       572

********************************************************

********************************************************

 

If we want to include the plate normalization file and the status file:

 

> data <- "http://www.urr.cat/cnv/data/ex.txt"

> output_dir <- "C:/"

> plate <- "http://www.urr.cat/cnv/data/plate.txt"

> status <- "http://www.urr.cat/cnv/data/status_all.txt"

> CN.stream(data, output_dir, norm_plate = plate, status = status)

 

Once the analysis has finished, the results are saved in the output directory (in this case, “C:/”):

 

CNstream_scores.txt: single-locus scores for each sample at each probe.

CNstream_CNPseg.txt: CNV segment calls for each sample and other relevant information. This includes the percentages of amplifications and deletions and, when the status file is provided, the P-value and the OR.

 

A useful option when analyzing a small number of probes is verbose = 1. With this option, CNstream displays the genotyping and single-locus scoring results for each probe on screen:

 

> CN.stream(data, output_dir, norm_plate = plate, status = status, verbose = 1)

Processing probe number  0 : rs6545625 in chromosome 2 basepair 57304896

        GENOTYPING

                Samples per genotype:  134 , 228 , 126

                Plot OK... Press ENTER to continue.

        CNV SCORING

                Plot channel A OK... Press ENTER to continue.

                Plot channel B OK... Press ENTER to continue.

               

Processing probe number  1 : rs1424627 in chromosome 2 basepair 57307915

        GENOTYPING

                Samples per genotype:  303 , 157 , 28

                Plot OK... Press ENTER to continue.

        CNV SCORING

                Plot channel A OK... Press ENTER to continue.

                Plot channel B OK... Press ENTER to continue.

                

Processing probe number  2 : rs7576091 in chromosome 2 basepair 57308888

        GENOTYPING

                Samples per genotype:  39 , 189 , 259

                Plot OK... Press ENTER to continue.

        CNV SCORING

                Plot channel A OK... Press ENTER to continue.

                Plot channel B OK... Press ENTER to continue.

 

3.      CNstream parameters

 

CNstream default parameters can be modified with the option argument, a list that can contain the following fields. Only the fields we wish to change need to be specified.

 

option$segment_length

Maximum segment length allowed, in kb (default = 100)

option$nmarkers

Number of probes per segment (default = 5)

option$minmarkers

Minimum number of probes in a segment that must exceed the amplification threshold (or fall below the deletion threshold) for an amplification (or deletion) to be called (default = 3)

option$fr

CNV frequency threshold. Only segments exceeding this frequency will be saved (default = 1%)

option$LI

Deletion threshold (default = 1.65)

option$LS

Amplification threshold (default = 2.7)

option$pdfplot

If pdfplot = 1, detailed figures of the CNP segments are saved in the output path as PDF documents (default = -1, figures not saved)
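To illustrate what "only specify the field you wish to change" means, the sketch below merges a partial option list into the documented defaults with utils::modifyList(). This is a hypothetical illustration, not CNstream's internal code. In particular, writing fr as the fraction 0.01 is an assumption, since this tutorial only gives the default as 1%.

```r
# Hypothetical illustration (not CNstream code): documented defaults.
cnstream_defaults <- list(segment_length = 100,  # kb
                          nmarkers = 5,
                          minmarkers = 3,
                          fr = 0.01,             # 1%, written as a fraction (assumed)
                          LI = 1.65,
                          LS = 2.7,
                          pdfplot = -1)

merge_options <- function(option = list()) {
  # reject misspelled field names instead of silently ignoring them
  unknown <- setdiff(names(option), names(cnstream_defaults))
  if (length(unknown) > 0)
    stop("unknown option field(s): ", paste(unknown, collapse = ", "))
  # fields present in 'option' replace the defaults; all others are kept
  utils::modifyList(cnstream_defaults, option)
}
```

For example, `merge_options(list(pdfplot = 1))` keeps every default except pdfplot.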

 

Example: If we want the CNP segment figures to be saved, we proceed as follows:

 

> option <- list(pdfplot = 1)

> CN.stream(data, output_dir, norm_plate = plate, status = status, option = option)

 

For each CNP segment a PDF file will be created (Example file).
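The parameters in this section can be read as a sliding-window rule. The example CNP segments in section 4 are consistent with windows of five consecutive probes that move one probe at a time. For instance, the first segment (57307915-57348282) spans rs1424627 to rs7604249 in the scores example. The sketch below is one possible interpretation, not CNstream's source code:

- Within a window, a sample is called -1 if at least minmarkers of its scores are below LI.
- Otherwise, it is called +1 if at least minmarkers of its scores are above LS.
- A window is kept if the fraction of amplified or deleted samples exceeds fr, written here as a fraction.

call_windows() is a hypothetical name.

```r
# Hypothetical sketch of sliding-window CNV calling from single-locus scores.
# ASSUMPTION: follows the parameter descriptions above; not CNstream's code.
# scores: probes x samples matrix, rows sorted by chromosome and position
# chr, pos: chromosome and base-pair position of each row
call_windows <- function(scores, chr, pos, nmarkers = 5, minmarkers = 3,
                         LI = 1.65, LS = 2.7, segment_length = 100,
                         fr = 0.01) {
  segments <- list()
  n <- nrow(scores)
  if (n < nmarkers) return(segments)
  for (i in seq_len(n - nmarkers + 1)) {
    idx <- i:(i + nmarkers - 1)
    last <- idx[nmarkers]
    # a window must lie on one chromosome and span at most segment_length kb
    if (length(unique(chr[idx])) != 1) next
    if (pos[last] - pos[i] > segment_length * 1000) next
    w <- scores[idx, , drop = FALSE]
    n_del <- colSums(w < LI)  # probes below the deletion threshold
    n_amp <- colSums(w > LS)  # probes above the amplification threshold
    # -1 = deletion, +1 = amplification, 0 = normal (deletion checked first)
    cn <- ifelse(n_del >= minmarkers, -1L,
                 ifelse(n_amp >= minmarkers, 1L, 0L))
    f_amp <- mean(cn == 1L)
    f_del <- mean(cn == -1L)
    # keep the window only if a CNV frequency exceeds fr (a fraction here)
    if (max(f_amp, f_del) > fr) {
      segments[[length(segments) + 1]] <- list(
        chrom = chr[i], bp_init = pos[i], bp_end = pos[last],
        amps = f_amp, dels = f_del, cn = cn)
    }
  }
  segments
}
```

With the defaults (nmarkers = 5, minmarkers = 3), a sample cannot reach both the deletion and the amplification count in the same window, so the order of the two checks only matters for other settings.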

 

4.      CNstream Output Files

 

CNstream creates two output files that summarize the CNP analysis results. They are saved in the path given by the input variable output_dir.

 

•    “CNstream_scores.txt”

 

Contains the scores assigned to each sample at each locus. These scores are computed at the single-locus scoring step and they are subsequently used to call the copy numbers.

 

Name       Chr  Position  Sample_1  Sample_2  Sample_3  Sample_4  Sample_5  Sample_6
rs6545625  2    57304896  2         2.33      2         2         2         2.02
rs1424627  2    57307915  1.99      2         2         2.01      2         1.97
rs7576091  2    57308888  2         1.96      2         2         1.99      1.93
rs6755060  2    57339604  2         1.99      2         2.02      2         2
rs1345941  2    57347229  2         2.03      2         2         2         2
rs7604249  2    57348282  2         2         2         1.99      2         2
rs4372943  2    57353504  1.99      1.99      2         2         1.99      1.98
rs4614972  2    57357860  1.89      2         2         2         1.95      2
rs4271799  2    57395320  2         2         2         2         2         2.11
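To see which single-locus scores lie outside the default thresholds from section 3 (LI = 1.65 for deletions, LS = 2.7 for amplifications), the scores file can be loaded in R. The helper outlier_scores() is hypothetical and assumes a whitespace-delimited file with one header row and the columns shown above. It lists individual probe/sample scores only and does not reproduce CNstream's segment calling.

```r
# Hypothetical helper (not part of CNstream): list scores outside [LI, LS].
outlier_scores <- function(path, LI = 1.65, LS = 2.7) {
  # assumes whitespace/tab-delimited columns: Name, Chr, Position, samples...
  sc <- read.table(path, header = TRUE, check.names = FALSE,
                   stringsAsFactors = FALSE)
  m <- as.matrix(sc[, -(1:3), drop = FALSE])  # samples as columns
  hit <- which(m < LI | m > LS, arr.ind = TRUE)
  data.frame(Name   = sc$Name[hit[, "row"]],
             Sample = colnames(m)[hit[, "col"]],
             Score  = m[hit],
             Type   = ifelse(m[hit] < LI, "low (deletion-like)",
                             "high (amplification-like)"),
             stringsAsFactors = FALSE)
}
```

For example, `outlier_scores("C:/CNstream_scores.txt")` would list the low and high scores from the run shown in section 2.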

 

•    “CNstream_CNPseg.txt”

 

This file contains information about all probe segments whose copy-number frequency exceeds the frequency threshold (default = 1%). It summarizes the copy number polymorphism (CNP) regions with several informative fields:

- the chromosome region (CHROM, BP_INIT, BP_END);
- the proportion of samples with amplifications and deletions (%AMPS, %DELS);
- the copy number (CN) call for each sample.

In the example below, %AMPS and %DELS appear as fractions (0.020 corresponds to 2% of the samples). Deletions are coded as “-1”, amplifications as “+1” and the normal two-copy state as “0”.

 

CHROM  BP_INIT   BP_END    %AMPS  %DELS  Sample_1  Sample_2  Sample_3  Sample_4  Sample_5
2      57307915  57348282  0.020  0.000  0         0         0         0         0
2      57308888  57353504  0.020  0.000  0         0         0         0         0
2      57339604  57357860  0.020  0.000  0         0         0         0         0
2      57347229  57395320  0.018  0.000  0         0         0         0         0
2      57348282  57395677  0.016  0.000  0         0         0         0         0
8      15435527  15453141  0.000  0.061  0         0         0         0         0
8      15439515  15455979  0.000  0.078  0         0         0         0         0
8      15447669  15464497  0.000  0.078  0         0         0         0         0
8      15450330  15467035  0.000  0.051  0         0         0         0         0
19     20368239  20449621  0.000  0.094  0         0         -1        0         0
19     20385941  20473895  0.000  0.125  0         0         -1        0         0
19     20423788  20520617  0.000  0.125  0         0         -1        0         0
19     20439390  20522325  0.000  0.082  0         0         -1        0         0
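To find which samples carry each CNP, the sample columns can be scanned for -1 and +1. The sketch below is hypothetical: segment_carriers() is not part of CNstream. It assumes a whitespace-delimited file with one header row and treats every column not named in summary_fields as a sample column. Adjust that list if your file has other summary columns.

```r
# Hypothetical helper (not part of CNstream): samples deleted/amplified per segment.
# Column names of the summary fields, as listed in this tutorial.
summary_fields <- c("CHROM", "BP_INIT", "BP_END", "%AMPS", "%DELS",
                    "P-VALUE", "P-Value", "OR", "%A_CASES", "%D_CASES",
                    "%A_CONTROLS", "%D_CONTROLS")

segment_carriers <- function(path) {
  # check.names = FALSE keeps headers such as "%AMPS" unchanged;
  # comment.char = "" makes sure no character is treated as a comment
  seg <- read.table(path, header = TRUE, check.names = FALSE,
                    stringsAsFactors = FALSE, comment.char = "")
  samples <- setdiff(names(seg), summary_fields)
  cn <- as.matrix(seg[, samples, drop = FALSE])
  data.frame(CHROM = seg$CHROM, BP_INIT = seg$BP_INIT, BP_END = seg$BP_END,
             deleted   = apply(cn, 1, function(r)
                               paste(samples[r == -1], collapse = ",")),
             amplified = apply(cn, 1, function(r)
                               paste(samples[r == 1], collapse = ",")),
             stringsAsFactors = FALSE)
}
```

In the example above, this would report Sample_3 as deleted in each of the four chromosome 19 segments.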

 

When the status file is provided, the following additional fields are included:

 

o       P-VALUE: Significance value computed using a chi-square Case/Control association test.

o       ODDS RATIO (OR): odds of carrying the copy-number variant in cases relative to controls.

o       %A_CASES: Percentage of amplifications in cases.

o       %D_CASES: Percentage of deletions in cases.

o       %A_CONTROLS: Percentage of amplifications in controls.

o       %D_CONTROLS: Percentage of deletions in controls.

 

CHROM  BP_INIT   BP_END    P-Value  OR     %A_CASES  %D_CASES  %A_CONTROLS  %D_CONTROLS  Sample_1  Sample_2  Sample_3  Sample_4  Sample_5
2      57307915  57348282  0.199    0.448  1.493     0.000     3.268        0.000        0         0         0         0         0
2      57308888  57353504  0.199    0.448  1.493     0.000     3.268        0.000        0         0         0         0         0
2      57339604  57357860  0.199    0.448  1.493     0.000     3.268        0.000        0         0         0         0         0
2      57347229  57395320  0.114    0.358  1.194     0.000     3.268        0.000        0         0         0         0         0
2      57348282  57395677  0.056    0.267  0.896     0.000     3.268        0.000        0         0         0         0         0
8      15435527  15453141  0.028    3.134  0.000     7.761     0.000        2.614        0         0         0         0         0
8      15439515  15455979  0.012    3.234  0.000     9.851     0.000        3.268        0         0         0         0         0
8      15447669  15464497  0.012    3.234  0.000     9.851     0.000        3.268        0         0         0         0         0
8      15450330  15467035  0.089    2.491  0.000     6.269     0.000        2.614        0         0         0         0         0
19     20368239  20449621  0.005    3.322  0.000     11.940    0.000        3.922        0         0         -1        0         0
19     20385941  20473895  0.017    2.265  0.000     14.925    0.000        7.190        0         0         -1        0         0
19     20423788  20520617  0.017    2.265  0.000     14.925    0.000        7.190        0         0         -1        0         0
19     20439390  20522325  0.007    3.453  0.000     10.448    0.000        3.268        0         0         -1        0         0
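The P-Value and OR columns can be cross-checked with a standard 2x2 carrier/non-carrier test. The sketch below assumes two things: an uncorrected Pearson chi-square test (chisq.test(..., correct = FALSE)), and OR = (case carriers x control non-carriers) / (case non-carriers x control carriers). CNstream's own implementation is not shown in this tutorial. The counts 5 of 335 cases and 5 of 153 controls are inferred from %A_CASES = 1.493 and %A_CONTROLS = 3.268 in the first row of the table above. With them, the sketch gives P ≈ 0.199 and OR ≈ 0.448, matching that row. segment_association() is a hypothetical name.

```r
# Sketch: case/control association test for one CNP segment.
# ASSUMPTION: a 2x2 carrier/non-carrier table with an uncorrected Pearson
# chi-square test; CNstream's own code is not shown in this tutorial.
segment_association <- function(carriers_cases, n_cases,
                                 carriers_controls, n_controls) {
  tab <- matrix(c(carriers_cases, n_cases - carriers_cases,
                  carriers_controls, n_controls - carriers_controls),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("case", "control"),
                                c("carrier", "non_carrier")))
  # odds ratio = (case carriers * control non-carriers) /
  #              (case non-carriers * control carriers)
  or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  # correct = FALSE disables Yates' continuity correction; warnings about
  # small expected counts are suppressed for this illustration
  p <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
  c(p_value = p, odds_ratio = or,
    pct_cases = 100 * carriers_cases / n_cases,
    pct_controls = 100 * carriers_controls / n_controls)
}
```

Similarly, `segment_association(40, 335, 6, 153)` gives P ≈ 0.005, OR ≈ 3.322, %D_CASES ≈ 11.940 and %D_CONTROLS ≈ 3.922, consistent with the first chromosome 19 row.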