This tool aligns Illumina single-end RNA-seq reads to publicly available genomes. You need to supply the reads in a FASTQ file. If you would like us to add new reference genomes to Chipster, please contact us.
TopHat2 first identifies potential exons by mapping the reads to the genome using the Bowtie2 aligner. Using this initial mapping, it builds a database of possible splice junctions, and then maps the reads against these junctions to confirm them. Because many exons are shorter than the reads, TopHat2 splits the reads into smaller segments, which are then mapped independently. In the final step, the segment alignments are "glued" back together to produce end-to-end read alignments. TopHat2 generates its database of possible splice junctions from two sources of evidence:
The "anchor length" is the minimum number of bases that a read must have on each side of a junction for TopHat2 to report that junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. By default, no mismatches are allowed in the anchor, but you can change this.
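The anchor length rule can be illustrated with a small Python sketch. This is not TopHat2's own code: the function name, the representation of a read as a (left, right) pair of base counts on each side of the junction, and the example anchor length of 8 are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def junction_is_reported(
    overhangs: Iterable[Tuple[int, int]], anchor_length: int
) -> bool:
    """Return True if at least one read has `anchor_length` or more bases
    on BOTH sides of the junction.

    `overhangs` holds one (left, right) pair per spliced read: the number
    of read bases aligned on each side of the junction (hypothetical
    representation, chosen for this sketch).
    """
    return any(min(left, right) >= anchor_length for left, right in overhangs)


# Illustrative values only: the second read anchors the junction, so the
# first read may span it with just 3 bases on one side.
supported = junction_is_reported([(3, 47), (10, 40)], anchor_length=8)

# No read reaches 8 bases on both sides, so this junction is not reported.
unsupported = junction_is_reported([(3, 47), (5, 45)], anchor_length=8)
```

In the first case the junction is reported because one read satisfies the anchor requirement, even though another read crosses the same junction with a shorter overhang.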
TopHat2 will ignore donor-acceptor pairs that are closer together than the minimum intron length or farther apart than the maximum intron length. With long reads (75 bp or more), "GT-AG", "GC-AG" and "AT-AC" introns can be found ab initio. With shorter reads, TopHat2 only reports alignments across "GT-AG" introns.
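The two rules above can be sketched in Python as follows. This is a simplified illustration, not TopHat2's implementation: the function names are hypothetical, the intron length is taken as the plain coordinate difference between donor and acceptor, and the limits in the usage lines are example values rather than the tool's defaults.

```python
from typing import Tuple


def intron_length_ok(
    donor_pos: int, acceptor_pos: int, min_intron: int, max_intron: int
) -> bool:
    """Keep a donor-acceptor pair only if its distance lies within
    [min_intron, max_intron]. Distance is a simple coordinate difference
    (an assumption of this sketch)."""
    distance = abs(acceptor_pos - donor_pos)
    return min_intron <= distance <= max_intron


def searchable_motifs(read_length: int) -> Tuple[str, ...]:
    """Return the intron motifs considered for a given read length,
    following the 75 bp threshold described in the text."""
    if read_length >= 75:
        return ("GT-AG", "GC-AG", "AT-AC")
    return ("GT-AG",)


# Example limits (illustrative): a 50 bp gap is too short to be an intron,
# while a 1000 bp gap is accepted.
too_close = intron_length_ok(1000, 1050, min_intron=70, max_intron=500000)
accepted = intron_length_ok(1000, 2000, min_intron=70, max_intron=500000)
```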
After running TopHat2, Chipster indexes the BAM file using the SAMtools package. This way the results are ready to be visualized in the genome browser.
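If you want to index a BAM file yourself outside Chipster, the step corresponds roughly to running "samtools index" on the file. The Python sketch below wraps that command; the function names and the file name are hypothetical, SAMtools is assumed to be installed and on the PATH, and the exact command Chipster runs internally may differ.

```python
import shutil
import subprocess
from typing import List


def build_index_command(bam_path: str) -> List[str]:
    """Build the SAMtools command line that indexes a BAM file."""
    return ["samtools", "index", bam_path]


def index_bam(bam_path: str) -> None:
    """Run `samtools index` on a coordinate-sorted BAM file.

    Raises RuntimeError if the samtools executable cannot be found, and
    subprocess.CalledProcessError if samtools exits with an error.
    """
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools executable not found on PATH")
    subprocess.run(build_index_command(bam_path), check=True)


# Usage (hypothetical file name), left commented out so that nothing runs
# on import:
# index_bam("tophat.bam")
```

SAMtools writes the index next to the BAM file, and genome browsers use it to jump directly to a region of interest.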
This tool is based on the TopHat package. Please cite the following article:
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 25 (9): 1105-1111.