Utilities / Remove duplicates from FASTQ or FASTA

Description

Given a FASTQ or a FASTA file, this tool removes identical sequences.

Parameters

none

Details

Identical sequences are collapsed into a single sequence. Each collapsed sequence is renamed with two numbers: a running number followed by the number of times that sequence occurred in the input.
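
The collapsing and renaming described above can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function name collapse, the "-" separator between the two numbers, and the ordering by descending occurrence count are assumptions made for illustration only.

```python
from collections import Counter


def collapse(sequences):
    """Collapse identical sequences into (name, sequence) pairs.

    The name is "<running number>-<occurrence count>". The "-" separator
    and the most-frequent-first ordering are assumptions of this sketch.
    """
    counts = Counter(sequences)
    collapsed = []
    # most_common() sorts by count, highest first; sequences with equal
    # counts stay in the order they were first seen.
    for number, (seq, count) in enumerate(counts.most_common(), start=1):
        collapsed.append((f"{number}-{count}", seq))
    return collapsed


if __name__ == "__main__":
    # Hypothetical example reads, not real data.
    reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
    for name, seq in collapse(reads):
        print(f">{name}")
        print(seq)
```

With the six example reads, the sketch yields three unique sequences named 1-3, 2-2 and 1-based number 3 with count 1 (3-1), so the second number always shows how many input reads were merged into that entry.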

Output

A FASTQ or FASTA file containing the unique reads. Note that the result file has the ending ".fastq" even if the input file was FASTA; in that case you have to rename it to ".fasta".

Reference

This tool is based on the FASTQ/A Collapser tool of the FASTX-Toolkit.