zsla ESTs not properly vector-clipped

We found a large number of perfect alignments to what looks like vector contamination in the ESTs named internally zslaa.*.scf. Presumably these sequences were not vector-clipped effectively. Take this into account when using these EST sequences or their alignments against the genome.

The genome and the other ESTs do not contain this vector sequence: the longest shared sub-string is 50 bp long and was found once. However, the affected ESTs might still pile up in an alignment of ESTs against the genome wherever the genome sequence contains a short sub-string of the vector by chance. There might be further unidentified vectors in the zsla* ESTs.
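To judge how much of the vector a genome region shares by chance, one option is to look for exact k-mers common to both sequences and chain them into runs. The following is only a minimal sketch: the function names and the default k are hypothetical, it checks the forward strand only, and it is not the method that produced the 50 bp figure above.

```python
def shared_kmers(vector, target, k=20):
    """Return (vector_pos, target_pos) pairs for every exact k-mer shared
    by the two sequences (0-based positions, forward strand only)."""
    vector = vector.upper()
    target = target.upper()
    # Index every k-mer of the (short) vector by its start position.
    index = {}
    for i in range(len(vector) - k + 1):
        index.setdefault(vector[i:i + k], []).append(i)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits.append((i, j))
    return hits


def longest_shared_run(hits, k):
    """Length of the longest exact shared sub-string implied by the hits.
    Consecutive hits on the same diagonal extend a run by one base each."""
    hit_set = set(hits)
    best = 0
    for i, j in hits:
        if (i - 1, j - 1) in hit_set:
            continue  # not the first k-mer of its run
        n = 0
        while (i + n, j + n) in hit_set:
            n += 1
        best = max(best, k + n - 1)
    return best
```

Only sub-strings of at least k bases are detected, so a smaller k finds shorter chance matches at the cost of more hits. For a whole genome, the target would be scanned scaffold by scaffold.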

This sequence was identified by Christiane Eichner; it looks like a common vector:

TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAGTCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG

>gb|HO684438.1|HO684438 zslaa0_003065.z1.scf Lepeophtheirus salmonis LNO Lepeophtheirus salmonis cDNA, mRNA sequence

Length = 480

Score = 200 bits (108), Expect = 3e-51
Identities = 108/108 (100%), Frame = +1 / +1

Query: 1   TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15  TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG 74
Query: 61  TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG 108
           ||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 75  TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG 122
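
A quick way to check individual ESTs for this contamination is an exact search for the vector sequence above on both strands. The following is a minimal sketch: the function names are hypothetical, it finds only the first perfect full-length hit, and partial or mismatched vector remnants would need a real aligner or a dedicated vector-screening tool.

```python
# Vector sequence as given above (108 bp), split as in the BLAST alignment.
VECTOR = (
    "TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG"
    "TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG"
)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (other symbols are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_vector(seq, vector=VECTOR):
    """Return (start, end, strand) of the first exact vector hit, or None.
    Coordinates are 0-based, end-exclusive, on the given sequence."""
    seq_upper = seq.upper()
    for strand, probe in (("+", vector), ("-", reverse_complement(vector))):
        pos = seq_upper.find(probe)
        if pos != -1:
            return pos, pos + len(probe), strand
    return None


def mask_vector(seq, vector=VECTOR):
    """Replace the first exact vector hit with N's; return seq unchanged
    if the vector is not found."""
    hit = find_vector(seq, vector)
    if hit is None:
        return seq
    start, end, _ = hit
    return seq[:start] + "N" * (end - start) + seq[end:]
```

For the EST shown above, the BLAST coordinates 15 to 122 (1-based, inclusive) would correspond to a return value of `(14, 122, "+")`. Masking with N's rather than deleting keeps the coordinates of the remaining bases unchanged.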