Submitted by Administrator on
We found a large number of perfect sequence alignments to what looks like vector contamination in the ESTs named internally zslaa.*.scf. I assume these sequences were not vector-clipped effectively. Take this into account when using these EST sequences or their alignments against the genome.
The genome and the other ESTs do not contain this vector sequence (the longest shared substring is 50 bp long and occurs once). However, in an alignment of ESTs against the genome, the contaminated ESTs might pile up wherever the genome sequence happens to contain a short substring of the vector by chance. There might be further unidentified vectors in the zsla* ESTs.
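A check of this kind can be tried on a small scale with a longest-common-substring search between a vector and a target sequence. The sketch below is only an illustration and is not necessarily how the 50 bp figure was obtained; the function name and the toy sequences are hypothetical. Its running time is quadratic, so it suits single ESTs or short contigs, not a whole genome, where an aligner or an indexed search would be needed.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring shared by a and b.

    Simple dynamic programming, O(len(a) * len(b)) time.
    Only the forward strand is compared (no reverse complement).
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of bases.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Toy sequences, made up for illustration only.
shared = longest_common_substring("GGCCATTACGGCC", "AAATTACGGTTT")
print(shared, len(shared))
```

A short shared substring like this is expected by chance in any large sequence and does not by itself indicate contamination.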
This sequence was identified by Christiane Eichner; it looks like a common vector:
TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAGTCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG
>gb|HO684438.1|HO684438 zslaa0_003065.z1.scf Lepeophtheirus salmonis LNO Lepeophtheirus salmonis cDNA, mRNA sequence
Length = 480
Score = 200 bits (108), Expect = 3e-51
Identities = 108/108 (100%), Frame = +1 / +1
Query: 1   TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15  TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG 74

Query: 61  TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG 108
           ||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 75  TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG 122
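
For anyone who wants to screen their own copy of the ESTs, a minimal sketch of an exact-match check is given below. It assumes the sequences are already loaded as plain strings; the function names are hypothetical, and only full-length, forward-strand copies of the 108 bp sequence are detected. Partial or reverse-complemented copies would need a proper alignment or vector-screening tool.

```python
# The 108 bp suspected vector sequence reported above.
VECTOR = (
    "TCAGTGAGCGAGGAAGCGGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATCAG"
    "TCGACGGTACCGGACATATGCCCGGGAATTCGGCCATTACGGCCGGGG"
)


def find_vector(est: str, vector: str = VECTOR):
    """Return 1-based inclusive (start, end) of the first exact match, or None."""
    pos = est.upper().find(vector)
    if pos == -1:
        return None
    return pos + 1, pos + len(vector)


def mask_vector(est: str, vector: str = VECTOR) -> str:
    """Replace each exact copy of the vector with N, keeping coordinates intact.

    Note: the returned sequence is upper-cased.
    """
    return est.upper().replace(vector, "N" * len(vector))


# Made-up EST: 14 arbitrary bases, then the vector, then a short tail.
example_est = "A" * 14 + VECTOR + "ACGT"
print(find_vector(example_est))
print(mask_vector(example_est))
```

In this made-up example the vector starts at position 15 and ends at position 122, the same subject coordinates as in the BLAST hit above. Masking with N instead of deleting keeps the EST coordinates unchanged, which is convenient when existing alignments are compared against the cleaned sequences.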