[NEWS] Automatic functional annotation of L. salmonis genes

Functional annotations have been assigned automatically to all salmon louse genes. They can be browsed and searched via Search Genes and Sequences.

For now, we have decided to keep the EMLSAG stable IDs assigned by Ensembl Metazoa as the gene names, which allows easy linking to specific genes. This differs slightly from the Tigriopus kingsejongensis genome, where annotations are part of the gene names (such as '116 kda u5 small nuclear ribonucleoprotein component'). Our policy is to keep attributes as they were assigned by the upstream provider of the annotation data wherever possible.

The annotations of L. salmonis genes were assigned automatically based on the best Blast hits against the SwissProt and NR databases. BlastP hit descriptions were assigned according to the following 'algorithm':

  • Manually annotated genes were skipped (for each gene this can be checked under the Properties tab: Evidence = Manual)
  • IF there is a SwissProt hit with E-value < 1E-6, this hit becomes the new annotation
  • IF there is no SwissProt hit but an NR hit with E-value < 1E-10, this hit becomes the new annotation
  • With two EXCEPTIONS:
    • If the NR hit is very much better than the SwissProt hit, the NR hit is taken anyway
    • If the NR hit is better than the SwissProt hit and contains the species name Lepeophtheirus, the NR hit is taken
  • If there is no Blast hit, the annotation 'Hypothetical protein' is assigned
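
The rules above can be sketched in Python roughly as follows. This is an illustration only, not the code that was actually run. The names Hit, choose_annotation and much_better_factor are hypothetical. The sketch assumes that "better" means a lower E-value, that a hit failing its E-value threshold counts as no hit, and that both exceptions apply only when both hits pass their thresholds. Since "very much better" is not quantified above, the factor used here is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Optional

SWISSPROT_MAX_EVALUE = 1e-6   # threshold stated in the rules above
NR_MAX_EVALUE = 1e-10         # threshold stated in the rules above
# Hypothetical placeholder: the rules do not say how much better
# "very much better" is, so this value is an assumption.
MUCH_BETTER_FACTOR = 1e-20


@dataclass
class Hit:
    """Best BlastP hit of one gene against one database (hypothetical type)."""
    description: str
    evalue: float


def choose_annotation(
    swissprot: Optional[Hit],
    nr: Optional[Hit],
    manually_annotated: bool = False,
    much_better_factor: float = MUCH_BETTER_FACTOR,
) -> Optional[str]:
    """Return the new annotation, or None to keep a manual annotation."""
    if manually_annotated:
        return None  # manually annotated genes are skipped

    # Assumption: a hit that fails its threshold is treated as "no hit".
    sp_ok = swissprot is not None and swissprot.evalue < SWISSPROT_MAX_EVALUE
    nr_ok = nr is not None and nr.evalue < NR_MAX_EVALUE

    if sp_ok and nr_ok:
        # Exception 1: NR hit is very much better (assumed: far lower E-value).
        if nr.evalue < swissprot.evalue * much_better_factor:
            return nr.description
        # Exception 2: NR hit is better and names the genus Lepeophtheirus.
        if nr.evalue < swissprot.evalue and "Lepeophtheirus" in nr.description:
            return nr.description
        return swissprot.description
    if sp_ok:
        return swissprot.description
    if nr_ok:
        return nr.description
    return "Hypothetical protein"
```

For example, under these assumptions a gene with a SwissProt hit at 1E-8 and an NR hit at 1E-12 keeps the SwissProt description, unless the NR description mentions Lepeophtheirus.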

These annotations are only meant to give a first impression and may be misleading or incomplete. It is still necessary to check the Blast results manually.