Submitted by Michael Dondrup on
We are happy to announce that we have finally published a peer-reviewed research paper describing the Atlantic salmon louse genome in the journal Genomics.
Even though copepods play many ecological roles and interest in them is growing, they are still poorly represented in genome databases. The same is true for parasites, especially marine species. The LSalAtl2s assembly is 695.4 Mbp (million base pairs of DNA) in size, and we have annotated 13,081 protein-coding genes in it.
Because of our commitment to open data, we made the genome publicly available early on, for example here in LiceBase and in Ensembl Metazoa.
Over the years, many researchers worldwide have used LSalAtl2s for gene expression studies, to study different gene families, to design RNAi experiments, or to map genetic markers. This also means that the genome has been thoroughly vetted. But simply having data is not enough; we needed good scientific questions, bioinformatics analyses, and interpretations. We have long argued that the key to sea lice control lies in the genome. In the following, I will summarize what are, in my opinion, some of the key findings from this project.
Nobody’s perfect
First off, no genome is 100% correct. There are often gaps: regions that are hard to sequence or to stitch together (assemble) from the fragments that sequencing machines deliver.
Moreover, every individual is slightly different and usually carries variation between the two copies of each of its chromosomes. What is more, finding genes in sequences requires sophisticated computer algorithms and still does not work well without a lot of training data and human intervention. New sequencing technologies, such as long-read sequencing, help close gaps, but even then, a genome is best seen as a model of someone's DNA and genes.
We therefore assessed how complete our salmon louse genome is using the "Benchmarking Universal Single-Copy Orthologs" (BUSCO) tool. BUSCO searches for genes that are expected to be present in all or most organisms of a lineage because they are highly conserved, and from this it estimates how complete the genome is.
LSalAtl2s contains approximately 92% complete BUSCOs, and only about 4% are missing entirely, values comparable to other arthropod genomes. We obtained similar values when mapping transcriptome (RNA-seq) and other sequencing data back to the genome.
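To make these percentages concrete: BUSCO sorts each marker gene into complete (single-copy or duplicated), fragmented, or missing, and reports the shares of each. The sketch below is a hypothetical helper, not part of BUSCO itself; it only mimics the style of BUSCO's one-line summary, and the counts in the example are invented for illustration and are not the LSalAtl2s results.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Counts of marker genes per BUSCO category (hypothetical helper)."""
    single: int      # complete, found in a single copy
    duplicated: int  # complete, found in more than one copy
    fragmented: int  # only partially recovered
    missing: int     # not found at all

    @property
    def total(self) -> int:
        # n: number of markers searched for in the chosen lineage dataset
        return self.single + self.duplicated + self.fragmented + self.missing


def busco_summary(counts: BuscoCounts) -> str:
    """Format category percentages in the style of BUSCO's short summary."""
    n = counts.total

    def pct(x: int) -> float:
        return 100.0 * x / n

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(counts.single):.1f}%,D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )


if __name__ == "__main__":
    # Toy counts for illustration only; NOT the LSalAtl2s results.
    print(busco_summary(BuscoCounts(single=170, duplicated=10, fragmented=8, missing=12)))
    # prints C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:200
```

Note that "complete" is the sum of single-copy and duplicated hits, so a genome can score well on completeness while still containing many duplicated markers, which is why the breakdown is worth reading alongside the headline number.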
In addition, many important genes have already been characterized, annotated, sequenced, published and, where necessary, corrected, both here at SLRC and internationally.
But what exactly shapes a parasite genome, and which features make the salmon louse so competitive?
Repeats, the genomic dark matter
Only a tiny portion of a genome codes for proteins and RNA. Most plant and animal genomes contain repetitive sequences of various types and in various amounts. Some of these repeats can, and indeed do, cut-and-paste or copy-and-paste themselves to other locations in the genome; these are called transposable elements (TEs). They have sometimes been dismissed with names like "selfish genes" or "junk DNA". Nowadays, however, views are shifting, and researchers have proposed that TEs shape and accelerate genome evolution. They influence genome size, may create new copies of genes, delete or rearrange others, and alter gene expression.
When we analyzed the salmon louse genome alongside other crustacean genomes, we found that more than 60% of it consists of repetitive sequences. This is the highest proportion among all crustacean genomes sequenced so far, and remarkable for a relatively small genome. Interestingly, the genome of its host, the Atlantic salmon, has a similar repeat content despite being more than four times larger. While high repeat content is not restricted to parasites, it might give the salmon louse an edge in adaptation.
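A repeat percentage like this is the share of assembly bases annotated as repeats. As a minimal sketch, assuming an assembly that has already been soft-masked (repeats written in lowercase, as repeat-masking tools commonly do), the fraction can be computed as below. The function name and the toy sequence are hypothetical, and this is not a description of the exact pipeline used in the paper.

```python
def repeat_fraction(fasta_text: str) -> float:
    """Fraction of non-gap bases that are soft-masked (lowercase).

    Assumes repeats were soft-masked beforehand; N/n characters are
    treated as assembly gaps and excluded from the denominator.
    """
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        for base in line:
            if base in "Nn":
                continue  # gap, not real sequence
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Hypothetical toy scaffold: 12 of its 20 non-gap bases are lowercase.
toy = ">scaffold_1\nACGTacgtacgtNNNN\nacgtACGT\n"
print(f"{repeat_fraction(toy):.0%}")  # prints 60%
```

Excluding gaps matters for a fair comparison between assemblies: an assembly with many unresolved N runs would otherwise appear to have a lower repeat fraction than it really does.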
Less is more
One of the key features of parasite genomes seems to be that they have fewer genes and proteins than free-living organisms. Using a larger dataset of parasites and free-living organisms, we could show for the first time that animal parasites have a significantly reduced set of genes and, in particular, a reduced proteome. Interestingly, the total length of the proteome was an even better indicator of lifestyle than the number of predicted genes alone. One explanation is that parasites need fewer genes because they can exploit a host for nutrients and therefore need a less complex metabolism. An example of this is that the louse apparently lacks the genes needed to make peroxisomes, organelles involved in breaking down long-chain fatty acids.
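The two measures mentioned above, gene count and total proteome length, can diverge. As a minimal sketch, assuming a protein FASTA with one representative protein per gene (for example the longest isoform, so that the protein count stands in for the gene count), both can be read off as follows. The function and the two toy proteomes are hypothetical and chosen only to show the same gene count with different total lengths.

```python
def proteome_stats(fasta_text: str) -> tuple[int, int]:
    """Return (number of proteins, total proteome length in amino acids).

    Assumes one representative protein per gene, so the protein count
    is used as a proxy for the gene count.
    """
    n_proteins = 0
    total_length = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_proteins += 1  # each header starts a new protein
        else:
            # a trailing '*' marks a stop codon, not an amino acid
            total_length += len(line.rstrip("*"))
    return n_proteins, total_length


# Hypothetical toy proteomes: same number of genes, different total length.
proteome_a = ">a1\nMKTAYIAK*\n>a2\nMSLLT\nPVAG*\n"
proteome_b = ">b1\nMKT*\n>b2\nMSL*\n"
print(proteome_stats(proteome_a))  # prints (2, 17)
print(proteome_stats(proteome_b))  # prints (2, 6)
```

Counting genes alone would rate these two toy proteomes as equal, whereas the total length also captures shorter or simplified proteins, which is one way a proteome can shrink without losing genes.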
Admittedly, genome assemblies and computer-predicted gene sets vary in quality, and good parasite genomes are still a scarce commodity. Future sequencing projects should therefore cover a wider diversity of lifestyles.
Strength in gene numbers, the key to resistance?
The salmon louse readily develops resistance to chemical delousing agents. Doesn't this mean it should have many copies of genes that help it deal with toxins?
We investigated this idea for several gene families related to coping with toxins or stress and, much to our surprise, found the opposite. Some very important and well-known gene families had fewer members than in free-living relatives. Only very few gene families were expanded, and it will be exciting to focus on these in the future.
Here, I have described only a small part of our findings about the genome. There is a host of exciting open questions and hypotheses for the sea lice community to investigate in the future, some of which we have pinpointed in the paper.
For citing the salmon louse genome, please refer to:
Rasmus Skern-Mauritzen, Ketil Malde, Christiane Eichner, Michael Dondrup, Tomasz Furmanek, Francois Besnier, Anna Zofia Komisarczuk, Michael Nuhn, Sussie Dalvin, Rolf B. Edvardsen, Sven Klages, Bruno Huettel, Kurt Stueber, Sindre Grotmol, Egil Karlsbakk, Paul Kersey, Jong S. Leong, Kevin A. Glover, Richard Reinhardt, Sigbjørn Lien, Inge Jonassen, Ben F. Koop, Frank Nilsen,
The salmon louse genome: Copepod features and parasitic adaptations,
Genomics,
Volume 113, Issue 6,
2021,
Pages 3666-3680,
ISSN 0888-7543,
https://doi.org/10.1016/j.ygeno.2021.08.002
(https://www.sciencedirect.com/science/article/pii/S0888754321003098)