Phage diversity, genomics and phylogeny

Insights into evolutionary relationships between phages

Genetic mosaicism as the main actor in phage evolution

Defining clear evolutionary relationships between phages is no easy task. Ironically, what makes phages so diverse and unique is perhaps one of the few features they have in common: the mosaicity of their genomes. Genetic mosaicism refers to phage genomes that share regions of high sequence similarity with abrupt transitions to adjacent regions with no detectable resemblance [106]. These regions are often the result of recombination between two non-identical ancestors. Such recombination events, called horizontal gene transfer (HGT), are major mediators of phage evolution and complicate how we view phage evolutionary relationships.
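The idea of a mosaic joint, an abrupt switch from high similarity to no detectable similarity, can be illustrated with a small sketch. It is not a method from the studies cited here. It assumes two sequences that are already aligned to equal length, and the window size, the two identity thresholds and the toy sequences are all hypothetical choices made for illustration.

```python
def window_identity(seq_a, seq_b, window=10):
    """Fraction of identical positions in consecutive, non-overlapping windows.

    Assumes seq_a and seq_b are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, window):
        chunk_a = seq_a[start:start + window]
        chunk_b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(chunk_a, chunk_b))
        scores.append(matches / window)
    return scores


def mosaic_joints(scores, window=10, high=0.9, low=0.5):
    """Alignment positions where identity jumps between 'high' and 'low'.

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    joints = []
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        if (prev >= high and cur <= low) or (prev <= low and cur >= high):
            joints.append(i * window)
    return joints


# Toy example: a shared 20-nt region followed by unrelated sequence.
shared = "ATGGCTAGCTAGGATCCGTA"
phage_a = shared + "A" * 20
phage_b = shared + "C" * 20

scores = window_identity(phage_a, phage_b)
print(scores)                 # [1.0, 1.0, 0.0, 0.0]
print(mosaic_joints(scores))  # [20]
```

Real genomes would first need to be aligned, and similarity between distant phages is often only detectable at the protein level, so this nucleotide-level sketch only conveys the shape of the signal.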

Horizontal gene transfer mechanisms at a glance

The molecular mechanisms leading to HGT have been well studied in model phages and consist of illegitimate, relaxed and homologous recombination. Illegitimate recombination occurs randomly across the genome [107,108], disrupting genes and gene blocks, so most of the resulting recombinants or chimeras are eliminated by counterselection, for example by host barriers, including anti-phage systems. The mosaic joints (recombination sites) of the few “lucky” ones that emerge are not located randomly. Rather, they are positioned at gene or gene block boundaries because natural selection favours only phages whose biological functions remain undamaged [109]. Relaxed (also called homeologous) recombination takes place at sites that share only limited homology between genomes. In several phages such as lambda, Rad52-like recombinases are responsible for gene shuffling. The efficiency of relaxed recombination depends on sequence identity, and it occurs more frequently than illegitimate recombination [110]. Homologous recombination, although hard to detect [111], is presumed to be the most frequent avenue for HGT and is promoted by the phage recombination machinery [112].

Temperate phages are the brokers in HGT

Genetic mosaicity has been studied most extensively in dsDNA phages and was first described in lambda [113]. In theory, all dsDNA phages are mosaic because they have access to a large common gene pool through HGT [106]. However, phages do not have equal access to the entire reservoir: access depends on the number of steps (genetic exchanges) required to bring a given sequence from that pool together with a particular phage. For gene exchange to occur, two phages need to infect the same host cell. One scenario involves two virulent phages exchanging genetic material while co-infecting the same cell. Co-infection appears to be prevalent in natural bacterial populations [114], and a bioinformatic analysis suggested that a chimera may even have formed between an ssRNA and an ssDNA virus during co-infection [115]. Because temperate phages can integrate into the host genome and become prophages, they are thought to act as viral sequence reservoirs and likely play a central role in HGT [116]. When a prophage (functional or cryptic) behaves as a sequence donor, the infecting phage (virulent or temperate) becomes the recipient of a new allele of a gene or gene block, as demonstrated with a cryptic prophage in Escherichia coli infected by lambda [110] and with dairy phages [117]. Bioinformatic analyses support the idea that temperate phages (and prophages) undergo frequent HGT, whereas mosaicism is still present but seems less crucial for virulent phages [118], which form clustered viral populations. Mavrich and colleagues showed that phages have two evolutionary modes with distinct rates of HGT [63]. Virulent phages typically fall into the low gene content flux category, whereas temperate phages tend to be distributed across both the low and high gene content flux categories. Another study (also discussed below) showed that if phage relationships and gene exchanges are represented as a large web, temperate phages sit at its center [119], connecting groups of virulent phages located on the periphery. Thus, temperate phages function as banks for HGT [119].

Evolutionary relationships between phages also differ by host

Along with lifestyle, the rate and manner in which phages appear to exchange genetic material depend on their hosts and the environments in which they thrive [63] (Fig. 5). Additionally, groups of phages infecting the same host can form discrete genotypic clusters, an uninterrupted genetic continuum or something in between [120]. For example, despite regular exchanges of photosynthesis genes by homologous recombination, cyanophage genomes still differentiate into stable discrete groups [86,121,122]. Virulent dairy phages infecting Streptococcus thermophilus have likely recombined with phages infecting other lactic acid bacteria species [123] and show high gene content flux despite their lytic lifestyle [124,125]. Mycobacteriophages fall into the “something in between” category: they are grouped in clusters yet display an overall continuous spectrum of diversity. However, intra-cluster diversity and discreteness are highly variable, and temperate mycobacteriophages evolve in both the low and high gene content flux modes [63,126]. More phages with other nucleic acid types (ssDNA and RNA) and phages that infect other bacteria still need to be characterized and sequenced. This will help to elucidate any universal patterns in viral evolutionary relationships, confirm whether discrete populations exist in nature, and verify whether they are an artefact of insufficiently sampled environments [127].

A network representation of phage phylogeny

Phage phylogeny has undergone several changes in the past two decades. Classification was initially based on morphology, and traditional phylogenetic trees were used to visualize evolutionary relationships. With the rapid growth of viral metagenomics, a plethora of phage sequences were discovered without any determination of virion morphology. It also became clear that no single gene or protein is found in all phage genomes, making it difficult to build a tree based on a single shared genomic feature [128]. In addition, phylogenetic trees cannot represent the combinatorial nature of phage genomes [119]. An alternative way to visualize phage phylogeny is therefore to use networks, with nodes corresponding to phage genomes and edges representing similarities at the gene, protein or genome level. This was first shown by Lima-Mendez and colleagues in 2008, using a set of 306 phage genomes [119]. In their network, temperate phages were much more closely interconnected, whereas virulent phages were on the periphery, forming discrete clusters. The path from one virulent phage cluster to another had to pass through temperate phages in the center of the network. Gene-sharing networks were further explored on the complete dsDNA virosphere (eukaryotic and prokaryotic viruses) [129]. Supermodules were identified within the network that grouped phages according to their ICTV-based family, although some modules contained phages belonging to different families. Another advance in phage phylogeny is the development of vConTACT [16,130,131], software that classifies viruses by building a network (Fig. 5). Now in its second version (vConTACT2 [131]), this program extracts the predicted proteins from each viral genome to build viral protein clusters, which are then used to calculate genome similarities between each pair of viruses. Genome pairs with a similarity score above a given threshold become linked by an edge, and viral clusters are then delineated by a program that can disentangle complex network relationships. With this approach, the authors showed that viruses can be accurately clustered at the genus level and that the more the virosphere is sampled, the more robust the network will become.
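The general shape of a gene-sharing network can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the scoring or clustering that vConTACT2 or Lima-Mendez and colleagues actually use: each genome is reduced to a set of protein-cluster identifiers, similarity is a plain Jaccard index swapped in for simplicity, and the phage names, cluster IDs and the 0.15 threshold are all hypothetical.

```python
from collections import deque
from itertools import combinations


def jaccard(a, b):
    """Shared protein clusters divided by all protein clusters in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(genomes, threshold=0.15):
    """Link every genome pair whose similarity reaches the threshold.

    Returns an adjacency dict: genome name -> set of neighbouring genome names.
    """
    graph = {name: set() for name in genomes}
    for g1, g2 in combinations(genomes, 2):
        if jaccard(genomes[g1], genomes[g2]) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical genomes described by made-up protein-cluster IDs.
genomes = {
    "virA1": {"pc1", "pc2", "pc3"},
    "virA2": {"pc1", "pc2", "pc4"},
    "temp1": {"pc3", "pc4", "pc5", "pc6"},
    "virB1": {"pc5", "pc7", "pc8"},
    "virB2": {"pc6", "pc7", "pc8"},
}

graph = build_network(genomes)
print(shortest_path(graph, "virA1", "virB1"))  # ['virA1', 'temp1', 'virB1']
```

In this toy data the two virulent groups share no protein clusters with each other, so the only route between them runs through the temperate phage, which mirrors the hub-and-periphery layout described above. Production tools use statistically grounded similarity scores and dedicated graph-clustering algorithms rather than a raw Jaccard cut-off.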
