Why did we make this website?

We believe that a fresh perspective and new questions can yield further insights from the same analyses. However, the number of good and relevant questions is vast, far exceeding what any single scientist can consider. We contend that the unrealized potential of many scientific efforts is hidden under massive piles of supplementary numbers and tables. The processed data we provide are ready for analysis by a diverse group of scientists, each with their own set of intriguing questions. To lower the considerable technical barriers that often obstruct such research, we are glad to present this user-friendly interface, which gives free access to our data and results.

Please keep in mind that some sections require the interactive Shiny app to recompute and re-plot the processed data, which may take a few seconds. For each section, we have compiled short methodological explanations and parameter details. For further reading, please see the corresponding paper or website listed in the Methods section.

Project Overview

We cultivated three species, namely Physcomitrium patens, Mesotaenium endlicherianum SAG12.97, and Zygnema circumcarinatum SAG698-1b, each subjected to four distinct environmental conditions: control, cold, heat, and high light (+recovery). We generated RNA-Seq data and measured metabolites at several time points after the start of the experiment. We then used a variety of computational methods to uncover the underlying stress response mechanisms. For a thorough account of our findings, please read our manuscript: Rieseberg, Tim P., et al. "Time-resolved oxidative signal convergence across the algae–embryophyte divide." Nature Communications 16.1 (2025): 1780.

Credits

Armin Dadras created this website based on experiments designed and conducted by the de Vries group. The study’s key findings are published in Rieseberg et al. (2025). The design and maintenance of this website were made possible by support from the GWDG. The project was funded by DFG grants SPP 2237 (MAdLand - Molecular Adaptation to Land: Plant Evolution to Change), VR 132/4-1 (CarotPhyte: Exploring the evolutionary roots for the biosynthesis of apocarotenoids and their role as signals in plastid-mediated stress response in streptophyte algae), and ERC grant 852725 (ERC-StG TerreStriAL Terrestrialization: Stress Signalling Dynamics in the Algal Progenitors of Land Plants).

Walkthrough

In the video below, I walk you through different portions of this website and briefly demonstrate its capabilities. If you’re new to the Shiny app, I highly recommend watching it.

Gene Discovery Toolbox

Methodology

We used six different methods to find interesting genes in our study: eggNOG-mapper, OrthoFinder, TapScan, Gene Ontology, best BLAST hit against Araport11, and InterProScan. Each approaches a similar question in a different way, and we believe that looking at them together helps target interesting genes better than any individual approach. For a comprehensive explanation of each tool, please read the corresponding paper.
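The best-BLAST-hit approach can be illustrated with a short Python sketch. It assumes BLAST tabular output (`-outfmt 6` with its 12 default columns) and keeps, for each query, the hit with the highest bit score, breaking ties by the lower E-value. The exact criteria used for this website are not stated here, so treat this selection rule, the function name, and the example hits as assumptions.

```python
import csv
import io


def best_blast_hits(tsv_text):
    """Return {query: (subject, evalue, bitscore)} keeping the top-scoring hit per query.

    Assumes BLAST tabular output (-outfmt 6) with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        # A higher bit score wins; on a tie, the lower E-value wins.
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up hits of two hypothetical query genes against hypothetical Araport11 IDs.
example = (
    "geneA\tAT_hit1\t45.0\t200\t105\t3\t1\t200\t1\t198\t1e-30\t120.5\n"
    "geneA\tAT_hit2\t60.0\t210\t80\t2\t1\t210\t1\t205\t1e-50\t180.0\n"
    "geneB\tAT_hit3\t30.0\t150\t95\t4\t5\t150\t3\t140\t1e-5\t50.0\n"
)
best = best_blast_hits(example)
print(best)
```

Ranking by bit score rather than E-value alone keeps the choice independent of database size, which only affects E-values.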

Note that only protein sequences from the indicated species were used for the analyses in this section. Consequently, this collection does not include non-coding RNAs. Investigating non-coding RNAs, or identifying genes of interest based on custom criteria, may require independent analyses or different analytical approaches.

OrthoFinder is a software tool for identifying and categorizing orthologous genes across species or genomes. Its algorithm considers both sequence similarity and the evolutionary relationships between genes. Starting from pairwise comparisons of protein sequences, OrthoFinder builds a similarity graph and clusters genes into orthogroups, i.e. groups of genes descended from a single gene in the last common ancestor of the species considered. This approach is designed to group genes accurately even when they have complex evolutionary histories. OrthoFinder gives the best results when orthogroups are inferred from a phylogenetically diverse sample of species; therefore, this section contains more species than the three studied here.
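To make the similarity-graph idea concrete, here is a deliberately simplified sketch: genes are nodes, pairwise hits above a score threshold are edges, and groups are the connected components of that graph. OrthoFinder itself normalizes BLAST/DIAMOND scores for gene length and phylogenetic distance and then clusters with MCL, so this is not its algorithm, only an illustration of graph-based grouping. The gene IDs, scores, and threshold are hypothetical.

```python
def connected_groups(pairs, min_score):
    """Group genes into connected components of a thresholded similarity graph.

    pairs: iterable of (gene_a, gene_b, score) tuples; pairs scoring below
    min_score do not create an edge, but both genes still appear as nodes.
    Returns a list of sorted gene lists, one per component.
    """
    graph = {}
    for a, b, score in pairs:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= min_score:
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


# Hypothetical gene IDs and bit scores.
hits = [
    ("Pp_g1", "Me_g7", 250.0),
    ("Me_g7", "Zc_g3", 180.0),
    ("Pp_g2", "Zc_g9", 40.0),  # weak hit, below the threshold used below
]
groups = connected_groups(hits, min_score=100.0)
print(groups)
```

Plain connected components tend to chain unrelated families together through single spurious hits, which is one reason tools such as OrthoFinder use score normalization and MCL instead.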

eggNOG-mapper assigns functional annotations to genes by mapping them to orthologous groups and functional categories in the eggNOG database. For each submitted sequence, eggNOG-mapper identifies its orthologous groups and transfers functional information from annotated orthologs in the database. This simplifies gene annotation and helps us understand gene functions and evolutionary relationships across species.

With genomic data becoming increasingly available, tools such as TapScan, which was designed specifically for plant genome analysis, make it possible to identify transcription factors and other transcription-associated proteins with high precision.
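TapScan's classification is, as we understand it, rule-based: a protein is assigned to a transcription-associated protein family according to which protein domains must, and must not, be detected in it. The sketch below shows that idea in miniature. The rule table, family names, and domain names are hypothetical placeholders, not TapScan's actual rules, and domain detection itself (done with profile HMMs in practice) is not shown.

```python
# Hypothetical rule table: family -> (domains that must be present, domains that must be absent).
# These names are placeholders, not TapScan's actual rules.
RULES = {
    "FamilyA": ({"DomainX"}, {"DomainY"}),
    "FamilyB": ({"DomainX", "DomainY"}, set()),
}


def classify(domains, rules=RULES):
    """Return the sorted families whose rules match a protein's set of detected domains."""
    found = set(domains)
    return sorted(
        family
        for family, (required, forbidden) in rules.items()
        if required <= found and not (forbidden & found)
    )


print(classify({"DomainX"}))             # ['FamilyA']
print(classify({"DomainX", "DomainY"}))  # ['FamilyB']: DomainY rules out FamilyA
print(classify({"DomainZ"}))             # []: no rule matches
```

Forbidden domains are what let such rules separate families that share a core domain.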

Gene Ontology (GO) is a database and standardized approach for classifying genes and gene products by their molecular functions, the biological processes they take part in, and their cellular components. It provides a common vocabulary and framework for describing gene features and behaviors across organisms, making it easier to evaluate genomic data and to understand gene function in biological systems. GO organizes its terms into a hierarchy of related categories, allowing researchers to perform comparative analyses, identify functional similarities, and gain biological insights across species. We combined the InterProScan and eggNOG-mapper results and removed obsolete GO IDs to generate comprehensive GO-gene-term sets.
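The merging step can be sketched as a per-gene union of the GO IDs reported by both tools, followed by removal of obsolete IDs. The input dictionaries, gene IDs, and the placeholder obsolete ID are hypothetical; in practice, obsolete IDs can be taken from the GO ontology file, where such terms carry the tag `is_obsolete: true`.

```python
def merge_go(interpro, eggnog, obsolete):
    """Merge per-gene GO annotations from two sources and drop obsolete GO IDs.

    interpro, eggnog: dict mapping gene ID -> iterable of GO IDs
    obsolete: set of GO IDs flagged as obsolete
    Returns a dict gene ID -> sorted list of GO IDs; genes left without terms are dropped.
    """
    merged = {}
    for source in (interpro, eggnog):
        for gene, terms in source.items():
            merged.setdefault(gene, set()).update(terms)
    return {
        gene: sorted(terms - obsolete)
        for gene, terms in merged.items()
        if terms - obsolete
    }


# Hypothetical annotations; "GO:9999999" stands in for an ID flagged as obsolete.
interpro_go = {"gene1": ["GO:0003677", "GO:0006355"]}
eggnog_go = {"gene1": ["GO:0006355", "GO:9999999"], "gene2": ["GO:9999999"]}
obsolete_ids = {"GO:9999999"}
merged = merge_go(interpro_go, eggnog_go, obsolete_ids)
print(merged)
```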

InterProScan also helps with the functional annotation of protein sequences. It scans protein sequences against the InterPro member databases to detect conserved domains, motifs, and other functional signatures. This information helps us understand the potential roles and activities of these proteins in biological processes. InterProScan can also assign proteins to families and link sequences to known biological pathways, which is important for understanding the molecular mechanisms underlying many biological processes.
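As an illustration of how GO terms can be extracted from InterProScan results, the sketch below reads the tab-separated output of a run with `-goterms`, in which the GO annotations occupy the 14th column, separated by `|`, with `-` for empty fields. Some InterProScan releases append the source in parentheses (for example `GO:0003677(InterPro)`), so the sketch keeps only the `GO:nnnnnnn` part. The example rows are made up, and this is not necessarily how the GO-gene-term sets for this website were built.

```python
import csv
import io
import re

GO_ID = re.compile(r"GO:\d{7}")


def go_terms_per_protein(tsv_text):
    """Collect GO IDs per protein from InterProScan TSV output (run with -goterms).

    Column 1 holds the protein accession and column 14 the '|'-separated GO annotations;
    '-' or a missing column means the match carries no GO terms.
    """
    terms = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 14 or row[13] in ("", "-"):
            continue
        # findall keeps only the GO:nnnnnnn part, dropping suffixes such as "(InterPro)"
        terms.setdefault(row[0], set()).update(GO_ID.findall(row[13]))
    return {protein: sorted(ids) for protein, ids in terms.items()}


# Made-up rows for two hypothetical proteins.
example = (
    "prot1\tmd5a\t300\tPfam\tPF00001\tdesc\t10\t200\t1e-20\tT\t01-01-2024\tIPR000001\tentry\t"
    "GO:0003677(InterPro)|GO:0006355(InterPro)\t-\n"
    "prot1\tmd5a\t300\tSMART\tSM00001\tdesc\t12\t190\t1e-10\tT\t01-01-2024\tIPR000001\tentry\tGO:0003677\t-\n"
    "prot2\tmd5b\t120\tCoils\tCoil\t-\t5\t30\t-\tT\t01-01-2024\n"
)
go_map = go_terms_per_protein(example)
print(go_map)
```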

Unfortunately, the multiple sequence alignment (MSA) viewer does not work in Firefox. This is caused by a bug in the package we use for MSA visualization, which we cannot fix on our side. We have confirmed that the MSA viewer works in Chrome, Brave, and Safari; other browsers have not been tested.

The parameters that were used for each program are listed below.

OrthoFinder

We ran OrthoFinder twice with different settings and used the results of the second run, which we consider more accurate. The second run used the species tree shown below.

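# 1st run: DIAMOND searches (-S), MSA-based gene trees (-M msa) built with MAFFT (-A) and FastTree (-T),
# 50 sequence-search threads (-t), 6 analysis threads (-a), -y splits paralogous clades below the root of a HOG;
# the input directory of proteome FASTA files (-f) is not shown here
orthofinder.py -S diamond -M msa -A mafft -T fasttree -t 50 -a 6 -y -n run_1
# 2nd run: restart from the gene trees of run 1 (-ft) with a user-supplied rooted species tree (-s)
orthofinder.py -t 50 -a 6 -y -n run_2 -ft Results_run_1 -s SpeciesTree_input.txt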

eggNOG-mapper

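# DIAMOND search in ultra-sensitive mode with E-value <= 1e-10; --tax_scope 33090 restricts annotation to Viridiplantae (NCBI taxon 33090), --cpu 0 uses all available cores; input (-i) and output (-o) options are not shown
emapper.py -m diamond --itype proteins --data_dir eggnog-mapper-data/ --dmnd_iterate yes --dbmem --cpu 0 --evalue 1e-10 --sensmode ultra-sensitive --tax_scope 33090 --dmnd_db eggnog-mapper-data/eggnog_proteins_default_viridiplantae.dmnd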

InterProScan

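# -goterms and -pa add GO terms and pathway annotations to the output, -cpu 150 sets the number of CPU cores; the input option (-i) is not shown
interproscan.sh -cpu 150 -pa -goterms &> iprsc.log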

Enter a single orthogroup ID

eggNOG-mapper

TapScan

Gene Ontology

Best blast hit against Araport11

InterProScan

Exploratory data analysis

Introduction

We used principal component analysis (PCA) to summarize the data and identify the main sources of variation between samples. Multidimensional scaling (MDS) complements this by representing the distances between samples in two dimensions, which can reveal subtle relationships and underlying structure that might otherwise go unnoticed. Finally, hierarchical clustering reveals natural groupings in the data, helping us identify physiologically meaningful groups of samples.
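For readers new to PCA, the sketch below computes the first principal component of a small samples-by-genes matrix with the Python standard library only: it centres each gene, builds the gene-by-gene covariance matrix, and finds its leading eigenvector by power iteration. It is a teaching sketch under simplifying assumptions (tiny matrix, start vector not orthogonal to the leading eigenvector), not the R code behind the plots; real analyses usually rely on an SVD-based routine such as R's `prcomp`.

```python
import math


def first_pc(samples, iterations=500):
    """Return (loadings, scores) of the first principal component.

    samples: list of rows, one per sample, each a list of expression values per gene.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Gene-by-gene covariance matrix with the n - 1 denominator.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)] for i in range(p)]
    vec = [1.0] * p
    for _ in range(iterations):  # power iteration converges to the leading eigenvector
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project each centred sample onto the loading vector.
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centred]
    return vec, scores


# Hypothetical data: three samples, two perfectly correlated genes.
loadings, scores = first_pc([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(loadings, scores)
```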

Methods

We used a range of R packages for our exploratory data analysis; the complete list is given in the Methods section. In brief, we imported the Kallisto quantification files with the tximport package using the lengthScaledTPM method. We then filtered the data, retaining genes with a count per million (CPM) of at least 10 in at least three samples. To make the analysis robust to differences between experimental conditions, we applied smooth quantile normalization (qsmooth) and then transformed the data with the voom function from the limma package. For hierarchical clustering, we computed Euclidean distances and used the “ward.D” agglomeration method.
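The expression filter can be written out explicitly. The sketch computes counts per million (CPM) per sample as count / library size × 10^6 and keeps genes with CPM ≥ 10 in at least three samples, the thresholds stated above. It uses raw column totals as library sizes, whereas edgeR-based workflows often use normalized library sizes, so treat it as an approximation; the counts are invented.

```python
def cpm_filter(counts, min_cpm=10.0, min_samples=3):
    """Return the sorted IDs of genes with CPM >= min_cpm in at least min_samples samples.

    counts: dict gene ID -> list of raw counts, one per sample (same sample order for every gene).
    Library sizes are the column totals of this table.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(values[s] for values in counts.values()) for s in range(n_samples)]
    kept = []
    for gene, values in counts.items():
        cpm = [values[s] / lib_sizes[s] * 1e6 for s in range(n_samples)]
        if sum(c >= min_cpm for c in cpm) >= min_samples:
            kept.append(gene)
    return sorted(kept)


# Hypothetical counts for four samples, each with a library size of 1,000,000 reads.
counts = {
    "geneA": [500, 400, 300, 0],  # CPM 500, 400, 300, 0 -> kept (3 samples >= 10)
    "geneB": [5, 5, 5, 5],        # CPM 5 everywhere -> removed
    "geneC": [999495, 999595, 999695, 999995],
}
print(cpm_filter(counts))  # ['geneA', 'geneC']
```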

A note on abbreviations

Each condition name consists of two parts separated by an underscore: the first part is the treatment, and the second is the time (in hours) since the start of the experiment. For high light, there were two settings: stress (“s”) and recovery (“r”).

Principal component analysis (PCA) plot

Multidimensional scaling (MDS) plot

Hierarchical clustering
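The clustering settings from the Methods above (Euclidean distances, "ward.D" agglomeration) can be illustrated with a small standard-library sketch. It repeatedly merges the two closest clusters and updates distances with the Lance-Williams formula using Ward coefficients, applied to the distances as given, which is how R's `hclust(method = "ward.D")` treats its input. The sample coordinates are hypothetical, and the sketch is meant for a handful of samples, not as a replacement for `hclust`.

```python
import math
from itertools import combinations


def ward_d(points, labels):
    """Agglomerative clustering with the Ward Lance-Williams update (as in R's "ward.D").

    points: one coordinate tuple per sample; labels: distinct sample names.
    Returns the merge history as (cluster_a, cluster_b, height) tuples, where
    clusters are frozensets of labels.
    """
    sizes = {frozenset([lab]): 1 for lab in labels}
    dist = {}
    for (la, pa), (lb, pb) in combinations(zip(labels, points), 2):
        dist[frozenset([frozenset([la]), frozenset([lb])])] = math.dist(pa, pb)

    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # the two closest clusters
        ci, cj = tuple(pair)
        height = dist.pop(pair)
        ni, nj = sizes.pop(ci), sizes.pop(cj)
        merged = ci | cj
        for ck, nk in sizes.items():
            d_ki = dist.pop(frozenset([ck, ci]))
            d_kj = dist.pop(frozenset([ck, cj]))
            # Lance-Williams formula with Ward coefficients, applied to the distances as given
            dist[frozenset([ck, merged])] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * height
            ) / (ni + nj + nk)
        sizes[merged] = ni + nj
        merges.append((ci, cj, height))
    return merges


# Hypothetical 2-D sample coordinates (e.g. two expression features per sample).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "B", "C", "D"]
merges = ward_d(points, labels)
for a, b, height in merges:
    print(sorted(a), "+", sorted(b), "at height", round(height, 3))
```

R's "ward.D2" instead applies Ward's criterion to unsquared distances; the document's choice of "ward.D" on Euclidean distances corresponds to the update shown here.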

Gene expression visualization

Visualization of a single gene expression profile

This section lets you explore the expression profile of individual genes. We think it is important to look at gene expression with several visualization techniques, such as the boxplot, dot plot, and heatmap, because each offers a distinct view of the data. The boxplot gives a simple, clear summary of the distribution of expression values, showing central tendency and outliers. The dot plot gives a finer view by displaying the individual data points together with their distribution. The heatmap presents expression across samples in a compact, easy-to-read way. In addition, we applied the Z-score transformation, which centres each gene's expression values on their mean and scales them by their standard deviation, to make patterns comparable across genes and datasets.
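A per-gene Z-score subtracts the gene's mean across samples and divides by its standard deviation. This sketch uses the sample standard deviation (n − 1 denominator), as R's `scale()` does; whether the app uses exactly this convention is an assumption, and the input values are hypothetical.

```python
import statistics


def z_scores(values):
    """Z-score transform one gene's expression values across samples.

    The values must not all be equal, otherwise the standard deviation is zero.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]


print(z_scores([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```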

Plots

Please choose a plot type

Boxplot_conditions

Boxplot_treatment

Dotplot

Heatmap

Enter a single gene ID

Differential Gene Expression Analysis

Introduction

This section covers differential gene expression analysis. Our primary goal in this part was to compare individual treatments to appropriate control samples (at the same time point if possible) in order to better understand how the treatments affected gene expression patterns.

Methodology

For the differential gene expression analysis, we used pre-processed count data, normalized with qsmooth and transformed with the voom function from the limma package. We first formulated a contrast matrix. For each comparison of a treatment group with an appropriately selected control sample, we then estimated log2 fold changes and adjusted p-values. These steps relied on several key functions from the limma package, such as lmFit, contrasts.fit, eBayes, and decideTests, which we used to identify differentially expressed genes (DEGs). We then visualized the expression profiles using ggplot2 and hierarchical clustering: for each comparison, genes were divided into two groups, “up-regulated” and “down-regulated”, using the cutree function. For most treatment samples, a single matching control sample from the same time point was available. Where such matching was impossible because of experimental constraints, we took a cautious approach and used the control from the nearest available time point as a surrogate reference. Finally, we performed over-representation analysis (ORA), a key step in understanding the functional implications of DEGs. Using the clusterProfiler package, we identified Gene Ontology (GO) terms that were significantly enriched in each gene cluster. The background set comprised all genes expressed in our samples.
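The final DEG call combines a fold-change cutoff with an adjusted p-value cutoff. The sketch below shows the Benjamini-Hochberg adjustment (the default method in limma's `topTable` and `decideTests`) and labels genes by the sign of their log2 fold change; the website's own up/down grouping for plots used hierarchical clustering with `cutree`, which is not reproduced. It assumes the fold-change cutoff is given on the linear scale and compared with |log2 fold change|; gene IDs and values are invented.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(log2fc, pvalues, fc_cutoff=2.0, padj_cutoff=0.05):
    """Label genes 'up', 'down' or 'ns' from |log2FC| and BH-adjusted p-value cutoffs.

    fc_cutoff is assumed to be a linear fold change; it is converted to log2
    before being compared with |log2FC|.
    """
    genes = list(pvalues)
    padj = bh_adjust([pvalues[g] for g in genes])
    labels = {}
    for gene, q in zip(genes, padj):
        lfc = log2fc[gene]
        if q <= padj_cutoff and abs(lfc) >= math.log2(fc_cutoff):
            labels[gene] = "up" if lfc > 0 else "down"
        else:
            labels[gene] = "ns"
    return labels


# Hypothetical log2 fold changes and raw p-values for one comparison.
log2fc = {"g1": 2.5, "g2": -1.5, "g3": 0.5, "g4": 3.0}
pvalues = {"g1": 0.001, "g2": 0.01, "g3": 0.03, "g4": 0.5}
print(bh_adjust([pvalues[g] for g in log2fc]))
calls = call_degs(log2fc, pvalues)
print(calls)
```

Here g3 passes the adjusted p-value cutoff but not the fold-change cutoff, and g4 has a large fold change but is not significant, which is why both cutoffs are applied together.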

Parameter selection

The default values are a reasonable starting point, but you can adjust them. Recomputing a plot or table takes some time, so please be patient while it loads. If no terms are enriched, the plot or table will be empty; in that case, try a different set of parameters or a different comparison.

Parameters for visualization and filtering

Select a comparison

Fold change cutoff

Adjusted p-value cutoff

Select a GO domain for ORA

The number of categories to show in ORA plots

p-value cutoff for ORA

q-value cutoff for ORA
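The p-value and q-value cutoffs above act on over-representation statistics. As a rough sketch of where the p-value comes from, the code computes the one-sided hypergeometric probability P(X ≥ k) of finding at least k genes annotated with a GO term in a selected set, given the background. clusterProfiler additionally adjusts these p-values across terms and reports q-values, which the sketch omits; the numbers are hypothetical.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated with the GO term
    n: genes in the selected set (e.g. one DEG cluster)
    k: selected genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical numbers: 5 of 20 selected genes carry a term that 50 of 1,000 background genes carry.
print(ora_pvalue(k=5, n=20, K=50, N=1000))
```

With these numbers only about one annotated gene would be expected in the selection by chance, so observing five gives a small p-value.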

GO ORA results

Table of Log2(Fold change) and adjusted P-value compared to the control

Heatmaps of Z-score-transformed, qsmooth-normalized and voom-transformed counts (DEGs only)

Tables of qsmooth-normalized and voom-transformed counts (DEGs only)

Co-expression network analysis using DPGP

Introduction

Co-expression network analysis of RNA-Seq data provides insights into the complex regulatory mechanisms that control gene expression. By analyzing correlation patterns between gene expression profiles across diverse conditions, we can find functional relationships, identify important regulatory genes, and discover biological pathways. This approach reveals co-regulated genes that may be involved in shared biological processes or pathways, providing a comprehensive view of gene expression dynamics. Co-expression networks also offer a systematic framework for selecting candidate genes for further experimental validation and for uncovering novel biomarkers in a variety of biological contexts.

Dirichlet process and Gaussian process (DPGP)

Dirichlet process and Gaussian process models have emerged as promising tools for capturing temporal dependencies and dynamic interactions among genes in co-expression analyses of time-series data. Dirichlet process models provide a flexible non-parametric framework for clustering genes into co-expression modules without fixing the number of clusters in advance, while accommodating clusters of different sizes. Gaussian process models, on the other hand, offer a robust probabilistic framework for modeling time-varying gene expression trajectories, accounting for both systematic and random variation. Combining these Bayesian models with RNA-Seq time-series data allows dynamic gene regulatory networks to be identified and sheds light on the temporal dynamics of gene expression and regulatory interactions. To learn more about this tool, please read this paper. Note that if no terms are enriched for a given setting, no table or plot is shown; adjusting the parameters below may produce results.
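The way a Dirichlet process lets the number of clusters grow with the data can be seen in its Chinese restaurant process representation: a gene joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter α. The sketch computes these prior probabilities only; DPGP combines such a prior with Gaussian-process likelihoods of each gene's time course, which is not shown. Cluster sizes and α are hypothetical.

```python
def crp_probabilities(cluster_sizes, alpha):
    """Prior probabilities that the next gene joins each existing cluster or a new one.

    cluster_sizes: current cluster sizes; alpha: Dirichlet process concentration parameter.
    Returns one probability per existing cluster, followed by the probability
    of opening a new cluster.
    """
    total = sum(cluster_sizes) + alpha
    return [size / total for size in cluster_sizes] + [alpha / total]


# Hypothetical state: two clusters with 3 and 1 genes, alpha = 1.
print(crp_probabilities([3, 1], alpha=1.0))  # [0.6, 0.2, 0.2]
```

A larger α makes new clusters more likely, so the prior favours more, smaller clusters.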

Please be patient. Loading the results in this section takes a few seconds.

Pick parameters for analysis and visualization of DPGP results

Select a network

Inclusion probability threshold (for more stringent, 'conserved' clusters, increase the value towards 1)

Select a GO domain

The number of categories to show

p-value cutoff for ORA

q-value cutoff for ORA

DPGP cluster expression visualization

DPGP cluster GO ORA visualization

DPGP cluster GO ORA tables

Table of genes clustered via DPGP (all clusters)

Co-expression network analysis WGCNA

Introduction

Weighted Gene Co-Expression Network Analysis (WGCNA)

WGCNA is a widely used bioinformatics method for constructing and analyzing co-expression networks. It identifies modules of highly interconnected genes, allowing researchers to discover biologically meaningful gene clusters associated with specific phenotypes or conditions. By raising (transformed) gene-gene correlations to a soft-thresholding power, WGCNA emphasizes strong correlations over weak ones, providing a robust framework for finding co-expression patterns and selecting genes highly relevant to the biological context under study. For more information about this tool, see this paper. For network construction, we used the following settings: merge threshold = 0.20, network type = signed, TOM type = signed, minimum module size = 30, maximum proportion of outliers (maxPOutliers) = 0.05. Based on our screening for scale-free network properties, we used a soft-thresholding power of 20 for M. endlicherianum and P. patens and 13 for Z. circumcarinatum.
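To make the signed network type concrete: in a signed WGCNA network, the adjacency of two genes is a_ij = ((1 + r_ij) / 2)^β, where r_ij is their correlation and β the soft-thresholding power (20 or 13 above), so strong positive correlations are retained while negative ones are pushed towards zero. The sketch also sums each gene's adjacencies into a connectivity score, one common basis for ranking hub genes. The correlation matrix is hypothetical, and the signed TOM and module detection steps are not shown.

```python
def signed_adjacency(corr, beta):
    """Signed WGCNA-style adjacency ((1 + r) / 2) ** beta.

    The diagonal is set to 0 here so that connectivity excludes self-connections.
    """
    n = len(corr)
    return [
        [0.0 if i == j else ((1 + corr[i][j]) / 2) ** beta for j in range(n)]
        for i in range(n)
    ]


def connectivity(adj):
    """Whole-network connectivity of each gene: the sum of its adjacencies."""
    return [sum(row) for row in adj]


# Hypothetical correlation matrix for three genes.
corr = [
    [1.0, 0.9, -0.8],
    [0.9, 1.0, -0.7],
    [-0.8, -0.7, 1.0],
]
adj = signed_adjacency(corr, beta=13)
print([round(k, 3) for k in connectivity(adj)])
```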

Use the following table to look up the acronyms of the measured metabolites:

Acronym	Full name
11_cis	15-cis-beta-Carotene
6MHO	6MHO
9_cis	9-cis-beta-Carotene
9_cis_neo	9-cis-Neoxanthin
A_Z_per_V_A_Z	(Antheraxanthin+Zeaxanthin) / (Violaxanthin+Antheraxanthin+Zeaxanthin)
alpha_car	alpha-Carotene
anthera	Antheraxanthin
beta_car	beta-Carotene
beta_car_per_9_cis_neox	beta-Carotene / 9-cis-beta-Carotene
beta_Car_per_beta_CC	beta-Carotene / beta-Cyclocitral
beta_Car_per_beta_Io	beta-Carotene / beta-Ionone
beta_Car_per_DHA	beta-Carotene / DHA
beta_Car_per_beta_CC_beta_Io_DHA	beta-Carotene / (beta-Cyclocitral+beta-Ionone+DHA)
beta_CC	beta-Cyclocitral
beta_Io	beta-Ionone
Chla	Chlorophyll a
Chla_per_b	Chlorophyll a / Chlorophyll b
Chlb	Chlorophyll b
DHA	DHA
DHA_per_beta_Io	DHA / beta-Ionone
Lut	Lutein
viol	Violaxanthin
V_A_Z_per_Chla_b	(Violaxanthin+Antheraxanthin+Zeaxanthin) / (Chlorophyll b)
Zeax	Zeaxanthin

Please be patient. Loading the results in this section takes a few seconds.

Pick your parameters for WGCNA visualization

Select a cluster and treatment for ORA and visualization

Select a GO domain

The number of categories to show

p-value cutoff for ORA

q-value cutoff for ORA

Summary table of WGCNA analysis

WGCNA Gene Significance visualization

Gene Significance or absolute value of Gene Significance?

Pick a measurement for Gene Significance

WGCNA cluster expression visualization

Table of top 20 hubs for each module of WGCNA analysis

WGCNA cluster GO ORA visualization

WGCNA cluster GO ORA tables

Gene regulatory networks

Introduction

Gene regulatory networks are the intricate webs of interactions between genes and regulatory elements that control biological activities. Understanding these networks is essential for determining the mechanisms that govern biological systems. Time-series RNA-Seq data offer a unique opportunity to infer gene regulatory networks dynamically, because they capture how gene expression changes over time in response to different stimuli or perturbations. By studying gene expression profiles across time, we can discover regulatory relationships, identify important regulatory genes, and characterize the dynamic interactions that shape gene expression programs. Time-series RNA-Seq data are thus an effective tool for deciphering the temporal characteristics of gene regulatory networks and understanding how they influence cellular behaviours and responses.

Sliding Window Inference for Network Generation (SWING)

Despite the availability of time-resolved, high-throughput data, many algorithms ignore the time delays inherent in regulatory systems, which leads to unreliable network inference. SWING addresses this issue by explicitly incorporating temporal information to identify time-delayed edges. Because SWING is robust to the choice of user-defined parameters, it can reliably identify regulatory mechanisms from time-series gene expression data. SWING uses multivariate Granger causality to capture regulatory relationships between genes over time. Unlike traditional Granger approaches, SWING uses sliding windows to evaluate multiple upstream regulators simultaneously over a range of time delays. To learn more about this tool, please see this paper.

The computation time of this algorithm grows steeply with the number of input genes N (O(2^N)), so it was not feasible to use all expressed genes as input. We therefore restricted the input to transcription factors and to homologs of A. thaliana stress-response genes (identified by sequence similarity search). For approximately 3,000 genes and 5 to 9 time points, the calculation took 4 to 16 days to complete. As a result, these networks do not include all genes or all of their interactions.

We performed two analyses for each species and treatment: the first included RNA-Seq data and metabolite levels at all time points where both were measured; the second used RNA-Seq data only. For both M. endlicherianum data sets, and for the RNA-Seq-only data sets of P. patens and Z. circumcarinatum SAG698-1b, we used the following settings: k_min = 0, k_max = 1, w = 4, method = RandomForest, trees = 500, and lag_method = "mean_mean". For the combined RNA-Seq and metabolite data sets of P. patens and Z. circumcarinatum SAG698-1b, we used: k_min = 0, k_max = 1, w = 2, method = RandomForest, trees = 500, and lag_method = "mean_mean".
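The roles of w, k_min, and k_max can be illustrated by enumerating windows. In this simplified sketch, a window spans w consecutive time points and windows advance by one time point; a target gene's window is paired with regulator windows that start k_min to k_max time points earlier, so a lag of 0 allows concurrent effects and a lag of 1 a one-step delay. It only lists window pairs; SWING then fits a model per window (here random forests with 500 trees) and aggregates edge scores across windows, which is not reproduced. The indexing convention and the 6-point series are assumptions, not SWING's exact implementation.

```python
def swing_window_pairs(n_timepoints, w, k_min, k_max):
    """Enumerate (response_window, explanatory_window, lag) index pairs.

    Hypothetical convention: windows hold w consecutive time points and advance by one,
    so window i covers time points i .. i + w - 1. An explanatory (regulator) window
    with lag k starts k time points before the response (target) window.
    """
    n_windows = n_timepoints - w + 1
    pairs = []
    for response in range(n_windows):
        for lag in range(k_min, k_max + 1):
            explanatory = response - lag
            if explanatory >= 0:  # the lagged window must lie inside the series
                pairs.append((response, explanatory, lag))
    return pairs


# Settings from the text (w = 4, k_min = 0, k_max = 1); 6 time points is a hypothetical series length.
pairs = swing_window_pairs(6, w=4, k_min=0, k_max=1)
for response, explanatory, lag in pairs:
    print(f"target window {response} <- regulator window {explanatory} (lag {lag})")
```

Smaller windows, as used for the combined RNA-Seq and metabolite data sets, leave more windows for a given number of time points.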

Metabolite acronyms are explained in the acronym table of the WGCNA section.

Gene regulatory network tables

Select a network

Download complete table

Show top N interactions (the maximum number on the website is 2000):

Enter a single gene ID, or leave this field empty.

Filter for this gene in:

Gene regulatory network visualization

Raw RNA-Seq reads

The raw sequencing data generated for this project have been deposited in the Sequence Read Archive (SRA) and are available for download under BioProject IDs PRJNA939006 and PRJNA890248. Individual runs can be accessed through the SRA accessions SRR23625966 to SRR23626145 and SRR21891679 to SRR21891705.
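To script downloads, the accession ranges above can be expanded into individual run IDs. The sketch assumes that every accession number within each range belongs to this project and that the numbers have no leading zeros; check the BioProject pages before relying on that. The resulting list could, for example, be passed to the SRA Toolkit's `prefetch` command.

```python
def expand_accessions(first, last):
    """Expand an SRA run-accession range such as SRR23625966..SRR23626145.

    Assumes both accessions share the same letter prefix and carry no zero padding.
    """
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix):
        raise ValueError(f"prefixes differ: {first}..{last}")
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    if end < start:
        raise ValueError(f"invalid range {first}..{last}")
    return [f"{prefix}{number}" for number in range(start, end + 1)]


runs = expand_accessions("SRR23625966", "SRR23626145") + expand_accessions(
    "SRR21891679", "SRR21891705"
)
print(len(runs))  # 207 (180 + 27)
```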

Analyses results files

In line with our commitment to reproducible and open science, we have made all code, scripts, and results used in this project available on GitLab. These resources can be accessed here.

Annotation files that were used in this study

In this study, we used the genome annotations and protein sequences of various species. The complete list of these resources and their sources is given below for reference:

Species	Paper	Downloaded from
C. reinhardtii	Craig, Rory J., et al. “The Chlamydomonas Genome Project, version 6: Reference assemblies for mating-type plus and minus strains reveal extensive structural mutation in the laboratory.” The Plant Cell 35.2 (2023): 644-672.	https://data.jgi.doe.gov/refine-download/phytozome?organism=CreinhardtiiCC-4532&expanded=707
O. lucimarinus	Palenik, Brian, et al. “The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation.” Proceedings of the National Academy of Sciences 104.18 (2007): 7705-7710.	https://data.jgi.doe.gov/refine-download/phytozome?q=Ostreococcus+lucimarinus&expanded=Phytozome-231
M. viride	Liang, Zhe, et al. “Mesostigma viride genome and transcriptome provide insights into the origin and evolution of Streptophyta.” Advanced Science 7.1 (2020): 1901850.	https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=Mesvir1
C. melkonianii	Wang, Sibo, et al. “Genomes of early-diverging streptophyte algae shed light on plant terrestrialization.” Nature Plants 6.2 (2020): 95-106.	https://ftp.cngb.org/pub/CNSA/data1/CNP0000228/CNS0021447/CNA0002353/
K. nitens	Hori, Koichi, et al. “Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation.” Nature Communications 5.1 (2014): 3978.	https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=Klenit1
C. braunii	Nishiyama, Tomoaki, et al. “The Chara genome: secondary complexity and implications for plant terrestrialization.” Cell 174.2 (2018): 448-464.	https://bioinformatics.psb.ugent.be/gdb/Chara_braunii/
A. agrestis Oxford	Li, Fay-Wei, et al. “Anthoceros genomes illuminate the origin of land plants and the unique biology of hornworts.” Nature Plants 6.3 (2020): 259-272.	https://www.hornworts.uzh.ch/en/download.html
M. polymorpha	Montgomery, Sean A., et al. “Chromatin organization in early land plants reveals an ancestral association between H3K27me3, transposons, and constitutive heterochromatin.” Current Biology 30.4 (2020): 573-588.	https://marchantia.info/download/MpTak_v6.1/
P. patens	Lang, Daniel, et al. “The Physcomitrella patens chromosome‐scale assembly reveals moss genome structure and evolution.” The Plant Journal 93.3 (2018): 515-533.	https://data.jgi.doe.gov/refine-download/phytozome?organism=Ppatens&expanded=318
S. moellendorffii	Banks, Jo Ann, et al. “The Selaginella genome identifies genetic changes associated with the evolution of vascular plants.” Science 332.6032 (2011): 960-963.	https://data.jgi.doe.gov/refine-download/phytozome?q=Selaginella+moellendorffii&expanded=Phytozome-91
A. filiculoides	Li, Fay-Wei, et al. “Fern genomes elucidate land plant evolution and cyanobacterial symbioses.” Nature Plants 4.7 (2018): 460-472.	https://fernbase.org/ftp/Azolla_filiculoides/Azolla_asm_v1.1/
A. thaliana	Cheng, Chia‐Yi, et al. “Araport11: a complete reannotation of the Arabidopsis thaliana reference genome.” The Plant Journal 89.4 (2017): 789-804.	https://data.jgi.doe.gov/refine-download/phytozome?q=Arabidopsis+thaliana&expanded=Phytozome-447
S. lycopersicum	Hosmani, Prashant S., et al. “An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps.” bioRxiv (2019): 767764.	https://data.jgi.doe.gov/refine-download/phytozome?q=Solanum+lycopersicum&expanded=Phytozome-691
Z. mays	Jiao, Yinping, et al. “Improved maize reference genome with single-molecule technologies.” Nature 546.7659 (2017): 524-527.	https://data.jgi.doe.gov/refine-download/phytozome?q=zea+mays&expanded=Phytozome-493
B. distachyon	International Brachypodium Initiative. “Genome sequencing and analysis of the model grass Brachypodium distachyon.” Nature 463.7282 (2010): 763-768.	https://data.jgi.doe.gov/refine-download/phytozome?q=Brachypodium+distachyon&expanded=Phytozome-556
O. sativa	Ouyang, Shu, et al. “The TIGR Rice Genome Annotation Resource: improvements and new features.” Nucleic Acids Research 35 (2007): D883-D887.	https://data.jgi.doe.gov/refine-download/phytozome?q=Oryza+sativa&expanded=Phytozome-323
P. margaritaceum	Jiao, Chen, et al. “The Penium margaritaceum genome: hallmarks of the origins of land plants.” Cell 181.5 (2020): 1097-1111.	http://bioinfo.bti.cornell.edu/cgi-bin/Penium/download.cgi
Closterium sp. NIES67	Tsuchikane, Yuki, and Hiroyuki Sekimoto. “The genus Closterium, a new model organism to study sexual reproduction in streptophytes.” New Phytologist 221.1 (2019): 99-104.	https://www.ncbi.nlm.nih.gov/protein/?term=BQMA+Closterium
Z. circumcarinatum SAG698-1b	Feng, Xuehuan, et al. “Chromosome-level genomes of multicellular algal sisters to land plants illuminate signaling network evolution.” bioRxiv (2023): 2023-01.	https://phycocosm.jgi.doe.gov/Zygcir6981b_2/Zygcir6981b_2.home.html
Z. circumcarinatum SAG698-1a	Feng, Xuehuan, et al. “Chromosome-level genomes of multicellular algal sisters to land plants illuminate signaling network evolution.” bioRxiv (2023): 2023-01.	https://phycocosm.jgi.doe.gov/Zygcyl6981a_1/Zygcyl6981a_1.home.html
Z. circumcarinatum UTEX1560	Feng, Xuehuan, et al. “Chromosome-level genomes of multicellular algal sisters to land plants illuminate signaling network evolution.” bioRxiv (2023): 2023-01.	https://phycocosm.jgi.doe.gov/Zygcir1560_1/Zygcir1560_1.home.html
Z. circumcarinatum UTEX1559	Feng, Xuehuan, et al. “Chromosome-level genomes of multicellular algal sisters to land plants illuminate signaling network evolution.” bioRxiv (2023): 2023-01.	https://phycocosm.jgi.doe.gov/Zygcir1559_1/Zygcir1559_1.home.html
M. endlicherianum	Dadras, Armin, et al. “Environmental gradients reveal stress hubs pre-dating plant terrestrialization.” Nature Plants (2023): 1-20.	https://mesotaenium.uni-goettingen.de/download.html
S. muscicola	Cheng, Shifeng, et al. “Genomes of subaerial Zygnematophyceae provide insights into land plant evolution.” Cell 179.5 (2019): 1057-1067.	https://figshare.com/articles/dataset/
P. coloniale	Li, Linzhou, et al. “The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants.” Nature Ecology & Evolution 4.9 (2020): 1220-1231.	https://phycocosm.jgi.doe.gov/Praco1/Praco1.home.html

Methods and tools that were used in this research

Tool	Version	Link
ape	5.7-1	https://github.com/emmanuelparadis/ape
alluvial	0.2-0	https://github.com/mbojan/alluvial
BLAST	2.15.0+	https://ncbiinsights.ncbi.nlm.nih.gov/2023/11/21/faster-focused-searches-blast-2-15/
clusterProfiler	4.10.0	https://github.com/YuLab-SMU/clusterProfiler
ComplexHeatmap	2.18.0	https://github.com/jokergoo/ComplexHeatmap
conda	4.12.0	https://conda.io/projects/conda/en/latest/user-guide/install/index.html
cowplot	1.1.2	https://github.com/wilkelab/cowplot
Dirichlet process Gaussian process (DPGP)	-	https://github.com/PrincetonUniversity/DP_GP_cluster
DT	0.31	https://rstudio.github.io/DT/
edgeR	4.0.6	https://bioconductor.org/packages/release/bioc/html/edgeR.html
eggNOG-mapper	2.1.12	https://github.com/eggnogdb/eggnog-mapper
enrichplot	1.22.0	https://github.com/YuLab-SMU/enrichplot
FastQC	0.12.1	https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
ggdist	3.3.1	https://mjskay.github.io/ggdist/
ggforce	0.4.1	https://github.com/thomasp85/ggforce
gghalves	0.1.4	https://github.com/erocoar/gghalves
ggtree	3.6.0	https://bioconductor.org/packages/release/bioc/html/ggtree.html
htmltools	0.5.7	https://rstudio.github.io/htmltools/
igraph	1.6.0	https://igraph.org/
InterProScan	5.64-96.0	https://interproscan-docs.readthedocs.io/en/latest/
IQ-TREE	2.2.6 (multicore version, COVID-edition)	http://www.iqtree.org/
Kallisto	0.48.0	https://pachterlab.github.io/kallisto/
limma	3.58.1	https://bioconductor.org/packages/release/bioc/html/limma.html
MAFFT	7.520	https://mafft.cbrc.jp/alignment/software/
msaR	0.6.0	https://cran.r-project.org/web/packages/msaR/index.html
MultiQC	1.16	https://multiqc.info/
OrthoFinder	2.5.5	https://github.com/davidemms/OrthoFinder
pheatmap	1.0.12	https://cran.r-project.org/web/packages/pheatmap/index.html
plotly	4.10.3	https://plotly.com/r/
Python	various (a separate version installed with conda for each tool)	https://www.python.org/
qsmooth	1.18.0	https://github.com/stephaniehicks/qsmooth
R	4.3.2	https://cran.r-project.org/
RColorBrewer	1.1-3	https://cran.r-project.org/web/packages/RColorBrewer/index.html
reshape	0.8.9	https://cran.r-project.org/web/packages/reshape/index.html
rmarkdown	2.25	https://rmarkdown.rstudio.com/
RStudio	2023.12.1 Build 402	https://posit.co/download/rstudio-desktop/
scales	1.3.0	https://cran.r-project.org/web/packages/scales/index.html
seqtk	1.4-r122	https://github.com/lh3/seqtk
Shiny	1.8.0	https://shiny.posit.co/
shinydashboard	0.7.2	https://rstudio.github.io/shinydashboard/
Sliding Window Inference for Network Generation (SWING)	-	https://github.com/bagherilab/SWING
Snakemake	7.25.2	https://snakemake.readthedocs.io/en/stable/index.html
TapScan	2	https://plantcode.cup.uni-freiburg.de/tapscan/
The Gene Ontology Resource	-	https://geneontology.org/
The Arabidopsis Information Resource	Araport11	https://www.arabidopsis.org/
Tidyverse	2.0.0	https://www.tidyverse.org/
treeio	1.22.0	https://github.com/YuLab-SMU/treeio
Trimmomatic	0.39	http://www.usadellab.org/cms/index.php?page=trimmomatic
tximport	1.30.0	https://bioconductor.org/packages/release/bioc/html/tximport.html
Weighted Gene Co-expression Network Analysis (WGCNA)	1.72-5	https://web.archive.org/web/20221130104830/http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/