2024
SOYGEN3: Building capacity to increase soybean genetic gain for yield and composition through combining genomics-assisted breeding with characterization of future environments
Category:
Sustainable Production
Keywords:
Genetics, Genomics
Lead Principal Investigator:
Aaron Lorenz, University of Minnesota
Co-Principal Investigators:
Asheesh Singh, Iowa State University
William Schapaugh, Kansas State University
Dechun Wang, Michigan State University
Carrie Miranda, North Dakota State University
Katy M Rainey, Purdue University
Leah McHale, The Ohio State University
Matthew Hudson, University of Illinois at Urbana-Champaign
Nicolas Frederico Martin, University of Illinois at Urbana-Champaign
Andrew Scaboo, University of Missouri
George Graef, University of Nebraska
David Hyten, University of Nebraska at Lincoln
Project Code:
Contributing Organization (Checkoff):
Leveraged Funding (Non-Checkoff):
The proposed work would not be possible without other sources of funding supporting the extensive field trials. For example, the participants of the Uniform Tests grow the trials with funding from other sources. For the 2024 test, if two reps are planted at each location, almost 11,000 plots would be grown (this is an underestimate as three reps are grown at some locations). Assuming it costs $30 to prepare, plant, grow, and harvest a yield plot, this would be a contribution of $330,000. In total, on an annual basis project co-investigators receive more than $1M/yr in funding related to this project, most from QSSBs. In addition to this funding, breeders (Graef, Martin, Rainey, Lorenz, McHale, Scaboo, Singh, Wang, Miranda) also participate in United Soybean Board funded projects for breeding for improved seed quality and composition.
Institution Funded:
Unique Keywords:
#breeding & genetics, #environmental adaptation, #genomic prediction, #soybean breeding, #yield
Information And Results
Project Summary

Demand for soybeans is extraordinarily high and is expected to remain high, being driven by demand for soybean oil as a renewable fuel feedstock, demand for protein, and production disruptions across the world. To meet the global demand for food and fuel without cultivating marginal and sensitive land, U.S. soybean farmers need to maximize yield per acre in the face of rapidly changing environments. A key driver of U.S. soybean yield since the 1940s has been the development of new varieties through plant breeding. After the implementation of scientific soybean breeding in the 1930s and 1940s in the U.S., soybean yields have dramatically increased, regions of production have expanded, varieties with defensive traits have been developed, and seed composition has been altered to meet various premium-based specialty markets. These foregoing facts show that soybean breeding is a powerful activity capable of transforming the agricultural landscape and making U.S. farms more competitive and profitable.

Despite these successes of soybean breeding, formidable challenges remain. One major challenge is the ubiquity of genotype-by-environment interactions. Genotype-by-environment interactions occur when varieties that do relatively well in some environments perform relatively poorly in other environments. This phenomenon slows the progress in developing broadly adapted varieties and necessitates more field testing across years and locations. It commonly occurs across years, which is particularly frustrating to the breeder. The timespan of a variety development program (7-10 years from cross to variety release), combined with genotype-by-environment interaction and climate change, effectively makes it necessary for breeders to breed for future environments, not necessarily the ones they are testing in now. On the other hand, genotype-by-environment interactions can be viewed as an opportunity to develop locally adapted varieties if they can be sufficiently exploited through well-defined target environments and their characterization for purposes of prediction.

Advances in DNA sequencing and the science of genomics have been revolutionizing crop breeding for more than a decade now, making it easier to identify genes underlying economically important traits, search for useful genetic diversity, and make faster and more effective selections through “genomic selection”. Genomic prediction and selection is a breeding method in which line selection and advancement decisions are made on the basis of genomic data only, allowing breeders to save time and resources. Numerous scientific articles have been published on the development and optimization of genomics-assisted breeding techniques. However, implementation in actual breeding programs still lags, especially in public-sector programs and small- to mid-size industry programs. Since the inception of the SOYGEN (Science Optimized Yield Gains across ENvironments) initiative funded by NCSRP, we have made a concerted effort to develop the resources and tools needed to implement genomics-assisted breeding techniques. The SOYGEN network consists of all public soybean breeding programs located in the North Central region along with key collaborators in the areas of genomics, genotyping technology, and precision agriculture (Figure 1). We have compiled and curated existing variety performance data from our regional trial network and deposited them in a relational and searchable database from which data can be easily retrieved for analysis (https://soybase.org/ncsrp/queryportal/). We collectively genotyped nearly 3,300 advanced elite breeding lines entered in our regional trial network with genome-wide markers and developed the low-cost, low-density DNA marker technology necessary for conducting cost-effective genomic selection. To help use these genotypic data in making breeding decisions, we developed workflows and analysis tools (Figure 2).
During the course of this initiative we have made over 10,000 genomic predictions, predicted the cross value of over 1.2 million potential cross combinations, and dramatically increased our genotyping capacity. These advancements have been used to facilitate rapid-cycling genomic selection to increase genetic gain per year, select among early-generation progenies at the plant row stage to increase program efficiency, and identify parental combinations expected to create promising breeding populations in terms of average performance and variation.

Despite this progress, there is still work to do to fully integrate genomics-assisted breeding into public soybean breeding programs. There are three new challenges we would like to tackle to advance genomics-assisted breeding in soybean: 1) Collect and model extremely dense “low pass sequencing” data, project sequence data onto breeding populations, and use it routinely in breeding programs; 2) Predict performance of varieties in future environments through modelling genotype-by-environment interaction effects and environmental parameters to improve varietal stability, increase efficiency, and more effectively develop varieties for future environments and local adaptation; 3) Use structural variant data to enhance genomic predictions and connect yield stability to underlying genetics. We have deliberately chosen these objectives because they are not only major questions facing public programs but are also major questions facing large multi-national companies striving to leverage genomics to deliver new higher yielding products more rapidly and effectively to farmers. Such companies – large, mid-sized, and small – look to public programs to investigate such questions of general interest that sometimes involve high-risk, high-reward experimentation (see letters of support).

Accomplishing these objectives will advance soybean breeding methodology to help ensure continued genetic gain is made for yield, defensive traits, and seed composition well into the future. Findings from our studies will be published in peer-reviewed open-access journals so that the knowledge we generate is available to everyone in the soybean seed industry. Findings will also be integrated into our current public programs to enhance their effectiveness and efficiency. Finally, keeping our public programs on the leading edge of breeding technology contributes to graduate and undergraduate education and thus produces future plant breeders, geneticists and other agricultural scientists well equipped to join the seed industry and create ever higher yielding soybean varieties.

Project Objectives

Goal: The overall goal of this project is to advance genomics-assisted breeding for the development of future superior soybean varieties improved for both yield and composition. We will accomplish this using a multipronged approach including developing better breeding methods and furthering routine implementation of genomic prediction in actual public soybean breeding programs.
Our overall goal can be broken down into three interrelated objectives:

1. Continue to develop and enhance genomics-assisted breeding resources and tools to facilitate routine application in public breeding programs.

2. Develop and test methods for predicting cultivar performance in future target environments through genomics-assisted breeding models, phenomics, and environment characterization.

3. Discover structural variants and test whether modelling structural variants improves genomic predictions for yield and seed composition.

Project Deliverables

The following will be delivered upon completion of this three-year project:
1. Publicly available resources and tools for soybean breeders to implement cost-effective genomic prediction in their programs.

2. Publicly available knowledge on genetic control of genotype-by-environment interaction in soybean, and improved models for prediction of breeding line performance in new environments. Knowledge will be made available through open-access publications, presentations at scientific meetings, and presentations to the seed industry.

3. Identification of important structural variants that control seed yield and composition, and publication of knowledge on any benefit of explicitly modeling structural variants for predicting breeding line performance.

4. Enhanced germplasm and superior varieties, better adapted to future environmental conditions, developed through adoption of genomics-assisted breeding techniques.

Progress Of Work

Update:
Objective 1. Continue to develop and enhance genomics-assisted breeding resources and tools to facilitate routine application in public breeding programs.
Objective 1 can be broken down into five sub-objectives for which we can report specific and significant progress during these past six months.
Sub-obj 1: Continue to genotype with genome-wide and trait-targeted markers all new breeding lines entered in the Northern Uniform Soybean Tests
Progress: 560 new NUST breeding lines were genotyped for 20 trait-targeted markers using a commercial service. These results were returned and published in the NUST reports to help breeders make selections on SCN resistance, brown stem rot resistance, IDC resistance, and more. Additionally, high-quality DNA from these 560 lines was extracted and sent to Gencove for genome-wide skim sequencing. We are awaiting return of data from the vendor. Once the data is returned, it will be filtered and deposited in Soybeanbase, where we have already made NUST genotype data publicly available.
Working with Rex Nelson at USDA, we’ve processed and uploaded the 2023 Northern Uniform Trials data to the Soybase query portal (https://www.soybase.org/ncsrp/queryportal/). This database now hosts NUST trial data from 1993-2023.

Sub-obj 2: Enable individual public breeding programs to test and use genomic prediction
This project has instigated and enabled several public programs to start using genomic prediction routinely. Below are some highlights from reports from individual programs that are part of the SOYGEN initiative.
UMN has refined its GS pipeline and tested it extensively on the UMN Preliminary Yield Trials (PYT) data. PYT 2023 progeny population lines were assayed using a 1K low-density (LD) genotyping assay, and parents of PYT23 lines from the crossing block were assayed using a low-pass sequencing platform to generate high-density (HD) variant data. The 50K SoySNP Chip subset from the HD data set was used as the parental reference panel to impute the 1K LD set to the 50K HD set (~30K SNPs after QC). We used this imputed data to make genomic predictions using genomic prediction models that include GxE interaction effects. We are also designing experiments to compare the efficacy of genomic prediction with phenotypic selection. We’ve also expanded the selection of our GxE models and tested several of these using the 2023 PYT data.
ISU genotyped all lines in their prelim and advanced yield trials. A newly hired postdoctoral research associate is working on developing genomic prediction models.
NDSU is currently working to develop a genomic selection pipeline for future use in the NDSU breeding program. They have begun testing different models through a cross-validation procedure for predicting yield and maturity, using the Agriplex marker set and phenotype data from 2022 and 2023 for roughly 1,000 experimental lines. In the short term, we are aiming to develop training populations and assess accuracy for predicting yield and maturity from past years within our program. The Agriplex genotype data was all funded through the SOYGEN project and the progress we’ve made to this point would have been impossible without this support.
KSU is continuing to evaluate progeny of the rapid cycling experiment. Last year’s data was not the best because of terminal drought conditions. They would like to place the experiment in the field this year but are still determining whether they can handle it. In 2020, we set up our crossing block based on 1) GS combinations, and 2) Breeder combinations. F4-derived lines from those populations are in preliminary yield trials this year. We have about 300 entries from populations created based on genomic predictions, and about 300 lines from the breeder’s selections. They have genotyped all 600 entries in the rapid cycling experiment, which will be another layer of information to examine the response to selection.
Purdue implemented the genomic selection experiment in progeny rows as part of SOYGEN. They studied the efficacy of genomic selection for yield compared to phenotypes only, and added an objective combining genomic and phenomic data as well. Across two years, we genotyped and phenotyped ~10,000 progeny rows and planted ~2,000 preliminary yield trial plots across four environments. We finished this experiment in the 2023 season and are currently writing a manuscript describing results. Preliminary results indicate that phenotypic and genomic selection for yield were equivalent, but including biomass phenotypes in genomic selection increased accuracy of yield prediction by 10%.
MSU is genotyping all breeding lines with 6K SNPs and using genomic prediction models to predict white mold and SDS resistance.

Sub-obj 3: Development of a genomic prediction R-Shiny app for easy implementation of GS for breeders.
Work on the application has continued. We have built in functionality for various types of genotype imputation, including a powerful pedigree-based approach called AlphaPlantImpute.
This will help us go from data to predictions in a streamlined, effective way using one application. We are still working on implementing the genotype-by-environment prediction models. This is getting closer, and once this is up and running, we will write a publication releasing this application to the wider public. We have met with at least four separate research groups who have expressed interest in this application, and we sent them copies for beta testing.

Sub-obj 4: Adopting and advancing BreedBase for storage of information for soybean genomic prediction.
There is little to report on this objective except for the fact that we continue to work with the USDA Breeding OnRamp team to optimize BreedBase for public soybean breeding (called “Soybeanbase”). We have met with this team periodically to make improvements to the database. At least four programs in SOYGEN are using this database for regular organization of genome-wide marker data.

Sub-obj 5: Connect target and training populations using imputation that leverages pedigree relationships and enhance this capacity by inclusion of this method in the software application.
This sub-obj has been completed this past reporting period. We have explored the use of AlphaPlantImpute and found that imputation accuracy is very high when projecting high-density SNPs onto lines genotyped with low-density SNPs. This method has been incorporated into the genomic prediction R Shiny App as described above. A grad student presented two posters on this research this past reporting period and will prepare a publication.


Objective 2. Develop and test methods for predicting cultivar performance in future target environments through genomics-assisted breeding models, phenomics, and environmental characterization.
For this objective, we are conducting a multi-environment, multi-institutional coordinated performance trial of 1200 diverse breeding lines. Each breeding line will be phenotyped for several agronomic and phenological traits, and each will be genotyped using low pass re-sequencing technologies. Detailed environmental data for each growing location in each year will be collected and analyzed. The ultimate goal is to better predict the interactions between the environment and genotype. If we are successful, we will leverage genomic data, phenotype data, and environmental data to predict how new breeding lines may perform in future environments that a producer is most likely to encounter.
The last report focused on the successful seed increases we conducted last summer. During the last reporting period, the main goal was to design entry lists for each RM Set, design field maps and field books, and package seed for shipment for planting. The grad student funded on this project organized all the logistics in terms of receiving seed, packaging seed, and sending seed back out to cooperators.
Over 1200 packs of seed were shipped to UMN, packaged, and shipped back out to nine universities that will plant multi-location yield trials. Planting will commence once weather conditions allow. While describing this feat does not take much space, it was indeed quite an undertaking for the grad student involved to receive all this seed, organize it, do a quality control check, and ship it back out for specific yield trials.


Objective 3. Discover structural variants and test whether modelling structural variants improves genomic predictions for yield and seed composition.

We have fully sequenced the NAM founders using Illumina, we have conducted and optimized SNP variant calling, and have now effectively utilized various structural variant (SV) caller tools in tandem to identify SVs within the soybean NAM parents' dataset. Specifically, Sentieon has revealed approximately 470,000 unfiltered SVs. Delly has identified about 35,000 unfiltered SVs, and CNVnator has detected approximately 4,000 unfiltered copy number variations (CNVs). Currently, we are executing a pipeline that incorporates Manta and Smoove, aiming to uncover additional SVs. The primary objective is to isolate high-quality SVs. To achieve this, we will prioritize SVs that have been consistently identified by at least two distinct SV caller tools, ensuring the reliability of the detected variants. Once we have the full SV dataset we will proceed with determining their effect on heritability within the soybean breeding population. The grad student funded on this project is also re-writing and improving the pipeline for better ease-of-use and reproducibility.
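The "at least two distinct SV caller tools" rule described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: the `SVCall` record, the 50% reciprocal-overlap threshold, and the coordinates are all assumptions made for the example. Only the caller names come from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str   # tool that produced the call
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP"


def reciprocal_overlap(a, b):
    """Shared length divided by the longer call's length.

    This equals the smaller of the two mutual overlap fractions.
    Returns 0.0 for calls on different chromosomes or of different types.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Keep calls supported by at least `min_callers` distinct tools.

    min_overlap=0.5 is an assumed threshold, not one stated in the report.
    Matching calls are kept as-is; they are not merged into one record.
    """
    kept = []
    for call in calls:
        supporters = {
            other.caller
            for other in calls
            if reciprocal_overlap(call, other) >= min_overlap
        }
        if len(supporters) >= min_callers:
            kept.append(call)
    return kept


# Made-up coordinates for illustration only.
calls = [
    SVCall("Delly", "Gm01", 1000, 2000, "DEL"),
    SVCall("Sentieon", "Gm01", 1050, 2020, "DEL"),
    SVCall("CNVnator", "Gm01", 5000, 9000, "DUP"),
    SVCall("Sentieon", "Gm02", 1000, 2000, "DEL"),
]
kept = consensus_calls(calls)
```

The pairwise comparison is quadratic in the number of calls, so a set of several hundred thousand unfiltered SVs would need an interval index or a dedicated merging tool instead.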
Meanwhile, we have sent 19 high-quality samples to JGI so far to begin sequencing the core of the soybean pangenome, including key North-Central founder lines such as Lincoln, current public elite lines, and the SCN indicator lines. We plan to submit 200 more samples this year as we ramp up the generation of DNA for this very large project which leverages SOYGEN funds.

Peer-reviewed publications for this reporting period
1) Wartha, C., and A.J. Lorenz. 2024. Genomic predictions of genetic variances and correlations among traits for breeding crosses in soybean. Nature Heredity (Accepted pending revision)

2) Wang, H., X. Zhao, L. Tan, J. Zhu, D. Hyten. 2024. Crop DNA extraction with lab-made magnetic nanoparticles. PLOS ONE: doi.org/10.1371/journal.pone.0296847

3) Mahmood, A., K.D. Bilyeu, M. Škrabišová, J. Biová, E.J. De Meyer, C.G. Meinhardt, M. Usovsky, Q. Song, A.J. Lorenz, M.G. Mitchum, G. Shannon, A.M. Scaboo. 2023. Cataloging SCN resistance loci in North American public soybean breeding programs. Frontiers in Plant Science 14: doi.org/10.3389/fpls.2023.1270546

4) Viana, J.P.G., A. Avalos, Z. Zhang, R. Nelson, M. Hudson. 2024. Common signatures of selection reveal target loci for breeding across soybean populations. The Plant Genome: doi.org/10.1002/tpg2.20426

Invited presentations

1) Lorenz, A.J., et al. 2024. Developing resources to advance the implementation of genomic prediction in soybean. BioOnRamp USDA Webinar. Feb. 23, 2024.
2) Lorenz, A.J., et al. 2024. Developing resources to advance the implementation of genomic prediction in soybean. International Institute of Tropical Agriculture Webinar. March 7, 2024.


Updated November 6, 2024:
Overview: We continue to make steady progress developing data resources and tools for testing and applying genomic prediction to public soybean breeding. These efforts will advance genomics-assisted breeding overall, leading to greater gains for yield in the future. A unique feature of this project is the large GxE study we are undertaking as a large multi-institutional group of researchers. We are collecting yield data on over 1,200 breeding lines evaluated at 20 locations in 2024. This work will be repeated in 2025 at another 20 locations, for a total of 40 environments. These lines are also being genotyped using DNA skim sequencing technologies. These data will allow us to develop and test methods driving future advancements in predicting the ways in which individual varieties and breeding lines interact with specific environments.

Note that figures and tables are not visible in this textbox. Please see the attached progress report in a Microsoft Word document for figures and tables.


Objective 1. Continue to develop and enhance genomics-assisted breeding resources and tools to facilitate routine application in public breeding programs.

Objective 1 can be broken down into five sub-objectives for which we can report specific and significant progress during these past six months.

Sub-obj 1: Continue to genotype with genome-wide and trait-targeted markers all new breeding lines entered in the Northern Uniform Soybean Tests

Progress: This past summer we planted 511 new advanced breeding lines from the Northern Uniform Soybean Tests (NUST) and NUST SCN regional tests. Tissue was collected from each line and DNA was extracted. These are currently being prepared for shipment to our genotyping vendor. The 560 NUST breeding lines sampled last year were sent off for genotyping as described in the last quarterly report. Data was received and deposited into Soybeanbase as we anticipated in the last report.

As described in the last report, we processed and uploaded the 2023 Northern Uniform Trials data to the Soybase query portal (https://www.soybase.org/ncsrp/queryportal/). This database now hosts NUST trial data from 1993-2023. Data from the 2024 NUST trials was just collected this past fall. Once it is sent to Adam Brock, NUST coordinator, we will format it and upload it to the website.

A manuscript on this work, which we have pursued for the last several years, has been submitted to the scientific journal Crop Science and is currently under review. The manuscript describes findings we’ve made from the data thus far and publicly releases the data we have collected so that the community can analyze it to answer their own questions about the genetic control of phenotypes and optimization of genomics-assisted breeding.


Sub-obj 2: Enable individual public breeding programs to test and use genomic prediction

This project has instigated and enabled several public programs to start using genomic prediction routinely. Below are some highlights from reports from individual programs that are part of the SOYGEN initiative.

UMN has refined its GS pipeline and tested it extensively on the UMN Preliminary Yield Trials (PYT) data. PYT 2023 progeny population lines were assayed using a 1K low-density (LD) genotyping assay, and parents of PYT23 lines from the crossing block were assayed using a low-pass sequencing platform to generate high-density (HD) variant data. The 50K SoySNP Chip subset from the HD data set was used as the parental reference panel to impute the 1K LD set to the 50K HD set (~30K SNPs after QC). We used this imputed data to make genomic predictions using genomic prediction models that include GxE interaction effects. This last summer we planted a trial including lines selected using genomic prediction and phenotypic selection. The trial was successfully planted at six locations in Minnesota. Every location yielded good data, which were recently collected during harvest. We are currently processing the data to assess whether our genomic predictions using the pipeline built in this project were successful.
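For readers unfamiliar with what a genomic prediction looks like computationally, the sketch below fits a ridge-regression marker model on a training set and predicts untested candidates. It is an illustration under stated assumptions and not the UMN pipeline. Markers are coded -1/0/1, the shrinkage value is arbitrary, the intercept is simply the training mean, GxE effects are omitted, and all data are made up.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def ridge_marker_effects(markers, phenotypes, shrinkage):
    """Estimate marker effects from (Z'Z + shrinkage * I) beta = Z'(y - mean)."""
    n_markers = len(markers[0])
    mean_y = sum(phenotypes) / len(phenotypes)
    centered = [y - mean_y for y in phenotypes]
    ztz = [
        [
            sum(row[i] * row[j] for row in markers) + (shrinkage if i == j else 0.0)
            for j in range(n_markers)
        ]
        for i in range(n_markers)
    ]
    zty = [
        sum(row[i] * y for row, y in zip(markers, centered))
        for i in range(n_markers)
    ]
    return mean_y, solve_linear_system(ztz, zty)


def predict(markers, mean_y, effects):
    """Genomic prediction = training mean + sum of marker effects."""
    return [mean_y + sum(m * e for m, e in zip(row, effects)) for row in markers]


# Hypothetical training set: four lines, two markers, yield in bu/ac.
training_markers = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
training_yield = [53.0, 51.0, 49.0, 47.0]
mean_y, effects = ridge_marker_effects(training_markers, training_yield, shrinkage=4.0)

# Candidates with marker data but no yield data.
candidate_markers = [[1, 1], [-1, 1]]
predictions = predict(candidate_markers, mean_y, effects)
```

With thousands of markers the same system is usually solved with a linear algebra library or a mixed-model package; the pure-Python solver here only keeps the example self-contained.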

The other highlight to describe is from the University of Missouri. Andrew Scaboo’s lab is diving into the data we collected as part of SOYGEN2 in the genomic selection experiment. This experiment tested genomic prediction versus phenotypic selection versus random selection at four universities: University of Minnesota, North Dakota State University, University of Illinois, and University of Missouri. The selection treatments we applied in the originally designed experiment were not as successful as we had hoped. Currently, we are thoroughly analyzing the data to figure out why the genomic selection treatment was not as successful as we had anticipated, and how we can better understand and utilize it in the future. Because this multi-institutional dataset is very large and complex, we are first starting to develop the analysis framework and treatments using the Missouri data only.
The first thing we did was analyze the molecular marker data to make sure no mistakes were made when sampling. The neighbor-joining tree displayed in Figure 1 shows that breeding families clustered together on the same branches of the tree, indicating the molecular marker relationships recapitulated the pedigree relationships. This suggests to us that the sampling was properly performed.

Figure 1. Neighbor-joining tree showing the relationships among breeding lines from the same family. Lines from the same family share a common color. It can be seen that lines from the same family largely cluster together.

The next thing we looked at was the quality of the phenotypic data by estimating the “broad-sense heritability” of the yield performance. Table 1 below displays the broad-sense heritability of the yield data from each family (indicated by experiment name). Broad-sense heritabilities were moderate to high for most families, but low for a couple of families such as EDGS(3)E and EDGS(3)G. Predictive ability within the families with low broad-sense heritability was also low, as expected. Good correlations between genomic predictions and phenotypes cannot be achieved when the genetic component contributes little to the phenotype.


Table 1. Broad-sense heritability estimates of the yield data collected for each breeding family in the Missouri dataset. Data was collected using three reps grown in two years.
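For context on how an estimate like those in Table 1 is typically formed, the sketch below computes broad-sense heritability on an entry-mean basis from variance components. The report does not state which estimator was used, so the formula is an assumption (a common textbook form). The variance components are invented; only the design of three reps in two years comes from the table caption.

```python
def entry_mean_heritability(var_g, var_gy, var_e, n_years, n_reps):
    """Broad-sense heritability on an entry-mean basis.

    H^2 = var_g / (var_g + var_gy / n_years + var_e / (n_years * n_reps))

    var_g  : genotypic variance
    var_gy : genotype-by-year interaction variance
    var_e  : residual (plot error) variance
    """
    phenotypic = var_g + var_gy / n_years + var_e / (n_years * n_reps)
    return var_g / phenotypic


# Hypothetical variance components for two families, same trial design.
h2_high = entry_mean_heritability(var_g=6.0, var_gy=4.0, var_e=12.0, n_years=2, n_reps=3)
h2_low = entry_mean_heritability(var_g=1.0, var_gy=4.0, var_e=12.0, n_years=2, n_reps=3)
```

Here the first family gives 0.6 and the second 0.2: with the same interaction and error variance, a family with little genotypic variance has most of its line-mean differences explained by noise.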

We went back through the data and re-applied our genomic prediction pipeline, modifying genotype imputation methods and estimating better genotype effects using the phenotype data. Figure 2 below shows the relationship between genomic predictions (y-axis) and observed yield (x-axis) for each family. Several families displayed moderate to good predictive ability (A=0.41, I=0.45, K=0.62). Several others still displayed low predictive ability, some of which is due to low broad-sense heritability, such as for family E (broad-sense heritability=0.21, predictive ability=-0.24).


Figure 2. Relationship between genomic prediction (y-axis) and yield estimate (x-axis) for each breeding line of each family.
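Predictive ability, as used above, is the within-family correlation between genomic predictions and observed yield. The sketch below shows that calculation, assuming a Pearson correlation. The family labels `F1` and `F2` and all numbers are hypothetical and do not correspond to the families in Figure 2.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predictive_ability_by_family(records):
    """records: iterable of (family, genomic_prediction, observed_yield)."""
    by_family = {}
    for family, predicted, observed in records:
        preds, obs = by_family.setdefault(family, ([], []))
        preds.append(predicted)
        obs.append(observed)
    return {family: pearson(p, o) for family, (p, o) in by_family.items()}


# Hypothetical breeding lines: (family, prediction, observed yield).
records = [
    ("F1", 1.0, 1.0), ("F1", 2.0, 3.0), ("F1", 3.0, 2.0), ("F1", 4.0, 4.0),
    ("F2", 1.0, 3.0), ("F2", 2.0, 4.0), ("F2", 3.0, 1.0), ("F2", 4.0, 2.0),
]
ability = predictive_ability_by_family(records)
```

In this toy data `F1` has a predictive ability of 0.8 and `F2` of -0.6. A negative value means the predictions ranked lines in roughly the opposite order to their observed yields.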

We are still exploring this dataset. Next steps include creating new models that model genotype-by-environment interaction effects. We will also test the effect of modeling historical data from the NUST dataset, and alternative ways of estimating genotype effects of breeding lines in the validation trials. New designs for creating training sets will also be explored. Results from these analyses will inform our breeding network on best practices for implementing genomic prediction.
Activities from the other universities were reported in the last report. There are no new activities to report for those universities.

Sub-obj 3: Development of a genomic prediction R-Shiny app for easy implementation of GS for breeders.
Progress on this application this past reporting period has largely been implementation of genotype-by-environment interaction effect models for genomic prediction. This is now working, and will be an enormous contribution to the community of breeders who want to deploy these methods but lack the in-house technical skills to write their own software programs.
We’ve also streamlined the pipeline application for faster implementation and better user experience. We’ve redesigned the user interface and implemented effective deployment methods for easier distribution. We’ve also tested the application using several data sets from SOYGEN collaborators and others. The application has also been tested by groups implementing GS in other crops and there is growing interest in using the application among the community of researchers implementing GS. We’ve written the first draft of a manuscript describing the pipeline application and are preparing it for publication. Finally, we are exploring means to make the application better with new features like one-click implementation and better modeling capabilities.

Sub-obj 4: Adopting and advancing BreedBase for storage of information for soybean genomic prediction.
There is little to report on this objective except for the fact that we continue to work with the USDA Breeding OnRamp team to optimize BreedBase for public soybean breeding (called “Soybeanbase”). We have met with this team periodically to make improvements to the database. At least four programs in SOYGEN are using this database for regular organization of genome-wide marker data.

Sub-obj 5: Connect target and training populations using imputation that leverages pedigree relationships and enhance this capacity by inclusion of this method in the software application.
This sub-objective was completed during the past reporting period. We have explored the use of AlphaPlantImpute and found that imputation accuracy is very high when projecting high-density SNP genotypes onto lines genotyped at low density. This method has been incorporated into the genomic prediction R-Shiny app described above. A graduate student presented two posters on this research during the past reporting period and will prepare a publication. We have shown that implementing our imputation method increases genomic prediction accuracy (Figure 3).



Figure 3. Genomic prediction accuracy when increasing marker density through genotype imputation. Comparing the first bar (TP_1K_TRUE) to the third bar (TP_to_TP_Imp) shows that imputation from low density to high density can achieve genomic prediction accuracies equivalent to those obtained when the high-density marker genotypes are actually collected (at extra cost). Imputation is computationally intensive, but it reduces genotyping costs.
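Genomic prediction accuracy of the kind plotted in Figure 3 is commonly measured as the correlation between predicted and observed phenotypes of lines held out in cross-validation. The sketch below shows that calculation using ridge regression on marker genotypes. The report does not describe the model or validation scheme actually used, so the function names, the penalty `lam`, the five-fold split, and the simulated data are assumptions for illustration only.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on markers, solved in its dual (lines x lines) form."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - mu)
    marker_effects = Xc.T @ alpha
    return mu + (X_test - col_means) @ marker_effects


def cv_accuracy(X, y, lam=20.0, n_folds=5, seed=0):
    """Correlation between cross-validated predictions and observed values."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    preds = np.empty(len(y))
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(order, test)
        preds[test] = ridge_predict(X[train], y[train], X[test], lam)
    return np.corrcoef(preds, y)[0, 1]


# Simulated data (hypothetical): 200 lines, 100 markers coded 0/1/2.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(200, 100)).astype(float)
effects = rng.normal(0.0, 1.0, size=100)
genetic = X @ effects
y = genetic + rng.normal(0.0, 0.5 * genetic.std(), size=200)

accuracy = cv_accuracy(X, y)
print(round(accuracy, 2))
```

Running the same function on true and imputed marker matrices for the same lines would give a comparison analogous to the bars in Figure 3.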


Objective 2. Develop and test methods for predicting cultivar performance in future target environments through genomics-assisted breeding models, phenomics, and environmental characterization.
For this objective, we are conducting a multi-environment, multi-institutional coordinated performance trial of 1200 diverse breeding lines. Each breeding line will be phenotyped for several agronomic and phenological traits, and each will be genotyped using low-pass resequencing technologies. Detailed environmental data for each growing location in each year will be collected and analyzed. The ultimate goal is to better predict the interactions between the environment and genotype. If we are successful, we will be able to leverage genomic, phenotypic, and environmental data to predict how new breeding lines may perform in the future environments that a producer is most likely to encounter.
The last report focused on the design and packaging of these trials. Over 1200 packs of seed were distributed to the various universities for packaging and planting into yield trials. We are happy to report that all yield trials across the Midwest were successfully planted (except for one in Missouri because of excessive spring moisture). As a group, we planted 20 locations of the SOYGEN GxE study, with each location including over 300 breeding lines replicated two times using an incomplete block design. The entire project includes over 12,400 plots. Date of emergence, R1 developmental stage, height, and yield were collected this past summer and fall. Samples from each plot were collected, and plans are being made to scan them with NIR to predict protein and oil content. We are currently designing a process to efficiently collect yield data from each cooperator in an organized manner and deposit it in our database.

Progress is also being made on genotyping all ~1200 breeding lines using skim sequencing. Seeds were delivered to the Hyten lab, which planted them in the greenhouse for tissue collection. DNA libraries have been prepared, and shipment to the DNA sequencing center is currently in progress.


Objective 3. Discover structural variants and test whether modelling structural variants improves genomic predictions for yield and seed composition.

We are building on recently published work from the Hudson group which, using a novel genome variant-calling pipeline, identified >600k high-confidence structural variants (SVs) (Zhang et al., 2024) in the Sorghum Bioenergy Diversity Panel (Brenton et al., 2016). Using a modified version of this pipeline on SoyNAM (Song et al., 2017), we have identified SVs with the Wm82.a4.v1 reference and are in the process of running a further improved version of the pipeline with the recently released Wm82.a6.v1 reference. Ongoing research suggests that incorporating SVs into downstream analyses can substantially improve results that rely on correcting for genotypic similarity when apportioning phenotypic variance.
While research is ongoing and conclusions are necessarily tentative, unbalanced SVs (i.e., insertions, deletions, copy-number variation, etc.) collectively modulate plant genome size, and this variation in the mass of nuclear material is a possible cause of phenotypic variation that has not been explored since the advent of sequencing-based analyses. We are developing methods to address this oversight.

One such method leverages k-mer-based genome size estimation (GSE), which has identified unexpectedly large variation among the SoyNAM founder population: an approximately 24% difference between the smallest and largest genomes, which is consistent with optical-based GSEs reported in the literature (Leitch et al., 2019). Additionally, we’ve found that this variation in GSE among the SoyNAM founders is highly correlated with many agronomically important phenotypic traits, and the effect sizes for some of them, most notably oil and protein content, are quite large. Finally, we’ve found that including GSEs as an explicit correction in GWAS appears to improve both Type I and Type II error rates.
If borne out, this line of research will have a substantial impact on any branch of science interested in associating genomic variants with phenotypic outcomes, including plant breeding, evolutionary biology, quantitative genetics, ecology, conservation biology, bioinformatics, and human health particularly as it relates to cancer.
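For readers unfamiliar with k-mer-based genome size estimation, a common form of the calculation divides the total number of k-mer occurrences in the sequencing reads by the depth of the main peak in the k-mer depth histogram. The sketch below is a simplified version of that idea and is not the pipeline used in this project; the function name, the `min_depth` cutoff for discarding likely sequencing-error k-mers, and the toy histogram are assumptions.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size (in k-mer positions) from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates single-copy coverage.
    peak_depth = max(usable, key=usable.get)
    total_occurrences = sum(d * c for d, c in usable.items())
    return total_occurrences / peak_depth


# Toy histogram (hypothetical): error k-mers at depths 1-2,
# a single-copy peak near depth 20, and a few two-copy k-mers at depth 40.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100, 40: 50}
genome_size = estimate_genome_size(toy_histogram)
print(genome_size)  # 1400.0
```

In this toy case the 1,300 distinct k-mers around depth 20 each count once and the 50 k-mers at depth 40 each count twice, giving 1,400 positions. Dedicated tools additionally model heterozygosity and repeat content, which this sketch ignores.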

Peer-reviewed publications for this reporting period
1) Wartha, C., and A.J. Lorenz. 2024. Genomic predictions of genetic variances and correlations among traits for breeding crosses in soybean. Heredity 133: 173-185. https://doi.org/10.1038/s41437-024-00703-3
2) Wartha, C.A., B. Campbell, V. Ramasubramanian, L. Nice, A. Brock, G. Cai, M.M. Eskandari, G. Graef, M.E. Hudson, D. Hyten, A.L. Mahan, N.F. Martin, L. McHale, C. Miranda, E. Monteverde-Dominguez, R. Nelson, K. Rainey, I. Rajcan, A. Scaboo, W. Schapaugh, A.K. Singh, J. Paolo Gomes, D. Wang, A.J. Lorenz. 2024. Genomic analysis and predictive modeling in the Northern Uniform Soybean Tests. Crop Science (submitted).


Volunteered presentations
1) L. Singh, V. Ramasubramanian, B. Harms, M. Happ, G. Graef, D. Hyten, A. Lorenz. 2024 (Poster) Comparison of imputation methods for projection of markers from low density to high density for genomic selection in soybean (Glycine max). 7th International Conference of Quantitative Genetics held on 22-26 July, Vienna, Austria.


Final Project Results

Benefit To Soybean Farmers

Soybean breeding has a large impact on the efficiency and profitability of agriculture through the development of high yielding new varieties with critical defensive traits and enhanced seed composition. Ensuring that such programs (both private and public) are using state-of-the-art technologies to drive genetic gain in the face of changing environments and narrowing genetic diversity will contribute to continual development and release of ever better varieties. Additionally, these efforts help to educate future agricultural scientists and soybean breeders that are best prepared to enter the seed industry and develop impactful future products for farmers, keeping the North Central region competitive in soybean production.
