2022
SOYGEN2: Increasing soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US
Category:
Sustainable Production
Keywords:
GeneticsGenomics
Lead Principal Investigator:
Leah McHale, The Ohio State University
Co-Principal Investigators:
Asheesh Singh, Iowa State University
William Schapaugh, Kansas State University
Dechun Wang, Michigan State University
Carrie Miranda, North Dakota State University
Katy M Rainey, Purdue University
Brian Diers, University of Illinois at Urbana-Champaign
Matthew Hudson, University of Illinois at Urbana-Champaign
Nicolas Frederico Martin, University of Illinois at Urbana-Champaign
Aaron Lorenz, University of Minnesota
Pengyin Chen, University of Missouri
Andrew Scaboo, University of Missouri
George Graef, University of Nebraska
David Hyten, University of Nebraska at Lincoln
Project Code:
Contributing Organization (Checkoff):
Institution Funded:
Brief Project Summary:
The SOYGEN team is adding value to the Northern Uniform Soybean Trials (NUST). They will add environmental and genotypic data to the NUST and the SCN Regional Trials. The team will focus on the development and use of high-throughput genome-wide genotyping technologies and on making these tools widely available. They will evaluate breeding methods that target areas of trait improvement such as yield and seed protein. Breeders will test methods to determine which are most viable to improve genetic gains and will complete the evaluation of diverse soybean genotypes from the USDA Soybean Germplasm Collection to obtain high-quality phenotype and environmental data.
Key Beneficiaries:
#breeders, #farmers, #geneticists
Unique Keywords:
#breeding & genetics, #genetic gain, #germplasm, #protein, #soybean breeding, #trials
Information And Results
Project Summary

The soybean research community has generated incredible public resources for soybean breeding, including collaborative yield trials such as the Northern Uniform Soybean Trials (NUST), which date back to 1941, and commodity board-funded genotypic data and genotyping platforms. However, these tools can be better leveraged to enhance genetic gains for yield and improvement of seed composition in soybean. As part of our first objective, we are adding value and utility to these resources through a breeding database housed within SoyBase, the current community-supported USDA-ARS repository for soybean genetics and genomic data. In addition to the agronomic, resistance, and composition data normally collected in the NUST, we have added GPS coordinates in order to access environmental data for the NUST and have added genotypic data to both the NUST and the SCN Regional Trials. This information will facilitate breeding for stability of both yield and seed composition.

Genomics-assisted breeding entails the use of genome-wide molecular marker data to aid in breeding decisions that make breeding programs more efficient and effective. Applications range from the use of genomic selection, which can increase selection intensity and allow selection of parents earlier in a program, to the use of genomic data to optimally pair parents for creation of breeding populations containing more superior breeding lines and even possibly more favorable correlations between traits such as seed yield and protein. This latter application has been called “genomic mating”.

Numerous scientific articles have been published on the development and optimization of genomics-assisted plant breeding and, in part, through our prior NCSRP project, we have learned about the optimal application of genomics-assisted breeding methods applied to soybean. The actual implementation of genomics-assisted breeding in the public plant breeding communities, however, has been minimal. Thus, Objective 2 is focused on the development and use of high-throughput genome-wide genotyping technologies that are of low cost with high-quality repeatable marker data, and making available tools for genomic data management and decisions that integrate genomic data and phenotypic data along with various analysis pipelines in a user-friendly form. While we are making these tools and technologies widely available, the transfer and availability to the public sector is critical to our ability to effectively train future soybean breeders, many of whom will be employed by private sector companies using these techniques.

Increases in soybean yield through breeding have been slower than growers expect. A collaborative study led by Diers of a historic set of MG II-IV varieties released from 1923 to 2008 revealed a recent rate of genetic gain of 0.43 bu/ac/yr, whereas reports of genetic gain in corn generally range from 1.0 to 1.2 bu/ac/yr. Moreover, this same study found that protein decreased over this period by 1.7 percentage points, an undesirable outcome. Objective 3 of this work focuses on the evaluation of different breeding methods, each of which targets one or more areas for improvement, such as selection intensity, accuracy, diversity, the time required for each breeding cycle, and simultaneous improvement of traits that typically show negative correlations, such as yield and seed protein content. Breeders will implement and test the methods in their own breeding programs to determine which methods are most viable to improve genetic gains. Compiling data across breeding programs will provide power and confidence in our findings.

The proposed activities build on the previous project funded to this group by NCSRP. One main objective in that project dealt with extensive evaluation of diverse soybean genotypes from the USDA Soybean Germplasm Collection over four years and 30 environments to obtain high-quality phenotype and environment data. Completion and follow-up on that is detailed under Objective 4 in this project, and it provides foundational information for tool development and implementation. Information from that study will be leveraged in this project for Objectives 1, 2, and 3. The entire set of 750 accessions evaluated in the project, or various subsets of those (i.e., exotic landraces only, elite germplasm only, certain geographical regions only, etc.) can be used as training sets for prediction of yield, seed composition, maturity, and other traits for various objectives and for other programs.

Project Objectives

Objective 1: Elevating collaborative field trials
Objective 2: Development of a genomic breeding facilitation suite
Objective 3: Evaluation of soybean breeding methods that increase gain
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

Project Deliverables

Objective 1:
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
(2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
(3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
(4) Breeders will better understand how to weigh data from different environments of the NUST and understand where new cultivars are more likely to be adapted and tested successfully.

Objective 2: Development of a genomic breeding facilitation suite
(1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
(2) Workshops on genomic selection delivered to public soybean breeding community.

Objective 3:
(1) Methods to improve selection of progeny rows based on genomic selection with secondary traits and/or improved spatial statistics.
(2) Understand the potential to improve the unfavorable correlation between yield and protein in soybean through genomic mating.
(3) Application and limitations established for rapid cycling genomic selection in soybean.
(4) Characterization of allelic effect of putative yield alleles and markers for their selection.

Objective 4:
(1) Include high-throughput phenotype data in the analyses and models to identify important relationships and potential future focus areas for HTP data collection and use.
(2) Include weather and environment data in analyses and models to ID significant factors, better define environment and genotype-environment interaction effects, and evaluate contributions to prediction models.
(3) Complete submission of additional publications with complete set of data, image and environment data.

Progress Of Work

Updated April 19, 2022:
In the last reporting period, 7,975 breeding lines from eight breeding programs were genotyped with the 1000 SNP genotyping panel at UNL. DNA has been isolated for an additional 2,520 lines, which will be genotyped. In total, DNA has been isolated from 17,259 breeding lines and 14,739 have been genotyped with the 1000 SNP genotyping panel.

Using the genotype data collected from the 1000 SNP panel at UNL, as well as the genotypic and phenotypic data continually collected from the Northern Uniform Regional Trial entries, we calculated genomic estimated breeding values (GEBVs) of progeny rows. In March 2022, we calculated GEBVs for a total of 9357 progeny rows from four different breeding programs. GEBVs were calculated from a training set that included 7700 unique genotypes with phenotypic and genotypic data.

These GEBVs have been used to make selections, either in combination with canopy coverage and yield data or in comparison to other breeders' choice methodology. Outcomes of those selection methods will be evaluated by field trials in 2022.

Updated February 28, 2023:
Objectives 1 & 2: Elevating collaborative field trials & Development of genomic breeding facilitation suite.
A database framework for agronomic, environmental, genotypic, meta and other trait data for previously completed collaborative trials has been generated, populated, and made available via Soybase. The soybean implementation of Breedbase is now installed, in use, and in its testing phase. In addition to historical datasets from the Uniform Trials, two breeding programs have uploaded current and historical data. Regular meetings with Breedbase staff (one to several times per month) take place to ensure that the installation meets all needs of, and is usable for, soybean breeding programs. Currently, program-specific genotypic data (see Obj 3) from SOYGEN2 is being uploaded. All data is shareable and accessible by participating breeders.
Methods of DNA isolation and genotyping have been updated and streamlined so that UNL can now carry out both the DNA isolation and the genotyping, and can additionally handle samples outside of the project goals using a fee-based structure. The consolidation of DNA isolation with genotyping will lessen sample handling and reduce quality control issues, thereby making the entire process higher throughput.

Objective 3: Evaluation of soybean breeding methods that increase gain
Genomic selection with canopy coverage has been applied to progeny row tests to make four comparative selection sets for two breeding programs in 2021; selections from a third breeding program were made in 2022. Preliminary yield tests for evaluation of selection methods were completed in 2022 for the first two programs and will be completed in 2023, beyond the scope of the current project, for the third.
For a total of five breeding programs, advancements to the F4 generation or progeny row tests were made for progeny combinations which were either genomic predicted matings to improve unfavorable yield and protein combinations or based on breeder selections for the same, accounting for up to 25 cross combinations of each type. While substantial materials have been developed, yield and seed protein data will be required to evaluate the success of the methods.

Objective 4:
Based on multi-location, collaborative trials of Soybean Collection germplasm panels, high-throughput phenotype data is being analyzed to determine its potential use in and relation to other agronomic data. Data has been made available to collaborators and is being prepared for submission.

Final Project Results

Updated June 28, 2023:
Final project report (FY22)

SOYGEN 2: Increasing soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US

This SOYGEN (Science Optimized Yield Gains across Environments) project leverages and builds upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US. Specifically, we have created and tested (or are testing) breeding resources and methods that can be applied to our own breeding programs, or more broadly to soybean breeding programs in general.

Objective 1: Elevating collaborative field trials

Key performance indicators
(1) Standardized data input methods will be developed and will include data quality control methods.
Forms provided to Northern Uniform Regional Trial collaborators were updated to include GPS coordinates. These are now consistently reported for all trial locations. Additionally, data from forms are uploaded to a database. Future plans entail direct uploading of data by collaborators to a SoybeanBase database; however, this is still in progress.
https://soybase.org/ncsrp/queryportal/
(2) Existing data from collaborative trials will be quality checked.
Uploading data (past and present) to a database requires quality checking, which has been done (https://soybase.org/ncsrp/queryportal/).
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We have genotyped a total of 3813 UT and SCN UT lines since 2019. We now have a database of 2510 NUST lines genotyped with the 6K SNP chip uploaded to soybeanbase.breedinginsight.net. In 2020 we switched to genotyping via low pass sequencing and imputation provided by Gencove. Imputation accuracy was determined to be extremely high, >99%. This provided us with many more SNPs (~3 million versus 6000), allowing much more powerful analyses to be performed on this germplasm in future years.
(4) Weather data will be collected for the majority of the future NUST field environments.
Regular reporting of GPS coordinates of field trials allows us to connect to weather databases. We can do this through Soybeanbase and have loaded templates for the 2022 trials into Soybeanbase. Weather data from GPS coordinates was used to determine the independence of trial sites.
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
We leveraged the newly acquired genotype data, combined with our new database holding the UT phenotypic data, to test how well genomic prediction might work with the UT trials. We used a leave-one-trial-out cross-validation scheme: we dropped all data from one complete trial out of the dataset and used the remaining data to develop a genomic prediction model. We then used this genomic prediction model to predict the trial left out, and correlated observed yield performance with predicted yield performance. To obtain better estimates of correlation coefficients, we designated only those trials containing more than 20 genotyped lines as validation trials. This left 17 validation trials. Prediction accuracies were all quite good, ranging from 0.46 to 0.95 (see table below). This indicates that the genotype-phenotype data resource we began building as part of this project has value in terms of assisting future efforts towards genomics-assisted breeding.
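
The leave-one-trial-out scheme described above can be sketched roughly as follows. This is an illustration only, not the project's analysis code: it assumes ridge regression on marker allele counts as a stand-in for the genomic prediction model, runs on simulated data, and all function names, the penalty value, and the data dimensions are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Estimate ridge-regression marker effects (a simple stand-in model)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Dual form: solve an n x n system, convenient when markers outnumber lines.
    K = Xc @ Xc.T + lam * np.eye(X.shape[0])
    beta = Xc.T @ np.linalg.solve(K, y - y_mean)
    return beta, x_mean, y_mean

def leave_one_trial_out(X, y, trial, lam=1.0, min_lines=21):
    """Return {trial id: correlation of observed and predicted yield}."""
    accuracies = {}
    for t in np.unique(trial):
        test = trial == t
        if test.sum() < min_lines:  # keep only trials with more than 20 lines
            continue
        # Train on every other trial, then predict the held-out trial.
        beta, x_mean, y_mean = ridge_fit(X[~test], y[~test], lam)
        pred = (X[test] - x_mean) @ beta + y_mean
        accuracies[t.item()] = float(np.corrcoef(y[test], pred)[0, 1])
    return accuracies

# Simulated example data (hypothetical sizes and effect distributions).
rng = np.random.default_rng(0)
n_lines, n_markers, n_trials = 300, 200, 6
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)  # 0/1/2 counts
true_effects = rng.normal(0.0, 1.0, n_markers)
trial = np.arange(n_lines) % n_trials  # 50 lines per trial
y = X @ true_effects + rng.normal(0.0, 5.0, n_lines)

acc = leave_one_trial_out(X, y, trial, lam=25.0)
for t, r in acc.items():
    print(f"trial {t}: prediction accuracy r = {r:.2f}")
```

Holding out whole trials, rather than random lines, means each accuracy reflects prediction into an environment the model has not seen, which is the situation a breeder faces when predicting untested lines.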


Deliverables
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials and (2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data
Soybeanbase (https://soybeanbase.breedinginsight.net/) and a SQL database hosted at Soybase (https://soybase.org/ncsrp/queryportal/) have been created and are available for researchers to deposit their data. The long-term plan is to host both genotype and phenotype data at Soybeanbase, which is part of SOYGEN3 objectives. Currently, Soybeanbase hosts genotype data on 2510 UT and SCN UT breeding lines. We are still determining how to host data for the 1303 lines genotyped with low-pass sequencing. Soybeanbase also holds data for SoyNAM and internal breeding lines, amounting to genotype data being stored for over 10,000 genotypes. The Soybase SQL database holds phenotypic data on more than 8000 advanced breeding lines dating back to 1993. The phenotypic data represents over 1650 unique environments, from years ranging from 1993 to 2021. Data from 2022 is currently being imported. The total dataset consists of over 128,000 yield datapoints, as well as data on 18 other traits including maturity date, seed composition, and disease resistance.
(3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
We performed several analyses on the genotype and phenotype data that make up this dataset, and we have prepared a manuscript that is nearly ready for submission to a peer-reviewed scientific journal. In this manuscript, we have made the UT genotype and phenotype data public, characterized the data available to researchers all over the world, determined the genetic relationships among all lines submitted to the UT, made genotype-phenotype associations, and tested genomic prediction models. One interesting finding was the lack of strong population differentiation among breeding programs. This finding highlights the role of the cooperative Uniform Trials in facilitating germplasm sharing among breeding programs, which helps all programs achieve greater sustained genetic gain. Secondly, we found that the biggest driver of population differentiation among maturity groups was the E2 locus, with a few other loci showing effects. We also identified some genomic regions lacking genetic diversity in one maturity group, but for which there was genetic variation in other maturity groups. This could help future breeding efforts target such regions for incorporation of diversity where it is lacking, perhaps because of genetic drift. We performed genome-wide association mapping and found a total of 30 marker-trait associations representing 30 independent QTL. These results help researchers determine which loci are driving phenotypic variation in the UT germplasm, and tell us that this dataset contains good genetic signal for future analyses of specific questions. Finally, as mentioned above, we were able to train accurate genomic prediction models using these data.
(4) Breeders will better understand how to weigh data from different environments of the NUST and understand where new cultivars are more likely to be adapted and tested successfully.
Ranking of entries is dependent on Uniform Trial locations. Though site redundancy (in terms of cultivar ranking) was correlated to physical distance between trial site locations, it had a slightly higher correlation to environmental variables. The most influential trial sites (those which other sites clustered around in terms of cultivar ranking) were Ames, Iowa; Urbana, Illinois; Manhattan, Kansas; and West Lafayette, Indiana. The most influential variables grouping sites together were coldest quarter precipitation, driest month/quarter precipitation, annual mean temperature, and annual precipitation.
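
One way to relate site redundancy in cultivar ranking to physical distance is sketched below. It is a minimal illustration under stated assumptions, not the analysis used in the project: the yields and coordinates are made-up toy values, Spearman rank correlation is assumed as the measure of ranking similarity, and the function names are hypothetical.

```python
import numpy as np

def spearman_matrix(yields):
    """yields: entries x sites array (no ties). Returns site-by-site rank correlations."""
    ranks = np.argsort(np.argsort(yields, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Toy data: 5 entries evaluated at 3 sites (values are illustrative only).
yields = np.array([
    [50.0, 52.0, 40.0],
    [55.0, 56.0, 38.0],
    [60.0, 61.0, 35.0],
    [45.0, 47.0, 44.0],
    [58.0, 59.0, 36.0],
])
coords = np.array([[40.0, -90.0], [41.0, -90.0], [45.0, -95.0]])  # made-up lat/lon

rho = spearman_matrix(yields)
lat, lon = coords[:, 0], coords[:, 1]
dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])

# Compare ranking similarity with distance across all site pairs.
iu = np.triu_indices(len(coords), k=1)
similarity_vs_distance = float(np.corrcoef(dist[iu], rho[iu])[0, 1])
print(similarity_vs_distance)
```

The same pairwise comparison could be repeated with a distance computed from environmental variables in place of kilometers, to ask which better explains similarity in cultivar ranking.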

Objective 2: Development of a genomic breeding facilitation suite

Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
Participating breeding programs each genotyped ~2500 lines as part of the selection experiment, and are continuing to genotype new and advanced breeding lines to develop breeding program specific training sets and use these training sets.
Development of the 1K marker set has been published: Wang, H., B. Campbell, M. Happ, S. McConaughy, A. Lorenz, K. Amundsen, Q. Song, V. Pantalone, D. Hyten. 2022. Development of molecular inversion probes for soybean progeny genomic selection genotyping. Plant Genome doi.org/10.1002/tpg2.20270.
(2) Annual workshop or webinar given on application of genomic selection to soybean breeding.
An in-person workshop was held at the SBW in 2020, and a second training was done specifically for the SOYGEN team via Zoom in 2021. More recently, the Breeding Insight team has made themselves available for training of the soybean community in database management through our implementation of BreedBase.
(3) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
Excitingly, we have identified and adopted the breeding database and genomic data management system, BreedBase. The soybean implementation of this is called Soybeanbase (https://soybeanbase.breedinginsight.net/). This has been adopted and implemented through the help of Rex Nelson and the Breeding Insight team.
We have also created a streamlined analysis pipeline that can be run as an R shiny app, greatly increasing the ease with which these data can be analyzed. The app takes data in a standardized format exported from Soybeanbase and executes many of the major steps in a genomic prediction pipeline, including marker data filtering, imputation, training population optimization, model selection, cross validation, and prediction of genetic values for a defined target population.
https://github.com/UMN-Lorenz-Group/SoyGen2App
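
To make the first two pipeline steps concrete, the sketch below shows one common way to filter markers and impute missing calls. It is not taken from the SoyGen2App code (which is an R shiny app): the thresholds, the mean-imputation choice, and the function name are assumptions made for illustration.

```python
import numpy as np

def filter_and_impute(G, max_missing=0.2, min_maf=0.05):
    """Filter markers by missing rate and minor allele frequency, then mean-impute.

    G: lines x markers array of 0/1/2 allele counts, with np.nan for missing calls.
    Returns the filtered, imputed matrix and a boolean mask of retained markers.
    """
    missing_rate = np.isnan(G).mean(axis=0)
    keep = missing_rate <= max_missing

    # Allele frequency from observed calls only; dropped markers stay at 0.
    allele_freq = np.zeros(G.shape[1])
    allele_freq[keep] = np.nanmean(G[:, keep], axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    keep &= maf >= min_maf

    # Replace each remaining missing call with that marker's mean allele count.
    Gk = G[:, keep].copy()
    col_means = np.nanmean(Gk, axis=0)
    rows, cols = np.where(np.isnan(Gk))
    Gk[rows, cols] = col_means[cols]
    return Gk, keep

# Toy genotype matrix: 4 lines x 4 markers (illustrative values only).
G = np.array([
    [0.0, 2.0, np.nan, 0.0],
    [1.0, 2.0, np.nan, 0.0],
    [2.0, 2.0, 1.0, np.nan],
    [1.0, 2.0, np.nan, 2.0],
])
G_clean, kept = filter_and_impute(G, max_missing=0.3, min_maf=0.05)
print(kept)
print(G_clean)
```

Mean imputation is the simplest option; low-pass sequencing data such as that described under Objective 1 would normally rely on a reference-panel imputation method instead.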


Deliverables
(1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
Previously we had developed a set of markers, genotyping methods, and DNA isolation methods to cost effectively provide a genotyping service to the SOYGEN breeders (David Hyten, UNL). Yet, recognizing the throughput limitations of an academic lab with an active research program, we have more recently been working with Agriplex.
(https://www.agriplexgenomics.com/1k-soy)
The Soybean Community panel, which consists of 1326 SNPs, was developed in collaboration with members of the SOYGEN team and AgriPlex. Using their AgriPlex Connect program, public soybean breeders and geneticists can take advantage of discounted pricing and expedited turn-around times during critical times of the year for selection.
(2) Workshops on genomic selection delivered to public soybean breeding community.
As above, an in-person workshop was held at the SBW in 2020, and a second training was done specifically for the SOYGEN team via Zoom in 2021. More recently, the Breeding Insight team has made themselves available for training of the soybean community in database management through our implementation of BreedBase.

Objective 3: Evaluation of soybean breeding methods that increase gain

Key performance indicators
(1) Genotyping of 2500 F4 lines in two years for each participating breeding program.
This was completed for four participating breeding programs using either the 1K set of SNPs from UNL or the 1.3K set of SNPs from AgriPlex.
(2) Application of 4 different selection schemes.
Thus far, only a single year of validation has been summarized, so no statistically significant results have been reported. Yet, in an application of random selection, genomic selection, and a combination of genomic selection plus selection on canopy coverage at the University of Minnesota, genomic selection did increase our chances of selecting superior lines, although we did not achieve statistical significance in most cases. A second year of validation data was collected in 2022. The student managing this project has since graduated, and the data are now being analyzed by a student at the University of Missouri. In the first year, we did see genomic prediction work quite well, especially in the southern region of Minnesota. This could be because the MG I germplasm is better connected to the UT training set used, and these populations were larger than the ones tested in central and northern Minnesota. In the table below, the top ten lines by validation yield data were defined for each location. The tabled values indicate whether genomic prediction (GP), canopy coverage selection combined with genomic prediction (CC+GP), or random selection selected that line (Table 2). Especially in the south on average, and at the two individual southern locations (Lamberton and Waseca), the best 10 lines were much more likely to have been chosen by either GP or CC+GP than by random selection.
Table 2. Performance of genomic prediction (GP), canopy coverage selection combined with genomic prediction (CC+GP), and random selection at UMN test locations, ordered from north to south.

Similarly, data from University of Missouri showed no significant difference among selection methods based only on a single year of data; yet, random selections had lower yield estimates.


(3) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
Each year, UMN predicted the mean, population variance, and genetic correlations among traits for all possible crosses among UT breeding lines and provided these predictions to breeders. A total of eight participating breeding programs used this information to select crosses. UMN also selected 5 cross combinations that strived to create breeding populations with high protein, high yield and a reduced genetic correlation between these traits. These crosses were made in 2020, and populations were advanced to the progeny row stages in 2023. This fall we will be able to measure protein on these populations.
We also leveraged the existing SoyNAM dataset to validate these models for predicting genetic variances and genetic correlations among traits. Using these 39 biparental populations, we validated the models that predict genetic variances and genetic correlations between traits for all possible crosses by correlating observed parameter values with predicted parameter values. We found that in 17 out of 21 cases, there was a positive correlation between predicted genetic correlations and observed genetic correlations, indicating this methodology holds promise for identifying breeding crosses that could have less detrimental correlations between traits. This manuscript is written and will be submitted in the coming months.


Figure 3. Correlations between predicted genetic correlations and observed genetic correlations in SoyNAM populations.
(4) Advance generation by single seed descent for generated crosses in (5) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY23.
These are in progress and will be tested and compared by multiple breeding programs in-field in 2023, with yield data available in 2024.
(6) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme.
During the summer of 2020, a Cycle 0 (base) population of F1 plants was created by random mating 13 parents. The parents were selected for yield potential, genetic diversity and seed composition. Genomic predictions for seed yield, genetic variation, and seed composition were used to select superior F1 plants, and the selected plants were intermated to produce a new cycle of F1 plants. This rapid cycling process of selection and intermating was repeated three times to produce Cycle 1, Cycle 2 and Cycle 3 generations. No phenotypic data was collected on these progeny during this process. Creation of Cycle 0 through Cycle 3 was completed in less than two years by growing two generations in the greenhouse in the winter and two in the field in the summer. In each generation, between 100 and 250 new F1 progeny were created and between 20 and 30% of the F1s were selected for intermating. In each cycle, the F1 plants were allowed to self over multiple generations, and inbred populations of random F3 or F4-derived lines were created from each cycle of selection. The inbreeding and seed increase process to complete lines for evaluation was completed in the winter of 2022. In the summer of 2022, 150 to 160 random F3 or F4-derived lines from each of Cycle 0 through Cycle 3 were evaluated in the field at three locations in KS. In addition to obtaining information on seed yield, maturity, and seed composition (seed protein and oil), each of the 633 genotypes in the trial was genotyped using the AgriPlex 1K SNP array and monitored using remote sensing from about the V2 growth stage until maturity. In 2023, these field evaluations are being repeated.
(7) Generate near isogenic lines varying for putative “yield alleles” previously identified from landscape genomics analyses.
Our previous NCSRP project identified 26 putative yield-related alleles based on a population genetic evaluation of haplotypes under selection in an alternative (from Randy Nelson’s breeding program) and a conventional gene pool. To test the value of these alleles, the OSU breeding program focused on four loci which differ between the breeding lines LG09-8165 and LG11-5120. Material transfer agreements were obtained for these lines with complex pedigrees. In FY20, crosses were made. Since then, reselections from heterozygous individuals were carried out to develop F4-derived families which differ primarily for the yield allele of the targeted loci. Currently, these F4-derived lines are being grown in progeny rows and will be available for preliminary yield tests by collaborators in FY24.

Deliverables
(1) Methods to improve selection of progeny rows based on genomic selection with secondary traits and/or improved spatial statistics.

Although the multi-year data necessary to reach a definite conclusion on best methods have not yet been obtained, the participating breeding programs have implemented the procedures necessary to apply these methods and have shared this knowledge with the SOYGEN group. Genotyping and making selections on large numbers of early generation materials during the field season can be logistically difficult and requires the implementation of new field protocols, which have been and will continue to be shared among breeding programs.
(2) Application and limitations established for rapid cycling genomic selection in soybean.
KSU and UMN worked together to implement and establish methods for a rapid cycling genomic selection program. Results of the upcoming field trials will be used to characterize the effectiveness of rapid cycling to increase genetic gain, and to understand the impact of rapid cycling on the phenotype of the progeny and the genetic makeup of each cycle of selection. Ultimately, these trials will provide data to support a specific number of rounds of rapid cycling based on a given model, for a given population diversity.

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

Key performance indicators
(1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
Data summaries from the tests were shared with cooperators. Breeding programs have used accessions from this study as parents in their breeding programs. For example, the McHale breeding program has sub-selected lines predicted to have good agronomic traits and yield from a selection of exotic germplasm screened for disease resistance traits.


SOYGEN 2: Increasing soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US

Objective 1: Elevating collaborative field trials

Numerous collaborative, multi-state field trials have been carried out within the soybean breeding community. Prominent among those are the Northern (and Southern) Soybean Uniform Trials, including the SCN Regional Trials. The common practice has been to make data generated from these collaborative projects available to project participants or the broader community in the form of simple tables or flat-files. While this is a potentially high-value resource, it requires a huge amount of work to utilize the data as a whole. This project has worked to implement a queryable data-sharing platform for phenotypic and genotypic data (https://soybase.org/ncsrp/queryportal/).

Additionally, the systematic collection of GPS data for trial locations and of genotypic data on the germplasm entered into the Uniform Trials and SCN Regional Trials has provided data for more sophisticated selection methods. Genotypic data was initially collected via the Soy 6K SNP chip, but genotyping has now migrated to low-pass whole genome sequencing and imputation provided by Gencove, a method that provides more than 100 times as much data and therefore enables more powerful analyses.

In parallel, we have initiated a soybean implementation of BreedBase (SoybeanBase; https://soybeanbase.breedinginsight.net/), which currently hosts data from the Uniform Trials and SoyNAM and is being updated with internal breeding lines from individual breeding programs as their bandwidth allows.

In summation, these tools and additional data types allow us to better share and make use of the field data resources that we already have available to us. For example, genomic prediction models using these public data sources have already been developed and applied to individual breeding programs in order to make yield selections based on genotypic data alone.

Objective 2: Development of a genomic breeding facilitation suite

We have established a pipeline for compiling, curating and analyzing yield and seed composition data collected from the Uniform Trials, as well as genotyping all breeding lines entered into the trials. A complete and curated dataset of phenotypes and pedigrees of all lines entered and evaluated at all locations from 1992 to the present was created. Genotypic data was (and is) collected for all lines with available/viable seed (see Objective 1 for a description of genotyping of Uniform Trials germplasm). This data is being used as a starting point to implement genomic selection within individual breeding programs.

To facilitate the use and integration of this public data with data specific to individual breeding programs, we have identified and adopted the breeding database and genomic data management system BreedBase. The soybean implementation is called SoybeanBase (https://soybeanbase.breedinginsight.net/). It has been adopted and implemented with the help of Rex Nelson and the Breeding Insight team.

We have also created a streamlined analysis pipeline that can be run as an R Shiny app, greatly increasing the ease with which these data can be analyzed. The app (https://github.com/UMN-Lorenz-Group/SoyGen2App) takes data in a standardized format exported from SoybeanBase and executes many of the major steps in a genomic prediction pipeline, including marker data filtering, imputation, training population optimization, model selection, cross validation, and prediction of genetic values for a defined target population.
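
As an illustration of the first two pipeline steps named above (marker data filtering and imputation), here is a minimal Python sketch. It is not code from the SoyGen2App, which is written in R. The function name, the 0/1/2 genotype coding, the thresholds, and the use of marker-mean imputation are all assumptions made for this example.

```python
from statistics import mean

def filter_and_impute(genotypes, max_missing=0.2, min_maf=0.05):
    """Filter markers, then fill missing calls with the marker mean.

    genotypes: dict mapping line name -> list of marker calls coded as
    0/1/2 (count of the alternate allele), with None for a missing call.
    All names and thresholds here are hypothetical, for illustration only.
    Returns (indices of kept markers, dict of imputed genotypes).
    """
    lines = list(genotypes)
    n_markers = len(genotypes[lines[0]])
    kept = []
    marker_means = {}
    for j in range(n_markers):
        calls = [genotypes[line][j] for line in lines]
        observed = [c for c in calls if c is not None]
        missing_rate = 1 - len(observed) / len(calls)
        # Drop markers with too many missing calls
        if not observed or missing_rate > max_missing:
            continue
        # Drop markers with a low minor allele frequency (MAF)
        alt_freq = mean(observed) / 2
        if min(alt_freq, 1 - alt_freq) < min_maf:
            continue
        kept.append(j)
        marker_means[j] = mean(observed)
    imputed = {
        line: [
            genotypes[line][j] if genotypes[line][j] is not None else marker_means[j]
            for j in kept
        ]
        for line in lines
    }
    return kept, imputed

# Toy data: four lines, four markers (made up for this example)
toy = {
    "A": [0, 2, None, 0],
    "B": [2, 2, None, 0],
    "C": [None, 2, 1, 2],
    "D": [0, 2, None, 2],
}
kept, imputed = filter_and_impute(toy, max_missing=0.3, min_maf=0.05)
print(kept, imputed["C"])
```

In the toy data, the second marker is dropped because it does not vary and the third because most calls are missing. Real pipelines typically use more sophisticated imputation than the marker mean.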

In order to implement genomic breeding, germplasm within individual breeding programs should be genotyped for selection and modeling prior to entry into the Uniform Tests. This requires high-throughput, cost-effective DNA isolation and genotyping. We developed a set of markers, genotyping methods, and DNA isolation methods to cost-effectively provide a genotyping service to the SOYGEN breeders (David Hyten, UNL). Yet, recognizing the throughput limitations of an academic lab with an active research program, we have more recently been working with Agriplex (https://www.agriplexgenomics.com/1k-soy).

The Soybean Community panel, which consists of 1326 SNPs, was developed in collaboration with members of the SOYGEN team and Agriplex. Using their Agriplex Connect program, public soybean breeders and geneticists can take advantage of discounted pricing and expedited turn-around times during the critical times of the year for selection (mid and late summer).

Objective 3: Evaluation of soybean breeding methods that increase gain

There are many breeding strategies to improve genetic gain; however, not all of them have been tested under the realities and limitations of our soybean breeding programs. Thus, we tested several breeding methods to learn how to apply them, to establish protocols, and to test their utility in increasing genetic gains, thereby identifying best breeding practices.

Selection of unreplicated progeny rows remains a weak point in breeding programs, so we dedicated some effort to improving it. Iowa State University tested whether selection in unreplicated yield tests could be improved by using advanced spatial adjustment parameters and within-field soil testing. In Iowa, the data revealed no benefit from the advanced spatial adjustments. Additionally, four selection methods (genomic selection, genomic selection plus canopy coverage, yield, and random) were compared using data from four breeding programs (UMN, NDSU, Purdue, and UMO). Selections were made using these four methods for two years. Thus far, the first year’s selections have been validated in the field, and data from some programs has been analyzed. Based on this incomplete data, there have been no significant differences between selection methods. However, analysis of the complete data will be conducted in the coming year.
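
To make the idea of spatial adjustment in unreplicated plots concrete, here is a minimal Python sketch of a simple moving-mean adjustment along one row of plots. This is a basic textbook-style technique chosen for illustration, not the advanced adjustment tested at Iowa State. The function name, window size, and yield values are hypothetical.

```python
from statistics import mean

def moving_mean_adjust(yields, half_window=2):
    """Adjust each plot by how far its neighbors deviate from the field mean.

    yields: plot yields in field order along a single row or range.
    A plot surrounded by high-yielding neighbors is adjusted downward,
    and a plot surrounded by low-yielding neighbors is adjusted upward.
    """
    overall = mean(yields)
    adjusted = []
    for i, y in enumerate(yields):
        lo = max(0, i - half_window)
        hi = min(len(yields), i + half_window + 1)
        # Neighbors within the window, excluding the plot itself
        neighbors = yields[lo:i] + yields[i + 1:hi]
        local = mean(neighbors) if neighbors else overall
        adjusted.append(y - (local - overall))
    return adjusted

# Hypothetical yields with a low-yielding area (first three plots)
raw = [50, 52, 51, 60, 62, 61]
adjusted = moving_mean_adjust(raw, half_window=1)
print(adjusted)
```

The adjustment assumes that neighboring plots share local field effects. When neighboring lines differ genetically, part of that difference is removed as well, which is one reason such adjustments do not always help.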

Molecular markers significantly associated with yield are notoriously difficult to identify. Our previous NCSRP project used a novel population genetics-based approach to identify 26 potential yield-related alleles. These alleles appeared to be selected for both in modern soybean cultivars and in an “alternative breeding program” initiated from exotic germplasm (from Randy Nelson’s breeding program). To test the validity of this method, we are working to develop soybean breeding lines that are identical except for the alleles present at these possible yield genes. We are in the process of increasing seed for these lines and will have seed ready for limited field testing in FY24 and for wider distribution to collaborators in FY25.

Each time a cross is carried out, we have the potential to obtain a new and improved combination of alleles. Given enough cycles of crossing and selecting the best progeny, we can theoretically recombine parental alleles into the best possible combination of all alleles. We cannot phenotypically evaluate a given combination of alleles for 5+ years (the time it takes to inbreed, increase seed, and carry out yield trials). However, by choosing cross combinations based on the genomic marker patterns of the individuals segregating (varying) from each cross, we can cross and select rapidly and thus reach the “best” possible combination of all alleles faster. By carrying out genomic mating on segregating F1 progeny, we can dramatically reduce the time for an individual cycle of selection. KSU tested this method by using genomic prediction (or genomic matings) to progress through three crossing seasons per year. The products of each crossing cycle have been inbred; they were tested in 2022 and are being tested again in 2023 to see how they compare.
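
The effect of cycle time can be illustrated with the breeder's equation, in which expected gain per year equals selection intensity times selection accuracy times additive genetic standard deviation, divided by cycle length in years. The Python sketch below uses the two cycle lengths mentioned above (about 5 years for phenotypic evaluation and three crossing cycles per year). The intensity, accuracy, and standard deviation values are hypothetical placeholders, not project results.

```python
def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation: expected genetic gain per year."""
    return intensity * accuracy * sigma_a / cycle_years

# Hypothetical inputs: same intensity and genetic variation for both schemes,
# with lower selection accuracy assumed for marker-only selection.
phenotypic = gain_per_year(intensity=1.4, accuracy=0.8, sigma_a=2.0, cycle_years=5.0)
rapid_cycle = gain_per_year(intensity=1.4, accuracy=0.4, sigma_a=2.0, cycle_years=1 / 3)

print(f"phenotypic selection: {phenotypic:.3f} units per year")
print(f"rapid cycling:        {rapid_cycle:.3f} units per year")
```

Under these assumed values, the shorter cycle more than offsets the lower accuracy. Whether that holds in practice depends on how accuracy and genetic variance change over successive cycles, which is what the field tests are designed to measure.
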
Seed yield and protein concentration are notoriously negatively correlated with each other. We used a genomic mating strategy to design cross combinations that are predicted to produce breeding populations with more favorable genetic correlations between unfavorably correlated traits, such as yield and protein. These “model selected” cross-combinations were carried out alongside “breeder selected” cross combinations for yield and protein. Crosses were made by 8 breeding programs in 2020 and the derived populations are in progress for in-field evaluation by NIR in 2023, with yield and protein and oil data available in 2024.
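
One way to picture how cross combinations can be compared for trait correlation is sketched below in Python. For each candidate cross it takes predicted yield and protein values for simulated progeny, computes the Pearson correlation, and ranks crosses from most to least favorable. The cross names and values are made up, and this is not the genomic mating model used in the project, only a simplified illustration of the ranking idea.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_crosses(predicted):
    """Rank crosses by predicted yield-protein correlation, highest first.

    predicted: dict mapping cross name -> (yield values, protein values)
    for simulated progeny of that cross (hypothetical input format).
    """
    corr = {cross: pearson(y, p) for cross, (y, p) in predicted.items()}
    order = sorted(corr, key=corr.get, reverse=True)
    return order, corr

# Made-up predicted progeny values for two candidate crosses
candidates = {
    "P1xP2": ([50, 52, 54, 56], [36, 35, 34, 33]),
    "P3xP4": ([50, 52, 54, 56], [34, 35, 33, 34.5]),
}
order, corr = rank_crosses(candidates)
print(order)
```

A cross whose progeny show a correlation closer to zero (or positive) gives more room to select for yield and protein at the same time.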

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

Previous research funded by NCSRP allowed us to collect high-quality yield data from 14 environments, 28 reps, over 2 years for the initial set of 500 accessions, and 16 environments, 32 reps, over 2 years for the 250 accessions in the validation set. In addition to the traditional yield, maturity, plant height, lodging, and seed traits routinely collected on yield plots, we collected plant developmental data (R1, R5, and R8) at most environments during 2018 and 2019. During 2019, we also collected high-throughput phenotype data, including image and other multi-spectral and multi-sensor data, at all locations at two growth stages (V4-V5 and R5), as well as weather and soil data for better characterization of environments. This data has been distributed to collaborators and has allowed the development of genomic prediction models for agronomic traits of the entire USDA germplasm collection. Breeding programs have used accessions from this study as parents in their breeding programs. For example, the genomic predictions allowed lines predicted to have good agronomic traits and yield to be sub-selected from a selection of exotic germplasm screened for disease resistance traits.

Benefit To Soybean Farmers

This SOYGEN (Science Optimized Yield Gains across Environments) project leverages and builds upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US. Ultimately, this will lead to faster development of improved (yield and seed quality) soybean cultivars, which will provide farmers with increased production and increase the competitiveness of US soybean in the global market.
