Historical Project Details: SOYGEN2: Increasing SB genetic gain for yield & seed composition by developing tools, know-how & community among public breeders in the NC US (2021)

Contributor/Checkoff:

North Central Soybean Research Program

Category:

Sustainable Production

Keywords:

Genetics, Genomics

Parent Project:

Increasing the rate of genetic gain for yield in soybean breeding programs

Lead Principal Investigator:

Leah McHale, The Ohio State University

Co-Principal Investigators:

Asheesh Singh, Iowa State University
William Schapaugh, Kansas State University
Dechun Wang, Michigan State University
Katy M Rainey, Purdue University
Brian Diers, University of Illinois at Urbana-Champaign
Matthew Hudson, University of Illinois at Urbana-Champaign
Nicolas Frederico Martin, University of Illinois at Urbana-Champaign
Aaron Lorenz, University of Minnesota
Pengyin Chen, University of Missouri
Andrew Scaboo, University of Missouri
George Graef, University of Nebraska
David Hyten, University of Nebraska at Lincoln

Project Code:

GRT00060503

Contributing Organization (Checkoff):

North Central Soybean Research Program

$531,450

United Soybean Board

$178,550

Institution Funded:

The Ohio State University

$710,000

Brief Project Summary:

The SOYGEN team is adding value to the Northern Uniform Soybean Trials (NUST). They will add environmental and genotypic data to NUST and the SCN Regional Trials. The team will focus on the development and use of high-throughput genome-wide genotyping technologies and on making these tools widely available. They will evaluate breeding methods that target areas of trait improvement such as yield and seed protein. Breeders will test methods to determine which are most viable for improving genetic gains and will complete the evaluation of diverse soybean genotypes from the USDA Soybean Germplasm Collection to obtain high-quality phenotype and environmental data.

Key Benefactors:
farmers, geneticists, breeders

Information And Results

Project Deliverables

1.1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
1.2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
1.3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
1.4) Breeders will better understand how to weight data from different environments of the Uniform Tests to know how well it will predict performance.
2.1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
2.2) Workshop on genomic selection delivered to public soybean breeding community.
3.1) Methods to improve selection of progeny rows based on genomic selection with secondary traits and/or improved spatial statistics.
3.2) Understand the potential to improve the unfavorable correlation between yield and protein in soybean through genomic mating.
3.3) Application and limitations established for rapid cycling genomic selection in soybean.
3.4) Characterization of allelic effect of putative yield alleles and markers for their selection.
4.1) Comparison of sampling methods and effective ways to efficiently sample the genotype collection, particularly for improvement of quantitative traits like yield.
4.2) Means and variances for traits in the different sampling groups (i.e., the effect of sampling method on those estimates).
4.3) Identify loci associated with yield and other traits in this diverse panel of accessions that represents the genetic diversity in the collection, so that we may identify new loci and alleles that will be useful for commercial and public breeding programs.
4.4) Provide genomic predictions for yield (done), maturity, seed protein and oil %, and other traits as appropriate, for all untested accessions in the USDA Soybean Germplasm Collection.
4.5) Investigate genotype-environment interaction effects on traits, and evaluate stability of yield and composition traits across environments.
4.6) Use data/results in implementation in Objectives 1, 2, and 3 of this project (FY20-22).
4.7) Preliminary analysis of data from the validation set of 250 entries.

Final Project Results

Updated June 8, 2022:
SOYGEN 2: Increasing soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US

Objective 1: Elevating collaborative field trials

1a. Development of a database to store, query, and distribute data from collaborative field trials

Database tables and draft query user interfaces have been created. Phenotypic data from collaborative trials from 1989 to the present have been loaded into the data tables and are accessible to project participants. Environmental data will be available through an interface to the Daymet meteorological API. Beta testing of the interface by project participants continues. Additionally, a soybean-specific BreedBase installation has been created, allowing users to upload and share data from their own breeding programs and to leverage the Uniform Trial data (currently being populated) to gain accuracy in yield predictions.

1b. Updating the Uniform Soybean Trials

We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping. All materials from the 2021 UT and 2020 SCN UT were sampled, and DNA isolation will commence shortly.

By discussion and agreement of UT collaborators, data submission forms have been updated for current and future field trial sites to include GPS coordinates. This will allow weather data to be linked to phenotypic data.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials to the Daymet weather data. This information, along with field trial phenotypic information, will be used to compare the similarity of trial sites from year to year.
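One simple way to compare site-years once weather data are linked to trial coordinates is to standardize each weather covariate and compute pairwise distances. The sketch below is illustrative only: the site names, covariates, values, and the choice of Euclidean distance are all assumptions, not the method used in the project.

```python
import math
import statistics

# Hypothetical seasonal summaries per site-year (values invented for illustration):
# (mean daily max temperature in deg C, total precipitation in mm, mean solar radiation in W/m2)
site_years = {
    ("SiteA", 2019): (27.1, 520.0, 340.0),
    ("SiteA", 2020): (28.4, 410.0, 345.0),
    ("SiteB", 2019): (25.0, 610.0, 310.0),
    ("SiteB", 2020): (25.3, 590.0, 312.0),
}


def standardize(rows):
    """Scale each covariate column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds))
        for row in rows
    ]


def pairwise_distances(data):
    """Euclidean distance between every pair of standardized site-years."""
    keys = list(data)
    scaled = dict(zip(keys, standardize([data[k] for k in keys])))
    return {
        (a, b): math.dist(scaled[a], scaled[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    }


distances = pairwise_distances(site_years)
# The pair with the smallest distance is the most similar under this measure.
closest_pair = min(distances, key=distances.get)
```

Smaller distances indicate site-years with more similar growing conditions under these assumed covariates; a real comparison would also draw on the phenotypic data mentioned above.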

Objective 2: Development of a genomic breeding facilitation suite

2a. Genotyping methods

We have received 17,259 DNA samples to run with the 1k SNP set. A total of 14,739 have been genotyped. We have surpassed the project goal of 10,000 genotyped samples.

2b. Imputation methods

Imputation of progeny with low-pass sequencing has been tested on a small scale. Scripts were completed and are being tested in the Lorenz laboratory. We have been working to improve their accuracy and iterating new versions to make the scripts more useful in different use cases. Scripts have been distributed to project participants upon request. See below for details.

2c. Genomic Prediction Facilitation Suite

During this past reporting period we installed a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/). We have deposited our current genome-wide marker data into it, including all the genotype data collected on the UT as part of this project. A workflow of software tools and scripts was initiated to seamlessly combine data held in this database with phenotypic data and genomic prediction models, easing the use of genomic selection in a practical breeding context. A few steps still need to be developed, such as low-to-high marker density imputation and training population optimization. A tutorial of the workflow was provided to project participants.
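The report does not state which prediction model the workflow uses. As a minimal sketch of the general idea (combine marker data with phenotypes, fit a model, predict untested lines), the example below assumes ridge regression on marker genotypes coded -1/0/1. All data values and function names are hypothetical, and the linear algebra is written without dependencies purely for illustration; a real workflow would use an established statistics package.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(markers, phenotypes, penalty):
    """Estimate marker effects from (X'X + penalty * I) beta = X'y on centered data."""
    n_lines, n_markers = len(markers), len(markers[0])
    y_mean = sum(phenotypes) / n_lines
    col_means = [sum(row[j] for row in markers) / n_lines for j in range(n_markers)]
    x = [[row[j] - col_means[j] for j in range(n_markers)] for row in markers]
    y = [value - y_mean for value in phenotypes]
    xtx = [[sum(r[i] * r[j] for r in x) + (penalty if i == j else 0.0)
            for j in range(n_markers)] for i in range(n_markers)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(n_markers)]
    return y_mean, col_means, solve(xtx, xty)


def predict(markers, y_mean, col_means, effects):
    """Predict phenotypes of lines from their marker genotypes."""
    return [y_mean + sum((g - m) * e for g, m, e in zip(row, col_means, effects))
            for row in markers]


# Hypothetical training set: 6 lines x 3 markers, yields in bu/acre.
train_markers = [[1, -1, 0], [-1, 1, 1], [0, 0, -1], [1, 1, 1], [-1, -1, -1], [0, 1, 0]]
train_yield = [53.0, 47.5, 49.5, 51.5, 48.5, 49.0]

# Hypothetical untested candidates that have genotypes but no yield data.
candidates = [[1, -1, 1], [-1, 0, 0]]

y_mean, col_means, effects = fit_ridge(train_markers, train_yield, penalty=1.0)
predicted = predict(candidates, y_mean, col_means, effects)
```

The penalty shrinks marker effects toward zero, which is what makes the model usable when markers outnumber phenotyped lines; its value here is arbitrary and would normally be estimated from the data.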

On a related front, co-PI Nelson has begun the adoption of a platform called BreedBase (breedbase.org). This will be available to public breeders for depositing the phenotypic and genotypic data from individual breeding programs as well as collaborative datasets, such as the Uniform Trials. It will facilitate the use of genome-wide marker data for breeding. BreedBase has been installed at SoyBase, and project participants are uploading data for beta testing and can receive, or have already received, individualized training on request from BreedBase personnel.

Objective 3: Evaluation of soybean breeding methods that increase gain

3a. Advanced spatial analysis.

Preliminary yield prediction models have been run on single-location progeny rows from 2019 using elastic net, ridge regression and lasso. Preliminary results show an RMSE of 7 bu/acre and an R² of 0.69. Models have shown relative maturity and pedigree information to have the largest effect on yield. Soil parameters and canopy area have also shown some significance. Soil data are extracted using fine-scale soil maps generated in collaboration with soil scientist Dr. Miller and his postdoc Dr. Khaledian. With these soil maps we get soil nutrient data (N, P, K, Ca, Mg, CEC, NO3, OM) as well as soil texture data on a 3 m x 3 m scale. Further machine learning models and selection criteria are being developed with Dr. Sarkar and his graduate student Luis Riera.
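For readers unfamiliar with the two accuracy metrics quoted above, the sketch below shows how RMSE and R² are commonly computed from observed and predicted yields. The yield values are invented for illustration, and R² is computed here as the coefficient of determination; the report does not say whether it used this definition or the squared correlation.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the trait (here bu/acre)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical progeny-row yields (bu/acre) and model predictions.
observed_yield = [52.0, 60.0, 47.0, 66.0, 58.0]
predicted_yield = [55.0, 57.0, 50.0, 62.0, 59.0]

error = rmse(observed_yield, predicted_yield)
fit = r_squared(observed_yield, predicted_yield)
```

RMSE expresses the typical prediction error in yield units, while R² expresses the share of yield variation among rows that the model accounts for.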

In collaboration with statistician Dr. Dutta and his graduate student, Dongjin Li, we have prepared a tutorial using the statgenSTA R package. This tutorial includes videos and an HTML notebook showing the steps from data preparation through fitting and running models to outlier analysis. The statgenSTA package allows users to fit traditional non-spatial models, as well as spatial models, by including row and column information as well as replications. Users can choose the lme4, SpATS or asreml packages as the engine for fitting the data. This tutorial will be shared with the breeding community prior to the fall season. We used the SpATS engine, which uses a penalized spline for spatial correction. This allows for a more dynamic spatial correction than the traditional moving means corrections. We also used this tool in our spatial adjustments for 2020 yield trials and compared it with the traditional moving means method that we have used in the past.
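To make the comparison concrete, the sketch below shows one simple variant of a moving means adjustment: each plot's yield is corrected by how far the mean of its neighboring plots deviates from the field mean. The window size, the function name, and the field layout are assumptions for illustration; this is not the project's implementation, and it does not reproduce the penalized spline approach of SpATS.

```python
def moving_mean_adjust(grid, radius=1):
    """Adjust each plot by the deviation of its neighbors' mean from the field mean.

    grid is a list of rows, each a list of plot yields; radius sets the
    neighborhood size in rows and columns (an assumed, tunable window).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    all_values = [value for row in grid for value in row]
    field_mean = sum(all_values) / len(all_values)
    adjusted = []
    for r in range(n_rows):
        adjusted_row = []
        for c in range(n_cols):
            # Collect neighboring plots inside the window, excluding the plot itself.
            neighbors = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            local_mean = sum(neighbors) / len(neighbors)
            adjusted_row.append(grid[r][c] - (local_mean - field_mean))
        adjusted.append(adjusted_row)
    return adjusted


# Hypothetical field with a fertility gradient across columns (bu/acre).
field = [[40.0 + 5.0 * col for col in range(5)] for _ in range(3)]
adjusted_field = moving_mean_adjust(field)
```

In this toy field the adjustment removes most of the column gradient. A fixed window like this treats every neighborhood the same way, which is the limitation that a smooth spatial surface is meant to address.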

Code and full tutorials for the selection process were shared with the entire research group.

3b. Development of breeding program specific genomic prediction models

More than 14,000 advanced lines have been genotyped (see Objective 2) and are being used for the development of training models and implementation of genomic selection within individual breeding programs.

3c. Genomic plus secondary trait selection at the progeny row stage

For each of four breeding programs, nearly 2500 progeny rows were grown, phenotypically evaluated and genotyped in 2021. In addition, canopy coverage data were extracted from UAV images. Selections were made based on phenotypic data alone (primarily yield), genotypic data alone (genomic selection using a model developed from UT data), phenotypic data plus canopy coverage, genotypic data plus canopy coverage, and random selections. These selection methods are being tested in 2022.

3d. Exploration of genomic prediction to reduce unfavorable correlations between seed yield and protein

We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for their consideration when planning 2021 crosses. F1 progeny between 5 breeder-selected high-yield, high-protein lines and 5 model-selected high-yield, high-protein lines were harvested from 8 breeding programs and increased to F2.
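The report does not describe how these cross-level quantities were calculated. As a rough sketch of how they relate to one another, the example below assumes progeny values are normally distributed, takes the cross mean as the mid-parent predicted value, and defines the superior progeny mean as the expected mean of the top 10% of progeny. The 10% fraction, the parent values, the variances, and the correlation are all hypothetical.

```python
from statistics import NormalDist


def selection_intensity(selected_fraction):
    """Expected mean, in standard deviations, of the top fraction of a normal distribution."""
    threshold = NormalDist().inv_cdf(1.0 - selected_fraction)
    return NormalDist().pdf(threshold) / selected_fraction


def superior_progeny_mean(cross_mean, genetic_variance, selected_fraction=0.10):
    """Expected mean of the best progeny: cross mean + intensity * genetic std. dev."""
    return cross_mean + selection_intensity(selected_fraction) * genetic_variance ** 0.5


def correlated_trait_mean(trait_mean, trait_variance, correlation, selected_fraction=0.10):
    """Expected mean of a second trait among progeny selected on the first trait."""
    shift = selection_intensity(selected_fraction) * correlation * trait_variance ** 0.5
    return trait_mean + shift


# Hypothetical predicted values for two parents (yield in bu/acre).
parent_yields = (60.0, 64.0)
cross_mean = sum(parent_yields) / 2          # mid-parent prediction
yield_variance = 4.0                         # assumed predicted genetic variance

best_yield = superior_progeny_mean(cross_mean, yield_variance)

# Assumed protein mean (%), variance, and an unfavorable yield-protein correlation.
protein_in_best = correlated_trait_mean(35.0, 1.0, correlation=-0.5)
```

Under these assumptions, a more negative yield-protein correlation pulls protein further down among the highest-yielding progeny, which is why crosses with a less unfavorable predicted correlation are of interest.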

3e. Rapid cycling

Three cycles of genomic selection were completed on schedule, with coordination between UNL (genotyping from Hyten lab, model predictions from Lorenz lab) and KSU (Schapaugh conducting crossing and advancements). About 100-400 F1s were created each cycle and about 40 F1s were selected based on genotypic data. Following each of the cycles of selection (up to 3), the selected lines were inbred. F4 generations have been grown for Cycles 0, 1, and 3, with Cycle 2 being one generation behind due to miscommunication with the winter nursery. Testing will occur in 2022.

3f. Evaluation of putative “yield” alleles

Due to the inability to obtain a material transfer agreement (MTA) from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. Markers were developed for four putative yield loci segregating between these parents, and F3 families have been planted.

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

As we did for yield and agronomic traits previously, we conducted a genome-wide association analysis for each of the seed composition traits using the multi-year, multi-location phenotype data collected as part of this project, along with the existing 50K genotype data from the collection. The association analyses were conducted by sampling group (CLU, RAN, SSD) and over all lines together.

This SOYGEN (Science Optimized Yield Gains across Environments) project leverages and builds upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US. In support of these goals we are adding to the availability and utility of public soybean data: adding genotypic data for tens of thousands of breeding lines and cultivars, much of which is attached to high-quality, geo-referenced field data. We are testing and learning which breeding methods can be improved with this data and how to do so in a cost-effective manner. Ultimately, this will lead to faster development of improved (yield and seed quality) soybean cultivars, which will provide farmers with increased production and increase the competitiveness of US soybean in the global market.

The United Soybean Research Retention policy will display final reports with the project once completed, but working files will be purged after three years and financial information after seven years. All pertinent information is in the final report; if you want more information, please contact the project lead at your state soybean organization or the principal investigator listed on the project.