Updated April 2, 2021:
Objective 1: Elevating collaborative field trials
1c. Key performance indicators
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping.
(4) Weather data will be collected for the majority of the future NUST field environments.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials with the DAYMET weather data. This information, along with field trial phenotypic information, will be used to compare the similarity of trial sites from year to year.
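As a rough illustration of the coordinate-based lookup and site-year comparison described above, the sketch below builds a request URL for the Daymet single-pixel service and computes a Pearson correlation between two weather summaries. The endpoint and parameter names are our understanding of the public Daymet service and should be checked against its current documentation; the coordinates, temperature values, and function names are hypothetical and are not taken from the project.

```python
from math import sqrt
from urllib.parse import urlencode

# Assumed endpoint of the Daymet single-pixel service (verify before use).
DAYMET_URL = "https://daymet.ornl.gov/single-pixel/api/data"


def daymet_query_url(lat, lon, years, variables=("tmax", "tmin", "prcp")):
    """Build a single-pixel request URL for one trial location."""
    params = {
        "lat": f"{lat:.4f}",
        "lon": f"{lon:.4f}",
        "vars": ",".join(variables),
        "years": ",".join(str(y) for y in years),
    }
    # Keep commas literal so the list-valued parameters stay readable.
    return f"{DAYMET_URL}?{urlencode(params, safe=',')}"


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical monthly mean maximum temperature (deg C), May to September,
# for one site in two different years.
site_2019 = [20.1, 25.3, 28.0, 27.2, 23.5]
site_2020 = [19.4, 26.0, 28.9, 26.5, 22.8]

url = daymet_query_url(41.15, -96.45, [2019, 2020])
similarity = pearson(site_2019, site_2020)
print(url)
print(round(similarity, 3))
```

A correlation on a single variable is only one possible similarity measure; a real comparison would likely combine several weather variables with the phenotypic data.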
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
1d. Deliverables
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
Database tables and draft query user interfaces have been created. Beta testing of the interface by project participants continues.
(2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
Phenotypic data from collaborative trials from 1989 to the present have been loaded into the data tables and are accessible to project participants. Environmental data will be available through an interface to the DayMet meteorological API.
Objective 2: Development of a genomic breeding facilitation suite
2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have received 7,730 DNA samples to run with the 1k SNP set. We are currently processing these samples. The first 2,592 are in the process of being sequenced.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
The scripts were completed and are being tested in the Lorenz laboratory. We have been working to improve their accuracy and iterating new versions to make the scripts more useful in different use cases.
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
During this past reporting period we installed a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/). We have deposited our current genome-wide marker data into it, including all the genotype data collected on the UT as part of this project. A workflow of software tools and scripts was initiated to seamlessly combine data held in this database with phenotypic data and genomic prediction models to ease the use of genomic selection in a practical breeding context. A few steps still need to be developed, such as low-to-high marker density imputation and training population optimization. The current postdoc left for a permanent position, and we are seeking another postdoc to continue this work.
On a related front, co-PI Nelson, with input from Lorenz, is researching the adoption of a platform called BreedBase (breedbase.org). We are hoping this can be installed at Soybase and be available to public breeders for depositing phenotypic and genotypic data and for facilitating the use of genome-wide marker data in breeding. This is in the early stages of development right now.
2e. Deliverables
1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
This first batch of 7,000 lines is helping us to streamline our submission process and determine what parts of the genotyping process need to be improved for this summer.
Objective 3: Evaluation of soybean breeding methods that increase gain
3c. Key performance indicators
(1) Preliminary single-site validation of spatial statistics and selection of added growth stage and/or drone based phenotyping and soil parameter factors (Task 1).
Preliminary yield prediction models have been run on single location progeny rows from 2019 using elastic net, ridge regression and lasso. Preliminary results show RMSE of 7 bu/acre and R2 of 0.69. Models have shown relative maturity and pedigree information to have the largest effect on yield. Soil parameters and canopy area have also shown some significance. Soil data is extracted using fine scale soil maps generated in collaboration with soil scientist Dr. Miller and his postdoc Dr. Khaledian. With these soil maps we get soil nutrient data (N, P, K, Ca, Mg, CEC, NO3, OM) as well as soil texture data on a 3m x 3m scale. Further machine learning models and selection criteria are being developed with Dr. Sarkar and his graduate student Luis Riera.
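For readers less familiar with the two fit statistics quoted above, the following minimal sketch shows how RMSE and R2 are commonly computed from observed and predicted yields. It assumes R2 means the coefficient of determination (1 minus the ratio of residual to total sum of squares); the report does not state which definition was used, and the yield values here are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the trait (e.g. bu/acre)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical plot yields (bu/acre) and model predictions for the same plots.
observed = [52.0, 61.0, 48.0, 70.0, 57.0]
predicted = [55.0, 58.0, 50.0, 66.0, 59.0]

print(round(rmse(observed, predicted), 2))
print(round(r_squared(observed, predicted), 2))
```

Both statistics are most informative when computed on plots held out from model fitting, since in-sample values tend to overstate predictive ability.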
(3) Validation and selection of spatial statistics and added factors based on multi-location data (Task 1).
In collaboration with statistician Dr. Dutta and his graduate student, Dongjin Li, we have prepared a tutorial using the statgenSTA R package. This tutorial includes videos and an HTML notebook showing the steps from data preparation through fitting and running models to outlier analysis. The statgenSTA package allows users to fit traditional non-spatial models, as well as spatial models, by including row and column information as well as replications. Users can use the lme4, SpATS, or ASReml packages for fitting the data. This tutorial will be shared with the breeding community prior to the fall season. We used the SpATS engine, which uses a penalized spline for spatial correction. This allows for a more dynamic spatial correction compared to the traditional moving means corrections. We also used this tool in our spatial adjustments for 2020 yield trials and compared it with the traditional moving means method that we have used in the past. We have not yet validated which method gives more accurate selections; this work is ongoing.
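To make the contrast with spline-based correction more concrete, here is a deliberately simplified sketch of a moving means adjustment on a row-by-column field layout. It is not the procedure used in the project or in statgenSTA: it assumes that each plot is adjusted by the difference between the mean of its immediate neighbors and the field mean, whereas practical implementations typically work on residuals and regress yield on the neighbor covariate. The field values and function name are hypothetical.

```python
def moving_mean_adjust(grid, radius=1):
    """Adjust each plot by how far its neighborhood mean departs from the field mean.

    grid is a list of rows of plot values; the field must contain at least
    two plots so that every plot has at least one neighbor.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    grand_mean = sum(values) / len(values)

    adjusted = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Neighbors within the window, excluding the plot itself.
            neighbors = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
                if (i, j) != (r, c)
            ]
            local_mean = sum(neighbors) / len(neighbors)
            out_row.append(grid[r][c] - (local_mean - grand_mean))
        adjusted.append(out_row)
    return adjusted


# Hypothetical 3 x 3 field with a fertility gradient from left to right.
field = [
    [50.0, 55.0, 60.0],
    [51.0, 56.0, 61.0],
    [49.0, 54.0, 59.0],
]
adjusted = moving_mean_adjust(field)
for row in adjusted:
    print([round(v, 2) for v in row])
```

In this toy field the adjustment shrinks the left-to-right gradient, which is the intended effect of a local correction. A fixed window like this cannot adapt its smoothness to the field, which is one motivation for the penalized spline approach mentioned above.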
(4) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
7000 advanced lines have been submitted and are in the process of being genotyped (see Objective 2).
(8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for their consideration when planning 2021 crosses.
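The relationship between the predicted cross mean, the predicted variance, and the superior progeny mean can be sketched with the standard normal-theory formula: superior progeny mean = cross mean + selection intensity x genetic standard deviation. The report does not say which formula or selected fraction was used, so the 10% fraction, the example crosses, and the function names below are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(selected_fraction):
    """Mean of the top fraction of a standard normal distribution."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - selected_fraction)
    return std_normal.pdf(threshold) / selected_fraction


def superior_progeny_mean(cross_mean, genetic_variance, selected_fraction=0.10):
    """Expected mean of the best progeny, assuming normally distributed progeny values."""
    intensity = selection_intensity(selected_fraction)
    return cross_mean + intensity * sqrt(genetic_variance)


# Two hypothetical crosses (yield in bu/acre): A has the lower predicted mean
# but the larger predicted genetic variance.
cross_a = superior_progeny_mean(cross_mean=60.0, genetic_variance=4.0)
cross_b = superior_progeny_mean(cross_mean=61.0, genetic_variance=1.0)

print(round(cross_a, 2), round(cross_b, 2))
```

Under these assumptions the cross with the lower mean can still rank higher once its larger variance is taken into account, which is why the variance was predicted alongside the mean.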
(9) Advance generation by single seed descent for generated crosses in (8) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY22 (Task 4).
Because we were unable to obtain MTAs from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. F2 seed is currently being generated.
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
Crosses were made in Nebraska in summer 2020, and F1 seeds were sent to Puerto Rico to grow F1 plants from October ’20 to January ’21. Intermating among F1 plants was attempted, but virus problems in Puerto Rico prevented us from obtaining all of the F1 x F1 crosses. Instead, F2 seeds were harvested from all of the confirmed F1 plants, and we are now crossing among F1:2 lines for the second intermating.
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success
4c. Key performance indicators
(1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
Predictions for crosses are currently being obtained.
Updated November 8, 2021:
Objective 1: Elevating collaborative field trials
1c. Key performance indicators
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping. All materials from the 2021 UT and 2020 SCN UT were sampled, and DNA isolation will commence shortly.
(4) Weather data will be collected for the majority of the future NUST field environments.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials with the DAYMET weather data. This information, along with field trial phenotypic information, will be used to compare the similarity of trial sites from year to year.
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
Objective 2: Development of a genomic breeding facilitation suite
2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have received 9620 DNA samples to run with the 1k SNP set. Thus far, 6764 have been genotyped. The remaining samples are in the process of being sequenced.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
The scripts were completed and are being tested in the Lorenz laboratory. We have been working to improve their accuracy and iterating new versions to make the scripts more useful in different use cases.
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
We installed a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/) and deposited our current genome-wide marker data into this, including all the genotype data collected on the UT as part of this project. A workflow of software tools and scripts was initiated to seamlessly combine data held in this database with phenotypic data and genomic prediction models to ease the use of genomic selection in a practical breeding context.
Collaborator Rex Nelson (soybase.org) is working to implement a version of BreedBase for soybean breeders. An overview of the software package was given to all PIs on the project, who unanimously agreed on its utility. The BreedBase team has agreed to allow an instance in their cloud account for our work, which will make installation and implementation significantly simpler.
Objective 3: Evaluation of soybean breeding methods that increase gain
3c. Key performance indicators
(1) Grow single rep progeny row and preliminary yield trials and test two different methods of spatial adjustments (Task 1).
Code and full tutorials for the selection process were shared with the entire research group during the last reporting period.
(3) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
More than 6000 advanced breeding lines have been genotyped for the development of genomic selection models (see Objective 2).
(4) Genotyping of 2500 F4 lines in two years for each participating breeding program (Task 3).
DNA has been submitted for genotyping (+1000; Objective 2) while more are in process.
(8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for their consideration when planning 2021 crosses. Crosses with model-predicted parents and breeder-selected parents were carried out during summer 2021 for 8 breeding programs.
(9) Advance generation by single seed descent for generated crosses in (8) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY22 (Task 4).
Because we were unable to obtain MTAs from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. Markers were developed for four loci predicted to be selected for yield. F2 lines have been genotyped and harvested. Seed will be sent to a winter nursery in Puerto Rico, where F2:3 families will be produced for homozygous alleles and F3 inbred lines will be produced to further the generation of near isogenic lines derived from heterogeneous inbred families.
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
Crosses were made in Nebraska in summer 2020, and F1 seeds were sent to Puerto Rico to grow F1 plants from October ’20 to January ’21. Intermating among F1 plants was attempted, but virus problems in Puerto Rico prevented us from obtaining all of the F1 x F1 crosses. Instead, F2 seeds were harvested from all of the confirmed F1 plants, and we are now crossing among F1:2 lines for the second intermating.
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success
4c. Key performance indicators
(1) Compile and annotate the data for the validation study.
As we did for yield and agronomic traits previously, we conducted a genome-wide association analysis for each of the seed composition traits using the multi-year, multi-location phenotype data collected as part of this project, along with the existing 50K genotype data from the collection. The association analyses were conducted by sampling group (CLU, RAN, SSD) and over all lines together. Results for some of the seed composition traits are shown in Figures 4 to 6. We are continuing with analysis and interpretation of these results, identifying significant SNPs and underlying genes for each of the traits.