Updated January 25, 2025:
Final report (January 2024 to December 2024)
Note – Figure and table numbers are retained here while all the figures and tables can be accessed through the PDF version of the final report for 2024
Project funded by North Central Soybean Research Program
Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean.
Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee.
Goals & Objectives
Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.
Objectives (Year 2)
• Continue to explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a diverse set of landraces and elite genotypes.
• Improve the image-based field phenotyping system and deep-learning tools to document temporal dynamics in flower abortion and pod retention in diverse soybeans grown under field conditions.
• Identify molecular mechanisms controlling flower abortion under diverse climatic conditions.
Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a diverse set of landraces and elite genotypes
Texas Tech University:
The 50 genotypes were planted on June 5th under two distinct irrigation regimes. One field was irrigated to maintain 80% evapotranspiration (ET) throughout the experiment, while the other received the 40% ET regime only during the flowering phase. Flowering began on July 11th (Figure 1). Imaging and manual flower counting were conducted every three days until the end of the flowering period. Each plot was identified with a QR code label. Pod imaging began on August 27th and continued weekly until all lines had reached the R6/R7 stage. Harvesting has been completed at all locations, and samples have been processed for agronomic data.
University of Missouri:
Planted 50 genotypes on May 22nd and harvested on September 11th, 2024.
University of Tennessee:
Planted 50 genotypes on May 30th and harvested on September 9th, 2024.
Kansas State University:
Planted 50 genotypes on May 29th and harvested on August 31st, 2024.
All locations (Figures 2 and 3) followed the same protocol developed by Texas Tech University for manual flower counting and imaging to ensure uniform and high-quality data collection.
Results:
The results from 11 diverse lines at Texas Tech University and the University of Missouri (Figure 4) showed that lines IA3023, PI556511, HS6-3976, and CL0J173-6-8 had the lowest flower abortion rates in Texas, while PI552538, PI556511, LG05-4464, and CL0J173-6-8 had the lowest rates in Missouri, indicating different genetic responses across environments. Interestingly, PI556511, LG05-4464, and CL0J173-6-8 showed lower flower abortion at both locations, making them potential genetic sources for developing cultivars with wider adaptation and a reduced rate of abortion.
At Texas Tech, the lines PI535648, HS6-3976, LG05-4832, CL0J173-6-8, and LD02-9050 exhibited a lower flower abortion percentage than in Missouri and Kansas, likely because the irrigation applied in Texas mitigated the stress of higher temperatures and helped maintain flower and pod development without excessive abortion. The fluctuating weather in Missouri, including waterlogging, fungal infections, or high humidity, could have contributed to higher stress levels.
At Texas Tech University (Figure 5), the same lines were grown under drought conditions (40% ET). As expected, higher flower abortion rates were observed for most lines under drought, but lines PI534648, K17-6388, LG05-4832, PI533654, and LD02-9050 recorded lower levels of abortion under stress conditions. This could be explained by adaptive drought tolerance mechanisms, including enhanced root growth in response to moderate water stress (40% ET), enabling these lines to retain flowers as a survival strategy.
Additional data on days to maturity were collected across all locations (Figure 6), with values ranging from 98 to 114 days. Lodging scores varied from 0 to 2.5 but were mostly confined to 0 to 1.5 (Figure 7), indicating that these lines are well suited for phenotyping with the imaging platform and for breeding purposes. Additionally, no significant differences in yield (Figure 8) were observed among the lines within each location, except in Tennessee, where lines CL0J173-6-8, K17-6388, IA3023, LD02-9050, PI533654, and PI534648 demonstrated higher yields. Across locations, Kansas recorded higher yields for most lines, followed by Texas.
Plant height (Figure 8) was consistently greater for all lines grown in Missouri, while the same lines exhibited shorter heights in Texas. This suggests that plant height is not a determinant of grain production for these lines. Moreover, the high temperatures experienced in Texas (~100°F) did not significantly reduce yields, likely due to the mitigating effects of good soil structure, proper nutrition, irrigation, and effective crop management practices. Lastly, the 100-seed weight results highlight that Kansas, Texas, and Tennessee recorded higher seed weights for most lines.
Two greenhouse experiments were performed in 2024, one in Tennessee and one in Texas. At the University of Tennessee, plants were assigned to one of two treatments: severe stress (SS) or well-watered (WW). Within each line, five plants were subjected to stress, while three were kept under well-watered conditions and served as the reference for calculating the normalized transpiration rate (NTR). Daily flower counts commenced at the onset of flowering and continued until flowering ceased.
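The report does not give the NTR formula. As a minimal sketch, assuming NTR is the daily transpiration of each stressed plant divided by the mean daily transpiration of the well-watered reference plants of the same line, the calculation could look like this (the function name and all numbers are hypothetical):

```python
from statistics import mean


def normalized_transpiration_rate(stressed_g, well_watered_g):
    """Return one NTR value per stressed plant for a single day.

    stressed_g: daily transpiration (g water/day) of each stressed plant.
    well_watered_g: daily transpiration of the well-watered reference plants.
    Assumption: NTR = stressed transpiration / mean well-watered transpiration.
    """
    reference = mean(well_watered_g)
    if reference <= 0:
        raise ValueError("reference transpiration must be positive")
    return [t / reference for t in stressed_g]


# Hypothetical values: five stressed plants and three well-watered plants.
ntr = normalized_transpiration_rate(
    [60.0, 45.0, 30.0, 75.0, 90.0], [150.0, 140.0, 160.0]
)
print([round(v, 2) for v in ntr])
```

Values near 1 would indicate that a stressed plant transpires about as much as the well-watered reference, and values near 0 that transpiration has almost stopped.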
The University of Tennessee collected data in 2024 from eight contrasting soybean lines exhibiting high and low flower abortion rates (Figure 9), selected based on the field data collected in Summer 2023. They conducted a greenhouse experiment with four lines known for high flower abortion (PI 567638, PI 603583, PI 567398, and PI 423926) and four lines characterized by low flower abortion (PI 506862, PI 80837, PI 437690, and PI 548318) to investigate the impact of severe drought conditions on flower dynamics (Figure 10).
In Figure 10, three high flower abortion lines (PI 567398, PI 567638, and PI 603583) showed pronounced sensitivity to severe water stress. In contrast, the low flower abortion line PI 548318 stood out, producing the highest number of flowers under the severe stress treatment. Interestingly, despite the severe stress conditions, some lines managed to produce around 100 flowers during the flowering phase, and most of these were low flower abortion lines. To build on these findings, a second trial will be conducted under moderate stress conditions to better assess flower abortion, as the severe stress caused unrealistically high levels of flower loss.
At Texas Tech University, a small greenhouse trial was conducted to study flower dynamics per node in two soybean lines. Significant differences were observed between the nodes (Figure 11). For K17-6388, 27% and 21% of the total flowers were located on the first and second nodes, respectively. In contrast, Williams 82 showed a more even distribution of flowers across nodes 1 to 4, with 13%, 18%, 10%, and 12%, respectively. Next year, this trial will be expanded to field conditions and tested on the high and low flower abortion lines selected from the 2023/2024 seasons. The study will examine flower and pod dynamics per node under contrasting irrigation regimes (80% and 40% ET) and in contrasting genotypes.
Objective 2 – Improve the image-based field phenotyping system and deep-learning tools to document temporal dynamics in flower abortion and pod retention in diverse soybeans grown under field conditions.
All locations acquired an RC car crawler (Figure 12), which operates at a slower speed (40 s per 0.3 m) than the model used last year to improve image quality. As in the previous year, GoPro Hero 11 cameras were mounted, with two to four cameras deployed to capture the entire height of the plants. The platform navigated effectively even when the rows were covered by leaves, and the compact size of the RC car caused minimal disturbance to the plants.
Training of the Machine learning model – flower tracking
In the initial stages of the project, the model was trained to detect flowers as a single class without distinguishing between different stages of the flowers. However, as the project is evolving, it is becoming clear that distinguishing between new flowers and old flowers can be more helpful (Figure 13) to avoid or reduce double counts.
Transitioning from a single-class to a two-class model allows for more nuanced analysis. By classifying flowers into two distinct categories, the model can provide more detailed information on stage-by-stage flower counts, potentially enabling more accurate predictions of total and aborted flowers.
Challenges in Two-Class Prediction: While two-class prediction offers significant advantages, it also presents several challenges. Labeling data for two distinct classes requires precise definitions of these stages and consistent labeling across the dataset. This process is more complex than single-class labeling, as each flower must be accurately categorized. Training the model to differentiate between new and old flowers demands a larger and more diverse dataset. The model must recognize subtle differences in flower appearance, which can be influenced by factors like lighting, angle, and the plant's overall condition. The accuracy of the model's classification is highly dependent on the quality of the training data and the model's robustness. Misclassifications are possible, particularly when the visual cues distinguishing between the two classes are ambiguous.
Relabeling Progress: As part of the transition to two-class prediction, the original videos are being relabeled to reflect the new classification scheme. This relabeling is in progress: each flower instance in the original dataset is reviewed to determine whether it should be classified as new or old, using a combination of manual labeling and automated tools. A quality control protocol has been established to maintain consistency and accuracy. It involves cross-checking labeled data with Dr. Juliana Espíndola (specialist in soybean flower morphology) to ensure that the labels are correct and consistent across the dataset.
Implications and Future Work: The shift to two-class prediction enables more detailed analysis and potentially leads to better counting estimates. However, the success of this approach depends on the accuracy of the model. As we continue to refine the model, ongoing evaluation and adjustment will be necessary. Future work may involve exploring further classes, such as flower clusters, for cases where identifying the number of flowers per node is not feasible.
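To illustrate why a new/old distinction can reduce double counts, here is a minimal sketch. It assumes that each imaging visit yields per-plot counts of "new" and "old" flowers, and that every "old" flower was already counted as "new" on an earlier visit. The dictionary layout and all numbers are hypothetical and do not describe the project's pipeline.

```python
def cumulative_new_flowers(visits):
    """Running total of flowers, summing only the 'new' class per visit.

    Assumption: flowers labeled 'old' were counted as 'new' on an
    earlier visit, so adding them again would double count them.
    """
    total = 0
    running = []
    for counts in visits:
        total += counts["new"]
        running.append(total)
    return running


# Hypothetical counts for one plot imaged on three dates.
visits = [
    {"new": 12, "old": 0},
    {"new": 9, "old": 10},
    {"new": 4, "old": 15},
]

# A single-class model cannot separate the classes, so summing its
# detections over visits counts persistent flowers more than once.
single_class_total = sum(v["new"] + v["old"] for v in visits)
two_class_total = cumulative_new_flowers(visits)[-1]
print(single_class_total, two_class_total)
```

How well this works in practice would depend on how long a flower stays in the "new" class relative to the imaging interval.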
Flower tracking has been tested using several algorithms, including DeepSORT, ByteTrack, OC-SORT, and SORT. Among these, OC-SORT and SORT gave the most promising results in tests with videos from all locations. Notably, SORT achieved a Multi-Object Tracking Accuracy (MOTA) of 0.964 (Table 1) and was therefore chosen for flower tracking in the 2024 videos. However, testing with the 2024 videos has not yet commenced because of the large volume of data being uploaded to our Amazon cloud (AWS) storage. Currently, all participating partners are uploading their videos.
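For readers unfamiliar with the metric, MOTA is commonly defined as 1 - (FN + FP + IDSW) / GT, where the false negatives, false positives, identity switches, and ground-truth objects are summed over all frames. The sketch below applies that definition to hypothetical per-frame counts; it is not the project's evaluation code, and the numbers are not from Table 1.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """Multi-Object Tracking Accuracy from per-frame error counts.

    Each argument is a sequence with one integer per video frame.
    """
    num_gt = sum(ground_truth)
    if num_gt == 0:
        raise ValueError("no ground-truth objects to evaluate against")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / num_gt


# Hypothetical three-frame example: 200 ground-truth flowers, 10 errors.
score = mota(
    false_negatives=[3, 2, 1],
    false_positives=[1, 1, 0],
    id_switches=[1, 0, 1],
    ground_truth=[80, 70, 50],
)
print(round(score, 3))
```

Because identity switches are penalized, MOTA rewards trackers that keep the same ID on the same flower across frames, which is the property that matters for avoiding double counts.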
Training of the Machine learning model – Pods
Pod Detection/Segmentation
Recognizing the intricate shape of soybean pods, we have opted for an instance segmentation method (Figure 14), as opposed to bounding box object detection, for the task of identifying and counting the pods in a frame. The instance segmentation approach enables us to obtain precise segmentation masks of the pods, ensuring a more accurate representation of their complex structures.
We selected images of plants around the R6/R7 stage, which we visually found to be best for counting the pods. Using images from the R6/R7 stage, we did a first round of annotation for approximately 50 images/frames. We marked as “pod” all visible fragments of a pod, without specifically keeping track of fragments that belong to the same occluded pod, when applicable. We trained a Mask R-CNN model on the annotated images and visually assessed its performance. The model was able to detect pods, but it identified individual fragments of occluded pods as distinct “pods”, given how the training images were annotated. As our goal is to count pods in a video, this could lead to an artificially inflated number of pods. To mitigate this issue, we subsequently designed an annotation scheme in which fragments of an occluded pod are annotated together as just one pod. We annotated approximately 300 images/frames with the new annotation scheme and trained another model.
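The effect of the two annotation schemes on a pod count can be sketched as follows. The record layout (a `pod_id` shared by all fragments of one occluded pod, plus a `polygon`) is a hypothetical illustration and not the annotation format actually used in the project.

```python
from collections import defaultdict


def group_fragments(annotations):
    """Group fragment polygons by the pod they belong to.

    Under the first scheme every fragment counts as a pod; under the
    revised scheme all fragments sharing a pod_id form one instance.
    """
    pods = defaultdict(list)
    for ann in annotations:
        pods[ann["pod_id"]].append(ann["polygon"])
    return dict(pods)


# Hypothetical frame: pod 2 is split into two fragments by a leaf.
annotations = [
    {"pod_id": 1, "polygon": [(0, 0), (4, 0), (4, 10), (0, 10)]},
    {"pod_id": 2, "polygon": [(8, 0), (12, 0), (12, 4), (8, 4)]},
    {"pod_id": 2, "polygon": [(8, 7), (12, 7), (12, 12), (8, 12)]},
    {"pod_id": 3, "polygon": [(20, 0), (24, 0), (24, 9), (20, 9)]},
]

fragment_count = len(annotations)            # first scheme
pod_count = len(group_fragments(annotations))  # revised scheme
print(fragment_count, pod_count)
```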
The images in Figure 14 are examples of predictions made by the model trained using annotations performed with the new scheme that takes occlusions into account. These images are used for testing the model and have not been used for training the model.
The revised model can accurately detect many of the actual pods, including some occluded pods. Given the complexity of the task at hand and the relatively small number of images used to train the model, it is expected that the detection can be significantly improved with a larger number of images, especially in the case of clusters of pods and occluded pods (which are both less represented in the dataset compared to the more visible, less crowded pods).
Pod Detection and Tracking
To avoid double-counting pods that appear in multiple frames of a video, we have also worked on pod tracking informed by pod detection. Towards this goal, we trained a base Faster R-CNN architecture and then used the Faster R-CNN detections to track the pods with tracking methods such as SORT, OC-SORT, and ByteTrack. As the tracking model is highly dependent on the detector, we are also training a YOLOv8 model for pod detection, and the better of Faster R-CNN and YOLOv8 will ultimately be used in our detection/tracking system. To facilitate evaluation of the tracking models, we are annotating four videos using a tool called CVAT. Each video has a resolution of 1080x1920 at 15 frames per second. The annotation of these videos is expected to yield enough data for a proper evaluation of the tracking models.
All participating locations are finishing imaging pods at the R6/R7 stage and will upload the videos to the Amazon cloud (AWS). Once all locations have uploaded their flower and pod videos, processing will begin to obtain flower and pod counts.
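The idea of tracking-by-detection for counting can be shown with a deliberately simplified sketch: detections in consecutive frames are linked greedily by bounding-box overlap (IoU), and the number of distinct track IDs is the count. This is a stand-in for illustration only. SORT itself adds Kalman-filter motion prediction and Hungarian assignment, and the threshold and boxes below are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_unique_objects(frames, iou_threshold=0.3):
    """Greedy IoU tracker; returns the number of distinct tracks created.

    frames: list of frames, each a list of detection boxes.
    Tracks that are not matched in a frame are dropped (no memory).
    """
    tracks = {}  # track id -> box from the previous frame
    next_id = 0
    for boxes in frames:
        unmatched = dict(tracks)
        updated = {}
        for box in boxes:
            best_id, best_iou = None, iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                best_id = next_id  # start a new track
                next_id += 1
            else:
                del unmatched[best_id]  # each track matches at most once
            updated[best_id] = box
        tracks = updated
    return next_id


# Hypothetical detections: two pods drift right, a third appears in frame 3.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10), (100, 100, 110, 110)],
]
naive_count = sum(len(boxes) for boxes in frames)
tracked_count = count_unique_objects(frames)
print(naive_count, tracked_count)
```

Summing detections per frame gives six "pods" here, while linking them across frames gives three, which is why the quality of both the detector and the tracker affects the final count.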
Preliminary Results: Two contrasting lines (Figure 15), one with high flower abortion (K17-6388) and one with low flower abortion (IA3023), were tested using the machine learning models for flower and pod counts. The model counts revealed a significant difference between IA3023 and K17-6388, consistent with the manual counts. The flower model also found no significant difference between the two irrigation regimes applied in the Texas Tech experiment for these two lines, in agreement with the manual counts. Likewise, the pod model found no difference between the irrigation regimes, matching the manual counts. This initial field test shows that the models have strong potential for predicting flower abortion. Both models will undergo further improvement, as outlined earlier, to enhance accuracy and precision, and they are being re-trained on this first round of counts before the videos of all 50 lines are processed. Next year, imaging both sides of the row is expected to increase the detection and counting of flowers and pods, addressing current limitations.
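Once flower and pod counts are available for a plot, an abortion percentage can be derived from them. The sketch below assumes the simple definition abortion % = (flowers - pods) / flowers x 100; the report does not state its exact formula, and the function name and counts are hypothetical.

```python
def flower_abortion_percent(total_flowers, total_pods):
    """Percentage of flowers that did not develop into pods.

    Assumption: every retained pod originates from one counted flower,
    so abortion = (flowers - pods) / flowers * 100.
    """
    if total_flowers <= 0:
        raise ValueError("total_flowers must be positive")
    return 100.0 * (total_flowers - total_pods) / total_flowers


# Hypothetical plot-level counts from the flower and pod models.
abortion = flower_abortion_percent(total_flowers=200, total_pods=90)
print(abortion)
```

A negative result would signal that more pods than flowers were counted, which points to a counting error in one of the two models rather than a biological outcome.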
Objective 3 - Identify molecular mechanisms controlling flower abortion under diverse climatic conditions.
At Texas Tech University, to investigate genetic control and variation in flower abortion in soybean, we selected two contrasting accessions, PI567638 and PI506862, based on 2023 and 2024 field data (Figure 16). PI567638 (high abortion; HA) exhibited a high flower abortion rate of up to 70%, while PI506862 (low abortion; LA) showed a significantly lower rate of around 26%. Flower tissues at different developmental stages (buds, partially open flowers, fully open flowers, and post-anthesis flowers) were collected from both accessions under field conditions for RNA sequencing. Principal component analysis (PCA) of four replicates revealed a high degree of concordance between samples. Notably, the analysis showed distinct clusters, with flower buds and post-anthesis flowers grouping together, while partially open and fully open flowers formed a separate cluster, highlighting stage-specific transcriptomic profiles. The analysis identified 1,223 differentially expressed genes (DEGs) in buds, 1,220 DEGs in closed petals, 1,140 DEGs in open flowers, and 4,292 DEGs in dry flowers between the two genotypes. Genes associated with floral development (Figure 17) were predominantly upregulated in the low-abortion genotype, while genes negatively regulating floral development were highly expressed in the high-abortion genotype. Key genes regulating flower development and abortion include FLOWERING LOCUS C (FLC), MADS AFFECTING FLOWERING (MAF), TERMINAL FLOWER1 (TFL1), CDK-regulating FLOWERING LOCUS M, DAD1, AIPP3 (associated with flowering inhibition), ASP1, GI, AHL20, FLAVIN-BINDING, COL2, CONSTANS (CO), PRR5, BSMT1, AGAMOUS-Like 15 (AGL15), AGL20, and GA20OX2.
Furthermore, cluster analysis of DEGs identified five major clusters (Figure 18). Comparing LA_Buds with HA_Buds, we identified significant upregulation in cluster C3, in which genes associated with pectin catabolism were significantly upregulated. Pectin is an essential biomolecule that acts as a structural component (glue) for cell adhesion. However, further exploration and validation of these RNA-Seq findings are required to understand the process of flower abortion in soybean.
Key Findings
1. Genetic Diversity in Flower Abortion
• Fifty soybean genotypes were studied under different irrigation regimes across four locations: Texas, Kansas, Missouri, and Tennessee.
• Genotypes such as PI556511, LG05-4464, and CL0J173-6-8 consistently showed lower flower abortion rates across multiple environments.
• Drought tolerance mechanisms, possibly through enhanced root growth, were observed in some lines, such as PI534648 and PI533654, enabling better flower retention under water stress.
2. Flowering Dynamics and Stress Response
• Greenhouse experiments revealed significant variation in flowering patterns between high and low flower abortion genotypes.
• Low flower abortion lines, such as PI506862, maintained higher flower counts even under severe drought stress.
• Node-specific flowering studies indicated that flower distribution patterns differ significantly between genotypes, with implications for breeding strategies.
3. Machine Learning for Field Phenotyping
• The machine learning model was further improved to track flowers and pods, transitioning from single-class to two-class predictions to distinguish between new and old flowers.
• The SORT algorithm achieved the highest flower-tracking accuracy (MOTA 0.964; Table 1) and was selected for flower tracking in the 2024 videos.
• Initial results demonstrated alignment between manual and machine learning-based flower counts, validating the model’s potential for large-scale applications.
4. Pod Detection and Segmentation
• Advanced segmentation techniques, such as Mask R-CNN, were applied to accurately detect soybean pods, even under occlusions.
• Pod tracking models are being refined to reduce errors and improve counting accuracy, with promising preliminary results.
5. Molecular Insights into Flower Abortion
• RNA sequencing of contrasting genotypes (PI567638 and PI506862) identified between 1,140 and 4,292 differentially expressed genes (DEGs) per flower stage, with the largest number in dry flowers.
• Genes associated with floral development were predominantly upregulated in the low-abortion genotype, while negative regulators of floral development were highly expressed in the high-abortion genotype; key candidate genes include FLC and AGL15.
• Cluster analysis revealed significant upregulation of genes involved in pectin catabolism, suggesting potential molecular pathways influencing flower abortion.
6. Environmental and Agronomic Observations
• Irrigation mitigated heat stress impacts in Texas, maintaining yields despite temperatures reaching 100°F.
• Genotypes with lower lodging scores and higher seed weights were identified as suitable candidates for breeding.
• Variations in plant height across locations highlighted the influence of environmental factors on growth without significantly affecting yields.
• Based on 2023 findings, flower abortion and yield did not always exhibit a direct correlation. Some genotypes with high flower abortion still maintained yield levels comparable to low-abortion genotypes, likely due to allocating more energy to flowering. Comparing 2023 and 2024 results will enhance our understanding of these patterns, aiding in the selection of genotypes with consistent low flower abortion and high yield potential for the 2025 trials.
Future Direction
The integration of genetic, phenotypic, and machine learning tools has provided significant insights into selecting soybean genotypes with low flower abortion. Future work will focus on:
• Enhancing the machine learning models for flower and pod counts so that abortion rates can be extracted, improving platform video imaging, and creating a breeder-friendly program for soybean flower and pod counts.
• Validating candidate genes through gene editing to understand the mechanism of flower abortion in contrasting (high and low abortion) genotypes.
• Creating new biparental mapping populations to map the dynamics of flower abortion in soybean.