Project Details: Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean (2025)

Contributor/Checkoff:

North Central Soybean Research Program

Category:

Sustainable Production

Keywords:

Macronutritional bundle

Parent Project:

Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean

Lead Principal Investigator:

Krishna Jagadish, Texas Tech University

Co-Principal Investigators:

Doina Caragea, Kansas State University
William Schapaugh, Kansas State University
Juliana Espindola, Texas Tech University
Gunvant Patil, Texas Tech University
Glen Ritchie, Texas Tech University
Hamed Sari-Sarraf, Texas Tech University
Impa Somayanda, Texas Tech University
Christopher Turner, Texas Tech University
Henry Nguyen, University of Missouri
Avat Shekoofa, University of Tennessee-Institute of Agriculture

Project Code:

60065

Contributing Organization (Checkoff):

North Central Soybean Research Program

$400,156

Institution Funded:

Texas Tech University

$400,156

Final Report

Brief Project Summary:

The balance between flower production and abortion is the key determining factor that dictates the final pod number and yield in soybeans. Although soybean plants can produce an enormous number of flowers, 25-35% of flowers are aborted under favorable conditions and up to 80% under drought and heat stress conditions (Kokubun 2011). Large variation exists for flower number among US soybean cultivars (Hansen & Shibles, 1978), with flower abortion ranging from 20 to 80% for midwestern US cultivars and 37 to 61% for early maturing cultivars (McBlain and Hume, 1981). We are refining a novel and robust image-based phenotyping system to capture the genetic variation in flower abortion in a diverse set of entries including adapted cultivars to help us develop advanced breeding lines with lower flower and pod abortion.

Unique Keywords:
#farmers, #geneticists, #physiologists, #public and private soybean improvement groups

Information And Results

Project Summary

A 30 to 80% flower drop in soybeans grown across different regions in the US is an unresolved and persistent bottleneck that has limited soybean's ability to achieve its full genetic yield potential. The major challenge has been the lack of robust, field-based, high-throughput phenotyping and analysis tools to capture variation in flower abortion and pod retention across genetically diverse germplasm. The multi-regional (KS, MO, TN and TX) and trans-disciplinary team has developed a novel image-based field phenotyping system, integrated with deep-learning approaches, to capture large genetic variation in flower abortion and pod retention under different climatic conditions. Currently, the field-based phenotyping tool is being improved so that it can phenotype with minimal human intervention and be used easily by researchers without expertise in machine learning. Wide genetic diversity in flower abortion has been captured over the last two seasons, facilitating the development of Recombinant Inbred Lines for identifying genomic regions and the use of contrasting lines for transcriptomic analysis and functional validation of novel hub genes controlling flower abortion. This fundamental knowledge will help discover molecular switches to enhance flower and pod retention and thereby enhance yield potential under diverse environmental conditions. The proposed project will address Tools and Technology for Soybean Improvement and use these to build Extreme Weather Resiliency. In summary, an increase in flower and pod retention of 20 to 30%, with the potential to enhance yields by 10 to 15%, would ultimately translate to an additional $6.07 billion in revenue across the U.S. soybean industry at the current market price.

Project Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives for Year 3:
• A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
• Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
• Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

By leveraging the insights from Years 1 and 2, we will optimize high-throughput phenotyping and machine learning tools, initiate breeding populations with enhanced flower and pod retention, and identify key hub genes to understand gene regulation driving flower retention and overall productivity.

Project Deliverables

• Range of genetic variation in flower abortion and pod retention in soybean grown under different environmental conditions captured using a novel high-throughput machine learning tool.
• Machine learning tool to capture flower abortion with minimal manual effort developed.
• RIL population developed to capture genomic regions controlling flower abortion in soybeans.
• Candidate genes identified and functionally validated for lower flower abortion.

The high-throughput phenotyping, image capture and analysis and molecular mechanisms associated with this dynamic and complex process will serve as foundational knowledge towards addressing the goal.

Progress Of Work

Updated July 12, 2025:
Report (January 1 to June 30, 2025)

Project funded by Multiregional Soybean Checkoff Program and the United Soybean Board.

Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean.

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee.

Goals & Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives (Year 3)

• A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
• Investigate physiological effects of drought stress on contrasting lines in both controlled and field environments to assess tolerance to adverse conditions (new activity).
• Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
• Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

Note – For all graphs and images kindly refer to the PDF version.

Objective 1 - A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.

Genotypes with high (CL0J17-3-6-8 and PI567638) and low (IA3023 and PI506862) flower abortion were selected from the 2023/2024 trials, plus two cultivars as checks, for the 2025 trials at all locations (Figure 1). Planting took place on the following dates:
- Texas Tech University: June 2nd
- Kansas State University: June 9th
- University of Tennessee: June 3rd
- University of Missouri: June 24th (delayed by rain events)

All locations are preparing QR codes for plot identification. Video imaging pre-tests of camera settings and position will start before flowering, to establish the method and improve imaging so that we develop a robust tool that operates with minimal human involvement.

Model development

Texas Tech University – Flower count

For flower count annotations, soybean flowers were categorized into two classes (new and old), as shown in Figure 2. The total dataset used for training, validation, and testing is presented in Table 1. The flower detection model uses the Faster R-CNN architecture, chosen for its established capacity for agricultural object detection. Its performance was evaluated on a held-out test set comprising 352 images with 14,299 annotated flowers (7,397 old and 6,902 new). Standard object detection metrics (precision, recall, F1-score, and Average Precision (AP)) were used to assess model accuracy. Computed across the two flower classes, the network achieved a precision of 89.6%, a recall of 86.5%, and a resulting F1-score of 88.0%. These metrics indicate that the model balances accuracy and sensitivity well, even under the uncontrolled, high-variance conditions seen in the field. The detector also achieved an AP of 86% at an IoU threshold of 0.3 (AP30) on the held-out test split, demonstrating reliable detection of soybean flowers under a wide variety of field conditions.
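As a quick consistency check on the metrics reported above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall and confirms that the per-class annotation counts sum to the stated total. The function and variable names are illustrative only and are not taken from the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the Faster R-CNN flower detector on the held-out test set.
reported_precision = 0.896
reported_recall = 0.865
f1 = f1_score(reported_precision, reported_recall)

# Annotated flowers in the test set, by class, as reported.
old_flowers = 7_397
new_flowers = 6_902
total_flowers = old_flowers + new_flowers

print(f"F1 = {f1:.3f}")
print(f"Annotated flowers in test set = {total_flowers}")
```

The recomputed value rounds to the 88.0% stated in the report.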

To count flowers by tracking, we evaluated several popular general-purpose tracking algorithms (SORT, ByteTrack, OC-SORT, and DeepSORT) that have performed well on the MOTChallenge benchmark (Leal-Taixé et al., 2015). The results shown in Table 3 and Figure 3 reveal an intriguing performance pattern among the algorithms evaluated. Although OC-SORT and ByteTrack achieved slightly better final flower count accuracy in terms of RMSE, SORT consistently outperformed both in terms of MOTA. Given that MOTA is a composite metric that accounts for false negatives, false positives, and ID switches, SORT’s superior MOTA indicates higher overall reliability in tracking flower identities across frames, even though MOTChallenge benchmarks typically favor OC-SORT and ByteTrack for their advanced occlusion handling. Our findings suggest that for flower counting in agricultural fields, where camera movement is relatively constant and objects move predictably, the added complexity of algorithms designed for non-linear motion is unnecessary. Instead, SORT’s straightforward motion assumptions align well with these conditions, making it a more reliable choice.
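For readers less familiar with the two metrics compared above, the following minimal sketch shows the standard definitions: MOTA combines false negatives, false positives, and identity switches relative to the number of ground-truth objects, while RMSE summarizes the error in final counts. All numbers in the example are hypothetical placeholders, not values from Table 3.

```python
import math


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all terms summed over frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def rmse(predicted_counts, manual_counts) -> float:
    """Root mean squared error between model counts and manual counts."""
    if len(predicted_counts) != len(manual_counts) or not predicted_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted_counts, manual_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tracking errors for one video (illustration only).
example_mota = mota(false_negatives=120, false_positives=60,
                    id_switches=20, num_ground_truth=1000)

# Hypothetical per-plot flower counts (illustration only).
example_rmse = rmse([10, 12, 9], [11, 12, 7])

print(f"MOTA = {example_mota:.2f}, RMSE = {example_rmse:.2f}")
```

Because MOTA penalizes identity switches while RMSE looks only at final totals, a tracker can rank differently on the two metrics.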

Temporal Dynamics of Model Flower Count

The model successfully captured fluctuations in both new and old flower counts, providing detailed insights into flowering dynamics throughout the reproductive period for test genotypes with high (LG02-9050), intermediate (PI556511), and low (IA3023) manual flower counts. It captured the onset of flowering, peak activity, and cessation. For IA3023 and PI556511 (Figure 4), flower counts on August 2nd were higher than those of LG02-9050, indicating slower flowering initiation in LG02-9050. The peak in flower production was more sharply defined for IA3023 and LG02-9050 on August 12th, whereas PI556511 exhibited a broader peak spanning August 9th and 12th. The highest number of new flowers occurred on August 6th for IA3023, on both August 6th and 9th for PI556511, and on August 9th for LG02-9050. Notably, prior to the total flower count, LG02-9050 consistently exhibited higher counts of both new and old flowers compared to the other two genotypes, indicating its superior flower production capacity. This trend highlights the genotype’s greater plasticity to flower during the season.
Tracking old flower counts proved valuable in identifying the transition toward flowering cessation, as the number of old flowers began to exceed new flower counts. For all genotypes, flowering slowed down after August 20th, indicating a synchronized end of flowering. This information is important for assessing how long each genotype remains in the reproductive phase and whether this duration is influenced by environmental conditions. These temporal flowering patterns can be influenced by planting date, environmental stress, photoperiod, and pest or disease pressure, and can ultimately impact yield.

Regarding total flower counts (Figure 5), the model successfully distinguished three distinct levels of flowering across genotypes for both new and old flower counts, providing dual validation of genotype-specific flowering performance in the field. However, before new and old flower data can be used to predict flower abortion in soybeans, enhancements are needed to ensure that new flower counts consistently exceed old flower counts, as would be expected biologically. In this study, new flower counts for genotypes IA3023 and PI556511 were lower than old flower counts, which may indicate limitations due to occlusion or insufficient field of view, given that only a single camera was used for this analysis. The camera was positioned to capture the middle section of the plant, which included a substantial portion of the canopy, but it may have missed flowers in the upper and lower regions because genotypes differ in height and leaf occlusion. To mitigate occlusion and improve plant coverage, a multi-camera setup will be implemented in the 2025 trials. Enhancing the imaging platform would enable the model to detect a more comprehensive set of flowers, thereby increasing counting accuracy and improving the reliability of flower abortion predictions.

While the current model effectively distinguishes between two classes of flowers, our future work will extend this classification framework to include a third class for small pods. Our preliminary results testing how small pods could improve future predictions of flower abortion are shown in Figure 6. The same genotypes studied with the two-class model were quantified for new flowers, old flowers, and small pods. It is possible to track the transition from new flowers to old flowers and then to pod formation at each time point (Figure 6). This developmental sequence provides critical information for predicting the dynamics of flower abortion in soybeans under varying environmental conditions and may help identify atypical responses triggered by stress events. A paper detailing our two-class model findings has been written and is in the final stages of review for submission. A second paper, focusing on the three-class model and occlusion quantification, is currently being developed.

Kansas State University – Pod count

For pod count model development, we have further fine-tuned our prior model for pod segmentation and tracking, while also laying the groundwork for numeric model evaluation. To achieve this, we manually annotated 13 videos from 3 locations using the CVAT tool. Each annotated video has a resolution of 608x1080 at 24 frames per second. The set of annotated videos (Table 4) is an important resource for the project, as it supports model training and fine-tuning as well as model evaluation. To the best of our knowledge, this dataset is the largest of its kind in the literature, and it is diverse in terms of location, genotype, irrigation regime, pod stage, and video quality, among other factors, making it well suited to the task at hand. This diversity can be clearly seen in the last two columns of the table, which show the number of pods manually counted in the field and the number of pods manually counted in videos (by annotating and tracking pods manually using the CVAT tool).

The last two columns in the table also show that there is a substantial difference between the two sets of counts. This can be attributed to the fact that only one side of the plant is captured in the videos, so many of the pods counted in the field are occluded. To account for this, the plants will be imaged on both sides during this year of the project. We used the annotated videos to fine-tune our prior YOLOv8 model and to evaluate its results. Numeric evaluation shows good results on the standard variants of average precision used as metrics for object detection/segmentation: AP 59.05%, AP50 74.08%, and AP75 63.05%. After the model was trained/fine-tuned for the object segmentation task, we used ByteSORT to track the pods across frames. The linked video (Soybean Pod Tracking) demonstrates how the tracking approach performs. Specifically, the model detects 1033 pods, while the manual count based on the corresponding video (IA3023 2nd replicate irrigated) is 963. The difference can be explained by the fact that the annotators were asked to count only the pods belonging to the row nearest the camera, while the model also detects and counts some pods that were further back. Figure 7 illustrates the tracking capabilities of our model with a sequence of three frames taken five frames apart in the video. The pods are numbered, and each pod keeps the same number across the frames. Pods that were not detected in one frame were detected in the next frame (this behavior is due to how the parameters of the tracking algorithm are set).
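To put the gap between the tracked count and the manual video count in perspective, the arithmetic can be sketched as follows. The two counts come from the paragraph above; the helper name is hypothetical.

```python
def relative_deviation(model_count: int, manual_count: int) -> float:
    """Signed deviation of the model count from the manual count, as a fraction."""
    if manual_count <= 0:
        raise ValueError("manual_count must be positive")
    return (model_count - manual_count) / manual_count


# Counts reported for the IA3023 2nd replicate irrigated video.
model_pods = 1033
manual_pods = 963

difference = model_pods - manual_pods
deviation = relative_deviation(model_pods, manual_pods)

print(f"Absolute difference: {difference} pods")
print(f"Relative deviation: {deviation:+.1%}")
```

A positive deviation here is consistent with the explanation given in the text, namely that the model also counts some pods behind the front row.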

Currently, we are working on a paper that will document the data collected during the 2024 harvest season, the annotated dataset that we assembled, and the model we trained and evaluated using the dataset. We have all the tools in place to count soybean pods from field videos, and we believe that improving how pods are captured in the videos (by imaging multiple sides and using multiple cameras) will further improve the results, taking us closer to the actual pod counts in the field.

Objective 2 - Investigate the physiological effects of drought stress on contrasting lines for flower abortion under controlled conditions.

This study aimed to assess flower abortion among six soybean lines (PI506862, PI567638, IA3023, CL0J17-3-6-8, PI548318, and PI80837) under progressive water-deficit stress (dry-down phase) and subsequent re-watering (recovery phase) in a greenhouse at the West Tennessee Research and Education Center, University of Tennessee (Figure 8). On April 10, 2025, seeds were sown at a 2 cm depth in a 1:1 mixture of sand and Lexington silt loam, then thinned to one plant per pot at 13 days after planting (DAP). Fertilizers were applied at 12 DAP (0.075% v/v liquid, 0–10–10) and 24 DAP (0.06% w/v water-soluble, 24–8–16). Plants were maintained under a 14-hour light/10-hour dark cycle and received 200 mL of water daily during the pre-treatment phase. At 28 DAP, when plants had 4–5 trifoliate leaves, the dry-down (DD) phase began: pots were saturated, drained to pot capacity, enclosed in 15-L plastic bags (Fig. 1) to eliminate evaporation, and fitted with watering tubes for controlled watering and for monitoring plant transpiration (water loss). Based on daily transpiration rate (TR), four pots per genotype were designated as well-watered (WW) controls and six as DD treatments. The DD plants were watered only if TR exceeded 80 g/day, following Shekoofa et al. (2013). Stress progression was monitored using normalized transpiration rate (NTR), with NTR < 0.10 indicating the endpoint of available soil water. Recovery took place at 36–41 DAP, when DD pots were re-watered with 350 mL to full capacity. Data collected daily during the stress and recovery phases included number of flowers, flower rate per day, and wilting score (0–5). The number of nodes and pods was recorded at the end of the dry-down phase and again at experiment termination (June 18, 2025). Data are being processed; graphs and outcomes will be incorporated in the final report.
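The NTR endpoint rule described above can be illustrated with a small sketch. It assumes a commonly used two-step normalization (each dry-down plant's daily transpiration divided by the mean of the well-watered controls, then scaled by that plant's average ratio over the first few days while the soil is still wet). The report does not spell out the exact calculation, so the baseline window, function names, and all data values below are assumptions for illustration only.

```python
from statistics import mean


def normalized_transpiration(dd_daily, ww_mean_daily, n_baseline_days=3):
    """Return daily NTR values for one dry-down (DD) plant.

    dd_daily: daily transpiration (g/day) of the DD plant.
    ww_mean_daily: mean daily transpiration (g/day) of well-watered controls.
    n_baseline_days: assumed number of initial days used for scaling.
    """
    if len(dd_daily) != len(ww_mean_daily):
        raise ValueError("series must have the same length")
    if len(dd_daily) < n_baseline_days:
        raise ValueError("not enough days for the baseline window")
    ratios = [dd / ww for dd, ww in zip(dd_daily, ww_mean_daily)]
    baseline = mean(ratios[:n_baseline_days])
    return [ratio / baseline for ratio in ratios]


def endpoint_day(ntr_values, threshold=0.10):
    """Index of the first day with NTR below the threshold, or None."""
    for day, value in enumerate(ntr_values):
        if value < threshold:
            return day
    return None


# Hypothetical transpiration series (g/day), for illustration only.
dd_plant = [100, 98, 102, 80, 50, 20, 8]
ww_controls = [100, 100, 100, 100, 100, 100, 100]

ntr = normalized_transpiration(dd_plant, ww_controls)
end_day = endpoint_day(ntr)
print(f"Endpoint reached on day index {end_day}")
```

With the threshold at 0.10, as in the report, the plant in this hypothetical series would be moved to the recovery phase once its NTR first drops below that value.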

Objective 3 - Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
The greenhouse experiment (Figure 9) was planted on April 22nd, with plants grown in pots arranged in four rows, each approximately 4 feet in length and containing 8 pots. Video recordings began at the R1 growth stage. Manual counting of old and new flowers is being conducted within a 2-foot stretch in each row, in parallel with video recording, twice a week. The videos are captured using two GoPro cameras mounted on a hand-held PVC pipe frame. Student employees are being trained on the greenhouse experiment protocols prior to the commencement of the main field experiment. We are continuing the generation advancement of crosses between putative soybean genotypes with contrasting levels of flower abortion. F2 seed is currently being harvested in the greenhouse. The seed will be inventoried, and a selected subset will be sent to a winter nursery in Puerto Rico for further advancement, with the goal of developing F4 or F5-derived lines for field evaluation in the summer of 2026.

Objective 4 - Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

To investigate the molecular basis of flower abortion in soybeans, we performed a comparative transcriptomic analysis of high-abortion (HA) and low-abortion (LA) genotypes across four floral developmental stages: bud, close petals, open flower, and dry flower. Raw RNA-seq reads were quality-checked, trimmed, and aligned to the soybean reference genome. Gene-level counts were generated, followed by normalization and differential expression analysis using DESeq2. The principal component analysis (PCA) (Figure 10) revealed a clear separation of samples according to genotype and developmental stage, with PC1 accounting for 64% of the variance and effectively distinguishing HA and LA groups.

Furthermore, bar plot analysis of differentially expressed genes (DEGs) (Figure 11) showed comparable numbers of up- and down-regulated genes across stages in both genotypes, with subtle variations in magnitude reflecting the complexity of transcriptional responses associated with flower abortion. A comprehensive heatmap of DEGs (Figure 12) highlighted distinct gene expression clusters (C1–C6) associated with specific floral stages and abortion phenotypes, indicating stage-specific transcriptional reprogramming. We are currently advancing functional enrichment and network analyses to identify key candidate genes and pathways driving these contrasting phenotypes.


Final Project Results

Updated December 22, 2025:
Final report (January 2025 to December 2025)

Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean.

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee.

Goals & Objectives
Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives (Year 3)
• A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
• Investigate physiological effects of drought stress on contrasting lines in both controlled and field environments to assess tolerance to adverse conditions (new activity).
• Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
• Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

Note 1 – All pictures, graphs and tables are included in the PDF version of the final report.

Note 2 – Additional details related to the machine learning tools, greenhouse trial and transcriptomics are included in the attached PDF due to space limitations.

Objective 1 - A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
Genotypes with high (CL0J17-3-6-8 and PI567638) and low (IA3023 and PI506862) flower abortion were selected from the 2023/2024 trials, plus two cultivars as checks (CTVA38, CTVA40), for the 2025 trials at all locations. Planting took place on the following dates:
- Texas Tech University: June 2nd
- Kansas State University: June 9th
- University of Tennessee: June 3rd
- University of Missouri: June 24th (delayed by rain events)
All locations labeled plots using QR codes for identification (Figure 1). Video imaging was performed on both sides of the row, in the two middle rows of each four-row plot. Texas Tech University applied an irrigation treatment in the field to simulate well-watered and drought conditions during flowering (80% and 30% evapotranspiration). Experiments at the other locations were rainfed only.
Flower counts were done every three days, marking new and old flowers (Figure 2) with an oil-paint permanent marker. All plants in 2 feet of one middle row were used in each plot for the manual flower count.
Videos were taken at all locations using four GoPro 11 cameras (Figure 3) positioned at different angles, on the same day as the flower and pod counts. The videos are being transferred to a shared folder for further analysis in 2026.
Experimental design:
• Texas Tech University:
6 genotypes x 2 irrigation regimes (80% ET and 30% ET), three replicates.
• Kansas State University, University of Tennessee, and University of Missouri:
6 genotypes, three replicates of four rows, with each row of 8 feet length.
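The plot counts implied by this design can be enumerated with a short sketch. The genotype names and treatment levels come from the report; the site codes and the label format (which could serve as a QR code payload) are hypothetical and only illustrate one way plots might be identified.

```python
from itertools import product

# Entries listed in the report: four selected genotypes plus two check cultivars.
GENOTYPES = ["CL0J17-3-6-8", "PI567638", "IA3023", "PI506862", "CTVA38", "CTVA40"]
IRRIGATION = ["80ET", "30ET"]  # Texas Tech University only
REPLICATES = [1, 2, 3]


def plot_labels(site, genotypes, replicates, treatments=("rainfed",)):
    """Build one label per plot; the 'site|genotype|treatment|rep' format is hypothetical."""
    return [
        f"{site}|{genotype}|{treatment}|R{rep}"
        for genotype, treatment, rep in product(genotypes, treatments, replicates)
    ]


ttu_plots = plot_labels("TTU", GENOTYPES, REPLICATES, IRRIGATION)
ksu_plots = plot_labels("KSU", GENOTYPES, REPLICATES)

print(f"Texas Tech plots: {len(ttu_plots)}")
print(f"Kansas State plots: {len(ksu_plots)}")
```

Under this design, the irrigated site has twice as many plots as each rainfed site.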

Flower count model - Texas Tech University
Model performance across locations
Our current detector for soybean flowers is built on Faster R-CNN in Detectron2 with a ResNet-50 backbone and a feature pyramid network. The model was originally trained on annotated frames from a single field site and season, i.e., the late-summer soybean season of 2024 at Kansas State University. On images from the Kansas site, the model performs reasonably well, with a recall of about 87 percent and a precision of about 90 percent. The high precision shows that, in this source domain, the model produces very few false positives.
When the same model is deployed on videos from other locations, performance drops noticeably. This is not surprising, but it is useful to separate the two distinct failure modes. In our tests on new sites, recall was not affected very much. Missed flowers are not the dominant problem. The detector usually picks up most visible flowers, even at new locations.
The major issue is false positives as we move away from the training domain. The detector marks spots and holes on leaves, stems, bright soil patches, weeds, and various background textures as flowers. Soil color, plant spacing, weed pressure, and growth stage all vary from site to site. Sometimes there is more visible sky or more bare soil. All of these create new backgrounds that the model has never seen. Differences in cameras and camera positioning also matter. Different devices have different color responses, compression, and sharpening. Lens focal length and distortion change the apparent shape and size of flowers and leaves. The mounting height and tilt of the camera change the viewing angle. Hence, the detector has to recognize flowers from slightly different perspectives, often with more occlusion than at the original site.
What was done when accuracy dropped at a new location
a. Hard negative mining
Because the errors in new locations are dominated by false positives, hard negative mining is often the most practical way to improve behavior at new sites.
In our setting, a hard negative is a region of the image that the current model believes is a flower with reasonably high confidence, but that is actually background. These regions are very informative because they sit exactly at the boundary where the model is confused (for additional details on procedure followed see the PDF version of the report).
b. Domain adaptation
In parallel with this hard negative mining and clustering work, we have also explored more formal domain adaptation. In particular, we tested a scale-aware, domain-adaptive variant of Faster R-CNN with separate domain classifiers at the image and instance level and additional losses for scale-specific alignment.
These experiments are useful and remain a candidate for future development, especially if we want a single model that works across many sites with minimal manual work. So far, however, they have not given clear gains in this soybean project (for additional details see the longer PDF version).
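The idea of a hard negative described under item a can be sketched as a simple filter: keep detections that the model scores highly but that do not overlap any annotated flower. This is a minimal illustration only; the actual procedure is described in the PDF version of the report, and the score threshold, IoU threshold, box format, and function names here are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mine_hard_negatives(detections, gt_boxes, score_thresh=0.7, iou_thresh=0.3):
    """Return confident detections that match no ground-truth flower box.

    detections: list of (box, score) pairs from the current model.
    gt_boxes: list of annotated flower boxes for the same image.
    Both thresholds are assumed values for illustration.
    """
    hard_negatives = []
    for box, score in detections:
        if score < score_thresh:
            continue  # low-confidence detections are not "hard"
        if all(iou(box, gt) < iou_thresh for gt in gt_boxes):
            hard_negatives.append((box, score))
    return hard_negatives


# Hypothetical image: one annotated flower and three model detections.
gt = [(10, 10, 30, 30)]
dets = [
    ((12, 12, 31, 31), 0.95),      # overlaps the annotated flower
    ((100, 100, 120, 120), 0.88),  # confident, but on background
    ((200, 200, 215, 215), 0.40),  # low confidence
]
hard = mine_hard_negatives(dets, gt)
print(hard)
```

Regions selected this way could then be added to the training set as explicit background examples from the new site.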

Pod count model - Kansas State University
Model development for pod detection and tracking using the 2024 imagery data
Computer vision and deep learning approaches have proven effective in controlled environments, but their performance substantially declines under real-world field conditions due to occlusion, cluttered backgrounds, and lighting variation. Existing soybean pod counting research focuses mainly on static images, lacking support for temporal analysis in videos.
To bridge these gaps, in this project, we developed SoybeanPod-MOTS, the first multi-object tracking and segmentation (MOTS) dataset for soybean pods captured directly in field conditions across multiple locations, genotypes, and reproductive stages. The dataset supports pod detection, segmentation, identity-level tracking, and non-destructive pod counting.
Using the SoybeanPod-MOTS dataset, we evaluated several modern, state-of-the-art object detection models and tracking-by-detection algorithms to identify and track soybean pods across diverse field conditions. The primary goal was to determine which combinations of detectors and trackers produce the most reliable counts and identity associations in videos that vary widely in lighting, background contrast, pod color, and crop stress.
Across all evaluation metrics, YOLOv8 consistently outperformed YOLOv9 and YOLOv11. It delivered the highest accuracy in both detection and segmentation tasks. YOLOv8 achieved an AP@50 of 79.30 and a mean Average Precision (mAP) of 69.50 for detection. In the segmentation task, which is more challenging due to pod size variation and cluttered backgrounds, YOLOv8 recorded an AP@50 of 54.50. These results reflect YOLOv8’s balanced capacity for feature extraction, generalization across genotypes, and robustness in videos with variability in pod color and environmental conditions. Although YOLOv9 and YOLOv11 demonstrated competitive performance in some scenarios, their overall accuracy was more sensitive to moisture stress, lighting inconsistencies, and background interference.
In terms of tracking, five tracking-by-detection algorithms were tested: BoostTrack, BoT-SORT, StrongSORT, ByteTrack, and OC-SORT. These trackers represent diverse approaches to appearance modeling, motion prediction, and detection association. The results of these tracking approaches in terms of standard tracking metrics are shown in Table 1. The up and down arrows indicate whether higher or lower values are better.
The best overall tracking performance was achieved by StrongSORT, which produced the highest identity-focused metrics: IDF1 = 80.85, MOTA = 77.61, and HOTA = 65.65. StrongSORT integrates appearance embeddings with robust motion modeling, allowing it to maintain pod identities effectively even under occlusions, crossing trajectories, and rapid camera movement. This makes it particularly suitable for field conditions where pods frequently overlap or sway due to wind.
In terms of precision, ByteTrack was the strongest performer, achieving an IDP of 86.93, along with competitive MOTA and IDF1 values. Its simple yet effective association strategy excels when detections are clean and consistent. ByteTrack offers lightweight computation with strong precision, making it well-suited for large-scale processing where speed is essential.
OC-SORT delivered balanced results across identity, precision, and robustness metrics. Its motion-centric updates help maintain performance when pod movement is irregular or when appearance cues are less reliable due to pod color similarity.
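As a reminder of what the tracking scores quoted above measure, the sketch below computes MOTA and the identity metrics (IDP, IDR, IDF1) from their standard definitions. It is illustrative only: the function names are hypothetical and no project data is used. HOTA is omitted because it requires averaging over localization thresholds.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def id_scores(idtp, idfp, idfn):
    """Identity precision, recall, and F1 from identity-level matches.

    idtp, idfp, idfn are the identity true positives, false positives,
    and false negatives after the optimal track-to-ground-truth assignment.
    """
    idp = idtp / (idtp + idfp)
    idr = idtp / (idtp + idfn)
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)
    return idp, idr, idf1
```

A tracker can score high on IDP while scoring lower on IDF1 if it rarely assigns a wrong identity but misses part of the ground-truth tracks, which is one way to read ByteTrack's precision advantage alongside StrongSORT's higher IDF1.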
In summary, YOLOv8 paired with StrongSORT provided the most reliable combination for pod detection and tracking across varied field environments. YOLOv8 offered the strongest detection foundation, while StrongSORT provided stable identity tracking under challenging conditions. However, ByteTrack remained a strong alternative when computational efficiency and high precision are priorities.

Preliminary results of the current models using 2025 imagery data
Using the best models trained on the 2024 data, we carried out a preliminary analysis of the 2025 imagery. Figure 4 summarizes the 2025 video dataset by camera (note: four cameras were used in 2025).
We tracked the videos captured with the four cameras independently. The following graph shows a boxplot summary of the number of pods tracked in the video collection, by camera. As shown, cameras 1 and 3 generally capture more pods than cameras 2 and 4. Counts will be aggregated per plant in the near future, as described in more detail below, and the resulting counts will be compared with ground-truth counts per plant collected manually in the field.
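The per-camera summary described above can be sketched as follows, assuming the tracker output is available as rows of (camera, video, track ID). That row layout and the function names are hypothetical; the pod count for a video is taken to be the number of distinct track IDs, and the median stands in for the full boxplot statistics.

```python
from collections import defaultdict
from statistics import median


def pods_per_video(track_rows):
    """Count distinct track IDs for each (camera, video) pair."""
    ids = defaultdict(set)
    for camera, video, track_id in track_rows:
        ids[(camera, video)].add(track_id)
    return {key: len(track_ids) for key, track_ids in ids.items()}


def median_pods_by_camera(track_rows):
    """Median number of tracked pods per video, grouped by camera."""
    per_camera = defaultdict(list)
    for (camera, _video), count in pods_per_video(track_rows).items():
        per_camera[camera].append(count)
    return {camera: median(counts) for camera, counts in per_camera.items()}
```

Because each camera is tracked independently, the same pod seen by two cameras is counted twice at this stage, which is why per-plant aggregation is still needed.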

Limitations and Planned Improvements for 2026
A key limitation of the current phenotyping system is the restricted use of video data. In 2024, the data collection platforms recorded four simultaneous camera views, but the present models rely on only one of these views, and imagery was captured from only one side of each row. Consequently, the dataset used for model training and evaluation represents only a fraction of the true pod distribution in the field, leading to incomplete total pod counts.
To overcome this limitation, we plan to incorporate all four camera views into the processing pipeline, enabling reconstruction of a more complete view of each plant side. This multi-view integration is expected to reduce occlusions and improve the accuracy of pod detection and tracking.
Furthermore, the 2025 data collection efforts included imaging from both sides of the plants, providing substantially richer coverage. By merging the 2024 single-side videos with the 2025 dual-side dataset, we aim to enhance the robustness and completeness of pod count estimation for the 2026 system.
However, these improvements introduce additional technical challenges. Chief among them is the need to correctly associate pods across multiple camera views (either from different cameras on the same side or from opposite sides of the plant) so that each pod is linked to a consistent identity when appropriate. Addressing these alignment and correspondence issues will be essential for achieving reliable total pod counts in the next iteration of the phenotyping pipeline.
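One simple way to frame the correspondence problem is sketched below. It is not the method planned for the pipeline. It assumes, hypothetically, that pod positions from two views have already been projected into a shared coordinate frame, and it uses a greedy nearest-neighbor match with an arbitrary distance threshold (max_dist). Pods matched across views are counted once.

```python
import math


def match_across_views(pods_a, pods_b, max_dist=2.0):
    """Greedily pair pods from two views by distance in a shared frame.

    pods_a and pods_b are lists of (x, y) positions. Returns index pairs
    (i, j); each pod is used in at most one pair.
    """
    candidates = sorted(
        (math.dist(pa, pb), i, j)
        for i, pa in enumerate(pods_a)
        for j, pb in enumerate(pods_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    return pairs


def merged_pod_count(pods_a, pods_b, max_dist=2.0):
    """Total pods after removing cross-view duplicates."""
    matches = match_across_views(pods_a, pods_b, max_dist)
    return len(pods_a) + len(pods_b) - len(matches)
```

In practice an optimal assignment (for example, the Hungarian algorithm) and appearance cues would likely be needed, since pods are visually similar and densely clustered.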

Objective 2 - Investigate the physiological effects of drought stress on contrasting lines for flower abortion under controlled conditions.
This study aimed to assess flower abortion among six soybean lines (PI506862, PI567638, IA3023, CL0J7-3-6-8, PI548318, and PI80837) under progressive water-deficit stress (dry-down phase) and subsequent re-watering (recovery phase) in a greenhouse at the West Tennessee Research and Education Center, University of Tennessee. On April 10, 2025, seeds were sown at a 2 cm depth in a 1:1 mixture of sand and Lexington silt loam, then thinned to one plant per pot at 13 days after planting (DAP). Fertilizers were applied at 12 DAP (0.075% v/v liquid, 0–10–10) and 24 DAP (0.06% w/v water-soluble, 24–8–16). Plants were maintained under a 14-hour light/10-hour dark cycle and received 200 mL of water daily during the pre-treatment phase. At 28 DAP, when plants had 4–5 trifoliate leaves, the dry-down (DD) phase began: pots were saturated, drained to pot capacity, enclosed in 15-L plastic bags (Figure 6) to eliminate evaporation, and fitted with watering tubes for controlled watering and for monitoring plant transpiration (water loss). Based on daily transpiration rate (TR), four pots per genotype were designated as well-watered (WW) controls and six as the drought treatment (DD). The DD plants were watered only if TR exceeded 80 g/day, following Shekoofa et al. (2013). Stress progression was monitored using the normalized transpiration rate (NTR), with NTR < 0.10 indicating the endpoint of available soil water. Recovery was initiated at 36–41 DAP, depending on genotype, by re-watering DD pots with 350 mL to pot capacity. The number of flowers, flower rate per day, and wilting score (0–5) were recorded daily during the stress and recovery phases. The number of nodes and pods was recorded at the end of the dry-down phase and again at experiment termination (June 18, 2025). Data have been processed and interpreted, and the graphs along with key outcomes are incorporated in the current report.
Results are provided in the attached PDF due to space limitations.
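The NTR endpoint rule can be illustrated with a short sketch. The report does not spell out the normalization, so this assumes the two-step procedure commonly used in dry-down experiments: each day's TR of a stressed plant is divided by the mean TR of the well-watered controls, and the resulting ratio is divided by its own average over the first few days, when soil water is still non-limiting. The function names, the baseline_days parameter, and the example values are hypothetical.

```python
from statistics import mean


def normalized_transpiration(tr_stressed, tr_control_mean, baseline_days=3):
    """Return daily NTR values for one drought-treated plant.

    tr_stressed: daily transpiration (g/day) of the stressed plant.
    tr_control_mean: mean daily transpiration of well-watered controls.
    baseline_days: initial days used to scale NTR to about 1.0.
    """
    ratios = [s / c for s, c in zip(tr_stressed, tr_control_mean)]
    baseline = mean(ratios[:baseline_days])
    return [r / baseline for r in ratios]


def endpoint_day(ntr, threshold=0.10):
    """Index of the first day with NTR below the threshold, or None."""
    for day, value in enumerate(ntr):
        if value < threshold:
            return day
    return None


# Hypothetical example: transpiration collapses on the fifth day.
ntr = normalized_transpiration([100, 100, 100, 50, 5], [100] * 5)
print(endpoint_day(ntr))
```

Dividing by the control mean removes day-to-day variation in evaporative demand, so the decline in NTR reflects soil drying rather than weather inside the greenhouse.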

Objective 3 - Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
Three genetically diverse populations (PI 552538 x PI 556511, IA3023 x PI552538, and HS6-3976 x PI 556511) were created based on performance during the first two years of Phase 1 evaluations. These populations are currently being advanced in the Puerto Rico winter nursery. Additionally, a fourth F1 population (PI 506862 x PI567638, lines used in Obj. 4) will be grown this winter (2025) in the Kansas State University greenhouse. Our objective is to obtain F5 progeny by spring 2026, enabling seed increase during the summer of 2026 and coordinated multi-location field evaluations of F4:5 progeny in 2027 across Missouri, Kansas, and Texas.
To expand the genetic resources available for mapping traits related to flower retention and yield potential, we are also initiating the development of four new recombinant inbred line (RIL) populations (PI437578 x PI643395, PI437578 x PI417479, PI643395 x LG05-4317, and LG05-4317 x PI417479; lines selected in Texas under stress conditions). The parental lines, selected for strong contrasts in flower abortion rate based on three years of field and greenhouse phenotyping, have been planted in the Kansas State University fall greenhouse to create these four additional populations. Controlled crosses to produce F1 progeny will be conducted in early winter, after which the populations will be advanced rapidly via single-seed descent toward the F5–F6 generations for future evaluation. Table 2 shows the lines selected for contrasting high and low flower abortion rates measured in Phase 1 of the project, and Table 3 shows the eight populations to be advanced to F5–F6 through 2027–2028.
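The reason for advancing to F5–F6 before evaluation can be made concrete: under selfing without selection, the expected fraction of heterozygous loci halves each generation. The sketch below applies that textbook relationship, with the F1 taken as fully heterozygous at loci that differ between the parents; the function name is illustrative.

```python
def expected_heterozygosity(generation):
    """Expected heterozygous fraction at generation F_n under selfing.

    F1 is 1.0 at loci that differ between the two parents, and the
    fraction halves with every subsequent generation of selfing.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


for n in (2, 5, 6):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

By F5 roughly 6% of such loci are still expected to be heterozygous, and by F6 about 3%, so lines are close enough to fixed for replicated multi-location phenotyping and mapping.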

Objective 4 - Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.
To elucidate the molecular determinants of flower abortion, we performed a comparative transcriptomic analysis of high- and low-abortion soybean genotypes across four reproductive stages (bud, closed flower, open flower, and dry flower), with leaf included as a vegetative reference tissue. Differential expression and functional enrichment analyses revealed distinct transcriptional landscapes between high- and low-abortion plants, particularly in genes associated with hormonal signaling, abscission regulation, cell wall remodeling, and reproductive failure (Figure 11). The high-abortion plants showed elevated expression of ethylene-responsive and abscission-associated genes, including Inflorescence Deficient in Abscission (IDA), Inflorescence Deficient in Abscission-Like (IDL), Homeobox Protein Knotted-1-Like 6 (KNAT6), and Arabidopsis thaliana Homeobox Gene 1 (ATH1/BEL-1), along with increased activity of pectin-degrading proteins. In contrast, low-abortion plants exhibited upregulation of Mitogen-Activated Protein Kinase Phosphatase 1 (MKP1) and KNAT1, consistent with repression of MAPK activity and maintenance of boundary integrity.
The final phase of floral organ separation in Arabidopsis is governed by a conserved signaling network that integrates the IDA-HAE/HSL2-MAPK module with downstream cell wall-modifying enzymes. In our study, several orthologs representing this core abscission module and its regulators were differentially expressed between the high-abortion (HA) and low-abortion (LA) genotypes, suggesting potential roles in modulating floral retention versus shedding. MKP1 acts as a key negative regulator, establishing a signaling threshold prior to activation of the abscission cascade. Functionally, MKP1 dephosphorylates MPK3/6, thereby dissipating basal MAPK activity that might otherwise trigger premature cell wall hydrolysis and organ detachment. In our dataset, one of the soybean orthologs of AtMKP1 (Glyma.08G131200) was upregulated, with comparatively higher transcript levels in LA, which would be expected to prevent premature activation of abscission responses and thereby enhance floral retention (see the attached PDF for a more detailed explanation of the mechanisms).
Taken together, these patterns indicate that in soybean, the differential regulation of KNAT1, KNAT6, and ATH1 establishes opposing transcriptional states that influence floral retention. The LA genotype maintains a KNAT1-like regime that preserves structural integrity and suppresses separation, while the HA genotype transitions toward a KNAT6/ATH1-dominated state that primes abscission-zone differentiation and accelerates the separation process. This interplay mirrors the antagonistic relationships characterized in tomato and Arabidopsis and underscores a conserved developmental logic linking meristem-derived patterning genes to abscission signaling.


Final results
• Soybean flower abortion varies widely (26–80%) depending on genotype and environmental stress, confirming strong genetic and climatic influence on reproductive success.
• Phenotypic plasticity in flower retention and/or compensatory flower production is critical for mitigating abortion effects and sustaining yield.
• Flower abortion and total flower number remain major yield determinants under abiotic stress, reinforcing their importance as breeding selection targets.
• Greenhouse drought testing showed strong variation in wilting response: PI548318 and PI567638 wilted most, while IA3023 and CL0J7-3-6-8 exhibited lower wilting and stronger drought resilience.
• Reproductive recovery (greenhouse): PI506862 and CL0J7-3-6-8 also recovered well after re-watering, indicating drought resilience.
• A field phenotyping platform enabled high-throughput, automated flower and pod detection and temporal tracking, overcoming manual counting limitations and enabling real-time flowering dynamics.
• The Faster R-CNN + SORT pipeline reached 85% AP and distinguished new vs. old flowers, supporting genotype differentiation based on flowering intensity and timing.
• Modern computer vision systems, especially YOLOv8 for detection paired with StrongSORT (best overall identity tracking) or ByteTrack (highest precision) for tracking, enabled reliable pod quantification across genotypes, environments, and developmental stages.
• Transcriptomic analysis uncovered distinct molecular differences between high- and low-abortion genotypes, especially in hormone signaling, abscission, and reproductive integrity pathways.
• High-abortion genotypes upregulated abscission and cell wall–degradation genes, while low-abortion lines upregulated MKP1 and KNAT1, supporting reduced MAPK activity, improved boundary maintenance, and stronger floral retention.

Major outputs for 2025
Phase 1 of this project has yielded significant progress towards addressing the long-term challenge of flower abortion and our ability to fully capture soybean’s yield potential. The outcomes have resulted in five manuscript submissions, which are currently under review or revision and listed below:

Paper 1: Plasticity in flower number and abortion shape soybean (Glycine max (L.) Merr.) yield under different environmental stresses. (Revised version submitted: Journal of Agronomy and Crop Science)
Outcome: Distinct flowering plasticity strategies were detected, with LG05-4317 and PI506862 identified as promising candidates for breeding high-yielding cultivars with optimized flower abortion.

Paper 2: Automated flower counting method for in-field soybean phenotyping. (Under review: Computers and Electronics in Agriculture)
Outcome: A scalable field-based phenotyping system that combined a mobile imaging platform with the Faster R-CNN + SORT detection and tracking pipeline achieved 85% average precision in distinguishing ‘new’ and ‘old’ flowers and captured genotype-specific flowering dynamics, providing a foundation for estimating flower abortion rates.

Paper 3: Soybean Pod-MOTS: A video dataset and benchmark for soybean pod detection, segmentation, tracking and counting in field videos. (Under review: Smart Agricultural Technology)
Outcome: The first multi-object tracking and segmentation (MOTS) dataset built from field videos, covering 637,520 pod instances across six genotypes and multiple environments, established a benchmark with 80% accuracy for scalable, automated pod phenotyping and flower abortion analysis.

Paper 4: A contrasting transcriptomic atlas of high and low flower abortion soybean genotypes reveals coordinated regulation of abscission and reproductive failure (submission: Plant Cell Reports)
Outcome: Hormone signaling, abscission regulation, and cell wall remodeling were key pathways that influenced floral retention. High-abortion lines had elevated expression of ethylene-responsive and abscission-related genes (e.g., IDA, IDL, KNAT6, ATH1), whereas low-abortion lines upregulated MKP1 and KNAT1, suggesting suppression of MAPK activity and maintenance of boundary integrity.

Paper 5: Drought-induced flower abortion and recovery dynamics in soybean: Insights from a controlled greenhouse experiment. (Under review: Journal of Agronomy and Crop Science)
Outcome: PI506862 was identified as a superior line with enhanced drought tolerance and reproductive resilience, making it a valuable candidate for improving flower and pod retention in breeding programs.

Benefit To Soybean Farmers

Understanding the genetic diversity associated with flower abortion is necessary to discover untapped yield potential in soybean and increase profitability for soybean producers. Although flower abortion is a major cause of soybean yield loss in the US and elsewhere, this challenge has been largely overlooked. Currently, limited information is available in the public domain, and no systematic efforts have been initiated to address this challenge and fully benefit from soybean’s yield potential. Two major bottlenecks for exploring flower retention and yield improvement have been the complexity of the trait and the lack of robust, field-based, high-throughput phenotyping. Therefore, to address this major knowledge gap in soybean improvement, we have assembled a team with expertise in crop physiology, agronomy, conventional and molecular breeding, soybean genetics, computer science (with extensive experience in crop-related artificial intelligence (AI) systems), genomics, and molecular biology.

Our target is a 20 to 30% increase in flower and pod retention, potentially leading to a 10 to 15% increase in yield. This provides a strong justification for exploring genetic diversity in flower retention and developing tools that will allow the advantage to be transferred into locally popular US soybean cultivars. The multi-regional team’s goal is to address flower abortion and pod retention across different environments, so the outputs generated can benefit a large share of US soybean producers. Currently, soybeans are grown on over 80 million acres in the US, with a national average yield of over 53 bu/acre. A 10% increase in yield due to higher retention of flowers and pods would raise the average yield to approximately 58.5 bu/acre, increasing total production by an additional 440 million bushels and generating an extra $6.07 billion in revenue across the U.S. soybean industry at the current market price.
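The projection above can be reproduced approximately with the short calculation below. It is a sketch using only the round figures quoted in the text (80 million acres, 53 bu/acre, 10% gain), which the report describes as lower bounds ("over"). The function name is illustrative, and the price per bushel is not stated in the report, so it is back-calculated from the quoted totals rather than assumed.

```python
def yield_gain_projection(acres, base_yield, gain_fraction):
    """Return (new yield in bu/acre, additional bushels) for a yield gain."""
    new_yield = base_yield * (1 + gain_fraction)
    extra_bushels = acres * (new_yield - base_yield)
    return new_yield, extra_bushels


# Lower-bound inputs taken from the text.
new_yield, extra = yield_gain_projection(80e6, 53.0, 0.10)
print(f"New yield: {new_yield:.1f} bu/acre")
print(f"Extra production: {extra / 1e6:.0f} million bushels")

# Price per bushel implied by the report's own totals.
implied_price = 6.07e9 / 440e6
print(f"Implied price: ${implied_price:.2f}/bu")
```

With these lower-bound inputs the sketch gives about 58.3 bu/acre and 424 million additional bushels, slightly below the 58.5 bu/acre and 440 million bushels quoted, which is consistent with the actual acreage and average yield being somewhat above 80 million acres and 53 bu/acre. The quoted revenue corresponds to a price of roughly $13.80 per bushel.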

Under the United Soybean Research Retention policy, final reports will be displayed with the project once completed, but working files will be purged after three years and financial information after seven years. All pertinent information is in the final report; for more information, please contact the project lead at your state soybean organization or the principal investigator listed on the project.