Abstract
Multimodal freight transportation plays a vital role in supporting the U.S. economy. Truck and rail are the two dominant modes, which have accounted for approximately 70 percent of national freight ton-miles over the past two decades and enable the long-distance movement of goods across the country. As freight volumes continue to grow, they contribute to rising environmental pollution, public health risks, infrastructure deterioration, and safety concerns, especially in communities located near major freight corridors. These growing challenges highlight the urgent need for high-resolution monitoring systems that can accurately capture the complexity and movement of freight across transportation modes. However, existing data sources have distinct limitations for both truck and rail freight. Truck freight data often relies on surveys or on axle- and body-type-based datasets, which describe vehicle structure and physical characteristics but fail to capture key attributes such as maximum legal weight. Rail freight data is even more limited: it is often derived from aggregated reports that are delayed and that typically lack detailed rail vehicle configuration information and spatiotemporal characteristics. To address these gaps, namely the lack of weight-related classification in truck freight data and the absence of detailed vehicle configuration in rail freight data, this dissertation develops novel sensing and machine learning approaches that enable high-resolution monitoring of multimodal freight movements. It uses non-intrusive, infrastructure-based sensors, such as advanced inductive loop sensors and roadside infrared cameras, to enable continuous monitoring of freight activity.
The modeling approach emphasizes accuracy and domain adaptability: it starts with supervised deep learning and extends to label-free methods that use emerging vision-language models (VLMs) to reduce reliance on manual annotation. First, a deep learning approach was developed to classify trucks directly by their maximum legal weight using data from advanced inductive loop sensors and side-view video cameras. This approach outperformed state-of-the-art mapping methods and enables this attribute to be measured directly and accurately rather than inferred or mapped indirectly from other classification schemes. Second, a vision-based deep learning approach was developed for real-time rail freight monitoring; it integrates depth-aware background subtraction with a rail object detection model to identify locomotives and railcars across diverse environmental conditions. The method achieved count errors of under 5 percent for rail vehicles in both day and night modes. While these supervised methods demonstrated strong performance, they require extensive labeled data. To address this limitation, the study investigated a zero-shot framework that eliminates the need for manual annotation; using structured text prompts, it achieved an average F1 score of 0.99 in tests on truck classification by engine type and cargo configuration. Although effective, this approach depends heavily on hand-crafted descriptions of vehicle characteristics. To overcome this challenge, an automated elicited-knowledge framework was designed that refines the VLM's prompts based on its errors. This framework improved model performance relative to prompting without elicited knowledge and allows the system to adapt to complex freight vehicle identification tasks without retraining.
In summary, this dissertation presents advanced sensing and modeling approaches that achieve over 90 percent accuracy and address data gaps in high-resolution multimodal freight activity monitoring, in support of sustainable freight transportation.