Video Foundation Models & Data for Multimodal Understanding
Spatio-Temporal Action Localization System