First CVPR 2023 Foundation Model Challenge | Minivision Technology Places in the Top Three, Tackling the Challenges of Intelligent Transportation

Company News 2023-06-30

Minivision Technology recently competed against more than 70 teams worldwide in the first CVPR 2023 Workshop Foundation Model Challenge, placing first on leaderboard A and third on leaderboard B.




Minivision Technology's MiniModel stands out on the Track 2 leaderboard of the CVPR 2023 1st Foundation Model Challenge


The challenge belongs to the first foundation model workshop organized by Baidu at CVPR 2023. The competition focuses on intelligent transportation. The track Minivision Technology entered targets the understanding and perception of scene text and images, with the goal of improving the accuracy of text-image retrieval in traffic scenes.


01 How can large models serve intelligent transportation?


The ChatGPT craze gave many people their first real sense of what large models can do: a language model can chat with you like an old friend. So what can large models do for intelligent transportation?


Traffic scenes generate heavy demand for retrieving vehicles and pedestrians, and high-performance image retrieval plays a crucial role in traffic law enforcement and public security governance.




Traditional image retrieval methods carry high annotation costs and are hard to extend to new categories. As multimodal large model technology has developed, unified representation of text and images and conversion between the two modalities have been widely studied and applied, making it possible to train foundation models on the massive volume of image-text description data available on the Internet. This reduces the cost of downstream fine-tuning, and the resulting model also has strong zero-shot ability, so it can recognize new categories without labeled examples of them. Such a model improves both the accuracy and the flexibility of image retrieval in the service of intelligent transportation.
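
To make the retrieval idea concrete, here is a minimal sketch of text-to-image retrieval in a shared embedding space. It assumes the text and image encoders have already produced vectors. The file names, the 3-dimensional embeddings, and the `retrieve` helper are illustrative assumptions, not part of the team's system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(text_embedding, image_embeddings, top_k=2):
    """Rank gallery images by similarity to one text query embedding."""
    scored = [
        (name, cosine(text_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy embeddings standing in for encoder outputs (made-up values).
gallery = {
    "red_sedan.jpg": [0.9, 0.1, 0.0],
    "white_truck.jpg": [0.1, 0.9, 0.1],
    "pedestrian.jpg": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of the text "a red sedan"

results = retrieve(query, gallery)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because the query is free text rather than a fixed class label, a new category only needs a new description, which is where the zero-shot flexibility comes from.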


02 Minivision Technology's Algorithm Approach


The competition dataset covers traffic participants such as pedestrians and vehicles, and it also contains a large amount of noisy data, which makes the task harder. The vehicle data varies considerably, mixing surveillance and non-surveillance viewpoints, which places high demands on the transfer ability of the foundation model.


Vehicle data varies greatly


We tackled the traffic scene retrieval task with a multimodal unified feature representation optimization technique.


Our approach covers data processing, model structure, training strategy, and model fusion. We add model-generated data and open-source data to strengthen the foundation model's representation ability in this domain, then apply late fusion across multiple heterogeneous models and re-rank the retrieval results.
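
As an illustration of late fusion followed by re-ranking, the sketch below combines the similarity scores of two models for a single query. The min-max normalization, the equal weights, and the score values are assumptions chosen for the example; the article does not describe the team's actual fusion rule.

```python
def normalize(scores):
    """Min-max scale one model's scores to [0, 1] so models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {image_id: 0.0 for image_id in scores}
    return {image_id: (s - lo) / (hi - lo) for image_id, s in scores.items()}

def late_fusion(per_model_scores, weights):
    """Weighted sum of normalized scores, returned as a re-ranked id list."""
    fused = {}
    for scores, weight in zip(per_model_scores, weights):
        for image_id, s in normalize(scores).items():
            fused[image_id] = fused.get(image_id, 0.0) + weight * s
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores from two heterogeneous models on different scales.
model_a = {"img1": 0.30, "img2": 0.28, "img3": 0.10}
model_b = {"img1": 12.0, "img2": 20.0, "img3": 4.0}

ranking = late_fusion([model_a, model_b], weights=[0.5, 0.5])
print(ranking)
```

Model A alone would rank img1 first, but model B strongly prefers img2, so the fused ranking promotes img2. Normalizing before summing keeps the model with the larger raw score range from dominating.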


During training, we also use prompt enhancement to reduce word segmentation ambiguity and strengthen the representation of attribute features, loss truncation to suppress noisy data, and parameter freezing to limit overfitting.
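
Loss truncation can take several forms. One common variant, sketched here as an assumption rather than the team's exact formulation, discards the highest-loss samples in each batch on the theory that mislabeled or noisy pairs tend to produce the largest losses. The `drop_fraction` parameter and the loss values are illustrative.

```python
def truncated_mean_loss(losses, drop_fraction=0.25):
    """Average per-sample losses after discarding the largest ones."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    n_drop = int(len(losses) * drop_fraction)
    # Sort ascending and keep everything except the n_drop largest losses.
    kept = sorted(losses)[: len(losses) - n_drop]
    return sum(kept) / len(kept)

# Three well-matched image-text pairs and one likely noisy pair.
batch_losses = [0.2, 0.3, 0.25, 5.0]

plain = sum(batch_losses) / len(batch_losses)
truncated = truncated_mean_loss(batch_losses, drop_fraction=0.25)
print(f"plain mean: {plain:.4f}, truncated mean: {truncated:.4f}")
```

With the outlier removed, the gradient signal comes from the three consistent pairs instead of being dominated by the single suspicious one.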




We use data simulation and generation to draw out the potential of the foundation model, and we combine a novel model ensembling method with loss truncation (to suppress noisy data) and prompt enhancement to improve accuracy on the downstream retrieval task.


The scheme follows the technical route of multimodal contrastive learning and makes full use of large model capabilities, so it copes well with scene changes and with handling multiple scenarios at once. It draws on the potential of multimodal unified feature representation optimization and transfers well to real traffic scenarios, which gives it high practical value.
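
For readers unfamiliar with multimodal contrastive learning, the following sketch shows the symmetric contrastive loss popularized by CLIP-style models, computed on a small image-text similarity matrix. It is a generic illustration, not the team's training code; the temperature of 0.07 and the matrix values are assumptions.

```python
import math

def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(similarity, temperature=0.07):
    """Symmetric loss over image-to-text and text-to-image directions."""
    logits = [[s / temperature for s in row] for row in similarity]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))

# Rows are images, columns are texts; entry [i][j] is a cosine similarity.
aligned = [[0.9, 0.1], [0.2, 0.8]]      # matching pairs score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]   # matching pairs score lowest

print(contrastive_loss(aligned), contrastive_loss(misaligned))
```

The loss is small when each image is most similar to its own caption and large otherwise, which is what pulls matching image-text pairs together in the shared feature space.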



These methods are also a useful reference for other scenarios. The Minivision team will continue its in-depth research on multimodal large model technology, explore applications in more vertical domains, and bring the new experiences made possible by cutting-edge AI technology to more people.