The First CVPR 2023 Big Model Challenge | Minivision Technology Ranked in the "Top Three", Tackling the Challenges of Intelligent Transportation-Minivision Technology (Jiangsu) Co., Ltd

Company News 2023-06-30

Minivision Technology recently competed against more than 70 teams worldwide in the first CVPR 2023 Workshop Big Model Challenge, placing first on the A leaderboard and third on the B leaderboard.

Minivision Technology's MiniModel stands out on the CVPR 2023 1st Foundation Model Challenge Track 2 leaderboard

The challenge was held as part of the first large model workshop organized by Baidu at CVPR 2023. The competition focuses on intelligent transportation, and the track Minivision Technology entered focuses on understanding and perceiving scene text and images, with the aim of improving the accuracy of text-image retrieval in traffic scenes.

01 How can big models serve intelligent transportation?

The ChatGPT craze gave many of us our first real sense of what big models can do: chatbots that can talk with you like an old friend. So what can big models do in the field of intelligent transportation?

There is a large demand for retrieving vehicles and pedestrians in traffic scenes, and high-performance image retrieval capabilities play a crucial role in traffic law enforcement and public security governance.

Traditional image retrieval methods carry high annotation costs and are hard to extend to new categories. With the development of multimodal large models, unified representations of text and images and conversion between the two modalities have been widely studied and applied, making it possible to train foundation models on the massive image-text description data available on the Internet. This not only reduces the cost of downstream fine-tuning, but also gives the model strong zero-shot ability, so it can better recognize things it has not seen before. Such models further improve the accuracy and flexibility of image retrieval in the service of intelligent transportation.
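As a rough illustration of retrieval in a unified text-image representation, the sketch below ranks images by cosine similarity to a text query. It assumes the text and the images have already been mapped into one shared embedding space by some foundation model; the function names and the toy three-dimensional vectors are made up for this example and are not taken from the competition entry.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted from most to least similar to the text."""
    scores = [cosine(text_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical values, for illustration only).
query = [1.0, 0.0, 0.0]      # stands in for the embedding of a text description
gallery = [
    [0.0, 1.0, 0.0],         # image 0: unrelated to the query
    [0.9, 0.1, 0.0],         # image 1: close to the query
    [0.5, 0.5, 0.0],         # image 2: partially related
]
ranking = rank_images(query, gallery)
print(ranking)
```

Because the query is only a text embedding, a new category can be searched for by writing a new description, without collecting labels for it first. That is the zero-shot property in miniature.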

02 Minivision Technology Algorithm Scheme

The competition dataset covers traffic participants such as pedestrians and vehicles and also contains a large amount of noisy data, which increases the difficulty of the task. The vehicle data varies considerably, with both surveillance and non-surveillance viewpoints, which places high demands on the transfer ability of the foundation model.

Vehicle data varies greatly

We completed the traffic scene retrieval task by optimizing a unified multimodal feature representation.

Our approach covers data processing, model structure, training strategy, and model fusion. We add model-generated data and open-source data to further strengthen the foundation model's representation ability in this domain, apply late fusion across multiple heterogeneous models, and re-rank the retrieval results.
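The article does not say which fusion or re-ranking method was used, so the following is only a minimal sketch of one common form of late fusion: each model's similarity scores are rescaled to a common range, combined with a weighted sum, and the candidates are sorted by the fused score. The min-max scaling, the equal weights, and all names and numbers here are assumptions chosen for illustration.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so that different models are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_rerank(per_model_scores, weights):
    """Weighted late fusion of per-model scores, then re-rank the candidates."""
    normalized = [min_max(scores) for scores in per_model_scores]
    n_candidates = len(normalized[0])
    fused = [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_candidates)
    ]
    order = sorted(range(n_candidates), key=lambda i: fused[i], reverse=True)
    return order, fused


# Scores for three candidate images from two hypothetical models
# that report similarity on different scales.
model_a = [0.2, 0.8, 0.5]
model_b = [10.0, 30.0, 40.0]
order, fused = fuse_and_rerank([model_a, model_b], weights=[0.5, 0.5])
print(order, fused)
```

In this toy case the two models disagree on the best candidate (model A prefers image 1, model B prefers image 2), and the fused score settles the order using both opinions.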

In addition, during training we use prompt enhancement to reduce word-segmentation ambiguity and strengthen attribute feature representation, loss truncation to suppress noisy data, and parameter freezing to suppress overfitting.
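To make the idea of loss truncation concrete, here is a minimal sketch of one simple variant: within each batch, the samples with the largest loss are treated as likely noise and left out of the average. The article does not describe the exact rule the team used, so the drop-a-fixed-fraction rule, the function name, and the example numbers are all assumptions.

```python
def truncated_mean_loss(losses, drop_fraction):
    """Average per-sample losses after discarding the largest ones.

    Samples with the highest loss are treated as likely noise and ignored.
    """
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    n_drop = int(len(losses) * drop_fraction)
    kept = sorted(losses)[: len(losses) - n_drop]
    return sum(kept) / len(kept)


# Hypothetical per-sample losses; the last one looks like a mislabeled pair.
batch_losses = [0.2, 0.3, 0.25, 4.0]
plain = sum(batch_losses) / len(batch_losses)
truncated = truncated_mean_loss(batch_losses, drop_fraction=0.25)
print(plain, truncated)
```

The plain mean is dominated by the single outlier, while the truncated mean reflects the three well-behaved samples, so a gradient step based on it is less affected by the noisy pair.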

We use data simulation and generation to draw out the potential of the foundation model, and adopt a novel model ensembling method, together with techniques such as loss truncation for suppressing noisy data and prompt enhancement, to improve the accuracy of the downstream retrieval task.

By following the technical route of multimodal contrastive learning and making full use of the capabilities of large models, the scheme copes well with scene changes and with handling multiple scenarios at once. It draws fully on unified multimodal feature representation optimization and transfers well to real traffic scenarios, giving it high practical value.
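For readers unfamiliar with multimodal contrastive learning, the sketch below shows a symmetric image-text contrastive loss of the kind popularized by CLIP-style training. The article names the technical route but not the specific loss, so this particular formulation, the temperature value, and the toy similarity matrices are assumptions for illustration only.

```python
import math


def cross_entropy_rows(logits, targets):
    """Mean cross-entropy where row i should select column targets[i]."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[target]
    return total / len(logits)


def contrastive_loss(similarity, temperature=0.07):
    """Symmetric image-text contrastive loss over one batch.

    similarity[i][j] is the cosine similarity between image i and text j;
    matching pairs sit on the diagonal.
    """
    n = len(similarity)
    logits = [[s / temperature for s in row] for row in similarity]
    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    targets = list(range(n))
    image_to_text = cross_entropy_rows(logits, targets)
    text_to_image = cross_entropy_rows(transposed, targets)
    return (image_to_text + text_to_image) / 2


# Hypothetical 2x2 similarity matrices.
aligned = [[0.9, 0.1], [0.2, 0.8]]      # matching pairs score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]   # matching pairs score lowest
print(contrastive_loss(aligned), contrastive_loss(misaligned))
```

The loss is small when each image is most similar to its own description and large otherwise, which is what pushes text and images toward a shared representation during training.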

These methods also have reference value for other scenarios. The Minivision team will continue its in-depth research on multimodal large model technology, explore more vertical applications, and bring the new experiences made possible by cutting-edge AI technology to more people.
