Google Brain NAS-FPN Outperforms SOTA Models for Object Detection


Current state-of-the-art convolutional architectures for object detection tasks are human-designed. In a recent paper, Google Brain researchers leveraged the advantages of Neural Architecture Search (NAS) to propose NAS-FPN, a new automatic search method for feature pyramid architecture. NAS-FPN achieves a better accuracy and latency tradeoff than current SOTA models for object detection. From the paper’s abstract:

> _“Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy with less computation time.” (arXiv)._

SYNCED INVITED DR. DAWEI DU, A POSTDOCTORAL RESEARCHER AT THE STATE UNIVERSITY OF NEW YORK WITH A RESEARCH FOCUS ON VISUAL TRACKING, OBJECT DETECTION AND VIDEO SEGMENTATION APPLICATIONS, TO SHARE HIS THOUGHTS ON GOOGLE BRAIN’S NAS-FPN.

_HOW WOULD YOU DESCRIBE NAS-FPN?_

FPN (Feature Pyramid Network) is a pyramid representation for deep learning that combines low-resolution but semantically strong features with high-resolution but semantically weak features via top-down and lateral connections.
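To make the top-down and lateral connections concrete, here is a minimal sketch of a hand-designed FPN merge step in PyTorch — the kind of fixed pyramid that NAS-FPN replaces with searched connections. The class name, channel widths, and three-level pyramid are illustrative assumptions, not details taken from the paper.

```python
# Minimal hand-designed FPN sketch (illustrative only, not NAS-FPN):
# lateral 1x1 convs project backbone maps to a shared width, and each
# coarse map is upsampled 2x and summed with the next finer lateral map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs: project C3..C5 to the shared pyramid width.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 output convs smooth each merged map.
        self.output = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):
        # feats: backbone maps ordered fine-to-coarse, e.g. [C3, C4, C5].
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down pathway: start from the coarsest (semantically strongest)
        # map and add it, upsampled 2x, to the next finer lateral map.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], scale_factor=2, mode="nearest"
            )
        # Returns [P3, P4, P5]: high-resolution maps now carry strong semantics too.
        return [conv(x) for conv, x in zip(self.output, laterals)]
```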


Moreover, NAS-FPN is an automatic neural architecture search algorithm that focuses on finding optimal connections between different layers for pyramidal representations. Specifically, an RNN controller is trained with reinforcement learning to select the best architecture. First, child networks are sampled by combining any two different layers. Then the accuracy score of each child network is treated as the _reward_ and used to compute the policy gradient that updates the parameters of the controller. Over the training iterations, the controller learns to generate structures with better accuracy. Experiments on the COCO test set show the proposed method achieves a considerable accuracy improvement over existing object detection models, e.g., SSDLite with MobileNetV2 and RetinaNet.
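The controller-and-reward loop described above can be illustrated with a toy REINFORCE-style sketch. This is a simplification for intuition only: the `Controller`, `proxy_accuracy`, and all hyperparameters below are hypothetical; in the paper the controller is an RNN that also chooses a merge operation and output resolution, and the reward comes from training each sampled child network and measuring its detection accuracy.

```python
# Toy REINFORCE-style controller loop (illustrative sketch, not the paper's code):
# sample layer pairs to merge, score the sampled architecture, and use the
# accuracy score as the reward for a policy-gradient update of the controller.
import torch
import torch.nn as nn
from torch.distributions import Categorical

NUM_LAYERS = 5        # candidate feature levels the controller can connect
NUM_NEW_NODES = 3     # merged nodes to generate per sampled architecture

class Controller(nn.Module):
    """Scores every (layer_a, layer_b) pair; a real controller would be an RNN."""
    def __init__(self, num_layers):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_layers, num_layers))

    def sample(self):
        """Sample one architecture = a list of layer pairs, plus its log-prob."""
        probs = torch.softmax(self.logits.flatten(), dim=0)
        dist = Categorical(probs)
        pairs, log_probs = [], []
        for _ in range(NUM_NEW_NODES):
            idx = dist.sample()
            pairs.append((idx.item() // NUM_LAYERS, idx.item() % NUM_LAYERS))
            log_probs.append(dist.log_prob(idx))
        return pairs, torch.stack(log_probs).sum()

def proxy_accuracy(pairs):
    # Hypothetical stand-in for "train the child network and measure AP".
    return float(len(set(pairs))) / NUM_NEW_NODES

controller = Controller(NUM_LAYERS)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-2)
baseline = 0.0
for step in range(100):
    pairs, log_prob = controller.sample()
    reward = proxy_accuracy(pairs)          # accuracy score acts as the reward
    baseline = 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * log_prob  # REINFORCE policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point mirrors the description above: the sampled architecture’s accuracy acts as the reward, and the policy gradient nudges the controller toward cross-layer connections that scored well.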


_WHY DOES THIS RESEARCH MATTER?_

Deep learning dominates various tasks in computer vision. However, the majority of previous work focuses on training the parameters of networks within human-designed architectures. Recently, there has been increasing interest in designing the structure of neural networks automatically. Xie et al. explored randomly wired neural networks for image recognition. Liu et al. proposed searching the network-level structure in addition to the cell-level structure for semantic segmentation. Differing from the aforementioned work, this paper provides another research direction that makes it possible to search for the optimal cross-layer connections and achieve a discriminative multiscale feature representation.


_WHAT IMPACT MIGHT THIS RESEARCH BRING TO THE RESEARCH COMMUNITY?_

Combining multi-scale features from different layers is one of the key techniques in deep learning for effectively improving the performance of many computer vision tasks. However, previously proposed human-designed structures may not be optimal, resulting in limited performance. Inspired by NAS-FPN, researchers can transfer the optimal network structures to related tasks such as visual tracking and semantic segmentation.


_CAN YOU IDENTIFY ANY BOTTLENECKS IN THE RESEARCH?_

The computational complexity of NAS is extremely high (100 TPUs were used in this paper), especially for complex backbones (e.g., ResNet-101). Therefore, it is very difficult for labs without large computational resources to follow this work. Besides, we still have little insight into the optimal network generated by the NAS method. Why do such layer connection combinations achieve better performance than human-designed ones? Can we learn from the network design and transfer it to other tasks (e.g., tracking, segmentation and classification)? The interpretation remains unsolved due to the complex cross-layer connections.


_CAN YOU PREDICT ANY POTENTIAL FUTURE DEVELOPMENTS RELATED TO THIS RESEARCH?_

I believe there will be much work using the NAS method in the future. Based on prior knowledge of a specific task, researchers can reduce the NAS search space by pruning unnecessary connections. Besides, some effective modules may be found based on similar connections appearing in the optimal networks for different tasks. It is also interesting to think about designing a network that considers the tradeoff between complexity and accuracy, especially for embedded systems with limited resources.


The paper _NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection_ is on arXiv.

ABOUT DR. DAWEI DU

Dawei Du is a Postdoctoral Researcher with the Computer Science Department at the University at Albany, State University of New York. He received his PhD from the University of the Chinese Academy of Sciences. His research mainly focuses on visual tracking, object detection and video segmentation. He is organizing the “Vision meets Drones: A Challenge” workshop in conjunction with ICCV 2019. Read his recent research on automatic checkout here: Data Priming Network for Automatic Check-Out.


SYNCED INSIGHT PARTNER PROGRAM

The _Synced Insight Partner Program_ is an invitation-only program that brings together influential organizations, companies, academic experts and industry leaders to share professional experiences and insights through interviews, public speaking engagements, and more. Synced invites all industry experts, professionals, analysts, and others working in AI technologies and machine learning to participate. Simply APPLY FOR THE SYNCED INSIGHT PARTNER PROGRAM and let us know about yourself and your focus in AI. We will give you a response once your application is approved.

