Machine Vision Group

National Laboratory of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences

  

Latest Research Progress

Dynamic Graph Cuts in Parallel
Miao Yu, Shuhan Shen, Zhanyi Hu
IEEE Transactions on Image Processing, 26(8): 3775-3788, 2017.

Abstract: This paper aims to bridge two important trends in efficient graph cuts in the literature: one is to decompose a graph into several smaller subgraphs to take advantage of parallel computation, and the other is to reuse the solution of the max-flow problem on a residual graph to boost efficiency on another similar graph. Our proposed parallel dynamic graph cuts algorithm takes advantage of both and is extremely efficient for certain dynamically changing MRF models in computer vision. The performance of the proposed algorithm is validated on two typical dynamic graph cuts problems: foreground-background segmentation in video, where similar graph cuts problems need to be solved sequentially, and GrabCut, where graph cuts are used iteratively.
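The idea of reusing a max-flow solution on a similar graph can be illustrated with a minimal sketch. It is not the paper's parallel algorithm: it is a plain Edmonds-Karp solver on a hypothetical four-node graph, and it covers only the simplest dynamic case, where an edge capacity increases so that the previous flow stays feasible and augmentation can simply continue from it. The function name `augment` and the graph values are assumptions made for illustration.

```python
from collections import deque


def augment(capacity, flow, source, sink):
    """Push flow along shortest augmenting paths until none remain.

    `flow` is updated in place, so a previous solution can serve as a
    warm start. Returns the amount of flow added by this call.
    """
    n = len(capacity)
    added = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return added  # no augmenting path left: the flow is maximal

        # Find the bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u

        # Update the (skew-symmetric) flow along the path.
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        added += bottleneck


# Hypothetical graph: node 0 is the source, node 3 is the sink.
capacity = [
    [0, 3, 2, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
flow = [[0] * 4 for _ in range(4)]
first = augment(capacity, flow, 0, 3)

# The graph changes slightly: edge 1 -> 2 gains one unit of capacity.
capacity[1][2] = 1
extra = augment(capacity, flow, 0, 3)  # warm start from the old flow

# Reference: solve the modified graph from scratch.
scratch = augment(capacity, [[0] * 4 for _ in range(4)], 0, 3)
print(first, extra, scratch)
```

Handling capacity decreases requires extra bookkeeping to restore feasibility, which this sketch deliberately leaves out.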

  

HSfM: Hybrid Structure-from-Motion
Hainan Cui, Xiang Gao, Shuhan Shen, Zhanyi Hu
CVPR 2017 (Spotlight)

Abstract: Structure-from-Motion (SfM) methods can be broadly categorized as incremental or global according to how they estimate initial camera poses. While incremental systems have advanced in robustness and accuracy, efficiency remains their key challenge. To solve this problem, global reconstruction systems simultaneously estimate all camera poses from the epipolar geometry graph, but they are usually sensitive to outliers. In this work, we propose a new hybrid SfM method to tackle the issues of efficiency, accuracy and robustness in a unified framework. More specifically, we first propose an adaptive community-based rotation averaging method to estimate camera rotations in a global manner. Then, based on these estimated camera rotations, camera centers are computed in an incremental way. Extensive experiments show that our hybrid method performs similarly to or better than many state-of-the-art global SfM approaches in terms of computational efficiency, while achieving reconstruction accuracy and robustness similar to two other state-of-the-art incremental SfM approaches.

  
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
Yurun Tian, Bin Fan, Fuchao Wu
CVPR 2017

Abstract: The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn a high-performance descriptor in Euclidean space via a Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training samples in a few epochs. (ii) Derived from the basic concept of the local patch matching problem, we emphasize the relative distance between descriptors. (iii) Extra supervision is imposed on the intermediate feature maps. (iv) Compactness of the descriptor is taken into account. The proposed network is named L2-Net since the output descriptor can be matched in Euclidean space by L2 distance. L2-Net achieves state-of-the-art performance on the Brown datasets [16], the Oxford dataset [18] and the newly proposed HPatches dataset [11]. The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitute for existing handcrafted descriptors. The pre-trained L2-Net is publicly available.
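Matching descriptors "in Euclidean space by L2 distance" amounts to a nearest-neighbour search. The sketch below shows that step only, on made-up three-dimensional vectors; it does not use the L2-Net model, and the names `l2_distance` and `match_descriptors` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, reference):
    """Return, for each query descriptor, the index of its nearest reference."""
    matches = []
    for q in query:
        distances = [l2_distance(q, r) for r in reference]
        nearest = min(range(len(reference)), key=distances.__getitem__)
        matches.append(nearest)
    return matches


# Toy descriptors (assumed values, for illustration only).
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]
matches = match_descriptors(query, reference)
print(matches)  # -> [1, 0]
```

Because only a plain distance is needed, a descriptor of this kind can be dropped into an existing matching pipeline built for handcrafted descriptors without changing the matching code.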

  
Statistics of Visual Responses to Object Stimuli from Primate AIT Neurons to DNN Neurons
Qiulei Dong, Hong Wang, Zhanyi Hu
Neural Computation 2017

Abstract: Deep neural networks (DNNs) have recently achieved image object categorization performance comparable to that of human beings; however, their exceptionally good categorization ability is not well understood. Recently, a goal-driven paradigm was proposed for understanding the visual object recognition pathway [DiCarlo et al. 2016], which advocates that by controlling only the last layer's categorization performance in the learning phase of a hierarchical linear-nonlinear network, not only can its last layer's output quantitatively predict IT neuron responses, but its intermediate layers can also automatically predict the responses of intermediate visual areas, such as V4. In this work, we explore whether DNN neurons possess image object representational statistics similar to those of monkey IT neurons, in particular when the network becomes deeper and the number of image categories becomes larger, using VGG19, a typical deep network of 19 layers. Lehky et al. [2011, 2014] systematically investigated the monkey's IT neuron response statistics using three different measures: single-neuron response selectivity, population response sparseness, and the intrinsic dimensionality of the neural object representation. In this work, we used the same three measures to evaluate the responses of DNN neurons to images in ImageNet, which contains over a million images in 1000 different categories. Our results show that VGG19 neurons have response statistics to image objects that are quite different from those of the IT neurons in [Lehky et al. 2011, 2014], which seems to indicate that a good hierarchical categorization network does not necessarily require response statistics similar to those of IT neurons.

 
Comparison of IT Neural Response Statistics with Simulations
Qiulei Dong, Bo Liu, Zhanyi Hu
Frontiers in Computational Neuroscience 2017

Abstract: Lehky et al. (2011) provided a statistical analysis of the responses of 674 recorded neurons to 806 image stimuli in the anterior inferotemporal (AIT) cortex of two monkeys. In terms of kurtosis and Pareto tail index, they observed that the population sparseness of both unnormalized and normalized responses is always larger than the single-neuron selectivity, and hence concluded that the critical features for individual neurons in primate AIT cortex are not very complex, but that there is an indefinitely large number of them. In this work, we explore an "inverse problem" by simulation: assuming that each neuron indeed responds to only a very limited number of stimuli, among a very large number of neurons and stimuli, we assess whether the population sparseness is always larger than the single-neuron selectivity. Our simulation results show that the population sparseness exceeds the single-neuron selectivity in most cases, even when the numbers of neurons and stimuli are much larger than several hundred, which confirms the observations in Lehky et al. (2011). In addition, we found that the variances of the computed kurtosis and Pareto tail index are quite large in some cases, which reveals some limitations of these two criteria when used for neuron response evaluation.
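As a rough illustration of the kurtosis-based measures, the sketch below computes single-neuron selectivity (kurtosis of one neuron's responses across stimuli) and population sparseness (kurtosis of all neurons' responses to one stimulus) on a simulated response matrix. The simulation model is an assumption made here, not the one used in the paper: each neuron responds to a random 10% of stimuli with an exponentially distributed rate. The matrix size and all function names are likewise hypothetical, and the Pareto tail index is not covered.

```python
import random


def excess_kurtosis(values):
    """Excess kurtosis (fourth standardized moment minus 3).

    Returns None when the variance is zero, where kurtosis is undefined.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return None
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def mean_defined(values):
    """Average the entries that are not None."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined)


def simulate_responses(num_neurons, num_stimuli, active_fraction, rng):
    """Rows are neurons, columns are stimuli; most entries are zero."""
    return [
        [rng.expovariate(1.0) if rng.random() < active_fraction else 0.0
         for _ in range(num_stimuli)]
        for _ in range(num_neurons)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
responses = simulate_responses(200, 300, 0.1, rng)

# Selectivity: kurtosis along each row (one neuron, all stimuli).
selectivity = mean_defined([excess_kurtosis(row) for row in responses])
# Sparseness: kurtosis along each column (one stimulus, all neurons).
sparseness = mean_defined(
    [excess_kurtosis(list(col)) for col in zip(*responses)]
)
print(selectivity, sparseness)
```

Repeating the run with different seeds and matrix sizes gives a feel for how much the kurtosis estimates fluctuate from one simulated population to another.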