National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
Dynamic Graph Cuts in Parallel

Abstract: This paper aims at bridging two important trends in efficient graph cuts in the literature: one is to decompose a graph into several smaller subgraphs to take advantage of parallel computation; the other is to reuse the solution of the max-flow problem on a residual graph to boost efficiency on another, similar graph. Our proposed parallel dynamic graph cuts algorithm combines the advantages of both, and is extremely efficient for certain dynamically changing MRF models in computer vision. The performance of the proposed algorithm is validated on two typical dynamic graph cuts problems: foreground/background segmentation in video, where similar graph cuts problems need to be solved sequentially, and GrabCut, where graph cuts are applied iteratively. 
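The reuse idea behind dynamic graph cuts can be illustrated with a toy max-flow solver: after one problem is solved, the stored flow serves as a warm start on the modified graph, so only the additional flow must be found. The following is a minimal single-threaded sketch (plain Edmonds-Karp in Python, not the paper's parallel algorithm); the graph and all names are illustrative.

```python
from collections import deque

def bfs_augment(cap, flow, s, t):
    """Find one augmenting path in the residual graph and push flow along it.
    Returns the amount pushed (0 when no augmenting path remains)."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in cap[u]:
            if cap[u][v] - flow[u][v] > 0 and v not in parent:
                parent[v] = u
                if v == t:
                    # Trace the path back to find the bottleneck, then push.
                    bottleneck = float("inf")
                    x = t
                    while parent[x] is not None:
                        bottleneck = min(bottleneck,
                                         cap[parent[x]][x] - flow[parent[x]][x])
                        x = parent[x]
                    x = t
                    while parent[x] is not None:
                        flow[parent[x]][x] += bottleneck
                        flow[x][parent[x]] -= bottleneck
                        x = parent[x]
                    return bottleneck
                queue.append(v)
    return 0

def maxflow(cap, flow, s, t):
    """Augment to optimality, warm-starting from whatever `flow` already holds."""
    total = 0
    while True:
        pushed = bfs_augment(cap, flow, s, t)
        if pushed == 0:
            return total
        total += pushed

# Tiny s-t network: s->a->t and s->b->t.
cap, flow = {}, {}
for u, v, c in [("s", "a", 3), ("a", "t", 2), ("s", "b", 2), ("b", "t", 3)]:
    cap.setdefault(u, {})[v] = c
    cap.setdefault(v, {})[u] = 0        # reverse (residual) edge
    flow.setdefault(u, {})[v] = 0
    flow.setdefault(v, {})[u] = 0

first = maxflow(cap, flow, "s", "t")    # solve from scratch: 4
cap["a"]["t"] = 5                       # the graph changes dynamically
extra = maxflow(cap, flow, "s", "t")    # warm start finds only the extra flow: 1
```

Because the old flow is kept, the second solve performs a single augmentation instead of recomputing everything from zero, which is exactly the saving that dynamic graph cuts exploit when consecutive frames yield similar graphs.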

HSfM: Hybrid Structure-from-Motion

Abstract: Structure-from-Motion (SfM) methods can be broadly categorized as incremental or global according to the way they estimate initial camera poses. While incremental systems have advanced in robustness and accuracy, efficiency remains their key challenge. To address this problem, global reconstruction systems estimate all camera poses simultaneously from the epipolar geometry graph, but they are usually sensitive to outliers. In this work, we propose a new hybrid SfM method to tackle the issues of efficiency, accuracy, and robustness in a unified framework. More specifically, we first propose an adaptive community-based rotation averaging method to estimate camera rotations in a global manner. Then, based on these estimated camera rotations, camera centers are computed in an incremental way. Extensive experiments show that our hybrid method performs similarly to or better than many state-of-the-art global SfM approaches in terms of computational efficiency, while achieving reconstruction accuracy and robustness similar to two state-of-the-art incremental SfM approaches. 
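As a minimal illustration of what a rotation-averaging step does (this is a naive quaternion mean, not the paper's adaptive community-based method), nearby rotations represented as unit quaternions can be averaged by sign-aligning, summing, and renormalizing:

```python
import math

def average_quaternions(quats):
    """Naively average nearby rotations given as unit quaternions (w, x, y, z):
    sign-align each quaternion with the first (q and -q encode the same
    rotation), sum component-wise, and renormalize. A reasonable approximation
    only when the rotations are close together."""
    ref = quats[0]
    acc = [0.0] * 4
    for q in quats:
        sign = 1.0 if sum(r * c for r, c in zip(ref, q)) >= 0.0 else -1.0
        for i in range(4):
            acc[i] += sign * q[i]
    norm = math.sqrt(sum(c * c for c in acc))
    return [c / norm for c in acc]

# Average three small rotations about the z-axis (0.1, 0.2, 0.3 rad):
quats = [(math.cos(t / 2), 0.0, 0.0, math.sin(t / 2)) for t in (0.1, 0.2, 0.3)]
avg = average_quaternions(quats)
angle = 2.0 * math.atan2(avg[3], avg[0])   # recovered angle, close to 0.2 rad
```

Real rotation averaging solves a robust optimization over a whole pose graph with noisy relative rotations; this sketch only conveys the basic averaging operation on the absolute rotations.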

L2Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
Yurun Tian, Bin Fan, Fuchao Wu, CVPR 2017

Abstract: The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn a high-performance descriptor in Euclidean space via a Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training samples in a few epochs. (ii) Derived from the basic concept of the local patch matching problem, we emphasize the relative distance between descriptors. (iii) Extra supervision is imposed on the intermediate feature maps. (iv) Compactness of the descriptor is taken into account. The proposed network is named L2Net since the output descriptor can be matched in Euclidean space by L2 distance. L2Net achieves state-of-the-art performance on the Brown datasets [16], the Oxford dataset [18], and the newly proposed HPatches dataset [11]. The good generalization ability shown by experiments indicates that L2Net can serve as a direct substitute for existing handcrafted descriptors. The pretrained L2Net is publicly available. 
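The defining property of L2Net, that descriptors are compared by plain Euclidean distance with no learned matcher, can be sketched as follows; the helper names and toy descriptors are illustrative, not part of the paper.

```python
import math

def l2_normalize(v):
    """Scale a descriptor to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def nearest_neighbor(query, database):
    """Index of the database descriptor with the smallest L2 distance."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(database)), key=lambda i: l2(query, database[i]))

database = [l2_normalize(d) for d in ([1.0, 0.0, 0.0],
                                      [0.0, 1.0, 0.0],
                                      [0.0, 0.0, 1.0])]
query = l2_normalize([0.1, 0.9, 0.1])   # a noisy version of descriptor 1
best = nearest_neighbor(query, database)
```

This is exactly the matching pipeline used with handcrafted descriptors such as SIFT, which is why a descriptor matchable by raw L2 distance can act as a drop-in substitute.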

Statistics of Visual Responses to Object Stimuli from Primate AIT Neurons to DNN Neurons
Qiulei Dong, Hong Wang, Zhanyi Hu, Neural Computation 2017

Abstract: Currently, deep neural networks (DNNs) have achieved image object categorization performance comparable to that of human beings; however, their exceptionally good categorization ability is not well understood. Recently, a goal-driven paradigm was proposed for understanding the visual object recognition pathway [DiCarlo et al. 2016], which advocates that by controlling only the last layer's categorization performance in the learning phase of a hierarchical linear-nonlinear network, not only can its last layer's output quantitatively predict IT neuron responses, but its intermediate layers can also automatically predict the responses of intermediate visual areas, such as V4. In this work, we explore whether DNN neurons possess image object representational statistics similar to those of monkey IT neurons, in particular when the network becomes deeper and the set of image categories becomes larger, via VGG19, a typical deep network of 19 layers. Lehky et al. [2011, 2014] systematically investigated monkey IT neuron response statistics with three different measures: single-neuron response selectivity, population response sparseness, and the intrinsic dimensionality of neural object representation. In this work, we used the same three measures to evaluate the responses of DNN neurons to images in ImageNet, which contains millions of images from 1000 different categories. Our results show that VGG19 neurons have quite different response statistics to image objects compared with the IT neurons in [Lehky et al. 2011, 2014], which seems to indicate that a good hierarchical categorization network does not necessarily demand response statistics similar to those of IT neurons. 
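Two of the three measures can be sketched with excess kurtosis as the statistic (the statistic used by Lehky et al.; this toy version omits the Pareto tail index and intrinsic dimensionality): selectivity is computed across stimuli for each neuron, sparseness across neurons for each stimulus.

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

def selectivity_and_sparseness(responses):
    """responses[i][j]: response of neuron i to stimulus j.
    Selectivity: kurtosis of one neuron's responses across all stimuli.
    Sparseness: kurtosis of the population's responses to one stimulus."""
    selectivity = [excess_kurtosis(row) for row in responses]
    sparseness = [excess_kurtosis(list(col)) for col in zip(*responses)]
    return selectivity, sparseness

# A neuron firing to a single stimulus is more selective (higher kurtosis)
# than one responding broadly to all stimuli.
responses = [[0.0, 0.0, 0.0, 0.0, 10.0],
             [1.0, 2.0, 3.0, 4.0, 5.0]]
sel, spa = selectivity_and_sparseness(responses)
```

The same two functions apply unchanged whether the response matrix comes from recorded IT neurons or from the activations of a DNN layer, which is what makes the comparison across the two systems direct.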

Comparison of IT Neural Response Statistics with Simulations
Qiulei Dong, Bo Liu, Zhanyi Hu, Frontiers in Computational Neuroscience 2017

Abstract: Lehky et al. (2011) provided a statistical analysis of the responses of 674 recorded neurons to 806 image stimuli in the anterior inferotemporal (AIT) cortex of two monkeys. In terms of kurtosis and Pareto tail index, they observed that the population sparseness of both unnormalized and normalized responses is always larger than the single-neuron selectivity, and hence concluded that the critical features for individual neurons in primate AIT cortex are not very complex, but that there is an indefinitely large number of them. In this work, we explore an "inverse problem" by simulation: assuming each neuron responds to only a very limited number of stimuli among a very large number of neurons and stimuli, we assess whether the population sparseness is always larger than the single-neuron selectivity. Our simulation results show that the population sparseness exceeds the single-neuron selectivity in most cases, even when the numbers of neurons and stimuli are much larger than several hundred, which confirms the observations in Lehky et al. (2011). In addition, we found that the variances of the computed kurtosis and Pareto tail index are quite large in some cases, which reveals some limitations of these two criteria when used for neuron response evaluation. 
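The simulation idea can be sketched as follows: generate a response matrix in which each model neuron fires strongly to only a handful of stimuli, then compute the two kurtosis-based measures. This is a toy illustration with made-up parameters (matrix size, noise level, response strengths), not the paper's simulation protocol.

```python
import random

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

def simulate(n_neurons=200, n_stimuli=200, active=5, seed=0):
    """Each model neuron responds strongly to only `active` randomly chosen
    stimuli and with weak noise to the rest; return the mean single-neuron
    selectivity and mean population sparseness (both as excess kurtosis)."""
    rng = random.Random(seed)
    R = [[rng.gauss(0.0, 0.1) for _ in range(n_stimuli)]
         for _ in range(n_neurons)]
    for row in R:
        for j in rng.sample(range(n_stimuli), active):
            row[j] += rng.uniform(5.0, 10.0)   # a few strong responses
    selectivity = sum(excess_kurtosis(row) for row in R) / n_neurons
    sparseness = sum(excess_kurtosis(list(col)) for col in zip(*R)) / n_stimuli
    return selectivity, sparseness

mean_selectivity, mean_sparseness = simulate()
```

With responses this sparse, both measures come out strongly positive (far above the Gaussian value of 0); rerunning with different seeds also exposes the run-to-run variance of the kurtosis estimate that the abstract points out.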