arXiv Paper Daily: Fri, 4 Sep 2020
source link: https://www.52ml.net/22514.html
Neural and Evolutionary Computing
Tree Neural Networks in HOL4
Thibault Gauthier Subjects : Neural and Evolutionary Computing (cs.NE)
We present an implementation of tree neural networks within the proof
assistant HOL4. Their architecture makes them naturally suited for
approximating functions whose domain is a set of formulas. We measure the
performance of our implementation and compare it with other machine learning
predictors on the tasks of evaluating arithmetical expressions and estimating
the truth of propositional formulas.
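As an illustration of the idea (a generic PyTorch sketch with assumed dimensions and operator set, not the HOL4 implementation), a tree neural network assigns a small combining network to each operator and recursively composes embeddings along the expression's tree structure:

```python
import torch
import torch.nn as nn

DIM = 16  # embedding dimension (illustrative assumption)

class TreeNN(nn.Module):
    def __init__(self, operators, n_outputs):
        super().__init__()
        # One combining network per operator, e.g. "+" and "*".
        self.ops = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(arity * DIM, DIM), nn.Tanh())
            for name, arity in operators.items()
        })
        self.leaf = nn.Embedding(10, DIM)      # embeddings for constants 0..9
        self.head = nn.Linear(DIM, n_outputs)  # task-specific output head

    def embed(self, tree):
        # A tree is either an int leaf or (operator_name, [subtrees]).
        if isinstance(tree, int):
            return self.leaf(torch.tensor(tree))
        name, children = tree
        kids = torch.cat([self.embed(c) for c in children])
        return self.ops[name](kids)

    def forward(self, tree):
        return self.head(self.embed(tree))

# Embed (2 + 3) * 4 and predict, e.g., its value in some finite range.
model = TreeNN({"+": 2, "*": 2}, n_outputs=4)
logits = model(("*", [("+", [2, 3]), 4]))
```

The same recursion applies to propositional formulas by swapping in logical connectives as operators and a truth-estimation head.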
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Comments: 9 pages, 4 figures, 2 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Training a deep neural network requires a large amount of single-task data
and involves a long time-consuming optimization phase. This is not scalable to
complex, realistic environments with unexpected changes. Humans, by contrast, can
perform fast incremental learning on the fly, and memory systems in the brain
play a critical role in this ability. We introduce Sparse Meta Networks — a meta-learning approach to
learn online sequential adaptation algorithms for deep neural networks, by
using deep neural networks. We augment a deep neural network with a
layer-specific fast-weight memory. The fast-weights are generated sparsely at
each time step and accumulated incrementally through time providing a useful
inductive bias for online continual adaptation. We demonstrate strong
performance on a variety of sequential adaptation scenarios, from simple
online reinforcement learning to large-scale adaptive language modelling.
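A rough sketch of the core mechanism — per-layer fast weights generated from the input, gated sparsely, and accumulated through time — might look as follows (the scalar gate and rank-1 update are simplifying assumptions, not the paper's exact parameterization):

```python
import torch
import torch.nn as nn

class FastWeightLayer(nn.Module):
    """Sketch: slow weights plus an additive fast-weight memory
    that is updated online at every time step."""
    def __init__(self, dim):
        super().__init__()
        self.slow = nn.Linear(dim, dim)
        self.to_update = nn.Linear(dim, 2 * dim)  # generates a rank-1 update
        self.gate = nn.Linear(dim, 1)             # decides when to write

    def forward(self, x, fast):            # fast: (dim, dim) accumulated memory
        g = torch.sigmoid(self.gate(x))    # near-zero most steps -> sparse writes
        a, b = self.to_update(x).chunk(2, dim=-1)
        fast = fast + g * torch.outer(torch.tanh(a), torch.tanh(b))
        return torch.tanh(self.slow(x) + fast @ x), fast

layer = FastWeightLayer(8)
fast = torch.zeros(8, 8)
for t in range(5):                         # online adaptation through time
    y, fast = layer(torch.randn(8), fast)
```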
Kai Dresia , Simon Jentzsch , Günther Waxenegger-Wilfing , Robson Hahn , Jan Deeken , Michael Oschwald , Fabio Mota Subjects : Neural and Evolutionary Computing (cs.NE) ; Systems and Control (eess.SY)
Identifying the optimal design of a new launch vehicle is critically important,
since design decisions made in the early development phase limit the vehicle's
later performance and determine the associated costs. Reusing the first stage
via retro-propulsive landing increases the complexity even more. Therefore, we
develop an optimization framework for partially reusable launch vehicles, which
enables multidisciplinary design studies. The framework contains suitable mass
estimates of all essential subsystems and a routine to calculate the needed
propellant for the ascent and landing maneuvers. For design optimization, the
framework can be coupled with a genetic algorithm. The overall goal is to
reveal the implications of different propellant combinations and objective
functions on the launcher’s optimal design for various mission scenarios. The
results show that the optimization objective influences the most suitable
propellant choice and the overall launcher design, concerning staging, weight,
size, and rocket engine parameters. In terms of gross lift-off weight, liquid
hydrogen seems to be favorable. When optimizing for a minimum structural mass
or a minimum expendable structural mass, hydrocarbon-based solutions show better
results. Finally, launch vehicles using a hydrocarbon fuel in the first stage
and liquid hydrogen in the upper stage are an appealing alternative, combining
both fuels’ benefits.
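For context, the coupling of such a framework with a genetic algorithm can be pictured with a toy loop like the one below; the design vector and fitness function are illustrative placeholders, not the framework's actual mass and trajectory models:

```python
import random

# Hypothetical 3-gene design vector: (stage-1 propellant, stage-2
# propellant, engine thrust scale), in arbitrary units.
def fitness(design):
    stage1_prop, stage2_prop, thrust = design
    gross_mass = 20.0 + 1.1 * stage1_prop + 1.2 * stage2_prop + 0.05 * thrust
    return -gross_mass                     # e.g. minimize gross lift-off weight

def mutate(design, sigma=0.1):
    return tuple(max(0.0, g + random.gauss(0, sigma)) for g in design)

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

pop = [tuple(random.uniform(1, 10) for _ in range(3)) for _ in range(30)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)    # rank designs by objective
    elite = pop[:10]                       # keep the best designs
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(20)]
best = max(pop, key=fitness)
```

Swapping the fitness function (gross lift-off weight, structural mass, or expendable structural mass) is what produces the different optimal designs discussed in the abstract.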
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, these methods fall short when changing
conditions affect the captured images. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
Physarum Multi-Commodity Flow Dynamics
Vincenzo Bonifaci , Enrico Facca , Frederic Folz , Andreas Karrenbauer , Pavel Kolev , Kurt Mehlhorn , Giovanna Morigi , Golnoosh Shahkarami , Quentin Vermande Subjects : Data Structures and Algorithms (cs.DS) ; Neural and Evolutionary Computing (cs.NE)
In wet-lab experiments [Nakagaki-Yamada-Toth, Tero-Takagi-etal], the
slime mold Physarum polycephalum has demonstrated its ability to solve shortest
path problems and to design efficient networks (see the wet-lab experiment
figures in the paper for illustrations). Physarum polycephalum is a slime mold in the
Mycetozoa group. For the shortest path problem, a mathematical model for the
evolution of the slime was proposed in [Tero-Kobayashi-Nakagaki] and its
biological relevance was argued. The model was shown to solve shortest path
problems, first in computer simulations and then by mathematical proof. It was
later shown that the slime mold dynamics can solve more general linear programs
and that many variants of the dynamics have similar convergence behavior. In
this paper, we introduce a dynamics for the network design problem. We
formulate network design as the problem of constructing a network that
efficiently supports a multi-commodity flow problem. We investigate the
dynamics in computer simulations and analytically. The simulations show that
the dynamics is able to construct efficient and elegant networks. In the
theoretical part we show that the dynamics minimizes an objective combining the
cost of the network and the cost of routing the demands through the network. We
also give an alternative characterization of the optimum solution.
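For reference, the classic single-commodity Physarum dynamics of [Tero-Kobayashi-Nakagaki] that this paper generalizes can be simulated in a few lines: edge conductivities evolve as dx/dt = |q| - x, where q is the flow induced by the current conductivities. A minimal numpy sketch (a shortest-path toy, not the paper's multi-commodity dynamics):

```python
import numpy as np

def physarum(n, edges, lengths, s, t, steps=2000, dt=0.1):
    x = np.ones(len(edges))                       # edge conductivities
    b = np.zeros(n); b[s], b[t] = 1.0, -1.0       # unit demand s -> t
    for _ in range(steps):
        L = np.zeros((n, n))                      # weighted graph Laplacian
        for k, (u, v) in enumerate(edges):
            w = x[k] / lengths[k]
            L[u, u] += w; L[v, v] += w
            L[u, v] -= w; L[v, u] -= w
        p = np.linalg.lstsq(L, b, rcond=None)[0]  # node potentials
        q = np.array([x[k] / lengths[k] * (p[u] - p[v])
                      for k, (u, v) in enumerate(edges)])
        x += dt * (np.abs(q) - x)                 # Physarum update dx/dt = |q| - x
    return x

# Triangle: direct edge 0-2 (length 1) vs. detour 0-1-2 (total length 2).
# Conductivity concentrates on the shortest path; the detour decays.
edges = [(0, 1), (1, 2), (0, 2)]
print(physarum(3, edges, [1.0, 1.0, 1.0], 0, 2).round(3))
```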
Computer Vision and Pattern Recognition
Flow-edge Guided Video Completion
Comments: ECCV 2020. Project: this http URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We present a new flow-based video completion algorithm. Previous flow
completion methods are often unable to retain the sharpness of motion
boundaries. Our method first extracts and completes motion edges, and then uses
them to guide piecewise-smooth flow completion with sharp edges. Existing
methods propagate colors among local flow connections between adjacent frames.
However, not all missing regions in a video can be reached in this way because
the motion boundaries form impenetrable barriers. Our method alleviates this
problem by introducing non-local flow connections to temporally distant frames,
enabling propagating video content over motion boundaries. We validate our
approach on the DAVIS dataset. Both visual and quantitative results show that
our method compares favorably against the state-of-the-art algorithms.
Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning
Comments: PhD Thesis. Versions of several chapters have previously appeared or been submitted under different titles
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Numerical Analysis (math.NA)
Leo Tolstoy opened his monumental novel Anna Karenina with the now famous
words: "Happy families are all alike; every unhappy family is unhappy in its own
way." A similar notion also applies to mathematical spaces: every flat space is
alike; every unflat space is unflat in its own way. However, rather than being
a source of unhappiness, we will show that the diversity of non-flat spaces
provides a rich area of study. The genesis of the so-called big data era and
the proliferation of social and scientific databases of increasing size have led
to a need for algorithms that can efficiently process, analyze, and even
generate high-dimensional data. However, the curse of dimensionality leads to
the fact that many classical approaches do not scale well with respect to the
size of these problems. One technique to avoid some of these ill-effects is to
exploit the geometric structure of coherent data. In this thesis, we will
explore geometric methods for shape processing and data analysis. More
specifically, we will study techniques for representing manifolds and signals
supported on them through a variety of mathematical tools including, but not
limited to, computational differential geometry, variational PDE modeling, and
deep learning. First, we will explore non-isometric shape matching through
variational modeling. Next, we will use ideas from parallel transport on
manifolds to generalize convolution and convolutional neural networks to
deformable manifolds. Finally, we conclude by proposing a novel auto-regressive
model for capturing the intrinsic geometry and topology of data. Throughout
this work, we will use the idea of computing correspondences as a through-line
to both motivate our work and analyze our results.
Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
Weijia Wu , Ning Lu , Enze Xie Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Deep learning-based scene text detection can achieve preferable performance,
powered with sufficient labeled training data. However, manual labeling is time
consuming and laborious. At the extreme, the corresponding annotated data are
unavailable. Exploiting synthetic data is a very promising solution except for
domain distribution mismatches between synthetic datasets and real datasets. To
address the severe domain distribution mismatch, we propose a synthetic-to-real
domain adaptation method for scene text detection, which transfers knowledge
from synthetic data (source domain) to real data (target domain). In this
paper, a text self-training (TST) method and adversarial text instance
alignment (ATA) for domain adaptive scene text detection are introduced. ATA
helps the network learn domain-invariant features by training a domain
classifier in an adversarial manner. TST diminishes the adverse effects of
false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels.
Both components have positive effects on improving the performance of scene text
detectors when adapting from synthetic-to-real scenes. We evaluate the proposed
method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The
results demonstrate the effectiveness of the proposed method with up to 10%
improvement, which has important exploration significance for domain adaptive
scene text detection. Code is available at
this https URL
MIPGAN — Generating Robust and High Quality Morph Attacks Using Identity Prior Driven GAN
Comments: Submitted to IEEE T-BIOM 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Cryptography and Security (cs.CR)
Face morphing attacks aim to circumvent Face Recognition Systems (FRS) by
employing face images derived from multiple data subjects (e.g., accomplices
and malicious actors). Morphed images can verify against contributing data
subjects with a reasonable success rate, given they have a high degree of
identity resemblance. The success of the morphing attacks is directly dependent
on the quality of the generated morph images. We present a new approach for
generating robust attacks extending our earlier framework for generating face
morphs. We present a new approach using an Identity Prior Driven Generative
Adversarial Network, which we refer to as MIPGAN (Morphing through
Identity Prior driven GAN). The proposed MIPGAN is derived from the StyleGAN
with a newly formulated loss function exploiting perceptual quality and
identity factor to generate a high quality morphed face image with minimal
artifacts and with higher resolution. We demonstrate the proposed approach’s
applicability to generate robust morph attacks by evaluating it against a
commercial Face Recognition System (FRS) and demonstrate the success rate of
attacks. Extensive experiments are carried out to assess the FRS’s
vulnerability against the proposed morphed face generation technique on three
types of data: digital images, re-digitized (printed and scanned) images, and
compressed images after re-digitization, all from the newly generated
MIPGAN Face Morph Dataset. The obtained results demonstrate that the
proposed approach of morph generation profoundly threatens the FRS.
Multi-Loss Weighting with Coefficient of Variations
Rick Groenendijk , Sezer Karaoglu , Theo Gevers , Thomas Mensink Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. Finding a good set of weights is often
done by adopting them into the set of hyper-parameters, which are set using an
extensive grid search. This is computationally expensive. In this paper, the
weights are defined based on properties observed while training the model,
including the specific batch loss, the average loss, and the variance for each
of the losses. An additional advantage is that the defined weights evolve
during training, instead of using static loss weights. In the literature, loss
weighting is mostly used in a multi-task learning setting, where the different
tasks obtain different weights. However, there is a plethora of single-task
multi-loss problems that can benefit from automatic loss weighting. In this
paper, it is shown that these multi-task approaches do not work on single
tasks. Instead, a method is proposed that automatically and dynamically tunes
loss weights throughout training specifically for single-task multi-loss
problems. The method incorporates a measure of uncertainty to balance the
losses. The validity of the approach is shown empirically for different tasks
on multiple datasets.
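A minimal sketch of the idea — weighting each loss by the coefficient of variation (std/mean) of its recent history, so losses that still vary a lot get more weight — could look like this (the running statistics and normalization are assumptions, not the paper's exact formulation):

```python
import numpy as np

class CoVWeighting:
    """Sketch: dynamic loss weights from exponentially decayed running
    statistics; warm-up and bias correction are omitted."""
    def __init__(self, n_losses, decay=0.99):
        self.mean = np.zeros(n_losses)
        self.var = np.zeros(n_losses)
        self.decay = decay

    def __call__(self, losses):
        losses = np.asarray(losses, dtype=float)
        self.mean = self.decay * self.mean + (1 - self.decay) * losses
        self.var = self.decay * self.var + (1 - self.decay) * (losses - self.mean) ** 2
        cov = np.sqrt(self.var) / (self.mean + 1e-8)  # coefficient of variation
        w = cov / (cov.sum() + 1e-8)                  # normalize weights to sum to 1
        return float((w * losses).sum())              # combined scalar loss

weighter = CoVWeighting(n_losses=2)
total = weighter([0.7, 2.3])   # feed this combined loss to the optimizer
```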
Future Frame Prediction of a Video Sequence
Comments: Acknowledgement: the contributions, support, and help of Sonam Gupta, PhD Scholar, VPLAB, Deptt. of CS&E, IIT Madras
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Predicting future frames of a video sequence has been a problem of high
interest in the field of Computer Vision as it caters to a multitude of
applications. The ability to predict, anticipate and reason about future events
is the essence of intelligence and one of the main goals of decision-making
systems such as human-machine interaction, robot navigation and autonomous
driving. However, the challenge lies in the ambiguous nature of the problem as
there may be multiple future sequences possible for the same input video shot.
A naively designed model averages multiple possible futures into a single
blurry prediction.
Recently, two distinct approaches have attempted to address this problem:
(a) latent variable models that represent the underlying stochasticity and
(b) adversarially trained models that aim to produce sharper images. A latent
variable model often struggles to produce realistic results, while an
adversarially trained model underutilizes latent variables and thus fails to
produce diverse predictions. These methods have revealed complementary
strengths and weaknesses. Combining the two approaches produces predictions
that appear more realistic and better cover the range of plausible futures.
This forms the basis and objective of study in this project work.
In this paper, we propose a novel multi-scale architecture combining both
approaches. We validate our proposed model through a series of experiments and
empirical evaluations on Moving MNIST, UCF101, and Penn Action datasets. Our
method outperforms the results obtained using the baseline methods.
Multi-domain semantic segmentation with pyramidal fusion
Comments: 2 pages, 3 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We present our submission to the semantic segmentation contest of the Robust
Vision Challenge held at ECCV 2020. The contest requires submitting the same
model to seven benchmarks from three different domains. Our approach is based
on the SwiftNet architecture with pyramidal fusion. We address inconsistent
taxonomies with a single-level 193-dimensional softmax output. We strive to
train with large batches in order to stabilize optimization of a hard
recognition problem, and to favour smooth evolution of batchnorm statistics. We
achieve this by implementing a custom backward step through log-sum-prob loss,
and by using small crops before freezing the population statistics. Our model
ranks first on the RVC semantic segmentation challenge as well as on the
WildDash 2 leaderboard. This suggests that pyramidal fusion is competitive not
only for efficient inference with lightweight backbones, but also in
large-scale setups for multi-domain application.
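The log-sum-prob idea can be sketched as follows: a benchmark class's probability is the sum of the softmax probabilities of the universal classes it maps to, evaluated stably in log space. The mapping and shapes below are illustrative assumptions, not the submission's exact implementation:

```python
import torch

def log_sum_prob_loss(logits, class_to_universal, target):
    """Sketch: negative log of the summed universal-class probabilities
    belonging to the target benchmark class."""
    log_p = torch.log_softmax(logits, dim=-1)   # (N, 193) universal classes
    ids = class_to_universal[target]            # universal ids of the target class
    # sum of probabilities == logsumexp of log-probabilities (stable form)
    return -torch.logsumexp(log_p[:, ids], dim=-1).mean()

logits = torch.randn(4, 193, requires_grad=True)
mapping = {0: [3, 17, 42]}                      # benchmark class 0 -> 3 universal classes
loss = log_sum_prob_loss(logits, mapping, target=0)
loss.backward()                                 # the "custom backward step" goes here
```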
Comments: 5 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
A simple modification method for single-stage generic object detection neural
networks, such as YOLO and SSD, is proposed, which allows for improving the
detection accuracy on video data by exploiting the temporal behavior of the
scene in the detection pipeline. It is shown that, using this method, the
detection accuracy of the base network can be considerably improved, especially
for occluded and hidden objects. It is shown that a modified network detects
hidden objects with higher confidence than an unmodified one. A
weakly supervised training method is proposed, which allows for training a
modified network without requiring any additional annotated data.
Few-shot Object Detection with Feature Attention Highlight Module in Remote Sensing Images
Zixuan Xiao , Ping Zhong , Yuan Quan , Xuping Yin , Wei Xue Subjects : Computer Vision and Pattern Recognition (cs.CV)
In recent years, object detection has found many applications in the remote
sensing field, and these demand a large amount of labeled data. However, in many
cases, data are extremely scarce. In this paper, we propose a few-shot object
detector which is designed for detecting novel objects based on only a few
examples. Through fully leveraging labeled base classes, our model that is
composed of a feature-extractor, a feature attention highlight module as well
as a two-stage detection backend can quickly adapt to novel classes. The
pre-trained feature extractor, whose parameters are shared, produces general
features, while the feature attention highlight module is designed to be
lightweight and simple in order to fit the few-shot setting. Although simple,
the information it provides in a serial way helps make the general features
specific to few-shot objects. Then the object-specific
features are delivered to the two-stage detection backend for the detection
results. The experiments demonstrate the effectiveness of the proposed method
for few-shot cases.
SCG-Net: Self-Constructing Graph Neural Networks for Semantic Segmentation
Comments: 11 pages, 5 figs. Draft version submitted to TGRS; code will be released soon
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Capturing global contextual representations by exploiting long-range
pixel-pixel dependencies has been shown to improve semantic segmentation
performance. However, how to do this efficiently is an open question, as current
approaches that utilise attention schemes or very deep models to increase the
model's field of view result in complex models with large memory consumption.
Inspired by recent work on graph neural networks, we propose the
Self-Constructing Graph (SCG) module that learns a long-range dependency graph
directly from the image and uses it to propagate contextual information
efficiently to improve semantic segmentation. The module is optimised via a
novel adaptive diagonal enhancement method and a variational lower bound that
consists of a customized graph reconstruction term and a Kullback-Leibler
divergence regularization term. When incorporated into a neural network
(SCG-Net), semantic segmentation is performed in an end-to-end manner and
competitive performance (mean F1-scores of 92.0% and 89.8% respectively) on the
publicly available ISPRS Potsdam and Vaihingen datasets is achieved, with much
fewer parameters, and at a lower computational cost compared to related pure
convolutional neural network (CNN) based models.
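A rough sketch of such a module — node embeddings pooled from CNN features, a reparameterized latent, and an adjacency A = ReLU(ZZ^T) with a KL regularizer — is given below; the adaptive diagonal enhancement and graph reconstruction terms are omitted, and all dimensions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfConstructingGraph(nn.Module):
    """Sketch: learn a dependency graph directly from image features."""
    def __init__(self, in_ch, n_nodes, d):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(int(n_nodes ** 0.5))  # nodes from pooling
        self.mu = nn.Linear(in_ch, d)
        self.logvar = nn.Linear(in_ch, d)

    def forward(self, feat):                             # feat: (B, C, H, W)
        x = self.pool(feat).flatten(2).transpose(1, 2)   # (B, n_nodes, C)
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        adj = F.relu(torch.bmm(z, z.transpose(1, 2)))    # (B, n, n) adjacency
        kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
        return adj, z, kl                                # adj drives GNN propagation

scg = SelfConstructingGraph(in_ch=64, n_nodes=16, d=8)
adj, nodes, kl = scg(torch.randn(2, 64, 32, 32))
```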
Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Convolutional neural networks (CNNs) require both intensive computation and
frequent memory access, which lead to a low processing speed and large power
dissipation. Although the characteristics of the different layers in a CNN are
frequently quite different, previous hardware designs have employed common
optimization schemes for them. This paper proposes a layer-specific design that
employs different organizations that are optimized for the different layers.
The proposed design employs two layer-specific optimizations: layer-specific
mixed data flow and layer-specific mixed precision. The mixed data flow aims to
minimize the off-chip access while demanding a minimal on-chip memory (BRAM)
resource of an FPGA device. The mixed precision quantization is to achieve both
a lossless accuracy and an aggressive model compression, thereby further
reducing the off-chip access. A Bayesian optimization approach is used to
select the best sparsity for each layer, achieving the best trade-off between
the accuracy and compression. This mixing scheme allows the entire network
model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip
access, and thereby achieves a significant performance enhancement. The model
size is reduced by 22.66-28.93 times compared to that in a full-precision
network with a negligible degradation of accuracy on VOC, COCO, and ImageNet
datasets. Furthermore, the combination of mixed dataflow and mixed precision
significantly outperforms previous works in terms of throughput,
off-chip access, and on-chip memory requirement.
DESC: Domain Adaptation for Depth Estimation via Semantic Consistency
Comments: BMVC20 (Oral). Code: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Accurate real depth annotations are difficult to acquire, requiring the use of
special devices such as a LiDAR sensor. Self-supervised methods try to overcome
this problem by processing video or stereo sequences, which may not always be
available. Instead, in this paper, we propose a domain adaptation approach to
train a monocular depth estimation model using a fully-annotated source dataset
and a non-annotated target dataset. We bridge the domain gap by leveraging
semantic predictions and low-level edge features to provide guidance for the
target domain. We enforce consistency between the main model and a second model
trained with semantic segmentation and edge maps, and introduce priors in the
form of instance heights. Our approach is evaluated on standard domain
adaptation benchmarks for monocular depth estimation and shows consistent
improvement upon the state-of-the-art.
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, these methods fall short when changing
conditions affect the captured images. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask
Comments: Winner of LVIS challenge 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This article introduces the solutions of the team lvisTraveler for LVIS
Challenge 2020. In this work, two characteristics of LVIS dataset are mainly
considered: the long-tailed distribution and high quality instance segmentation
mask. We adopt a two-stage training pipeline. In the first stage, we
incorporate EQL and self-training to learn generalized representation. In the
second stage, we utilize Balanced GroupSoftmax to promote the classifier, and
propose a novel proposal assignment strategy and a new balanced mask loss for
mask head to get more precise mask predictions. Finally, we achieve 41.5 and
41.2 AP on LVIS v1.0 val and test-dev splits respectively, outperforming the
baseline based on X101-FPN-MaskRCNN by a large margin.
Physics-based Shading Reconstruction for Intrinsic Image Decomposition
Comments: Submitted to Computer Vision and Image Understanding (CVIU)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We investigate the use of photometric invariance and deep learning to compute
intrinsic images (albedo and shading). We propose albedo and shading gradient
descriptors which are derived from physics-based models. Using the descriptors,
albedo transitions are masked out and an initial sparse shading map is
calculated directly from the corresponding RGB image gradients in a
learning-free unsupervised manner. Then, an optimization method is proposed to
reconstruct the full dense shading map. Finally, we integrate the generated
shading map into a novel deep learning framework to refine it and also to
predict the corresponding albedo image to achieve intrinsic image decomposition. By
doing so, we are the first to directly address the texture and intensity
ambiguity problems of the shading estimations. Large scale experiments show
that our approach, steered by physics-based invariant descriptors, achieves
superior results on MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant
Intrinsic Images, Spectral Intrinsic Images, As Realistic As Possible, and
competitive results on Intrinsic Images in the Wild datasets while achieving
state-of-the-art shading estimations.
Comments: 10 pages, 3 figures, submitted to BIBM2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Joint image-text embedding extracted from medical images and associated
contextual reports is the bedrock for most biomedical vision-and-language (V+L)
tasks, including medical visual question answering, clinical image-text
retrieval, and clinical report auto-generation. In this study, we adopt four
pre-trained V+L models: LXMERT, VisualBERT, UNITER, and PixelBERT to learn
multimodal representation from MIMIC-CXR radiographs and associated reports.
The extrinsic evaluation on OpenI dataset shows that in comparison to the
pioneering CNN-RNN model, the joint embedding learned by pre-trained V+L models
demonstrates a performance improvement in the thoracic findings classification
task. We conduct an ablation study to analyze the contribution of certain model
components and validate the advantage of joint embedding over text-only
embedding. We also visualize attention maps to illustrate the attention
mechanism of V+L models.
Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The ability to efficiently search for images over an indexed database is the
cornerstone for several user experiences. Incorporating user feedback through
multi-modal inputs provides flexible interaction to serve fine-grained
specificity in requirements. We specifically focus on text feedback, through
descriptive natural language queries. Given a reference image and textual user
feedback, our goal is to retrieve images that satisfy constraints specified by
both of these input modalities. The task is challenging as it requires
understanding the textual semantics from the text feedback and then applying
these changes to the visual representation. To address these challenges, we
propose a novel architecture TRACE which contains a hierarchical feature
aggregation module to learn the composite visio-linguistic representations.
TRACE achieves the SOTA performance on 3 benchmark datasets: FashionIQ, Shoes,
and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5%
respectively in R@K metric. Our extensive experiments and ablation studies show
that TRACE consistently outperforms the existing techniques by significant
margins both quantitatively and qualitatively.
Modeling Global Body Configurations in American Sign Language
Nicholas Wilkins , Beck Cordes Galbraith , Ifeoma Nwogu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
American Sign Language (ASL) is the fourth most commonly used language in the
United States and is the language most commonly used by Deaf people in the
United States and the English-speaking regions of Canada. Unfortunately, until
recently, ASL received little research attention. This is due, in part, to its delayed
recognition as a language until William C. Stokoe’s publication in 1960.
Limited data has been a long-standing obstacle to ASL research and
computational modeling. The lack of large-scale datasets has prohibited many
modern machine-learning techniques, such as Neural Machine Translation, from
being applied to ASL. In addition, the modality required to capture sign
language (i.e. video) is complex in natural settings (as one must deal with
background noise, motion blur, and the curse of dimensionality). Finally, when
compared with spoken languages, such as English, there has been limited
research conducted into the linguistics of ASL.
We realize a simplified version of Liddell and Johnson’s Movement-Hold (MH)
Model using a Probabilistic Graphical Model (PGM). We trained our model on
ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM
against other models to determine its ability to model ASL. Finally, we
interpret various aspects of the PGM and draw conclusions about ASL phonetics.
The main contributions of this paper are
Adherent Mist and Raindrop Removal from a Single Image Using Attentive Convolutional Network
Comments: 21 pages (including 4 pages of supplementary materials)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Temperature-difference-induced mist adhering to the windshield, camera lens,
etc. is often inhomogeneous and obscure, and can easily obstruct the vision
and degrade the image severely. Together with adherent raindrops, it brings
considerable challenges to various vision systems, yet has received little attention.
Recent methods for similar problems typically use hand-crafted priors to
generate spatial attention maps. In this work, we propose to visually remove
the adherent mist and raindrop jointly from a single image using attentive
convolutional neural networks. We apply classification activation map attention
to our model to strengthen the spatial attention without hand-crafted priors.
In addition, the smoothed dilated convolution is adopted to obtain a large
receptive field without spatial information loss, and the dual attention module
is utilized for efficiently selecting channels and spatial features. Our
experiments show our method achieves state-of-the-art performance, and
demonstrate that this underrated practical problem is critical to high-level
vision scenes.
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all the right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects, resulting in a significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
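The guiding idea can be sketched as a standard greedy NMS whose ranking score blends detector confidence with an expression-alignment score; the blending scheme below is an assumption, not the paper's exact integration:

```python
import numpy as np

def ref_nms(boxes, det_scores, expr_scores, iou_thr=0.5, alpha=0.5):
    """Sketch: greedy NMS ranked by a blend of detection confidence and
    a score measuring alignment with the expression's critical nouns."""
    scores = (1 - alpha) * det_scores + alpha * expr_scores
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current top box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i+1]) + area(boxes[order[1:]]) - inter)
        order = order[1:][iou <= iou_thr]   # suppress overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
keep = ref_nms(boxes, np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.9, 0.5]))
```

With alpha > 0, a box that the expression refers to can survive suppression even when its raw detection confidence is lower, which is the recall gain the abstract describes.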
Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
Comments: To appear in IEEE TPAMI, 18 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The traditional object retrieval task aims to learn a discriminative feature
representation with intra-similarity and inter-dissimilarity, which supposes
that the objects in an image are manually or automatically pre-cropped exactly.
However, in many real-world searching scenarios (e.g., video surveillance), the
objects (e.g., persons, vehicles, etc.) are seldom accurately detected or
annotated. Therefore, object-level retrieval becomes intractable without
bounding-box annotation, which leads to a new but challenging topic, i.e.
image-level search. In this paper, to address the image search issue, we first
introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A
Siamese architecture and an on-line pairing strategy for similar and dissimilar
objects in the given images are designed. 2) A novel on-line pairing (OLP) loss
is introduced with a dynamic feature dictionary, which alleviates the
multi-task training stagnation problem, by automatically generating a number of
negative pairs to restrict the positives. 3) A hard example priority (HEP)
based softmax loss is proposed to improve the robustness of classification task
by selecting hard categories. With the philosophy of divide and conquer, we
further propose an improved I-Net, called DC-I-Net, which makes two new
contributions: 1) two modules are tailored to handle different tasks separately
in the integrated framework, such that the task specification is guaranteed. 2)
A class-center guided HEP loss (C2HEP) by exploiting the stored class centers
is proposed, such that the intra-similarity and inter-dissimilarity can be
captured for ultimate retrieval. Extensive experiments on well-known image-level
search benchmark datasets demonstrate that the proposed DC-I-Net
outperforms the state-of-the-art tasks-integrated and tasks-separated image
search models.
Spatial Transformer Point Convolution
Yuan Fang , Chunyan Xu , Zhen Cui , Yuan Zong , Jian Yang Subjects : Computer Vision and Pattern Recognition (cs.CV)
Point clouds are unstructured and unordered in the embedded 3D space. In
order to produce consistent responses under different permutation layouts, most
existing methods aggregate local spatial points through maximum or summation
operation. But such an aggregation essentially belongs to the isotropic
filtering on all operated points therein, which tends to lose the information
of geometric structures. In this paper, we propose a spatial transformer point
convolution (STPC) method to achieve anisotropic convolution filtering on point
clouds. To capture and represent implicit geometric structures, we specifically
introduce a spatial direction dictionary to learn those latent geometric
components. To better encode unordered neighbor points, we design a sparse
deformer to transform them into the canonical ordered dictionary space by using
direction dictionary learning. In the transformed space, the standard
image-like convolution can be leveraged to generate anisotropic filtering,
which better expresses the finer variances of local regions.
Dictionary learning and encoding processes are encapsulated into a network
module and jointly learnt in an end-to-end manner. Extensive experiments on
several public datasets (including S3DIS, Semantic3D, SemanticKITTI)
demonstrate the effectiveness of our proposed method on the point cloud semantic
segmentation task.
Noise-Aware Texture-Preserving Low-Light Enhancement
Comments: Accepted by IEEE VCIP 2020. The final version will appear in IEEE VCIP 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Image and Video Processing (eess.IV)
A simple and effective low-light image enhancement method based on a
noise-aware texture-preserving retinex model is proposed in this work. The new
method, called NATLE, attempts to strike a balance between noise removal and
natural texture preservation through a low-complexity solution. Its cost
function includes an estimated piece-wise smooth illumination map and a
noise-free texture-preserving reflectance map. Afterwards, illumination is
adjusted to form the enhanced image together with the reflectance map.
Extensive experiments are conducted on common low-light image enhancement
datasets to demonstrate the superior performance of NATLE.
Towards Practical Implementations of Person Re-Identification from Full Video Frames
Comments: 7 pages, 9 figures, This paper is under consideration at Pattern Recognition Letters
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
With the major adoption of automation for city security, person
re-identification (Re-ID) has been extensively studied recently. In this paper,
we argue that the current way of studying person re-identification, i.e. by
trying to re-identify a person within already detected and pre-cropped images
of people, is not sufficient to implement practical security applications,
where the inputs to the system are the full frames of the video streams. To
support this claim, we introduce the Full Frame Person Re-ID setting (FF-PRID)
and define specific metrics to evaluate FF-PRID implementations. To improve
robustness, we also formalize the hybrid human-machine collaboration framework,
which is inherent to any Re-ID security application. To demonstrate the
importance of considering the FF-PRID setting, we build an experiment showing
that combining a good people detection network with a good Re-ID model does not
necessarily produce good results for the final application. This underlines a
failure of the current formulation in assessing the quality of a Re-ID model
and justifies the use of different metrics. We hope that this work will
motivate the research community to consider the full problem in order to
develop algorithms that are better suited to real-world scenarios.
NITES: A Non-Parametric Interpretable Texture Synthesis Method
Xuejing Lei , Ganning Zhao , C.-C. Jay Kuo Subjects : Computer Vision and Pattern Recognition (cs.CV)
A non-parametric interpretable texture synthesis method, called the NITES
method, is proposed in this work. Although automatic synthesis of visually
pleasant texture can be achieved by deep neural networks nowadays, the
associated generation models are mathematically intractable and their training
demands higher computational cost. NITES offers a new texture synthesis
solution to address these shortcomings. NITES is mathematically transparent and
efficient in training and inference. The input is a single exemplary texture
image. The NITES method crops out patches from the input and analyzes the
statistical properties of these texture patches to obtain their joint
spatial-spectral representations. Then, the probabilistic distributions of
samples in the joint spatial-spectral spaces are characterized. Finally,
numerous texture images that are visually similar to the exemplary texture
image can be generated automatically. Experimental results are provided to show
the superior quality of generated texture images and efficiency of the proposed
NITES method in terms of both training and inference time.
Robust Object Classification Approach using Spherical Harmonics
Ayman Mukhaimar , Ruwan Tennakoon , Chow Yin Lai , Reza Hoseinnezhad , Alireza Bab-Hadiashar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
In this paper, we present a robust spherical harmonics approach for the
classification of point cloud-based objects. Spherical harmonics have been used
for classification over the years, with several frameworks existing in the
literature. These approaches use a variety of spherical-harmonics-based
descriptors to classify objects. We first investigate these frameworks'
robustness against data augmentation, such as outliers and noise, as this had not
been studied before. Then we propose a spherical convolutional neural network
framework for robust object classification. The proposed framework uses the
voxel grid of concentric spheres to learn features over the unit ball. Our
proposed model learns features that are less sensitive to data augmentation due
to the selected sampling strategy and the designed convolution operation. We
tested our proposed model against several types of data augmentation, such as
noise and outliers. Our results show that the proposed model outperforms the
state-of-the-art networks in terms of robustness to data augmentation.
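For background, a classic rotation-invariant spherical-harmonics descriptor keeps the per-degree energies of the harmonic coefficients. The sketch below (plain scipy, without the paper's concentric-sphere voxel grid or learned convolutions) illustrates the idea:

```python
import numpy as np
from scipy.special import sph_harm

def sh_energy_descriptor(points, l_max=8):
    """Sketch: project the points' angular distribution onto harmonics
    Y_lm and keep per-degree energies ||c_l||^2, which are rotation
    invariant. Radial binning over concentric spheres is omitted."""
    v = points / np.linalg.norm(points, axis=1, keepdims=True)
    theta = np.arctan2(v[:, 1], v[:, 0]) % (2 * np.pi)  # azimuth in [0, 2pi)
    phi = np.arccos(np.clip(v[:, 2], -1, 1))            # polar angle in [0, pi]
    feats = []
    for l in range(l_max + 1):
        c = [np.mean(np.conj(sph_harm(m, l, theta, phi)))  # coefficient c_lm
             for m in range(-l, l + 1)]
        feats.append(np.sum(np.abs(c) ** 2))              # degree-l energy
    return np.array(feats)

desc = sh_energy_descriptor(np.random.randn(500, 3))
```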
Unsupervised Point Cloud Registration via Salient Points Analysis (SPA)
Comments: 7 pages, 5 figures, final version is accepted by IEEE International Conference on Visual Communications and Image Processing (VCIP) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
An unsupervised point cloud registration method, called salient points
analysis (SPA), is proposed in this work. The proposed SPA method can register
two point clouds effectively using only a small subset of salient points. It
first applies the PointHop++ method to point clouds, finds corresponding
salient points in two point clouds based on the local surface characteristics
of points and performs registration by matching the corresponding salient
points. The SPA method offers several advantages over the recent deep learning
based solutions for registration. Deep learning methods such as PointNetLK and
DCP train end-to-end networks and rely on full supervision (namely, ground
truth transformation matrix and class label). In contrast, the SPA is
completely unsupervised. Furthermore, SPA’s training time and model size are
much smaller. The effectiveness of the SPA method is demonstrated by experiments
on seen and unseen classes and noisy point clouds from the ModelNet-40 dataset.
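Once corresponding salient points are found, the final alignment is a textbook rigid-registration step. A sketch of the SVD-based Kabsch/Procrustes solution is shown below; the salient-point detection and matching via PointHop++ features are not shown:

```python
import numpy as np

def rigid_align(src, dst):
    """Sketch: recover rotation R and translation t so that
    R @ src[i] + t ~= dst[i], given matched point pairs."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct the sign so R is a proper rotation (no reflection).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

# Sanity check: recover a known rotation plus translation.
src = np.random.randn(20, 3)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
R, t = rigid_align(src, src @ R_true.T + 1.5)
```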
Unsupervised Feedforward Feature (UFF) Learning for Point Cloud Classification and Segmentation
Comments: 7 pages, 2 figures, the final version is accepted by VCIP 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In contrast to supervised backpropagation-based feature learning in deep
neural networks (DNNs), an unsupervised feedforward feature (UFF) learning
scheme for joint classification and segmentation of 3D point clouds is proposed
in this work. The UFF method exploits statistical correlations of points in a
point cloud set to learn shape and point features in a one-pass feedforward
manner through a cascaded encoder-decoder architecture. It learns global shape
features through the encoder and local point features through the concatenated
encoder-decoder architecture. The extracted features of an input point cloud
are fed to classifiers for shape classification and part segmentation.
Experiments are conducted to evaluate the performance of the UFF method. For
shape classification, the UFF is superior to existing unsupervised methods and
on par with state-of-the-art DNNs. For part segmentation, the UFF outperforms
semi-supervised methods and performs slightly worse than DNNs.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of camera
based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
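The regression style of screen calibration can be sketched as a least-squares polynomial fit from predicted gaze vectors to screen coordinates; the quadratic feature map below is an assumption, not the paper's exact calibration model:

```python
import numpy as np

def fit_screen_calibration(gaze, screen):
    """Sketch: fit a quadratic polynomial mapping (gx, gy) -> (sx, sy)
    by least squares on the user's calibration points."""
    gx, gy = gaze[:, 0], gaze[:, 1]
    A = np.stack([np.ones_like(gx), gx, gy, gx * gy, gx**2, gy**2], axis=1)
    coef, *_ = np.linalg.lstsq(A, screen, rcond=None)   # shape (6, 2)
    return coef

def apply_calibration(coef, g):
    gx, gy = g
    return np.array([1.0, gx, gy, gx * gy, gx**2, gy**2]) @ coef

# e.g. 9 calibration points: CNN gaze predictions vs. true screen targets.
gaze = np.random.rand(9, 2)
screen = gaze * [1920, 1080] + np.random.randn(9, 2) * 5
coef = fit_screen_calibration(gaze, screen)
point = apply_calibration(coef, gaze[0])   # predicted on-screen location
```

With 6 coefficients per axis, 9 calibration points already give an overdetermined fit, which matches the trade-off between calibration effort and accuracy studied in the paper.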
CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking
Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Thanks to its capability of acquiring full-view frames at multiple kilohertz,
ultrafast ultrasound imaging unlocked the analysis of rapidly changing physical
phenomena in the human body, with pioneering applications such as
ultrasensitive flow imaging in the cardiovascular system or shear-wave
elastography. The accuracy achievable with these motion estimation techniques
is strongly contingent upon two contradictory requirements: a high quality of
consecutive frames and a high frame rate. Indeed, the image quality can usually
be improved by increasing the number of steered ultrafast acquisitions, but at
the expense of a reduced frame rate and possible motion artifacts. To achieve
accurate motion estimation at uncompromised frame rates and immune to motion
artifacts, the proposed approach relies on single ultrafast acquisitions to
reconstruct high-quality frames and on only two consecutive frames to obtain
2-D displacement estimates. To this end, we deployed a convolutional neural
network-based image reconstruction method combined with a speckle tracking
algorithm based on cross-correlation. Numerical and in vivo experiments,
conducted in the context of plane-wave imaging, demonstrate that the proposed
approach is capable of estimating displacements in regions where the presence
of side lobe and grating lobe artifacts prevents any displacement estimation
with a state-of-the-art technique that relies on conventional delay-and-sum
beamforming. The proposed approach may therefore unlock the full potential of
ultrafast ultrasound, in applications such as ultrasensitive cardiovascular
motion and flow analysis or shear-wave elastography.
Comments: Submitted to the IEEE for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Limited view tomographic reconstruction aims to reconstruct a tomographic
image from a limited number of sinogram or projection views arising from sparse
view or limited angle acquisitions that reduce radiation dose or shorten
scanning time. However, such a reconstruction suffers from high noise and
severe artifacts due to the incompleteness of the sinogram. To derive a quality
reconstruction, previous state-of-the-art methods use UNet-like neural
architectures to directly predict the full view reconstruction from limited
view data; but these methods leave the deep network architecture issue largely
intact and cannot guarantee the consistency between the sinogram of the
reconstructed image and the acquired sinogram, leading to a non-ideal
reconstruction. In this work, we propose a novel recurrent reconstruction
framework that stacks the same block multiple times. The recurrent block
consists of a custom-designed residual dense spatial-channel attention network.
Further, we develop a sinogram consistency layer interleaved in our recurrent
framework in order to ensure that the sampled sinogram is consistent with the
sinogram of the intermediate outputs of the recurrent blocks. We evaluate our
methods on two datasets. Our experimental results on AAPM Low Dose CT Grand
Challenge datasets demonstrate that our algorithm achieves a consistent and
significant improvement over the existing state-of-the-art neural methods on
both limited angle reconstruction (over 5dB better in terms of PSNR) and sparse
view reconstruction (about 4dB better in terms of PSNR). In addition, our
experimental results on Deep Lesion datasets demonstrate that our method is
able to generate high-quality reconstruction for 8 major lesion types.
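The sinogram-consistency idea can be sketched with plain numpy/skimage (a non-differentiable stand-in, not the paper's learned layer): forward-project the intermediate reconstruction over all views, overwrite the acquired views with the measured data, and back-project:

```python
import numpy as np
from skimage.transform import radon, iradon

def sinogram_consistency(recon, measured, thetas, measured_idx):
    """Sketch: enforce agreement with the acquired sinogram views."""
    sino = radon(recon, theta=thetas)        # sinogram of the current estimate
    sino[:, measured_idx] = measured         # keep acquired views exact
    return iradon(sino, theta=thetas, output_size=recon.shape[0])

# Sparse-view toy example: acquire every 4th of 90 views of a square phantom.
img = np.zeros((64, 64)); img[24:40, 24:40] = 1.0
thetas = np.linspace(0.0, 180.0, 90, endpoint=False)
measured_idx = np.arange(0, 90, 4)
measured = radon(img, theta=thetas)[:, measured_idx]
naive = iradon(radon(img, theta=thetas[measured_idx]),
               theta=thetas[measured_idx], output_size=64)
refined = sinogram_consistency(naive, measured, thetas, measured_idx)
```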
Software Effort Estimation using parameter tuned Models
Comments: Nine Tables
Subjects:
Software Engineering (cs.SE)
; Computer Vision and Pattern Recognition (cs.CV)
Software estimation is one of the most important activities in the software
project. The software effort estimation is required in the early stages of
software life cycle. Project failure is a major problem frequently faced by
software project managers, and imprecise estimation is a key reason for it. As
software size grows, the system also becomes more complex, making it difficult
to accurately predict the cost of the software development process. The
greatest pitfall of the software industry has been the fast-changing
nature of software development, which has made it difficult to develop
parametric models that yield high accuracy for software development in all
domains. We need the development of useful models that accurately predict the
cost of developing a software product. This study presents the novel analysis
of various regression models with hyperparameter tuning to get the effective
model. Nine different regression techniques are considered for model
development.
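As a sketch of what such hyperparameter tuning looks like in practice (with placeholder data and grids, not the study's actual nine models or datasets):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Placeholder effort-estimation data: rows are projects, columns are
# hypothetical size/complexity metrics; the target is effort.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X.sum(axis=1) + rng.normal(0, 0.1, 100)

models = {
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100, 200],
                       "max_depth": [None, 5, 10]}),
    "svr": (SVR(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
}
best = {}
for name, (model, grid) in models.items():
    # Exhaustive grid search with 5-fold cross-validation per model.
    search = GridSearchCV(model, grid, cv=5,
                          scoring="neg_mean_absolute_error")
    search.fit(X, y)
    best[name] = (search.best_params_, -search.best_score_)  # best MAE
```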
Peyman Tahghighi , Reza A.Zoroofi , Sareh Saffi , Alireza Ramezani Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)
For medical diagnosis based on retinal images, a clear understanding of 3D
structure is often required but due to the 2D nature of images captured, we
cannot infer that information. However, by utilizing 3D reconstruction methods,
we can construct the 3D structure of the macula area on fundus images which can
be helpful for diagnosis and screening of macular disorders. Recent approaches
have used shading information for 3D reconstruction or heightmap prediction but
their output was not accurate since they ignored the dependency between nearby
pixels. Additionally, other methods were dependent on the availability of more
than one image of the eye which is not available in practice. In this paper, we
use conditional generative adversarial networks (cGANs) to generate images that
contain height information of the macula area on a fundus image. Results using
our dataset show a 0.6077 improvement in the Structural Similarity Index (SSIM)
and a 0.071 improvement in the Mean Squared Error (MSE) metric over the Shape
from Shading (SFS) method. Additionally, qualitative studies indicate that our
method outperforms recent approaches.
Multimodal brain tumor classification
Marvin Lerousseau , Eric Deutsch , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cancer is a complex disease that provides various types of information
depending on the scale of observation. While most tumor diagnostics are
performed by observing histopathological slides, radiology images should yield
additional knowledge towards the efficacy of cancer diagnostics. This work
investigates a deep learning method combining whole slide images and magnetic
resonance images to classify tumors. Experiments are prospectively conducted on
the 2020 Computational Precision Medicine challenge, in a 3-class unbalanced
classification task. We report cross-validation (resp. validation)
balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and
0.94). The complete code of the method is open-source at XXXX; it includes
histopathological data pre-processing and can therefore be used off-the-shelf
for other histopathological and/or radiological classification tasks.
Detection-Aware Trajectory Generation for a Drone Cinematographer
Comments: 8 pages, IROS 2020 accepted
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
This work investigates an efficient trajectory generation for chasing a
dynamic target, which incorporates the detectability objective. The proposed
method actively guides the motion of a cinematographer drone so that the color
of a target is well-distinguished against the colors of the background in the
view of the drone. For the objective, we define a measure of color
detectability given a chasing path. After computing a discrete path optimized
for the metric, we generate a dynamically feasible trajectory. The whole
pipeline can be updated on-the-fly to respond to the motion of the target. For
the efficient discrete path generation, we construct a directed acyclic graph
(DAG) for which a topological sorting can be determined analytically without
the depth-first search. The smooth path is obtained in a quadratic programming
(QP) framework. We validate the enhanced performance of state-of-the-art object
detection and tracking algorithms when the camera drone executes the trajectory
obtained from the proposed method.
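The discrete stage lends itself to a simple dynamic program once the graph is layered by time, which is what makes the topological order analytic. The sketch below is our own illustration under that assumption; the cost function and node encoding are placeholders, not the paper's detectability metric.

    # Shortest path over a time-layered DAG. Nodes are assumed unique across
    # layers; edge_cost(u, v) stands in for the (negated) detectability
    # measure plus motion penalties.
    def best_path(layers, edge_cost):
        cost = {n: 0.0 for n in layers[0]}
        parent = {}
        for prev, cur in zip(layers, layers[1:]):
            new_cost = {}
            for v in cur:
                u = min(prev, key=lambda p: cost[p] + edge_cost(p, v))
                new_cost[v] = cost[u] + edge_cost(u, v)
                parent[v] = u
            cost = new_cost
        node = min(cost, key=cost.get)      # cheapest end node
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path[::-1]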
Fundus Image Analysis for Age Related Macular Degeneration: ADAM-2020 Challenge Report
Sharath M Shankaranarayana Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)
Age related macular degeneration (AMD) is one of the major causes of
blindness in the elderly population. In this report, we propose deep learning
based methods for retinal analysis using color fundus images for computer aided
diagnosis of AMD. We leverage the recent state of the art deep networks for
building a single fundus image based AMD classification pipeline. We also
propose methods for the other directly relevant and auxiliary tasks such as
lesions detection and segmentation, fovea detection and optic disc
segmentation. We propose the use of generative adversarial networks (GANs) for
the tasks of segmentation and detection. We also propose a novel method of
fovea detection using GANs.
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Multidimensional Projection is a fundamental tool for high-dimensional data
analytics and visualization. With very few exceptions, projection techniques
are designed to map data from a high-dimensional space to a visual space so as
to preserve some dissimilarity (similarity) measure, such as the Euclidean
distance for example. In fact, although adopting distinct mathematical
formulations designed to favor different aspects of the data, most
multidimensional projection methods strive to preserve dissimilarity measures
that encapsulate geometric properties such as distances or the proximity
relation between data objects. However, geometric relations are not the only
interesting property to be preserved in a projection. For instance, the
analysis of particular structures such as clusters and outliers could be more
reliably performed if the mapping process gives some guarantee as to
topological invariants such as connected components and loops. This paper
introduces TopoMap, a novel projection technique which provides topological
guarantees during the mapping process. In particular, the proposed method
performs the mapping from a high-dimensional space to a visual space, while
preserving the 0-dimensional persistence diagram of the Rips filtration of the
high-dimensional data, ensuring that the filtrations generate the same
connected components when applied to the original as well as projected data.
The presented case studies show that the topological guarantee provided by
TopoMap not only brings confidence to the visual analytic process but also can
be used to assist in the assessment of other projection methods.
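The 0-dimensional persistence that TopoMap preserves can be computed with a standard union-find sweep over edges sorted by length (equivalent to the merges of single-linkage clustering); the sketch below illustrates that computation only, not TopoMap's projection algorithm itself.

    # 0-dimensional persistence of a Rips filtration: every component is born
    # at scale 0 and dies when a sorted edge first merges it into another.
    import itertools
    import numpy as np

    def zero_dim_deaths(points: np.ndarray):
        n = len(points)
        edges = sorted(
            (np.linalg.norm(points[i] - points[j]), i, j)
            for i, j in itertools.combinations(range(n), 2))
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        deaths = []
        for d, i, j in edges:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                deaths.append(d)  # one connected component dies at scale d
        return deaths             # n-1 finite deaths; one class never dies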
TAP-Net: Transport-and-Pack using Reinforcement Learning
Journal-ref: ACM Transactions on Graphics 2020
Subjects:
Graphics (cs.GR)
; Computer Vision and Pattern Recognition (cs.CV)
We introduce the transport-and-pack (TAP) problem, a frequently encountered
instance of real-world packing, and develop a neural optimization solution
based on reinforcement learning. Given an initial spatial configuration of
boxes, we seek an efficient method to iteratively transport and pack the boxes
compactly into a target container. Due to obstruction and accessibility
constraints, our problem has to add a new search dimension, i.e., finding an
optimal transport sequence, to the already immense search space for packing
alone. Using a learning-based approach, a trained network can learn and encode
solution patterns to guide the solution of new problem instances instead of
executing an expensive online search. In our work, we represent the transport
constraints using a precedence graph and train a neural network, coined
TAP-Net, using reinforcement learning to reward efficient and stable packing.
The network is built on an encoder-decoder architecture, where the encoder
employs convolution layers to encode the box geometry and precedence graph and
the decoder is a recurrent neural network (RNN) which inputs the current
encoder output, as well as the current box packing state of the target
container, and outputs the next box to pack, as well as its orientation. We
train our network on randomly generated initial box configurations, without
supervision, via policy gradients to learn optimal TAP policies to maximize
packing efficiency and stability. We demonstrate the performance of TAP-Net on
a variety of examples, evaluating the network through ablation studies and
comparisons to baselines and alternative network designs. We also show that our
network generalizes well to larger problem instances, when trained on
small-sized inputs.
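To make the precedence constraint concrete, here is a toy encoding of our own: a box becomes transportable only once every box blocking it has been packed. TAP-Net consumes such a graph through its encoder; the dictionary format and greedy choice below are assumptions for illustration only.

    # Toy precedence handling; a learned policy would replace the greedy pick.
    def packable(precedence, packed):
        """precedence: dict box -> set of boxes that must be moved first."""
        return [b for b in precedence
                if b not in packed and precedence[b] <= packed]

    precedence = {0: set(), 1: {0}, 2: {0, 1}}
    packed, order = set(), []
    while len(packed) < len(precedence):
        box = packable(precedence, packed)[0]
        order.append(box)
        packed.add(box)
    print(order)  # [0, 1, 2]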
Dexterous Robotic Grasping with Object-Centric Visual Affordances
Priyanka Mandikal , Kristen Grauman Subjects : Robotics (cs.RO) ; Computer Vision and Pattern Recognition (cs.CV)
Dexterous robotic hands are appealing for their agility and human-like
morphology, yet their high degree of freedom makes learning to manipulate
challenging. We introduce an approach for learning dexterous grasping. Our key
idea is to embed an object-centric visual affordance model within a deep
reinforcement learning loop to learn grasping policies that favor the same
object regions favored by people. Unlike traditional approaches that learn from
human demonstration trajectories (e.g., hand joint sequences captured with a
glove), the proposed prior is object-centric and image-based, allowing the
agent to anticipate useful affordance regions for objects unseen during policy
learning. We demonstrate our idea with a 30-DoF five-fingered robotic hand
simulator on 40 objects from two datasets, where it successfully and
efficiently learns policies for stable grasps. Our affordance-guided policies
are significantly more effective, generalize better to novel objects, and train
3x faster than the baselines. Our work offers a step towards manipulation
agents that learn by watching how people use objects, without requiring state
and action information about the human body. Project website:
this http URL
Real Image Super Resolution Via Heterogeneous Model using GP-NAS
Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With advancements in deep neural networks (DNNs), recent state-of-the-art
(SOTA) image super-resolution (SR) methods have achieved impressive performance
using deep residual networks with dense skip connections. While these models
perform well on benchmark datasets where low-resolution (LR) images are constructed from
high-resolution (HR) references with known blur kernel, real image SR is more
challenging when both images in the LR-HR pair are collected from real cameras.
Based on existing dense residual networks, a Gaussian process based neural
architecture search (GP-NAS) scheme is utilized to find candidate network
architectures using a large search space by varying the number of dense
residual blocks, the block size and the number of features. A suite of
heterogeneous models with diverse network structures and hyperparameters is
selected for model ensembling to achieve outstanding performance in real image
SR. The proposed method won first place in all three tracks of the AIM 2020
Real Image Super-Resolution Challenge.
An Internal Cluster Validity Index Based on Distance-based Separability Measure
Comments: 8 pages, 4 figures. Accepted by ICTAI 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Evaluating clustering results is a significant part of cluster analysis.
Since clustering is typically an unsupervised learning task, no true class
labels are available, and a number of internal evaluation measures, which use
only the predicted labels and the data, have been created. These are known as
internal cluster validity indices (CVIs). Designing an effective CVI without
true labels is not simple, as it is nearly as difficult as creating a
clustering method. Having more CVIs is crucial because no universal CVI can
measure all datasets, and there is no specific method for selecting a proper
CVI for clusters without true labels. Therefore, applying multiple CVIs to
evaluate clustering results is necessary. In this paper, we propose a novel
CVI, called the Distance-based Separability Index (DSI), based on a data
separability measure. We compared the DSI against eight other internal CVIs,
ranging from early studies such as Dunn (1974) to the most recent CVDD (2019).
Using an external CVI as ground truth, we evaluated the clustering results of
five clustering algorithms on 12 real and 97 synthetic datasets. The results
show that DSI is an effective, unique, and competitive CVI compared to the
other CVIs. In addition, we summarize the general process for evaluating CVIs
and introduce a new method, rank difference, to compare the results of
different CVIs.
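The general evaluation process described can be sketched as follows: score a set of candidate clusterings with several internal CVIs and compare each CVI's ranking with the ranking induced by an external index used as ground truth. DSI itself is not reproduced here; silhouette and Calinski-Harabasz stand in as example internal CVIs, and Spearman correlation stands in for the paper's rank-difference method.

    from scipy.stats import spearmanr
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import (adjusted_rand_score,
                                 calinski_harabasz_score, silhouette_score)

    X, y_true = make_blobs(n_samples=300, centers=4, random_state=0)
    labelings = [KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
                 for k in range(2, 8)]

    # external index (needs true labels) serves as the ground-truth ranking
    external = [adjusted_rand_score(y_true, lab) for lab in labelings]
    for name, cvi in [("silhouette", silhouette_score),
                      ("calinski-harabasz", calinski_harabasz_score)]:
        internal = [cvi(X, lab) for lab in labelings]
        rho, _ = spearmanr(internal, external)
        print(f"{name}: rank agreement with ARI = {rho:.3f}")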
When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method
Comments: arXiv admin note: substantial text overlap with arXiv:2003.09210
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Infrared and visible image fusion, as a hot topic in image processing and
image enhancement, aims to produce fused images retaining the detail texture
information in visible images and the thermal radiation information in infrared
images. In this paper, we propose a novel two-stream auto-encoder (AE) based
fusion network. The core idea is that the encoder decomposes an image into base
and detail feature maps with low- and high-frequency information, respectively,
and that the decoder is responsible for the original image reconstruction. To
this end, a well-designed loss function is established to make the base/detail
feature maps similar/dissimilar. In the test phase, base and detail feature
maps are respectively merged via a fusion module, and the fused image is
recovered by the decoder. Qualitative and quantitative results demonstrate that
our method can generate fused images containing highlighted targets and
abundant detail texture information with strong reproducibility, and is
meanwhile superior to the state-of-the-art (SOTA) approaches.
Artificial Intelligence
SEDRo: A Simulated Environment for Developmental Robotics
Aishwarya Pothula , Md Ashaduzzaman Rubel Mondol , Sanath Narasimhan , Sm Mazharul Islam , Deokgun Park Subjects : Artificial Intelligence (cs.AI)
Even with impressive advances in application-specific models, we still lack
knowledge about how to build a model that can learn in a human-like way and do
multiple tasks. To learn in a human-like way, we need to provide a diverse
experience that is comparable to humans. In this paper, we introduce our
ongoing effort to build a simulated environment for developmental robotics
(SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus
to a 12-month-old. A series of simulated tests based on developmental
psychology will be used to evaluate the progress of a learning model. We
anticipate SEDRo to lower the cost of entry and facilitate research in the
developmental robotics community.
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
Grounded Language Learning Fast and Slow
Felix Hill , Olivier Tieleman , Tamara von Glehn , Nathaniel Wong , Hamza Merzic , Stephen Clark Subjects : Artificial Intelligence (cs.AI)
Recent work has shown that large text-based neural language models, trained
with conventional supervised learning objectives, acquire a surprising
propensity for few- and one-shot learning. Here, we show that an embodied agent
situated in a simulated 3D world, and endowed with a novel dual-coding external
memory, can exhibit similar one-shot word learning when trained with
conventional reinforcement learning algorithms. After a single introduction to
a novel object via continuous visual perception and a language prompt (“This is
a dax”), the agent can re-identify the object and manipulate it as instructed
(“Put the dax on the bed”). In doing so, it seamlessly integrates short-term,
within-episode knowledge of the appropriate referent for the word “dax” with
long-term lexical and motor knowledge acquired across episodes (i.e. “bed” and
“putting”). We find that, under certain training conditions and with a
particular memory writing mechanism, the agent’s one-shot word-object binding
generalizes to novel exemplars within the same ShapeNet category, and is
effective in settings with unfamiliar numbers of objects. We further show how
dual-coding memory can be exploited as a signal for intrinsic motivation,
stimulating the agent to seek names for objects that may be useful for later
executing instructions. Together, the results demonstrate that deep neural
networks can exploit meta-learning, episodic memory and an explicitly
multi-modal environment to account for ‘fast-mapping’, a fundamental pillar of
human cognitive development and a potentially transformative capacity for
agents that interact with human users.
On Population-Based Algorithms for Distributed Constraint Optimization Problems
Comments: 7 Figures. arXiv admin note: text overlap with arXiv:1909.06254 , arXiv:2002.12001
Subjects:
Artificial Intelligence (cs.AI)
; Multiagent Systems (cs.MA)
Distributed Constraint Optimization Problems (DCOPs) are a widely studied
class of optimization problems in which the interactions among a set of
cooperative agents are modeled as a set of constraints. DCOPs are NP-hard, and
significant effort has been devoted to developing methods for finding
incomplete solutions. In this paper, we study an emerging class of such
incomplete algorithms that are broadly termed as population-based algorithms.
The main characteristic of these algorithms is that they maintain a population
of candidate solutions of a given problem and use this population to cover a
large area of the search space and to avoid local-optima. In recent years, this
class of algorithms has gained significant attention due to their ability to
produce high-quality incomplete solutions. With the primary goal of further
improving the quality of solutions compared to the state-of-the-art incomplete
DCOP algorithms, we present two new population-based algorithms in this paper.
Our first approach, Anytime Evolutionary DCOP or AED, exploits evolutionary
optimization meta-heuristics to solve DCOPs. We also present a novel anytime
update mechanism that gives AED its anytime property. In our second
contribution, we show that population-based approaches can be combined with
local search approaches. Specifically, we develop an algorithm called DPSA
based on the Simulated Annealing meta-heuristic. We empirically evaluate these
two algorithms to illustrate their respective effectiveness in different
settings against the state-of-the-art incomplete DCOP algorithms including all
existing population-based algorithms in a wide variety of benchmarks. Our
evaluation shows AED and DPSA markedly outperform the state-of-the-art and
produce up to 75% improved solutions.
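For readers unfamiliar with the meta-heuristic behind DPSA, a generic simulated-annealing skeleton is sketched below; the neighborhood, cost function, and cooling schedule are placeholder assumptions, and none of DPSA's distributed multi-agent machinery is shown.

    import math
    import random

    def simulated_annealing(init, cost, neighbor,
                            t0=1.0, cooling=0.995, steps=10_000):
        cur, cur_cost = init, cost(init)
        best, best_cost = cur, cur_cost
        t = t0
        for _ in range(steps):
            cand = neighbor(cur)
            delta = cost(cand) - cur_cost
            # accept improvements always, worse moves with Boltzmann prob.
            if delta <= 0 or random.random() < math.exp(-delta / t):
                cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
            t *= cooling
        return best, best_cost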
Derived metrics for the game of Go — intrinsic network strength assessment and cheat-detection
Comments: 16 pages, 12 figures, final version will be published elsewhere
Subjects:
Artificial Intelligence (cs.AI)
The widespread availability of superhuman AI engines is changing how we play
the ancient game of Go. The open-source software packages developed after the
AlphaGo series shifted focus from producing strong playing entities to
providing tools for analyzing games. Here we describe two ways in which the
innovations of the second generation engines (e.g. score estimates, variable
komi) can be used for defining new metrics that help deepen our understanding
of the game. First, we study how much information the search component
contributes in addition to the raw neural network policy output. This gives an
intrinsic strength measurement for the neural network. Second, we define the
effect of a move by the difference in score estimates. This gives a
fine-grained, move-by-move performance evaluation of a player. We use this in
combating the new challenge of detecting online cheating.
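The second metric reduces to a one-liner once per-move score estimates are available. In the sketch below, scores[i] is assumed to be the engine's score estimate after move i, with perspective handling done upstream.

    def move_effects(scores):
        """Effect of each move = change in the engine's score estimate."""
        return [after - before for before, after in zip(scores, scores[1:])]

    print(move_effects([0.5, 0.7, 0.2, 0.3]))  # approx. [0.2, -0.5, 0.1]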
Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a framework that allows one to certify the fairness degree of a model
based on an interactive and privacy-preserving test. The framework verifies any
trained model, regardless of its training process and architecture. Thus, it
allows us to evaluate any deep learning model on multiple fairness definitions
empirically. We tackle two scenarios, where either the test data is privately
available only to the tester or is publicly known in advance, even to the model
creator. We investigate the soundness of the proposed approach using
theoretical analysis and present statistical guarantees for the interactive
test. Finally, we provide a cryptographic technique to automate fairness
testing and certified inference with only black-box access to the model at hand
while hiding the participants’ sensitive data.
User Intention Recognition and Requirement Elicitation Method for Conversational AI Services
Comments: accepted as a full paper at IEEE ICWS 2020
Subjects:
Artificial Intelligence (cs.AI)
In recent years, chat-bots have become a new type of intelligent terminal that
guides users in consuming services. However, they are most often criticized
because the services they provide are not what users expect. This defect is
mostly due to two problems: first, users' requirement expressions are
incomplete and uncertain because of information asymmetry; second, the
diversity of service resources makes service selection difficult. Since a
conversational bot is a typical mesh device, guided multi-round Q&A is the
most effective way to elicit user requirements. Obviously, complex Q&A with
too many rounds is tedious and leads to a bad user experience. Therefore, we
aim to obtain user requirements as accurately as possible in as few rounds as
possible. To achieve this, a user intention recognition method based on a
Knowledge Graph (KG) was developed for fuzzy requirement inference, and a
requirement elicitation method based on Granular Computing was proposed for
dialog policy generation. Experimental results show that these two methods can
effectively reduce the number of conversation rounds and can quickly and
accurately identify the user intention.
Learning to Infer User Hidden States for Online Sequential Advertising
Comments: to be published in CIKM 2020
Subjects:
Artificial Intelligence (cs.AI)
To drive purchases in online advertising, it is of great interest to
advertisers to optimize the sequential advertising strategy, whose performance
and interpretability are both important. The lack of interpretability in
existing deep reinforcement learning methods makes it hard to understand, diagnose
and further optimize the strategy. In this paper, we propose our Deep Intents
Sequential Advertising (DISA) method to address these issues. The key part of
interpretability is to understand a consumer’s purchase intent which is,
however, unobservable (called hidden states). In this paper, we model this
intention as a latent variable and formulate the problem as a Partially
Observable Markov Decision Process (POMDP) where the underlying intents are
inferred based on the observable behaviors. Large-scale industrial offline and
online experiments demonstrate our method’s superior performance over several
baselines. The inferred hidden states are analyzed, and the results prove the
rationality of our inference.
FairXGBoost: Fairness-aware Classification in XGBoost
Srinivasan Ravichandran , Drona Khurana , Bharath Venkatesh , Narayanan Unny Edakunni Subjects : Artificial Intelligence (cs.AI)
Highly regulated domains such as finance have long favoured the use of
machine learning algorithms that are scalable, transparent, robust and yield
better performance. One of the most prominent examples of such an algorithm is
XGBoost. Meanwhile, there is also a growing interest in building fair and
unbiased models in these regulated domains and numerous bias-mitigation
algorithms have been proposed to this end. However, most of these
bias-mitigation methods are restricted to specific model families such as
logistic regression or support vector machine models, thus leaving modelers
with a difficult decision of choosing between fairness from the bias-mitigation
algorithms and scalability, transparency, performance from algorithms such as
XGBoost. We aim to leverage the best of both worlds by proposing a fair variant
of XGBoost that enjoys all the advantages of XGBoost, while also matching the
levels of fairness from the state-of-the-art bias-mitigation algorithms.
Furthermore, the proposed solution requires very little in terms of changes to
the original XGBoost library, thus making it easy to adopt. We provide an
empirical analysis of our proposed method on standard benchmark datasets used
in the fairness community.
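Since the abstract does not spell out the bias-mitigation term, the following is only one plausible reading of the minimal-change claim: a custom training objective that adds a regularizer shrinking the gap between group-wise mean scores. The regularizer form and strength are our assumptions; the custom-objective hook itself is standard XGBoost API.

    import numpy as np
    import xgboost as xgb

    def fair_logistic_obj(group, mu=1.0):
        """group: 0/1 array of the sensitive attribute per training row.
        Regularizer form is an illustrative assumption, not the paper's."""
        def obj(preds, dtrain):
            y = dtrain.get_label()
            p = 1.0 / (1.0 + np.exp(-preds))
            grad = p - y                     # standard logistic gradient
            hess = p * (1.0 - p)
            # penalize (mean score of group 1 - mean score of group 0)^2 / 2
            gap = preds[group == 1].mean() - preds[group == 0].mean()
            n1, n0 = (group == 1).sum(), (group == 0).sum()
            grad = grad + mu * gap * np.where(group == 1, 1.0 / n1, -1.0 / n0)
            return grad, hess                # regularizer's hessian omitted
        return obj

    # booster = xgb.train(params, dtrain, num_boost_round=100,
    #                     obj=fair_logistic_obj(sensitive))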
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Comments: 9 pages, 4 figures, 2 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Training a deep neural network requires a large amount of single-task data
and involves a long time-consuming optimization phase. This is not scalable to
complex, realistic environments with new unexpected changes. Humans can perform
fast incremental learning on the fly and memory systems in the brain play a
critical role. We introduce Sparse Meta Networks — a meta-learning approach to
learn online sequential adaptation algorithms for deep neural networks, by
using deep neural networks. We augment a deep neural network with a
layer-specific fast-weight memory. The fast-weights are generated sparsely at
each time step and accumulated incrementally through time providing a useful
inductive bias for online continual adaptation. We demonstrate strong
performance on a variety of sequential adaptation scenarios, from a simple
online reinforcement learning to a large scale adaptive language modelling.
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This can undermine the
trustworthiness and accountability of the models when deployed in real
applications, where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood based models in the task of out of distribution
detection (OoD). Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
to find the effect of the adversarial noise resistance methods on in and
out-of-distribution performance, as well as, also investigate adversarial noise
robustness of Bayesian deep learners.
HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Comments: arXiv admin note: substantial text overlap with arXiv:1811.08181
Subjects:
Databases (cs.DB)
; Artificial Intelligence (cs.AI)
To cope with the intractability of answering Conjunctive Queries (CQs) and
solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph
decompositions have been proposed — giving rise to different notions of width,
notably, plain, generalized, and fractional hypertree width (hw, ghw, and
fhw). Given the increasing interest in using such decomposition methods in
practice, a publicly accessible repository of decomposition software, as well
as a large set of benchmarks, and a web-accessible workbench for inserting,
analyzing, and retrieving hypergraphs are called for.
We address this need by providing (i) concrete implementations of hypergraph
decompositions (including new practical algorithms), (ii) a new, comprehensive
benchmark of hypergraphs stemming from disparate CQ and CSP collections, and
(iii) HyperBench, our new web-interface for accessing the benchmark and the
results of our analyses. In addition, we describe a number of actual
experiments we carried out with this new infrastructure.
Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
Weijia Wu , Ning Lu , Enze Xie Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Deep learning-based scene text detection can achieve strong performance when
powered with sufficient labeled training data. However, manual labeling is
time-consuming and laborious, and in extreme cases the corresponding annotated
data are unavailable. Exploiting synthetic data is a very promising solution,
except for the domain distribution mismatch between synthetic datasets and real
datasets. To address the severe domain distribution mismatch, we propose a synthetic-to-real
address the severe domain distribution mismatch, we propose a synthetic-to-real
domain adaptation method for scene text detection, which transfers knowledge
from synthetic data (source domain) to real data (target domain). In this
paper, a text self-training (TST) method and adversarial text instance
alignment (ATA) for domain adaptive scene text detection are introduced. ATA
helps the network learn domain-invariant features by training a domain
classifier in an adversarial manner. TST diminishes the adverse effects of
false positives~(FPs) and false negatives~(FNs) from inaccurate pseudo-labels.
The two components have positive effects on improving the performance of scene text
detectors when adapting from synthetic-to-real scenes. We evaluate the proposed
method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The
results demonstrate the effectiveness of the proposed method with up to 10%
improvement, which has important exploration significance for domain adaptive
scene text detection. Code is available at
this https URL
Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints
Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider the problem of constrained multi-objective blackbox optimization
using expensive function evaluations, where the goal is to approximate the true
Pareto set of solutions satisfying a set of constraints while minimizing the
number of function evaluations. For example, in aviation power system design
applications, we need to find the designs that trade-off total energy and the
mass while satisfying specific thresholds for motor temperature and voltage of
cells. This optimization requires performing expensive computational
simulations to evaluate designs. In this paper, we propose a new approach,
referred to as Max-value Entropy Search for Multi-objective Optimization with
Constraints (MESMOC), to solve this problem. MESMOC employs an output-space
entropy based acquisition function to efficiently select the sequence of inputs
for evaluation to uncover high-quality Pareto-set solutions while satisfying
constraints.
We apply MESMOC to two real-world engineering design applications to
demonstrate its effectiveness over state-of-the-art algorithms.
Multi-Loss Weighting with Coefficient of Variations
Rick Groenendijk , Sezer Karaoglu , Theo Gevers , Thomas Mensink Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. Finding a good set of weights is often
done by adopting them into the set of hyper-parameters, which are set using an
extensive grid search. This is computationally expensive. In this paper, the
weights are defined based on properties observed while training the model,
including the specific batch loss, the average loss, and the variance for each
of the losses. An additional advantage is that the defined weights evolve
during training, instead of using static loss weights. In literature, loss
weighting is mostly used in a multi-task learning setting, where the different
tasks obtain different weights. However, there is a plethora of single-task
multi-loss problems that can benefit from automatic loss weighting. In this
paper, it is shown that these multi-task approaches do not work on single
tasks. Instead, a method is proposed that automatically and dynamically tunes
loss weights throughout training specifically for single-task multi-loss
problems. The method incorporates a measure of uncertainty to balance the
losses. The validity of the approach is shown empirically for different tasks
on multiple datasets.
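A minimal sketch of the idea follows, assuming the weight of each loss is proportional to its coefficient of variation (standard deviation over mean) computed from the training history; the paper's exact running statistics and normalization may differ.

    import numpy as np

    def cov_weights(loss_history):
        """loss_history: array of shape (steps, num_losses)."""
        h = np.asarray(loss_history)
        cov = h.std(axis=0) / (h.mean(axis=0) + 1e-12)
        return cov / cov.sum()        # weights evolve as training proceeds

    w = cov_weights([[1.0, 10.0], [0.8, 9.9], [0.6, 10.1]])
    # a loss that varies a lot relative to its scale receives more weight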
Quasi-symplectic Langevin Variational Autoencoder
Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The variational autoencoder (VAE) is one of the most thoroughly investigated
generative models and is very popular in current neural learning research.
Leveraging VAEs in practical tasks with high dimensions and huge datasets often
runs into the problem of constructing low-variance evidence lower bounds.
Markov chain Monte Carlo (MCMC) is an effective approach to tighten the
evidence lower bound (ELBO) when approximating the posterior distribution. The
Hamiltonian Variational Autoencoder (HVAE) is one of the effective
MCMC-inspired approaches for constructing an unbiased low-variance ELBO that is
also amenable to the reparameterization trick. While this solution
significantly improves posterior estimation, a main drawback of HVAE is that
its leapfrog method needs to access the posterior gradient twice, which leads
to poor inference efficiency and a fairly large GPU memory requirement. This
flaw limits the application of Hamiltonian-based inference frameworks to
large-scale network inference. To tackle this problem, we propose a
Quasi-symplectic Langevin Variational Autoencoder (Langevin-VAE), which offers
a significant improvement in resource usage efficiency. We qualitatively and
quantitatively demonstrate the effectiveness of the Langevin-VAE compared to
state-of-the-art gradient-informed inference frameworks.
Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO
Comments: 6 pages, 5 figures
Subjects:
Signal Processing (eess.SP)
; Artificial Intelligence (cs.AI)
In massive multiple-input multiple-output (MIMO) systems, the large number of
antennas brings a great challenge for the acquisition of accurate channel
state information, especially in the frequency division duplex mode. To
overcome the bottleneck of the limited number of radio links in hybrid
beamforming, we utilize the neural networks (NNs) to capture the inherent
connection between the uplink and downlink channel data sets and extrapolate
the downlink channels from a subset of the uplink channel state information. We
study the antenna subset selection problem in order to achieve the best channel
extrapolation and decrease the data size of NNs. The probabilistic sampling
theory is utilized to approximate the discrete antenna selection as a
continuous and differentiable function, which makes the back propagation of the
deep learning feasible. Then, we design the proper off-line training strategy
to optimize both the antenna selection pattern and the extrapolation NNs.
Finally, numerical results are presented to verify the effectiveness of our
proposed massive MIMO channel extrapolation algorithm.
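One common way to realize such a continuous, differentiable relaxation of discrete selection is the Gumbel-softmax estimator, sketched below in PyTorch; whether the paper uses exactly this estimator is our assumption based on the abstract's description.

    import torch
    import torch.nn.functional as F

    class AntennaSelector(torch.nn.Module):
        def __init__(self, num_antennas: int, num_selected: int):
            super().__init__()
            # one learnable categorical distribution per selected antenna
            self.logits = torch.nn.Parameter(
                torch.zeros(num_selected, num_antennas))

        def forward(self, uplink: torch.Tensor, tau: float = 1.0):
            """uplink: (batch, num_antennas) -> (batch, num_selected)."""
            # soft one-hot rows while training (differentiable), hard at test
            sel = F.gumbel_softmax(self.logits, tau=tau,
                                   hard=not self.training)
            return uplink @ sel.t()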
Comments: 30 pages, 14 figures
Subjects:
Signal Processing (eess.SP)
; Artificial Intelligence (cs.AI)
To capture the communications gain of the massive radiating elements with low
power cost, the conventional reconfigurable intelligent surface (RIS) usually
works in passive mode. However, due to the cascaded channel structure and the
lack of signal processing ability, it is difficult for RIS to obtain the
individual channel state information and optimize the beamforming vector. In
this paper, we add signal processing units for a few antennas at RIS to
partially acquire the channels. To solve the crucial active antenna selection
problem, we construct an active antenna selection network that utilizes the
probabilistic sampling theory to select the optimal locations of these active
antennas. With this active antenna selection network, we further design two
deep learning (DL) based schemes, i.e., the channel extrapolation scheme and
the beam searching scheme, to enable the RIS communication system. The former
utilizes the selection network and a convolutional neural network to
extrapolate the full channels from the partial channels received by the active
RIS antennas, while the latter adopts a fully-connected neural network to
achieve the direct mapping between the partial channels and the optimal
beamforming vector with maximal transmission rate. Simulation results are
provided to demonstrate the effectiveness of the designed DL-based schemes.
Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The ability to efficiently search for images over an indexed database is the
cornerstone of several user experiences. Incorporating user feedback through
multi-modal inputs provides flexible interaction to serve fine-grained
specificity in requirements. We specifically focus on text feedback, through
descriptive natural language queries. Given a reference image and textual user
feedback, our goal is to retrieve images that satisfy constraints specified by
both of these input modalities. The task is challenging as it requires
understanding the textual semantics from the text feedback and then applying
these changes to the visual representation. To address these challenges, we
propose a novel architecture TRACE which contains a hierarchical feature
aggregation module to learn the composite visio-linguistic representations.
TRACE achieves the SOTA performance on 3 benchmark datasets: FashionIQ, Shoes,
and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5%
respectively in R@K metric. Our extensive experiments and ablation studies show
that TRACE consistently outperforms the existing techniques by significant
margins both quantitatively and qualitatively.
Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Algorithms for training residual networks (ResNets) typically require a forward
pass of the data, followed by backpropagation of the loss gradient to perform
parameter updates, which can take many hours or even days for networks with hundreds of
layers. Inspired by the penalty and augmented Lagrangian methods, a
layer-parallel training algorithm is proposed in this work to overcome the
scalability barrier caused by the serial nature of forward-backward propagation
in deep residual learning. Moreover, by viewing the supervised classification
task as a numerical discretization of the terminal control problem, we bridge
the concept of synthetic gradient for decoupling backpropagation with the
parareal method for solving differential equations, which not only offers a
novel perspective on the design of synthetic loss function but also performs
parameter updates with reduced storage overhead. Experiments on a preliminary
example demonstrate that the proposed algorithm achieves testing accuracy
comparable to or even better than the full serial backpropagation approach,
while the enabled layer-parallelism provides a speedup over traditional
layer-serial training methods.
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects, resulting in a significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
Computational prediction of RNA tertiary structures using machine learning methods
Comments: 20 pages, 2 figures. Chinese Physics B, Aug. 2020
Journal-ref: Chinese Physics B, Sept. 2020
Subjects:
Biological Physics (physics.bio-ph)
; Artificial Intelligence (cs.AI)
RNAs play crucial and versatile roles in biological processes. Computational
prediction approaches can help to understand RNA structures and their
stabilizing factors, thus providing information on their functions, and
facilitating the design of new RNAs. Machine learning (ML) techniques have made
tremendous progress in many fields in the past few years. Although their usage
in protein-related fields has a long history, the use of ML methods in
predicting RNA tertiary structures is new and rare. Here, we review the recent
advances of using ML methods on RNA structure predictions and discuss the
advantages and limitations, as well as the difficulties and potential of these
approaches when applied in the field.
Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
Comments: To appear in IEEE TPAMI, 18 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The traditional object retrieval task aims to learn a discriminative feature
representation with intra-similarity and inter-dissimilarity, which supposes
that the objects in an image are manually or automatically pre-cropped exactly.
However, in many real-world searching scenarios (e.g., video surveillance), the
objects (e.g., persons, vehicles, etc.) are seldom accurately detected or
annotated. Therefore, object-level retrieval becomes intractable without
bounding-box annotation, which leads to a new but challenging topic, i.e.
image-level search. In this paper, to address the image search issue, we first
introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A
Siamese architecture and an on-line pairing strategy for similar and dissimilar
objects in the given images are designed. 2) A novel on-line pairing (OLP) loss
is introduced with a dynamic feature dictionary, which alleviates the
multi-task training stagnation problem, by automatically generating a number of
negative pairs to restrict the positives. 3) A hard example priority (HEP)
based softmax loss is proposed to improve the robustness of classification task
by selecting hard categories. With the philosophy of divide and conquer, we
further propose an improved I-Net, called DC-I-Net, which makes two new
contributions: 1) two modules are tailored to handle different tasks separately
in the integrated framework, such that the task specification is guaranteed. 2)
A class-center guided HEP loss (C2HEP) by exploiting the stored class centers
is proposed, such that the intra-similarity and inter-dissimilarity can be
captured for ultimate retrieval. Extensive experiments on famous image-level
search oriented benchmark datasets demonstrate that the proposed DC-I-Net
outperforms the state-of-the-art tasks-integrated and tasks-separated image
search models.
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these metrics are rough
proxies for what we really care about—summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
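The core of the reward-modelling step can be written compactly: given scalar rewards for a human-preferred and a rejected summary, the model is trained with a pairwise logistic loss. The sketch below is a hedged reading of the abstract; the model internals and data plumbing are elided.

    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor) -> torch.Tensor:
        # maximize log sigmoid(r_chosen - r_rejected) over comparison pairs
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    loss = preference_loss(torch.tensor([1.2]), torch.tensor([0.3]))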
Convolutional Speech Recognition with Pitch and Voice Quality Features
Comments: 5 pages
Subjects:
Audio and Speech Processing (eess.AS)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
The effects of adding pitch and voice quality features such as jitter and
shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are
studied in this work. Pitch features have been previously used for improving
classical HMM and DNN baselines, while jitter and shimmer parameters have
proven to be useful for tasks like speaker or emotion recognition. To the best
of our knowledge, this is the first work combining such pitch and voice quality
features with modern convolutional architectures, showing improvements of up to
2% absolute WER points on the publicly available Spanish Common Voice dataset.
In particular, our work combines these features with mel-frequency spectral
coefficients (MFSCs) to train a convolutional architecture with Gated Linear
Units (Conv GLUs). Such models have been shown to yield small word error rates,
while being very suitable for parallel processing for online streaming
recognition use cases. We have added pitch and voice quality functionality to
Facebook’s wav2letter speech recognition framework, and we provide the code and
recipes to the community to carry out further experiments.
Besides, to the best of our knowledge, our Spanish Common Voice recipe is the
first public Spanish recipe for wav2letter.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of camera
based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
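As a hedged sketch of the pure machine learning calibration variant: fit a small regressor from predicted gaze vectors to known on-screen targets collected at calibration time. The model class (ridge regression on polynomial features) is an assumption for illustration, not the paper's exact calibrator.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    gaze = np.random.rand(9, 3)     # CNN gaze vectors at 9 calibration points
    screen = np.random.rand(9, 2)   # the corresponding on-screen targets

    calib = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    calib.fit(gaze, screen)
    point = calib.predict(gaze[:1])  # map a new gaze vector to the screen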
Information Retrieval
Exploring Artist Gender Bias in Music Recommendation
Comments: To be presented at 2nd Workshop on the Impact of Recommender Systems (ImpactRS), at the 14th ACM Conference on Recommender Systems (RecSys 2020)
Subjects:
Information Retrieval (cs.IR)
Music Recommender Systems (mRS) are designed to give personalised and
meaningful recommendations of items (i.e. songs, playlists or artists) to a
user base, thereby reflecting and further complementing individual users’
specific music preferences. Whilst accuracy metrics have been widely applied to
evaluate recommendations in mRS literature, evaluating a user’s item utility
from other impact-oriented perspectives, including their potential for
discrimination, is still a novel evaluation practice in the music domain. In
this work, we center our attention on a specific phenomenon whose impact mRS
may exacerbate: gender bias. Our work presents an exploratory study, analyzing
the extent to which commonly deployed state-of-the-art Collaborative Filtering
(CF) algorithms may act to further increase or decrease artist gender bias. To
assess group biases introduced by CF, we deploy a
recently proposed metric of bias disparity on two listening event datasets: the
LFM-1b dataset, and the earlier constructed Celma’s dataset. Our work traces
the causes of disparity to variations in input gender distributions and
user-item preferences, highlighting the effect such configurations can have on
user’s gender bias after recommendation generation.
Comparing Fair Ranking Metrics
Amifa Raj , Connor Wood , Ananda Montoly , Michael D. Ekstrand Subjects : Information Retrieval (cs.IR)
Ranking is a fundamental aspect of recommender systems. However, ranked
outputs can be susceptible to various biases; some of these may cause
disadvantages to members of protected groups. Several metrics have been
proposed to quantify the (un)fairness of rankings, but there has not been to
date any direct comparison of these metrics. This complicates deciding what
fairness metrics are applicable for specific scenarios, and assessing the
extent to which metrics agree or disagree. In this paper, we describe several
fair ranking metrics in a common notation, enabling direct comparison of their
approaches and assumptions, and empirically compare them on the same
experimental setup and data set. Our work provides a direct comparative
analysis identifying the similarities and differences of the selected fair
ranking metrics.
Computation and Language
A Python Library for Exploratory Data Analysis and Knowledge Discovery on Twitter Data
Mario Graff , Daniela Moctezuma , Sabino Miranda-Jiménez , Eric S. Tellez Subjects : Computation and Language (cs.CL)
Twitter is perhaps the social medium most amenable to research. It requires
only a few steps to obtain information, and there are plenty of libraries that
can help in this regard. Nonetheless, knowing whether a particular event is
expressed on Twitter is a challenging task that requires a considerable
collection of tweets. This proposal aims to facilitate, for researchers
interested in Twitter data, the process of mining events on Twitter. The events could be
related to natural disasters, health issues, people’s mobility, among other
studies that can be pursued with the library proposed. Different applications
are presented in this contribution to illustrate the library’s capabilities,
starting from an exploratory analysis of the topics discovered in tweets,
following it by studying the similarity among dialects of the Spanish language,
and complementing it with a mobility report on different countries. In summary,
the Python library presented retrieves a plethora of information processed from
Twitter (since December 2015) in terms of words, bigrams of words, and their
frequencies by day for Arabic, English, Spanish, and Russian languages.
Finally, the mobility information considered is related to the number of
travels among locations for more than 245 countries or territories.
The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task
Comments: Submitted to the 2020 IWPT shared task on parsing Enhanced Universal Dependencies
Journal-ref: Proceedings of the 16th International Conference on Parsing
Technologies and the IWPT 2020 Shared Task (2020) 227-235
Subjects:
Computation and Language (cs.CL)
We describe the ADAPT system for the 2020 IWPT Shared Task on parsing
enhanced Universal Dependencies in 17 languages. We implement a pipeline
approach using UDPipe and UDPipe-future to provide initial levels of
annotation. The enhanced dependency graph is either produced by a graph-based
semantic dependency parser or is built from the basic tree using a small set of
heuristics. Our results show that, for the majority of languages, a semantic
dependency parser can be successfully applied to the task of parsing enhanced
dependencies.
Unfortunately, we did not ensure a connected graph as part of our pipeline
approach and our competition submission relied on a last-minute fix to pass the
validation script which harmed our official evaluation scores significantly.
Our submission ranked eighth in the official evaluation with a macro-averaged
coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented
our own graph-connecting fix which resulted in a score of 79.53 (language
average) or 79.76 (treebank average), which would have placed fourth in the
competition evaluation.
SRQA: Synthetic Reader for Factoid Question Answering
Comments: arXiv admin note: text overlap with arXiv:1809.00676
Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Question answering systems can answer questions from various fields and in
various forms with deep neural networks, but they still lack effective ways of
handling multiple evidences. We introduce a new model called SRQA, which stands for Synthetic
Reader for Factoid Question Answering. This model enhances the question
answering system in the multi-document scenario from three aspects: model
structure, optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)
respectively. First, we propose a multilayer attention network to obtain a
better representation of the evidences. The multilayer attention mechanism
conducts interaction between the question and the passage within each layer,
making the token representations of evidences in each layer take the
requirements of the question into account. Second, we design a cross evidence
strategy to choose the answer span within more evidences. We improve the
optimization goal, considering all the answers’ locations in multiple evidences
as training targets, which leads the model to reason among multiple evidences.
Third, adversarial training is applied to high-level variables besides the
word embedding in our model. A new normalization method is also proposed for
adversarial perturbations so that we can jointly add perturbations to several
target variables. As an effective regularization method, adversarial training
enhances the model’s ability to process noisy data. Combining these three
strategies, we enhance the contextual representation and locating ability of
our model, which could synthetically extract the answer span from several
evidences. We perform SRQA on the WebQA dataset, and experiments show that our
model outperforms the state-of-the-art models (the best fuzzy score of our
model is up to 78.56%, with an improvement of about 2%).
Biomedical named entity recognition using BERT in the machine reading comprehension framework
Comments: 8 pages, 2 figures
Subjects:
Computation and Language (cs.CL)
Recognition of biomedical entities from literature is a challenging research
focus, which is the foundation for extracting a large amount of biomedical
knowledge existing in unstructured texts into structured formats. Using the
sequence labeling framework to implement biomedical named entity recognition
(BioNER) is currently a conventional method. This method, however, often cannot
take full advantage of the semantic information in the dataset, and the
performance is not always satisfactory. In this work, instead of treating the
BioNER task as a sequence labeling problem, we formulate it as a machine
reading comprehension (MRC) problem. This formulation can introduce more prior
knowledge by utilizing well-designed queries, and it no longer needs decoding
processes such as conditional random fields (CRFs). We conduct experiments on
six BioNER datasets, and the experimental results demonstrate the effectiveness
of our method. Our method achieves state-of-the-art (SOTA) performance on the
BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI Disease, BC2GM and JNLPBA datasets,
with F1-scores of 92.38%, 94.19%, 87.36%, 90.04%, 84.98% and 78.93%,
respectively.
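To make the MRC formulation concrete, the sketch below pairs each entity type with a natural-language query and extracts the best answer span from per-token start/end scores, replacing CRF decoding; the example queries and the greedy span search are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Hypothetical entity-type queries; the paper injects prior knowledge
# about each entity type through well-designed queries like these.
QUERIES = {
    "Chemical": "Find all chemical or drug names mentioned in the text.",
    "Disease": "Find all disease or syndrome names mentioned in the text.",
}

def best_span(start_scores, end_scores, max_len=10):
    """Pick the highest-scoring (start, end) span with end >= start.

    start_scores/end_scores are per-token scores produced by an MRC model
    for a (query, passage) pair; this decoding step replaces the CRF layer
    used in sequence-labeling formulations of BioNER.
    """
    best, best_score = None, -np.inf
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Toy scores over a 6-token passage:
print(best_span(np.array([0.1, 2.0, 0.0, 0.2, 0.1, 0.0]),
                np.array([0.0, 0.5, 1.8, 0.1, 0.0, 0.2])))
```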
orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions
Guy Lev , Michal Shmueli-Scheuer , Achiya Jerbi , David Konopnicki Subjects : Computation and Language (cs.CL)
Frequently Asked Questions (FAQ) webpages are created by organizations for
their users. FAQs are used in several scenarios, e.g., to answer user
questions. On the other hand, the content of FAQs is affected by user questions
by definition. In order to promote research in this field, several FAQ datasets
exist. However, we claim that being collected from community websites, they do
not correctly represent challenges associated with FAQs in an organizational
context. Thus, we release orgFAQ, a new dataset composed of 6,988 user
questions and 1,579 corresponding FAQs that were extracted from organizations’
FAQ webpages in the Jobs domain. In this paper, we provide an analysis of the
properties of such FAQs, and demonstrate the usefulness of our new dataset by
utilizing it in a relevant task from the Jobs domain. We also show the value of
the orgFAQ dataset in a task of a different domain – the COVID-19 pandemic.
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these are only rough
proxies for what we really care about: summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
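The reward-model step can be illustrated with a standard pairwise preference loss; the Bradley-Terry-style objective below is a common choice for learning from human comparisons and is our assumption, since the abstract does not spell out the exact loss.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_preferred, reward_rejected):
    """Pairwise loss for training a reward model from human comparisons.

    Given scalar rewards for the human-preferred and the rejected summary
    of the same post, maximize the log-probability that the preferred one
    wins under a Bradley-Terry model. This is a common formulation for
    learning from pairwise preferences; the paper's exact loss may differ
    in details.
    """
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

# Dummy rewards for a batch of 3 comparison pairs:
r_pref = torch.tensor([1.2, 0.3, -0.5])
r_rej = torch.tensor([0.7, 0.9, -1.0])
print(reward_model_loss(r_pref, r_rej))
```

The trained reward model then scores candidate summaries during reinforcement-learning fine-tuning of the summarization policy.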
A Simple Global Neural Discourse Parser
Yichu Zhou , Omri Koshorek , Vivek Srikumar , Jonathan Berant Subjects : Computation and Language (cs.CL)
Discourse parsing is largely dominated by greedy parsers with
manually-designed features, while global parsing is rare due to its
computational expense. In this paper, we propose a simple chart-based neural
discourse parser that does not require any manually-crafted features and is
based on learned span representations only. To overcome the computational
challenge, we propose an independence assumption between the label assigned to
a node in the tree and the splitting point that separates its children, which
results in tractable decoding. We empirically demonstrate that our model
achieves the best performance among global parsers, and comparable performance
to state-of-the-art greedy parsers, using only learned span representations.
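The independence assumption can be made concrete with a small dynamic program: because a node's label score does not depend on the chosen split point, each chart cell combines the best label with the best split, giving cubic-time exact decoding. The sketch below uses toy random span scores standing in for learned span representations.

```python
import numpy as np

def chart_parse(label_scores):
    """Global chart decoding under a label/split independence assumption.

    label_scores[i, j] holds one score per discourse label for the span
    (i, j). Because the label of a node is scored independently of its
    split point, each cell only needs the best label score plus the best
    split of its children, giving O(n^3) exact decoding.
    """
    n = label_scores.shape[0]
    best = np.zeros((n, n + 1))
    split = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = label_scores[i, j].max()  # best label, independent of split
            if length > 1:
                k_best = max(range(i + 1, j),
                             key=lambda k: best[i, k] + best[k, j])
                score += best[i, k_best] + best[k_best, j]
                split[(i, j)] = k_best
            best[i, j] = score
    return best[0, n], split

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5, 3))  # spans over 4 units, 3 labels
total, splits = chart_parse(scores)
print(total, splits)
```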
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading
Comments: 7 pages, 3 figures, 3 tables. For associated work, refer to this https URL
Subjects:
Computation and Language (cs.CL)
Automatic Short Answer Grading (ASAG) is the process of grading student
answers with computational approaches, given a question and the desired answer.
Previous works implemented the methods of concept mapping, facet mapping, and
some used the conventional word embeddings for extracting semantic features.
They extracted multiple features manually to train on the corresponding
datasets. We use pretrained embeddings of the transfer learning models, ELMo,
BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a
single feature, cosine similarity, extracted from the embeddings of these
models. We compare the RMSE scores and correlation measurements of the four
models with previous works on the Mohler dataset. Our work demonstrates that ELMo
outperformed the other three models. We also briefly describe the four
transfer learning models and conclude with possible causes of the poor results
of the transfer learning models.
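The single-feature setup is simple enough to sketch end to end: compute the cosine similarity between the embeddings of the student answer and the reference answer, then fit a linear map from similarity to grade. The random embeddings below are stand-ins for ELMo/BERT/GPT/GPT-2 outputs, so the fitted weights are only a demonstration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Single grading feature: cosine similarity between the embedding of
    the student answer and that of the reference answer."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 768))               # reference-answer embeddings
ans = ref + 0.1 * rng.normal(size=(5, 768))   # student-answer embeddings
grades = np.array([5.0, 4.5, 4.0, 3.5, 5.0])

# Fit grade = w * similarity + b by least squares on the toy training set.
sims = np.array([cosine_similarity(a, r) for a, r in zip(ans, ref)])
w, b = np.polyfit(sims, grades, 1)
print(w * sims + b)  # predicted grades
```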
Knowing What to Listen to: Early Attention for Deep Speech Representation Learning
Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Deep learning techniques have considerably improved speech processing in
recent years. Speech representations extracted by deep learning models are
being used in a wide range of tasks such as speech recognition, speaker
recognition, and speech emotion recognition. Attention models play an important
role in improving deep learning models. However, current attention mechanisms
are unable to attend to fine-grained information items. In this paper, we
propose the novel Fine-grained Early Frequency Attention (FEFA) for speech
signals. This model is capable of focusing on information items as small as
frequency bins. We evaluate the proposed model on two popular tasks of speaker
recognition and speech emotion recognition. Two widely used public datasets,
VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on
top of several prominent deep models as backbone networks to evaluate its
impact on performance compared to the original networks and other related work.
Our experiments show that by adding FEFA to different CNN architectures,
performance is consistently improved by substantial margins, even setting a new
state-of-the-art for the speaker recognition task. We also tested our model
against different levels of added noise showing improvements in robustness and
less sensitivity compared to the backbone networks.
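A minimal sketch of attention at the granularity of frequency bins is given below: a per-bin gate reweights each spectrogram frame before it enters the backbone network. The sigmoid gate and the single linear scoring layer are our assumptions about FEFA's exact form, which the abstract does not detail.

```python
import torch
import torch.nn as nn

class FrequencyBinAttention(nn.Module):
    """Early attention over frequency bins (a sketch in the spirit of FEFA).

    Given a spectrogram batch of shape (batch, time, freq_bins), a small
    network scores every frequency bin at every frame and the input is
    reweighted before it reaches the backbone network.
    """
    def __init__(self, n_bins):
        super().__init__()
        self.score = nn.Linear(n_bins, n_bins)

    def forward(self, spec):
        attn = torch.sigmoid(self.score(spec))  # one weight per bin
        return spec * attn

x = torch.randn(8, 100, 257)  # a batch of spectrogram frames
print(FrequencyBinAttention(257)(x).shape)  # unchanged shape, reweighted bins
```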
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
High-fidelity singing voices usually require a higher sampling rate (e.g.,
48kHz) to convey expression and emotion. However, a higher sampling rate widens
the frequency band and lengthens the waveform sequences, posing challenges
for singing voice synthesis (SVS) in both the frequency and time domains.
Conventional SVS systems that adopt a lower sampling rate cannot adequately address
these challenges. In this paper, we develop HiFiSinger, an SVS system for
high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic
model and a Parallel WaveGAN based vocoder to ensure fast training and
inference and also high voice quality. To tackle the difficulty of singing
modeling caused by high sampling rate (wider frequency band and longer
waveform), we introduce multi-scale adversarial training in both the acoustic
model and vocoder to improve singing modeling. Specifically, 1) To handle the
larger range of frequencies caused by higher sampling rate, we propose a novel
sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full
80-dimensional mel-frequency into multiple sub-bands and models each sub-band
with a separate discriminator. 2) To model longer waveform sequences caused by
higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform
generation to model different lengths of waveform sequences with separate
discriminators. 3) We also introduce several additional designs and findings in
HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)
and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate
window/hop size for mel-spectrogram, and increasing the receptive field in
vocoder for long vowel modeling. Experiment results show that HiFiSinger
synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44
MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
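The sub-band idea of SF-GAN can be illustrated by splitting the 80-dimensional mel-spectrogram along the frequency axis, with each sub-band fed to its own discriminator; the equal-width four-band split below is an assumption, as the abstract does not give the band layout.

```python
import torch

def split_mel_subbands(mel, n_subbands=4):
    """Split an 80-bin mel-spectrogram into frequency sub-bands, one per
    discriminator, in the spirit of SF-GAN. `mel` has shape
    (batch, 80, frames); the equal-width split is our assumption.
    """
    return torch.chunk(mel, n_subbands, dim=1)

mel = torch.randn(2, 80, 200)
for band in split_mel_subbands(mel):
    print(band.shape)  # (2, 20, 200): each band goes to its own discriminator
```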
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all the right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects and resulting in significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
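A sketch of expression-aware suppression follows: each box's detection confidence is blended with a predicted expression-relatedness score before standard greedy NMS, so expression-irrelevant boxes are filtered first. The alpha-weighted blend is our assumption; Ref-NMS obtains the relatedness score from its lightweight alignment module.

```python
import numpy as np

def box_area(b):
    return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

def ref_nms(boxes, det_scores, expr_scores, iou_thresh=0.5, alpha=0.5):
    """Greedy NMS ranked by a blend of detection confidence and a
    predicted expression-relatedness score (the blend is our assumption)."""
    scores = alpha * det_scores + (1 - alpha) * expr_scores
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection-over-union of the top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (box_area(boxes[i:i + 1]) + box_area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(ref_nms(boxes, np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.9, 0.5])))
```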
Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions
Sara Evensen , Chang Ge , Dongjin Choi , Çağatay Demiralp Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Data programming is a programmatic weak supervision approach to efficiently
curate large-scale labeled training data. Writing data programs (labeling
functions) requires, however, both programming literacy and domain expertise.
Many subject matter experts have neither programming proficiency nor time to
effectively write data programs. Furthermore, regardless of one’s expertise in
coding or machine learning, transferring domain expertise into labeling
functions by enumerating rules and thresholds is not only time consuming but
also inherently difficult. Here we propose a new framework, data programming by
demonstration (DPBD), to generate labeling rules using interactive
demonstrations of users. DPBD aims to relieve the burden of writing labeling
functions from users, enabling them to focus on higher-level semantics such as
identifying relevant signals for labeling tasks. We operationalize our
framework with Ruler, an interactive system that synthesizes labeling rules for
document classification by using span-level annotations of users on document
examples. We compare Ruler with conventional data programming through a user
study conducted with 10 data scientists creating labeling functions for
sentiment and spam classification tasks. We find that Ruler is easier to use
and learn and offers higher overall satisfaction, while providing
discriminative model performances comparable to ones achieved by conventional
data programming.
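For readers unfamiliar with data programming, labeling functions like the ones Ruler synthesizes can be as simple as keyword rules that vote for a label or abstain; the toy spam rules and majority vote below are illustrative only (the assignment expression requires Python 3.8+).

```python
# Labeling functions in the data-programming sense: simple rules that
# vote SPAM or HAM, or abstain (None). The rule contents are hypothetical.
SPAM, HAM, ABSTAIN = 1, 0, None

def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

def majority_vote(text, lfs):
    """Aggregate non-abstaining votes; real systems use a learned label model."""
    votes = [v for lf in lfs if (v := lf(text)) is not None]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(majority_vote("Claim your FREE prize now!",
                    [lf_contains_free, lf_has_greeting]))  # -> 1 (SPAM)
```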
Towards Earnings Call and Stock Price Movement
Comments: Accepted by KDD 2020 MLF workshop
Subjects:
Statistical Finance (q-fin.ST)
; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Earnings calls are hosted by management of public companies to discuss the
company’s financial performance with analysts and investors. Information
disclosed during an earnings call is an essential source of data for analysts
and investors to make investment decisions. Thus, we leverage earnings call
transcripts to predict future stock price dynamics. We propose to model the
language in transcripts using a deep learning framework, where an attention
mechanism is applied to encode the text data into vectors for the
discriminative network classifier to predict stock price movements. Our
empirical experiments show that the proposed model is superior to the
traditional machine learning baselines and earnings call information can boost
the stock price prediction performance.
Distributed, Parallel, and Cluster Computing
Fast Byzantine Gathering with Visibility in Graphs
Comments: Conference version appeared at ALGOSENSORS 2020
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Data Structures and Algorithms (cs.DS)
We consider the gathering task by a team of $m$ synchronous mobile robots in
a graph of $n$ nodes. Each robot has an identifier (ID) and runs its own
deterministic algorithm, i.e., there is no centralized coordinator. We consider
a particularly challenging scenario: there are $f$ Byzantine robots in the team
that can behave arbitrarily, and even have the ability to change their IDs to
any value at any time. There is no way to distinguish these robots from
non-faulty robots, other than perhaps observing strange or unexpected
behaviour. The goal of the gathering task is to eventually have all non-faulty
robots located at the same node in the same round. It is known that no
algorithm can solve this task unless there are at least $f+1$ non-faulty robots in
the team. In this paper, we design an algorithm that runs in polynomial time
with respect to $n$ and $m$, and that matches this bound, i.e., it works in a team
that has exactly $f+1$ non-faulty robots. In our model, we have equipped the
robots with sensors that enable each robot to see the subgraph (including
robots) within some distance $H$ of its current node. We prove that the
gathering task is solvable if this visibility range $H$ is at least the radius
of the graph, and not solvable if $H$ is any fixed constant.
Software-Distributed Shared Memory for Heterogeneous Machines: Design and Use Considerations
Loïc Cudennec (DACLE-LIST, DGA.MI) Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed shared memory (DSM) makes it possible to implement and deploy applications
onto distributed architectures using the convenient shared memory programming
model in which a set of tasks are able to allocate and access data despite
their remote localization. With the development of distributed heterogeneous
architectures in both HPC and embedded contexts, there is a renewal of interest
for systems such as DSM that ease the programmability of complex hardware. In
this report, some design considerations are given to build a complete
software-DSM (S-DSM). This S-DSM called SAT (Share Among Things) is developed
at CEA (the French Alternative Energies and Atomic Energy Commission) within
the framework of the European project M2DC (Modular Microserver DataCentre) to
tackle the problem of managing shared data over microserver architectures. The
S-DSM features the automatic decomposition of large data into atomic pieces
called chunks, the possibility to deploy multiple coherence protocols to manage
different chunks, a hybrid programming model based on event programming, and a
micro-sleep mechanism to decrease energy consumption on message reception.
Distributed Online Optimization via Gradient Tracking with Adaptive Momentum
Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper deals with a network of computing agents aiming to solve an online
optimization problem in a distributed fashion, i.e., by means of local
computation and communication, without any central coordinator. We propose the
gradient tracking with adaptive momentum estimation (GTAdam) distributed
algorithm, which combines a gradient tracking mechanism with first and second
order momentum estimates of the gradient. The algorithm is analyzed in the
online setting for strongly convex and smooth cost functions. We prove that the
average dynamic regret is bounded and that the convergence rate is linear. The
algorithm is tested on a time-varying classification problem, on a (moving)
target localization problem and in a stochastic optimization setup from image
classification. In these numerical experiments from multi-agent learning,
GTAdam outperforms state-of-the-art distributed optimization methods.
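The following NumPy sketch reconstructs the main loop suggested by the abstract: consensus mixing over a doubly stochastic matrix, a gradient tracker for the network-wide gradient, and Adam-style first/second moment estimates applied to the tracked gradient. The precise update order is our reconstruction, not the authors' pseudocode.

```python
import numpy as np

def gtadam(W, grad_fns, x0, steps=200, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of gradient tracking with Adam-like momentum (GTAdam-style).

    W is a doubly stochastic mixing matrix over the agent network; each
    agent i holds a local estimate x[i] and a tracker s[i] of the global
    gradient. The update order here is our reconstruction.
    """
    n, d = len(grad_fns), x0.shape[1]
    x = x0.copy()
    g = np.array([grad_fns[i](x[i]) for i in range(n)])
    s = g.copy()                        # gradient trackers
    m, v = np.zeros((n, d)), np.zeros((n, d))
    for t in range(1, steps + 1):
        m = b1 * m + (1 - b1) * s       # first moment of the tracked gradient
        v = b2 * v + (1 - b2) * s**2    # second moment
        step = (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
        x = W @ x - lr * step           # consensus mixing + adaptive descent
        g_new = np.array([grad_fns[i](x[i]) for i in range(n)])
        s = W @ s + g_new - g           # track the average gradient
        g = g_new
    return x

# Three agents with quadratic costs; the consensus optimum is the average
# of the local minimizers, roughly [1, 1].
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda z, c=c: z - c for c in targets]
W = np.full((3, 3), 1 / 3)
print(gtadam(W, grads, np.zeros((3, 2))))
```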
Comments: Accepted for publication in IEEE Transaction on Circuit and System for Video Technology
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Convolutional neural networks (CNNs) require both intensive computation and
frequent memory access, which lead to a low processing speed and large power
dissipation. Although the characteristics of the different layers in a CNN are
frequently quite different, previous hardware designs have employed common
optimization schemes for them. This paper proposes a layer-specific design that
employs different organizations that are optimized for the different layers.
The proposed design employs two layer-specific optimizations: layer-specific
mixed data flow and layer-specific mixed precision. The mixed data flow aims to
minimize the off-chip access while demanding a minimal on-chip memory (BRAM)
resource of an FPGA device. The mixed precision quantization is to achieve both
a lossless accuracy and an aggressive model compression, thereby further
reducing the off-chip access. A Bayesian optimization approach is used to
select the best sparsity for each layer, achieving the best trade-off between
the accuracy and compression. This mixing scheme allows the entire network
model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip
access, and thereby achieves a significant performance enhancement. The model
size is reduced by 22.66-28.93 times compared to that in a full-precision
network with a negligible degradation of accuracy on VOC, COCO, and ImageNet
datasets. Furthermore, the combination of mixed dataflow and mixed precision
significantly outperforms previous works in terms of throughput,
off-chip access, and on-chip memory requirements.
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control
Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)
The Internet of Vehicles (IoV) enables real-time data exchange among vehicles
and roadside units and thus provides a promising solution to alleviate traffic
jams in the urban area. Meanwhile, better traffic management via efficient
traffic light control can benefit the IoV as well by enabling a better
communication environment and decreasing the network load. As such, IoV and
efficient traffic light control can formulate a virtuous cycle. Edge computing,
an emerging technology to provide low-latency computation capabilities at the
edge of the network, can further improve the performance of this cycle.
However, while the collected information is valuable, an efficient solution for
better utilization and faster feedback has yet to be developed for
edge-empowered IoV. To this end, we propose a Decentralized Reinforcement
Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits
the ubiquity of the IoV to accelerate the collection of traffic data and its
interpretation towards alleviating congestion and providing better traffic
light control. DRLE operates within the coverage of the edge servers and uses
aggregated data from neighboring edge servers to provide city-scale traffic
light control. DRLE decomposes the highly complex problem of large-area
control into a decentralized multi-agent problem. We prove its global optimality
with concrete mathematical reasoning. The proposed decentralized reinforcement
learning algorithm running at each edge node adapts the traffic lights in real
time. We conduct extensive evaluations and demonstrate the superiority of this
approach over several state-of-the-art algorithms.
Local Fast Rerouting with Low Congestion: A Randomized Approach
Gregor Bankhamer , Robert Elsässer , Stefan Schmid Subjects : Networking and Internet Architecture (cs.NI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Most modern communication networks include fast rerouting mechanisms,
implemented entirely in the data plane, to quickly recover connectivity after
link failures. By relying on local failure information only, these data plane
mechanisms provide very fast reaction times, but at the same time introduce an
algorithmic challenge in case of multiple link failures: failover routes need
to be robust to additional but locally unknown failures downstream.
This paper presents local fast rerouting algorithms which not only provide a
high degree of resilience against multiple link failures, but also ensure a low
congestion on the resulting failover paths. We consider a randomized approach
and focus on networks which are highly connected before the failures occur. Our
main contributions are three simple algorithms which come with provable
guarantees and provide interesting resilience-load tradeoffs, significantly
outperforming any deterministic fast rerouting algorithm with high probability.
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Comments: appear as a conference paper in FCCM 2019
Subjects:
Machine Learning (cs.LG)
; Distributed, Parallel, and Cluster Computing (cs.DC)
Decision trees are machine learning models commonly used in various
application scenarios. In the era of big data, traditional decision tree
induction algorithms are not suitable for learning large-scale datasets due to
their stringent data storage requirement. Online decision tree learning
algorithms have been devised to tackle this problem by concurrently training
with incoming samples and providing inference results. However, even the most
up-to-date online tree learning algorithms still suffer from either high memory
usage or high computational intensity with dependency and long latency, making
them challenging to implement in hardware. To overcome these difficulties, we
introduce a new quantile-based algorithm to improve the induction of the
Hoeffding tree, one of the state-of-the-art online learning models. The
proposed algorithm is light-weight in terms of both memory and computational
demand, while still maintaining high generalization ability. A series of
optimization techniques dedicated to the proposed algorithm have been
investigated from the hardware perspective, including coarse-grained and
fine-grained parallelism, dynamic and memory-based resource sharing, and pipelining
with data forwarding. We further present a high-performance, hardware-efficient
and scalable online decision tree learning system on a field-programmable gate
array (FPGA) with system-level optimization techniques. Experimental results
show that our proposed algorithm outperforms the state-of-the-art Hoeffding
tree learning method, leading to 0.05% to 12.3% improvement in inference
accuracy. Real implementation of the complete learning system on the FPGA
demonstrates a 384x to 1581x speedup in execution time over the
state-of-the-art design.
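For context, the Hoeffding tree referenced above splits a leaf only when the observed gain gap between the two best attributes exceeds the Hoeffding bound; this statistical test is the core that online quantile-based variants also rely on. A minimal sketch:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound used to decide when an online tree may split.

    With probability 1 - delta, the true mean of a random variable with
    range `value_range` is within epsilon of the mean observed over n
    samples. A Hoeffding tree splits when the observed gain gap between
    the two best attributes exceeds epsilon.
    """
    return math.sqrt(value_range**2 * math.log(1.0 / delta) / (2.0 * n))

# Split check after 500 samples: information gain lies in [0, 1] for
# binary labels, so value_range = 1.
epsilon = hoeffding_bound(1.0, delta=1e-6, n=500)
best_gain, second_gain = 0.42, 0.25
print(best_gain - second_gain > epsilon)  # True -> safe to split
```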
Learning
Physics-Consistent Data-driven Waveform Inversion with Adaptive Data Augmentation
Renán Rojas-Gómez , Jihyun Yang , Youzuo Lin , James Theiler , Brendt Wohlberg Subjects : Machine Learning (cs.LG) ; Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Seismic full-waveform inversion (FWI) is a nonlinear computational imaging
technique that can provide detailed estimates of subsurface geophysical
properties. Solving the FWI problem can be challenging due to its ill-posedness
and high computational cost. In this work, we develop a new hybrid
computational approach to solve FWI that combines physics-based models with
data-driven methodologies. In particular, we develop a data augmentation
strategy that can not only improve the representativity of the training set but
also incorporate important governing physics into the training process and
therefore improve the inversion accuracy. To validate the performance, we apply
our method to synthetic elastic seismic waveform data generated from a
subsurface geologic model built on a carbon sequestration site at Kimberlina,
California. We compare our physics-consistent data-driven inversion method to
both purely physics-based and purely data-driven approaches and observe that
our method yields higher accuracy and greater generalization ability.
Comments: 32 pages
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Current deep learning research is dominated by benchmark evaluation. A method
is regarded as favorable if it empirically performs well on the dedicated test
set. This mentality is seamlessly reflected in the resurfacing area of
continual learning, where consecutively arriving sets of benchmark data are
investigated. The core challenge is framed as protecting previously acquired
representations from being catastrophically forgotten due to the iterative
parameter updates. However, comparison of individual methods is nevertheless
treated in isolation from real world application and typically judged by
monitoring accumulated test set performance. The closed world assumption
remains predominant. It is assumed that during deployment a model is guaranteed
to encounter data that stems from the same distribution as used for training.
This poses a massive challenge as neural networks are well known to provide
overconfident false predictions on unknown instances and break down in the face
of corrupted data. In this work we argue that notable lessons from open set
recognition, the identification of statistically deviating data outside of the
observed dataset, and the adjacent field of active learning, where data is
incrementally queried such that the expected performance gain is maximized, are
frequently overlooked in the deep learning era. Based on these forgotten
lessons, we propose a consolidated view to bridge continual learning, active
learning and open set recognition in deep neural networks. Our results show
that this not only benefits each individual paradigm, but highlights the
natural synergies in a common framework. We empirically demonstrate
improvements in alleviating catastrophic forgetting, querying data in active
learning, and selecting task orders, while exhibiting robust open-world behavior
where previously proposed methods fail.
Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints
Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider the problem of constrained multi-objective blackbox optimization
using expensive function evaluations, where the goal is to approximate the true
Pareto set of solutions satisfying a set of constraints while minimizing the
number of function evaluations. For example, in aviation power system design
applications, we need to find the designs that trade-off total energy and the
mass while satisfying specific thresholds for motor temperature and voltage of
cells. This optimization requires performing expensive computational
simulations to evaluate designs. In this paper, we propose a new approach
referred to as Max-value Entropy Search for Multi-objective Optimization with
Constraints (MESMOC) to solve this problem. MESMOC employs an output-space
entropy based acquisition function to efficiently select the sequence of inputs
for evaluation to uncover high-quality Pareto-set solutions while satisfying
constraints.
We apply MESMOC to two real-world engineering design applications to
demonstrate its effectiveness over state-of-the-art algorithms.
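Independent of the acquisition function, the end product of constrained multi-objective optimization is the feasible Pareto set; the sketch below filters already-evaluated designs down to the non-dominated feasible ones, assuming all objectives are minimized and a constraint is satisfied when its value is non-positive (our conventions, not necessarily the paper's).

```python
import numpy as np

def feasible_pareto_front(objectives, constraints):
    """Return indices of evaluated designs that satisfy all constraints
    and are not dominated by any other feasible design."""
    feasible = np.all(constraints <= 0, axis=1)
    idx = np.where(feasible)[0]
    front = []
    for i in idx:
        dominated = any(
            np.all(objectives[j] <= objectives[i]) and
            np.any(objectives[j] < objectives[i])
            for j in idx if j != i
        )
        if not dominated:
            front.append(int(i))
    return front

obj = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 1.0], [3.0, 3.0]])
con = np.array([[-1.0], [-0.5], [-0.2], [0.3]])  # last design is infeasible
print(feasible_pareto_front(obj, con))  # [0, 1, 2]
```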
CAGNN: Cluster-Aware Graph Neural Networks for Unsupervised Graph Representation Learning
Comments: 21 pages, in submission to ACM TIST
Subjects:
Machine Learning (cs.LG)
; Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Unsupervised graph representation learning aims to learn low-dimensional node
embeddings without supervision while preserving graph topological structures
and node attributive features. Previous graph neural networks (GNN) require a
large number of labeled nodes, which may not be accessible in real-world graph
data. In this paper, we present a novel cluster-aware graph neural network
(CAGNN) model for unsupervised graph representation learning using
self-supervised techniques. In CAGNN, we perform clustering on the node
embeddings and update the model parameters by predicting the cluster
assignments. Moreover, we observe that graphs often contain inter-class edges,
which mislead the GNN model to aggregate noisy information from neighborhood
nodes. We further refine the graph topology by strengthening intra-class edges
and reducing node connections between different classes based on cluster
labels, which better preserves cluster structures in the embedding space. We
conduct comprehensive experiments on two benchmark tasks using real-world
datasets. The results demonstrate the superior performance of the proposed
model over existing baseline methods. Notably, our model gains over 7%
improvement in accuracy on node clustering over state-of-the-art methods.
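The self-supervised loop can be sketched as: cluster the current node embeddings, treat the assignments as pseudo-labels, train the GNN to predict them, and repeat. KMeans below is a stand-in for whatever clustering step CAGNN actually uses; the alternation with a cross-entropy update is implied by the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(embeddings, n_clusters=7):
    """Self-supervision signal in the spirit of CAGNN: cluster the current
    node embeddings and return the assignments as pseudo-labels. The GNN
    parameters would then be updated with a cross-entropy loss on these
    assignments, alternating with re-clustering.
    """
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

emb = np.random.default_rng(0).normal(size=(100, 16))  # toy node embeddings
print(cluster_pseudo_labels(emb, n_clusters=3)[:10])
```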
Yet Meta Learning Can Adapt Fast, It Can Also Break Easily
Comments: Meta Learning Robustness
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Meta learning algorithms have been widely applied in many tasks for efficient
learning, such as few-shot image classification and fast reinforcement
learning. During meta training, the meta learner develops a common learning
strategy, or experience, from a variety of learning tasks. Therefore, during
meta test, the meta learner can use the learned strategy to quickly adapt to
new tasks even with a few training samples. However, there is still a dark side
about meta learning in terms of reliability and robustness. In particular, is
meta learning vulnerable to adversarial attacks? In other words, would a
well-trained meta learner utilize its learned experience to build wrong or
likely useless knowledge, if an adversary unnoticeably manipulates the given
training set? Without the understanding of this problem, it is extremely risky
to apply meta learning in safety-critical applications. Thus, in this paper, we
perform the initial study about adversarial attacks on meta learning under the
few-shot classification problem. In particular, we formally define key elements
of adversarial attacks unique to meta learning and propose the first attacking
algorithm against meta learning under various settings. We evaluate the
effectiveness of the proposed attacking strategy as well as the robustness of
several representative meta learning algorithms. Experimental results
demonstrate that the proposed attacking strategy can easily break the meta
learner and meta learning is vulnerable to adversarial attacks. The
implementation of the proposed framework will be released upon the acceptance
of this paper.
MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance
Comments: Work done as part of internship at MDSR
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Training a classification model on a dataset where the instances of one class
outnumber those of the other class is a challenging problem. Such imbalanced
datasets are standard in real-world situations such as fraud detection, medical
diagnosis, and computational advertising. We propose an iterative data
augmentation method, MixBoost, which intelligently selects (Boost) and then
combines (Mix) instances from the majority and minority classes to generate
synthetic hybrid instances that have characteristics of both classes. We
evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing
approaches, and test its efficacy through significance testing. We also present
ablation studies to analyze the impact of the different components of MixBoost.
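The Mix step can be illustrated with simple convex combinations of minority and majority instances; the Beta-sampled mixing weights below follow the usual mixup recipe, and the uniform pairing replaces MixBoost's learned Boost selection, which the abstract does not detail.

```python
import numpy as np

def mix_instances(x_minority, x_majority, rng, alpha=0.4):
    """Generate synthetic hybrid instances by interpolating minority and
    majority instances, in the spirit of MixBoost's Mix step (the Boost
    step that selects which instances to combine is omitted; pairs are
    sampled uniformly for illustration).
    """
    n = len(x_minority)
    lam = rng.beta(alpha, alpha, size=(n, 1))
    lam = np.maximum(lam, 1 - lam)  # keep hybrids closer to the minority class
    picks = x_majority[rng.integers(0, len(x_majority), size=n)]
    return lam * x_minority + (1 - lam) * picks

rng = np.random.default_rng(0)
minority = rng.normal(2.0, 1.0, size=(10, 5))
majority = rng.normal(0.0, 1.0, size=(500, 5))
print(mix_instances(minority, majority, rng).shape)  # (10, 5) synthetic rows
```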
Can AutoML outperform humans? An evaluation on popular OpenML datasets using AutoML Benchmark
Marc Hanussek , Matthias Blohm , Maximilien Kintz Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In the last few years, Automated Machine Learning (AutoML) has gained much
attention. With that said, the question arises whether AutoML can outperform
results achieved by human data scientists. This paper compares four AutoML
frameworks on 12 different popular datasets from OpenML; six of them supervised
classification tasks and the other six supervised regression ones.
Additionally, we consider a real-life dataset from one of our recent projects.
The results show that the automated frameworks perform better than or equal to the
machine learning community in 7 out of 12 OpenML tasks.
Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs
Comments: 8 pages, 4 figures, conference
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
This paper proposes an approach to analyze an event log of a business process
in order to generate case-level recommendations of treatments that maximize the
probability of a given outcome. Users classify the attributes in the event log
into controllable and non-controllable, where the former correspond to
attributes that can be altered during an execution of the process (the possible
treatments). We use an action rule mining technique to identify treatments that
co-occur with the outcome under some conditions. Since action rules are
generated based on correlation rather than causation, we then use a causal
machine learning technique, specifically uplift trees, to discover subgroups of
cases for which a treatment has a high causal effect on the outcome after
adjusting for confounding variables. We test the relevance of this approach
using an event log of a loan application process and compare our findings with
recommendations manually produced by process mining experts.
Sample-Efficient Automated Deep Reinforcement Learning
Jörg K.H. Franke , Gregor Köhler , André Biedenkapp , Frank Hutter Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Despite significant progress in challenging problems across various domains,
applying state-of-the-art deep reinforcement learning (RL) algorithms remains
challenging due to their sensitivity to the choice of hyperparameters. This
sensitivity can partly be attributed to the non-stationarity of the RL problem,
potentially requiring different hyperparameter settings at different stages of
the learning process. Additionally, in the RL setting, hyperparameter
optimization (HPO) requires a large number of environment interactions,
hindering the transfer of the successes in RL to real-world applications. In
this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We
propose a population-based automated RL (AutoRL) framework to meta-optimize
arbitrary off-policy RL algorithms. In this framework, we optimize the
hyperparameters, including architecture hyperparameters while simultaneously
training the agent. By sharing the collected experience across the population,
we substantially increase the sample efficiency of the meta-optimization. We
demonstrate the capabilities of our sample-efficient AutoRL approach in a case
study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we
reduce the number of environment interactions needed for meta-optimization by
up to an order of magnitude compared to population-based training.
Bounded Risk-Sensitive Markov Game and Its Inverse Reward Learning Problem
Ran Tian , Liting Sun , Masayoshi Tomizuka Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Classical game-theoretic approaches for multi-agent systems in both the
forward policy learning/design problem and the inverse reward learning problem
often make strong rationality assumptions: agents are perfectly rational
expected utility maximizers. Specifically, the agents are risk-neutral to all
uncertainties, maximize their expected rewards, and have unlimited computation
resources to explore such policies. Such assumptions, however, are substantially
at odds with many observed human behaviors, such as satisficing with
sub-optimal policies and making risk-seeking or loss-averse decisions. In this paper,
we investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and
its inverse reward learning problem. Instead of assuming unlimited computation
resources, we consider the influence of bounded intelligence by exploiting
iterative reasoning models in BRSMG. Instead of assuming agents maximize their
expected utilities (a risk-neutral measure), we consider the impact of
risk-sensitive measures such as the cumulative prospect theory. Convergence
analysis of BRSMG for both the forward policy learning and the inverse reward
learning are established. The proposed forward policy learning and inverse
reward learning algorithms in BRSMG are validated through a navigation
scenario. Simulation results show that the behaviors of agents in BRSMG
demonstrate both risk-averse and risk-seeking phenomena, which are consistent
with observations from humans. Moreover, in the inverse reward learning task,
the proposed bounded risk-sensitive inverse learning algorithm outperforms the
baseline risk-neutral inverse learning algorithm.
Explainable Empirical Risk Minimization
A. Jung Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
The widespread use of modern machine learning methods in decision making
crucially depends on their interpretability or explainability. The human users
(decision makers) of machine learning methods are often not only interested in
getting accurate predictions or projections. Rather, as a decision-maker, the
user also needs a convincing answer (or explanation) to the question of why a
particular prediction was delivered. Explainable machine learning might be a
legal requirement when used for decision making with an immediate effect on the
health of human beings. As an example consider the computer vision of a
self-driving car whose predictions are used to decide if to stop the car. We
have recently proposed an information-theoretic approach to construct
personalized explanations for predictions obtained from ML. This method was
model-agnostic and only required some training samples of the model to be
explained along with a user feedback signal. This paper uses an
information-theoretic measure for the quality of an explanation to learn
predictors that are intrinsically explainable to a specific user. Our approach
is not restricted to a particular hypothesis space, such as linear maps or
shallow decision trees, whose predictor maps are considered as explainable by
definition. Rather, we regularize an arbitrary hypothesis space using a
personalized measure for the explainability of a particular predictor.
Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning
Jordan T. Bishop , Marcus Gallagher Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Learning classifier systems (LCSs) are population-based predictive systems
that were originally envisioned as agents to act in reinforcement learning (RL)
environments. These systems can suffer from population bloat and so are
amenable to compaction techniques that try to strike a balance between
population size and performance. A well-studied LCS architecture is XCSF, which
in the RL setting acts as a Q-function approximator. We apply XCSF to a
deterministic and stochastic variant of the FrozenLake8x8 environment from
OpenAI Gym, with its performance compared in terms of function approximation
error and policy accuracy to the optimal Q-functions and policies produced by
solving the environments via dynamic programming. We then introduce a novel
compaction algorithm (Greedy Niche Mass Compaction – GNMC) and study its
operation on XCSF’s trained populations. Results show that given a suitable
parametrisation, GNMC preserves or even slightly improves function
approximation error while yielding a significant reduction in population size.
Reasonable preservation of policy accuracy also occurs, and we link this metric
to the commonly used steps-to-goal metric in maze-like environments,
illustrating how the metrics are complementary rather than competitive.
Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Algorithms for training residual networks (ResNets) typically require a forward
pass of the data, followed by backpropagation of the loss gradient to perform parameter
updates, which can take many hours or even days for networks with hundreds of
layers. Inspired by the penalty and augmented Lagrangian methods, a
layer-parallel training algorithm is proposed in this work to overcome the
scalability barrier caused by the serial nature of forward-backward propagation
in deep residual learning. Moreover, by viewing the supervised classification
task as a numerical discretization of the terminal control problem, we bridge
the concept of synthetic gradient for decoupling backpropagation with the
parareal method for solving differential equations, which not only offers a
novel perspective on the design of synthetic loss function but also performs
parameter updates with reduced storage overhead. Experiments on a preliminary
example demonstrate that the proposed algorithm achieves comparable or even
better testing accuracy than the full serial backpropagation approach, while
the layer-parallelism it enables provides a speedup over traditional
layer-serial training methods.
Error estimate for a universal function approximator of ReLU network with a local connection
Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)
Neural networks have shown highly successful performance in a wide range of
tasks, but further studies are needed to improve their performance. We analyze
the approximation error of a neural network architecture with local
connections, which has broader applicability than fully connected ones
because locally connected networks can be used to explain diverse neural
networks such as CNNs. Our error estimate depends on two parameters: one
controlling the depth of the hidden layers, and the other the width of the
hidden layers.
Enyan Dai , Suhang Wang Subjects : Machine Learning (cs.LG)
Graph neural networks (GNNs) have shown great power in modeling graph
structured data. However, similar to other machine learning models, GNNs may
make predictions biased on protected sensitive attributes, e.g., skin color,
gender, and nationality. This is because machine learning algorithms, including
GNNs, are trained to faithfully reflect the distribution of the training data,
which often contains historical bias towards sensitive attributes. In addition, the
discrimination in GNNs can be magnified by graph structures and the
message-passing mechanism. As a result, the applications of GNNs in sensitive
domains such as crime rate prediction would be largely limited. Though
extensive studies of fair classification have been conducted on i.i.d data,
methods to address the problem of discrimination on non-i.i.d data are rather
limited. Furthermore, the practical scenario of sparse annotations in sensitive
attributes is rarely considered in existing works. Therefore, we study the
novel and important problem of learning fair GNNs with limited sensitive
attribute information. FairGNN is proposed to eliminate the bias of GNNs whilst
maintaining high node classification accuracy by leveraging graph structures
and limited sensitive information. Our theoretical analysis shows that FairGNN
can ensure the fairness of GNNs under mild conditions given limited nodes with
known sensitive attributes. Extensive experiments on real-world datasets also
demonstrate the effectiveness of FairGNN in debiasing and keeping high
accuracy.
Algebraic Neural Networks: Stability Properties
Alejandro Parada-Mayorga , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In this work we study the stability of algebraic neural networks (AlgNNs)
with commutative algebras which unify CNNs and GNNs under the umbrella of
algebraic signal processing. An AlgNN is a stacked layered structure where each
layer is conformed by an algebra $\mathcal{A}$, a vector space $\mathcal{M}$
and a homomorphism $\rho: \mathcal{A} \rightarrow \text{End}(\mathcal{M})$, where
$\text{End}(\mathcal{M})$ is the set of endomorphisms of $\mathcal{M}$. Signals
in each layer are modeled as elements of $\mathcal{M}$ and are processed by
elements of $\text{End}(\mathcal{M})$ defined according to the structure of
$\mathcal{A}$ via $\rho$. This framework provides a general scenario that
covers several types of neural network architectures where formal convolution
operators are being used. We obtain stability conditions with respect to
perturbations which are defined as distortions of $\rho$, reaching general
results whose particular cases are consistent with recent findings in the
literature for CNNs and GNNs. We consider conditions on the domain of the
homomorphisms in the algebra that lead to stable operators. Interestingly, we
found that these conditions are related to the uniform boundedness of the
Fréchet derivative of a function
$p: \text{End}(\mathcal{M}) \rightarrow \text{End}(\mathcal{M})$ that maps the
images of the generators of $\mathcal{A}$ on $\text{End}(\mathcal{M})$ into a
power series representation that defines the filtering of elements in
$\mathcal{M}$. Additionally, our results show that stability is universal to
convolutional architectures whose algebraic signal model uses the same algebra.
It's Hard for Neural Networks To Learn the Game of Life
Comments: 12 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Efforts to improve the learning abilities of neural networks have focused
mostly on the role of optimization methods rather than on weight
initializations. Recent findings, however, suggest that neural networks rely on
lucky random initial weights of subnetworks called “lottery tickets” that
converge quickly to a solution. To investigate how weight initializations
affect performance, we examine small convolutional networks that are trained to
predict n steps of the two-dimensional cellular automaton Conway’s Game of
Life, the update rules of which can be implemented efficiently in a 2n+1 layer
convolutional network. We find that networks of this architecture trained on
this task rarely converge. Rather, networks require substantially more
parameters to consistently converge. In addition, near-minimal architectures
are sensitive to tiny changes in parameters: changing the sign of a single
weight can cause the network to fail to learn. Finally, we observe a critical
value d_0 such that training minimal networks with examples in which cells are
alive with probability d_0 dramatically increases the chance of convergence to
a solution. We conclude that training convolutional neural networks to learn
the input/output function represented by n steps of Game of Life exhibits many
characteristics predicted by the lottery ticket hypothesis, namely, that the
networks required to learn this function are often significantly larger than
the minimal network required to implement it.
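For reference, the input/output function the networks are trained to learn is a single Game of Life step, which itself reduces to a 3x3 convolution (neighbor count) followed by elementwise logic; the sketch below implements the target function directly, rather than the paper's 2n+1-layer network construction.

```python
import numpy as np
from scipy.signal import convolve2d

def life_step(grid):
    """One step of Conway's Game of Life. A 3x3 convolution counts the
    live neighbors of each cell; a cell is alive next step iff it has
    exactly 3 live neighbors, or is alive and has exactly 2.
    """
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbors = convolve2d(grid, kernel, mode="same", boundary="wrap")
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

glider = np.zeros((8, 8), dtype=int)
glider[1, 2] = glider[2, 3] = glider[3, 1] = glider[3, 2] = glider[3, 3] = 1
print(life_step(glider))
```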
A Partial Regularization Method for Network Compression
Comments: arXiv admin note: substantial text overlap with arXiv:1912.05078
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Deep Neural Networks have achieved remarkable success relying on the
developing availability of GPUs and large-scale datasets with increasing
network depth and width. However, due to the expensive computation and
intensive memory, researchers have concentrated on designing compression
methods in order to make them practical for constrained platforms. In this
paper, we propose an approach of partial regularization rather than the
original form of penalizing all parameters, which is said to be full
regularization, to conduct model compression at a higher speed. It is
reasonable and feasible according to the existence of the permutation invariant
property of neural networks. Experimental results show that, as expected, the
computational complexity is reduced, as observed through shorter running times in
almost all situations. This is likely because the partial regularization
method involves fewer elements in the calculation. Surprisingly, it also
helps to improve some important metrics, such as regression fitting results and
classification accuracy, in both the training and test phases on multiple datasets,
suggesting that the pruned models have better performance and generalization
ability. Furthermore, we analyze the results and conclude that an
optimal network structure must exist and depends on the input data.
A Heaviside Function Approximation for Neural Network Binary Classification
Nathan Tsoi , Yofti Milkessa , Marynel Vázquez Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Neural network binary classifiers are often evaluated on metrics like
accuracy and $F_1$-score, which are based on confusion matrix values (True
Positives, False Positives, False Negatives, and True Negatives). However,
these classifiers are commonly trained with a different loss, e.g., log loss.
While it is preferable to perform training on the same loss as the evaluation
metric, this is difficult in the case of confusion matrix based metrics because
set membership is a step function without a derivative useful for
backpropagation. To address this challenge, we propose an approximation of the
step function that adheres to the properties necessary for effective training
of binary networks using confusion matrix based metrics. This approach allows
for end-to-end training of binary deep neural classifiers via batch gradient
descent. We demonstrate the flexibility of this approach in several
applications with varying levels of class imbalance. We also demonstrate how
the approximation allows balancing between precision and recall in the
appropriate ratio for the task at hand.
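As a sketch of the general recipe (using a plain scaled sigmoid as the step approximation; the paper proposes its own approximation with specific properties), soft confusion-matrix entries yield a differentiable F1 surrogate:

    import torch

    def soft_heaviside(z, tau=0.1):
        # Smooth step: approaches the Heaviside function as tau -> 0,
        # while keeping a useful gradient for backpropagation.
        return torch.sigmoid(z / tau)

    def soft_f1_loss(logits, targets, tau=0.1):
        """Differentiable surrogate for 1 - F1, built from soft confusion counts."""
        p = soft_heaviside(logits, tau)       # soft "predicted positive" membership
        tp = (p * targets).sum()
        fp = (p * (1 - targets)).sum()
        fn = ((1 - p) * targets).sum()
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
        return 1 - f1

Weighting the fp and fn terms differently gives the precision/recall trade-off mentioned in the abstract.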
Comments: 18 pages, 9 Figures
Subjects:
Machine Learning (cs.LG)
Intensive care clinicians need reliable clinical practice tools to preempt
unexpected critical events that might harm their patients in intensive care
units (ICU), to pre-plan timely interventions, and to keep the patient’s family
well informed. The conventional statistical models are built by curating only a
limited number of key variables, which means a vast unknown amount of
potentially precious data remains unused. Deep learning models (DLMs) can be
leveraged to learn from large complex datasets and construct predictive
clinical tools. This retrospective study was performed using 42,818 hospital
admissions involving 35,348 patients, which is a subset of the MIMIC-III
dataset. Natural language processing (NLP) techniques were applied to build
DLMs to predict in-hospital mortality (IHM) and length of stay >=7 days (LOS).
Over 75 million events across multiple data sources were processed, resulting
in over 355 million tokens. DLMs for predicting IHM using data from all sources
(AS) and chart data (CS) achieved an AUC-ROC of 0.9178 and 0.9029,
respectively, and PR-AUC of 0.6251 and 0.5701, respectively. DLMs for
predicting LOS using AS and CS achieved an AUC-ROC of 0.8806 and 0.8642,
respectively, and PR-AUC of 0.6821 and 0.6575, respectively. The observed
AUC-ROC difference between models was found to be significant for both IHM and
LOS at p=0.05. The observed PR-AUC difference between the models was found to
be significant for IHM and statistically insignificant for LOS at p=0.05. In
this study, deep learning models were constructed using data combined from a
variety of sources in Electronic Health Records (EHRs) such as chart data,
input and output events, laboratory values, microbiology events, procedures,
notes, and prescriptions. It is possible to predict in-hospital mortality with
much better confidence and higher reliability from models built using all
sources of data.
Change Point Detection by Cross-Entropy Maximization
Comments: Preprint
Subjects:
Machine Learning (cs.LG)
; Signal Processing (eess.SP); Machine Learning (stat.ML)
Many offline unsupervised change point detection algorithms rely on
minimizing a penalized sum of segment-wise costs. We extend this framework by
proposing to minimize a sum of discrepancies between segments. In particular,
we propose to select the change points so as to maximize the cross-entropy
between successive segments, balanced by a penalty for introducing new change
points. We propose a dynamic programming algorithm to solve this problem and
analyze its complexity. Experiments on two challenging datasets demonstrate the
advantages of our method compared to three state-of-the-art approaches.
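For orientation, the standard penalized optimal-partitioning recursion that this framework extends can be sketched as follows; note that the authors' objective sums discrepancies (cross-entropies) between successive segments, which this generic segment-cost version does not capture:

    import numpy as np

    def optimal_partition(signal, segment_cost, penalty):
        """Classic penalized optimal-partitioning DP (O(T^2) cost evaluations).

        F[t] = min_{s < t} F[s] + segment_cost(signal[s:t]) + penalty
        """
        T = len(signal)
        F = np.full(T + 1, np.inf)
        F[0] = -penalty
        last = np.zeros(T + 1, dtype=int)
        for t in range(1, T + 1):
            for s in range(t):
                c = F[s] + segment_cost(signal[s:t]) + penalty
                if c < F[t]:
                    F[t], last[t] = c, s
        # Backtrack to recover the change points.
        cps, t = [], T
        while t > 0:
            t = last[t]
            if t > 0:
                cps.append(t)
        return sorted(cps)

    # Example with an illustrative variance-based segment cost.
    cost = lambda seg: len(seg) * np.log(np.var(seg) + 1e-8)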
An Internal Cluster Validity Index Based on Distance-based Separability Measure
Comments: 8 pages, 4 figures. Accepted by ICTAI 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Evaluating clustering results is a significant part of cluster analysis. Since
clustering is a typical unsupervised learning task, there are usually no true
class labels. Thus, a number of internal evaluation measures, which use
predicted labels and data, have been created; these are also called internal
cluster validity indices (CVIs). Without true labels, designing an effective
CVI is not simple, because it is similar to creating a clustering method.
Having more CVIs is crucial because there is no universal CVI that can be used
to measure all datasets, and no specific method for selecting a proper CVI for
clusters without true labels. Therefore, applying more CVIs to evaluate
clustering results is necessary. In this paper, we propose a novel CVI, called
the Distance-based Separability Index (DSI), based on a data separability
measure. We applied the DSI and eight other internal CVIs, ranging from early
studies (Dunn, 1974) to the most recent (CVDD, 2019), for comparison. We used
an external CVI as ground truth for clustering results of five clustering
algorithms on 12 real and 97 synthetic datasets. Results show that the DSI is
an effective, unique, and competitive CVI compared to the other CVIs. In
addition, we summarize the general process for evaluating CVIs and create a
new method, rank difference, to compare the results of CVIs.
Understanding the wiring evolution in differentiable neural architecture search
Sirui Xie , Shoukang Hu , Xinjiang Wang , Chunxiao Liu , Jianping Shi , Xunying Liu , Dahua Lin Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Controversy exists on whether differentiable neural architecture search
methods discover wiring topology effectively. To understand how wiring topology
evolves, we study the underlying mechanism of several existing differentiable
NAS frameworks. Our investigation is motivated by three observed searching
patterns of differentiable NAS: 1) they search by growing instead of pruning;
2) wider networks are preferred over deeper ones; 3) no edges are selected
in bi-level optimization. To anatomize these phenomena, we propose a unified
view on searching algorithms of existing frameworks, transferring the global
optimization to local cost minimization. Based on this reformulation, we
conduct empirical and theoretical analyses, revealing implicit inductive biases
in the cost’s assignment mechanism and evolution dynamics that cause the
observed phenomena. These biases indicate strong discrimination towards certain
topologies. To this end, we pose questions that future differentiable methods
for neural wiring discovery need to confront, hoping to evoke a discussion and
rethinking on how much bias has been enforced implicitly in existing NAS
methods.
Knowing What to Listen to: Early Attention for Deep Speech Representation Learning
Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Deep learning techniques have considerably improved speech processing in
recent years. Speech representations extracted by deep learning models are
being used in a wide range of tasks such as speech recognition, speaker
recognition, and speech emotion recognition. Attention models play an important
role in improving deep learning models. However, current attention mechanisms
are unable to attend to fine-grained information items. In this paper we
propose the novel Fine-grained Early Frequency Attention (FEFA) for speech
signals. This model is capable of focusing on information items as small as
frequency bins. We evaluate the proposed model on two popular tasks of speaker
recognition and speech emotion recognition. Two widely used public datasets,
VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on
top of several prominent deep models as backbone networks to evaluate its
impact on performance compared to the original networks and other related work.
Our experiments show that by adding FEFA to different CNN architectures,
performance is consistently improved by substantial margins, even setting a new
state-of-the-art for the speaker recognition task. We also tested our model
against different levels of added noise showing improvements in robustness and
less sensitivity compared to the backbone networks.
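The abstract does not spell out FEFA's formulation, but the core idea, scoring and reweighting individual frequency bins before the backbone, can be sketched as follows (shapes and the scoring function are our assumptions):

    import torch
    import torch.nn as nn

    class FrequencyBinAttention(nn.Module):
        """Hypothetical early attention over spectrogram frequency bins.

        Given (batch, freq, time) features, scores each frequency bin and
        reweights the input before it reaches the backbone network.
        """
        def __init__(self, n_freq_bins: int):
            super().__init__()
            self.score = nn.Linear(n_freq_bins, n_freq_bins)

        def forward(self, spec: torch.Tensor) -> torch.Tensor:
            energy = spec.mean(dim=2)                    # (batch, freq) summary
            weights = torch.softmax(self.score(energy), dim=1)
            return spec * weights.unsqueeze(2)           # reweight each bin

    spec = torch.randn(8, 64, 200)                       # a batch of spectrograms
    attended = FrequencyBinAttention(64)(spec)           # feed to any CNN backbone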
CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking
Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Thanks to its capability of acquiring full-view frames at multiple kilohertz,
ultrafast ultrasound imaging unlocked the analysis of rapidly changing physical
phenomena in the human body, with pioneering applications such as
ultrasensitive flow imaging in the cardiovascular system or shear-wave
elastography. The accuracy achievable with these motion estimation techniques
is strongly contingent upon two contradictory requirements: a high quality of
consecutive frames and a high frame rate. Indeed, the image quality can usually
be improved by increasing the number of steered ultrafast acquisitions, but at
the expense of a reduced frame rate and possible motion artifacts. To achieve
accurate motion estimation at uncompromised frame rates and immune to motion
artifacts, the proposed approach relies on single ultrafast acquisitions to
reconstruct high-quality frames and on only two consecutive frames to obtain
2-D displacement estimates. To this end, we deployed a convolutional neural
network-based image reconstruction method combined with a speckle tracking
algorithm based on cross-correlation. Numerical and in vivo experiments,
conducted in the context of plane-wave imaging, demonstrate that the proposed
approach is capable of estimating displacements in regions where the presence
of side lobe and grating lobe artifacts prevents any displacement estimation
with a state-of-the-art technique that relies on conventional delay-and-sum
beamforming. The proposed approach may therefore unlock the full potential of
ultrafast ultrasound, in applications such as ultrasensitive cardiovascular
motion and flow analysis or shear-wave elastography.
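The tracking stage is standard normalized cross-correlation speckle tracking between two consecutive frames; a minimal sketch (patch size and search range are illustrative, and the patch is assumed to lie away from the image borders):

    import numpy as np
    from scipy.signal import correlate2d

    def estimate_displacement(frame0, frame1, y, x, patch=16, search=8):
        """2-D displacement of the speckle patch at (y, x) between two frames,
        found as the argmax of normalized cross-correlation over a search window."""
        ref = frame0[y:y + patch, x:x + patch]
        win = frame1[y - search:y + patch + search, x - search:x + patch + search]
        ref = (ref - ref.mean()) / (ref.std() + 1e-8)
        win = (win - win.mean()) / (win.std() + 1e-8)
        cc = correlate2d(win, ref, mode="valid")     # (2*search+1, 2*search+1) map
        dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
        return dy - search, dx - search              # displacement in pixels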
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This can undermine the
trustworthiness and accountability of the models when deployed in real
applications, where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood-based models on the task of out-of-distribution
(OoD) detection. Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
of the effect of adversarial noise resistance methods on in- and
out-of-distribution performance, and also investigate the adversarial noise
robustness of Bayesian deep learners.
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
Private Weighted Random Walk Stochastic Gradient Descent
Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)
We consider a decentralized learning setting in which data is distributed
over nodes in a graph. The goal is to learn a global model on the distributed
data without involving any central entity that needs to be trusted. While
gossip-based stochastic gradient descent (SGD) can be used to achieve this
learning objective, it incurs high communication and computation costs, since
it has to wait for all the local models at all the nodes to converge. To speed
up the convergence, we propose instead to study random walk based SGD in which
a global model is updated based on a random walk on the graph. We propose two
algorithms based on two types of random walks that achieve, in a decentralized
way, uniform sampling and importance sampling of the data. We provide a
non-asymptotic analysis on the rate of convergence, taking into account the
constants related to the data and the graph. Our numerical results show that
the weighted random walk based algorithm has a better performance for
high-variance data. Moreover, we propose a privacy-preserving random walk
algorithm that achieves local differential privacy based on a Gamma noise
mechanism that we propose. We also give numerical results on the convergence of
this algorithm and show that it outperforms additive Laplace-based privacy
mechanisms.
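A sketch of the uniform-sampling variant (one of the two walks; the Metropolis-Hastings acceptance step is a standard way to obtain a uniform stationary distribution, and the authors' exact construction, importance weights, and Gamma noise mechanism are not reproduced here):

    import random

    def random_walk_sgd(graph, data, w, grad, lr=0.01, steps=10_000):
        """Uniform-sampling random-walk SGD sketch.

        graph: dict node -> list of neighbors; data: dict node -> local samples.
        A single global model w travels along the walk; each visited node updates
        it with a gradient on its own local data.
        """
        node = random.choice(list(graph))
        for _ in range(steps):
            x, y = random.choice(data[node])
            w -= lr * grad(w, x, y)               # local SGD update at this node
            nxt = random.choice(graph[node])      # propose a neighbor
            # Accept with min(1, deg(node)/deg(nxt)) -> uniform stationary law.
            if random.random() < min(1.0, len(graph[node]) / len(graph[nxt])):
                node = nxt
        return w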
Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning
Comments: PhD Thesis. Versions of several chapters have previously appeared or been submitted under different titles
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Numerical Analysis (math.NA)
Leo Tolstoy opened his monumental novel Anna Karenina with the now famous
words: “Happy families are all alike; every unhappy family is unhappy in its
own way.” A similar notion also applies to mathematical spaces: every flat
space is alike; every unflat space is unflat in its own way. However, rather than being
a source of unhappiness, we will show that the diversity of non-flat spaces
provides a rich area of study. The genesis of the so-called big data era and
the proliferation of social and scientific databases of increasing size have
led to a need for algorithms that can efficiently process, analyze, and even
generate high-dimensional data. However, the curse of dimensionality leads to
the fact that many classical approaches do not scale well with respect to the
size of these problems. One technique to avoid some of these ill-effects is to
exploit the geometric structure of coherent data. In this thesis, we will
explore geometric methods for shape processing and data analysis. More
specifically, we will study techniques for representing manifolds and signals
supported on them through a variety of mathematical tools including, but not
limited to, computational differential geometry, variational PDE modeling, and
deep learning. First, we will explore non-isometric shape matching through
variational modeling. Next, we will use ideas from parallel transport on
manifolds to generalize convolution and convolutional neural networks to
deformable manifolds. Finally, we conclude by proposing a novel auto-regressive
model for capturing the intrinsic geometry and topology of data. Throughout
this work, we will use the idea of computing correspondences as a through-line
to both motivate our work and analyze our results.
Quantum Long Short-Term Memory
Samuel Yen-Chi Chen , Shinjae Yoo , Yao-Lung L. Fang Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG)
Long short-term memory (LSTM) is a kind of recurrent neural network (RNN) for
modeling sequential data with temporal dependencies, and its effectiveness has
been extensively established. In this work, we propose a hybrid
quantum-classical model of LSTM, which we dub QLSTM. We demonstrate that the
proposed model successfully learns several kinds of temporal data. In
particular, we show that for certain testing cases, this quantum version of
LSTM converges faster, or equivalently, reaches a better accuracy, than its
classical counterpart. Due to the variational nature of our approach, the
requirements on qubit counts and circuit depth are eased, and our work thus
paves the way toward implementing machine learning algorithms for sequence
modeling on noisy intermediate-scale quantum (NISQ) devices.
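For reference, the classical LSTM cell that such a hybrid would modify computes the following (our rendering of the standard equations; which affine maps are replaced by variational quantum circuits is not specified in the abstract):

    \begin{aligned}
    f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f), &\quad i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i),\\
    \tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c), &\quad o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o),\\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &\quad h_t &= o_t \odot \tanh(c_t).
    \end{aligned}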
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
High-fidelity singing voices usually require a higher sampling rate (e.g.,
48kHz) to convey expression and emotion. However, a higher sampling rate
results in a wider frequency band and longer waveform sequences, which poses
challenges for singing voice synthesis (SVS) in both the frequency and time
domains. Conventional SVS systems that adopt a small sampling rate cannot
adequately address these challenges. In this paper, we develop HiFiSinger, an
SVS system towards high-fidelity singing voices. HiFiSinger consists of a
FastSpeech based acoustic model and a Parallel WaveGAN based vocoder to ensure
fast training and inference as well as high voice quality. To tackle the
difficulty of modeling singing at a high sampling rate (wider frequency band
and longer waveforms), we introduce multi-scale adversarial training in both
the acoustic model and the vocoder. Specifically, 1) To handle the
larger range of frequencies caused by higher sampling rate, we propose a novel
sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full
80-dimensional mel-frequency into multiple sub-bands and models each sub-band
with a separate discriminator. 2) To model longer waveform sequences caused by
higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform
generation to model different lengths of waveform sequences with separate
discriminators. 3) We also introduce several additional designs and findings in
HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)
and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate
window/hop size for mel-spectrogram, and increasing the receptive field in
vocoder for long vowel modeling. Experiment results show that HiFiSinger
synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44
MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
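The SF-GAN idea of judging each mel sub-band with its own discriminator can be sketched as follows (band boundaries and discriminator depth are illustrative; the paper's actual architecture is not reproduced):

    import torch
    import torch.nn as nn

    class SubFrequencyDiscriminators(nn.Module):
        """Sketch: split an 80-bin mel-spectrogram into sub-bands and score
        each with its own small convolutional discriminator."""
        def __init__(self, bands=((0, 27), (27, 54), (54, 80))):
            super().__init__()
            self.bands = bands
            self.discs = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(16, 1, 3, padding=1),
                ) for _ in bands
            ])

        def forward(self, mel):                  # mel: (batch, 80, time)
            x = mel.unsqueeze(1)                 # add a channel dimension
            return [d(x[:, :, lo:hi, :]) for d, (lo, hi) in zip(self.discs, self.bands)]

    scores = SubFrequencyDiscriminators()(torch.randn(4, 80, 240))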
Distributed Online Optimization via Gradient Tracking with Adaptive Momentum
Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper deals with a network of computing agents aiming to solve an online
optimization problem in a distributed fashion, i.e., by means of local
computation and communication, without any central coordinator. We propose the
gradient tracking with adaptive momentum estimation (GTAdam) distributed
algorithm, which combines a gradient tracking mechanism with first and second
order momentum estimates of the gradient. The algorithm is analyzed in the
online setting for strongly convex and smooth cost functions. We prove that the
average dynamic regret is bounded and that the convergence rate is linear. The
algorithm is tested on a time-varying classification problem, on a (moving)
target localization problem and in a stochastic optimization setup from image
classification. In these numerical experiments from multi-agent learning,
GTAdam outperforms state-of-the-art distributed optimization methods.
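A sketch of the update structure, combining consensus mixing, dynamic-average gradient tracking, and Adam-style moments on the tracked gradient, under our own placement of the steps (the paper's exact recursion may differ):

    import numpy as np

    def gtadam(W, grads, x0, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, iters=500):
        """W: doubly stochastic mixing matrix (N x N); grads: per-agent gradient
        functions; x: per-agent iterates (N x d); s: gradient trackers that
        converge to the network-average gradient. Constants are illustrative."""
        N, d = W.shape[0], x0.shape[1]
        x = x0.copy()
        g = np.stack([grads[i](x[i]) for i in range(N)])
        s = g.copy()                       # trackers start at the local gradients
        m, v = np.zeros((N, d)), np.zeros((N, d))
        for _ in range(iters):
            m = b1 * m + (1 - b1) * s      # first-moment estimate of tracked grad
            v = b2 * v + (1 - b2) * s**2   # second-moment estimate
            x = W @ x - lr * m / (np.sqrt(v) + eps)   # consensus + Adam-like step
            g_new = np.stack([grads[i](x[i]) for i in range(N)])
            s = W @ s + g_new - g          # dynamic-average gradient tracking
            g = g_new
        return x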
Online Community Detection for Event Streams on Networks
Comments: 38 pages
Subjects:
Social and Information Networks (cs.SI)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
A common goal in network modeling is to uncover the latent community
structure present among nodes. For many real-world networks, observed
connections consist of events arriving as streams, which are then aggregated to
form edges, ignoring the temporal dynamic component. A natural way to take
account of this temporal dynamic component of interactions is to use point
processes as the foundation of the network models for community detection.
Computational complexity hampers the scalability of such approaches to large
sparse networks. To circumvent this challenge, we propose a fast online
variational inference algorithm for learning the community structure underlying
dynamic event arrivals on a network using continuous-time point process latent
network models. We provide regret bounds on the loss function of this
procedure, giving theoretical guarantees on performance. The proposed algorithm
is illustrated, using both simulation studies and real data, to have
performance comparable to non-online variants in terms of community recovery.
Our proposed framework can also be readily modified to
incorporate other popular network structures.
Bayesian Perceptron: Towards fully Bayesian Neural Networks
Comments: Accepted for publication at the 59th IEEE Conference on Decision and Control (CDC) 2020
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG)
Artificial neural networks (NNs) have become the de facto standard in machine
learning. They allow learning highly nonlinear transformations in a plethora of
applications. However, NNs usually only provide point estimates without
systematically quantifying corresponding uncertainties. In this paper a novel
approach towards fully Bayesian NNs is proposed, where training and predictions
of a perceptron are performed within the Bayesian inference framework in
closed-form. The weights and the predictions of the perceptron are considered
Gaussian random variables. Analytical expressions for predicting the
perceptron’s output and for learning the weights are provided for commonly used
activation functions like sigmoid or ReLU. This approach requires no
computationally expensive gradient calculations and further allows sequential
learning.
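One standard closed-form device for Gaussian weights with a sigmoid activation is the probit approximation E[sigmoid(a)] ≈ sigmoid(mu / sqrt(1 + pi·var/8)); a sketch of the resulting predictive mean (the paper's exact expressions may differ):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def perceptron_predictive_mean(mu_w, Sigma_w, x):
        """Moment-matched predictive mean of sigmoid(w^T x) for Gaussian weights
        w ~ N(mu_w, Sigma_w), via the classic probit approximation."""
        mu_a = x @ mu_w                    # mean of the pre-activation
        var_a = x @ Sigma_w @ x            # variance of the pre-activation
        return sigmoid(mu_a / np.sqrt(1.0 + np.pi * var_a / 8.0))

    mu_w = np.array([0.5, -1.0])
    Sigma_w = 0.1 * np.eye(2)
    print(perceptron_predictive_mean(mu_w, Sigma_w, np.array([1.0, 2.0])))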
On the study of the Beran estimator for generalized censoring indicators
Mikael Escobar-Bach , Olivier Goudet Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Along with the analysis of time-to-event data, it is common to assume that
only partial information is available. In the presence of right-censored data
with covariates, the conditional Kaplan-Meier estimator (also referred to as
the Beran estimator) is known to provide a consistent estimate of the
conditional survival function of the lifetimes. However, a necessary condition
is clear knowledge of whether each individual is censored or not, and this
information might be incomplete or even totally absent in practice. We thus
propose a study on the Beran estimator when the censoring indicator is not
clearly specified. From this, we provide a new estimator for the conditional
survival function and establish its asymptotic normality under mild conditions.
We further study the supervised learning problem where the conditional survival
function is to be predicted with no censorship indicators. To this aim, we
investigate various approaches estimating the conditional expectation for the
censoring indicator. Along with the theoretical results, we illustrate how the
estimators work for small samples by means of a simulation study and show their
practical applicability with the analysis of synthetic data and the study of
real data for the prognosis of monoclonal gammopathy.
Simulation of an Elevator Group Control Using Generative Adversarial Networks and Related AI Tools
Tom Peetz , Sebastian Vogt , Martin Zaefferer , Thomas Bartz-Beielstein Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Testing new, innovative technologies is a crucial task for safety and
acceptance. But how can new systems be tested if no historical real-world data
exist? Simulation provides an answer to this important question. Classical
simulation tools such as event-based simulation are well accepted. But most of
these established simulation models require the specification of many
parameters. Furthermore, simulation runs, e.g., CFD simulations, are very time
consuming. Generative Adversarial Networks (GANs) are powerful tools for
generating new data for a variety of tasks. Currently, their most frequent
application domain is image generation. This article investigates the
applicability of GANs for imitating simulations. We compare the
simulation output of a technical system with the output of a GAN. To exemplify
this approach, a well-known multi-car elevator system simulator was chosen. Our
study demonstrates the feasibility of this approach. It also discusses pitfalls
and technical problems that occurred during the implementation. Although we
were able to show that in principle, GANs can be used as substitutes for
expensive simulation runs, we also show that they cannot be used “out of the
box”. Fine tuning is needed. We present a proof-of-concept, which can serve as
a starting point for further research.
Quasi-symplectic Langevin Variational Autoencoder
Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The variational autoencoder (VAE) is one of the most thoroughly investigated
generative models in current neural learning research. Leveraging VAEs in
practical tasks with high-dimensional data and large datasets often faces the
problem of constructing low-variance evidence lower bounds (ELBOs). Markov
chain Monte Carlo (MCMC) is an effective approach for tightening the ELBO when
approximating the posterior distribution. The Hamiltonian Variational
Autoencoder (HVAE) is one of the effective MCMC-inspired approaches for
constructing an unbiased low-variance ELBO that is also amenable to the
reparameterization trick. This solution significantly improves posterior
estimation, yet a main drawback of HVAE is that the leapfrog method needs to
access the posterior gradient twice, which leads to poor inference efficiency
and a fairly large GPU memory requirement. This flaw limits the application of
Hamiltonian-based inference frameworks to large-scale network inference. To
tackle this problem, we propose a Quasi-symplectic Langevin Variational
Autoencoder (Langevin-VAE), which offers a significant improvement in resource
usage efficiency. We qualitatively and quantitatively demonstrate the
effectiveness of the Langevin-VAE compared to state-of-the-art
gradient-informed inference frameworks.
Learning Unknown Physics of non-Newtonian Fluids
Brandon Reyes , Amanda A. Howard , Paris Perdikaris , Alexandre M. Tartakovsky Subjects : Computational Physics (physics.comp-ph) ; Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
We extend the physics-informed neural network (PINN) method to learn
viscosity models of two non-Newtonian systems (polymer melts and suspensions of
particles) using only velocity measurements. The PINN-inferred viscosity models
agree with the empirical models for shear rates with large absolute values but
deviate for shear rates near zero where the analytical models have an
unphysical singularity. Once a viscosity model is learned, we use the PINN
method to solve the momentum conservation equation for non-Newtonian fluid flow
using only the boundary conditions.
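The structure of such a PINN loss, a data term on velocities plus a physics residual in which an unknown viscosity network appears, can be sketched for a 1-D channel flow (the geometry, equation form, and networks are illustrative assumptions, not the paper's setup):

    import torch
    import torch.nn as nn

    # 1-D steady channel flow driven by pressure gradient G:
    # momentum residual  d/dy( eta(du/dy) * du/dy ) + G = 0.
    # u_net fits velocity measurements; eta_net is the unknown viscosity model
    # (Softplus keeps the predicted viscosity positive).
    u_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    eta_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())
    G = 1.0

    def pinn_loss(y_data, u_data, y_col):
        data_loss = ((u_net(y_data) - u_data) ** 2).mean()   # velocity data only
        y = y_col.requires_grad_(True)
        u = u_net(y)
        du = torch.autograd.grad(u.sum(), y, create_graph=True)[0]
        flux = eta_net(du) * du                              # shear stress
        dflux = torch.autograd.grad(flux.sum(), y, create_graph=True)[0]
        physics_loss = ((dflux + G) ** 2).mean()             # momentum residual
        return data_loss + physics_loss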
A free web service for fast COVID-19 classification of chest X-Ray images
Comments: 14 pages, 12 figures
Subjects:
Image and Video Processing (eess.IV)
; Machine Learning (cs.LG)
The coronavirus outbreak became a major concern for society worldwide.
Technological innovation and ingenuity are essential to fight COVID-19 pandemic
and bring us one step closer to overcome it. Researchers over the world are
working actively to find available alternatives in different fields, such as
the healthcare system, pharmaceutics, and health prevention, among others. With
the rise of artificial intelligence (AI) in the last 10 years, AI-based
applications have become the prevalent solution in different areas because of
their higher capability, and they are now being adopted to help combat COVID-19. This
work provides a fast detection system of COVID-19 characteristics in X-Ray
images based on deep learning (DL) techniques. This system is available as a
free web-deployed service for fast patient classification, alleviating the high
demand for standard COVID-19 diagnosis methods. It consists of two deep
learning models: one to differentiate between X-Ray and non-X-Ray images based
on the Mobile-Net architecture, and another to identify chest X-Ray images
with characteristics of COVID-19 based on the DenseNet architecture. For
real-time inference, a pair of dedicated GPUs is provided, which reduces the
computational time. The whole system can filter out non-chest X-Ray images, and
detect whether the X-Ray presents characteristics of COVID-19, highlighting the
most sensitive regions.
SRQA: Synthetic Reader for Factoid Question Answering
Comments: arXiv admin note: text overlap with arXiv:1809.00676
Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
The question answering system can answer questions from various fields and
forms with deep neural networks, but it still lacks effective ways of handling
multiple evidences. We introduce a new model called SRQA, which stands for
Synthetic Reader for Factoid Question Answering. This model enhances the question
answering system in the multi-document scenario from three aspects: model
structure, optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)
respectively. First, we propose a multilayer attention network to obtain a
better representation of the evidences. The multilayer attention mechanism
conducts interaction between the question and the passage within each layer,
making the token representations of the evidences in each layer take the
requirements of the question into account. Second, we design a cross-evidence
strategy to choose the answer span across multiple evidences. We improve the
optimization goal, considering all the answers’ locations in multiple evidences
as training targets, which leads the model to reason among multiple evidences.
Third, adversarial training is applied to high-level variables in addition to
the word embeddings in our model. A new normalization method is also proposed for
adversarial perturbations so that we can jointly add perturbations to several
target variables. As an effective regularization method, adversarial training
enhances the model’s ability to process noisy data. Combining these three
strategies, we enhance the contextual representation and locating ability of
our model, which could synthetically extract the answer span from several
evidences. We perform SRQA on the WebQA dataset, and experiments show that our
model outperforms the state-of-the-art models (the best fuzzy score of our
model is up to 78.56%, with an improvement of about 2%).
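The adversarial-training ingredient follows the usual gradient-direction recipe; a single-variable sketch for orientation (the paper's contribution is a new normalization for jointly perturbing several high-level variables, which this does not show):

    import torch

    def adversarial_perturb(var, loss, eps=1.0):
        """FGSM-style perturbation of one variable in the computation graph:
        r = eps * g / ||g||, with g the gradient of the loss w.r.t. the variable.
        `var` must participate in the graph that produced `loss`."""
        g = torch.autograd.grad(loss, var, retain_graph=True)[0]
        return var + eps * g / (g.norm() + 1e-12)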
Multimodal brain tumor classification
Marvin Lerousseau , Eric Deutsh , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cancer is a complex disease that provides various types of information
depending on the scale of observation. While most tumor diagnostics are
performed by observing histopathological slides, radiology images should yield
additional knowledge towards the efficacy of cancer diagnostics. This work
investigates a deep learning method combining whole slide images and magnetic
resonance images to classify tumors. Experiments are prospectively conducted on
the 2020 Computational Precision Medicine challenge, in a 3-class unbalanced
classification task. We report cross-validation (resp. validation)
balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and
0.94). The complete code of the method is open-source at XXXX. It includes
histopathological data pre-processing and can therefore be used off-the-shelf
for other histopathological and/or radiological classification tasks.
Large Dimensional Analysis and Improvement of Multi Task Learning
Malik Tiomoko , Romain Couillet , Hafiz Tiomoko Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Multi Task Learning (MTL) efficiently leverages useful information contained
in multiple related tasks to help improve the generalization performance of all
tasks. This article conducts a large-dimensional analysis of a simple but, as
we shall see, extremely powerful (when carefully tuned) Least Square Support
Vector Machine (LSSVM) version of MTL, in the regime where the dimension (p) of
the data and their number (n) grow large at the same rate.
Under mild assumptions on the input data, the theoretical analysis of the
MTL-LSSVM algorithm first reveals the “sufficient statistics” exploited by the
algorithm and their interaction at work. These results demonstrate, as a
striking consequence, that the standard approach to MTL-LSSVM is largely
suboptimal, can lead to severe effects of negative transfer but that these
impairments are easily corrected. These corrections are turned into an improved
MTL-LSSVM algorithm which can only benefit from additional data, and the
theoretical performance of which is also analyzed.
As evidenced and theoretically sustained in numerous recent works, these
large dimensional results are robust to broad ranges of data distributions,
which our present experiments corroborate. Specifically, the article reports a
systematically close behavior between theoretical and empirical performances on
popular datasets, which is strongly suggestive of the applicability of the
proposed carefully tuned MTL-LSSVM method to real data. This fine-tuning is
fully based on the theoretical analysis and does not in particular require any
cross validation procedure. Besides, the reported performances on real datasets
almost systematically outperform much more elaborate and less intuitive
state-of-the-art multi-task and transfer learning methods.
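For context, a single-task LSSVM reduces to one linear solve; the paper analyzes and retunes a multi-task coupling of such problems. A standard single-task sketch:

    import numpy as np

    def lssvm_train(K, y, gamma=1.0):
        """Standard single-task LSSVM dual solve:
        [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
        The paper's MTL version couples several such problems and tunes them
        via random-matrix-theoretic analysis."""
        n = len(y)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:], A[1:, 0] = 1.0, 1.0
        A[1:, 1:] = K + np.eye(n) / gamma
        rhs = np.concatenate([[0.0], y])
        sol = np.linalg.solve(A, rhs)
        return sol[0], sol[1:]          # bias b, dual coefficients alpha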
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, this falls short when conditions that affect
the extracted images vary. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
Automated identification of metamorphic test scenarios for an ocean-modeling application
Comments: Short paper: 2 pages, 2020 IEEE International Conference On Artificial Intelligence Testing (AITest)
Subjects:
Software Engineering (cs.SE)
; Machine Learning (cs.LG)
Metamorphic testing seeks to validate software in the absence of test
oracles. Our application domain is ocean modeling, where test oracles often do
not exist, but where symmetries of the simulated physical systems are known. In
this short paper we present work in progress for automated generation of
metamorphic test scenarios using machine learning. Metamorphic testing may be
expressed as f(g(X))=h(f(X)) with f being the application under test, with
input data X, and with the metamorphic relation (g, h). Automatically generated
metamorphic relations can be used for constructing regression tests, and for
comparing different versions of the same software application. Here, we
restrict to h being the identity map. Then, the task of constructing tests
means finding different g which we tackle using machine learning algorithms.
These algorithms typically minimize a cost function. As one possible g is
already known to be the identity map, for finding a second possible g, we
construct the cost function to minimize for g being a metamorphic relation and
to penalize for g being the identity map. After identifying the first
metamorphic relation, the procedure is repeated with a cost function rewarding
g that are orthogonal to previously found metamorphic relations. For
experimental evaluation, two implementations of an ocean-modeling application
will be subjected to the proposed method, with the objective of demonstrating
the use of metamorphic relations for testing these implementations.
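A minimal hand-written instance of the f(g(X)) = f(X) (h = identity) setting, with toy symmetries standing in for the relations the paper learns:

    import numpy as np

    def metamorphic_test(f, g, X, atol=1e-6):
        """Check f(g(X)) == f(X), i.e. the h = identity case from the paper."""
        return np.allclose(f(g(X)), f(X), atol=atol)

    # Toy application under test: mean kinetic energy after one diffusion step
    # with periodic boundaries (mirror- and translation-symmetric physics).
    def diffuse_energy(field):
        nxt = field + 0.1 * (np.roll(field, 1, -1) + np.roll(field, -1, -1) - 2 * field)
        return (nxt ** 2).mean()

    g_flip = lambda field: np.flip(field, axis=-1)      # mirror symmetry
    g_shift = lambda field: np.roll(field, 3, axis=-1)  # translation symmetry

    X = np.random.rand(4, 64)
    assert metamorphic_test(diffuse_energy, g_flip, X)
    assert metamorphic_test(diffuse_energy, g_shift, X)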
Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a framework that allows one to certify the fairness degree of a model
based on an interactive and privacy-preserving test. The framework verifies any
trained model, regardless of its training process and architecture. Thus, it
allows us to evaluate any deep learning model on multiple fairness definitions
empirically. We tackle two scenarios, where either the test data is privately
available only to the tester or is publicly known in advance, even to the model
creator. We investigate the soundness of the proposed approach using
theoretical analysis and present statistical guarantees for the interactive
test. Finally, we provide a cryptographic technique to automate fairness
testing and certified inference with only black-box access to the model at hand
while hiding the participants’ sensitive data.
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects
Comments: under review
Subjects:
Software Engineering (cs.SE)
; Machine Learning (cs.LG)
Machine learning is nowadays a standard technique for data analysis within
software applications. Software engineers need quality assurance techniques
that are suitable for these new kinds of systems. Within this article, we
discuss the question of whether standard software testing techniques that have
been part of textbooks for decades are also useful for the testing of machine
learning software. Concretely, we try to determine generic smoke tests that can
be used to assert that basic functions can be executed without crashing. We
found that we can derive such tests using techniques similar to equivalence
classes and boundary value analysis. Moreover, we found that these concepts can
also be applied to hyperparameters, to further improve the quality of the smoke
tests. Even though our approach is almost trivial, we were able to find bugs in
all three machine learning libraries that we tested and severe bugs in two of
the three libraries. This demonstrates that common software testing techniques
are still valid in the age of machine learning and that they are suitable to
find and prevent severe bugs, even in mature machine learning libraries.
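A sketch of what such smoke tests look like in practice: equivalence classes of inputs, each asserted merely not to crash (the library and input classes below are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def smoke_inputs(n=20, d=3):
        rng = np.random.default_rng(0)
        yield "ordinary floats", rng.normal(size=(n, d))
        yield "all zeros", np.zeros((n, d))
        yield "huge magnitudes", np.full((n, d), 1e300)
        yield "tiny magnitudes", np.full((n, d), 1e-300)
        yield "single feature", rng.normal(size=(n, 1))

    def run_smoke_tests():
        y = np.random.default_rng(1).integers(0, 2, 20)
        for name, X in smoke_inputs():
            try:
                LogisticRegression(max_iter=50).fit(X, y).predict(X)
                print(f"ok:    {name}")
            except Exception as exc:   # a crash here is a smoke-test finding
                print(f"CRASH: {name}: {exc!r}")

    run_smoke_tests()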
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Multidimensional Projection is a fundamental tool for high-dimensional data
analytics and visualization. With very few exceptions, projection techniques
are designed to map data from a high-dimensional space to a visual space so as
to preserve some dissimilarity (similarity) measure, such as the Euclidean
distance for example. In fact, although adopting distinct mathematical
formulations designed to favor different aspects of the data, most
multidimensional projection methods strive to preserve dissimilarity measures
that encapsulate geometric properties such as distances or the proximity
relation between data objects. However, geometric relations are not the only
interesting property to be preserved in a projection. For instance, the
analysis of particular structures such as clusters and outliers could be more
reliably performed if the mapping process gives some guarantee as to
topological invariants such as connected components and loops. This paper
introduces TopoMap, a novel projection technique which provides topological
guarantees during the mapping process. In particular, the proposed method
performs the mapping from a high-dimensional space to a visual space, while
preserving the 0-dimensional persistence diagram of the Rips filtration of the
high-dimensional data, ensuring that the filtrations generate the same
connected components when applied to the original as well as projected data.
The presented case studies show that the topological guarantee provided by
TopoMap not only brings confidence to the visual analytic process but also can
be used to assist in the assessment of other projection methods.
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control
Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)
The Internet of Vehicles (IoV) enables real-time data exchange among vehicles
and roadside units and thus provides a promising solution to alleviate traffic
jams in the urban area. Meanwhile, better traffic management via efficient
traffic light control can benefit the IoV as well by enabling a better
communication environment and decreasing the network load. As such, IoV and
efficient traffic light control can formulate a virtuous cycle. Edge computing,
an emerging technology to provide low-latency computation capabilities at the
edge of the network, can further improve the performance of this cycle.
However, while the collected information is valuable, an efficient solution for
better utilization and faster feedback has yet to be developed for
edge-empowered IoV. To this end, we propose a Decentralized Reinforcement
Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits
the ubiquity of the IoV to accelerate the collection of traffic data and its
interpretation towards alleviating congestion and providing better traffic
light control. DRLE operates within the coverage of the edge servers and uses
aggregated data from neighboring edge servers to provide city-scale traffic
light control. DRLE decomposes the highly complex problem of large area
control into a decentralized multi-agent problem. We prove its global optimality
with concrete mathematical reasoning. The proposed decentralized reinforcement
learning algorithm running at each edge node adapts the traffic lights in real
time. We conduct extensive evaluations and demonstrate the superiority of this
approach over several state-of-the-art algorithms.
Modeling Global Body Configurations in American Sign Language
Nicholas Wilkins , Beck Cordes Galbraith , Ifeoma Nwogu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
American Sign Language (ASL) is the fourth most commonly used language in the
United States and is the language most commonly used by Deaf people in the
United States and the English-speaking regions of Canada. Unfortunately, until
recently, ASL received little research attention. This is due, in part, to its delayed
recognition as a language until William C. Stokoe’s publication in 1960.
Limited data has been a long-standing obstacle to ASL research and
computational modeling. The lack of large-scale datasets has prohibited many
modern machine-learning techniques, such as Neural Machine Translation, from
being applied to ASL. In addition, the modality required to capture sign
language (i.e. video) is complex in natural settings (as one must deal with
background noise, motion blur, and the curse of dimensionality). Finally, when
compared with spoken languages, such as English, there has been limited
research conducted into the linguistics of ASL.
We realize a simplified version of Liddell and Johnson’s Movement-Hold (MH)
Model using a Probabilistic Graphical Model (PGM). We trained our model on
ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM
against other models to determine its ability to model ASL. Finally, we
interpret various aspects of the PGM and draw conclusions about ASL phonetics.
The main contributions of this paper are
Decision Tree Based Hardware Power Monitoring for Run Time Dynamic Power Management in FPGA
Comments: published as a conference paper in FPL 2017
Subjects:
Hardware Architecture (cs.AR)
; Machine Learning (cs.LG)
Fine-grained runtime power management techniques could be promising solutions
for power reduction. Therefore, it is essential to establish accurate power
monitoring schemes to obtain dynamic power variation in a short period (i.e.,
tens or hundreds of clock cycles). In this paper, we leverage a
decision-tree-based power modeling approach to establish fine-grained hardware
power monitoring on FPGA platforms. A generic and complete design flow is
developed to implement the decision tree power model which is capable of
precisely estimating dynamic power in a fine-grained manner. A flexible
architecture of the hardware power monitoring is proposed, which can be
instrumented in any RTL design for runtime power estimation, dispensing with
the need for extra power measurement devices. Experimental results of applying
the proposed model to benchmarks with different resource types reveal an
average error of up to 4% for dynamic power estimation. Moreover, the overheads of
area, power and performance incurred by the power monitoring circuitry are
extremely low. Finally, we apply our power monitoring technique to the power
management using phase shedding with an on-chip multi-phase regulator as a
proof of concept and the results demonstrate 14% efficiency enhancement for the
power supply of the FPGA internal logic.
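The modeling step itself is ordinary supervised regression from per-window signal switching activities to measured dynamic power; a sketch with synthetic data (the feature choices, window count, and tree depth are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Features: per-window toggle rates of monitored signals; target: measured
    # dynamic power. On the FPGA the trained tree is then mapped to simple
    # comparator/LUT logic for runtime estimation.
    rng = np.random.default_rng(0)
    n_windows, n_signals = 2000, 24
    toggle_rates = rng.uniform(0, 1, size=(n_windows, n_signals))
    true_weights = rng.uniform(0.1, 1.0, n_signals)
    power = toggle_rates @ true_weights + 0.05 * rng.normal(size=n_windows)

    model = DecisionTreeRegressor(max_depth=6)   # shallow tree -> cheap hardware
    model.fit(toggle_rates[:1500], power[:1500])
    err = np.abs(model.predict(toggle_rates[1500:]) - power[1500:]).mean()
    print(f"mean absolute error: {err:.3f}")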
An Ensemble Learning Approach for In-situ Monitoring of FPGA Dynamic Power
Comments: published as a journal (TCAD) paper in 2018
Subjects:
Hardware Architecture (cs.AR)
; Machine Learning (cs.LG)
As field-programmable gate arrays become prevalent in critical application
domains, their power consumption is of high concern. In this paper, we present
and evaluate a power monitoring scheme capable of accurately estimating the
runtime dynamic power of FPGAs in a fine-grained timescale, in order to support
emerging power management techniques. In particular, we describe a novel and
specialized ensemble model which can be decomposed into multiple customized
decision-tree-based base learners. To aid in model synthesis, a generic
computer-aided design flow is proposed to generate samples, select features,
tune hyperparameters and train the ensemble estimator. Besides this, a hardware
realization of the trained ensemble estimator is presented for on-chip
real-time power estimation. In the experiments, we first show that a single
decision tree model can achieve prediction error within 4.51% of a commercial
gate-level power estimation tool, which is 2.41–6.07x lower than provided by
the commonly used linear model. More importantly, we study the extra gains in
inference accuracy using the proposed ensemble model. Experimental results
reveal that the ensemble monitoring method can further improve the accuracy of
power predictions to within a maximum error of 1.90%. Moreover, the lookup
table (LUT) overhead of the ensemble monitoring hardware employing up to 64
base learners is within 1.22% of the target FPGA, indicating its light-weight
and scalable characteristics.
Comments: 28 pages, 9 figures
Subjects:
Computational Engineering, Finance, and Science (cs.CE)
; Machine Learning (cs.LG)
A common workflow for many engineering design problems requires the
evaluation of the design system to be investigated under a range of conditions.
These conditions usually involve a combination of several parameters. To
perform a complete evaluation of a single candidate configuration, it may be
necessary to perform hundreds to thousands of simulations. This can be
computationally very expensive, particularly if several configurations need to
be evaluated, as in the case of the mathematical optimization of a design
problem. Although the simulations are extremely complex, generally, there is a
high degree of redundancy in them, as many of the cases vary only slightly from
one another. This redundancy can be exploited by omitting some simulations that
are uninformative, thereby reducing the number of simulations required to
obtain a reasonable approximation of the complete system. The decision of which
simulations are useful is made through the use of machine learning techniques,
which allow us to estimate the results of “yet-to-be-performed” simulations
from the ones that are already performed. In this study, we present the results
of one such technique, namely active learning, to provide an approximate result
of an entire offshore riser design simulation portfolio from a subset that is
80% smaller than the original one. These results are expected to facilitate a
significant speed-up in the offshore riser design.
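A sketch of the active-learning loop with uncertainty sampling (the surrogate model and acquisition rule are our assumptions; the study's actual setup may differ):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def active_learning(simulate, candidates, init=10, budget=200):
        """Run only the most informative simulations from a portfolio.

        simulate: expensive function (one riser-design case -> response);
        candidates: array of parameter combinations in the full portfolio.
        A Gaussian-process surrogate predicts the remaining cases."""
        rng = np.random.default_rng(0)
        idx = list(rng.choice(len(candidates), init, replace=False))
        X = candidates[idx]
        y = np.array([simulate(x) for x in X])
        gp = GaussianProcessRegressor().fit(X, y)
        while len(idx) < budget:
            mu, sd = gp.predict(candidates, return_std=True)
            sd[idx] = -np.inf                   # never re-run finished cases
            j = int(np.argmax(sd))              # highest predictive uncertainty
            idx.append(j)
            X = np.vstack([X, candidates[j]])
            y = np.append(y, simulate(candidates[j]))
            gp.fit(X, y)
        return gp, idx   # the surrogate then approximates the other ~80%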
Learning from Protein Structure with Geometric Vector Perceptrons
Bowen Jing , Stephan Eismann , Patricia Suriana , Raphael J.L. Townshend , Ron Dror Subjects : Biomolecules (q-bio.BM) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning on 3D structures of large biomolecules is emerging as a distinct
area in machine learning, but there has yet to emerge a unifying network
architecture that simultaneously leverages the graph-structured and geometric
aspects of the problem domain. To address this gap, we introduce geometric
vector perceptrons, which extend standard dense layers to operate on
collections of Euclidean vectors. Graph neural networks equipped with such
layers are able to perform both geometric and relational reasoning on efficient
and natural representations of macromolecular structure. We demonstrate our
approach on two important problems in learning from protein structure: model
quality assessment and computational protein design. Our approach improves over
existing classes of architectures, including state-of-the-art graph-based and
voxel-based methods.
P6: A Declarative Language for Integrating Machine Learning in Visual Analytics
Comments: Accepted for presentation at IEEE VIS 2020
Subjects:
Software Engineering (cs.SE)
; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Programming Languages (cs.PL)
We present P6, a declarative language for building high performance visual
analytics systems through its support for specifying and integrating machine
learning and interactive visualization methods. As data analysis methods based
on machine learning and artificial intelligence continue to advance, a visual
analytics solution can leverage these methods for better exploiting large and
complex data. However, integrating machine learning methods with interactive
visual analysis is challenging. Existing declarative programming libraries and
toolkits for visualization lack support for coupling machine learning methods.
By providing a declarative language for visual analytics, P6 can empower more
developers to create visual analytics applications that combine machine
learning and visualization methods for data analysis and problem solving.
Through a variety of example applications, we demonstrate P6’s capabilities and
show the benefits of using declarative specifications to build visual analytics
systems. We also identify and discuss the research opportunities and challenges
for declarative visual analytics.
Real Image Super Resolution Via Heterogeneous Model using GP-NAS
Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With advances in deep neural networks (DNNs), recent state-of-the-art (SOTA)
image super-resolution (SR) methods have achieved impressive performance using
deep residual networks with dense skip connections. While these models perform
well on benchmark datasets, where low-resolution (LR) images are constructed from
high-resolution (HR) references with a known blur kernel, real image SR is more
challenging when both images in the LR-HR pair are collected from real cameras.
Based on existing dense residual networks, a Gaussian process based neural
architecture search (GP-NAS) scheme is utilized to find candidate network
architectures using a large search space by varying the number of dense
residual blocks, the block size, and the number of features. A suite of
heterogeneous models with diverse network structures and hyperparameters is
selected for model ensembling to achieve outstanding performance in real image
SR. The proposed method won the first place in all three tracks of the AIM 2020
Real Image Super-Resolution Challenge.
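The sketch below conveys the flavour of GP-based architecture search: encode sampled (number of blocks, block size, number of features) triples, fit a Gaussian-process surrogate to observed validation scores, and keep the best-predicted candidates for the ensemble. The search space and scores are random placeholders, not the paper's actual GP-NAS procedure.

```python
# Illustrative stand-in for GP-based neural architecture search over a
# dense-residual search space, followed by top-k selection for an ensemble.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def encode(arch):
    return np.array(arch, dtype=float)   # (num_blocks, block_size, num_feats)

space = [(b, s, f) for b in (4, 8, 12, 16)
         for s in (2, 3, 4) for f in (32, 48, 64)]

# Pretend a few candidates were trained; scores are random placeholders.
tried = rng.choice(len(space), 8, replace=False)
X = np.stack([encode(space[i]) for i in tried])
y = rng.normal(size=len(tried))          # stand-in validation PSNR

gp = GaussianProcessRegressor().fit(X, y)
pred = gp.predict(np.stack([encode(a) for a in space]))
ensemble = [space[i] for i in np.argsort(pred)[-5:]]   # top-5 candidates
print(ensemble)
```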
Robust Object Classification Approach using Spherical Harmonics
Ayman Mukhaimar , Ruwan Tennakoon , Chow Yin Lai , Reza Hoseinnezhad , Alireza Bab-Hadiashar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
In this paper, we present a robust spherical harmonics approach for the
classification of point cloud-based objects. Spherical harmonics have been used
for classification over the years, with several frameworks existing in the
literature. These approaches use a variety of spherical-harmonics-based
descriptors to classify objects. We first investigate the robustness of these
frameworks against data perturbations such as outliers and noise, which has not
been studied before. We then propose a spherical convolutional neural network
framework for robust object classification. The proposed framework uses a
voxel grid of concentric spheres to learn features over the unit ball. Our
model learns features that are less sensitive to such perturbations, owing
to the selected sampling strategy and the designed convolution operation. We
tested the model against several types of perturbation, such as noise and
outliers, and our results show that it outperforms state-of-the-art networks
in terms of robustness.
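A generic rotation-robust spherical-harmonics descriptor for point clouds (not the paper's exact network input) can be sketched as follows: project points onto the unit sphere and pool the harmonic energy per degree.

```python
# Spherical-harmonics point-cloud descriptor: per-degree |Y_lm| energy is
# invariant to rotations of the cloud (illustrative descriptor only).
import numpy as np
from scipy.special import sph_harm

def sh_descriptor(points, l_max=4):
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    theta = np.arctan2(p[:, 1], p[:, 0]) % (2 * np.pi)   # azimuth
    phi = np.arccos(np.clip(p[:, 2], -1, 1))             # polar angle
    feats = []
    for l in range(l_max + 1):
        coeffs = [np.mean(sph_harm(m, l, theta, phi)) for m in range(-l, l + 1)]
        feats.append(np.sum(np.abs(coeffs) ** 2))        # per-degree energy
    return np.array(feats)

cloud = np.random.default_rng(0).normal(size=(500, 3))
print(sh_descriptor(cloud))
```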
Cost-aware Feature Selection for IoT Device Classification
Comments: 32 Pages, 8 figures
Subjects:
Networking and Internet Architecture (cs.NI)
; Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Classification of IoT devices into different types is of paramount
importance, from multiple perspectives, including security and privacy aspects.
Recent works have explored machine learning techniques for fingerprinting (or
classifying) IoT devices, with promising results. However, existing works have
assumed that the features used for building the machine learning models are
readily available or can be easily extracted from the network traffic; in other
words, they do not consider the costs associated with feature extraction. In
this work, we take a more realistic approach, and argue that feature extraction
has a cost, and the costs are different for different features. We also take a
step forward from the current practice of considering the misclassification
loss as a binary value, and make a case for different losses based on the
misclassification performance. Thereby, and more importantly, we introduce the
notion of risk for IoT device classification. We define and formulate the
problem of cost-aware IoT device classification. This being a combinatorial
optimization problem, we develop a novel algorithm to solve it in a fast and
effective way using the Cross-Entropy (CE) based stochastic optimization
technique. Using traffic of real devices, we demonstrate the capability of the
CE based algorithm in selecting features with minimal risk of misclassification
while keeping the cost for feature extraction within a specified limit.
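A minimal sketch of the Cross-Entropy method for cost-constrained feature subset selection, with a placeholder risk function and random per-feature costs standing in for the paper's definitions:

```python
# Cross-Entropy (CE) stochastic optimization: sample feature subsets from
# independent Bernoullis, keep the elite subsets, update the probabilities.
import numpy as np

rng = np.random.default_rng(0)
n_feats = 20
cost = rng.uniform(1, 5, n_feats)          # per-feature extraction cost
budget = 25.0
value = rng.uniform(0, 1, n_feats)         # stand-in usefulness of features

def risk(mask):
    # Placeholder misclassification risk: lower when useful features are kept.
    return 1.0 - value @ mask / value.sum()

p = np.full(n_feats, 0.5)                  # Bernoulli sampling probabilities
for _ in range(50):
    samples = (rng.random((200, n_feats)) < p).astype(float)
    feasible = samples @ cost <= budget    # enforce the cost budget
    scores = np.array([risk(m) for m in samples])
    scores[~feasible] = np.inf
    elite = samples[np.argsort(scores)[:20]]    # best 10% of the samples
    p = 0.9 * p + 0.1 * elite.mean(axis=0)      # smoothed CE update

selected = p > 0.5
print(selected.nonzero()[0], cost[selected].sum())
```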
Non-parametric generalized linear model
Matthew Dowling , Yuan Zhao , Il Memming Park Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
A fundamental problem in statistical neuroscience is to model how neurons
encode information by analyzing electrophysiological recordings. A popular and
widely-used approach is to fit the spike trains with an autoregressive point
process model. These models are characterized by a set of convolutional
temporal filters, whose subsequent analysis can help reveal how neurons encode
stimuli, interact with each other, and process information. In practice a
sufficiently rich but small ensemble of temporal basis functions needs to be
chosen to parameterize the filters. However, obtaining a satisfactory fit often
requires burdensome model selection and fine-tuning of the form of the basis
functions and their temporal span. In this paper we propose a nonparametric
approach for jointly inferring the filters and hyperparameters using the
Gaussian process framework. Our method is computationally efficient, taking
advantage of the sparse variational approximation, while being flexible and rich
enough to characterize arbitrary filters over continuous time lags. Moreover, our
method automatically learns the temporal span of the filter. For the particular
application in neuroscience, we designed priors for stimulus and history
filters useful for the spike trains. We compare and validate our method on
simulated and real neural spike train data.
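For context, the baseline this method builds on, an autoregressive point-process GLM with a fixed temporal basis, can be sketched as follows on simulated data; the paper's contribution is to replace the hand-chosen basis with GP-inferred filters.

```python
# Fit a Poisson GLM with a lagged stimulus design matrix, the standard
# fixed-basis baseline for spike-train filter estimation (simulated data).
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
T, lags = 2000, 20
stim = rng.normal(size=T)
true_filt = np.exp(-np.arange(lags) / 5.0)     # ground-truth temporal filter

# Each row of X holds the recent stimulus history at that time step.
X = np.stack([np.roll(stim, k) for k in range(lags)], axis=1)
X[:lags] = 0.0
rate = np.exp(0.2 * (X @ true_filt) - 1.0)     # conditional intensity
spikes = rng.poisson(rate)

glm = PoissonRegressor(alpha=1e-3).fit(X, spikes)
est_filt = glm.coef_                           # recovered temporal filter
```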
Bid Shading in The Brave New World of First-Price Auctions
Comments: In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM’20), October 19-23, 2020, Virtual Event, Ireland
Subjects:
Computer Science and Game Theory (cs.GT)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Online auctions play a central role in online advertising, and are one of the
main reasons for the industry’s scalability and growth. With major changes in
how auctions are organized, such as the shift from second- to first-price
auctions, advertisers and demand platforms are compelled to adapt to a new
volatile environment. Bid shading is a known technique for preventing
overpaying in auction systems that can help maintain the strategy equilibrium
in first-price auctions, tackling one of its greatest drawbacks. In this study,
we propose a machine learning approach of modeling optimal bid shading for
non-censored online first-price ad auctions. We clearly motivate the approach
and extensively evaluate it in both offline and online settings on a major
demand side platform. The results demonstrate the superiority and robustness of
the new approach as compared to the existing approaches across a range of
performance metrics.
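The core computation behind bid shading can be sketched as choosing the bid that maximizes expected surplus under a win-rate model; the logistic curve below is a placeholder for a model fit on non-censored auction logs.

```python
# Bid shading for a first-price auction: pick the bid maximizing
# (value - bid) * P(win | bid); the winner pays their own bid, so bidding
# the full value would yield zero surplus.
import numpy as np

def win_prob(bid, a=8.0, b=4.0):
    # Placeholder win-rate model; in practice fit from auction outcomes.
    return 1.0 / (1.0 + np.exp(-(a * bid - b)))

value = 1.0                                  # advertiser's impression value
bids = np.linspace(0.0, value, 1001)
surplus = (value - bids) * win_prob(bids)
optimal_bid = bids[np.argmax(surplus)]       # shaded below the true value
print(optimal_bid)
```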
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these metrics are rough
proxies for what we really care about—summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
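A minimal sketch of the reward-modelling step on synthetic data: fit a scorer so that the human-preferred summary in each comparison receives the higher reward, via the standard pairwise logistic (Bradley-Terry) loss. The features stand in for a learned representation of (post, summary).

```python
# Train a linear reward model from pairwise human preferences; the fitted
# reward can then drive RL fine-tuning of the summarization policy.
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 16, 500
w_true = rng.normal(size=d)
A, B = rng.normal(size=(n_pairs, d)), rng.normal(size=(n_pairs, d))
pref_a = (A @ w_true > B @ w_true).astype(float)   # simulated human labels

w = np.zeros(d)
for _ in range(500):                               # gradient ascent on log-lik
    p = 1.0 / (1.0 + np.exp(-(A - B) @ w))         # P(A preferred over B)
    w += 0.1 * (A - B).T @ (pref_a - p) / n_pairs

# r(x) = w @ features(x) now serves as a reward signal for fine-tuning.
print(np.mean((( A - B) @ w > 0) == pref_a))       # training accuracy
```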
Towards Earnings Call and Stock Price Movement
Comments: Accepted by KDD 2020 MLF workshop
Subjects:
Statistical Finance (q-fin.ST)
; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Earnings calls are hosted by management of public companies to discuss the
company’s financial performance with analysts and investors. Information
disclosed during an earnings call is an essential source of data for analysts
and investors to make investment decisions. Thus, we leverage earnings call
transcripts to predict future stock price dynamics. We propose to model the
language in transcripts using a deep learning framework, where an attention
mechanism is applied to encode the text data into vectors for the
discriminative network classifier to predict stock price movements. Our
empirical experiments show that the proposed model is superior to the
traditional machine learning baselines and earnings call information can boost
the stock price prediction performance.
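A minimal sketch of attention-based encoding for classification, assuming token vectors from some upstream encoder; the dimensions and the linear classifier are illustrative, not the paper's architecture.

```python
# Attention pooling over a transcript: score each token vector, softmax the
# scores into weights, and classify the weighted average.
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(120, 32))      # encoded transcript, 120 tokens
q = rng.normal(size=32)                  # learned attention query (assumed)
W = rng.normal(size=(2, 32))             # up/down movement classifier

scores = tokens @ q
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax attention weights
doc_vec = weights @ tokens               # attention-pooled document vector
logits = W @ doc_vec
print(logits.argmax())                   # predicted price-movement class
```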
Convolutional Speech Recognition with Pitch and Voice Quality Features
Comments: 5 pages
Subjects:
Audio and Speech Processing (eess.AS)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
The effects of adding pitch and voice quality features such as jitter and
shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are
studied in this work. Pitch features have been previously used for improving
classical HMM and DNN baselines, while jitter and shimmer parameters have
proven to be useful for tasks like speaker or emotion recognition. To our
knowledge, this is the first work combining such pitch and voice quality
features with modern convolutional architectures, showing improvements of up to
2% absolute WER on the publicly available Spanish Common Voice dataset.
Particularly, our work combines these features with mel-frequency spectral
coefficients (MFSCs) to train a convolutional architecture with Gated Linear
Units (Conv GLUs). Such models have been shown to yield small word error rates
while being well suited to parallel processing for online streaming
recognition use cases. We have added pitch and voice quality functionality to
Facebook’s wav2letter speech recognition framework, and we provide the code
and recipes to the community to support further experiments.
Besides, to the best of our knowledge, our Spanish Common Voice recipe is the
first public Spanish recipe for wav2letter.
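A sketch of assembling such an input representation, combining mel spectral features with frame-level pitch and a jitter-like measure; librosa is assumed, the bundled example audio is a stand-in for speech, and this frame-level jitter estimate simplifies the cycle-to-cycle definition used in speech analysis.

```python
# Stack mel-frequency spectral features with pitch and a jitter-like
# measure into one per-frame feature matrix for a conv acoustic model.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))        # stand-in for speech audio
mfsc = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                 frame_length=2048, hop_length=512)

period = 1.0 / np.maximum(f0, 1e-6)
jitter = np.abs(np.diff(period)) / period[:-1]     # relative period change
jitter = np.concatenate([jitter, jitter[-1:]])     # pad to frame count

n = min(mfsc.shape[1], len(f0))
features = np.vstack([mfsc[:, :n], f0[None, :n], jitter[None, :n]])
print(features.shape)                              # (42, frames) model input
```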
Micro-entries: Encouraging Deeper Evaluation of Mental Models Over Time for Interactive Data Systems
Comments: 10 pages, submitted to BELIV 2020 Workshop
Subjects:
Human-Computer Interaction (cs.HC)
; Machine Learning (cs.LG)
Many interactive data systems combine visual representations of data with
embedded algorithmic support for automation and data exploration. To
effectively support transparent and explainable data systems, it is important
for researchers and designers to know how users understand the system. We
discuss the evaluation of users’ mental models of system logic. Mental models
are challenging to capture and analyze. While common evaluation methods aim to
approximate the user’s final mental model after a period of system usage, user
understanding continuously evolves as users interact with a system over time.
In this paper, we review many common mental model measurement techniques,
discuss tradeoffs, and recommend methods for deeper, more meaningful evaluation
of mental models when using interactive data analysis and visualization
systems. We present guidelines for evaluating mental models over time that
reveal the evolution of specific model updates and how they may map to the
particular use of interface features and data queries. By asking users to
describe what they know and how they know it, researchers can collect
structured, time-ordered insight into a user’s conceptualization process while
also helping guide users to their own discoveries.
Clustering of Nonnegative Data and an Application to Matrix Completion
C. Strohmeier , D. Needell Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Signal Processing (eess.SP)
In this paper, we propose a simple algorithm to cluster nonnegative data
lying in disjoint subspaces. We analyze its performance in relation to a
certain measure of correlation between said subspaces. We use our clustering
algorithm to develop a matrix completion algorithm which can outperform
standard matrix completion algorithms on data matrices satisfying certain
natural conditions.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of
camera-based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
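The machine-learning flavour of screen calibration can be sketched as a small regression from predicted gaze vectors to screen coordinates, fit on a handful of calibration targets; the gaze vectors below are synthetic placeholders.

```python
# Ridge-regression screen calibration: map gaze vectors to 2D screen points
# using a 9-point calibration grid (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
targets = np.array([[x, y] for x in (0.1, 0.5, 0.9)
                    for y in (0.1, 0.5, 0.9)])     # 9-point calibration grid
gaze = targets @ rng.normal(size=(2, 3)) + 0.02 * rng.normal(size=(9, 3))

calib = Ridge(alpha=1e-3).fit(gaze, targets)       # gaze vector -> screen xy
screen_pred = calib.predict(gaze)
print(np.abs(screen_pred - targets).mean())        # mean calibration error
```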
Quantum Discriminator for Binary Classification
Prasanna Date Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Quantum computers operate in high-dimensional tensor product spaces and
are known to outperform classical computers on many problems. They are poised
to accelerate machine learning tasks in the future. In this work, we operate in
the quantum machine learning (QML) regime where a QML model is trained using a
quantum-classical hybrid algorithm and inferencing is performed using a quantum
algorithm. We leverage the traditional two-step machine learning workflow,
where features are extracted from the data in the first step and a
discriminator acting on the extracted features is used to classify the data in
the second step. Assuming that the binary features have been extracted from the
data, we propose a quantum discriminator for binary classification. The quantum
discriminator takes as input the binary features of a data point and a
prediction qubit in the zero state, and outputs the correct class of the data
point. The quantum discriminator is defined by a parameterized unitary matrix
\(U_\Theta\) containing \(\mathcal{O}(N)\) parameters, where \(N\) is the number of
data points in the training data set. Furthermore, we show that the quantum
discriminator can be trained in \(\mathcal{O}(N \log N)\) time using
\(\mathcal{O}(N \log N)\) classical bits and \(\mathcal{O}(\log N)\) qubits. We
also show that inference for the quantum discriminator can be done in
\(\mathcal{O}(N)\) time using \(\mathcal{O}(\log N)\) qubits. Finally, we use the
quantum discriminator to classify the XOR problem on the IBM Q universal
quantum computer with 100% accuracy.
Detecting Parkinson's Disease from Speech-task in an accessible and interpretable manner
Wasifur Rahman , Sangwu Lee , Md. Saiful Islam , Abdullah Al Mamun , Victor Antony , Harshil Ratnu , Mohammad Rafayet Ali , Ehsan Hoque Subjects : Audio and Speech Processing (eess.AS) ; Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Every nine minutes a person is diagnosed with Parkinson’s Disease (PD) in the
United States. However, studies have shown that between 25% and 80% of
individuals with Parkinson’s Disease (PD) remain undiagnosed. An online,
in-the-wild audio recording application has the potential to help screen for the
disease if risk can be accurately assessed. In this paper, we collect data from
726 unique subjects (262 PD and 464 Non-PD) uttering the “quick brown fox jumps
over the lazy dog ….” to conduct automated PD assessment. We extracted both
standard acoustic features and deep learning based embedding features from the
speech data and trained several machine learning algorithms on them. Our models
achieved 0.75 AUC by modeling the standard acoustic features through the
XGBoost model. We also provide an explanation of our model’s decisions and show
that it focuses mostly on the widely used MFCC features and on a subset of
dysphonia features previously used for detecting PD from verbal phonation tasks.
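A sketch of the modelling step, gradient-boosted trees on tabular acoustic features evaluated by AUC, with random placeholders for the extracted MFCC and dysphonia features:

```python
# Gradient-boosted trees on acoustic features with AUC evaluation
# (features and labels are random placeholders, illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(726, 60))             # e.g. MFCC + dysphonia features
y = (X[:, :5].sum(axis=1) + rng.normal(size=726) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)
print(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```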
Ultra Lightweight Image Super-Resolution with Multi-Attention Layers
Comments: ECCVW AIM2020
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Lightweight image super-resolution (SR) networks have the utmost significance
for real-world applications. There are several deep learning based SR methods
with remarkable performance, but their memory and computational cost are
hindrances in practical usage. To tackle this problem, we propose a
Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN
consists of proposed feature fusion groups (FFGs) that serve as a feature
extraction block. Each FFG contains a stack of proposed multi-attention blocks
(MAB) that are combined in a novel feature fusion structure. Further, the MAB
with a cost-efficient attention mechanism (CEA) helps us to refine and extract
the features using multiple attention mechanisms. Comprehensive experiments
show the superiority of our model over the existing state of the art. We
participated in AIM 2020 efficient SR challenge with our MAFFSRN model and won
1st, 3rd, and 4th places in memory usage, floating-point operations (FLOPs) and
number of parameters, respectively.
Information Theory
Private Weighted Random Walk Stochastic Gradient Descent
Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)
We consider a decentralized learning setting in which data is distributed
over nodes in a graph. The goal is to learn a global model on the distributed
data without involving any central entity that needs to be trusted. While
gossip-based stochastic gradient descent (SGD) can be used to achieve this
learning objective, it incurs high communication and computation costs, since
it has to wait for all the local models at all the nodes to converge. To speed
up the convergence, we propose instead to study random walk based SGD in which
a global model is updated based on a random walk on the graph. We propose two
algorithms based on two types of random walks that achieve, in a decentralized
way, uniform sampling and importance sampling of the data. We provide a
non-asymptotic analysis on the rate of convergence, taking into account the
constants related to the data and the graph. Our numerical results show that
the weighted random walk based algorithm has a better performance for
high-variance data. Moreover, we propose a privacy-preserving random walk
algorithm that achieves local differential privacy based on a Gamma noise
mechanism that we propose. We also give numerical results on the convergence of
this algorithm and show that it outperforms additive Laplace-based privacy
mechanisms.
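A minimal sketch of random-walk SGD on a ring graph: a walker hops between neighbouring nodes and the global model is updated with the local gradient at each visit. The uniform next-hop choice is a stand-in for the paper's weighted designs that achieve uniform or importance sampling of the data.

```python
# Random-walk SGD: one global model, updated with the local least-squares
# gradient at each node the walker visits (decentralized, no coordinator).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 10, 5
adj = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
X = rng.normal(size=(n_nodes, 20, d))              # local data per node
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=(n_nodes, 20))

w, node, lr = np.zeros(d), 0, 0.01
for step in range(2000):
    Xi, yi = X[node], y[node]
    grad = 2 * Xi.T @ (Xi @ w - yi) / len(yi)      # local gradient
    w -= lr * grad
    node = int(rng.choice(adj[node]))              # hop to a random neighbour
```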
Optimal Streaming of 360 VR Videos with Perfect, Imperfect and Unknown FoV Viewing Probabilities
Comments: 6 pages, 5 figures, to appear in GLOBECOM 2020
Subjects:
Information Theory (cs.IT)
In this paper, we investigate wireless streaming of multi-quality tiled 360
virtual reality (VR) videos from a multi-antenna server to multiple
single-antenna users in a multi-carrier system. To capture the impact of
field-of-view (FoV) prediction, we consider three cases of FoV viewing
probability distributions, i.e., perfect, imperfect and unknown FoV viewing
probability distributions, and use the average total utility, worst average
total utility and worst total utility as the respective performance metrics. We
adopt rate splitting with successive decoding for efficient transmission of
multiple sets of tiles of different 360 VR videos to their requesting users. In
each case, we optimize the encoding rates of the tiles, minimum encoding rates
of the FoVs, rates of the common and private messages and transmission
beamforming vectors to maximize the total utility. The problems in the three
cases are all challenging nonconvex optimization problems. We successfully
transform the problem in each case into a difference of convex (DC) programming
problem with a differentiable objective function, and obtain a suboptimal
solution using the concave-convex procedure (CCCP). Finally, numerical results
demonstrate that the proposed solutions achieve notable gains over existing schemes
in all three cases. To the best of our knowledge, this is the first work
revealing the impact of FoV prediction and its accuracy on the performance of
streaming of multi-quality tiled 360 VR videos.
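For reference, the generic CCCP step for a difference-of-convex objective (standard material, not specific to the paper's utility functions) is:

```latex
% CCCP for a DC program: minimize f(x) = g(x) - h(x) with g, h convex.
% Linearize the concave part -h around the current iterate:
\[
  x^{(k+1)} \in \operatorname*{arg\,min}_{x} \;
  g(x) - \nabla h\big(x^{(k)}\big)^{\top} x .
\]
% Each subproblem is convex and f(x^{(k)}) is non-increasing, so the
% iterates converge to a stationary point of the DC objective.
```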
A Design Framework for Epsilon-Private Data Disclosure
Comments: 16 pages, 2 figures
Subjects:
Information Theory (cs.IT)
In this paper, we study a stochastic disclosure control problem using
information-theoretic methods. The useful data to be disclosed depend on
private data that should be protected. Thus, we design a privacy mechanism to
produce new data which maximizes the disclosed information about the useful
data under a strong \(\chi^2\)-privacy criterion. For sufficiently small leakage,
the privacy mechanism design problem can be geometrically studied in the space
of probability distributions by a local approximation of the mutual
information. By using methods from Euclidean information geometry, the original
highly challenging optimization problem can be reduced to a problem of finding
the principal right-singular vector of a matrix, which characterizes the
optimal privacy mechanism. In two extensions we first consider a noisy
disclosure channel and then we look for a mechanism which finds \(U\) based on
observing \(X\), maximizing the mutual information between \(U\) and \(Y\) while
satisfying the privacy criterion on \(U\) and \(Z\) under the Markov chain
\((Z,Y)-X-U\).
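The final computational step that the abstract reduces the design to can be sketched as follows; the construction of the matrix from the joint distribution is omitted here, and the matrix below is only a placeholder.

```python
# Given the matrix B that encodes the local (Euclidean) geometry of the
# problem, the optimal mechanism direction is B's principal right-singular
# vector (B itself is a placeholder; its construction is problem-specific).
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 4))               # stand-in for the derived matrix
_, _, Vt = np.linalg.svd(B)
principal_direction = Vt[0]               # characterizes the mechanism
print(principal_direction)
```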
Comments: 14 pages, 5 figures, major revision, IEEE Transations on Multimedia. arXiv admin note: substantial text overlap with arXiv:2001.01906
Subjects:
Information Theory (cs.IT)
In this paper, we investigate optimal wireless streaming of a
multi-quality tiled 360 virtual reality (VR) video from a server to multiple
users. To this end, we propose to maximally exploit potential multicast
opportunities by effectively utilizing characteristics of multi-quality tiled
360 VR videos and computation resources at the users’ side. In particular, we
consider two requirements for quality variation in one field-of-view (FoV),
i.e., the absolute smoothness requirement and the relative smoothness
requirement, and two video playback modes, i.e., the direct-playback mode
(without user transcoding) and transcode-playback mode (with user transcoding).
Besides natural multicast opportunities, we introduce two new types of
multicast opportunities, namely, relative smoothness-enabled multicast
opportunities, which allow a flexible tradeoff between viewing quality and
communication resource consumption, and transcoding-enabled multicast
opportunities, which allow a flexible tradeoff between computation and
communication resource consumption. Then, we establish a novel mathematical
model that reflects the impacts of natural, relative smoothness-enabled and
transcoding-enabled multicast opportunities on the average transmission energy
and transcoding energy. Based on this model, we optimize the transmission
resource allocation, playback quality level selection and transmission quality
level selection to minimize the energy consumption in the four cases with
different requirements for quality variation and video playback modes. By
comparing the optimal values in the four cases, we prove that the energy
consumption decreases when more multicast opportunities can be utilized. Finally,
numerical results show substantial gains of the proposed solutions over
existing schemes, and demonstrate the importance of effective exploitation of
the three types of multicast opportunities.
On the Size of the Giant Component in Inhomogeneous Random K-out Graphs
Comments: To appear in 9th IEEE Conference on Decision and Control. arXiv admin note: substantial text overlap with arXiv:1911.05147
Subjects:
Information Theory (cs.IT)
; Probability (math.PR)
Inhomogeneous random K-out graphs were recently introduced to model
heterogeneous sensor networks secured by random pairwise key predistribution
schemes. First, each of the \(n\) nodes is classified as type-1 (respectively,
type-2) with probability \(0<\mu<1\) (respectively, \(1-\mu\)) independently from
each other. Next, each type-1 (respectively, type-2) node draws 1 arc towards a
node (respectively, \(K_n\) arcs towards \(K_n\) distinct nodes) selected uniformly
at random, and then the orientation of the arcs is ignored. It was recently
established that this graph, denoted by \(\mathbb{H}(n;\mu,K_n)\), is connected
with high probability (whp) if and only if \(K_n=\omega(1)\). In other words, if
\(K_n=O(1)\), then \(\mathbb{H}(n;\mu,K_n)\) has a positive probability of being
not connected as \(n\) gets large. Here, we study the size of the largest
connected subgraph of \(\mathbb{H}(n;\mu,K_n)\) when \(K_n=O(1)\). We show that
the trivial condition of \(K_n \geq 2\) for all \(n\) is sufficient to ensure that
the inhomogeneous K-out graph has a connected component of size \(n-O(1)\) whp. Put
differently, even with \(K_n=2\), all but finitely many nodes will form a
connected sub-network in this model under any \(0<\mu<1\). We present an upper
bound on the probability that more than \(M\) nodes are outside of the largest
component, and show that this decays as \(O(1)\exp\{-M(1-\mu)(K_n-1)\} + o(1)\).
Numerical results are presented to demonstrate the size of the largest
connected component when the number of nodes is finite.
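The construction described above is easy to simulate; the sketch below samples an inhomogeneous K-out graph and measures its largest connected component with a union-find, using illustrative values of \(n\), \(\mu\), and \(K\).

```python
# Sample the inhomogeneous random K-out graph: type-1 nodes draw 1 arc,
# type-2 nodes draw K distinct arcs; orientations are then ignored.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
n, mu, K = 500, 0.4, 2

edges = set()
for u in range(n):
    k = 1 if rng.random() < mu else K
    others = np.array([v for v in range(n) if v != u])
    for v in rng.choice(others, size=k, replace=False):
        edges.add((min(u, int(v)), max(u, int(v))))   # drop orientation

parent = list(range(n))                    # union-find over the nodes
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]      # path halving
        x = parent[x]
    return x
for u, v in edges:
    parent[find(u)] = find(v)

sizes = Counter(find(x) for x in range(n))
print(sizes.most_common(1)[0][1], "of", n, "nodes in the giant component")
```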
Service Rate Region: A New Aspect of Coded Distributed System Design
Mehmet Aktas , Gauri Joshi , Swanand Kadhe , Fatemeh Kazemi , Emina Soljanin Subjects : Information Theory (cs.IT) ; Discrete Mathematics (cs.DM); Performance (cs.PF)
Erasure coding has been recently employed as a powerful method to mitigate
delays due to slow or straggling nodes in distributed systems. In this work, we
show that erasure coding of data objects can flexibly handle skews in the
request rates. Coding can help boost the service rate region, that is, increase
the overall volume of data access requests that can be handled by the system.
The goal of this paper is to postulate the service rate region as an important
consideration in the design of erasure coded distributed systems. We highlight
several open problems that can be grouped into two broad threads: 1)
characterizing the service rate region of a given code and finding the optimal
request allocation, and 2) designing the underlying erasure code for a given
service rate region. As contributions along the first thread, we characterize
the rate regions of maximum-distance-separable, locally repairable, and Simplex
codes. In terms of code design, we show the effectiveness of hybrid codes that
combine replication and erasure coding, and also discover fundamental
connections between multi-set batch codes and the problem of maximizing the
service rate region.
Secure Strong Coordination
Journal-ref: IEEE WPS 2020 – International Workshop on Privacy and Security for
Information Systems
Subjects:
Information Theory (cs.IT)
We consider a network of two nodes separated by a noisy channel, in which the
source and its reconstruction have to be strongly coordinated, while
simultaneously satisfying the strong secrecy condition with respect to an
outside observer of the noisy channel. In the case of non-causal encoding and
decoding, we propose a joint source-channel coding scheme for the secure strong
coordination region. Furthermore, we provide a complete characterization of the
secure strong coordination region when the decoder has to reliably reconstruct
the source sequence and the legitimate channel is more capable than the channel
of the eavesdropper.
Remote Joint Strong Coordination and Reliable Communication
Journal-ref: 2020 IEEE International Symposium on Information Theory (ISIT)
Subjects:
Information Theory (cs.IT)
We consider a three-node network, in which two agents wish to communicate
over a noisy channel, while controlling the distribution observed by a third
external agent. We use strong coordination to constrain the distribution, and
we provide a complete characterization of the “remote strong coordination and
reliable communication” region.
Smart Meter Data Privacy
Giulio Giaconi , Deniz Gunduz , H. Vincent Poor Subjects : Information Theory (cs.IT)
Smart grids (SGs) promise to deliver dramatic improvements compared to
traditional power grids thanks primarily to the large amount of data being
exchanged and processed within the grid, which enables the grid to be monitored
more accurately and at a much faster pace. The smart meter (SM) is one of the
key devices that enable the SG concept by monitoring a household’s electricity
consumption and reporting it to the utility provider (UP), i.e., the entity
that sells energy to customers, or to the distribution system operator (DSO),
i.e., the entity that operates and manages the grid, with high accuracy and at
a much faster pace compared to traditional meters. However, the very
availability of rich and high-frequency household electricity consumption data,
which enables a very efficient power grid management, also opens up
unprecedented challenges on data security and privacy. To counter these
threats, it is necessary to develop techniques that keep SM data private, and,
for this reason, SM privacy has become a very active research area. The aim of
this chapter is to provide an overview of the most significant
privacy-preserving techniques for SM data, highlighting their main benefits and
disadvantages.
Algebraic geometry codes and some applications
Comments: Survey chapter to appear in “A Concise Encyclopedia of Coding Theory”, W.C. Huffman, J.-L. Kim, and P. Sole’ Eds., CRC Press
Subjects:
Information Theory (cs.IT)
; Cryptography and Security (cs.CR); Algebraic Geometry (math.AG); Number Theory (math.NT)
This article surveys the development of the theory of algebraic geometry
codes since their discovery in the late 70’s. We summarize the major results on
various problems such as: asymptotic parameters, improved estimates on the
minimum distance, and decoding algorithms. In addition, we present various
modern applications of these codes such as public-key cryptography, algebraic
complexity theory, multiparty computation or distributed storage.
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This undermines the trustworthiness
and accountability of such models when they are deployed in real applications,
where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood based models in the task of out of distribution
detection (OoD). Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
of the effect of adversarial noise resistance methods on in- and
out-of-distribution performance, and also investigate the adversarial noise
robustness of Bayesian deep learners.
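A minimal sketch of uncertainty-based OoD scoring evaluated with AUC, using an ensemble's mean predictive entropy; the random linear "models" are placeholders for trained Bayesian deep networks.

```python
# OoD scoring by predictive entropy of an ensemble's mean class probabilities,
# evaluated with AUC on in- vs out-of-distribution inputs (illustrative).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x_in = rng.normal(size=(200, 10))
x_out = rng.normal(loc=3.0, size=(200, 10))        # shifted OoD inputs
models = [rng.normal(size=(10, 5)) for _ in range(8)]   # stand-in ensemble

def entropy_score(x):
    probs = np.mean([softmax(x @ W) for W in models], axis=0)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

scores = np.concatenate([entropy_score(x_in), entropy_score(x_out)])
labels = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = OoD
print(roc_auc_score(labels, scores))
```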
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Error estimate for a universal function approximator of ReLU network with a local connection
Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)
Neural networks have shown highly successful performance in a wide range of
tasks, but further studies are needed to improve their performance. We analyze
the approximation error of a specific neural network architecture with a
local connection, which has wider applicability than a fully connected one
because locally connected networks can be used to explain diverse neural
networks such as CNNs. Our error estimate depends on two parameters: one
controlling the depth of the hidden layers, and the other the width of the
hidden layers.
Zuckerli: A New Compressed Representation for Graphs
Luca Versari , Iulia M. Comsa , Alessio Conte , Roberto Grossi Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT)
Zuckerli is a scalable compression system meant for large real-world graphs.
Graphs are notoriously challenging structures to store efficiently due to their
linked nature, which makes it hard to separate them into smaller, compact
components. Therefore, effective compression is crucial when dealing with large
graphs, which can have billions of nodes and edges. Furthermore, a good
compression system should give the user fast and reasonably flexible access to
parts of the compressed data without requiring full decompression, which may be
infeasible on their system. Zuckerli improves multiple aspects of WebGraph, the
current state-of-the-art in compressing real-world graphs, by using advanced
compression techniques and novel heuristic graph algorithms. It can produce
both a compressed representation for storage and one which allows fast direct
access to the adjacency lists of the compressed graph without decompressing the
entire graph. We validate the effectiveness of Zuckerli on real-world graphs
with up to a billion nodes and 90 billion edges, conducting an extensive
experimental evaluation of both compression density and decompression
performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller
than WebGraph’s, with savings above 20% in most cases, and with decompression
resource usage comparable to that of WebGraph.
Quantum stabilizer codes, lattices, and CFTs
Comments: 99 pages
Subjects:
High Energy Physics – Theory (hep-th)
; Information Theory (cs.IT); Combinatorics (math.CO); Quantum Physics (quant-ph)
There is a rich connection between classical error-correcting codes,
Euclidean lattices, and chiral conformal field theories. Here we show that
quantum error-correcting codes, those of the stabilizer type, are related to
Lorentzian lattices and non-chiral CFTs. More specifically, real self-dual
stabilizer codes can be associated with even self-dual Lorentzian lattices, and
thus define Narain CFTs. We dub the resulting theories code CFTs and study
their properties. T-duality transformations of a code CFT, at the level of the
underlying code, reduce to code equivalences. By means of such equivalences,
any stabilizer code can be reduced to a graph code. We can therefore represent
code CFTs by graphs. We study code CFTs with small central charge \(c=n\leq 12\),
and find many interesting examples. Among them is a non-chiral \(E_8\) theory,
which is based on the root lattice of \(E_8\) understood as an even self-dual
Lorentzian lattice. By analyzing all graphs with \(n\leq 8\) nodes we find many
pairs and triples of physically distinct isospectral theories. We also
construct numerous modular invariant functions satisfying all the basic
properties expected of the CFT partition function, yet which are not partition
functions of any known CFTs. We consider the ensemble average over all code
theories, calculate the corresponding partition function, and discuss its
possible holographic interpretation. The paper is written in a self-contained
manner, and includes an extensive pedagogical introduction and many explicit
examples.