
arXiv Paper Daily: Fri, 4 Sep 2020

source link: https://www.52ml.net/22514.html

Neural and Evolutionary Computing

Tree Neural Networks in HOL4

Thibault Gauthier

Subjects: Neural and Evolutionary Computing (cs.NE)

We present an implementation of tree neural networks within the proof assistant HOL4. Their architecture makes them naturally suited for approximating functions whose domain is a set of formulas. We measure the performance of our implementation and compare it with other machine learning predictors on the tasks of evaluating arithmetical expressions and estimating the truth of propositional formulas.
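
To make the architecture concrete, here is a minimal sketch of a tree neural network for evaluating arithmetic expressions, written in PyTorch rather than HOL4; the embedding dimension, the operator set, and the regression head are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the HOL4 implementation): a tree neural network that
# learns to evaluate arithmetic expressions, with one small MLP per operator.
import torch
import torch.nn as nn

DIM = 16  # embedding dimension (assumption)

class TreeNN(nn.Module):
    def __init__(self, n_constants=10):
        super().__init__()
        self.const = nn.Embedding(n_constants, DIM)       # leaf embeddings
        self.ops = nn.ModuleDict({                        # one combiner per operator
            op: nn.Sequential(nn.Linear(2 * DIM, DIM), nn.Tanh())
            for op in ["add", "mul"]
        })
        self.head = nn.Linear(DIM, 1)                     # decode root to a value

    def embed(self, tree):
        # A tree is either an int leaf or a tuple (op, left, right),
        # mirroring the recursive structure of the formula.
        if isinstance(tree, int):
            return self.const(torch.tensor(tree))
        op, left, right = tree
        children = torch.cat([self.embed(left), self.embed(right)])
        return self.ops[op](children)

    def forward(self, tree):
        return self.head(self.embed(tree))

model = TreeNN()
expr = ("add", 3, ("mul", 2, 4))                  # represents 3 + 2 * 4
loss = (model(expr).squeeze() - 11.0) ** 2        # regression target: the true value
loss.backward()
```

The key property is that the network's computation graph mirrors the formula tree, which is what makes the architecture natural for functions over formulas.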

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Tsendsuren Munkhdalai

Comments: 9 pages, 4 figures, 2 tables

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Training a deep neural network requires a large amount of single-task data and involves a long, time-consuming optimization phase. This is not scalable to complex, realistic environments with new, unexpected changes. Humans can perform fast incremental learning on the fly, and memory systems in the brain play a critical role. We introduce Sparse Meta Networks, a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks, by using deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast weights are generated sparsely at each time step and accumulated incrementally through time, providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from simple online reinforcement learning to large-scale adaptive language modelling.
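
A hedged sketch of the core mechanism described above: a slow weight matrix augmented by a fast-weight matrix that is generated sparsely at each step and accumulated with a decay. The dimensions, the top-k gating, and the decay value are assumptions of this sketch, not the paper's exact design.

```python
# Hedged sketch of a layer with fast-weight memory: the slow weight is
# augmented by a fast weight that is generated (sparsely, via a top-k gate)
# at each time step and accumulated with a decay, as an inductive bias for
# online adaptation.
import torch
import torch.nn as nn
import torch.nn.functional as F_

class FastWeightLayer(nn.Module):
    def __init__(self, d_in, d_out, k=4, decay=0.9):
        super().__init__()
        self.slow = nn.Linear(d_in, d_out)
        self.gen = nn.Linear(d_in, d_out)   # generates the fast-weight update
        self.k, self.decay = k, decay
        self.register_buffer("fast", torch.zeros(d_out, d_in))

    def forward(self, x):                   # x: (d_in,)
        u = self.gen(x)                     # proposed update, (d_out,)
        mask = torch.zeros_like(u)
        mask[u.abs().topk(self.k).indices] = 1.0    # keep only top-k entries
        update = torch.outer(u * mask, x)           # sparse rank-1 update
        self.fast = self.decay * self.fast + update.detach()
        return self.slow(x) + F_.linear(x, self.fast)

layer = FastWeightLayer(8, 8)
for t in range(5):
    y = layer(torch.randn(8))   # the fast memory accumulates across the sequence
```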

Multidisciplinary Design Optimization of Reusable Launch Vehicles for Different Propellants and Objectives

Kai Dresia, Simon Jentzsch, Günther Waxenegger-Wilfing, Robson Hahn, Jan Deeken, Michael Oschwald, Fabio Mota

Subjects: Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)

Identifying the optimal design of a new launch vehicle is most important, since design decisions made in the early development phase limit the vehicle's later performance and determine the associated costs. Reusing the first stage via retro-propulsive landing increases the complexity even more. Therefore, we develop an optimization framework for partially reusable launch vehicles, which enables multidisciplinary design studies. The framework contains suitable mass estimates of all essential subsystems and a routine to calculate the propellant needed for the ascent and landing maneuvers. For design optimization, the framework can be coupled with a genetic algorithm. The overall goal is to reveal the implications of different propellant combinations and objective functions on the launcher's optimal design for various mission scenarios. The results show that the optimization objective influences the most suitable propellant choice and the overall launcher design, concerning staging, weight, size, and rocket engine parameters. In terms of gross lift-off weight, liquid hydrogen seems to be favorable. When optimizing for a minimum structural mass or a minimum expendable structural mass, hydrocarbon-based solutions show better results. Finally, launch vehicles using a hydrocarbon fuel in the first stage and liquid hydrogen in the upper stage are an appealing alternative, combining both fuels' benefits.
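
To illustrate what coupling such a framework to a genetic algorithm looks like, here is a minimal sketch with a placeholder objective; the design variables, bounds, and the `glow` function are invented stand-ins, since the paper's mass and trajectory models are not reproduced here.

```python
# Minimal sketch of coupling a design-evaluation framework to a genetic
# algorithm. `glow` is a stand-in objective (gross lift-off weight); the real
# framework's mass estimates and trajectory routines are not reproduced.
import random

BOUNDS = [(50.0, 300.0), (2.0, 8.0), (0.1, 0.5)]  # assumed design variables

def glow(design):                      # placeholder objective to minimize
    t, mr, sf = design
    return (t - 120) ** 2 + 50 * (mr - 5.5) ** 2 + 400 * (sf - 0.25) ** 2

def random_design():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def mutate(design, rate=0.2):
    # Gaussian perturbation per gene, clipped back into the bounds.
    return [min(hi, max(lo, g + random.gauss(0, rate * (hi - lo))))
            if random.random() < 0.3 else g
            for g, (lo, hi) in zip(design, BOUNDS)]

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

pop = [random_design() for _ in range(40)]
for gen in range(100):
    pop.sort(key=glow)                 # rank population by the objective
    parents = pop[:10]                 # truncation selection
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(30)]
print("best design:", min(pop, key=glow))
```

Swapping `glow` for structural mass or expendable structural mass is exactly the kind of objective change whose design implications the paper studies.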

End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence

Nicolas Skatchkovsky, Hyeryung Jang, Osvaldo Simeone

Comments: To be presented at Asilomar 2020

Subjects: Neural and Evolutionary Computing (cs.NE); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

This paper introduces a novel “all-spike” low-power solution for remote wireless inference that is based on neuromorphic sensing, Impulse Radio (IR), and Spiking Neural Networks (SNNs). In the proposed system, event-driven neuromorphic sensors produce asynchronous time-encoded data streams that are encoded by an SNN, whose output spiking signals are pulse modulated via IR and transmitted over general frequency-selective channels, while the receiver's inputs are obtained via hard detection of the received signals and fed to an SNN for classification. We introduce an end-to-end training procedure that treats the cascade of encoder, channel, and decoder as a probabilistic SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The proposed system, termed NeuroJSCC, is compared to conventional synchronous frame-based and uncoded transmissions in terms of latency and accuracy. The experiments confirm that the proposed end-to-end neuromorphic edge architecture provides a promising framework for efficient and low-latency remote sensing, communication, and inference.

Auto-Classifier: A Robust Defect Detector Based on an AutoML Head

Vasco Lopes, Luís A. Alexandre

Comments: 12 pages, 2 figures. Published in ICONIP 2020; proceedings published in Springer's Lecture Notes in Computer Science series

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The dominant approach for surface defect detection is the use of hand-crafted feature-based methods. However, these fall short when imaging conditions vary. In this paper, we therefore sought to determine how well several state-of-the-art Convolutional Neural Networks perform on the task of surface defect detection. Moreover, we propose two methods: CNN-Fusion, which fuses the predictions of all the networks into a final one, and Auto-Classifier, a novel proposal that improves a Convolutional Neural Network by modifying its classification component using AutoML. We carried out experiments to evaluate the proposed methods on the task of surface defect detection using different datasets from DAGM2007. We show that the use of Convolutional Neural Networks achieves better results than traditional methods, and also that Auto-Classifier outperforms all other methods by achieving 100% accuracy and 100% AUC across all the datasets.

Physarum Multi-Commodity Flow Dynamics

Vincenzo Bonifaci, Enrico Facca, Frederic Folz, Andreas Karrenbauer, Pavel Kolev, Kurt Mehlhorn, Giovanna Morigi, Golnoosh Shahkarami, Quentin Vermande

Subjects: Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE)

In wet-lab experiments [Nakagaki-Yamada-Toth, Tero-Takagi-etal], the slime mold Physarum polycephalum has demonstrated its ability to solve shortest path problems and to design efficient networks; see the paper's wet-lab experiment figures for illustrations. Physarum polycephalum is a slime mold in the Mycetozoa group. For the shortest path problem, a mathematical model for the evolution of the slime was proposed in [Tero-Kobayashi-Nakagaki] and its biological relevance was argued. The model was shown to solve shortest path problems, first in computer simulations and then by mathematical proof. It was later shown that the slime mold dynamics can solve more general linear programs and that many variants of the dynamics have similar convergence behavior. In this paper, we introduce a dynamics for the network design problem. We formulate network design as the problem of constructing a network that efficiently supports a multi-commodity flow problem. We investigate the dynamics in computer simulations and analytically. The simulations show that the dynamics is able to construct efficient and elegant networks. In the theoretical part, we show that the dynamics minimizes an objective combining the cost of the network and the cost of routing the demands through the network. We also give an alternative characterization of the optimum solution.
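
For readers unfamiliar with the single-commodity case the paper builds on, here is a small simulation of the classical Tero-Kobayashi-Nakagaki shortest-path dynamics, where each edge conductivity D evolves as dD/dt = |Q| - D and Q is the flow induced by routing one unit from source to sink; the toy graph and step size are assumptions.

```python
# Sketch of the classical Physarum shortest-path dynamics
# (Tero-Kobayashi-Nakagaki): dD/dt = |Q| - D per edge, with Q the flow for a
# unit source-sink demand. Conductivity survives only on the shortest route.
import numpy as np

edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 3.0)]  # (u, v, length)
n, src, snk = 4, 0, 3
D = np.ones(len(edges))                  # initial conductivities

for _ in range(200):
    # Weighted graph Laplacian with edge weights D_e / length_e.
    L = np.zeros((n, n))
    for (u, v, l), d in zip(edges, D):
        w = d / l
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    b = np.zeros(n); b[src], b[snk] = 1.0, -1.0   # unit flow demand
    keep = [i for i in range(n) if i != snk]      # ground the sink at p = 0
    p = np.zeros(n)
    p[keep] = np.linalg.solve(L[np.ix_(keep, keep)], b[keep])
    Q = np.array([(d / l) * (p[u] - p[v]) for (u, v, l), d in zip(edges, D)])
    D += 0.1 * (np.abs(Q) - D)                    # Euler step of dD/dt = |Q| - D

print({(u, v): round(d, 3) for (u, v, _), d in zip(edges, D)})
# Conductivity persists on the shorter route 0-1-3 and decays on 0-2-3.
```

The paper's contribution is a dynamics of this flavor for the harder multi-commodity network design setting.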

Computer Vision and Pattern Recognition

Flow-edge Guided Video Completion

Chen Gao, Ayush Saraf, Jia-Bin Huang, Johannes Kopf

Comments: ECCV 2020. Project: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a new flow-based video completion algorithm. Previous flow completion methods are often unable to retain the sharpness of motion boundaries. Our method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges. Existing methods propagate colors among local flow connections between adjacent frames. However, not all missing regions in a video can be reached in this way because the motion boundaries form impenetrable barriers. Our method alleviates this problem by introducing non-local flow connections to temporally distant frames, enabling propagating video content over motion boundaries. We validate our approach on the DAVIS dataset. Both visual and quantitative results show that our method compares favorably against the state-of-the-art algorithms.

Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning

Stefan C Schonsheck

Comments: PhD thesis; versions of several chapters have previously appeared or been submitted under different titles

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Leo Tolstoy opened his monumental novel Anna Karenina with the now famous words: “Happy families are all alike; every unhappy family is unhappy in its own way.” A similar notion also applies to mathematical spaces: every flat space is alike; every unflat space is unflat in its own way. However, rather than being a source of unhappiness, we will show that the diversity of non-flat spaces provides a rich area of study. The genesis of the so-called big data era and the proliferation of social and scientific databases of increasing size have led to a need for algorithms that can efficiently process, analyze, and even generate high-dimensional data. However, the curse of dimensionality means that many classical approaches do not scale well with respect to the size of these problems. One technique to avoid some of these ill effects is to exploit the geometric structure of coherent data. In this thesis, we will explore geometric methods for shape processing and data analysis. More specifically, we will study techniques for representing manifolds, and signals supported on them, through a variety of mathematical tools including, but not limited to, computational differential geometry, variational PDE modeling, and deep learning. First, we will explore non-isometric shape matching through variational modeling. Next, we will use ideas from parallel transport on manifolds to generalize convolution and convolutional neural networks to deformable manifolds. Finally, we conclude by proposing a novel auto-regressive model for capturing the intrinsic geometry and topology of data. Throughout this work, we will use the idea of computing correspondences as a through-line to both motivate our work and analyze our results.

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

Weijia Wu, Ning Lu, Enze Xie

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning-based scene text detection can achieve preferable performance when powered with sufficient labeled training data. However, manual labeling is time-consuming and laborious, and in extreme cases the corresponding annotated data are unavailable. Exploiting synthetic data is a very promising solution, except for the domain distribution mismatches between synthetic datasets and real datasets. To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain). In this paper, a text self-training (TST) method and adversarial text instance alignment (ATA) for domain-adaptive scene text detection are introduced. ATA helps the network learn domain-invariant features by training a domain classifier in an adversarial manner. TST diminishes the adverse effects of false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels. Both components have positive effects on improving the performance of scene text detectors when adapting from synthetic to real scenes. We evaluate the proposed method by transferring from SynthText and VISD to ICDAR2015 and ICDAR2013. The results demonstrate the effectiveness of the proposed method, with up to 10% improvement, which has important exploration significance for domain-adaptive scene text detection. Code is available at this https URL
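
A common way to train a domain classifier adversarially, as ATA does, is a gradient reversal layer in the DANN style; the sketch below shows that pattern on a generic feature tensor, with the classifier architecture and feature dimension as assumptions rather than the paper's exact design.

```python
# Hedged sketch of adversarial feature alignment with a gradient reversal
# layer (DANN-style). ATA aligns text-instance features; here a generic
# 256-d feature tensor stands in for those features.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip gradients flowing to the encoder

domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def ata_loss(feat_syn, feat_real, lam=1.0):
    feats = torch.cat([feat_syn, feat_real])             # (N, 256)
    labels = torch.cat([torch.zeros(len(feat_syn), 1),   # 0 = synthetic
                        torch.ones(len(feat_real), 1)])  # 1 = real
    logits = domain_clf(GradReverse.apply(feats, lam))
    # The classifier learns to tell domains apart; the reversed gradient
    # pushes the feature encoder to make them indistinguishable.
    return bce(logits, labels)
```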

MIPGAN — Generating Robust and High Quality Morph Attacks Using Identity Prior Driven GAN

Haoyu Zhang, Sushma Venkatesh, Raghavendra Ramachandra, Kiran Raja, Naser Damer, Christoph Busch

Comments: Submitted to IEEE T-BIOM 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Face morphing attacks aim to circumvent Face Recognition Systems (FRS) by employing face images derived from multiple data subjects (e.g., accomplices and malicious actors). Morphed images can verify against contributing data subjects with a reasonable success rate, given that they have a high degree of identity resemblance. The success of morphing attacks is directly dependent on the quality of the generated morph images. We present a new approach for generating robust attacks, extending our earlier framework for generating face morphs. We present a new approach using an Identity Prior Driven Generative Adversarial Network, which we refer to as MIPGAN (Morphing through Identity Prior driven GAN). The proposed MIPGAN is derived from StyleGAN with a newly formulated loss function exploiting perceptual quality and an identity factor to generate high-quality morphed face images with minimal artifacts and at higher resolution. We demonstrate the proposed approach's applicability to generate robust morph attacks by evaluating it against a commercial Face Recognition System (FRS) and demonstrate the success rate of attacks. Extensive experiments are carried out to assess the FRS's vulnerability against the proposed morphed face generation technique on three types of data: digital images, re-digitized (printed and scanned) images, and compressed images after re-digitization, from the newly generated MIPGAN Face Morph Dataset. The obtained results demonstrate that the proposed approach of morph generation profoundly threatens the FRS.

Multi-Loss Weighting with Coefficient of Variations

Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Many interesting tasks in machine learning and computer vision are learned by optimising an objective function defined as a weighted linear combination of multiple losses. The final performance is sensitive to choosing the correct (relative) weights for these losses. Finding a good set of weights is often done by adopting them into the set of hyper-parameters, which are then set using an extensive grid search. This is computationally expensive. In this paper, the weights are defined based on properties observed while training the model, including the specific batch loss, the average loss, and the variance of each of the losses. An additional advantage is that the weights evolve during training, instead of using static loss weights. In the literature, loss weighting is mostly used in a multi-task learning setting, where the different tasks obtain different weights. However, there is a plethora of single-task multi-loss problems that can benefit from automatic loss weighting. In this paper, it is shown that these multi-task approaches do not work on single tasks. Instead, a method is proposed that automatically and dynamically tunes loss weights throughout training, specifically for single-task multi-loss problems. The method incorporates a measure of uncertainty to balance the losses. The validity of the approach is shown empirically for different tasks on multiple datasets.
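
A minimal sketch of weighting by the coefficient of variation of each loss, using running statistics; the normalization and the use of raw batch losses (rather than the paper's exact statistics) are assumptions of this sketch.

```python
# Hedged sketch of coefficient-of-variation loss weighting: each loss gets a
# weight proportional to the variability (std/mean) of its values observed
# over training, so no static weights or grid search are needed.
import numpy as np

class CoVWeighting:
    def __init__(self, n_losses):
        self.t = 0
        self.mean = np.zeros(n_losses)
        self.m2 = np.zeros(n_losses)    # running sum of squared deviations

    def weights(self, losses):          # losses: current batch values, shape (n,)
        self.t += 1
        delta = losses - self.mean      # Welford's online mean/variance update
        self.mean += delta / self.t
        self.m2 += delta * (losses - self.mean)
        std = np.sqrt(self.m2 / max(self.t - 1, 1))
        cov = std / (self.mean + 1e-8)  # coefficient of variation per loss
        return cov / (cov.sum() + 1e-8) # normalize so the weights sum to 1

cw = CoVWeighting(3)
batch_losses = np.array([0.9, 2.5, 0.1])
w = cw.weights(batch_losses)
total = float((w * batch_losses).sum())  # weighted training objective
```

The appeal is that a loss whose value still fluctuates a lot (high CoV) is deemed "not yet learned" and receives more weight, while a settled loss is down-weighted automatically.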

Future Frame Prediction of a Video Sequence

Jasmeen Kaur, Sukhendu Das

Comments: Acknowledgement: the contributions, support, and help of Sonam Gupta, PhD Scholar, VPLAB, Dept. of CS&E, IIT Madras

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Predicting future frames of a video sequence has been a problem of high interest in the field of Computer Vision as it caters to a multitude of applications. The ability to predict, anticipate, and reason about future events is the essence of intelligence and one of the main goals of decision-making systems such as human-machine interaction, robot navigation, and autonomous driving. However, the challenge lies in the ambiguous nature of the problem, as there may be multiple future sequences possible for the same input video shot. A naively designed model averages multiple possible futures into a single blurry prediction.

Recently, two distinct approaches have attempted to address this problem: (a) latent variable models that represent the underlying stochasticity, and (b) adversarially trained models that aim to produce sharper images. A latent variable model often struggles to produce realistic results, while an adversarially trained model underutilizes latent variables and thus fails to produce diverse predictions. These methods have revealed complementary strengths and weaknesses. Combining the two approaches produces predictions that appear more realistic and better cover the range of plausible futures. This forms the basis and objective of the study in this project work.

In this paper, we propose a novel multi-scale architecture combining both approaches. We validate our proposed model through a series of experiments and empirical evaluations on the Moving MNIST, UCF101, and Penn Action datasets. Our method outperforms the results obtained using the baseline methods.

Multi-domain semantic segmentation with pyramidal fusion

Marin Oršić, Petra Bevandić, Ivan Grubišić, Josip Šarić, Siniša Šegvić

Comments: 2 pages, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present our submission to the semantic segmentation contest of the Robust Vision Challenge held at ECCV 2020. The contest requires submitting the same model to seven benchmarks from three different domains. Our approach is based on the SwiftNet architecture with pyramidal fusion. We address inconsistent taxonomies with a single-level, 193-dimensional softmax output. We strive to train with large batches in order to stabilize optimization of a hard recognition problem and to favour smooth evolution of batchnorm statistics. We achieve this by implementing a custom backward step through the log-sum-prob loss, and by using small crops before freezing the population statistics. Our model ranks first on the RVC semantic segmentation challenge as well as on the WildDash 2 leaderboard. This suggests that pyramidal fusion is competitive not only for efficient inference with lightweight backbones, but also in large-scale setups for multi-domain applications.
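
One natural reading of a log-sum-prob loss for inconsistent taxonomies is that each dataset's coarse label maps to a set of universal classes, and the loss is the negative log of the total probability mass on that set. The sketch below assumes that mapping and is not necessarily the authors' exact formulation.

```python
# Hedged sketch of a log-sum-prob loss over a 193-way universal softmax:
# a coarse dataset label corresponds to a set of universal classes, and the
# loss is -log of their summed probability, computed stably with logsumexp
# over log-softmax scores. The class mapping here is an assumption.
import torch
import torch.nn.functional as F

def log_sum_prob_loss(logits, label_sets):
    # logits: (N, 193) universal-class scores; label_sets: list of index lists.
    logp = F.log_softmax(logits, dim=1)
    losses = [-torch.logsumexp(logp[i, idx], dim=0)      # -log sum_c p_c
              for i, idx in enumerate(label_sets)]
    return torch.stack(losses).mean()

logits = torch.randn(2, 193, requires_grad=True)
# e.g. a dataset's "vehicle" label maps to hypothetical universal class ids:
loss = log_sum_prob_loss(logits, [[12, 13, 14], [7]])
loss.backward()
```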

Modification method for single-stage object detectors that allows to exploit the temporal behaviour of a scene to improve detection accuracy

Menua Gevorgyan

Comments: 5 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

A simple modification method for single-stage generic object detection neural networks, such as YOLO and SSD, is proposed, which allows for improving the detection accuracy on video data by exploiting the temporal behavior of the scene in the detection pipeline. It is shown that, using this method, the detection accuracy of the base network can be considerably improved, especially for occluded and hidden objects, and that a modified network detects hidden objects with more confidence than an unmodified one. A weakly supervised training method is also proposed, which allows for training a modified network without requiring any additional annotated data.

Few-shot Object Detection with Feature Attention Highlight Module in Remote Sensing Images

Zixuan Xiao, Ping Zhong, Yuan Quan, Xuping Yin, Wei Xue

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, there have been many applications of object detection in the remote sensing field, which demand a great amount of labeled data. However, in many cases, data are extremely scarce. In this paper, we propose a few-shot object detector designed for detecting novel objects based on only a few examples. By fully leveraging labeled base classes, our model, which is composed of a feature extractor, a feature attention highlight module, and a two-stage detection backend, can quickly adapt to novel classes. The pre-trained feature extractor, whose parameters are shared, produces general features, while the feature attention highlight module is designed to be lightweight and simple in order to fit the few-shot case. Although simple, the information it provides in a serial way helps make the general features specific to few-shot objects. The object-specific features are then delivered to the two-stage detection backend for the detection results. The experiments demonstrate the effectiveness of the proposed method for few-shot cases.

SCG-Net: Self-Constructing Graph Neural Networks for Semantic Segmentation

Qinghui Liu, Michael Kampffmeyer, Robert Jenssen, Arnt-Børre Salberg

Comments: 11 pages, 5 figures. Draft version submitted to TGRS; code will be open-sourced soon

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Capturing global contextual representations by exploiting long-range pixel-pixel dependencies has been shown to improve semantic segmentation performance. However, how to do this efficiently is an open question, as current approaches that utilise attention schemes or very deep models to increase the model's field of view result in complex models with large memory consumption. Inspired by recent work on graph neural networks, we propose the Self-Constructing Graph (SCG) module that learns a long-range dependency graph directly from the image and uses it to propagate contextual information efficiently to improve semantic segmentation. The module is optimised via a novel adaptive diagonal enhancement method and a variational lower bound that consists of a customized graph reconstruction term and a Kullback-Leibler divergence regularization term. When incorporated into a neural network (SCG-Net), semantic segmentation is performed in an end-to-end manner, and competitive performance (mean F1-scores of 92.0% and 89.8%, respectively) is achieved on the publicly available ISPRS Potsdam and Vaihingen datasets, with much fewer parameters and at a lower computational cost compared to related pure convolutional neural network (CNN) based models.

Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors

Duy Thanh Nguyen, Hyun Kim, Hyuk-Jae Lee

Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology

Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are frequently quite different, previous hardware designs have employed common optimization schemes for them. This paper proposes a layer-specific design that employs different organizations optimized for the different layers. The proposed design employs two layer-specific optimizations: layer-specific mixed data flow and layer-specific mixed precision. The mixed data flow aims to minimize off-chip access while demanding minimal on-chip memory (BRAM) resources of an FPGA device. The mixed precision quantization aims to achieve both lossless accuracy and aggressive model compression, thereby further reducing off-chip access. A Bayesian optimization approach is used to select the best sparsity for each layer, achieving the best trade-off between accuracy and compression. This mixing scheme allows the entire network model to be stored in the BRAMs of the FPGA, aggressively reducing off-chip access and thereby achieving a significant performance enhancement. The model size is reduced by 22.66-28.93 times compared to that of a full-precision network, with negligible degradation of accuracy on the VOC, COCO, and ImageNet datasets. Furthermore, the combination of mixed data flow and mixed precision significantly outperforms previous works in terms of throughput, off-chip access, and on-chip memory requirements.

DESC: Domain Adaptation for Depth Estimation via Semantic Consistency

Adrian Lopez-Rodriguez, Krystian Mikolajczyk

Comments: BMVC 2020 (Oral). Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate real depth annotations are difficult to acquire, requiring the use of special devices such as a LiDAR sensor. Self-supervised methods try to overcome this problem by processing video or stereo sequences, which may not always be available. Instead, in this paper, we propose a domain adaptation approach to train a monocular depth estimation model using a fully-annotated source dataset and a non-annotated target dataset. We bridge the domain gap by leveraging semantic predictions and low-level edge features to provide guidance for the target domain. We enforce consistency between the main model and a second model trained with semantic segmentation and edge maps, and introduce priors in the form of instance heights. Our approach is evaluated on standard domain adaptation benchmarks for monocular depth estimation and shows consistent improvement upon the state of the art.

Auto-Classifier: A Robust Defect Detector Based on an AutoML Head

Vasco Lopes, Luís A. Alexandre

Comments: 12 pages, 2 figures. Published in ICONIP 2020; proceedings published in Springer's Lecture Notes in Computer Science series

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The dominant approach for surface defect detection is the use of hand-crafted feature-based methods. However, these fall short when imaging conditions vary. In this paper, we therefore sought to determine how well several state-of-the-art Convolutional Neural Networks perform on the task of surface defect detection. Moreover, we propose two methods: CNN-Fusion, which fuses the predictions of all the networks into a final one, and Auto-Classifier, a novel proposal that improves a Convolutional Neural Network by modifying its classification component using AutoML. We carried out experiments to evaluate the proposed methods on the task of surface defect detection using different datasets from DAGM2007. We show that the use of Convolutional Neural Networks achieves better results than traditional methods, and also that Auto-Classifier outperforms all other methods by achieving 100% accuracy and 100% AUC across all the datasets.

1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask

Jingru Tan, Gang Zhang, Hanming Deng, Changbao Wang, Lewei Lu, Quanquan Li, Jifeng Dai

Comments: Winner of the LVIS Challenge 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This article introduces the solution of the team lvisTraveler for the LVIS Challenge 2020. In this work, two characteristics of the LVIS dataset are mainly considered: the long-tailed distribution and high-quality instance segmentation masks. We adopt a two-stage training pipeline. In the first stage, we incorporate EQL and self-training to learn generalized representations. In the second stage, we utilize Balanced GroupSoftmax to promote the classifier, and propose a novel proposal assignment strategy and a new balanced mask loss for the mask head to get more precise mask predictions. Finally, we achieve 41.5 and 41.2 AP on the LVIS v1.0 val and test-dev splits respectively, outperforming the baseline based on X101-FPN-MaskRCNN by a large margin.

Physics-based Shading Reconstruction for Intrinsic Image Decomposition

Anil S. Baslamisli, Yang Liu, Sezer Karaoglu, Theo Gevers

Comments: Submitted to Computer Vision and Image Understanding (CVIU)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We investigate the use of photometric invariance and deep learning to compute intrinsic images (albedo and shading). We propose albedo and shading gradient descriptors which are derived from physics-based models. Using the descriptors, albedo transitions are masked out and an initial sparse shading map is calculated directly from the corresponding RGB image gradients in a learning-free, unsupervised manner. Then, an optimization method is proposed to reconstruct the full dense shading map. Finally, we integrate the generated shading map into a novel deep learning framework to refine it and also to predict the corresponding albedo image, achieving intrinsic image decomposition. In doing so, we are the first to directly address the texture and intensity ambiguity problems of shading estimation. Large-scale experiments show that our approach, steered by physics-based invariant descriptors, achieves superior results on the MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant Intrinsic Images, Spectral Intrinsic Images, and As Realistic As Possible datasets, and competitive results on the Intrinsic Images in the Wild dataset, while achieving state-of-the-art shading estimations.

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

Yikuan Li, Hanyin Wang, Yuan Luo

Comments: 10 pages, 3 figures, submitted to BIBM 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Joint image-text embeddings extracted from medical images and associated contextual reports are the bedrock for most biomedical vision-and-language (V+L) tasks, including medical visual question answering, clinical image-text retrieval, and clinical report auto-generation. In this study, we adopt four pre-trained V+L models: LXMERT, VisualBERT, UNITER, and PixelBERT, to learn multimodal representations from MIMIC-CXR radiographs and associated reports. The extrinsic evaluation on the OpenI dataset shows that, in comparison to the pioneering CNN-RNN model, the joint embeddings learned by pre-trained V+L models demonstrate performance improvement in the thoracic findings classification task. We conduct an ablation study to analyze the contribution of certain model components and validate the advantage of joint embeddings over text-only embeddings. We also visualize attention maps to illustrate the attention mechanism of V+L models.

TRACE: Transform Aggregate and Compose Visiolinguistic Representations for Image Search with Text Feedback

Surgan Jandial, Ayush Chopra, Pinkesh Badjatiya, Pranit Chawla, Mausoom Sarkar, Balaji Krishnamurthy

Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The ability to efficiently search for images over an indexed database is the cornerstone of several user experiences. Incorporating user feedback through multi-modal inputs provides flexible interaction to serve fine-grained specificity in requirements. We specifically focus on text feedback, through descriptive natural language queries. Given a reference image and textual user feedback, our goal is to retrieve images that satisfy constraints specified by both of these input modalities. The task is challenging as it requires understanding the textual semantics of the text feedback and then applying these changes to the visual representation. To address these challenges, we propose a novel architecture, TRACE, which contains a hierarchical feature aggregation module to learn the composite visio-linguistic representations. TRACE achieves SOTA performance on 3 benchmark datasets, FashionIQ, Shoes, and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5% respectively in the R@K metric. Our extensive experiments and ablation studies show that TRACE consistently outperforms the existing techniques by significant margins, both quantitatively and qualitatively.

Modeling Global Body Configurations in American Sign Language

Nicholas Wilkins, Beck Cordes Galbraith, Ifeoma Nwogu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

American Sign Language (ASL) is the fourth most commonly used language in the United States and is the language most commonly used by Deaf people in the United States and the English-speaking regions of Canada. Unfortunately, until recently, ASL received little research attention. This is due, in part, to its delayed recognition as a language until William C. Stokoe's publication in 1960. Limited data has been a long-standing obstacle to ASL research and computational modeling. The lack of large-scale datasets has prevented many modern machine-learning techniques, such as Neural Machine Translation, from being applied to ASL. In addition, the modality required to capture sign language (i.e. video) is complex in natural settings (as one must deal with background noise, motion blur, and the curse of dimensionality). Finally, when compared with spoken languages, such as English, there has been limited research conducted into the linguistics of ASL.

We realize a simplified version of Liddell and Johnson's Movement-Hold (MH) Model using a Probabilistic Graphical Model (PGM). We trained our model on ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM against other models to determine its ability to model ASL. Finally, we interpret various aspects of the PGM and draw conclusions about ASL phonetics. The main contributions of this paper are

Adherent Mist and Raindrop Removal from a Single Image Using Attentive Convolutional Network

Da He, Xiaoyu Shang, Jiajia Luo

Comments: 21 pages (including 4 pages of supplementary materials)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Temperature-difference-induced mist adhering to the windshield, camera lens, etc. is often inhomogeneous and obscure, which can easily obstruct the vision and severely degrade the image. Together with adherent raindrops, such mist brings considerable challenges to various vision systems, yet has received little attention. Recent methods for similar problems typically use hand-crafted priors to generate spatial attention maps. In this work, we propose to visually remove the adherent mist and raindrops jointly from a single image using attentive convolutional neural networks. We apply classification activation map attention to our model to strengthen the spatial attention without hand-crafted priors. In addition, smoothed dilated convolution is adopted to obtain a large receptive field without spatial information loss, and a dual attention module is utilized to efficiently select channel and spatial features. Our experiments show that our method achieves state-of-the-art performance, and demonstrate that this underrated practical problem is critical to high-level vision scenes.

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Wei Liu, Shih-Fu Chang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer from a severe performance drop between detected and ground-truth proposals. To this end, we propose Ref-NMS, which is the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects, and introduces a lightweight module to predict a score for aligning each box with a critical object. These scores can guide the NMS operation to filter out the boxes irrelevant to the expression, increasing the recall of critical objects and resulting in significantly improved grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS.
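
A minimal sketch of the idea of expression-aware suppression: fuse each box's detection confidence with a predicted expression-relatedness score before running standard NMS. The multiplicative fusion is an assumption of this sketch, not necessarily Ref-NMS's exact scoring rule.

```python
# Minimal sketch in the spirit of Ref-NMS: detection confidence is fused
# with an expression-relatedness score before suppression, so boxes relevant
# to the referring expression survive.
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i + 1])[0] + area(boxes[order[1:]]) - inter)
        order = order[1:][iou <= iou_thr]
    return keep

def expression_aware_nms(boxes, det_conf, rel_score, iou_thr=0.5):
    return nms(boxes, det_conf * rel_score, iou_thr)  # fuse before suppression

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
# The relatedness score promotes the expression-relevant overlapping box,
# so it wins the suppression instead of the higher-confidence one.
print(expression_aware_nms(boxes, np.array([0.9, 0.8, 0.7]),
                           np.array([0.1, 0.9, 0.9])))   # -> [1, 2]
```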

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

Lei Zhang, Zhenwei He, Yi Yang, Liang Wang, Xinbo Gao

Comments: To appear in IEEE TPAMI, 18 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The traditional object retrieval task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which supposes that the objects in an image are manually or automatically pre-cropped exactly. However, in many real-world searching scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles, etc.) are seldom accurately detected or annotated. Therefore, object-level retrieval becomes intractable without bounding-box annotation, which leads to a new but challenging topic, i.e., image-level search. In this paper, to address the image search issue, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) a Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed; 2) a novel on-line pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem by automatically generating a number of negative pairs to restrict the positives; 3) a hard example priority (HEP) based softmax loss is proposed to improve the robustness of the classification task by selecting hard categories. With the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) two modules are tailored to handle different tasks separately in the integrated framework, such that the task specification is guaranteed; 2) a class-center guided HEP loss (C2HEP), which exploits the stored class centers, is proposed, such that the intra-similarity and inter-dissimilarity can be captured for ultimate retrieval. Extensive experiments on famous image-level search oriented benchmark datasets demonstrate that the proposed DC-I-Net outperforms the state-of-the-art tasks-integrated and tasks-separated image search models.

Spatial Transformer Point Convolution

Yuan Fang, Chunyan Xu, Zhen Cui, Yuan Zong, Jian Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point clouds are unstructured and unordered in the embedded 3D space. In order to produce consistent responses under different permutation layouts, most existing methods aggregate local spatial points through a maximum or summation operation. But such an aggregation essentially amounts to isotropic filtering on all the operated points, which tends to lose the information of geometric structures. In this paper, we propose a spatial transformer point convolution (STPC) method to achieve anisotropic convolution filtering on point clouds. To capture and represent implicit geometric structures, we specifically introduce a spatial direction dictionary to learn those latent geometric components. To better encode unordered neighbor points, we design a sparse deformer to transform them into the canonical ordered dictionary space by using direction dictionary learning. In the transformed space, standard image-like convolution can be leveraged to generate anisotropic filtering, which is more robust to express the finer variances of local regions. The dictionary learning and encoding processes are encapsulated into a network module and jointly learnt in an end-to-end manner. Extensive experiments on several public datasets (including S3DIS, Semantic3D, and SemanticKITTI) demonstrate the effectiveness of our proposed method on the point cloud semantic segmentation task.

Noise-Aware Texture-Preserving Low-Light Enhancement

Zohreh Azizi, Xuejing Lei, C.-C. Jay Kuo

Comments: Accepted by IEEE VCIP 2020. The final version will appear in IEEE VCIP 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

A simple and effective low-light image enhancement method based on a noise-aware texture-preserving retinex model is proposed in this work. The new method, called NATLE, attempts to strike a balance between noise removal and natural texture preservation through a low-complexity solution. Its cost function includes an estimated piece-wise smooth illumination map and a noise-free texture-preserving reflectance map. Afterwards, the illumination is adjusted to form the enhanced image together with the reflectance map. Extensive experiments are conducted on common low-light image enhancement datasets to demonstrate the superior performance of NATLE.
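
The final recombination step described above is simple enough to show directly; this is a sketch assuming the illumination map L and reflectance map R have already been estimated, with the gamma value as an assumption.

```python
# Sketch of the retinex recombination step: given a piece-wise smooth
# illumination map L and a noise-free reflectance map R (assumed precomputed
# by the decomposition), gamma-adjust the illumination and recombine.
import numpy as np

def enhance(R, L, gamma=0.45):
    # R, L: float arrays in [0, 1]; the retinex model assumes I = R * L.
    L_adj = np.power(np.clip(L, 1e-4, 1.0), gamma)  # brighten dark regions
    return np.clip(R * L_adj, 0.0, 1.0)

R = np.random.rand(32, 32)
L = np.full((32, 32), 0.1)          # a uniformly dark illumination map (stub)
out = enhance(R, L)                 # enhanced low-light image
```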

Towards Practical Implementations of Person Re-Identification from Full Video Frames

Felix O. Sumari, Luigy Machaca, Jose Huaman, Esteban W. G. Clua, Joris Guérin

Comments: 7 pages, 9 figures. This paper is under consideration at Pattern Recognition Letters

Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the major adoption of automation for city security, person re-identification (Re-ID) has been extensively studied recently. In this paper, we argue that the current way of studying person re-identification, i.e., by trying to re-identify a person within already detected and pre-cropped images of people, is not sufficient to implement practical security applications, where the inputs to the system are the full frames of the video streams. To support this claim, we introduce the Full Frame Person Re-ID setting (FF-PRID) and define specific metrics to evaluate FF-PRID implementations. To improve robustness, we also formalize the hybrid human-machine collaboration framework, which is inherent to any Re-ID security application. To demonstrate the importance of considering the FF-PRID setting, we build an experiment showing that combining a good people detection network with a good Re-ID model does not necessarily produce good results for the final application. This underlines a failure of the current formulation in assessing the quality of a Re-ID model and justifies the use of different metrics. We hope that this work will motivate the research community to consider the full problem in order to develop algorithms that are better suited to real-world scenarios.

NITES: A Non-Parametric Interpretable Texture Synthesis Method

Xuejing Lei, Ganning Zhao, C.-C. Jay Kuo

Subjects: Computer Vision and Pattern Recognition (cs.CV)

A non-parametric interpretable texture synthesis method, called the NITES method, is proposed in this work. Although automatic synthesis of visually pleasant textures can be achieved by deep neural networks nowadays, the associated generative models are mathematically intractable and their training demands a high computational cost. NITES offers a new texture synthesis solution to address these shortcomings. NITES is mathematically transparent and efficient in training and inference. The input is a single exemplary texture image. The NITES method crops out patches from the input and analyzes the statistical properties of these texture patches to obtain their joint spatial-spectral representations. Then, the probabilistic distributions of samples in the joint spatial-spectral spaces are characterized. Finally, numerous texture images that are visually similar to the exemplary texture image can be generated automatically. Experimental results are provided to show the superior quality of the generated texture images and the efficiency of the proposed NITES method in terms of both training and inference time.

Robust Object Classification Approach using Spherical Harmonics

Ayman Mukhaimar, Ruwan Tennakoon, Chow Yin Lai, Reza Hoseinnezhad, Alireza Bab-Hadiashar

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we present a robust spherical harmonics approach for the classification of point cloud-based objects. Spherical harmonics have been used for classification over the years, with several frameworks existing in the literature. These approaches use a variety of spherical-harmonics-based descriptors to classify objects. We first investigated these frameworks' robustness against data augmentation, such as outliers and noise, as this has not been studied before. Then we propose a spherical convolution neural network framework for robust object classification. The proposed framework uses a voxel grid of concentric spheres to learn features over the unit ball. Our proposed model learns features that are less sensitive to data augmentation due to the selected sampling strategy and the designed convolution operation. We tested our proposed model against several types of data augmentation, such as noise and outliers. Our results show that the proposed model outperforms state-of-the-art networks in terms of robustness to data augmentation.
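
As a flavor of the descriptor families discussed above, here is a hedged sketch of one classical rotation-invariant spherical-harmonics descriptor for a point cloud: approximate the spherical-harmonic coefficients of the radial function and keep per-degree energies. The quadrature-free summation assumes roughly uniform angular coverage and is not any specific framework from the paper.

```python
# Hedged sketch of a rotation-invariant spherical-harmonics descriptor:
# project points to directions on the unit sphere, approximate SH
# coefficients of the radial function by a discrete sum, and keep the
# per-degree energies, which are invariant to rotations.
import numpy as np
from scipy.special import sph_harm

def sh_descriptor(points, l_max=8):
    pts = points - points.mean(axis=0)             # center the cloud
    r = np.linalg.norm(pts, axis=1) + 1e-12
    theta = np.arctan2(pts[:, 1], pts[:, 0]) % (2 * np.pi)  # azimuth in [0, 2pi)
    phi = np.arccos(np.clip(pts[:, 2] / r, -1, 1))          # polar in [0, pi]
    feats = []
    for l in range(l_max + 1):
        energy = 0.0
        for m in range(-l, l + 1):
            y = sph_harm(m, l, theta, phi)          # scipy: sph_harm(m, n, az, polar)
            c = np.mean(r * np.conj(y))             # coefficient of f = r(direction)
            energy += np.abs(c) ** 2
        feats.append(energy)
    return np.array(feats)                          # one energy per degree l

desc = sh_descriptor(np.random.randn(500, 3))
```

Outliers and noise perturb the radial function directly, which is why the paper's robustness study of such descriptors is a natural question.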

Unsupervised Point Cloud Registration via Salient Points Analysis (SPA)

Pranav Kadam, Min Zhang, Shan Liu, C.-C. Jay Kuo

Comments: 7 pages, 5 figures. The final version is accepted by IEEE International Conference on Visual Communications and Image Processing (VCIP) 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV)

An unsupervised point cloud registration method, called salient points analysis (SPA), is proposed in this work. The proposed SPA method can register two point clouds effectively using only a small subset of salient points. It first applies the PointHop++ method to the point clouds, finds corresponding salient points in the two point clouds based on the local surface characteristics of points, and performs registration by matching the corresponding salient points. The SPA method offers several advantages over recent deep learning based solutions for registration. Deep learning methods such as PointNetLK and DCP train end-to-end networks and rely on full supervision (namely, the ground truth transformation matrix and class label). In contrast, SPA is completely unsupervised. Furthermore, SPA's training time and model size are much smaller. The effectiveness of the SPA method is demonstrated by experiments on seen and unseen classes and noisy point clouds from the ModelNet-40 dataset.
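
Once corresponding salient points are matched, the rigid transform has a classical closed form (the Kabsch/SVD solution); the sketch below shows that final alignment step, assuming the correspondences (SPA's actual contribution) are given row-for-row.

```python
# Sketch of the closed-form rigid alignment used once corresponding salient
# points are matched (Kabsch/SVD). Correspondences are assumed given here.
import numpy as np

def rigid_align(P, Q):
    # P, Q: (k, 3) matched salient points; returns R, t with Q ~= P @ R.T + t.
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    return R, cq - R @ cp

P = np.random.randn(20, 3)
R_true = np.linalg.qr(np.random.randn(3, 3))[0]
R_true *= np.linalg.det(R_true)             # force a proper rotation (det +1)
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = rigid_align(P, Q)                    # recovers R_true and the translation
```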

Unsupervised Feedforward Feature (UFF) Learning for Point Cloud Classification and Segmentation

Min Zhang, Pranav Kadam, Shan Liu, C.-C. Jay Kuo

Comments: 7 pages, 2 figures. The final version is accepted by VCIP 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In contrast to supervised backpropagation-based feature learning in deep neural networks (DNNs), an unsupervised feedforward feature (UFF) learning scheme for joint classification and segmentation of 3D point clouds is proposed in this work. The UFF method exploits statistical correlations of points in a point cloud set to learn shape and point features in a one-pass feedforward manner through a cascaded encoder-decoder architecture. It learns global shape features through the encoder and local point features through the concatenated encoder-decoder architecture. The extracted features of an input point cloud are fed to classifiers for shape classification and part segmentation. Experiments are conducted to evaluate the performance of the UFF method. For shape classification, the UFF is superior to existing unsupervised methods and on par with state-of-the-art DNNs. For part segmentation, the UFF outperforms semi-supervised methods and performs slightly worse than DNNs.

Efficiency in Real-time Webcam Gaze Tracking

Amogh Gudi, Xin Li, Jan van Gemert

Comments: Awarded Best Paper at the European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Efficiency and ease of use are essential for practical applications of camera-based eye/gaze-tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper we investigate two complementary forms of efficiency in gaze tracking: 1) the computational efficiency of the system, which is dominated by the inference speed of a CNN predicting gaze-vectors; 2) the usability efficiency, which is determined by the tediousness of the mandatory calibration of the gaze-vector to a computer screen. To do so, we evaluate the computational speed/accuracy trade-off for the CNN and the calibration effort/accuracy trade-off for screen calibration. For the CNN, we evaluate full face, two-eyes, and single-eye input. For screen calibration, we measure the number of calibration points needed and evaluate three types of calibration: 1) pure geometry, 2) pure machine learning, and 3) hybrid geometric regression. Results suggest that a single-eye input and geometric regression calibration achieve the best trade-off.
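
To make the calibration step concrete, here is a sketch of a regression-based variant: a few known screen targets and the CNN's predicted gaze vectors fit a polynomial map from gaze vector to screen coordinates. The polynomial degree, the regularization, and the stub data are assumptions; the paper compares this family against geometric and hybrid calibrations.

```python
# Sketch of regression-based screen calibration: fit a polynomial map from
# predicted gaze vectors to screen coordinates using a few calibration points.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

gaze = np.random.randn(9, 3)                 # predicted 3-D gaze vectors (stub)
screen = np.random.rand(9, 2)                # corresponding screen targets (stub)

calib = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-2))
calib.fit(gaze, screen)                      # 9-point calibration
xy = calib.predict(gaze[:1])                 # screen coordinates for a new gaze
```

The effort/accuracy trade-off the paper measures is essentially how the quality of `calib` degrades as the number of calibration points shrinks.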

CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking

Dimitris Perdios, Manuel Vonlanthen, Florian Martinez, Marcel Arditi, Jean-Philippe Thiran

Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Thanks to its capability of acquiring full-view frames at multiple kilohertz, ultrafast ultrasound imaging has unlocked the analysis of rapidly changing physical phenomena in the human body, with pioneering applications such as ultrasensitive flow imaging in the cardiovascular system or shear-wave elastography. The accuracy achievable with these motion estimation techniques is strongly contingent upon two contradictory requirements: a high quality of consecutive frames and a high frame rate. Indeed, the image quality can usually be improved by increasing the number of steered ultrafast acquisitions, but at the expense of a reduced frame rate and possible motion artifacts. To achieve accurate motion estimation at uncompromised frame rates and immune to motion artifacts, the proposed approach relies on single ultrafast acquisitions to reconstruct high-quality frames and on only two consecutive frames to obtain 2-D displacement estimates. To this end, we deployed a convolutional neural network-based image reconstruction method combined with a speckle tracking algorithm based on cross-correlation. Numerical and in vivo experiments, conducted in the context of plane-wave imaging, demonstrate that the proposed approach is capable of estimating displacements in regions where the presence of side lobe and grating lobe artifacts prevents any displacement estimation with a state-of-the-art technique that relies on conventional delay-and-sum beamforming. The proposed approach may therefore unlock the full potential of ultrafast ultrasound, in applications such as ultrasensitive cardiovascular motion and flow analysis or shear-wave elastography.
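
The speckle-tracking half of the pipeline is a classical cross-correlation search; here is a minimal sketch on two frames, with the kernel and search-window sizes as assumptions.

```python
# Sketch of cross-correlation speckle tracking: the displacement of a small
# kernel between two consecutive frames is the location of the correlation
# peak within a larger search window.
import numpy as np
from scipy.signal import correlate2d

def track_patch(frame0, frame1, y, x, k=8, s=16):
    kernel = frame0[y - k:y + k, x - k:x + k]
    window = frame1[y - s:y + s, x - s:x + s]
    kernel = kernel - kernel.mean()          # zero-mean for a cleaner peak
    window = window - window.mean()
    xc = correlate2d(window, kernel, mode="valid")   # correlation surface
    dy, dx = np.unravel_index(np.argmax(xc), xc.shape)
    return dy - (s - k), dx - (s - k)                # displacement in pixels

f0 = np.random.rand(64, 64)
f1 = np.roll(f0, shift=(2, -1), axis=(0, 1))         # known 2-D shift
print(track_patch(f0, f1, 32, 32))                   # -> (2, -1)
```

The paper's contribution sits upstream of this step: a CNN reconstructs frames of high enough quality from single ultrafast acquisitions that such a two-frame correlation remains reliable.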

Limited View Tomographic Reconstruction Using a Deep Recurrent Framework with Residual Dense Spatial-Channel Attention Network and Sinogram Consistency

Bo Zhou, S. Kevin Zhou, James S. Duncan, Chi Liu

Comments: Submitted to the IEEE for possible publication

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Limited view tomographic reconstruction aims to reconstruct a tomographic image from a limited number of sinogram or projection views arising from sparse-view or limited-angle acquisitions that reduce radiation dose or shorten scanning time. However, such a reconstruction suffers from high noise and severe artifacts due to the incompleteness of the sinogram. To derive quality reconstructions, previous state-of-the-art methods use UNet-like neural architectures to directly predict the full-view reconstruction from limited-view data; but these methods leave the deep network architecture issue largely intact and cannot guarantee consistency between the sinogram of the reconstructed image and the acquired sinogram, leading to non-ideal reconstructions. In this work, we propose a novel recurrent reconstruction framework that stacks the same block multiple times. The recurrent block consists of a custom-designed residual dense spatial-channel attention network. Further, we develop a sinogram consistency layer interleaved in our recurrent framework in order to ensure that the sampled sinogram is consistent with the sinogram of the intermediate outputs of the recurrent blocks. We evaluate our methods on two datasets. Our experimental results on the AAPM Low Dose CT Grand Challenge datasets demonstrate that our algorithm achieves a consistent and significant improvement over the existing state-of-the-art neural methods on both limited-angle reconstruction (over 5 dB better in terms of PSNR) and sparse-view reconstruction (about 4 dB better in terms of PSNR). In addition, our experimental results on the DeepLesion dataset demonstrate that our method is able to generate high-quality reconstructions for 8 major lesion types.
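
A data-consistency step of the kind described above can be sketched as a masked replacement: wherever a view was actually acquired, the intermediate sinogram is overwritten by (or blended toward) the measured data. The soft blending weight is an assumption of this sketch, not necessarily the paper's exact layer.

```python
# Sketch of a sinogram-consistency step: keep the network's prediction where
# no data exist, and enforce (or blend toward) the measured samples where
# views were acquired.
import numpy as np

def sinogram_consistency(pred_sino, measured_sino, sampled_mask, w=1.0):
    # sampled_mask: 1 where a view/detector sample was acquired, else 0.
    return (1 - sampled_mask) * pred_sino + sampled_mask * (
        w * measured_sino + (1 - w) * pred_sino)

pred = np.random.rand(180, 256)        # predicted full-view sinogram
meas = np.random.rand(180, 256)        # acquired data (zeros where missing)
mask = np.zeros((180, 256)); mask[::3] = 1.0   # e.g. every third view sampled
consistent = sinogram_consistency(pred, meas, mask)
```

Interleaving such a layer between recurrent blocks is what ties the network's intermediate outputs back to the physics of the acquisition.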

Software Effort Estimation using parameter tuned Models

Akanksha Baghel, Meemansa Rathod, Pradeep Singh

Comments: Nine tables

Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)

Software estimation is one of the most important activities in a software project, and effort estimation is required in the early stages of the software life cycle. Project failure is a major problem facing software project managers, and imprecise estimation is a key reason for it. As software size grows, the system becomes more complex, making it difficult to accurately predict the cost of the software development process. The greatest pitfall of the software industry has been the fast-changing nature of software development, which has made it difficult to develop parametric models that yield high accuracy for software development across all domains. We need the development of useful models that accurately predict the cost of developing a software product. This study presents a novel analysis of various regression models with hyperparameter tuning to obtain an effective model. Nine different regression techniques are considered for model development.
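
The study's setup, tuning each regressor's hyperparameters and comparing the results, can be sketched with standard cross-validated grid search; the feature matrix, grids, and the two models shown are illustrative assumptions (the paper considers nine techniques).

```python
# Sketch of hyperparameter-tuned effort-estimation models: each candidate
# regressor is tuned with an exhaustive grid search under cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X = np.random.rand(60, 5)              # project features (stub)
y = np.random.rand(60) * 100           # effort in person-months (stub)

candidates = {
    "ridge": (Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [3, 5, None]}),
}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_absolute_error")
    search.fit(X, y)
    print(name, search.best_params_, -search.best_score_)
```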

Heightmap Reconstruction of Macula on Color Fundus Images Using Conditional Generative Adversarial Networks

Peyman Tahghighi , Reza A.Zoroofi , Sareh Saffi , Alireza Ramezani Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)

For medical diagnosis based on retinal images, a clear understanding of 3D

structure is often required but due to the 2D nature of images captured, we

cannot infer that information. However, by utilizing 3D reconstruction methods,

we can construct the 3D structure of the macula area on fundus images which can

be helpful for diagnosis and screening of macular disorders. Recent approaches

have used shading information for 3D reconstruction or heightmap prediction but

their output was not accurate since they ignored the dependency between nearby

pixels. Additionally, other methods were dependent on the availability of more

than one image of the eye which is not available in practice. In this paper, we

use conditional generative adversarial networks (cGANs) to generate images that

contain height information of the macula area on a fundus image. Results using

our dataset show a 0.6077 improvement in the Structural Similarity Index
(SSIM) and a 0.071 improvement in the Mean Squared Error (MSE) metric over the
Shape from Shading (SFS) method. Qualitative studies also indicate that our
method outperforms recent approaches.
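
A minimal sketch of a cGAN objective for this kind of image-to-heightmap task,
assuming pix2pix-style training with an added L1 term; `G` and `D` are
hypothetical modules, and nothing here is taken from the paper's
implementation.

```python
# A hedged sketch: G maps a fundus image to a heightmap, D judges
# (image, heightmap) pairs; an L1 term encourages per-pixel fidelity.
import torch
import torch.nn.functional as F

def cgan_losses(G, D, fundus, height_gt, l1_weight=100.0):
    height_fake = G(fundus)
    d_real = D(fundus, height_gt)
    d_fake = D(fundus, height_fake.detach())   # detach: D step only
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_fake_g = D(fundus, height_fake)          # G step sees gradients
    loss_G = (F.binary_cross_entropy_with_logits(d_fake_g, torch.ones_like(d_fake_g))
              + l1_weight * F.l1_loss(height_fake, height_gt))
    return loss_G, loss_D
```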

Multimodal brain tumor classification

Marvin Lerousseau , Eric Deutsh , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cancer is a complex disease that provides various types of information

depending on the scale of observation. While most tumor diagnostics are

performed by observing histopathological slides, radiology images can yield
additional knowledge that improves the efficacy of cancer diagnostics. This work

investigates a deep learning method combining whole slide images and magnetic

resonance images to classify tumors. Experiments are prospectively conducted on

the 2020 Computational Precision Medicine challenge, in a 3-class unbalanced
classification task. We report cross-validation (resp. validation)
balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and
0.94). The complete code of the method is open-source at XXXX. It includes
histopathological data pre-processing and can therefore be used off-the-shelf
for other histopathological and/or radiological classification tasks.
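
The three reported metrics are standard and can be reproduced with
scikit-learn; a small sketch, assuming integer class labels and, as an
illustrative choice, micro-averaged F1 for the unbalanced 3-class task.

```python
# Balanced accuracy, Cohen's kappa, and F1 with scikit-learn; the F1
# averaging mode is an assumption, not taken from the paper.
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score, f1_score

def report(y_true, y_pred):
    return {
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="micro"),
    }
```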

Detection-Aware Trajectory Generation for a Drone Cinematographer

Boseong Felipe Jeon , Dongseok Shim , H. Jin Kim

Comments: 8 pages, IROS 2020 accepted

Subjects

:

Robotics (cs.RO)

; Computer Vision and Pattern Recognition (cs.CV)

This work investigates an efficient trajectory generation for chasing a

dynamic target, which incorporates the detectability objective. The proposed

method actively guides the motion of a cinematographer drone so that the color

of a target is well-distinguished against the colors of the background in the

view of the drone. For the objective, we define a measure of color

detectability given a chasing path. After computing a discrete path optimized

for the metric, we generate a dynamically feasible trajectory. The whole

pipeline can be updated on-the-fly to respond to the motion of the target. For

the efficient discrete path generation, we construct a directed acyclic graph

(DAG) for which a topological sorting can be determined analytically without

the depth-first search. The smooth path is obtained in quadratic programming

(QP) framework. We validate the enhanced performance of state-of-the-art object

detection and tracking algorithms when the camera drone executes the trajectory

obtained from the proposed method.
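
A sketch of the discrete-path step under the structure the abstract describes:
if nodes are (time step, candidate viewpoint) pairs, the topological order is
simply the time index, so the best path follows from one dynamic-programming
sweep without any depth-first search. `detectability` and `feasible` are
hypothetical problem-specific callables, and viewpoints are assumed hashable
with at least one feasible transition per step.

```python
# A hedged DP sketch over a time-layered DAG of candidate viewpoints.
import math

def best_chasing_path(viewpoints_per_step, detectability, feasible):
    T = len(viewpoints_per_step)
    score = [{v: -math.inf for v in layer} for layer in viewpoints_per_step]
    parent = [dict() for _ in range(T)]
    for v in viewpoints_per_step[0]:
        score[0][v] = detectability(0, v)
    for t in range(1, T):                   # layers are already in topo order
        for v in viewpoints_per_step[t]:
            for u in viewpoints_per_step[t - 1]:
                cand = score[t - 1][u] + detectability(t, v)
                if feasible(u, v) and cand > score[t][v]:
                    score[t][v] = cand
                    parent[t][v] = u
    v = max(score[-1], key=score[-1].get)   # best terminal viewpoint
    path = [v]
    for t in range(T - 1, 0, -1):           # backtrack through parents
        v = parent[t][v]
        path.append(v)
    return list(reversed(path))
```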

Fundus Image Analysis for Age Related Macular Degeneration: ADAM-2020 Challenge Report

Sharath M Shankaranarayana Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)

Age related macular degeneration (AMD) is one of the major causes for

blindness in the elderly population. In this report, we propose deep learning

based methods for retinal analysis using color fundus images for computer aided

diagnosis of AMD. We leverage the recent state of the art deep networks for

building a single fundus image based AMD classification pipeline. We also

propose methods for the other directly relevant and auxiliary tasks such as

lesions detection and segmentation, fovea detection and optic disc

segmentation. We propose the use of generative adversarial networks (GANs) for

the tasks of segmentation and detection. We also propose a novel method of

fovea detection using GANs.

TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data

Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Multidimensional Projection is a fundamental tool for high-dimensional data

analytics and visualization. With very few exceptions, projection techniques

are designed to map data from a high-dimensional space to a visual space so as

to preserve some dissimilarity (similarity) measure, such as the Euclidean
distance. In fact, although adopting distinct mathematical

formulations designed to favor different aspects of the data, most

multidimensional projection methods strive to preserve dissimilarity measures

that encapsulate geometric properties such as distances or the proximity

relation between data objects. However, geometric relations are not the only

interesting property to be preserved in a projection. For instance, the

analysis of particular structures such as clusters and outliers could be more

reliably performed if the mapping process gives some guarantee as to

topological invariants such as connected components and loops. This paper

introduces TopoMap, a novel projection technique which provides topological

guarantees during the mapping process. In particular, the proposed method

performs the mapping from a high-dimensional space to a visual space, while

preserving the 0-dimensional persistence diagram of the Rips filtration of the

high-dimensional data, ensuring that the filtrations generate the same

connected components when applied to the original as well as projected data.

The presented case studies show that the topological guarantee provided by

TopoMap not only brings confidence to the visual analytic process but also can

be used to assist in the assessment of other projection methods.
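
The 0-dimensional persistence of a Rips filtration is carried by the Euclidean
minimum spanning tree: connected components merge exactly at MST edge lengths.
Below is a small SciPy sketch of that invariant (the quantity TopoMap
preserves, not the TopoMap algorithm itself), assuming distinct points.

```python
# Compute the component merge (death) scales of a point cloud via its MST;
# two embeddings preserve 0-dim persistence iff these spectra coincide.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def zero_dim_death_times(points):
    dist = squareform(pdist(points))        # dense pairwise distances
    mst = minimum_spanning_tree(dist)       # sparse MST of the graph
    return np.sort(mst.data)                # n-1 edge lengths = merge scales

# e.g. np.allclose(zero_dim_death_times(X_high), zero_dim_death_times(X_2d))
```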

TAP-Net: Transport-and-Pack using Reinforcement Learning

Ruizhen Hu , Juzhan Xu , Bin Chen , Minglun Gong , Hao Zhang , Hui Huang

Journal-ref: ACM Transactions on Graphics 2020

Subjects

:

Graphics (cs.GR)

; Computer Vision and Pattern Recognition (cs.CV)

We introduce the transport-and-pack (TAP) problem, a frequently encountered

instance of real-world packing, and develop a neural optimization solution

based on reinforcement learning. Given an initial spatial configuration of

boxes, we seek an efficient method to iteratively transport and pack the boxes

compactly into a target container. Due to obstruction and accessibility

constraints, our problem has to add a new search dimension, i.e., finding an

optimal transport sequence, to the already immense search space for packing

alone. Using a learning-based approach, a trained network can learn and encode

solution patterns to guide the solution of new problem instances instead of

executing an expensive online search. In our work, we represent the transport

constraints using a precedence graph and train a neural network, coined

TAP-Net, using reinforcement learning to reward efficient and stable packing.

The network is built on an encoder-decoder architecture, where the encoder

employs convolution layers to encode the box geometry and precedence graph and

the decoder is a recurrent neural network (RNN) which inputs the current

encoder output, as well as the current box packing state of the target

container, and outputs the next box to pack, as well as its orientation. We

train our network on randomly generated initial box configurations, without

supervision, via policy gradients to learn optimal TAP policies to maximize

packing efficiency and stability. We demonstrate the performance of TAP-Net on

a variety of examples, evaluating the network through ablation studies and

comparisons to baselines and alternative network designs. We also show that our

network generalizes well to larger problem instances, when trained on

small-sized inputs.
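
A tiny sketch of the transport-constraint side of the problem as we read it:
given a precedence graph recording which boxes block which, only boxes whose
blockers have already been removed are feasible at each step; the trained
policy would then choose among this masked set. The representation below is an
assumption for illustration.

```python
# Mask infeasible boxes using a precedence graph (box -> set of blockers).
def feasible_boxes(precedence, packed):
    return {b for b, blockers in precedence.items()
            if b not in packed and blockers <= packed}

# example: box 2 lies on top of boxes 0 and 1, so it must move first
precedence = {0: {2}, 1: {2}, 2: set()}
assert feasible_boxes(precedence, packed=set()) == {2}
assert feasible_boxes(precedence, packed={2}) == {0, 1}
```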

Dexterous Robotic Grasping with Object-Centric Visual Affordances

Priyanka Mandikal , Kristen Grauman Subjects : Robotics (cs.RO) ; Computer Vision and Pattern Recognition (cs.CV)

Dexterous robotic hands are appealing for their agility and human-like

morphology, yet their high degree of freedom makes learning to manipulate

challenging. We introduce an approach for learning dexterous grasping. Our key

idea is to embed an object-centric visual affordance model within a deep

reinforcement learning loop to learn grasping policies that favor the same

object regions favored by people. Unlike traditional approaches that learn from

human demonstration trajectories (e.g., hand joint sequences captured with a

glove), the proposed prior is object-centric and image-based, allowing the

agent to anticipate useful affordance regions for objects unseen during policy

learning. We demonstrate our idea with a 30-DoF five-fingered robotic hand

simulator on 40 objects from two datasets, where it successfully and

efficiently learns policies for stable grasps. Our affordance-guided policies

are significantly more effective, generalize better to novel objects, and train

3x faster than the baselines. Our work offers a step towards manipulation

agents that learn by watching how people use objects, without requiring state

and action information about the human body. Project website:

this http URL

Real Image Super Resolution Via Heterogeneous Model using GP-NAS

Zhihong Pan , Baopu Li , Teng Xi , Yanwen Fan , Gang Zhang , Jingtuo Liu , Junyu Han , Errui Ding

Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

With advancements in deep neural networks (DNNs), recent state-of-the-art (SOTA)
image super-resolution (SR) methods have achieved impressive performance using

deep residual network with dense skip connections. While these models perform

well on benchmark dataset where low-resolution (LR) images are constructed from

high-resolution (HR) references with known blur kernel, real image SR is more

challenging when both images in the LR-HR pair are collected from real cameras.

Based on existing dense residual networks, a Gaussian process based neural

architecture search (GP-NAS) scheme is utilized to find candidate network

architectures using a large search space by varying the number of dense

residual blocks, the block size and the number of features. A suite of

heterogeneous models with diverse network structures and hyperparameters is

selected for model-ensemble to achieve outstanding performance in real image

SR. The proposed method won the first place in all three tracks of the AIM 2020

Real Image Super-Resolution Challenge.

An Internal Cluster Validity Index Based on Distance-based Separability Measure

Shuyue Guan , Murray Loew

Comments: 8 pages, 4 figures. Accepted by ICTAI 2020

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Evaluating clustering results is a significant part of cluster analysis.
Because clustering is a typical unsupervised learning task, there are usually
no true class labels, so a number of internal evaluations that use only the
predicted labels and the data have been created; these are also called
internal cluster validity indices (CVIs). Without true labels, designing an
effective CVI is not simple, since it is akin to creating a clustering method.
Having more CVIs is crucial because there is no universal CVI that can measure
all datasets and no specific method for selecting a proper CVI for clusters
without true labels; applying multiple CVIs to evaluate clustering results is
therefore necessary. In this paper, we propose a novel CVI, called the
Distance-based Separability Index (DSI), based on a data separability measure.
For comparison, we applied the DSI and eight other internal CVIs, ranging from
early studies such as Dunn (1974) to the most recent CVDD (2019). We used an
external CVI as ground truth for the clustering results of five clustering
algorithms on 12 real and 97 synthetic datasets. Results show that DSI is an
effective, unique, and competitive CVI compared with the others. In addition,
we summarize the general process for evaluating CVIs and introduce a new
method, rank difference, for comparing the results of CVIs.
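
One plausible instantiation of a distance-based separability score in the
spirit of DSI: compare, per cluster, the distribution of intra-cluster
distances with the distances from that cluster to all other points. The use of
the Kolmogorov-Smirnov statistic here is an illustrative assumption, not
necessarily the paper's exact formulation.

```python
# A hedged separability sketch: 1 = well separated, 0 = indistinguishable.
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import ks_2samp

def separability_index(X, labels):
    scores = []
    for c in np.unique(labels):
        inside, outside = X[labels == c], X[labels != c]
        if len(inside) < 2 or len(outside) == 0:
            continue
        icd = pdist(inside)                    # intra-cluster distances
        bcd = cdist(inside, outside).ravel()   # between-cluster distances
        scores.append(ks_2samp(icd, bcd).statistic)
    return float(np.mean(scores))
```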

When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method

Zixiang Zhao , Shuang Xu , Rui Feng , Chunxia Zhang , Junmin Liu , Jiangshe Zhang

Comments: arXiv admin note: substantial text overlap with arXiv:2003.09210

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV)

Infrared and visible image fusion, as a hot topic in image processing and

image enhancement, aims to produce fused images retaining the detail texture

information in visible images and the thermal radiation information in infrared

images. In this paper, we propose a novel two-stream auto-encoder (AE) based

fusion network. The core idea is that the encoder decomposes an image into base

and detail feature maps with low- and high-frequency information, respectively,

and that the decoder is responsible for the original image reconstruction. To

this end, a well-designed loss function is established to make the base/detail

feature maps similar/dissimilar. In the test phase, base and detail feature

maps are respectively merged via a fusion module, and the fused image is

recovered by the decoder. Qualitative and quantitative results demonstrate that

our method can generate fusion images containing highlighted targets and

abundant detail texture information with strong reproducibility, and is
meanwhile superior to state-of-the-art (SOTA) approaches.

Artificial Intelligence

SEDRo: A Simulated Environment for Developmental Robotics

Aishwarya Pothula , Md Ashaduzzaman Rubel Mondol , Sanath Narasimhan , Sm Mazharul Islam , Deokgun Park Subjects : Artificial Intelligence (cs.AI)

Even with impressive advances in application-specific models, we still lack

knowledge about how to build a model that can learn in a human-like way and do

multiple tasks. To learn in a human-like way, we need to provide a diverse

experience that is comparable to humans. In this paper, we introduce our

ongoing effort to build a simulated environment for developmental robotics

(SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus

to a 12-month-old. A series of simulated tests based on developmental

psychology will be used to evaluate the progress of a learning model. We

anticipate SEDRo to lower the cost of entry and facilitate research in the

developmental robotics community.

Action and Perception as Divergence Minimization

Danijar Hafner , Pedro A. Ortega , Jimmy Ba , Thomas Parr , Karl Friston , Nicolas Heess

Comments: 13 pages, 10 figures

Subjects

:

Artificial Intelligence (cs.AI)

; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a unified objective for action and perception of intelligent

agents. Extending representation learning and control, we minimize the joint

divergence between the world and a target distribution. Intuitively, such

agents use perception to align their beliefs with the world, and use actions to

align the world with their beliefs. Minimizing the joint divergence to an

expressive target maximizes the mutual information between the agent’s

representations and inputs, thus inferring representations that are informative

of past inputs and exploring future inputs that are informative of the

representations. This lets us derive intrinsic objectives, such as

representation learning, information gain, empowerment, and skill discovery

from minimal assumptions. Moreover, interpreting the target distribution as a

latent variable model suggests expressive world models as a path toward highly

adaptive agents that seek large niches in their environments, while rendering

task rewards optional. The presented framework provides a common language for

comparing a wide range of objectives, facilitates understanding of latent

variables for decision making, and offers a recipe for designing novel

objectives. We recommend deriving future agent objectives from the joint

divergence to facilitate comparison, to point out the agent’s target

distribution, and to identify the intrinsic objective terms needed to reach

that distribution.
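
In symbols, the unified objective can be sketched as a joint divergence;
writing x for inputs, z for representations, p_pi for the distribution induced
by the agent's perception and actions, and tau for the target distribution
(our schematic paraphrase, not the paper's exact notation):

```latex
% schematic unified objective: the agent minimizes the joint KL divergence
% between its induced distribution and a fixed target distribution
\min_{\pi} \; \mathrm{KL}\!\left( p_{\pi}(x_{1:T}, z_{1:T}) \;\|\; \tau(x_{1:T}, z_{1:T}) \right)
```

Read this way, perception reduces the divergence by adapting the belief term
p_pi(z | x), while action reduces it by steering the input distribution
p_pi(x) toward the target.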

Grounded Language Learning Fast and Slow

Felix Hill , Olivier Tieleman , Tamara von Glehn , Nathaniel Wong , Hamza Merzic , Stephen Clark Subjects : Artificial Intelligence (cs.AI)

Recent work has shown that large text-based neural language models, trained

with conventional supervised learning objectives, acquire a surprising

propensity for few- and one-shot learning. Here, we show that an embodied agent

situated in a simulated 3D world, and endowed with a novel dual-coding external

memory, can exhibit similar one-shot word learning when trained with

conventional reinforcement learning algorithms. After a single introduction to

a novel object via continuous visual perception and a language prompt (“This is

a dax”), the agent can re-identify the object and manipulate it as instructed

(“Put the dax on the bed”). In doing so, it seamlessly integrates short-term,

within-episode knowledge of the appropriate referent for the word “dax” with

long-term lexical and motor knowledge acquired across episodes (i.e. “bed” and

“putting”). We find that, under certain training conditions and with a

particular memory writing mechanism, the agent’s one-shot word-object binding

generalizes to novel exemplars within the same ShapeNet category, and is

effective in settings with unfamiliar numbers of objects. We further show how

dual-coding memory can be exploited as a signal for intrinsic motivation,

stimulating the agent to seek names for objects that may be useful for later

executing instructions. Together, the results demonstrate that deep neural

networks can exploit meta-learning, episodic memory and an explicitly

multi-modal environment to account for ‘fast-mapping’, a fundamental pillar of

human cognitive development and a potentially transformative capacity for

agents that interact with human users.

On Population-Based Algorithms for Distributed Constraint Optimization Problems

Saaduddin Mahmud , Md. Mosaddek Khan , Nicholas R. Jennings

Comments: 7 Figures. arXiv admin note: text overlap with arXiv:1909.06254 , arXiv:2002.12001

Subjects

:

Artificial Intelligence (cs.AI)

; Multiagent Systems (cs.MA)

Distributed Constraint Optimization Problems (DCOPs) are a widely studied

class of optimization problems in which the interactions between a set of
cooperative agents are modeled as a set of constraints. DCOPs are NP-hard and

significant effort has been devoted to developing methods for finding

incomplete solutions. In this paper, we study an emerging class of such

incomplete algorithms that are broadly termed as population-based algorithms.

The main characteristic of these algorithms is that they maintain a population

of candidate solutions of a given problem and use this population to cover a

large area of the search space and to avoid local-optima. In recent years, this

class of algorithms has gained significant attention due to their ability to

produce high-quality incomplete solutions. With the primary goal of further

improving the quality of solutions compared to the state-of-the-art incomplete

DCOP algorithms, we present two new population-based algorithms in this paper.

Our first approach, Anytime Evolutionary DCOP or AED, exploits evolutionary

optimization meta-heuristics to solve DCOPs. We also present a novel anytime

update mechanism that gives AED its anytime property. In our second
contribution, we show that population-based approaches can be combined with

local search approaches. Specifically, we develop an algorithm called DPSA

based on the Simulated Annealing meta-heuristic. We empirically evaluate these

two algorithms to illustrate their respective effectiveness in different

settings against the state-of-the-art incomplete DCOP algorithms including all

existing population-based algorithms in a wide variety of benchmarks. Our

evaluation shows AED and DPSA markedly outperform the state-of-the-art and

produce up to 75% improved solutions.

Derived metrics for the game of Go — intrinsic network strength assessment and cheat-detection

Attila Egri-Nagy , Antti Törmänen

Comments: 16 pages, 12 figures, final version will be published elsewhere

Subjects

:

Artificial Intelligence (cs.AI)

The widespread availability of superhuman AI engines is changing how we play

the ancient game of Go. The open-source software packages developed after the

AlphaGo series shifted focus from producing strong playing entities to

providing tools for analyzing games. Here we describe two ways of how the

innovations of the second generation engines (e.g.~score estimates, variable

komi) can be used for defining new metrics that help deepen our understanding

of the game. First, we study how much information the search component

contributes in addition to the raw neural network policy output. This gives an

intrinsic strength measurement for the neural network. Second, we define the

effect of a move by the difference in score estimates. This gives a

fine-grained, move-by-move performance evaluation of a player. We use this in

combating the new challenge of detecting online cheating.
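
Concretely, the move-effect metric reduces to differencing consecutive engine
score estimates; a minimal sketch, with `score_estimates` assumed to come from
a second-generation engine queried once per position.

```python
# Per-move effect = signed change in the score estimate from the mover's
# perspective; an even game with Black moving first is assumed.
def move_effects(score_estimates):
    """score_estimates: engine score estimates (Black's viewpoint),
    one per position, starting before move 1."""
    effects = []
    for t in range(1, len(score_estimates)):
        delta = score_estimates[t] - score_estimates[t - 1]
        black_to_move = (t % 2 == 1)          # odd moves are Black's
        effects.append(delta if black_to_move else -delta)
    return effects  # consistently near-zero losses may indicate engine play
```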

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a framework that allows one to certify the degree of fairness of a model

based on an interactive and privacy-preserving test. The framework verifies any

trained model, regardless of its training process and architecture. Thus, it

allows us to evaluate any deep learning model on multiple fairness definitions

empirically. We tackle two scenarios, where either the test data is privately

available only to the tester or is publicly known in advance, even to the model

creator. We investigate the soundness of the proposed approach using

theoretical analysis and present statistical guarantees for the interactive

test. Finally, we provide a cryptographic technique to automate fairness

testing and certified inference with only black-box access to the model at hand

while hiding the participants’ sensitive data.

User Intention Recognition and Requirement Elicitation Method for Conversational AI Services

Junrui Tian , Zhiying Tu , Zhongjie Wang , Xiaofei Xu , Min Liu

Comments: accepted as a full paper at IEEE ICWS 2020

Subjects

:

Artificial Intelligence (cs.AI)

In recent years, the chat-bot has become a new type of intelligent terminal
that guides users to consume services. However, it is most criticized for
providing services that are not what users expect. This defect is mostly due
to two problems: the incompleteness and uncertainty of users' requirement
expressions caused by information asymmetry, and the diversity of service
resources, which makes service selection difficult. A conversational bot is a
typical mesh device, so guided multi-round Q&A is the most effective way to
elicit user requirements. Obviously, complex Q&A with too many rounds is
tedious and leads to a bad user experience. Therefore, we aim to obtain user
requirements as

accurately as possible in as few rounds as possible. To achieve this, a user

intention recognition method based on Knowledge Graph (KG) was developed for

fuzzy requirement inference, and a requirement elicitation method based on

Granular Computing was proposed for dialog policy generation. Experimental

results show that these two methods can effectively reduce the number of

conversation rounds, and can quickly and accurately identify the user

intention.

Learning to Infer User Hidden States for Online Sequential Advertising

Zhaoqing Peng , Junqi Jin , Lan Luo , Yaodong Yang , Rui Luo , Jun Wang , Weinan Zhang , Haiyang Xu , Miao Xu , Chuan Yu , Tiejian Luo , Han Li , Jian Xu , Kun Gai

Comments: to be published in CIKM 2020

Subjects

:

Artificial Intelligence (cs.AI)

To drive purchase in online advertising, it is of the advertiser’s great

interest to optimize the sequential advertising strategy whose performance and

interpretability are both important. The lack of interpretability in existing

deep reinforcement learning methods makes it not easy to understand, diagnose

and further optimize the strategy. In this paper, we propose our Deep Intents

Sequential Advertising (DISA) method to address these issues. The key part of

interpretability is to understand a consumer’s purchase intent which is,

however, unobservable (called hidden states). In this paper, we model this

intention as a latent variable and formulate the problem as a Partially

Observable Markov Decision Process (POMDP) where the underlying intents are

inferred based on the observable behaviors. Large-scale industrial offline and

online experiments demonstrate our method’s superior performance over several

baselines. The inferred hidden states are analyzed, and the results prove the

rationality of our inference.

FairXGBoost: Fairness-aware Classification in XGBoost

Srinivasan Ravichandran , Drona Khurana , Bharath Venkatesh , Narayanan Unny Edakunni Subjects : Artificial Intelligence (cs.AI)

Highly regulated domains such as finance have long favoured the use of

machine learning algorithms that are scalable, transparent, robust and yield

better performance. One of the most prominent examples of such an algorithm is

XGBoost. Meanwhile, there is also a growing interest in building fair and

unbiased models in these regulated domains and numerous bias-mitigation

algorithms have been proposed to this end. However, most of these

bias-mitigation methods are restricted to specific model families such as

logistic regression or support vector machine models, thus leaving modelers

with a difficult decision of choosing between fairness from the bias-mitigation

algorithms and scalability, transparency, performance from algorithms such as

XGBoost. We aim to leverage the best of both worlds by proposing a fair variant

of XGBoost that enjoys all the advantages of XGBoost, while also matching the

levels of fairness from the state-of-the-art bias-mitigation algorithms.

Furthermore, the proposed solution requires very little in terms of changes to

the original XGBoost library, thus making it easy to adopt. We provide an

empirical analysis of our proposed method on standard benchmark datasets used

in the fairness community.
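
One way such a variant can stay close to the stock library is XGBoost's
custom-objective hook; a hedged sketch in which a group-gap penalty is added
to the logistic objective. The specific penalty, pushing group-wise mean
scores together, is an illustrative choice, not necessarily the paper's
regularizer.

```python
# A sketch of a fairness-regularized logistic objective for xgb.train(...).
# `sensitive` is a 0/1 group indicator aligned with the training rows.
import numpy as np
import xgboost as xgb

def make_fair_logistic_obj(sensitive, mu=1.0):
    s = sensitive.astype(float)
    def obj(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))
        grad = p - y                       # standard logistic gradient
        hess = p * (1.0 - p)
        # penalty: (mean score of group 1 - mean score of group 0)^2
        gap = p[s == 1].mean() - p[s == 0].mean()
        sign = np.where(s == 1, 1.0 / max((s == 1).sum(), 1),
                        -1.0 / max((s == 0).sum(), 1))
        grad += mu * 2.0 * gap * sign * hess  # chain rule through the sigmoid
        return grad, hess                  # penalty curvature ignored here
    return obj

# usage: bst = xgb.train(params, dtrain, obj=make_fair_logistic_obj(s))
```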

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Tsendsuren Munkhdalai

Comments: 9 pages, 4 figures, 2 tables

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Training a deep neural network requires a large amount of single-task data

and involves a long time-consuming optimization phase. This is not scalable to

complex, realistic environments with new unexpected changes. Humans can perform

fast incremental learning on the fly and memory systems in the brain play a

critical role. We introduce Sparse Meta Networks — a meta-learning approach to

learn online sequential adaptation algorithms for deep neural networks, by

using deep neural networks. We augment a deep neural network with a

layer-specific fast-weight memory. The fast-weights are generated sparsely at

each time step and accumulated incrementally through time providing a useful

inductive bias for online continual adaptation. We demonstrate strong

performance on a variety of sequential adaptation scenarios, from a simple

online reinforcement learning to a large scale adaptive language modelling.

Ramifications of Approximate Posterior Inference for Bayesian Deep Learning in Adversarial and Out-of-Distribution Settings

John Mitros , Arjun Pakrashi , Brian Mac Namee

Comments: ARRW@ECCV2020

Subjects

:

Machine Learning (stat.ML)

; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Deep neural networks have been successful in diverse discriminative

classification tasks, although they are often poorly calibrated, assigning
high probability to misclassified predictions. This can undermine the
trustworthiness and accountability of the models when deployed in real
applications, where predictions are evaluated based on their confidence
scores. Existing solutions suggest the benefits attained by combining deep
neural networks and Bayesian inference to quantify uncertainty over the
models' predictions for ambiguous datapoints. In this work we propose to
validate and test the efficacy of likelihood-based models in the task of
out-of-distribution (OoD) detection. Across different datasets and metrics we
show that Bayesian

deep learning models on certain occasions marginally outperform conventional

neural networks and in the event of minimal overlap between in/out distribution

classes, even the best models exhibit a reduction in AUC scores in detecting

OoD data. Preliminary investigations indicate the potential inherent role of

bias due to choices of initialisation, architecture or activation functions. We

hypothesise that the sensitivity of neural networks to unseen inputs could be a

multi-factor phenomenon arising from the different architectural design choices

often amplified by the curse of dimensionality. Furthermore, we perform a study

to find the effect of the adversarial noise resistance methods on in and

out-of-distribution performance, as well as, also investigate adversarial noise

robustness of Bayesian deep learners.

HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings

Wolfgang Fischl , Georg Gottlob , Davide Mario Longo , Reinhard Pichler

Comments: arXiv admin note: substantial text overlap with arXiv:1811.08181

Subjects

:

Databases (cs.DB)

; Artificial Intelligence (cs.AI)

To cope with the intractability of answering Conjunctive Queries (CQs) and

solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph

decompositions have been proposed — giving rise to different notions of width,

noticeably, plain, generalized, and fractional hypertree width (hw, ghw, and

fhw). Given the increasing interest in using such decomposition methods in

practice, a publicly accessible repository of decomposition software, as well

as a large set of benchmarks, and a web-accessible workbench for inserting,

analyzing, and retrieving hypergraphs are called for.

We address this need by providing (i) concrete implementations of hypergraph

decompositions (including new practical algorithms), (ii) a new, comprehensive

benchmark of hypergraphs stemming from disparate CQ and CSP collections, and

(iii) HyperBench, our new web interface for accessing the benchmark and the

results of our analyses. In addition, we describe a number of actual

experiments we carried out with this new infrastructure.

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

Weijia Wu , Ning Lu , Enze Xie Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)

Deep learning-based scene text detection can achieve preferable performance,

powered with sufficient labeled training data. However, manual labeling is
time-consuming and laborious. At the extreme, the corresponding annotated data are

unavailable. Exploiting synthetic data is a very promising solution except for

domain distribution mismatches between synthetic datasets and real datasets. To

address the severe domain distribution mismatch, we propose a synthetic-to-real

domain adaptation method for scene text detection, which transfers knowledge

from synthetic data (source domain) to real data (target domain). In this

paper, a text self-training (TST) method and adversarial text instance

alignment (ATA) for domain adaptive scene text detection are introduced. ATA

helps the network learn domain-invariant features by training a domain

classifier in an adversarial manner. TST diminishes the adverse effects of

false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels.

Two components have positive effects on improving the performance of scene text

detectors when adapting from synthetic-to-real scenes. We evaluate the proposed

method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The

results demonstrate the effectiveness of the proposed method with up to 10%

improvement, which is of significant exploratory value for domain adaptive

scene text detection. Code is available at

this https URL

Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints

Syrine Belakaria , Aryan Deshwal , Janardhan Rao Doppa

Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029

Subjects

:

Machine Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We consider the problem of constrained multi-objective blackbox optimization

using expensive function evaluations, where the goal is to approximate the true

Pareto set of solutions satisfying a set of constraints while minimizing the

number of function evaluations. For example, in aviation power system design

applications, we need to find the designs that trade-off total energy and the

mass while satisfying specific thresholds for motor temperature and voltage of

cells. This optimization requires performing expensive computational

simulations to evaluate designs. In this paper, we propose a new approach

referred to as Max-value Entropy Search for Multi-objective Optimization with
Constraints (MESMOC) to solve this problem. MESMOC employs an output-space

entropy based acquisition function to efficiently select the sequence of inputs

for evaluation to uncover high-quality Pareto-set solutions while satisfying

constraints.

We apply MESMOC to two real-world engineering design applications to

demonstrate its effectiveness over state-of-the-art algorithms.

Multi-Loss Weighting with Coefficient of Variations

Rick Groenendijk , Sezer Karaoglu , Theo Gevers , Thomas Mensink Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)

Many interesting tasks in machine learning and computer vision are learned by

optimising an objective function defined as a weighted linear combination of

multiple losses. The final performance is sensitive to choosing the correct

(relative) weights for these losses. Finding a good set of weights is often

done by adopting them into the set of hyper-parameters, which are set using an

extensive grid search. This is computationally expensive. In this paper, the

weights are defined based on properties observed while training the model,

including the specific batch loss, the average loss, and the variance for each

of the losses. An additional advantage is that the defined weights evolve

during training, instead of using static loss weights. In literature, loss

weighting is mostly used in a multi-task learning setting, where the different

tasks obtain different weights. However, there is a plethora of single-task

multi-loss problems that can benefit from automatic loss weighting. In this

paper, it is shown that these multi-task approaches do not work on single

tasks. Instead, a method is proposed that automatically and dynamically tunes

loss weights throughout training specifically for single-task multi-loss

problems. The method incorporates a measure of uncertainty to balance the

losses. The validity of the approach is shown empirically for different tasks

on multiple datasets.
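
A sketch of the weighting rule the title suggests: each loss gets a weight
proportional to its coefficient of variation (running standard deviation over
running mean), so weights evolve with training. The Welford-style running
statistics and the normalization below are implementation choices of this
sketch, not the paper's exact scheme.

```python
# Coefficient-of-variation loss weighting; weights only become meaningful
# after a few steps, since the running statistics start at zero.
class CoVWeighter:
    def __init__(self, n_losses, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * n_losses
        self.m2 = [0.0] * n_losses
        self.eps = eps

    def weights(self, losses):
        """losses: list of float batch-loss values; returns weights."""
        self.n += 1
        w = []
        for i, x in enumerate(losses):
            d = x - self.mean[i]                 # Welford running update
            self.mean[i] += d / self.n
            self.m2[i] += d * (x - self.mean[i])
            std = (self.m2[i] / self.n) ** 0.5
            w.append(std / (self.mean[i] + self.eps))  # coefficient of variation
        total = sum(w) + self.eps
        return [wi / total for wi in w]          # normalize to sum to 1

# total_loss = sum(w * l for w, l in zip(weighter.weights(vals), loss_terms))
```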

Quasi-symplectic Langevin Variational Autoencoder

Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The variational autoencoder (VAE) is one of the most thoroughly investigated
generative models and is very popular in current neural learning research.
Leveraging VAEs in practical tasks with high dimensionality and huge datasets
often faces the problem of constructing low-variance evidence lower bounds.
Markov chain Monte Carlo (MCMC) is an effective approach for tightening the
evidence lower bound (ELBO) when approximating the posterior distribution. The
Hamiltonian Variational Autoencoder (HVAE) is one of the effective
MCMC-inspired approaches for constructing an unbiased low-variance ELBO that
is also amenable to the reparameterization trick. This solution significantly
improves posterior estimation, yet a main drawback of HVAE is that the
leapfrog method needs to access the posterior gradient twice, which leads to
poor inference efficiency and a fairly large GPU memory requirement. This flaw
limits the application of Hamiltonian-based inference frameworks to
large-scale network inference. To tackle this problem, we propose a
Quasi-symplectic Langevin Variational Autoencoder (Langevin-VAE), which
significantly improves resource usage efficiency. We qualitatively and
quantitatively demonstrate the effectiveness of the Langevin-VAE compared to a
state-of-the-art gradient-informed inference framework.

Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO

Yindi Yang , Shun Zhang , Feifei Gao , Chao Xu , Jianpeng Ma , Octavia A. Dobre

Comments: 6 pages, 5 figures

Subjects

:

Signal Processing (eess.SP)

; Artificial Intelligence (cs.AI)

In massive multiple-input multiple-output (MIMO) systems, the large number of

antennas would bring a great challenge for the acquisition of the accurate

channel state information, especially in the frequency division duplex mode. To

overcome the bottleneck of the limited number of radio links in hybrid

beamforming, we utilize the neural networks (NNs) to capture the inherent

connection between the uplink and downlink channel data sets and extrapolate

the downlink channels from a subset of the uplink channel state information. We

study the antenna subset selection problem in order to achieve the best channel

extrapolation and decrease the data size of NNs. The probabilistic sampling

theory is utilized to approximate the discrete antenna selection as a

continuous and differentiable function, which makes backpropagation through
the network feasible. Then, we design a proper off-line training strategy

to optimize both the antenna selection pattern and the extrapolation NNs.

Finally, numerical results are presented to verify the effectiveness of our

proposed massive MIMO channel extrapolation algorithm.

Deep Learning Optimized Sparse Antenna Activation for Reconfigurable Intelligent Surface Assisted Communication

Shunbo Zhang , Shun Zhang , Feifei Gao , Jianpeng Ma , Octavia A. Dobre

Comments: 30 pages, 14 figures

Subjects

:

Signal Processing (eess.SP)

; Artificial Intelligence (cs.AI)

To capture the communications gain of the massive radiating elements with low

power cost, the conventional reconfigurable intelligent surface (RIS) usually

works in passive mode. However, due to the cascaded channel structure and the

lack of signal processing ability, it is difficult for RIS to obtain the

individual channel state information and optimize the beamforming vector. In

this paper, we add signal processing units for a few antennas at RIS to

partially acquire the channels. To solve the crucial active antenna selection

problem, we construct an active antenna selection network that utilizes the

probabilistic sampling theory to select the optimal locations of these active

antennas. With this active antenna selection network, we further design two

deep learning (DL) based schemes, i.e., the channel extrapolation scheme and

the beam searching scheme, to enable the RIS communication system. The former

utilizes the selection network and a convolutional neural network to

extrapolate the full channels from the partial channels received by the active

RIS antennas, while the latter adopts a fully-connected neural network to

achieve the direct mapping between the partial channels and the optimal

beamforming vector with maximal transmission rate. Simulation results are

provided to demonstrate the effectiveness of the designed DL-based schemes.

TRACE: Transform Aggregate and Compose Visiolinguistic Representations for Image Search with Text Feedback

Surgan Jandial , Ayush Chopra , Pinkesh Badjatiya , Pranit Chawla , Mausoom Sarkar , Balaji Krishnamurthy

Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

The ability to efficiently search for images over an indexed database is the

cornerstone of several user experiences. Incorporating user feedback through
multi-modal inputs provides flexible interaction that serves fine-grained
specificity in requirements. We specifically focus on text feedback, through

descriptive natural language queries. Given a reference image and textual user

feedback, our goal is to retrieve images that satisfy constraints specified by

both of these input modalities. The task is challenging as it requires

understanding the textual semantics from the text feedback and then applying

these changes to the visual representation. To address these challenges, we

propose a novel architecture TRACE which contains a hierarchical feature

aggregation module to learn the composite visio-linguistic representations.

TRACE achieves the SOTA performance on 3 benchmark datasets: FashionIQ, Shoes,

and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5%

respectively in R@K metric. Our extensive experiments and ablation studies show

that TRACE consistently outperforms the existing techniques by significant

margins both quantitatively and qualitatively.

Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks

Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Algorithms for training residual networks (ResNets) typically require forward

pass of data, followed by backpropagating of loss gradient to perform parameter

updates, which can take many hours or even days for networks with hundreds of

layers. Inspired by the penalty and augmented Lagrangian methods, a

layer-parallel training algorithm is proposed in this work to overcome the

scalability barrier caused by the serial nature of forward-backward propagation

in deep residual learning. Moreover, by viewing the supervised classification

task as a numerical discretization of the terminal control problem, we bridge

the concept of synthetic gradient for decoupling backpropagation with the

parareal method for solving differential equations, which not only offers a

novel perspective on the design of synthetic loss function but also performs

parameter updates with reduced storage overhead. Experiments on a preliminary

example demonstrate that the proposed algorithm achieves comparable or even

better testing accuracy than the full serial backpropagation approach, while
the enabled layer-parallelism can provide speedup over traditional
layer-serial training methods.

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

The prevailing framework for solving referring expression grounding is based

on a two-stage process: 1) detecting proposals with an object detector and 2)

grounding the referent to one of the proposals. Existing two-stage solutions

mostly focus on the grounding step, which aims to align the expressions with

the proposals. In this paper, we argue that these methods overlook an obvious

mismatch between the roles of proposals in the two stages: they generate

proposals solely based on the detection confidence (i.e., expression-agnostic),

hoping that the proposals contain all right instances in the expression (i.e.,

expression-aware). Due to this mismatch, current two-stage methods suffer from

a severe performance drop between detected and ground-truth proposals. To this

end, we propose Ref-NMS, which is the first method to yield expression-aware

proposals at the first stage. Ref-NMS regards all nouns in the expression as

critical objects, and introduces a lightweight module to predict a score for

aligning each box with a critical object. These scores can guide the

NMS operation to filter out the boxes irrelevant to the expression, increasing

the recall of critical objects, resulting in a significantly improved grounding

performance. Since Ref-NMS is agnostic to the grounding step, it can be easily

integrated into any state-of-the-art two-stage method. Extensive ablation

studies on several backbones, benchmarks, and tasks consistently demonstrate

the superiority of Ref-NMS.
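
A minimal sketch of the expression-aware filtering idea: fuse each box's
detection confidence with its predicted expression-relatedness score, then run
NMS on the fused score so that boxes relevant to the expression survive.
Multiplicative fusion is an illustrative assumption, not necessarily the
paper's exact combination rule.

```python
# Expression-aware NMS sketch using torchvision's standard NMS op.
import torch
from torchvision.ops import nms

def expression_aware_nms(boxes, det_scores, rel_scores, iou_thresh=0.5):
    """boxes: (N, 4) xyxy; det_scores, rel_scores: (N,) in [0, 1]."""
    fused = det_scores * rel_scores        # expression-aware ranking score
    keep = nms(boxes, fused, iou_thresh)   # suppress overlaps by fused score
    return boxes[keep], fused[keep]
```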

Computational prediction of RNA tertiary structures using machine learning methods

Bin Huang , Yuanyang Du , Shuai Zhang , Wenfei Li , Jun Wang , Jian Zhang

Comments: 20 pages, 2 figures. Chinese Physics B, Aug. 2020

Journal-ref: Chinese Physics B, Sept. 2020

Subjects

:

Biological Physics (physics.bio-ph)

; Artificial Intelligence (cs.AI)

RNAs play crucial and versatile roles in biological processes. Computational

prediction approaches can help to understand RNA structures and their

stabilizing factors, thus providing information on their functions, and

facilitating the design of new RNAs. Machine learning (ML) techniques have made

tremendous progress in many fields in the past few years. Although their usage

in protein-related fields has a long history, the use of ML methods in

predicting RNA tertiary structures is new and rare. Here, we review the recent

advances of using ML methods on RNA structure predictions and discuss the

advantages and limitation, the difficulties and potentials of these approaches

when applied in the field.

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

Lei Zhang , Zhenwei He , Yi Yang , Liang Wang , Xinbo Gao

Comments: To appear in IEEE TPAMI, 18 pages

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

The traditional object retrieval task aims to learn a discriminative feature

representation with intra-similarity and inter-dissimilarity, which supposes

that the objects in an image are manually or automatically pre-cropped exactly.

However, in many real-world searching scenarios (e.g., video surveillance), the

objects (e.g., persons, vehicles, etc.) are seldom accurately detected or

annotated. Therefore, object-level retrieval becomes intractable without

bounding-box annotation, which leads to a new but challenging topic, i.e.

image-level search. In this paper, to address the image search issue, we first

introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A

Siamese architecture and an on-line pairing strategy for similar and dissimilar

objects in the given images are designed. 2) A novel on-line pairing (OLP) loss

is introduced with a dynamic feature dictionary, which alleviates the

multi-task training stagnation problem, by automatically generating a number of

negative pairs to restrict the positives. 3) A hard example priority (HEP)

based softmax loss is proposed to improve the robustness of classification task

by selecting hard categories. With the philosophy of divide and conquer, we

further propose an improved I-Net, called DC-I-Net, which makes two new

contributions: 1) two modules are tailored to handle different tasks separately

in the integrated framework, such that the task specification is guaranteed. 2)

A class-center guided HEP loss (C2HEP) by exploiting the stored class centers

is proposed, such that the intra-similarity and inter-dissimilarity can be

captured for ultimate retrieval. Extensive experiments on famous image-level

search oriented benchmark datasets demonstrate that the proposed DC-I-Net

outperforms the state-of-the-art tasks-integrated and tasks-separated image

search models.

Learning to summarize from human feedback

Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As language models become more powerful, training and evaluation are

increasingly bottlenecked by the data and metrics used for a particular task.

For example, summarization models are often trained to predict human reference

summaries and evaluated using ROUGE, but both of these metrics are rough

proxies for what we really care about—summary quality. In this work, we show

that it is possible to significantly improve summary quality by training a

model to optimize for human preferences. We collect a large, high-quality

dataset of human comparisons between summaries, train a model to predict the

human-preferred summary, and use that model as a reward function to fine-tune a

summarization policy using reinforcement learning. We apply our method to a

version of the TL;DR dataset of Reddit posts and find that our models

significantly outperform both human reference summaries and much larger models

fine-tuned with supervised learning alone. Our models also transfer to CNN/DM

news articles, producing summaries nearly as good as the human reference

without any news-specific fine-tuning. We conduct extensive analyses to

understand our human feedback dataset and fine-tuned models. We establish that

our reward model generalizes to new datasets, and that optimizing our reward

model results in better summaries than optimizing ROUGE according to humans. We

hope the evidence from our paper motivates machine learning researchers to pay

closer attention to how their training loss affects the model behavior they

actually want.
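
The reward-model stage rests on a pairwise preference loss; a minimal sketch
under a Bradley-Terry model, where `reward_model` is a hypothetical module
mapping (post, summary) batches to scalar rewards.

```python
# Pairwise preference loss for a reward model: push the reward of the
# human-preferred summary above that of the rejected one.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, post, summary_preferred, summary_rejected):
    r_pos = reward_model(post, summary_preferred)   # (batch,) scalars
    r_neg = reward_model(post, summary_rejected)
    # -log sigmoid(r_pos - r_neg): probability the preferred summary wins
    return -F.logsigmoid(r_pos - r_neg).mean()
```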

Convolutional Speech Recognition with Pitch and Voice Quality Features

Guillermo Cámbara , Jordi Luque , Mireia Farrús

Comments: 5 pages

Subjects

:

Audio and Speech Processing (eess.AS)

; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

The effects of adding pitch and voice quality features such as jitter and

shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are

studied in this work. Pitch features have been previously used for improving

classical HMM and DNN baselines, while jitter and shimmer parameters have

proven to be useful for tasks like speaker or emotion recognition. To the best
of our knowledge, this is the first work combining such pitch and voice quality

features with modern convolutional architectures, showing improvements up to 2%

absolute WER points, for the publicly available Spanish Common Voice dataset.

Particularly, our work combines these features with mel-frequency spectral

coefficients (MFSCs) to train a convolutional architecture with Gated Linear

Units (Conv GLUs). Such models have been shown to yield small word error
rates, while being very suitable for parallel processing in online streaming
recognition use cases. We have added pitch and voice quality functionality to
Facebook's wav2letter speech recognition framework, and we provide the code
and recipes to the community to carry out further experiments.

Besides, to the best of our knowledge, our Spanish Common Voice recipe is the

first public Spanish recipe for wav2letter.
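
For reference, back-of-the-envelope versions of the two voice-quality features
named above, computed from detected glottal periods and per-period peak
amplitudes (extracting those from audio is outside this sketch).

```python
# Standard "local" jitter/shimmer definitions as relative mean absolute
# differences of consecutive periods/amplitudes.
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive periods / mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes / mean."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```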

Efficiency in Real-time Webcam Gaze Tracking

Amogh Gudi , Xin Li , Jan van Gemert

Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Efficiency and ease of use are essential for practical applications of camera

based eye/gaze-tracking. Gaze tracking involves estimating where a person is

looking on a screen based on face images from a computer-facing camera. In this

paper we investigate two complementary forms of efficiency in gaze tracking: 1.

The computational efficiency of the system which is dominated by the inference

speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is

determined by the tediousness of the mandatory calibration of the gaze-vector

to a computer screen. To do so, we evaluate the computational speed/accuracy

trade-off for the CNN and the calibration effort/accuracy trade-off for screen

calibration. For the CNN, we evaluate the full face, two-eyes, and single eye

input. For screen calibration, we measure the number of calibration points

needed and evaluate three types of calibration: 1. pure geometry, 2. pure

machine learning, and 3. hybrid geometric regression. Results suggest that a

single eye input and geometric regression calibration achieve the best

trade-off.

Information Retrieval

Exploring Artist Gender Bias in Music Recommendation

Dougal Shakespeare , Lorenzo Porcaro , Emilia Gómez , Carlos Castillo

Comments: To be presented at 2nd Workshop on the Impact of Recommender Systems (ImpactRS), at the 14th ACM Conference on Recommender Systems (RecSys 2020)

Subjects

:

Information Retrieval (cs.IR)

Music Recommender Systems (mRS) are designed to give personalised and

meaningful recommendations of items (i.e. songs, playlists or artists) to a

user base, thereby reflecting and further complementing individual users’

specific music preferences. Whilst accuracy metrics have been widely applied to

evaluate recommendations in mRS literature, evaluating a user’s item utility

from other impact-oriented perspectives, including their potential for

discrimination, is still a novel evaluation practice in the music domain. In

this work, we center our attention on a specific phenomenon for which we want

to estimate whether mRS may exacerbate its impact: gender bias. Our work
presents an exploratory study, analyzing the extent to which commonly deployed
state-of-the-art Collaborative Filtering (CF) algorithms may act to further
increase or decrease artist gender bias. To assess group biases introduced by
CF, we deploy a recently proposed metric of bias disparity on two listening
event datasets: the LFM-1b dataset and the earlier-constructed Celma dataset.
Our work traces

the causes of disparity to variations in input gender distributions and

user-item preferences, highlighting the effect such configurations can have on

user’s gender bias after recommendation generation.

Comparing Fair Ranking Metrics

Amifa Raj , Connor Wood , Ananda Montoly , Michael D. Ekstrand Subjects : Information Retrieval (cs.IR)

Ranking is a fundamental aspect of recommender systems. However, ranked

outputs can be susceptible to various biases; some of these may cause

disadvantages to members of protected groups. Several metrics have been

proposed to quantify the (un)fairness of rankings, but there has not been to

date any direct comparison of these metrics. This complicates deciding what

fairness metrics are applicable for specific scenarios, and assessing the

extent to which metrics agree or disagree. In this paper, we describe several

fair ranking metrics in a common notation, enabling direct comparison of their

approaches and assumptions, and empirically compare them on the same

experimental setup and data set. Our work provides a direct comparative

analysis identifying similarities and differences of fair ranking metrics

selected for our work.
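To make the comparison concrete, many metrics in this family reduce to comparing some position-discounted notion of group exposure; a generic sketch (illustrative only, not one specific metric from the paper) is:

# Group exposure with a logarithmic position discount: items ranked higher
# contribute more exposure to their group.
import math

ranking_groups = ["A", "A", "B", "A", "B", "B"]  # group of the item at each rank

def group_exposure(groups, target):
    total = sum(1 / math.log2(rank + 2) for rank in range(len(groups)))
    share = sum(1 / math.log2(rank + 2)
                for rank, g in enumerate(groups) if g == target)
    return share / total

for g in ("A", "B"):
    print(g, round(group_exposure(ranking_groups, g), 3))

Metrics then differ mainly in what they compare this exposure against (population share, relevance share, a target distribution), which is one of the assumptions a common notation makes explicit.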

Computation and Language

A Python Library for Exploratory Data Analysis and Knowledge Discovery on Twitter Data

Mario Graff , Daniela Moctezuma , Sabino Miranda-Jiménez , Eric S. Tellez Subjects : Computation and Language (cs.CL)

Twitter is perhaps the social media platform most amenable to research. It requires

only a few steps to obtain information, and there are plenty of libraries that

can help in this regard. Nonetheless, knowing whether a particular event is

expressed on Twitter is a challenging task that requires a considerable

collection of tweets. This proposal aims to facilitate, for researchers interested in Twitter data, the process of mining events on Twitter. The events could be

related to natural disasters, health issues, people’s mobility, among other studies that can be pursued with the proposed library. Different applications

are presented in this contribution to illustrate the library’s capabilities,

starting from an exploratory analysis of the topics discovered in tweets,

following it by studying the similarity among dialects of the Spanish language,

and complementing it with a mobility report on different countries. In summary,

the Python library presented retrieves a plethora of information processed from

Twitter (since December 2015) in terms of words, bigrams of words, and their

frequencies by day for Arabic, English, Spanish, and Russian languages.

Finally, the mobility information considered is related to the number of

travels among locations for more than 245 countries or territories.
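To make the kind of output concrete, the snippet below computes day-level word and bigram frequencies from raw tweets using only the standard library; it illustrates the statistics the library exposes rather than the library's actual API (which we do not reproduce here):

# Illustrative only: day-level word and bigram frequencies of the kind the
# library serves, computed here from a toy set of raw tweets.
from collections import Counter, defaultdict

tweets = [("2020-09-01", "stay home stay safe"),
          ("2020-09-01", "stay safe everyone"),
          ("2020-09-02", "traffic is back")]

words_by_day = defaultdict(Counter)
bigrams_by_day = defaultdict(Counter)
for day, text in tweets:
    tokens = text.lower().split()
    words_by_day[day].update(tokens)
    bigrams_by_day[day].update(zip(tokens, tokens[1:]))

print(words_by_day["2020-09-01"].most_common(2))   # [('stay', 3), ('safe', 2)]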

The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task

James Barry , Joachim Wagner , Jennifer Foster

Comments: Submitted to the 2020 IWPT shared task on parsing Enhanced Universal Dependencies

Journal-ref: Proceedings of the 16th International Conference on Parsing

Technologies and the IWPT 2020 Shared Task (2020) 227-235

Subjects: Computation and Language (cs.CL)

We describe the ADAPT system for the 2020 IWPT Shared Task on parsing

enhanced Universal Dependencies in 17 languages. We implement a pipeline

approach using UDPipe and UDPipe-future to provide initial levels of

annotation. The enhanced dependency graph is either produced by a graph-based

semantic dependency parser or is built from the basic tree using a small set of

heuristics. Our results show that, for the majority of languages, a semantic

dependency parser can be successfully applied to the task of parsing enhanced

dependencies.

Unfortunately, we did not ensure a connected graph as part of our pipeline

approach and our competition submission relied on a last-minute fix to pass the

validation script, which significantly harmed our official evaluation scores.

Our submission ranked eighth in the official evaluation with a macro-averaged

coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented

our own graph-connecting fix which resulted in a score of 79.53 (language

average) or 79.76 (treebank average), which would have placed fourth in the

competition evaluation.
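A graph-connecting fix of the kind mentioned can be as simple as attaching every token unreachable from the root to the root with a default relation. The sketch below is our own reconstruction of that idea, with an assumed (head, dependent, label) edge format, not the authors' actual implementation:

# Attach any token unreachable from the root (node 0) directly to the root,
# guaranteeing a connected enhanced dependency graph.
from collections import deque

def connect_graph(n_tokens, edges):
    children = {}
    for head, dep, _ in edges:
        children.setdefault(head, []).append(dep)
    reachable, queue = {0}, deque([0])
    while queue:
        for dep in children.get(queue.popleft(), []):
            if dep not in reachable:
                reachable.add(dep)
                queue.append(dep)
    return edges + [(0, t, "root") for t in range(1, n_tokens + 1)
                    if t not in reachable]

print(connect_graph(3, [(0, 1, "root"), (1, 2, "obj")]))  # token 3 reattached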

SRQA: Synthetic Reader for Factoid Question Answering

Jiuniu Wang , Wenjia Xu , Xingyu Fu , Yang Wei , Li Jin , Ziyan Chen , Guangluan Xu , Yirong Wu

Comments: arXiv admin note: text overlap with arXiv:1809.00676

Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The question answering system can answer questions from various fields and

forms with deep neural networks, but it still lacks effective ways when facing

multiple evidences. We introduce a new model called SRQA, which means Synthetic

Reader for Factoid Question Answering. This model enhances the question

answering system in the multi-document scenario from three aspects: model

structure, optimization goal, and training method, corresponding to Multilayer

Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)

respectively. First, we propose a multilayer attention network to obtain a

better representation of the evidences. The multilayer attention mechanism

conducts interaction between the question and the passage within each layer,

making the token representations of the evidence in each layer take the

requirement of the question into account. Second, we design a cross evidence

strategy to choose the answer span within more evidences. We improve the

optimization goal, considering all the answers’ locations in multiple evidences

as training targets, which leads the model to reason among multiple evidences.

Third, adversarial training is applied to high-level variables besides the

word embedding in our model. A new normalization method is also proposed for

adversarial perturbations so that we can jointly add perturbations to several

target variables. As an effective regularization method, adversarial training

enhances the model’s ability to process noisy data. Combining these three

strategies, we enhance the contextual representation and locating ability of

our model, which could synthetically extract the answer span from several

evidences. We perform SRQA on the WebQA dataset, and experiments show that our

model outperforms the state-of-the-art models (the best fuzzy score of our

model is up to 78.56%, with an improvement of about 2%).

Biomedical named entity recognition using BERT in the machine reading comprehension framework

Cong Sun , Zhihao Yang , Lei Wang , Yin Zhang , Hongfei Lin , Jian Wang

Comments: 8 pages, 2 figures

Subjects: Computation and Language (cs.CL)

Recognition of biomedical entities from literature is a challenging research

focus, and the foundation for converting the large amount of biomedical knowledge in unstructured texts into structured formats. Using the

sequence labeling framework to implement biomedical named entity recognition

(BioNER) is currently a conventional method. This method, however, often cannot

take full advantage of the semantic information in the dataset, and the

performance is not always satisfactory. In this work, instead of treating the

BioNER task as a sequence labeling problem, we formulate it as a machine

reading comprehension (MRC) problem. This formulation can introduce more prior

knowledge by utilizing well-designed queries, and no longer needs decoding

processes such as conditional random fields (CRF). We conduct experiments on

six BioNER datasets, and the experimental results demonstrate the effectiveness

of our method. Our method achieves state-of-the-art (SOTA) performance on the

BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI Disease, BC2GM and JNLPBA datasets,

with F1-scores of 92.38%, 94.19%, 87.36%, 90.04%, 84.98% and 78.93%,

respectively.
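Schematically, the MRC formulation pairs a query that encodes prior knowledge (e.g., "find all chemical mentions") with the passage, and replaces CRF decoding with start/end span heads. Below is a minimal sketch of such a span head (placeholder hidden states instead of real BERT outputs; not the authors' exact architecture):

# Start/end span prediction over encoder states, as in MRC-style NER.
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.start = nn.Linear(hidden, 1)
        self.end = nn.Linear(hidden, 1)

    def forward(self, token_states):                  # (batch, seq, hidden)
        return (self.start(token_states).squeeze(-1), # start logits
                self.end(token_states).squeeze(-1))   # end logits

# Real input: "[CLS] <query> [SEP] <passage> [SEP]" encoded by BERT;
# random states stand in for the encoder output here.
states = torch.randn(1, 32, 768)
start_logits, end_logits = SpanHead()(states)
span = (start_logits.argmax(-1).item(), end_logits.argmax(-1).item())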

orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions

Guy Lev , Michal Shmueli-Scheuer , Achiya Jerbi , David Konopnicki Subjects : Computation and Language (cs.CL)

Frequently Asked Questions (FAQ) webpages are created by organizations for

their users. FAQs are used in several scenarios, e.g., to answer user

questions. On the other hand, the content of FAQs is affected by user questions

by definition. In order to promote research in this field, several FAQ datasets

exist. However, we claim that being collected from community websites, they do

not correctly represent challenges associated with FAQs in an organizational

context. Thus, we release orgFAQ, a new dataset composed of (6988) user

questions and (1579) corresponding FAQs that were extracted from organizations’

FAQ webpages in the Jobs domain. In this paper, we provide an analysis of the

properties of such FAQs, and demonstrate the usefulness of our new dataset by

utilizing it in a relevant task from the Jobs domain. We also show the value of

the orgFAQ dataset in a task of a different domain – the COVID-19 pandemic.

Learning to summarize from human feedback

Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As language models become more powerful, training and evaluation are

increasingly bottlenecked by the data and metrics used for a particular task.

For example, summarization models are often trained to predict human reference

summaries and evaluated using ROUGE, but both of these metrics are rough

proxies for what we really care about—summary quality. In this work, we show

that it is possible to significantly improve summary quality by training a

model to optimize for human preferences. We collect a large, high-quality

dataset of human comparisons between summaries, train a model to predict the

human-preferred summary, and use that model as a reward function to fine-tune a

summarization policy using reinforcement learning. We apply our method to a

version of the TL;DR dataset of Reddit posts and find that our models

significantly outperform both human reference summaries and much larger models

fine-tuned with supervised learning alone. Our models also transfer to CNN/DM

news articles, producing summaries nearly as good as the human reference

without any news-specific fine-tuning. We conduct extensive analyses to

understand our human feedback dataset and fine-tuned models. We establish that

our reward model generalizes to new datasets, and that optimizing our reward

model results in better summaries than optimizing ROUGE according to humans. We

hope the evidence from our paper motivates machine learning researchers to pay

closer attention to how their training loss affects the model behavior they

actually want.
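The reward-model stage reduces to a pairwise preference loss: score the human-preferred summary higher than the rejected one. A minimal sketch of that loss (a standard formulation consistent with the paper's description; the linear scorer is a stand-in for a language-model-based reward model):

# Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(128, 1)   # stand-in for an LM-based scorer
chosen_feats = torch.randn(4, 128)       # features of preferred summaries
rejected_feats = torch.randn(4, 128)     # features of rejected summaries

r_chosen = reward_model(chosen_feats).squeeze(-1)
r_rejected = reward_model(rejected_feats).squeeze(-1)
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()   # the trained reward model then drives the RL fine-tuning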

A Simple Global Neural Discourse Parser

Yichu Zhou , Omri Koshorek , Vivek Srikumar , Jonathan Berant Subjects : Computation and Language (cs.CL)

Discourse parsing is largely dominated by greedy parsers with

manually-designed features, while global parsing is rare due to its

computational expense. In this paper, we propose a simple chart-based neural

discourse parser that does not require any manually-crafted features and is

based on learned span representations only. To overcome the computational

challenge, we propose an independence assumption between the label assigned to

a node in the tree and the splitting point that separates its children, which

results in tractable decoding. We empirically demonstrate that our model

achieves the best performance among global parsers, and comparable performance

to state-of-the-art greedy parsers, using only learned span representations.
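Under the independence assumption, global decoding becomes a tractable chart computation: a span's label is chosen without looking at its split point, and the best split is found by dynamic programming. A minimal sketch (random numbers stand in for learned span and label scores; the label set is illustrative):

# Chart (CKY-style) decoding with labels chosen independently of splits.
import random
from functools import lru_cache

random.seed(0)
N, LABELS = 4, ("NS", "SN", "NN")
label_score = {(i, j, l): random.random()
               for i in range(N) for j in range(i + 1, N + 1) for l in LABELS}

@lru_cache(maxsize=None)
def best(i, j):
    # Label chosen independently of the split point (the key assumption).
    label = max(LABELS, key=lambda l: label_score[i, j, l])
    if j - i == 1:
        return label_score[i, j, label], (label, i)
    score, k = max((best(i, k)[0] + best(k, j)[0], k) for k in range(i + 1, j))
    return label_score[i, j, label] + score, (label, best(i, k)[1], best(k, j)[1])

print(best(0, N)[1])   # highest-scoring binary discourse tree over 4 units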

Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading

Sasi Kiran Gaddipati , Deebul Nair , Paul G. Plöger

Comments: 7 pages, 3 figures, 3 tables. For associated work, refer to this https URL

Subjects: Computation and Language (cs.CL)

Automatic Short Answer Grading (ASAG) is the process of grading student answers with computational approaches, given a question and the desired answer.

Previous works implemented the methods of concept mapping, facet mapping, and

some used the conventional word embeddings for extracting semantic features.

They extracted multiple features manually to train on the corresponding

datasets. We use pretrained embeddings of the transfer learning models, ELMo,

BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a

single feature, cosine similarity, extracted from the embeddings of these

models. We compare the RMSE scores and correlation measurements of the four

models with previous works on Mohler dataset. Our work demonstrates that ELMo

outperformed the other three models. We also briefly describe the four

transfer learning models and conclude with the possible causes of poor results

of transfer learning models.
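Concretely, the single-feature pipeline is: embed the reference answer and the student answer, take their cosine similarity, and regress the grade on that one number. A minimal sketch (random vectors stand in for ELMo/BERT/GPT sentence embeddings; the grading scale is assumed):

# One-feature ASAG: grade ~ cosine(reference embedding, student embedding).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(20, 300))    # reference-answer embeddings
stu_emb = rng.normal(size=(20, 300))    # student-answer embeddings
grades = rng.uniform(0, 5, size=20)     # human-assigned scores (0-5 assumed)

cosine = np.sum(ref_emb * stu_emb, axis=1) / (
    np.linalg.norm(ref_emb, axis=1) * np.linalg.norm(stu_emb, axis=1))

reg = LinearRegression().fit(cosine.reshape(-1, 1), grades)
predicted = reg.predict(cosine.reshape(-1, 1))  # evaluate with RMSE/correlation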

Knowing What to Listen to: Early Attention for Deep Speech Representation Learning

Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Deep learning techniques have considerably improved speech processing in

recent years. Speech representations extracted by deep learning models are

being used in a wide range of tasks such as speech recognition, speaker

recognition, and speech emotion recognition. Attention models play an important

role in improving deep learning models. However, current attention mechanisms are unable to attend to fine-grained information items. In this paper, we

propose the novel Fine-grained Early Frequency Attention (FEFA) for speech

signals. This model is capable of focusing on information items as small as

frequency bins. We evaluate the proposed model on two popular tasks of speaker

recognition and speech emotion recognition. Two widely used public datasets,

VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on

top of several prominent deep models as backbone networks to evaluate its

impact on performance compared to the original networks and other related work.

Our experiments show that by adding FEFA to different CNN architectures,

performance is consistently improved by substantial margins, even setting a new

state-of-the-art for the speaker recognition task. We also tested our model

against different levels of added noise showing improvements in robustness and

less sensitivity compared to the backbone networks.
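The core mechanism can be pictured as a learned, input-dependent weight per frequency bin applied before the backbone. A minimal sketch (sizes and the attention parametrisation are our assumptions, not the paper's exact design):

# Fine-grained frequency attention: score each frequency bin of the input
# spectrogram and reweight the input bin-by-bin before the backbone CNN.
import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    def __init__(self, n_bins=257):
        super().__init__()
        self.score = nn.Linear(n_bins, n_bins)

    def forward(self, spec):                                  # (batch, time, bins)
        weights = torch.sigmoid(self.score(spec.mean(dim=1))) # (batch, bins)
        return spec * weights.unsqueeze(1)                    # per-bin reweighting

spec = torch.randn(8, 200, 257)          # e.g. STFT magnitudes
attended = FrequencyAttention()(spec)    # fed to the backbone network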

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Tsendsuren Munkhdalai

Comments: 9 pages, 4 figures, 2 tables

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Training a deep neural network requires a large amount of single-task data

and involves a long time-consuming optimization phase. This is not scalable to

complex, realistic environments with new unexpected changes. Humans can perform

fast incremental learning on the fly and memory systems in the brain play a

critical role. We introduce Sparse Meta Networks — a meta-learning approach to

learn online sequential adaptation algorithms for deep neural networks, by

using deep neural networks. We augment a deep neural network with a

layer-specific fast-weight memory. The fast-weights are generated sparsely at

each time step and accumulated incrementally through time providing a useful

inductive bias for online continual adaptation. We demonstrate strong

performance on a variety of sequential adaptation scenarios, from a simple

online reinforcement learning to a large scale adaptive language modelling.

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

High-fidelity singing voices usually require a higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, a higher sampling rate widens the frequency band and lengthens waveform sequences, posing challenges for singing voice synthesis (SVS) in both the frequency and time domains. Conventional SVS systems that adopt a small sampling rate cannot adequately address these challenges. In this paper, we develop HiFiSinger, an SVS system towards

high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic

model and a Parallel WaveGAN based vocoder to ensure fast training and

inference and also high voice quality. To tackle the difficulty of singing

modeling caused by high sampling rate (wider frequency band and longer

waveform), we introduce multi-scale adversarial training in both the acoustic

model and vocoder to improve singing modeling. Specifically, 1) To handle the

larger range of frequencies caused by higher sampling rate, we propose a novel

sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full

80-dimensional mel-frequency into multiple sub-bands and models each sub-band

with a separate discriminator. 2) To model longer waveform sequences caused by

higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform

generation to model different lengths of waveform sequences with separate

discriminators. 3) We also introduce several additional designs and findings in

HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)

and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate

window/hop size for mel-spectrogram, and increasing the receptive field in

vocoder for long vowel modeling. Experiment results show that HiFiSinger

synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44

MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
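To illustrate the sub-frequency GAN idea, the sketch below splits an 80-bin mel-spectrogram into sub-bands and scores each with its own small discriminator; the band boundaries, overlaps, and discriminator shapes are assumptions for illustration, not the paper's configuration:

# Sub-band discriminators over slices of the mel-spectrogram (SF-GAN idea).
import torch
import torch.nn as nn

BANDS = [(0, 40), (20, 60), (40, 80)]    # overlapping sub-bands (assumed)

discriminators = nn.ModuleList(
    nn.Sequential(nn.Conv1d(hi - lo, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv1d(32, 1, 3, padding=1))
    for lo, hi in BANDS)

mel = torch.randn(4, 80, 100)            # (batch, mel bins, frames)
scores = [d(mel[:, lo:hi, :]) for d, (lo, hi) in zip(discriminators, BANDS)]
adv_signal = sum(s.mean() for s in scores)  # combined multi-band critic signal

The ML-GAN follows the same pattern along the time axis, with discriminators assigned to different waveform lengths instead of frequency bands.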

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

The prevailing framework for solving referring expression grounding is based

on a two-stage process: 1) detecting proposals with an object detector and 2)

grounding the referent to one of the proposals. Existing two-stage solutions

mostly focus on the grounding step, which aims to align the expressions with

the proposals. In this paper, we argue that these methods overlook an obvious

mismatch between the roles of proposals in the two stages: they generate

proposals solely based on the detection confidence (i.e., expression-agnostic),

hoping that the proposals contain all the right instances in the expression (i.e.,

expression-aware). Due to this mismatch, current two-stage methods suffer from

a severe performance drop between detected and ground-truth proposals. To this

end, we propose Ref-NMS, which is the first method to yield expression-aware

proposals at the first stage. Ref-NMS regards all nouns in the expression as

critical objects, and introduces a lightweight module to predict a score for

aligning each box with a critical object. These scores can guide the

NMS operation to filter out the boxes irrelevant to the expression, increasing

the recall of critical objects, resulting in a significantly improved grounding

performance. Since Ref-NMS is agnostic to the grounding step, it can be easily

integrated into any state-of-the-art two-stage method. Extensive ablation

studies on several backbones, benchmarks, and tasks consistently demonstrate

the superiority of Ref-NMS.

Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions

Sara Evensen , Chang Ge , Dongjin Choi , Çağatay Demiralp Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Data programming is a programmatic weak supervision approach to efficiently

curate large-scale labeled training data. Writing data programs (labeling

functions) requires, however, both programming literacy and domain expertise.

Many subject matter experts have neither programming proficiency nor time to

effectively write data programs. Furthermore, regardless of one’s expertise in

coding or machine learning, transferring domain expertise into labeling

functions by enumerating rules and thresholds is not only time consuming but

also inherently difficult. Here we propose a new framework, data programming by

demonstration (DPBD), to generate labeling rules using interactive

demonstrations of users. DPBD aims to relieve the burden of writing labeling

functions from users, enabling them to focus on higher-level semantics such as

identifying relevant signals for labeling tasks. We operationalize our

framework with Ruler, an interactive system that synthesizes labeling rules for

document classification by using span-level annotations of users on document

examples. We compare Ruler with conventional data programming through a user

study conducted with 10 data scientists creating labeling functions for

sentiment and spam classification tasks. We find that Ruler is easier to use

and learn and offers higher overall satisfaction, while providing

discriminative model performances comparable to ones achieved by conventional

data programming.

Towards Earnings Call and Stock Price Movement

Zhiqiang Ma , Grace Bang , Chong Wang , Xiaomo Liu

Comments: Accepted by KDD 2020 MLF workshop

Subjects: Statistical Finance (q-fin.ST); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)

Earnings calls are hosted by management of public companies to discuss the

company’s financial performance with analysts and investors. Information

disclosed during an earnings call is an essential source of data for analysts

and investors to make investment decisions. Thus, we leverage earnings call

transcripts to predict future stock price dynamics. We propose to model the

language in transcripts using a deep learning framework, where an attention

mechanism is applied to encode the text data into vectors for the

discriminative network classifier to predict stock price movements. Our

empirical experiments show that the proposed model is superior to the

traditional machine learning baselines and earnings call information can boost

the stock price prediction performance.

Distributed, Parallel, and Cluster Computing

Fast Byzantine Gathering with Visibility in Graphs

Avery Miller , Ullash Saha

Comments: Conference version appeared at ALGOSENSORS 2020

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We consider the gathering task by a team of (m) synchronous mobile robots in

a graph of (n) nodes. Each robot has an identifier (ID) and runs its own

deterministic algorithm, i.e., there is no centralized coordinator. We consider

a particularly challenging scenario: there are (f) Byzantine robots in the team

that can behave arbitrarily, and even have the ability to change their IDs to

any value at any time. There is no way to distinguish these robots from

non-faulty robots, other than perhaps observing strange or unexpected

behaviour. The goal of the gathering task is to eventually have all non-faulty

robots located at the same node in the same round. It is known that no

algorithm can solve this task unless there are at least (f+1) non-faulty robots in

the team. In this paper, we design an algorithm that runs in polynomial time

with respect to (n) and (m) that matches this bound, i.e., it works in a team

that has exactly (f+1) non-faulty robots. In our model, we have equipped the

robots with sensors that enable each robot to see the subgraph (including

robots) within some distance (H) of its current node. We prove that the

gathering task is solvable if this visibility range (H) is at least the radius

of the graph, and not solvable if (H) is any fixed constant.

Software-Distributed Shared Memory for Heterogeneous Machines: Design and Use Considerations

Loïc Cudennec (DACLE-LIST, DGA.MI) Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed shared memory (DSM) makes it possible to implement and deploy applications

onto distributed architectures using the convenient shared memory programming

model in which a set of tasks are able to allocate and access data despite

their remote localization. With the development of distributed heterogeneous

architectures in both HPC and embedded contexts, there is renewed interest in systems such as DSM that ease the programmability of complex hardware. In

this report, some design considerations are given to build a complete

software-DSM (S-DSM). This S-DSM called SAT (Share Among Things) is developed

at CEA (the French Alternative Energies and Atomic Energy Commission) within

the framework of European project M2DC (Modular Microserver DataCentre) to

tackle the problem of managing shared data over microserver architectures. The

S-DSM features the automatic decomposition of large data into atomic pieces

called chunks, the possibility to deploy multiple coherence protocols to manage

different chunks, a hybrid programming model based on event programming, and a

micro-sleep mechanism to decrease the energy consumption on message reception.

Distributed Online Optimization via Gradient Tracking with Adaptive Momentum

Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

This paper deals with a network of computing agents aiming to solve an online

optimization problem in a distributed fashion, i.e., by means of local

computation and communication, without any central coordinator. We propose the

gradient tracking with adaptive momentum estimation (GTAdam) distributed

algorithm, which combines a gradient tracking mechanism with first and second

order momentum estimates of the gradient. The algorithm is analyzed in the

online setting for strongly convex and smooth cost functions. We prove that the

average dynamic regret is bounded and that the convergence rate is linear. The

algorithm is tested on a time-varying classification problem, on a (moving)

target localization problem and in a stochastic optimization setup from image

classification. In these numerical experiments from multi-agent learning,

GTAdam outperforms state-of-the-art distributed optimization methods.
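As a rough reconstruction of the scheme (not the authors' exact update rule; the mixing matrix, local costs, and step sizes are invented), each agent mixes neighbors' states, tracks the network-wide gradient, and scales its step with Adam-style moment estimates:

# Gradient tracking with Adam-style momentum on a toy 2-agent network.
import numpy as np

W = np.array([[0.5, 0.5], [0.5, 0.5]])        # doubly stochastic mixing matrix
def grad(i, x):                                # gradient of local cost (x-(i+1))^2
    return 2 * (x - (i + 1.0))

x = np.zeros(2)
s = np.array([grad(i, x[i]) for i in range(2)])  # tracked gradient estimates
m, v = np.zeros(2), np.zeros(2)
alpha, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

for _ in range(400):
    m = b1 * m + (1 - b1) * s                  # first moment of tracked gradient
    v = b2 * v + (1 - b2) * s**2               # second moment
    x_new = W @ x - alpha * m / (np.sqrt(v) + eps)
    g_old = np.array([grad(i, x[i]) for i in range(2)])
    g_new = np.array([grad(i, x_new[i]) for i in range(2)])
    s = W @ s + g_new - g_old                  # gradient tracking update
    x = x_new

print(x)   # both agents end up near the global minimizer 1.5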

Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors

Duy Thanh Nguyen , Hyun Kim , Hyuk-Jae Lee

Comments: Accepted for publication in IEEE Transaction on Circuit and System for Video Technology

Subjects: Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Convolutional neural networks (CNNs) require both intensive computation and

frequent memory access, which lead to a low processing speed and large power

dissipation. Although the characteristics of the different layers in a CNN are

frequently quite different, previous hardware designs have employed common

optimization schemes for them. This paper proposes a layer-specific design that

employs different organizations that are optimized for the different layers.

The proposed design employs two layer-specific optimizations: layer-specific

mixed data flow and layer-specific mixed precision. The mixed data flow aims to

minimize the off-chip access while demanding a minimal on-chip memory (BRAM)

resource of an FPGA device. The mixed precision quantization aims to achieve both lossless accuracy and aggressive model compression, thereby further

reducing the off-chip access. A Bayesian optimization approach is used to

select the best sparsity for each layer, achieving the best trade-off between

the accuracy and compression. This mixing scheme allows the entire network

model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip

access, and thereby achieves a significant performance enhancement. The model

size is reduced by 22.66-28.93 times compared to that in a full-precision

network with a negligible degradation of accuracy on VOC, COCO, and ImageNet

datasets. Furthermore, the combination of mixed dataflow and mixed precision

significantly outperforms the previous works in terms of throughput,

off-chip access, and on-chip memory requirement.

DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control

Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)

The Internet of Vehicles (IoV) enables real-time data exchange among vehicles

and roadside units and thus provides a promising solution to alleviate traffic

jams in the urban area. Meanwhile, better traffic management via efficient

traffic light control can benefit the IoV as well by enabling a better

communication environment and decreasing the network load. As such, IoV and

efficient traffic light control can formulate a virtuous cycle. Edge computing,

an emerging technology to provide low-latency computation capabilities at the

edge of the network, can further improve the performance of this cycle.

However, while the collected information is valuable, an efficient solution for

better utilization and faster feedback has yet to be developed for

edge-empowered IoV. To this end, we propose a Decentralized Reinforcement

Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits

the ubiquity of the IoV to accelerate the collection of traffic data and its

interpretation towards alleviating congestion and providing better traffic

light control. DRLE operates within the coverage of the edge servers and uses

aggregated data from neighboring edge servers to provide city-scale traffic

light control. DRLE decomposes the highly complex problem of large-area control into a decentralized multi-agent problem. We prove its global optimality

with concrete mathematical reasoning. The proposed decentralized reinforcement

learning algorithm running at each edge node adapts the traffic lights in real

time. We conduct extensive evaluations and demonstrate the superiority of this

approach over several state-of-the-art algorithms.

Local Fast Rerouting with Low Congestion: A Randomized Approach

Gregor Bankhamer , Robert Elsässer , Stefan Schmid Subjects : Networking and Internet Architecture (cs.NI) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Most modern communication networks include fast rerouting mechanisms,

implemented entirely in the data plane, to quickly recover connectivity after

link failures. By relying on local failure information only, these data plane

mechanisms provide very fast reaction times, but at the same time introduce an

algorithmic challenge in case of multiple link failures: failover routes need

to be robust to additional but locally unknown failures downstream.

This paper presents local fast rerouting algorithms which not only provide a

high degree of resilience against multiple link failures, but also ensure a low

congestion on the resulting failover paths. We consider a randomized approach

and focus on networks which are highly connected before the failures occur. Our

main contributions are three simple algorithms which come with provable

guarantees and provide interesting resilience-load tradeoffs, significantly

outperforming any deterministic fast rerouting algorithm with high probability.

Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

Zhe Lin , Sharad Sinha , Wei Zhang

Comments: Appears as a conference paper in FCCM 2019

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Decision trees are machine learning models commonly used in various

application scenarios. In the era of big data, traditional decision tree

induction algorithms are not suitable for learning large-scale datasets due to

their stringent data storage requirement. Online decision tree learning

algorithms have been devised to tackle this problem by concurrently training

with incoming samples and providing inference results. However, even the most

up-to-date online tree learning algorithms still suffer from either high memory

usage or high computational intensity with dependency and long latency, making

them challenging to implement in hardware. To overcome these difficulties, we

introduce a new quantile-based algorithm to improve the induction of the

Hoeffding tree, one of the state-of-the-art online learning models. The

proposed algorithm is light-weight in terms of both memory and computational

demand, while still maintaining high generalization ability. A series of

optimization techniques dedicated to the proposed algorithm have been

investigated from the hardware perspective, including coarse-grained and

fine-grained parallelism, dynamic and memory-based resource sharing, and pipelining with data forwarding. We further present a high-performance, hardware-efficient

and scalable online decision tree learning system on a field-programmable gate

array (FPGA) with system-level optimization techniques. Experimental results

show that our proposed algorithm outperforms the state-of-the-art Hoeffding

tree learning method, leading to 0.05% to 12.3% improvement in inference

accuracy. Real implementation of the complete learning system on the FPGA

demonstrates a 384x to 1581x speedup in execution time over the

state-of-the-art design.
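For reference, the Hoeffding-tree split decision that the quantile summaries feed into looks as follows; this is a generic sketch of the standard bound and a quantile-based cut-point selection, not the paper's hardware implementation:

# Split a node only once the gain gap provably exceeds the estimation error.
import math

def hoeffding_bound(value_range, delta, n):
    return math.sqrt(value_range**2 * math.log(1 / delta) / (2 * n))

def should_split(best_gain, second_gain, n, value_range=1.0, delta=1e-6):
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

def candidate_thresholds(sorted_values, n_quantiles=8):
    # The quantile-based algorithm tests only a few cut points per feature;
    # a real implementation keeps a fixed-size quantile sketch instead.
    step = max(1, len(sorted_values) // n_quantiles)
    return sorted_values[step::step]

print(should_split(0.30, 0.25, n=500))    # False: not yet confident
print(should_split(0.30, 0.25, n=5000))   # True: enough samples to split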

Learning

Physics-Consistent Data-driven Waveform Inversion with Adaptive Data Augmentation

Renán Rojas-Gómez , Jihyun Yang , Youzuo Lin , James Theiler , Brendt Wohlberg Subjects : Machine Learning (cs.LG) ; Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Seismic full-waveform inversion (FWI) is a nonlinear computational imaging

technique that can provide detailed estimates of subsurface geophysical

properties. Solving the FWI problem can be challenging due to its ill-posedness

and high computational cost. In this work, we develop a new hybrid

computational approach to solve FWI that combines physics-based models with

data-driven methodologies. In particular, we develop a data augmentation

strategy that can not only improve the representativity of the training set but

also incorporate important governing physics into the training process and

therefore improve the inversion accuracy. To validate the performance, we apply

our method to synthetic elastic seismic waveform data generated from a

subsurface geologic model built on a carbon sequestration site at Kimberlina,

California. We compare our physics-consistent data-driven inversion method to

both purely physics-based and purely data-driven approaches and observe that

our method yields higher accuracy and greater generalization ability.

A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning

Martin Mundt , Yong Won Hong , Iuliia Pliushch , Visvanathan Ramesh

Comments: 32 pages

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Current deep learning research is dominated by benchmark evaluation. A method

is regarded as favorable if it empirically performs well on the dedicated test

set. This mentality is seamlessly reflected in the resurfacing area of

continual learning, where consecutively arriving sets of benchmark data are

investigated. The core challenge is framed as protecting previously acquired

representations from being catastrophically forgotten due to the iterative

parameter updates. However, comparison of individual methods is nevertheless

treated in isolation from real world application and typically judged by

monitoring accumulated test set performance. The closed world assumption

remains predominant. It is assumed that during deployment a model is guaranteed

to encounter data that stems from the same distribution as used for training.

This poses a massive challenge as neural networks are well known to provide

overconfident false predictions on unknown instances and break down in the face

of corrupted data. In this work we argue that notable lessons from open set

recognition, the identification of statistically deviating data outside of the

observed dataset, and the adjacent field of active learning, where data is

incrementally queried such that the expected performance gain is maximized, are

frequently overlooked in the deep learning era. Based on these forgotten

lessons, we propose a consolidated view to bridge continual learning, active

learning and open set recognition in deep neural networks. Our results show

that this not only benefits each individual paradigm, but highlights the

natural synergies in a common framework. We empirically demonstrate

improvements when alleviating catastrophic forgetting, querying data in active

learning, selecting task orders, while exhibiting robust open world application

where previously proposed methods fail.

Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints

Syrine Belakaria , Aryan Deshwal , Janardhan Rao Doppa

Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We consider the problem of constrained multi-objective blackbox optimization

using expensive function evaluations, where the goal is to approximate the true

Pareto set of solutions satisfying a set of constraints while minimizing the

number of function evaluations. For example, in aviation power system design

applications, we need to find the designs that trade off total energy and mass while satisfying specific thresholds for motor temperature and voltage of

cells. This optimization requires performing expensive computational

simulations to evaluate designs. In this paper, we propose a new approach

referred to as Max-value Entropy Search for Multi-objective Optimization with Constraints (MESMOC) to solve this problem. MESMOC employs an output-space

entropy based acquisition function to efficiently select the sequence of inputs

for evaluation to uncover high-quality Pareto-set solutions while satisfying

constraints.

We apply MESMOC to two real-world engineering design applications to

demonstrate its effectiveness over state-of-the-art algorithms.

CAGNN: Cluster-Aware Graph Neural Networks for Unsupervised Graph Representation Learning

Yanqiao Zhu , Yichen Xu , Feng Yu , Shu Wu , Liang Wang

Comments: 21 pages, in submission to ACM TIST

Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Unsupervised graph representation learning aims to learn low-dimensional node

embeddings without supervision while preserving graph topological structures

and node attributive features. Previous graph neural networks (GNNs) require a

large number of labeled nodes, which may not be accessible in real-world graph

data. In this paper, we present a novel cluster-aware graph neural network

(CAGNN) model for unsupervised graph representation learning using

self-supervised techniques. In CAGNN, we perform clustering on the node

embeddings and update the model parameters by predicting the cluster

assignments. Moreover, we observe that graphs often contain inter-class edges,

which mislead the GNN model to aggregate noisy information from neighborhood

nodes. We further refine the graph topology by strengthening intra-class edges

and reducing node connections between different classes based on cluster

labels, which better preserves cluster structures in the embedding space. We

conduct comprehensive experiments on two benchmark tasks using real-world

datasets. The results demonstrate the superior performance of the proposed

model over existing baseline methods. Notably, our model gains over 7% improvement in accuracy on node clustering over state-of-the-art methods.
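The training loop alternates clustering and prediction; below is a minimal sketch of that self-supervised cycle (a linear layer stands in for the GNN encoder, and inputs are random features, so this only illustrates the mechanics):

# Cluster-aware self-supervision: cluster current embeddings (E-step), then
# update the encoder by predicting the cluster assignments (M-step).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

x = torch.randn(100, 16)                 # node features (stand-in for a graph)
encoder = nn.Linear(16, 8)               # stand-in for a GNN encoder
classifier = nn.Linear(8, 4)             # predicts one of 4 clusters
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(classifier.parameters()), lr=1e-2)

for _ in range(5):
    with torch.no_grad():                # E-step: cluster the embeddings
        labels = KMeans(n_clusters=4, n_init=10).fit_predict(encoder(x).numpy())
    targets = torch.as_tensor(labels, dtype=torch.long)
    for _ in range(20):                  # M-step: predict assignments
        loss = nn.functional.cross_entropy(classifier(encoder(x)), targets)
        opt.zero_grad(); loss.backward(); opt.step()

The paper's additional topology-refinement step would then strengthen intra-cluster edges and prune cross-cluster ones using these labels.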

Yet Meta Learning Can Adapt Fast, It Can Also Break Easily

Han Xu , Yaxin Li , Xiaorui Liu , Hui Liu , Jiliang Tang

Comments: Meta Learning Robustness

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Meta learning algorithms have been widely applied in many tasks for efficient

learning, such as few-shot image classification and fast reinforcement

learning. During meta training, the meta learner develops a common learning

strategy, or experience, from a variety of learning tasks. Therefore, during

meta test, the meta learner can use the learned strategy to quickly adapt to

new tasks even with a few training samples. However, there is still a dark side

about meta learning in terms of reliability and robustness. In particular, is

meta learning vulnerable to adversarial attacks? In other words, would a

well-trained meta learner utilize its learned experience to build wrong or

likely useless knowledge, if an adversary unnoticeably manipulates the given

training set? Without the understanding of this problem, it is extremely risky

to apply meta learning in safety-critical applications. Thus, in this paper, we

perform the initial study about adversarial attacks on meta learning under the

few-shot classification problem. In particular, we formally define key elements

of adversarial attacks unique to meta learning and propose the first attacking

algorithm against meta learning under various settings. We evaluate the

effectiveness of the proposed attacking strategy as well as the robustness of

several representative meta learning algorithms. Experimental results

demonstrate that the proposed attacking strategy can easily break the meta

learner and meta learning is vulnerable to adversarial attacks. The

implementation of the proposed framework will be released upon the acceptance

of this paper.

MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance

Anubha Kabra , Ayush Chopra , Nikaash Puri , Pinkesh Badjatiya , Sukriti Verma , Piyush Gupta , Balaji K

Comments: Work done as part of internship at MDSR

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Training a classification model on a dataset where the instances of one class

outnumber those of the other class is a challenging problem. Such imbalanced

datasets are standard in real-world situations such as fraud detection, medical

diagnosis, and computational advertising. We propose an iterative data

augmentation method, MixBoost, which intelligently selects (Boost) and then

combines (Mix) instances from the majority and minority classes to generate

synthetic hybrid instances that have characteristics of both classes. We

evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing

approaches, and test its efficacy through significance testing. We also present

ablation studies to analyze the impact of the different components of MixBoost.
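The Mix step itself is a convex interpolation of instance pairs across the class boundary; here is a minimal sketch (with uniform random pair selection in place of MixBoost's learned Boost step, and an assumed interpolation range):

# Create synthetic hybrids by interpolating minority and majority instances.
import numpy as np

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(500, 5))   # majority-class instances
X_minor = rng.normal(3.0, 1.0, size=(25, 5))    # minority-class instances

def mix_instances(n_synthetic, low=0.3, high=0.7):
    maj = X_major[rng.integers(len(X_major), size=n_synthetic)]
    mino = X_minor[rng.integers(len(X_minor), size=n_synthetic)]
    lam = rng.uniform(low, high, size=(n_synthetic, 1))
    return lam * mino + (1 - lam) * maj         # traits of both classes

X_synth = mix_instances(100)   # appended to the minority class before training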

Can AutoML outperform humans? An evaluation on popular OpenML datasets using AutoML Benchmark

Marc Hanussek , Matthias Blohm , Maximilien Kintz Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

In the last few years, Automated Machine Learning (AutoML) has gained much

attention. With that said, the question arises whether AutoML can outperform

results achieved by human data scientists. This paper compares four AutoML

frameworks on 12 different popular datasets from OpenML; six of them are supervised classification tasks and the other six are supervised regression tasks.

Additionally, we consider a real-life dataset from one of our recent projects.

The results show that the automated frameworks perform better than or on par with the machine learning community in 7 out of 12 OpenML tasks.

Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs

Zahra Dasht Bozorgi , Irene Teinemaa , Marlon Dumas , Marcello La Rosa , Artem Polyvyanyy

Comments: 8 pages, 4 figures, conference

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper proposes an approach to analyze an event log of a business process

in order to generate case-level recommendations of treatments that maximize the

probability of a given outcome. Users classify the attributes in the event log

into controllable and non-controllable, where the former correspond to

attributes that can be altered during an execution of the process (the possible

treatments). We use an action rule mining technique to identify treatments that

co-occur with the outcome under some conditions. Since action rules are

generated based on correlation rather than causation, we then use a causal

machine learning technique, specifically uplift trees, to discover subgroups of

cases for which a treatment has a high causal effect on the outcome after

adjusting for confounding variables. We test the relevance of this approach

using an event log of a loan application process and compare our findings with

recommendations manually produced by process mining experts.

Sample-Efficient Automated Deep Reinforcement Learning

Jörg K.H. Franke , Gregor Köhler , André Biedenkapp , Frank Hutter Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Despite significant progress in challenging problems across various domains,

applying state-of-the-art deep reinforcement learning (RL) algorithms remains

challenging due to their sensitivity to the choice of hyperparameters. This

sensitivity can partly be attributed to the non-stationarity of the RL problem,

potentially requiring different hyperparameter settings at different stages of

the learning process. Additionally, in the RL setting, hyperparameter

optimization (HPO) requires a large number of environment interactions,

hindering the transfer of the successes in RL to real-world applications. In

this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We

propose a population-based automated RL (AutoRL) framework to meta-optimize

arbitrary off-policy RL algorithms. In this framework, we optimize the

hyperparameters, including architecture hyperparameters, while simultaneously

training the agent. By sharing the collected experience across the population,

we substantially increase the sample efficiency of the meta-optimization. We

demonstrate the capabilities of our sample-efficient AutoRL approach in a case

study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we

reduce the number of environment interactions needed for meta-optimization by

up to an order of magnitude compared to population-based training.

Bounded Risk-Sensitive Markov Game and Its Inverse Reward Learning Problem

Ran Tian , Liting Sun , Masayoshi Tomizuka Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Classical game-theoretic approaches for multi-agent systems in both the

forward policy learning/design problem and the inverse reward learning problem

often make strong rationality assumptions: agents are perfectly rational

expected utility maximizers. Specifically, the agents are risk-neutral to all

uncertainties, maximize their expected rewards, and have unlimited computation

resources to explore such policies. Such assumptions, however, substantially

mismatch many observed human behaviors such as satisficing with

sub-optimal policies, risk-seeking and loss-aversion decisions. In this paper,

we investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and

its inverse reward learning problem. Instead of assuming unlimited computation

resources, we consider the influence of bounded intelligence by exploiting

iterative reasoning models in BRSMG. Instead of assuming agents maximize their

expected utilities (a risk-neutral measure), we consider the impact of

risk-sensitive measures such as the cumulative prospect theory. Convergence

analysis of BRSMG for both the forward policy learning and the inverse reward

learning is established. The proposed forward policy learning and inverse

reward learning algorithms in BRSMG are validated through a navigation

scenario. Simulation results show that the behaviors of agents in BRSMG

demonstrate both risk-averse and risk-seeking phenomena, which are consistent

with observations from humans. Moreover, in the inverse reward learning task,

the proposed bounded risk-sensitive inverse learning algorithm outperforms the

baseline risk-neutral inverse learning algorithm.

Explainable Empirical Risk Minimization

A. Jung Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

The widespread use of modern machine learning methods in decision making

crucially depends on their interpretability or explainability. The human users

(decision makers) of machine learning methods are often not only interested in

getting accurate predictions or projections. Rather, as a decision-maker, the

user also needs a convincing answer (or explanation) to the question of why a

particular prediction was delivered. Explainable machine learning might be a

legal requirement when used for decision making with an immediate effect on the

health of human beings. As an example, consider the computer vision system of a self-driving car whose predictions are used to decide whether to stop the car. We

have recently proposed an information-theoretic approach to construct

personalized explanations for predictions obtained from ML. This method was

model-agnostic and only required some training samples of the model to be

explained along with a user feedback signal. This paper uses an

information-theoretic measure for the quality of an explanation to learn

predictors that are intrinsically explainable to a specific user. Our approach

is not restricted to a particular hypothesis space, such as linear maps or

shallow decision trees, whose predictor maps are considered as explainable by

definition. Rather, we regularize an arbitrary hypothesis space using a

personalized measure for the explainability of a particular predictor.
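Abstractly, the recipe is ordinary ERM plus a user-specific explainability penalty. The sketch below uses weight sparsity as a crude stand-in for the paper's information-theoretic explainability measure, purely to show where the regularizer enters:

# ERM with an explainability regularizer: loss = risk + lambda * penalty.
import torch

X, y = torch.randn(64, 10), torch.randn(64)
w = torch.zeros(10, requires_grad=True)
lam = 0.1                                   # explainability trade-off weight
opt = torch.optim.SGD([w], lr=0.05)

for _ in range(200):
    risk = ((X @ w - y) ** 2).mean()        # empirical risk
    penalty = w.abs().sum()                 # stand-in explainability measure
    loss = risk + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()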

Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

Jordan T. Bishop , Marcus Gallagher Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Learning classifier systems (LCSs) are population-based predictive systems

that were originally envisioned as agents to act in reinforcement learning (RL)

environments. These systems can suffer from population bloat and so are

amenable to compaction techniques that try to strike a balance between

population size and performance. A well-studied LCS architecture is XCSF, which

in the RL setting acts as a Q-function approximator. We apply XCSF to a

deterministic and stochastic variant of the FrozenLake8x8 environment from

OpenAI Gym, with its performance compared in terms of function approximation

error and policy accuracy to the optimal Q-functions and policies produced by

solving the environments via dynamic programming. We then introduce a novel

compaction algorithm (Greedy Niche Mass Compaction – GNMC) and study its

operation on XCSF’s trained populations. Results show that given a suitable

parametrisation, GNMC preserves or even slightly improves function

approximation error while yielding a significant reduction in population size.

Reasonable preservation of policy accuracy also occurs, and we link this metric

to the commonly used steps-to-goal metric in maze-like environments,

illustrating how the metrics are complementary rather than competitive.

Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks

Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Algorithms for training residual networks (ResNets) typically require forward

pass of data, followed by backpropagating of loss gradient to perform parameter

updates, which can take many hours or even days for networks with hundreds of

layers. Inspired by the penalty and augmented Lagrangian methods, a

layer-parallel training algorithm is proposed in this work to overcome the

scalability barrier caused by the serial nature of forward-backward propagation

in deep residual learning. Moreover, by viewing the supervised classification

task as a numerical discretization of the terminal control problem, we bridge

the concept of synthetic gradient for decoupling backpropagation with the

parareal method for solving differential equations, which not only offers a

novel perspective on the design of synthetic loss function but also performs

parameter updates with reduced storage overhead. Experiments on a preliminary

example demonstrate that the proposed algorithm achieves testing accuracy comparable to, or even better than, the full serial backpropagation approach, while the enabled layer-parallelism can provide speedup over the traditional

layer-serial training methods.

Error estimate for a universal function approximator of ReLU network with a local connection

Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)

Neural networks have shown highly successful performance in a wide range of tasks, but further studies are needed to improve their performance. We analyze the approximation error of a specific neural network architecture with local connections, which has broader applicability than a fully connected one, because locally connected networks can be used to describe diverse neural networks such as CNNs. Our error estimate depends on two parameters: one

controlling the depth of the hidden layer, and the other, the width of the

hidden layers.

FairGNN: Eliminating the Discrimination in Graph Neural Networks with Limited Sensitive Attribute Information

Enyan Dai , Suhang Wang Subjects : Machine Learning (cs.LG)

Graph neural networks (GNNs) have shown great power in modeling graph

structured data. However, similar to other machine learning models, GNNs may

make predictions biased on protected sensitive attributes, e.g., skin color,

gender, and nationality. This is because machine learning algorithms, including GNNs, are trained to faithfully reflect the distribution of the training data, which often contains historical bias towards sensitive attributes. In addition, the

discrimination in GNNs can be magnified by graph structures and the

message-passing mechanism. As a result, the applications of GNNs in sensitive

domains such as crime rate prediction would be largely limited. Though

extensive studies of fair classification have been conducted on i.i.d data,

methods to address the problem of discrimination on non-i.i.d data are rather

limited. Furthermore, the practical scenario of sparse annotations in sensitive

attributes is rarely considered in existing works. Therefore, we study the

novel and important problem of learning fair GNNs with limited sensitive

attribute information. FairGNN is proposed to eliminate the bias of GNNs whilst

maintaining high node classification accuracy by leveraging graph structures

and limited sensitive information. Our theoretical analysis shows that FairGNN

can ensure the fairness of GNNs under mild conditions given limited nodes with

known sensitive attributes. Extensive experiments on real-world datasets also

demonstrate the effectiveness of FairGNN in debiasing and keeping high

accuracy.

Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions

Sara Evensen , Chang Ge , Dongjin Choi , Çağatay Demiralp Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Data programming is a programmatic weak supervision approach to efficiently

curate large-scale labeled training data. Writing data programs (labeling

functions) requires, however, both programming literacy and domain expertise.

Many subject matter experts have neither programming proficiency nor time to

effectively write data programs. Furthermore, regardless of one’s expertise in

coding or machine learning, transferring domain expertise into labeling

functions by enumerating rules and thresholds is not only time consuming but

also inherently difficult. Here we propose a new framework, data programming by

demonstration (DPBD), to generate labeling rules using interactive

demonstrations of users. DPBD aims to relieve the burden of writing labeling

functions from users, enabling them to focus on higher-level semantics such as

identifying relevant signals for labeling tasks. We operationalize our

framework with Ruler, an interactive system that synthesizes labeling rules for

document classification by using span-level annotations of users on document

examples. We compare Ruler with conventional data programming through a user

study conducted with 10 data scientists creating labeling functions for

sentiment and spam classification tasks. We find that Ruler is easier to use

and learn and offers higher overall satisfaction, while providing

discriminative model performances comparable to ones achieved by conventional

data programming.
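For readers unfamiliar with data programming, the sketch below shows the kind of hand-written labeling function that DPBD aims to synthesize from user demonstrations instead; the keyword rules and label values are illustrative only.

```python
# Hand-written labeling functions of the kind DPBD aims to replace with
# rules synthesized from demonstrations; rules and labels are illustrative.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_contains_praise(document: str) -> int:
    """Vote POSITIVE if a praise keyword appears, otherwise abstain."""
    praise = {"great", "excellent", "love"}
    return POSITIVE if set(document.lower().split()) & praise else ABSTAIN

def lf_contains_complaint(document: str) -> int:
    """Vote NEGATIVE if a complaint keyword appears, otherwise abstain."""
    complaints = {"terrible", "refund", "broken"}
    return NEGATIVE if set(document.lower().split()) & complaints else ABSTAIN

votes = [lf(d) for d in ["I love this phone", "screen arrived broken"]
         for lf in (lf_contains_praise, lf_contains_complaint)]
print(votes)  # [1, -1, -1, 0]
```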

Algebraic Neural Networks: Stability Properties

Alejandro Parada-Mayorga , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

In this work we study the stability of algebraic neural networks (AlgNNs) with commutative algebras, which unify CNNs and GNNs under the umbrella of algebraic signal processing. An AlgNN is a stacked layered structure where each layer is conformed by an algebra \(\mathcal{A}\), a vector space \(\mathcal{M}\) and a homomorphism \(\rho:\mathcal{A}\rightarrow\text{End}(\mathcal{M})\), where \(\text{End}(\mathcal{M})\) is the set of endomorphisms of \(\mathcal{M}\). Signals in each layer are modeled as elements of \(\mathcal{M}\) and are processed by elements of \(\text{End}(\mathcal{M})\) defined according to the structure of \(\mathcal{A}\) via \(\rho\). This framework provides a general scenario that covers several types of neural network architectures where formal convolution operators are being used. We obtain stability conditions with respect to perturbations, which are defined as distortions of \(\rho\), reaching general results whose particular cases are consistent with recent findings in the literature for CNNs and GNNs. We consider conditions on the domain of the homomorphisms in the algebra that lead to stable operators. Interestingly, we find that these conditions are related to the uniform boundedness of the Fréchet derivative of a function \(p:\text{End}(\mathcal{M})\rightarrow\text{End}(\mathcal{M})\) that maps the images of the generators of \(\mathcal{A}\) on \(\text{End}(\mathcal{M})\) into a power series representation that defines the filtering of elements in \(\mathcal{M}\). Additionally, our results show that stability is universal to convolutional architectures whose algebraic signal model uses the same algebra.

Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

Zhe Lin , Sharad Sinha , Wei Zhang

Comments: appear as a conference paper in FCCM 2019

Subjects

:

Machine Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC)

Decision trees are machine learning models commonly used in various

application scenarios. In the era of big data, traditional decision tree

induction algorithms are not suitable for learning large-scale datasets due to

their stringent data storage requirement. Online decision tree learning

algorithms have been devised to tackle this problem by concurrently training

with incoming samples and providing inference results. However, even the most

up-to-date online tree learning algorithms still suffer from either high memory

usage or high computational intensity with dependency and long latency, making

them challenging to implement in hardware. To overcome these difficulties, we

introduce a new quantile-based algorithm to improve the induction of the

Hoeffding tree, one of the state-of-the-art online learning models. The

proposed algorithm is light-weight in terms of both memory and computational

demand, while still maintaining high generalization ability. A series of

optimization techniques dedicated to the proposed algorithm have been

investigated from the hardware perspective, including coarse-grained and

fine-grained parallelism, dynamic and memory-based resource sharing, pipelining

with data forwarding. We further present a high-performance, hardware-efficient

and scalable online decision tree learning system on a field-programmable gate

array (FPGA) with system-level optimization techniques. Experimental results

show that our proposed algorithm outperforms the state-of-the-art Hoeffding

tree learning method, leading to 0.05% to 12.3% improvement in inference

accuracy. Real implementation of the complete learning system on the FPGA

demonstrates a 384x to 1581x speedup in execution time over the

state-of-the-art design.
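The Hoeffding tree at the core of this work splits a node only when the Hoeffding bound guarantees that the best attribute is truly the best. A minimal sketch of that split test follows; the paper's quantile-based summaries and FPGA optimizations are not reproduced.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the empirical mean of n
    observations with values in a range R is within eps of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float = 1.0, delta: float = 1e-7,
                 n: int = 1000) -> bool:
    """Split when the observed gain difference exceeds the bound, so the
    best attribute is the true best with probability at least 1 - delta."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

print(should_split(0.30, 0.25, n=1000))   # gap 0.05 vs eps ~ 0.090 -> False
print(should_split(0.30, 0.25, n=10000))  # eps ~ 0.028 -> True
```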

It's Hard for Neural Networks To Learn the Game of Life

Jacob M. Springer , Garrett T. Kenyon

Comments: 12 pages, 6 figures

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

Efforts to improve the learning abilities of neural networks have focused

mostly on the role of optimization methods rather than on weight

initializations. Recent findings, however, suggest that neural networks rely on

lucky random initial weights of subnetworks called “lottery tickets” that

converge quickly to a solution. To investigate how weight initializations

affect performance, we examine small convolutional networks that are trained to

predict n steps of the two-dimensional cellular automaton Conway’s Game of

Life, the update rules of which can be implemented efficiently in a 2n+1 layer

convolutional network. We find that networks of this architecture trained on

this task rarely converge. Rather, networks require substantially more

parameters to consistently converge. In addition, near-minimal architectures

are sensitive to tiny changes in parameters: changing the sign of a single

weight can cause the network to fail to learn. Finally, we observe a critical

value d_0 such that training minimal networks with examples in which cells are

alive with probability d_0 dramatically increases the chance of convergence to

a solution. We conclude that training convolutional neural networks to learn

the input/output function represented by n steps of Game of Life exhibits many

characteristics predicted by the lottery ticket hypothesis, namely, that the

size of the networks required to learn this function are often significantly

larger than the minimal network required to implement the function.
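For reference, the target function the networks are trained to learn, one Game of Life update, can itself be written as a small convolution, as the sketch below shows; this is the ground-truth rule, not the authors' learned model.

```python
import numpy as np
from scipy.signal import convolve2d

def life_step(board: np.ndarray) -> np.ndarray:
    """One Game of Life update via 2-D convolution: count the 8 neighbours,
    then apply the birth/survival rule."""
    kernel = np.ones((3, 3))
    kernel[1, 1] = 0
    neighbours = convolve2d(board, kernel, mode="same", boundary="fill")
    return ((neighbours == 3) | ((board == 1) & (neighbours == 2))).astype(int)

glider = np.zeros((8, 8), dtype=int)
glider[[0, 1, 2, 2, 2], [1, 2, 0, 1, 2]] = 1
print(life_step(life_step(glider)))  # the glider after n = 2 steps
```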

A Partial Regularization Method for Network Compression

E Zhenqian , Gao Weiguo

Comments: arXiv admin note: substantial text overlap with arXiv:1912.05078

Subjects

:

Machine Learning (cs.LG)

; Machine Learning (stat.ML)

Deep Neural Networks have achieved remarkable success relying on the

developing availability of GPUs and large-scale datasets with increasing

network depth and width. However, due to the expensive computation and

intensive memory, researchers have concentrated on designing compression

methods in order to make them practical for constrained platforms. In this

paper, we propose an approach of partial regularization rather than the

original form of penalizing all parameters, which is said to be full

regularization, to conduct model compression at a higher speed. It is

reasonable and feasible according to the existence of the permutation invariant

property of neural networks. Experimental results show that, as expected, the computational complexity is reduced, with shorter running times observed in almost all settings, since the partial regularization method involves fewer elements in the penalty computation. Surprisingly, it also improves important metrics such as regression fitting results and classification accuracy in both the training and test phases on multiple datasets, indicating that the pruned models have better performance and generalization ability. Moreover, our analysis of the results suggests that an optimal network structure must exist and depends on the input data.
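A minimal PyTorch sketch of the idea, penalizing only a chosen subset of parameter tensors rather than all of them, is given below; which layers to penalize is an illustrative choice, not the authors' prescription.

```python
import torch
import torch.nn as nn

# Partial regularization sketch: apply the sparsity penalty to a subset of
# layers instead of every parameter. The layer choice here is illustrative.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
penalized = [model[0].weight]          # partial: first layer only
# penalized = list(model.parameters()) # full regularization, for comparison

def l1_penalty(tensors, lam=1e-4):
    """L1 penalty summed over only the selected tensors."""
    return lam * sum(t.abs().sum() for t in tensors)

x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y) + l1_penalty(penalized)
loss.backward()  # fewer penalized elements -> cheaper regularization step
```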

A Heaviside Function Approximation for Neural Network Binary Classification

Nathan Tsoi , Yofti Milkessa , Marynel Vázquez Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Neural network binary classifiers are often evaluated on metrics like

accuracy and (F_1)-Score, which are based on confusion matrix values (True

Positives, False Positives, False Negatives, and True Negatives). However,

these classifiers are commonly trained with a different loss, e.g. log loss.

While it is preferable to perform training on the same loss as the evaluation

metric, this is difficult in the case of confusion matrix based metrics because

set membership is a step function without a derivative useful for

backpropagation. To address this challenge, we propose an approximation of the

step function that adheres to the properties necessary for effective training

of binary networks using confusion matrix based metrics. This approach allows

for end-to-end training of binary deep neural classifiers via batch gradient

descent. We demonstrate the flexibility of this approach in several

applications with varying levels of class imbalance. We also demonstrate how

the approximation allows balancing between precision and recall in the

appropriate ratio for the task at hand.
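The abstract does not give the exact approximation, so the sketch below uses a steep sigmoid as a differentiable stand-in for the Heaviside step, from which a soft confusion matrix and a trainable F1-style loss follow.

```python
import torch

def soft_confusion(probs, labels, k=20.0):
    """Differentiable confusion-matrix entries: a steep sigmoid stands in
    for the Heaviside step at the 0.5 threshold (the paper's own
    approximation is not reproduced here)."""
    yhat = torch.sigmoid(k * (probs - 0.5))  # smooth "predicted positive"
    tp = (yhat * labels).sum()
    fp = (yhat * (1 - labels)).sum()
    fn = ((1 - yhat) * labels).sum()
    return tp, fp, fn

def soft_f1_loss(probs, labels):
    """1 - soft F1, usable as a batch-gradient-descent training loss."""
    tp, fp, fn = soft_confusion(probs, labels)
    return 1.0 - 2 * tp / (2 * tp + fp + fn + 1e-8)

probs = torch.tensor([0.9, 0.2, 0.7], requires_grad=True)
labels = torch.tensor([1.0, 0.0, 0.0])
loss = soft_f1_loss(probs, labels)
loss.backward()  # gradients flow through the smooth step, enabling training
```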

All Data Inclusive, Deep Learning Models to Predict Critical Events in the Medical Information Mart for Intensive Care III Database (MIMIC III)

Anubhav Reddy Nallabasannagari , Madhu Reddiboina , Ryan Seltzer , Trevor Zeffiro , Ajay Sharma , Mahendra Bhandari

Comments: 18 pages, 9 Figures

Subjects

:

Machine Learning (cs.LG)

Intensive care clinicians need reliable clinical practice tools to preempt

unexpected critical events that might harm their patients in intensive care

units (ICU), to pre-plan timely interventions, and to keep the patient’s family

well informed. The conventional statistical models are built by curating only a

limited number of key variables, which means a vast unknown amount of

potentially precious data remains unused. Deep learning models (DLMs) can be

leveraged to learn from large complex datasets and construct predictive

clinical tools. This retrospective study was performed using 42,818 hospital

admissions involving 35,348 patients, which is a subset of the MIMIC-III

dataset. Natural language processing (NLP) techniques were applied to build

DLMs to predict in-hospital mortality (IHM) and length of stay >=7 days (LOS).

Over 75 million events across multiple data sources were processed, resulting

in over 355 million tokens. DLMs for predicting IHM using data from all sources

(AS) and chart data (CS) achieved an AUC-ROC of 0.9178 and 0.9029,

respectively, and PR-AUC of 0.6251 and 0.5701, respectively. DLMs for

predicting LOS using AS and CS achieved an AUC-ROC of 0.8806 and 0.8642,

respectively, and PR-AUC of 0.6821 and 0.6575, respectively. The observed

AUC-ROC difference between models was found to be significant for both IHM and

LOS at p=0.05. The observed PR-AUC difference between the models was found to

be significant for IHM and statistically insignificant for LOS at p=0.05. In

this study, deep learning models were constructed using data combined from a

variety of sources in Electronic Health Records (EHRs) such as chart data,

input and output events, laboratory values, microbiology events, procedures,

notes, and prescriptions. It is possible to predict in-hospital mortality with

much better confidence and higher reliability from models built using all

sources of data.

Change Point Detection by Cross-Entropy Maximization

Aurélien Serre , Didier Chételat , Andrea Lodi

Comments: Preprint

Subjects

:

Machine Learning (cs.LG)

; Signal Processing (eess.SP); Machine Learning (stat.ML)

Many offline unsupervised change point detection algorithms rely on

minimizing a penalized sum of segment-wise costs. We extend this framework by

proposing to minimize a sum of discrepancies between segments. In particular,

we propose to select the change points so as to maximize the cross-entropy

between successive segments, balanced by a penalty for introducing new change

points. We propose a dynamic programming algorithm to solve this problem and

analyze its complexity. Experiments on two challenging datasets demonstrate the

advantages of our method compared to three state-of-the-art approaches.
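The penalized segmentation can be solved with a classic quadratic-time dynamic program. The sketch below uses a simple Gaussian segment cost as a stand-in for the paper's cross-entropy objective.

```python
import numpy as np

def segment_cost(x, s, t):
    """Cost of segment x[s:t]: negative Gaussian log-likelihood up to
    constants, a simple stand-in for the paper's cross-entropy terms."""
    seg = x[s:t]
    return len(seg) * np.log(seg.var() + 1e-8)

def penalized_changepoints(x, penalty=10.0):
    """Classic O(T^2) dynamic program: opt[t] is the best penalized cost of
    segmenting x[:t]; backtracking through prev recovers the change points."""
    T = len(x)
    opt = np.full(T + 1, np.inf)
    opt[0] = 0.0
    prev = np.zeros(T + 1, dtype=int)
    for t in range(2, T + 1):
        for s in range(0, t - 1):  # segments of length >= 2
            c = opt[s] + segment_cost(x, s, t) + penalty
            if c < opt[t]:
                opt[t], prev[t] = c, s
    cps, t = [], T
    while t > 0:
        t = prev[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(0, 3, 100)])
print(penalized_changepoints(x))  # typically recovers a change point near 100
```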

An Internal Cluster Validity Index Based on Distance-based Separability Measure

Shuyue Guan , Murray Loew

Comments: 8 pages, 4 figures. Accepted by ICTAI 2020

Subjects

:

Machine Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Evaluating clustering results is a significant part of cluster analysis. Since clustering is a typical unsupervised learning task, true class labels are usually unavailable, so a number of internal evaluations, which use only predicted labels and data, have been created; these are called internal cluster validity indices (CVIs). Designing an effective CVI without true labels is not simple, because it is akin to designing a clustering method itself. Having more CVIs is also crucial, because there is no universal CVI that can measure all datasets and no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying multiple CVIs to evaluate clustering results is necessary. In this paper, we propose a novel CVI, called the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight other internal CVIs, ranging from early studies such as Dunn (1974) to the most recent, CVDD (2019). Using an external CVI as ground truth, we evaluated the clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. The results show that DSI is an effective, unique, and competitive CVI relative to the other compared CVIs. In addition, we summarize the general process for evaluating CVIs and introduce a new method, rank difference, to compare the results of CVIs.
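The paper's exact DSI formula is not given in the abstract; the sketch below computes a generic distance-based separability score in the same spirit, contrasting within-cluster and between-cluster nearest-neighbour distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

def separability_score(X: np.ndarray, labels: np.ndarray) -> float:
    """A generic distance-based separability score: mean nearest-neighbour
    distance to other clusters divided by that within clusters. Higher
    values mean better-separated clusters. This is an illustration in the
    spirit of the abstract, not the paper's exact DSI."""
    intra, inter = [], []
    for c in np.unique(labels):
        A, B = X[labels == c], X[labels != c]
        d_in = cdist(A, A)
        np.fill_diagonal(d_in, np.inf)  # ignore self-distances
        intra.append(d_in.min(axis=1).mean())
        inter.append(cdist(A, B).min(axis=1).mean())
    return float(np.mean(inter) / np.mean(intra))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels = np.repeat([0, 1], 50)
print(separability_score(X, labels))  # well-separated clusters score > 1
```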

Understanding the wiring evolution in differentiable neural architecture search

Sirui Xie , Shoukang Hu , Xinjiang Wang , Chunxiao Liu , Jianping Shi , Xunying Liu , Dahua Lin Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)

Controversy exists on whether differentiable neural architecture search

methods discover wiring topology effectively. To understand how wiring topology

evolves, we study the underlying mechanism of several existing differentiable

NAS frameworks. Our investigation is motivated by three observed searching

patterns of differentiable NAS: 1) they search by growing instead of pruning;

2) wider networks are more preferred than deeper ones; 3) no edges are selected

in bi-level optimization. To anatomize these phenomena, we propose a unified

view on searching algorithms of existing frameworks, transferring the global

optimization to local cost minimization. Based on this reformulation, we

conduct empirical and theoretical analyses, revealing implicit inductive biases

in the cost’s assignment mechanism and evolution dynamics that cause the

observed phenomena. These biases indicate strong discrimination towards certain

topologies. To this end, we pose questions that future differentiable methods

for neural wiring discovery need to confront, hoping to evoke a discussion and

rethinking on how much bias has been enforced implicitly in existing NAS

methods.

Knowing What to Listen to: Early Attention for Deep Speech Representation Learning

Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Deep learning techniques have considerably improved speech processing in

recent years. Speech representations extracted by deep learning models are

being used in a wide range of tasks such as speech recognition, speaker

recognition, and speech emotion recognition. Attention models play an important

role in improving deep learning models. However, current attention mechanisms

are unable to attend to fine-grained information items. In this paper we

propose the novel Fine-grained Early Frequency Attention (FEFA) for speech

signals. This model is capable of focusing on information items as small as

frequency bins. We evaluate the proposed model on two popular tasks of speaker

recognition and speech emotion recognition. Two widely used public datasets,

VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on

top of several prominent deep models as backbone networks to evaluate its

impact on performance compared to the original networks and other related work.

Our experiments show that by adding FEFA to different CNN architectures,

performance is consistently improved by substantial margins, even setting a new

state-of-the-art for the speaker recognition task. We also tested our model

against different levels of added noise showing improvements in robustness and

less sensitivity compared to the backbone networks.
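The architectural details of FEFA are not in the abstract; the sketch below illustrates the general idea of an early, learnable gate over individual frequency bins of a spectrogram (the real FEFA module may differ substantially).

```python
import torch
import torch.nn as nn

class FrequencyBinAttention(nn.Module):
    """A generic early attention gate over spectrogram frequency bins,
    sketched from the abstract's description; not the actual FEFA module."""
    def __init__(self, n_bins: int):
        super().__init__()
        self.score = nn.Linear(n_bins, n_bins)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, freq_bins); weight each bin in each frame.
        weights = torch.sigmoid(self.score(spec))
        return spec * weights  # re-weighted spectrogram fed to the backbone

spec = torch.randn(4, 100, 80)            # batch of 80-bin log-mel frames
attended = FrequencyBinAttention(80)(spec)
print(attended.shape)                      # torch.Size([4, 100, 80])
```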

CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking

Dimitris Perdios , Manuel Vonlanthen , Florian Martinez , Marcel Arditi , Jean-Philippe Thiran

Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication

Subjects

:

Image and Video Processing (eess.IV)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Thanks to its capability of acquiring full-view frames at multiple kilohertz,

ultrafast ultrasound imaging unlocked the analysis of rapidly changing physical

phenomena in the human body, with pioneering applications such as

ultrasensitive flow imaging in the cardiovascular system or shear-wave

elastography. The accuracy achievable with these motion estimation techniques

is strongly contingent upon two contradictory requirements: a high quality of

consecutive frames and a high frame rate. Indeed, the image quality can usually

be improved by increasing the number of steered ultrafast acquisitions, but at

the expense of a reduced frame rate and possible motion artifacts. To achieve

accurate motion estimation at uncompromised frame rates and immune to motion

artifacts, the proposed approach relies on single ultrafast acquisitions to

reconstruct high-quality frames and on only two consecutive frames to obtain

2-D displacement estimates. To this end, we deployed a convolutional neural

network-based image reconstruction method combined with a speckle tracking

algorithm based on cross-correlation. Numerical and in vivo experiments,

conducted in the context of plane-wave imaging, demonstrate that the proposed

approach is capable of estimating displacements in regions where the presence

of side lobe and grating lobe artifacts prevents any displacement estimation

with a state-of-the-art technique that relies on conventional delay-and-sum

beamforming. The proposed approach may therefore unlock the full potential of

ultrafast ultrasound, in applications such as ultrasensitive cardiovascular

motion and flow analysis or shear-wave elastography.


Ramifications of Approximate Posterior Inference for Bayesian Deep Learning in Adversarial and Out-of-Distribution Settings

John Mitros , Arjun Pakrashi , Brian Mac Namee

Comments: ARRW@ECCV2020

Subjects

:

Machine Learning (stat.ML)

; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Deep neural networks have been successful in diverse discriminative classification tasks, although they are poorly calibrated, often assigning high probability to misclassified predictions. This undermines the trustworthiness and accountability of the models when they are deployed in real applications, where predictions are evaluated based on their confidence scores.

Existing solutions suggest the benefits attained by combining deep neural

networks and Bayesian inference to quantify uncertainty over the models’

predictions for ambiguous datapoints. In this work we propose to validate and

test the efficacy of likelihood based models in the task of out of distribution

detection (OoD). Across different datasets and metrics we show that Bayesian

deep learning models on certain occasions marginally outperform conventional

neural networks and in the event of minimal overlap between in/out distribution

classes, even the best models exhibit a reduction in AUC scores in detecting

OoD data. Preliminary investigations indicate the potential inherent role of

bias due to choices of initialisation, architecture or activation functions. We

hypothesise that the sensitivity of neural networks to unseen inputs could be a

multi-factor phenomenon arising from the different architectural design choices

often amplified by the curse of dimensionality. Furthermore, we perform a study

to find the effect of the adversarial noise resistance methods on in and

out-of-distribution performance, as well as, also investigate adversarial noise

robustness of Bayesian deep learners.

Action and Perception as Divergence Minimization

Danijar Hafner , Pedro A. Ortega , Jimmy Ba , Thomas Parr , Karl Friston , Nicolas Heess

Comments: 13 pages, 10 figures

Subjects

:

Artificial Intelligence (cs.AI)

; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a unified objective for action and perception of intelligent

agents. Extending representation learning and control, we minimize the joint

divergence between the world and a target distribution. Intuitively, such

agents use perception to align their beliefs with the world, and use actions to

align the world with their beliefs. Minimizing the joint divergence to an

expressive target maximizes the mutual information between the agent’s

representations and inputs, thus inferring representations that are informative

of past inputs and exploring future inputs that are informative of the

representations. This lets us derive intrinsic objectives, such as

representation learning, information gain, empowerment, and skill discovery

from minimal assumptions. Moreover, interpreting the target distribution as a

latent variable model suggests expressive world models as a path toward highly

adaptive agents that seek large niches in their environments, while rendering

task rewards optional. The presented framework provides a common language for

comparing a wide range of objectives, facilitates understanding of latent

variables for decision making, and offers a recipe for designing novel

objectives. We recommend deriving future agent objectives from the joint

divergence to facilitate comparison, to point out the agent’s target

distribution, and to identify the intrinsic objective terms needed to reach

that distribution.

Private Weighted Random Walk Stochastic Gradient Descent

Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)

We consider a decentralized learning setting in which data is distributed

over nodes in a graph. The goal is to learn a global model on the distributed

data without involving any central entity that needs to be trusted. While

gossip-based stochastic gradient descent (SGD) can be used to achieve this

learning objective, it incurs high communication and computation costs, since

it has to wait for all the local models at all the nodes to converge. To speed

up the convergence, we propose instead to study random walk based SGD in which

a global model is updated based on a random walk on the graph. We propose two

algorithms based on two types of random walks that achieve, in a decentralized

way, uniform sampling and importance sampling of the data. We provide a

non-asymptotic analysis on the rate of convergence, taking into account the

constants related to the data and the graph. Our numerical results show that

the weighted random walk based algorithm has a better performance for

high-variance data. Moreover, we propose a privacy-preserving random walk

algorithm that achieves local differential privacy based on a Gamma noise

mechanism that we propose. We also give numerical results on the convergence of

this algorithm and show that it outperforms additive Laplace-based privacy

mechanisms.
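A minimal sketch of the uniform-sampling variant follows: a Metropolis-Hastings walk makes the visit distribution uniform over nodes, and each visited node contributes one SGD step. The importance-sampling and Gamma-noise privacy mechanisms of the paper are omitted.

```python
import random

def metropolis_hastings_step(node, neighbors):
    """Move so the walk's stationary distribution is uniform over nodes:
    accept a proposed neighbour with probability deg(node)/deg(proposal)."""
    proposal = random.choice(neighbors[node])
    accept = min(1.0, len(neighbors[node]) / len(neighbors[proposal]))
    return proposal if random.random() < accept else node

def random_walk_sgd(neighbors, data, steps=1000, lr=0.1):
    """SGD where each update uses the datum held by the walk's current node;
    a sketch of the uniform-sampling variant, with no privacy mechanism."""
    w, node = 0.0, 0
    for _ in range(steps):
        node = metropolis_hastings_step(node, neighbors)
        x = data[node]
        w -= lr * 2 * (w - x)  # gradient of (w - x)^2: fits the global mean
    return w

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
data = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0}
print(random_walk_sgd(neighbors, data))  # hovers near the global mean, 4.0
```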

Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning

Stefan C Schonsheck

Comments: PhD Thesis. Versions of several chapters have previously appeared or been submitted under different titles

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG); Numerical Analysis (math.NA)

Leo Tolstoy opened his monumental novel Anna Karenina with the now famous words: “Happy families are all alike; every unhappy family is unhappy in its own way.” A similar notion also applies to mathematical spaces: every flat space is alike; every unflat space is unflat in its own way. However, rather than being

a source of unhappiness, we will show that the diversity of non-flat spaces

provides a rich area of study. The genesis of the so-called big data era and

the proliferation of social and scientific databases of increasing size has led

to a need for algorithms that can efficiently process, analyze, and even generate high-dimensional data. However, the curse of dimensionality leads to

the fact that many classical approaches do not scale well with respect to the

size of these problems. One technique to avoid some of these ill-effects is to

exploit the geometric structure of coherent data. In this thesis, we will

explore geometric methods for shape processing and data analysis. More

specifically, we will study techniques for representing manifolds and signals

supported on them through a variety of mathematical tools including, but not

limited to, computational differential geometry, variational PDE modeling, and

deep learning. First, we will explore non-isometric shape matching through

variational modeling. Next, we will use ideas from parallel transport on

manifolds to generalize convolution and convolutional neural networks to

deformable manifolds. Finally, we conclude by proposing a novel auto-regressive

model for capturing the intrinsic geometry and topology of data. Throughout

this work, we will use the idea of computing correspondences as a through-line to both motivate our work and analyze our results.

Quantum Long Short-Term Memory

Samuel Yen-Chi Chen , Shinjae Yoo , Yao-Lung L. Fang Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG)

Long short-term memory (LSTM) is a kind of recurrent neural networks (RNN)

for sequence and temporal dependency data modeling and its effectiveness has

been extensively established. In this work, we propose a hybrid

quantum-classical model of LSTM, which we dub QLSTM. We demonstrate that the

proposed model successfully learns several kinds of temporal data. In

particular, we show that for certain testing cases, this quantum version of

LSTM converges faster, or equivalently, reaches a better accuracy, than its

classical counterpart. Due to the variational nature of our approach, the

requirements on qubit counts and circuit depth are eased, and our work thus

paves the way toward implementing machine learning algorithms for sequence

modeling on noisy intermediate-scale quantum (NISQ) devices.

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

High-fidelity singing voices usually require a higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, a higher sampling rate entails a wider frequency band and longer waveform sequences, posing challenges for singing voice synthesis (SVS) in both the frequency and time domains. Conventional SVS systems that adopt a small sampling rate cannot adequately address these challenges. In this paper, we develop HiFiSinger, an SVS system towards

high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic

model and a Parallel WaveGAN based vocoder to ensure fast training and

inference and also high voice quality. To tackle the difficulty of singing

modeling caused by high sampling rate (wider frequency band and longer

waveform), we introduce multi-scale adversarial training in both the acoustic

model and vocoder to improve singing modeling. Specifically, 1) To handle the

larger range of frequencies caused by higher sampling rate, we propose a novel

sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full

80-dimensional mel-frequency into multiple sub-bands and models each sub-band

with a separate discriminator. 2) To model longer waveform sequences caused by

higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform

generation to model different lengths of waveform sequences with separate

discriminators. 3) We also introduce several additional designs and findings in

HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)

and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate

window/hop size for mel-spectrogram, and increasing the receptive field in

vocoder for long vowel modeling. Experiment results show that HiFiSinger

synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44

MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.

Distributed Online Optimization via Gradient Tracking with Adaptive Momentum

Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

This paper deals with a network of computing agents aiming to solve an online

optimization problem in a distributed fashion, i.e., by means of local

computation and communication, without any central coordinator. We propose the

gradient tracking with adaptive momentum estimation (GTAdam) distributed

algorithm, which combines a gradient tracking mechanism with first and second

order momentum estimates of the gradient. The algorithm is analyzed in the

online setting for strongly convex and smooth cost functions. We prove that the

average dynamic regret is bounded and that the convergence rate is linear. The

algorithm is tested on a time-varying classification problem, on a (moving)

target localization problem and in a stochastic optimization setup from image

classification. In these numerical experiments from multi-agent learning,

GTAdam outperforms state-of-the-art distributed optimization methods.
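A sketch of plain distributed gradient tracking, the mechanism GTAdam builds on, is shown below on a toy quadratic problem; the adaptive first- and second-order momentum estimates that distinguish GTAdam are omitted.

```python
import numpy as np

# Plain distributed gradient tracking on a toy problem (GTAdam's momentum
# estimates are omitted): agents mix with neighbours via a doubly stochastic
# matrix W while a tracker y follows the network-wide average gradient.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])          # doubly stochastic mixing matrix
targets = np.array([1.0, 2.0, 6.0])         # f_i(x) = 0.5 * (x - target_i)^2
grad = lambda x: x - targets                 # per-agent local gradients

x = np.zeros(3)
y = grad(x)                                  # trackers start at local gradients
alpha = 0.1
for _ in range(200):
    x_new = W @ x - alpha * y
    y = W @ y + grad(x_new) - grad(x)        # track the average gradient
    x = x_new

print(x)  # all agents approach the minimizer of the sum, (1+2+6)/3 = 3.0
```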

Online Community Detection for Event Streams on Networks

Guanhua Fang , Owen G. Ward , Tian Zheng

Comments: 38 pages

Subjects

:

Social and Information Networks (cs.SI)

; Machine Learning (cs.LG); Machine Learning (stat.ML)

A common goal in network modeling is to uncover the latent community

structure present among nodes. For many real-world networks, observed

connections consist of events arriving as streams, which are then aggregated to

form edges, ignoring the temporal dynamic component. A natural way to take

account of this temporal dynamic component of interactions is to use point

processes as the foundation of the network models for community detection.

Computational complexity hampers the scalability of such approaches to large

sparse networks. To circumvent this challenge, we propose a fast online

variational inference algorithm for learning the community structure underlying

dynamic event arrivals on a network using continuous-time point process latent

network models. We provide regret bounds on the loss function of this

procedure, giving theoretical guarantees on performance. Using both simulation studies and real data, we illustrate that the proposed algorithm achieves community recovery comparable to non-online variants. Our proposed framework can also be readily modified to

incorporate other popular network structures.

Bayesian Perceptron: Towards fully Bayesian Neural Networks

Marco F. Huber

Comments: Accepted for publication at the 59th IEEE Conference on Decision and Control (CDC) 2020

Subjects

:

Machine Learning (stat.ML)

; Machine Learning (cs.LG)

Artificial neural networks (NNs) have become the de facto standard in machine

learning. They allow learning highly nonlinear transformations in a plethora of

applications. However, NNs usually only provide point estimates without

systematically quantifying corresponding uncertainties. In this paper a novel

approach towards fully Bayesian NNs is proposed, where training and predictions

of a perceptron are performed within the Bayesian inference framework in

closed-form. The weights and the predictions of the perceptron are considered

Gaussian random variables. Analytical expressions for predicting the

perceptron’s output and for learning the weights are provided for commonly used

activation functions like sigmoid or ReLU. This approach requires no

computationally expensive gradient calculations and further allows sequential

learning.
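One classic closed-form tool for such Gaussian treatments is the probit approximation to the expected sigmoid of a Gaussian, sketched below; this is a standard identity, not necessarily the paper's exact expressions.

```python
import numpy as np

def expected_sigmoid(mu: float, var: float) -> float:
    """Approximate E[sigmoid(a)] for a ~ N(mu, var) via the classic probit
    approximation sigmoid(mu / sqrt(1 + pi * var / 8)); a standard identity
    for Gaussian treatments of perceptrons, not the paper's own formulas."""
    return 1.0 / (1.0 + np.exp(-mu / np.sqrt(1.0 + np.pi * var / 8.0)))

# Sanity check against a Monte Carlo estimate.
mu, var = 0.5, 2.0
rng = np.random.default_rng(0)
samples = rng.normal(mu, np.sqrt(var), 100_000)
mc = (1.0 / (1.0 + np.exp(-samples))).mean()
print(expected_sigmoid(mu, var), mc)  # the two estimates should agree closely
```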

On the study of the Beran estimator for generalized censoring indicators

Mikael Escobar-Bach , Olivier Goudet Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

Along with the analysis of time-to-event data, it is common to assume that

only partial information is given at hand. In the presence of right-censored

data with covariates, the conditional Kaplan-Meier estimator (also referred as

the Beran estimator) is known to propose a consistent estimate for the

lifetimes conditional survival function. However, a necessary condition is the

clear knowledge of whether each individual is censored or not, although, this

information might be incomplete or even totally absent in practice. We thus

propose a study on the Beran estimator when the censoring indicator is not

clearly specified. From this, we provide a new estimator for the conditional

survival function and establish its asymptotic normality under mild conditions.

We further study the supervised learning problem where the conditional survival

function is to be predicted with no censorship indicators. To this aim, we

investigate various approaches estimating the conditional expectation for the

censoring indicator. Along with the theoretical results, we illustrate how the

estimators work for small samples by means of a simulation study and show their

practical applicability with the analysis of synthetic data and the study of

real data for the prognosis of monoclonal gammopathy.

Simulation of an Elevator Group Control Using Generative Adversarial Networks and Related AI Tools

Tom Peetz , Sebastian Vogt , Martin Zaefferer , Thomas Bartz-Beielstein Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

Testing new, innovative technologies is a crucial task for safety and

acceptance. But how can new systems be tested if no historical real-world data

exist? Simulation provides an answer to this important question. Classical

simulation tools such as event-based simulation are well accepted. But most of

these established simulation models require the specification of many

parameters. Furthermore, simulation runs, e.g., CFD simulations, are very time

consuming. Generative Adversarial Networks (GANs) are powerful tools for

generating new data for a variety of tasks. Currently, their most frequent

application domain is image generation. This article investigates the

applicability of GANs for imitating simulations. We are comparing the

simulation output of a technical system with the output of a GAN. To exemplify

this approach, a well-known multi-car elevator system simulator was chosen. Our

study demonstrates the feasibility of this approach. It also discusses pitfalls

and technical problems that occurred during the implementation. Although we

were able to show that in principle, GANs can be used as substitutes for

expensive simulation runs, we also show that they cannot be used “out of the

box”. Fine tuning is needed. We present a proof-of-concept, which can serve as

a starting point for further research.

Quasi-symplectic Langevin Variational Autoencoder

Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The variational autoencoder (VAE), one of the most thoroughly investigated generative models, is very popular in current neural learning research. Leveraging VAEs in practical tasks with high dimensions and huge datasets often runs into the problem of constructing low-variance evidence lower bounds. Markov chain Monte Carlo (MCMC) is an effective approach for tightening the evidence lower bound (ELBO) when approximating the posterior distribution. The Hamiltonian Variational Autoencoder (HVAE) is an effective MCMC-inspired approach for constructing an unbiased, low-variance ELBO that is also amenable to the reparameterization trick. It significantly improves the effectiveness of posterior estimation; yet a main drawback of HVAE is that the leapfrog method needs to access the posterior gradient twice, which leads to poor inference efficiency and a fairly large GPU memory requirement. This flaw limits the application of Hamiltonian-based inference frameworks to large-scale network inference. To tackle this problem, we propose a Quasi-symplectic Langevin Variational Autoencoder (Langevin-VAE), which significantly improves resource usage efficiency. We qualitatively and quantitatively demonstrate the effectiveness of the Langevin-VAE compared to state-of-the-art gradient-informed inference frameworks.

Learning Unknown Physics of non-Newtonian Fluids

Brandon Reyes , Amanda A. Howard , Paris Perdikaris , Alexandre M. Tartakovsky Subjects : Computational Physics (physics.comp-ph) ; Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)

We extend the physics-informed neural network (PINN) method to learn

viscosity models of two non-Newtonian systems (polymer melts and suspensions of

particles) using only velocity measurements. The PINN-inferred viscosity models

agree with the empirical models for shear rates with large absolute values but

deviate for shear rates near zero where the analytical models have an

unphysical singularity. Once a viscosity model is learned, we use the PINN

method to solve the momentum conservation equation for non-Newtonian fluid flow

using only the boundary conditions.

A free web service for fast COVID-19 classification of chest X-Ray images

Jose David Bermudez Castro , Ricardo Rei , Jose E. Ruiz , Pedro Achanccaray Diaz , Smith Arauco Canchumuni , Cristian Muñoz Villalobos , Felipe Borges Coelho , Leonardo Forero Mendoza , Marco Aurelio C. Pacheco

Comments: 14 pages, 12 figures

Subjects

:

Image and Video Processing (eess.IV)

; Machine Learning (cs.LG)

The coronavirus outbreak became a major concern for society worldwide.

Technological innovation and ingenuity are essential to fight COVID-19 pandemic

and bring us one step closer to overcome it. Researchers over the world are

working actively to find available alternatives in different fields, such as

the healthcare system, pharmaceutics, and health prevention, among others. With the rise of artificial intelligence (AI) in the last 10 years, AI-based applications have become the prevalent solution in many areas because of their greater capability, and they are now being adopted to help combat COVID-19. This work provides a fast detection system for COVID-19 characteristics in X-Ray images based on deep learning (DL) techniques. The system is available as a free web-deployed service for fast patient classification, alleviating the high demand for standard COVID-19 diagnosis methods. It consists of two

deep learning models, one to differentiate between X-Ray and non-X-Ray images

based on Mobile-Net architecture, and another one to identify chest X-Ray

images with characteristics of COVID-19 based on the DenseNet architecture. For

real-time inference, a pair of dedicated GPUs is provided, which reduces the computational time. The whole system can filter out non-chest X-Ray images, and

detect whether the X-Ray presents characteristics of COVID-19, highlighting the

most sensitive regions.

SRQA: Synthetic Reader for Factoid Question Answering

Jiuniu Wang , Wenjia Xu , Xingyu Fu , Yang Wei , Li Jin , Ziyan Chen , Guangluan Xu , Yirong Wu

Comments: arXiv admin note: text overlap with arXiv:1809.00676

Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415

Subjects

:

Computation and Language (cs.CL)

; Machine Learning (cs.LG)

The question answering system can answer questions from various fields and

forms with deep neural networks, but it still lacks effective methods when facing multiple evidences. We introduce a new model called SRQA, which stands for Synthetic

Reader for Factoid Question Answering. This model enhances the question

answering system in the multi-document scenario from three aspects: model

structure, optimization goal, and training method, corresponding to Multilayer

Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)

respectively. First, we propose a multilayer attention network to obtain a

better representation of the evidences. The multilayer attention mechanism

conducts interaction between the question and the passage within each layer,

making the token representations of the evidence in each layer take the requirements of the question into account. Second, we design a cross evidence strategy to choose the answer span across multiple evidences. We improve the

optimization goal, considering all the answers’ locations in multiple evidences

as training targets, which leads the model to reason among multiple evidences.

Third, adversarial training is employed to high-level variables besides the

word embedding in our model. A new normalization method is also proposed for

adversarial perturbations so that we can jointly add perturbations to several

target variables. As an effective regularization method, adversarial training

enhances the model’s ability to process noisy data. Combining these three

strategies, we enhance the contextual representation and locating ability of

our model, which could synthetically extract the answer span from several

evidences. We perform SRQA on the WebQA dataset, and experiments show that our

model outperforms the state-of-the-art models (the best fuzzy score of our

model is up to 78.56%, with an improvement of about 2%).

Multimodal brain tumor classification

Marvin Lerousseau , Eric Deutsh , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cancer is a complex disease that provides various types of information

depending on the scale of observation. While most tumor diagnostics are

performed by observing histopathological slides, radiology images should yield

additional knowledge towards the efficacy of cancer diagnostics. This work

investigates a deep learning method combining whole slide images and magnetic

resonance images to classify tumors. Experiments are prospectively conducted on

the 2020 Computational Precision Medicine challenge, in a 3-classes unbalanced

classification task. We report cross-validation (resp. validation)

balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and

0.94). The complete code of the method is open-source at XXXX. It includes histopathological data pre-processing and can therefore be used off-the-shelf for other histopathological and/or radiological classification tasks.

Large Dimensional Analysis and Improvement of Multi Task Learning

Malik Tiomoko , Romain Couillet , Hafiz Tiomoko Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

Multi Task Learning (MTL) efficiently leverages useful information contained

in multiple related tasks to help improve the generalization performance of all

tasks. This article conducts a large dimensional analysis of a Least Square Support Vector Machine (LSSVM) version of MTL that is simple but, as we shall see, extremely powerful when carefully tuned, in the regime where the dimension (p) of the data and their number (n) grow large at the same rate.

Under mild assumptions on the input data, the theoretical analysis of the

MTL-LSSVM algorithm first reveals the “sufficient statistics” exploited by the

algorithm and their interaction at work. These results demonstrate, as a

striking consequence, that the standard approach to MTL-LSSVM is largely suboptimal and can lead to severe negative transfer, but that these

impairments are easily corrected. These corrections are turned into an improved

MTL-LSSVM algorithm which can only benefit from additional data, and the

theoretical performance of which is also analyzed.

As evidenced and theoretically sustained in numerous recent works, these

large dimensional results are robust to broad ranges of data distributions,

which our present experiments corroborate. Specifically, the article reports a

systematically close behavior between theoretical and empirical performances on

popular datasets, which is strongly suggestive of the applicability of the

proposed carefully tuned MTL-LSSVM method to real data. This fine-tuning is

fully based on the theoretical analysis and does not in particular require any

cross validation procedure. Besides, the reported performances on real datasets

almost systematically outperform much more elaborate and less intuitive

state-of-the-art multi-task and transfer learning methods.

Auto-Classifier: A Robust Defect Detector Based on an AutoML Head

Vasco Lopes , Luís A. Alexandre

Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The dominant approach for surface defect detection is the use of hand-crafted, feature-based methods. However, this approach falls short when varying conditions affect the extracted images. So, in this paper, we sought to determine how well

several state-of-the-art Convolutional Neural Networks perform in the task of

surface defect detection. Moreover, we propose two methods: CNN-Fusion, that

fuses the prediction of all the networks into a final one, and Auto-Classifier,

which is a novel proposal that improves a Convolutional Neural Network by

modifying its classification component using AutoML. We carried out experiments

to evaluate the proposed methods in the task of surface defect detection using

different datasets from DAGM2007. We show that the use of Convolutional Neural

Networks achieves better results than traditional methods, and also, that

Auto-Classifier out-performs all other methods, by achieving 100% accuracy and

100% AUC results throughout all the datasets.

Automated identification of metamorphic test scenarios for an ocean-modeling application

Dilip J. Hiremath , Martin Claus , Wilhelm Hasselbring , Willi Rath

Comments: Short paper: 2 pages, 2020 IEEE International Conference On Artificial Intelligence Testing (AITest)

Subjects

:

Software Engineering (cs.SE)

; Machine Learning (cs.LG)

Metamorphic testing seeks to validate software in the absence of test

oracles. Our application domain is ocean modeling, where test oracles often do

not exist, but where symmetries of the simulated physical systems are known. In

this short paper we present work in progress for automated generation of

metamorphic test scenarios using machine learning. Metamorphic testing may be

expressed as f(g(X))=h(f(X)) with f being the application under test, with

input data X, and with the metamorphic relation (g, h). Automatically generated

metamorphic relations can be used for constructing regression tests, and for

comparing different versions of the same software application. Here, we

restrict to h being the identity map. Then, the task of constructing tests

means finding different g which we tackle using machine learning algorithms.

These algorithms typically minimize a cost function. As one possible g is

already known to be the identity map, for finding a second possible g, we

construct the cost function to minimize for g being a metamorphic relation and

to penalize for g being the identity map. After identifying the first

metamorphic relation, the procedure is repeated with a cost function rewarding

g that are orthogonal to previously found metamorphic relations. For

experimental evaluation, two implementations of an ocean-modeling application

will be subjected to the proposed method with the objective of presenting the

use of metamorphic relations to test the implementations of the applications.
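A concrete instance of the f(g(X)) = h(f(X)) pattern with h the identity is sketched below, using a toy diffusion solver as the application under test and spatial mirroring as the symmetry g; the actual ocean-modeling setting is only emulated.

```python
import numpy as np

def diffusion_step(u: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Toy stand-in for the application under test f: one explicit
    finite-difference diffusion step with fixed boundary values."""
    v = u.copy()
    v[1:-1] = u[1:-1] + alpha * (u[2:] - 2 * u[1:-1] + u[:-2])
    return v

def test_mirror_symmetry():
    """Metamorphic relation with h = identity and g = spatial mirroring:
    diffusing a mirrored field equals mirroring the diffused field."""
    rng = np.random.default_rng(0)
    u = rng.random(64)
    g = lambda x: x[::-1]
    assert np.allclose(diffusion_step(g(u)), g(diffusion_step(u)))

test_mirror_symmetry()
print("metamorphic relation f(g(X)) == g(f(X)) holds")
```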

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a framework that allows one to certify the degree of fairness of a model

based on an interactive and privacy-preserving test. The framework verifies any

trained model, regardless of its training process and architecture. Thus, it

allows us to evaluate any deep learning model on multiple fairness definitions

empirically. We tackle two scenarios, where either the test data is privately

available only to the tester or is publicly known in advance, even to the model

creator. We investigate the soundness of the proposed approach using

theoretical analysis and present statistical guarantees for the interactive

test. Finally, we provide a cryptographic technique to automate fairness

testing and certified inference with only black-box access to the model at hand

while hiding the participants’ sensitive data.

End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence

Nicolas Skatchkovsky , Hyeryung Jang , Osvaldo Simeone

Comments: To be presented at Asilomar 2020

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

This paper introduces a novel “all-spike” low-power solution for remote

wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),

and Spiking Neural Networks (SNNs). In the proposed system, event-driven

neuromorphic sensors produce asynchronous time-encoded data streams that are

encoded by an SNN, whose output spiking signals are pulse modulated via IR and

transmitted over general frequency-selective channels, while the receiver’s

inputs are obtained via hard detection of the received signals and fed to an

SNN for classification. We introduce an end-to-end training procedure that

treats the cascade of encoder, channel, and decoder as a probabilistic

SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The

proposed system, termed NeuroJSCC, is compared to conventional synchronous

frame-based and uncoded transmissions in terms of latency and accuracy. The

experiments confirm that the proposed end-to-end neuromorphic edge architecture

provides a promising framework for efficient and low-latency remote sensing,

communication, and inference.

Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects

Steffen Herbold , Tobias Haar

Comments: under review

Subjects

:

Software Engineering (cs.SE)

; Machine Learning (cs.LG)

Machine learning is nowadays a standard technique for data analysis within

software applications. Software engineers need quality assurance techniques

that are suitable for these new kinds of systems. Within this article, we

discuss the question of whether standard software testing techniques that have been part of textbooks for decades are also useful for the testing of machine

learning software. Concretely, we try to determine generic smoke tests that can

be used to assert that basic functions can be executed without crashing. We

found that we can derive such tests using techniques similar to equivalence

classes and boundary value analysis. Moreover, we found that these concepts can

also be applied to hyperparameters, to further improve the quality of the smoke

tests. Even though our approach is almost trivial, we were able to find bugs in

all three machine learning libraries that we tested and severe bugs in two of

the three libraries. This demonstrates that common software testing techniques

are still valid in the age of machine learning and that they are suitable to

find and prevent severe bugs, even in mature machine learning libraries.
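The sketch below illustrates such boundary-value smoke tests, with scikit-learn used purely as a convenient stand-in for a library under test; the authors' actual test generators are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Boundary-value smoke tests in the spirit of the article: assert that basic
# fit/predict calls survive degenerate but legal inputs without crashing.
cases = {
    "single feature": (np.array([[0.0], [1.0], [2.0], [3.0]]),
                       np.array([0, 0, 1, 1])),
    "constant feature": (np.column_stack([np.ones(4), [0, 1, 2, 3]]),
                         np.array([0, 0, 1, 1])),
    "extreme values": (np.array([[1e30], [-1e30], [1e-30], [-1e-30]]),
                       np.array([1, 0, 1, 0])),
}

for name, (X, y) in cases.items():
    try:
        LogisticRegression().fit(X, y).predict(X)
        print(f"smoke test '{name}': ok")
    except Exception as exc:  # a crash here is a defect worth reporting
        print(f"smoke test '{name}': FAILED ({type(exc).__name__}: {exc})")
```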

TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data

Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Multidimensional Projection is a fundamental tool for high-dimensional data

analytics and visualization. With very few exceptions, projection techniques

are designed to map data from a high-dimensional space to a visual space so as

to preserve some dissimilarity (similarity) measure, such as the Euclidean

distance for example. In fact, although adopting distinct mathematical

formulations designed to favor different aspects of the data, most

multidimensional projection methods strive to preserve dissimilarity measures

that encapsulate geometric properties such as distances or the proximity

relation between data objects. However, geometric relations are not the only

interesting property to be preserved in a projection. For instance, the

analysis of particular structures such as clusters and outliers could be more

reliably performed if the mapping process gives some guarantee as to

topological invariants such as connected components and loops. This paper

introduces TopoMap, a novel projection technique which provides topological

guarantees during the mapping process. In particular, the proposed method

performs the mapping from a high-dimensional space to a visual space, while

preserving the 0-dimensional persistence diagram of the Rips filtration of the

high-dimensional data, ensuring that the filtrations generate the same

connected components when applied to the original as well as projected data.

The presented case studies show that the topological guarantee provided by

TopoMap not only brings confidence to the visual analytic process but also can

be used to assist in the assessment of other projection methods.
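The invariant TopoMap preserves is computable with standard tools: the 0-dimensional persistence (component merge heights) of a Rips filtration equals the edge lengths of the Euclidean minimum spanning tree, as the sketch below shows. This computes the invariant, not the projection itself.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def zero_dim_persistence(points: np.ndarray) -> np.ndarray:
    """Death times of 0-dimensional features of the Rips filtration: these
    are exactly the edge lengths of the Euclidean minimum spanning tree."""
    dists = squareform(pdist(points))
    mst = minimum_spanning_tree(dists)
    return np.sort(mst.data)

rng = np.random.default_rng(1)
two_clusters = np.vstack([rng.normal(0, 0.1, (20, 5)),
                          rng.normal(5, 0.1, (20, 5))])
deaths = zero_dim_persistence(two_clusters)
print(deaths[-1])  # one long-lived merge ~ the separation between clusters
```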

DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control

Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)

The Internet of Vehicles (IoV) enables real-time data exchange among vehicles

and roadside units and thus provides a promising solution to alleviate traffic

jams in the urban area. Meanwhile, better traffic management via efficient

traffic light control can benefit the IoV as well by enabling a better

communication environment and decreasing the network load. As such, IoV and

efficient traffic light control can formulate a virtuous cycle. Edge computing,

an emerging technology to provide low-latency computation capabilities at the

edge of the network, can further improve the performance of this cycle.

However, while the collected information is valuable, an efficient solution for

better utilization and faster feedback has yet to be developed for

edge-empowered IoV. To this end, we propose a Decentralized Reinforcement

Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits

the ubiquity of the IoV to accelerate the collection of traffic data and its

interpretation towards alleviating congestion and providing better traffic

light control. DRLE operates within the coverage of the edge servers and uses

aggregated data from neighboring edge servers to provide city-scale traffic

light control. DRLE decomposes the highly complex problem of large-area control into a decentralized multi-agent problem. We prove its global optimality with concrete mathematical reasoning. The proposed decentralized reinforcement

learning algorithm running at each edge node adapts the traffic lights in real

time. We conduct extensive evaluations and demonstrate the superiority of this

approach over several state-of-the-art algorithms.
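
A minimal sketch of the decentralized setup, assuming tabular Q-learning with synthetic states and rewards (the paper's actual state encoding, reward design, and convergence argument are not reproduced here):

```python
import numpy as np

class EdgeAgent:
    """Tabular Q-learning agent run independently at one edge node."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s, rng):
        if rng.random() < self.eps:               # explore
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[s]))          # exploit

    def update(self, s, a, r, s_next):
        target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])

# Toy loop: the state could discretize queue lengths reported by vehicles in
# the node's coverage; the reward could be negative total waiting time.
rng = np.random.default_rng(1)
agent = EdgeAgent(n_states=16, n_actions=2)       # actions: keep / switch phase
s = 0
for _ in range(1000):
    a = agent.act(s, rng)
    s_next, r = int(rng.integers(16)), -float(rng.random())
    agent.update(s, a, r, s_next)
    s = s_next
```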

Modeling Global Body Configurations in American Sign Language

Nicholas Wilkins , Beck Cordes Galbraith , Ifeoma Nwogu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

American Sign Language (ASL) is the fourth most commonly used language in the

United States and is the language most commonly used by Deaf people in the

United States and the English-speaking regions of Canada. Unfortunately, until

recently, ASL received little research attention. This is due, in part, to its delayed

recognition as a language until William C. Stokoe’s publication in 1960.

Limited data has been a long-standing obstacle to ASL research and

computational modeling. The lack of large-scale datasets has prohibited many

modern machine-learning techniques, such as Neural Machine Translation, from

being applied to ASL. In addition, the modality required to capture sign

language (i.e. video) is complex in natural settings (as one must deal with

background noise, motion blur, and the curse of dimensionality). Finally, when

compared with spoken languages, such as English, there has been limited

research conducted into the linguistics of ASL.

We realize a simplified version of Liddell and Johnson’s Movement-Hold (MH)

Model using a Probabilistic Graphical Model (PGM). We trained our model on

ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM

against other models to determine its ability to model ASL. Finally, we

interpret various aspects of the PGM and draw conclusions about ASL phonetics.

The main contributions of this paper are

Decision Tree Based Hardware Power Monitoring for Run Time Dynamic Power Management in FPGA

Zhe Lin , Wei Zhang , Sharad Sinha

Comments: published as a conference paper in FPL 2017

Subjects : Hardware Architecture (cs.AR) ; Machine Learning (cs.LG)

Fine-grained runtime power management techniques could be promising solutions

for power reduction. Therefore, it is essential to establish accurate power

monitoring schemes to obtain dynamic power variation in a short period (i.e.,

tens or hundreds of clock cycles). In this paper, we leverage a

decision-tree-based power modeling approach to establish fine-grained hardware

power monitoring on FPGA platforms. A generic and complete design flow is

developed to implement the decision tree power model which is capable of

precisely estimating dynamic power in a fine-grained manner. A flexible

architecture of the hardware power monitoring is proposed, which can be

instrumented in any RTL design for runtime power estimation, dispensing with

the need for extra power measurement devices. Experimental results of applying

the proposed model to benchmarks with different resource types reveal an

average error of up to 4% for dynamic power estimation. Moreover, the overheads of

area, power and performance incurred by the power monitoring circuitry are

extremely low. Finally, we apply our power monitoring technique to the power

management using phase shedding with an on-chip multi-phase regulator as a

proof of concept and the results demonstrate 14% efficiency enhancement for the

power supply of the FPGA internal logic.
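
A hedged sketch of the modeling step with synthetic data: fit a shallow decision tree that maps signal-activity features to measured dynamic power. A fixed-depth tree reduces to a pipeline of comparators and a small lookup table, which is what makes it attractive for on-chip monitoring; the feature names and power values below are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
toggle_rates = rng.uniform(0, 1, size=(500, 8))    # activity of 8 monitored signals
# Synthetic ground truth: power grows with weighted switching activity.
power_mw = toggle_rates @ np.linspace(1, 8, 8) + rng.normal(0, 0.1, 500)

tree = DecisionTreeRegressor(max_depth=6)          # a shallow tree is cheap in hardware
tree.fit(toggle_rates, power_mw)
print(np.round(tree.predict(toggle_rates[:5]), 2)) # runtime power estimates (mW)
```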

An Ensemble Learning Approach for In-situ Monitoring of FPGA Dynamic Power

Zhe Lin , Sharad Sinha , Wei Zhang

Comments: published as a journal (TCAD) paper in 2018

Subjects : Hardware Architecture (cs.AR) ; Machine Learning (cs.LG)

As field-programmable gate arrays become prevalent in critical application

domains, their power consumption is of high concern. In this paper, we present

and evaluate a power monitoring scheme capable of accurately estimating the

runtime dynamic power of FPGAs in a fine-grained timescale, in order to support

emerging power management techniques. In particular, we describe a novel and

specialized ensemble model which can be decomposed into multiple customized

decision-tree-based base learners. To aid in model synthesis, a generic

computer-aided design flow is proposed to generate samples, select features,

tune hyperparameters and train the ensemble estimator. Besides this, a hardware

realization of the trained ensemble estimator is presented for on-chip

real-time power estimation. In the experiments, we first show that a single

decision tree model can achieve prediction error within 4.51% of a commercial

gate-level power estimation tool, which is 2.41–6.07x lower than that provided by

the commonly used linear model. More importantly, we study the extra gains in

inference accuracy using the proposed ensemble model. Experimental results

reveal that the ensemble monitoring method can further improve the accuracy of

power predictions to within a maximum error of 1.90%. Moreover, the lookup

table (LUT) overhead of the ensemble monitoring hardware employing up to 64

base learners is within 1.22% of the target FPGA, indicating its light-weight

and scalable characteristics.

Accelerating engineering design by automatic selection of simulation cases through Pool-Based Active Learning

José Hugo C. Gaspar Elsas , Nicholas A. G. Casaprima , Ivan F. M. Menezes

Comments: 28 pages, 9 figures

Subjects : Computational Engineering, Finance, and Science (cs.CE) ; Machine Learning (cs.LG)

A common workflow for many engineering design problems requires the

evaluation of the design system to be investigated under a range of conditions.

These conditions usually involve a combination of several parameters. To

perform a complete evaluation of a single candidate configuration, it may be

necessary to perform hundreds to thousands of simulations. This can be

computationally very expensive, particularly if several configurations need to

be evaluated, as in the case of the mathematical optimization of a design

problem. Although the simulations are extremely complex, there is generally a

high degree of redundancy in them, as many of the cases vary only slightly from

one another. This redundancy can be exploited by omitting some simulations that

are uninformative, thereby reducing the number of simulations required to

obtain a reasonable approximation of the complete system. The decision of which

simulations are useful is made through the use of machine learning techniques,

which allow us to estimate the results of “yet-to-be-performed” simulations

from the ones that are already performed. In this study, we present the results

of one such technique, namely active learning, to provide an approximate result

of an entire offshore riser design simulation portfolio from a subset that is

80% smaller than the original one. These results are expected to facilitate a

significant speed-up in the offshore riser design.
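
A minimal pool-based active-learning loop under stated assumptions: a Gaussian-process surrogate with an uncertainty-sampling acquisition, where `expensive_simulation` is a synthetic stand-in for a costly engineering run. The paper's actual surrogate and acquisition may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):
    # Stand-in for one costly simulation case (e.g., a riser analysis).
    return np.sin(3 * x[0]) + 0.5 * x[1] ** 2

rng = np.random.default_rng(0)
pool = rng.uniform(-1, 1, size=(200, 2))          # all candidate simulation cases
labeled = list(rng.choice(200, size=5, replace=False))

gp = GaussianProcessRegressor()
for _ in range(20):                               # budget: 20 additional simulations
    X = pool[labeled]
    y = np.array([expensive_simulation(x) for x in X])
    gp.fit(X, y)
    _, std = gp.predict(pool, return_std=True)
    std[labeled] = -np.inf                        # never re-query a case already run
    labeled.append(int(np.argmax(std)))           # query the most uncertain case

print(f"ran {len(labeled)} of {len(pool)} candidate simulations")
```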

Learning from Protein Structure with Geometric Vector Perceptrons

Bowen Jing , Stephan Eismann , Patricia Suriana , Raphael J.L. Townshend , Ron Dror Subjects : Biomolecules (q-bio.BM) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning on 3D structures of large biomolecules is emerging as a distinct

area in machine learning, but there has yet to emerge a unifying network

architecture that simultaneously leverages the graph-structured and geometric

aspects of the problem domain. To address this gap, we introduce geometric

vector perceptrons, which extend standard dense layers to operate on

collections of Euclidean vectors. Graph neural networks equipped with such

layers are able to perform both geometric and relational reasoning on efficient

and natural representations of macromolecular structure. We demonstrate our

approach on two important problems in learning from protein structure: model

quality assessment and computational protein design. Our approach improves over

existing classes of architectures, including state-of-the-art graph-based and

voxel-based methods.
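
A simplified numpy sketch of the layer's core idea, with illustrative dimensions: vector channels are mixed linearly, their rotation-invariant norms feed the scalar update, and vector outputs are gated by their norms so they remain rotation-equivariant. This is a sketch of the concept, not the authors' exact parameterization.

```python
import numpy as np

def gvp(s, V, Ws, Wh, Wv, b):
    """One layer. s: (ns,) scalar features; V: (nv, 3) vector features."""
    Vh = Wh @ V                                   # mix vector channels: (h, 3)
    vnorm = np.linalg.norm(Vh, axis=-1)           # rotation-invariant norms: (h,)
    s_out = np.tanh(Ws @ np.concatenate([s, vnorm]) + b)
    V_out = Wv @ Vh                               # (nv_out, 3), rotation-equivariant
    gate = 1.0 / (1.0 + np.exp(-np.linalg.norm(V_out, axis=-1)))
    return s_out, V_out * gate[:, None]           # gate vectors by their norms

rng = np.random.default_rng(0)
ns, nv, h, ns_out, nv_out = 4, 3, 5, 6, 2         # illustrative sizes
s, V = rng.normal(size=ns), rng.normal(size=(nv, 3))
Ws = rng.normal(size=(ns_out, ns + h))
Wh = rng.normal(size=(h, nv))
Wv = rng.normal(size=(nv_out, h))
s_new, V_new = gvp(s, V, Ws, Wh, Wv, rng.normal(size=ns_out))
print(s_new.shape, V_new.shape)                   # (6,) (2, 3)
```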

P6: A Declarative Language for Integrating Machine Learning in Visual Analytics

Jianping Kelvin Li , Kwan-Liu Ma

Comments: Accepted for presentation at IEEE VIS 2020

Subjects : Software Engineering (cs.SE) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Programming Languages (cs.PL)

We present P6, a declarative language for building high performance visual

analytics systems through its support for specifying and integrating machine

learning and interactive visualization methods. As data analysis methods based

on machine learning and artificial intelligence continue to advance, a visual

analytics solution can leverage these methods for better exploiting large and

complex data. However, integrating machine learning methods with interactive

visual analysis is challenging. Existing declarative programming libraries and

toolkits for visualization lack support for coupling machine learning methods.

By providing a declarative language for visual analytics, P6 can empower more

developers to create visual analytics applications that combine machine

learning and visualization methods for data analysis and problem solving.

Through a variety of example applications, we demonstrate P6’s capabilities and

show the benefits of using declarative specifications to build visual analytics

systems. We also identify and discuss the research opportunities and challenges

for declarative visual analytics.

Real Image Super Resolution Via Heterogeneous Model using GP-NAS

Zhihong Pan , Baopu Li , Teng Xi , Yanwen Fan , Gang Zhang , Jingtuo Liu , Junyu Han , Errui Ding

Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge

Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

With advancements in deep neural networks (DNNs), recent state-of-the-art (SOTA) image super-resolution (SR) methods have achieved impressive performance using deep residual networks with dense skip connections. While these models perform well on benchmark datasets where low-resolution (LR) images are constructed from

high-resolution (HR) references with known blur kernel, real image SR is more

challenging when both images in the LR-HR pair are collected from real cameras.

Based on existing dense residual networks, a Gaussian process based neural

architecture search (GP-NAS) scheme is utilized to find candidate network

architectures using a large search space by varying the number of dense

residual blocks, the block size and the number of features. A suite of

heterogeneous models with diverse network structures and hyperparameters are selected for model ensembling to achieve outstanding performance in real image

SR. The proposed method won the first place in all three tracks of the AIM 2020

Real Image Super-Resolution Challenge.

Robust Object Classification Approach using Spherical Harmonics

Ayman Mukhaimar , Ruwan Tennakoon , Chow Yin Lai , Reza Hoseinnezhad , Alireza Bab-Hadiashar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)

In this paper, we present a robust spherical harmonics approach for the

classification of point cloud-based objects. Spherical harmonics have been used

for classification over the years, with several frameworks existing in the

literature. These approaches use a variety of spherical-harmonics-based descriptors to classify objects. We first investigated these frameworks' robustness against data augmentation, such as outliers and noise, as it has not

been studied before. Then we propose a spherical convolution neural network

framework for robust object classification. The proposed framework uses the

voxel grid of concentric spheres to learn features over the unit ball. Our

proposed model learns features that are less sensitive to data augmentation due

to the selected sampling strategy and the designed convolution operation. We

tested our proposed model against several types of data augmentation, such as

noise and outliers. Our results show that the proposed model outperforms the

state-of-the-art networks in terms of robustness to data augmentation.

Cost-aware Feature Selection for IoT Device Classification

Biswadeep Chakraborty , Dinil Mon Divakaran , Ido Nevat , Gareth W. Peters , Mohan Gurusamy

Comments: 32 Pages, 8 figures

Subjects : Networking and Internet Architecture (cs.NI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Classification of IoT devices into different types is of paramount

importance, from multiple perspectives, including security and privacy aspects.

Recent works have explored machine learning techniques for fingerprinting (or

classifying) IoT devices, with promising results. However, existing works have

assumed that the features used for building the machine learning models are

readily available or can be easily extracted from the network traffic; in other

words, they do not consider the costs associated with feature extraction. In

this work, we take a more realistic approach, and argue that feature extraction

has a cost, and the costs are different for different features. We also take a

step forward from the current practice of considering the misclassification

loss as a binary value, and make a case for different losses based on the

misclassification performance. Thereby, and more importantly, we introduce the

notion of risk for IoT device classification. We define and formulate the

problem of cost-aware IoT device classification. This being a combinatorial

optimization problem, we develop a novel algorithm to solve it in a fast and

effective way using the Cross-Entropy (CE) based stochastic optimization

technique. Using traffic of real devices, we demonstrate the capability of the

CE based algorithm in selecting features with minimal risk of misclassification

while keeping the cost for feature extraction within a specified limit.
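
A sketch of the cross-entropy method specialized to feature-subset selection: subsets are sampled from independent Bernoulli distributions, scored, and the sampling probabilities are refit to the elite (lowest-score) samples. The risk and cost functions below are synthetic placeholders for the paper's misclassification risk and extraction costs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 12
cost = rng.uniform(0.1, 1.0, n_features)          # per-feature extraction cost
value = rng.uniform(0.0, 1.0, n_features)         # how much each feature helps

def score(mask, budget=3.0):
    risk = 1.0 / (1.0 + value @ mask)             # stand-in for misclassification risk
    return risk + (np.inf if cost @ mask > budget else 0.0)

p = np.full(n_features, 0.5)                      # Bernoulli sampling parameters
for _ in range(50):
    masks = (rng.random((200, n_features)) < p).astype(float)
    scores = np.array([score(m) for m in masks])
    elite = masks[np.argsort(scores)[:20]]        # best 10% of samples
    p = 0.9 * elite.mean(axis=0) + 0.1 * p        # smoothed CE update

print("selected features:", np.flatnonzero(p > 0.5))
```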

Non-parametric generalized linear model

Matthew Dowling , Yuan Zhao , Il Memming Park Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)

A fundamental problem in statistical neuroscience is to model how neurons

encode information by analyzing electrophysiological recordings. A popular and

widely-used approach is to fit the spike trains with an autoregressive point

process model. These models are characterized by a set of convolutional

temporal filters, whose subsequent analysis can help reveal how neurons encode

stimuli, interact with each other, and process information. In practice a

sufficiently rich but small ensemble of temporal basis functions needs to be

chosen to parameterize the filters. However, obtaining a satisfactory fit often

requires burdensome model selection and fine tuning the form of the basis

functions and their temporal span. In this paper we propose a nonparametric

approach for jointly inferring the filters and hyperparameters using the

Gaussian process framework. Our method is computationally efficient taking

advantage of the sparse variational approximation while being flexible and rich

enough to characterize arbitrary filters in continuous time lag. Moreover, our

method automatically learns the temporal span of the filter. For the particular

application in neuroscience, we designed priors for stimulus and history

filters useful for the spike trains. We compare and validate our method on

simulated and real neural spike train data.

Bid Shading in The Brave New World of First-Price Auctions

Djordje Gligorijevic , Tian Zhou , Bharatbhushan Shetty , Brendan Kitts , Shengjun Pan , Junwei Pan , Aaron Flores

Comments: In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM’20), October 19-23, 2020, Virtual Event, Ireland

Subjects : Computer Science and Game Theory (cs.GT) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

Online auctions play a central role in online advertising, and are one of the

main reasons for the industry’s scalability and growth. With great changes in

how auctions are being organized, such as changing the second- to first-price

auction type, advertisers and demand platforms are compelled to adapt to a new

volatile environment. Bid shading is a known technique for preventing

overpaying in auction systems that can help maintain the strategy equilibrium

in first-price auctions, tackling one of its greatest drawbacks. In this study,

we propose a machine learning approach of modeling optimal bid shading for

non-censored online first-price ad auctions. We clearly motivate the approach

and extensively evaluate it in both offline and online settings on a major

demand side platform. The results demonstrate the superiority and robustness of

the new approach as compared to the existing approaches across a range of

performance metrics.
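
A toy illustration (not the paper's model) of why shading is needed in a first-price auction: given an estimated win-rate curve P(win | bid), the bidder maximizes expected surplus (value minus the price paid) instead of bidding its full value as in a second-price auction. The win-rate curve here is a made-up stand-in for a learned model.

```python
import numpy as np

def win_prob(bid, scale=1.0):
    # Stand-in for a win-rate model fit, e.g., on non-censored auction logs.
    return 1.0 - np.exp(-bid / scale)

value = 2.0                                   # impression value to the advertiser
bids = np.linspace(0.01, value, 500)
surplus = (value - bids) * win_prob(bids)     # expected surplus of each bid
best = bids[np.argmax(surplus)]
print(f"optimal shaded bid: {best:.2f} (vs. truthful bid {value:.2f})")
```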

Learning to summarize from human feedback

Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As language models become more powerful, training and evaluation are

increasingly bottlenecked by the data and metrics used for a particular task.

For example, summarization models are often trained to predict human reference

summaries and evaluated using ROUGE, but both of these metrics are rough

proxies for what we really care about—summary quality. In this work, we show

that it is possible to significantly improve summary quality by training a

model to optimize for human preferences. We collect a large, high-quality

dataset of human comparisons between summaries, train a model to predict the

human-preferred summary, and use that model as a reward function to fine-tune a

summarization policy using reinforcement learning. We apply our method to a

version of the TL;DR dataset of Reddit posts and find that our models

significantly outperform both human reference summaries and much larger models

fine-tuned with supervised learning alone. Our models also transfer to CNN/DM

news articles, producing summaries nearly as good as the human reference

without any news-specific fine-tuning. We conduct extensive analyses to

understand our human feedback dataset and fine-tuned models. We establish that

our reward model generalizes to new datasets, and that optimizing our reward

model results in better summaries than optimizing ROUGE according to humans. We

hope the evidence from our paper motivates machine learning researchers to pay

closer attention to how their training loss affects the model behavior they

actually want.
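
A hedged sketch of the reward-modeling step with synthetic features: fit a scalar reward by maximizing the pairwise (Bradley-Terry) log-likelihood, i.e., minimizing -log sigmoid(r(preferred) - r(rejected)); the RL fine-tuning stage is omitted, and the linear reward is a stand-in for the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 16, 1000
w_true = rng.normal(size=dim)                    # "true" human preference direction
pref = rng.normal(size=(n_pairs, dim))
rej = rng.normal(size=(n_pairs, dim))
# Relabel so that `pref` really is the preferred item under w_true.
swap = (pref @ w_true) < (rej @ w_true)
pref[swap], rej[swap] = rej[swap].copy(), pref[swap].copy()

w, lr = np.zeros(dim), 0.1                       # linear reward model r(x) = w . x
for _ in range(200):                             # gradient ascent on log-likelihood
    margin = (pref - rej) @ w
    grad = ((1 - 1 / (1 + np.exp(-margin)))[:, None] * (pref - rej)).mean(axis=0)
    w += lr * grad
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print("cosine(w, w_true) =", round(float(cos), 3))
```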

Towards Earnings Call and Stock Price Movement

Zhiqiang Ma , Grace Bang , Chong Wang , Xiaomo Liu

Comments: Accepted by KDD 2020 MLF workshop

Subjects : Statistical Finance (q-fin.ST) ; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)

Earnings calls are hosted by management of public companies to discuss the

company’s financial performance with analysts and investors. Information

disclosed during an earnings call is an essential source of data for analysts

and investors to make investment decisions. Thus, we leverage earnings call

transcripts to predict future stock price dynamics. We propose to model the

language in transcripts using a deep learning framework, where an attention

mechanism is applied to encode the text data into vectors for the

discriminative network classifier to predict stock price movements. Our

empirical experiments show that the proposed model is superior to the

traditional machine learning baselines and earnings call information can boost

the stock price prediction performance.

Convolutional Speech Recognition with Pitch and Voice Quality Features

Guillermo Cámbara , Jordi Luque , Mireia Farrús

Comments: 5 pages

Subjects : Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)

The effects of adding pitch and voice quality features such as jitter and

shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are

studied in this work. Pitch features have been previously used for improving

classical HMM and DNN baselines, while jitter and shimmer parameters have

proven to be useful for tasks like speaker or emotion recognition. To the best of our

knowledge, this is the first work combining such pitch and voice quality

features with modern convolutional architectures, showing improvements up to 2%

absolute WER points, for the publicly available Spanish Common Voice dataset.

Particularly, our work combines these features with mel-frequency spectral

coefficients (MFSCs) to train a convolutional architecture with Gated Linear

Units (Conv GLUs). Such models have shown to yield small word error rates,

while being very suitable for parallel processing for online streaming

recognition use cases. We have added pitch and voice quality functionality to

Facebook’s wav2letter speech recognition framework, and we provide the code and recipes to the community to enable further experiments.

Besides, to the best of our knowledge, our Spanish Common Voice recipe is the

first public Spanish recipe for wav2letter.

Micro-entries: Encouraging Deeper Evaluation of Mental Models Over Time for Interactive Data Systems

Jeremy E. Block , Eric D. Ragan

Comments: 10 pages, submitted to BELIV 2020 Workshop

Subjects : Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG)

Many interactive data systems combine visual representations of data with

embedded algorithmic support for automation and data exploration. To

effectively support transparent and explainable data systems, it is important

for researchers and designers to know how users understand the system. We

discuss the evaluation of users’ mental models of system logic. Mental models

are challenging to capture and analyze. While common evaluation methods aim to

approximate the user’s final mental model after a period of system usage, user

understanding continuously evolves as users interact with a system over time.

In this paper, we review many common mental model measurement techniques,

discuss tradeoffs, and recommend methods for deeper, more meaningful evaluation

of mental models when using interactive data analysis and visualization

systems. We present guidelines for evaluating mental models over time that

reveal the evolution of specific model updates and how they may map to the

particular use of interface features and data queries. By asking users to

describe what they know and how they know it, researchers can collect

structured, time-ordered insight into a user’s conceptualization process while

also helping guide users to their own discoveries.

Clustering of Nonnegative Data and an Application to Matrix Completion

C. Strohmeier , D. Needell Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Signal Processing (eess.SP)

In this paper, we propose a simple algorithm to cluster nonnegative data

lying in disjoint subspaces. We analyze its performance in relation to a

certain measure of correlation between said subspaces. We use our clustering

algorithm to develop a matrix completion algorithm which can outperform

standard matrix completion algorithms on data matrices satisfying certain

natural conditions.

Efficiency in Real-time Webcam Gaze Tracking

Amogh Gudi , Xin Li , Jan van Gemert

Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020

Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Efficiency and ease of use are essential for practical applications of camera

based eye/gaze-tracking. Gaze tracking involves estimating where a person is

looking on a screen based on face images from a computer-facing camera. In this

paper we investigate two complementary forms of efficiency in gaze tracking: 1.

The computational efficiency of the system which is dominated by the inference

speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is

determined by the tediousness of the mandatory calibration of the gaze-vector

to a computer screen. To do so, we evaluate the computational speed/accuracy

trade-off for the CNN and the calibration effort/accuracy trade-off for screen

calibration. For the CNN, we evaluate the full face, two-eyes, and single eye

input. For screen calibration, we measure the number of calibration points

needed and evaluate three types of calibration: 1. pure geometry, 2. pure

machine learning, and 3. hybrid geometric regression. Results suggest that a

single eye input and geometric regression calibration achieve the best

trade-off.
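
A sketch of the "pure machine learning" calibration variant under simple assumptions: a ridge regression maps CNN gaze vectors to screen coordinates using a handful of calibration fixations. The data below is synthetic; the paper's actual calibration procedures are not reproduced.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Calibration: user fixates k known on-screen targets; the CNN outputs gaze vectors.
k = 9
screen_targets = rng.uniform(0, 1, size=(k, 2))   # normalized (x, y) on screen
gaze_vectors = screen_targets @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(k, 3))

calib = Ridge(alpha=1e-3).fit(gaze_vectors, screen_targets)

# At run time, every new gaze vector is mapped to a screen position.
print(np.round(calib.predict(gaze_vectors[:3]), 3))
```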

Quantum Discriminator for Binary Classification

Prasanna Date Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG); Machine Learning (stat.ML)

Quantum computers operate in high-dimensional tensor product spaces and

are known to outperform classical computers on many problems. They are poised

to accelerate machine learning tasks in the future. In this work, we operate in

the quantum machine learning (QML) regime where a QML model is trained using a

quantum-classical hybrid algorithm and inferencing is performed using a quantum

algorithm. We leverage the traditional two-step machine learning workflow,

where features are extracted from the data in the first step and a

discriminator acting on the extracted features is used to classify the data in

the second step. Assuming that the binary features have been extracted from the

data, we propose a quantum discriminator for binary classification. The quantum

discriminator takes as input the binary features of a data point and a

prediction qubit in the zero state, and outputs the correct class of the data

point. The quantum discriminator is defined by a parameterized unitary matrix \(U_\Theta\) containing \(\mathcal{O}(N)\) parameters, where \(N\) is the number of data points in the training data set. Furthermore, we show that the quantum discriminator can be trained in \(\mathcal{O}(N \log N)\) time using \(\mathcal{O}(N \log N)\) classical bits and \(\mathcal{O}(\log N)\) qubits. We also show that inferencing for the quantum discriminator can be done in \(\mathcal{O}(N)\) time using \(\mathcal{O}(\log N)\) qubits. Finally, we use the quantum discriminator to classify the XOR problem on the IBM Q universal quantum computer with 100% accuracy.

Detecting Parkinson's Disease from Speech-task in an accessible and interpretable manner

Wasifur Rahman , Sangwu Lee , Md. Saiful Islam , Abdullah Al Mamun , Victor Antony , Harshil Ratnu , Mohammad Rafayet Ali , Ehsan Hoque Subjects : Audio and Speech Processing (eess.AS) ; Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

Every nine minutes a person is diagnosed with Parkinson’s Disease (PD) in the

United States. However, studies have shown that between 25% and 80% of

individuals with Parkinson’s Disease (PD) remain undiagnosed. An online, in the

wild audio recording application has the potential to help screen for the

disease if risk can be accurately assessed. In this paper, we collect data from

726 unique subjects (262 PD and 464 Non-PD) uttering the “quick brown fox jumps

over the lazy dog ….” to conduct automated PD assessment. We extracted both

standard acoustic features and deep learning based embedding features from the

speech data and trained several machine learning algorithms on them. Our models

achieved 0.75 AUC by modeling the standard acoustic features through the

XGBoost model. We also provide an explanation of our model’s decisions and show that it focuses mostly on the widely used MFCC features and a subset of dysphonia features previously used for detecting PD from verbal phonation tasks.
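
A hedged sketch of the pipeline's final stage with synthetic features and labels; scikit-learn's gradient-boosted trees stand in here for the paper's XGBoost model, and the feature matrix is a placeholder for the extracted acoustic features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 726, 40                                   # subjects x acoustic features
X = rng.normal(size=(n, d))                      # synthetic feature matrix
y = (X[:, :5].sum(axis=1) + rng.normal(0, 2, n) > 0).astype(int)  # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```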

Ultra Lightweight Image Super-Resolution with Multi-Attention Layers

Abdul Muqeet , Jiwon Hwang , Subin Yang , Jung Heum Kang , Yongwoo Kim , Sung-Ho Bae

Comments: ECCVW AIM2020

Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Lightweight image super-resolution (SR) networks have the utmost significance

for real-world applications. There are several deep learning based SR methods

with remarkable performance, but their memory and computational cost are

hindrances in practical usage. To tackle this problem, we propose a

Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN

consists of proposed feature fusion groups (FFGs) that serve as a feature

extraction block. Each FFG contains a stack of proposed multi-attention blocks

(MAB) that are combined in a novel feature fusion structure. Further, the MAB

with a cost-efficient attention mechanism (CEA) helps us to refine and extract

the features using multiple attention mechanisms. The comprehensive experiments

show the superiority of our model over the existing state-of-the-art. We

participated in AIM 2020 efficient SR challenge with our MAFFSRN model and won

1st, 3rd, and 4th places in memory usage, floating-point operations (FLOPs) and

number of parameters, respectively.

Information Theory

Private Weighted Random Walk Stochastic Gradient Descent

Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)

We consider a decentralized learning setting in which data is distributed

over nodes in a graph. The goal is to learn a global model on the distributed

data without involving any central entity that needs to be trusted. While

gossip-based stochastic gradient descent (SGD) can be used to achieve this

learning objective, it incurs high communication and computation costs, since

it has to wait for all the local models at all the nodes to converge. To speed

up the convergence, we propose instead to study random walk based SGD in which

a global model is updated based on a random walk on the graph. We propose two

algorithms based on two types of random walks that achieve, in a decentralized

way, uniform sampling and importance sampling of the data. We provide a

non-asymptotic analysis on the rate of convergence, taking into account the

constants related to the data and the graph. Our numerical results show that

the weighted random walk based algorithm has a better performance for

high-variance data. Moreover, we propose a privacy-preserving random walk

algorithm that achieves local differential privacy based on a Gamma noise

mechanism that we propose. We also give numerical results on the convergence of

this algorithm and show that it outperforms additive Laplace-based privacy

mechanisms.
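
A minimal sketch of the uniform-sampling variant on a ring graph (the paper's weighted walk and Gamma-noise privacy mechanism are omitted): the model hops to a random neighbor and takes one local gradient step per visit, here on a shared least-squares problem with synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 20, 5
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}  # ring

# Each node holds a local slice of a shared least-squares problem.
w_true = rng.normal(size=dim)
A = rng.normal(size=(n_nodes, 10, dim))
b = A @ w_true + 0.01 * rng.normal(size=(n_nodes, 10))

w, node, lr = np.zeros(dim), 0, 0.05
for _ in range(5000):
    residual = A[node] @ w - b[node]
    w -= lr * A[node].T @ residual / len(residual)   # one local SGD step
    node = int(rng.choice(neighbors[node]))          # hop to a random neighbor
print("distance to optimum:", round(float(np.linalg.norm(w - w_true)), 4))
```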

Optimal Streaming of 360 VR Videos with Perfect, Imperfect and Unknown FoV Viewing Probabilities

Lingzhi Zhao , Ying Cui , Chengjun Guo , Zhi Liu

Comments: 6 pages, 5 figures, to appear in GLOBECOM 2020

Subjects : Information Theory (cs.IT)

In this paper, we investigate wireless streaming of multi-quality tiled 360

virtual reality (VR) videos from a multi-antenna server to multiple

single-antenna users in a multi-carrier system. To capture the impact of

field-of-view (FoV) prediction, we consider three cases of FoV viewing

probability distributions, i.e., perfect, imperfect and unknown FoV viewing

probability distributions, and use the average total utility, worst average

total utility and worst total utility as the respective performance metrics. We

adopt rate splitting with successive decoding for efficient transmission of

multiple sets of tiles of different 360 VR videos to their requesting users. In

each case, we optimize the encoding rates of the tiles, minimum encoding rates

of the FoVs, rates of the common and private messages and transmission

beamforming vectors to maximize the total utility. The problems in the three

cases are all challenging nonconvex optimization problems. We successfully

transform the problem in each case into a difference of convex (DC) programming

problem with a differentiable objective function, and obtain a suboptimal

solution using the concave-convex procedure (CCCP). Finally, numerical results demonstrate that the proposed solutions achieve notable gains over existing schemes

in all three cases. To the best of our knowledge, this is the first work

revealing the impact of FoV prediction and its accuracy on the performance of

streaming of multi-quality tiled 360 VR videos.
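
For intuition, the concave-convex procedure mentioned above iterates on a convex surrogate obtained by linearizing the concave part of the objective. A one-dimensional toy instance, not the paper's problem:

```python
import numpy as np

g = lambda x: x ** 4             # convex part
h = lambda x: 2 * x ** 2         # concave part enters as -h, so f = g - h
dh = lambda x: 4 * x             # gradient of h

x = 3.0                          # initial iterate
for _ in range(30):
    # Convex surrogate: minimize g(x) - [h(x_k) + dh(x_k)(x - x_k)];
    # setting the derivative 4 x^3 - dh(x_k) to zero gives a closed form.
    x = float(np.cbrt(dh(x) / 4.0))
print("CCCP limit:", round(x, 4), "f(x) =", round(g(x) - h(x), 4))  # f has minima at x = +-1
```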

A Design Framework for Epsilon-Private Data Disclosure

Amirreza Zamani , Tobias J. Oechtering , Mikael Skoglund (Division of Information Science and Engineering, KTH Royal Institute of Technology)

Comments: 16 pages, 2 figures

Subjects : Information Theory (cs.IT)

In this paper, we study a stochastic disclosure control problem using

information-theoretic methods. The useful data to be disclosed depend on

private data that should be protected. Thus, we design a privacy mechanism to

produce new data which maximizes the disclosed information about the useful

data under a strong \(\chi^2\)-privacy criterion. For sufficiently small leakage,

the privacy mechanism design problem can be geometrically studied in the space

of probability distributions by a local approximation of the mutual

information. By using methods from Euclidean information geometry, the original

highly challenging optimization problem can be reduced to a problem of finding

the principal right-singular vector of a matrix, which characterizes the

optimal privacy mechanism. In two extensions we first consider a noisy

disclosure channel and then we look for a mechanism which finds \(U\) based on observing \(X\), maximizing the mutual information between \(U\) and \(Y\) while satisfying the privacy criterion on \(U\) and \(Z\) under the Markov chain \((Z,Y)-X-U\).
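
The geometric reduction described above ends in a standard linear-algebra step: extracting the principal right-singular vector of a matrix. A sketch with a random stand-in for the matrix characterized in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 7))          # stand-in for the paper's matrix
_, _, Vt = np.linalg.svd(M)
v_principal = Vt[0]                  # right-singular vector for the largest singular value
print(np.round(v_principal, 3))
```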

Optimal Wireless Streaming of Multi-Quality 360 VR Video by Exploiting Natural, Relative Smoothness-enabled and Transcoding-enabled Multicast Opportunities

Kaixuan Long , Ying Cui , Chencheng Ye , Zhi Liu

Comments: 14 pages, 5 figures, major revision, IEEE Transations on Multimedia. arXiv admin note: substantial text overlap with arXiv:2001.01906

Subjects : Information Theory (cs.IT)

In this paper, we investigate optimal wireless streaming of a

multi-quality tiled 360 virtual reality (VR) video from a server to multiple

users. To this end, we propose to maximally exploit potential multicast

opportunities by effectively utilizing characteristics of multi-quality tiled

360 VR videos and computation resources at the users’ side. In particular, we

consider two requirements for quality variation in one field-of-view (FoV),

i.e., the absolute smoothness requirement and the relative smoothness

requirement, and two video playback modes, i.e., the direct-playback mode

(without user transcoding) and transcode-playback mode (with user transcoding).

Besides natural multicast opportunities, we introduce two new types of

multicast opportunities, namely, relative smoothness-enabled multicast

opportunities, which allow flexible tradeoff between viewing quality and

communications resource consumption, and transcoding-enabled multicast

opportunities, which allow flexible tradeoff between computation and

communications resource consumptions. Then, we establish a novel mathematical

model that reflects the impacts of natural, relative smoothness-enabled and

transcoding-enabled multicast opportunities on the average transmission energy

and transcoding energy. Based on this model, we optimize the transmission

resource allocation, playback quality level selection and transmission quality

level selection to minimize the energy consumption in the four cases with

different requirements for quality variation and video playback modes. By

comparing the optimal values in the four cases, we prove that the energy

consumption reduces when more multicast opportunities can be utilized. Finally,

numerical results show substantial gains of the proposed solutions over

existing schemes, and demonstrate the importance of effective exploitation of

the three types of multicast opportunities.

On the Size of the Giant Component in Inhomogeneous Random K-out Graphs

Mansi Sood , Osman Yagan

Comments: To appear in 9th IEEE Conference on Decision and Control. arXiv admin note: substantial text overlap with arXiv:1911.05147

Subjects : Information Theory (cs.IT) ; Probability (math.PR)

Inhomogeneous random K-out graphs were recently introduced to model heterogeneous sensor networks secured by random pairwise key predistribution schemes. First, each of the \(n\) nodes is classified as type-1 (respectively, type-2) with probability \(0<\mu<1\) (respectively, \(1-\mu\)) independently from each other. Next, each type-1 (respectively, type-2) node draws 1 arc towards a node (respectively, \(K_n\) arcs towards \(K_n\) distinct nodes) selected uniformly at random, and then the orientation of the arcs is ignored. It was recently established that this graph, denoted by \(\mathbb{H}(n;\mu,K_n)\), is connected with high probability (whp) if and only if \(K_n=\omega(1)\). In other words, if \(K_n=O(1)\), then \(\mathbb{H}(n;\mu,K_n)\) has a positive probability of being not connected as \(n\) gets large. Here, we study the size of the largest connected subgraph of \(\mathbb{H}(n;\mu,K_n)\) when \(K_n = O(1)\). We show that the trivial condition of \(K_n \geq 2\) for all \(n\) is sufficient to ensure that the inhomogeneous K-out graph has a connected component of size \(n-O(1)\) whp. Put differently, even with \(K_n=2\), all but finitely many nodes will form a connected sub-network in this model under any \(0<\mu<1\). We present an upper bound on the probability that more than \(M\) nodes are outside of the largest component, and show that this decays as \(O(1)\exp\{-M(1-\mu)(K_n-1)\} + o(1)\).

Numerical results are presented to demonstrate the size of the largest

connected component when the number of nodes is finite.
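
A quick simulation of the construction described above (a sketch with modest n, not the paper's experiments): type-1 nodes draw one arc, type-2 nodes draw K arcs, orientation is dropped, and we count the nodes outside the largest component.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def largest_component_deficit(n=2000, mu=0.5, K=2, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = [], []
    for v in range(n):
        k = 1 if rng.random() < mu else K          # type-1 vs type-2 node
        targets = rng.choice([u for u in range(n) if u != v], size=k, replace=False)
        rows.extend([v] * k)
        cols.extend(targets)
    adj = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(adj, directed=False)  # orientation ignored
    return n - np.bincount(labels).max()           # nodes outside the giant component

print(largest_component_deficit())                 # typically O(1), as the paper proves
```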

Service Rate Region: A New Aspect of Coded Distributed System Design

Mehmet Aktas , Gauri Joshi , Swanand Kadhe , Fatemeh Kazemi , Emina Soljanin Subjects : Information Theory (cs.IT) ; Discrete Mathematics (cs.DM); Performance (cs.PF)

Erasure coding has been recently employed as a powerful method to mitigate

delays due to slow or straggling nodes in distributed systems. In this work, we

show that erasure coding of data objects can flexibly handle skews in the

request rates. Coding can help boost the service rate region, that is, increase

the overall volume of data access requests that can be handled by the system.

The goal of this paper is to postulate the service rate region as an important

consideration in the design of erasure coded distributed systems. We highlight

several open problems that can be grouped into two broad threads: 1)

characterizing the service rate region of a given code and finding the optimal

request allocation, and 2) designing the underlying erasure code for a given

service rate region. As contributions along the first thread, we characterize

the rate regions of maximum-distance-separable, locally repairable, and Simplex

codes. In terms of code design, we show the effectiveness of hybrid codes that

combine replication and erasure coding, and also discover fundamental

connections between multi-set batch codes and the problem of maximizing the

service rate region.

Secure Strong Coordination

Giulia Cervia , German Bassi , Mikael Skoglund

Journal-ref: IEEE WPS 2020 – International Workshop on Privacy and Security for

Information Systems

Subjects : Information Theory (cs.IT)

We consider a network of two nodes separated by a noisy channel, in which the

source and its reconstruction have to be strongly coordinated, while

simultaneously satisfying the strong secrecy condition with respect to an

outside observer of the noisy channel. In the case of non-causal encoding and

decoding, we propose a joint source-channel coding scheme for the secure strong

coordination region. Furthermore, we provide a complete characterization of the

secure strong coordination region when the decoder has to reliably reconstruct

the source sequence and the legitimate channel is more capable than the channel

of the eavesdropper.

Remote Joint Strong Coordination and Reliable Communication

Giulia Cervia , Tobias J. Oechtering , Mikael Skoglund

Journal-ref: 2020 IEEE International Symposium on Information Theory (ISIT)

Subjects : Information Theory (cs.IT)

We consider a three-node network, in which two agents wish to communicate

over a noisy channel, while controlling the distribution observed by a third

external agent. We use strong coordination to constrain the distribution, and

we provide a complete characterization of the “remote strong coordination and

reliable communication” region.

Smart Meter Data Privacy

Giulio Giaconi , Deniz Gunduz , H. Vincent Poor Subjects : Information Theory (cs.IT)

Smart grids (SGs) promise to deliver dramatic improvements compared to

traditional power grids thanks primarily to the large amount of data being

exchanged and processed within the grid, which enables the grid to be monitored

more accurately and at a much faster pace. The smart meter (SM) is one of the

key devices that enable the SG concept by monitoring a household’s electricity

consumption and reporting it to the utility provider (UP), i.e., the entity

that sells energy to customers, or to the distribution system operator (DSO),

i.e., the entity that operates and manages the grid, with high accuracy and at

a much faster pace compared to traditional meters. However, the very

availability of rich and high-frequency household electricity consumption data,

which enables a very efficient power grid management, also opens up

unprecedented challenges on data security and privacy. To counter these

threats, it is necessary to develop techniques that keep SM data private, and,

for this reason, SM privacy has become a very active research area. The aim of

this chapter is to provide an overview of the most significant

privacy-preserving techniques for SM data, highlighting their main benefits and

disadvantages.

Algebraic geometry codes and some applications

Alain Couvreur , Hugues Randriambololona

Comments: Survey chapter to appear in “A Concise Encyclopedia of Coding Theory”, W.C. Huffman, J.-L. Kim, and P. Sole’ Eds., CRC Press

Subjects : Information Theory (cs.IT) ; Cryptography and Security (cs.CR); Algebraic Geometry (math.AG); Number Theory (math.NT)

This article surveys the development of the theory of algebraic geometry

codes since their discovery in the late 70’s. We summarize the major results on

various problems such as: asymptotic parameters, improved estimates on the

minimum distance, and decoding algorithms. In addition, we present various

modern applications of these codes such as public-key cryptography, algebraic

complexity theory, multiparty computation or distributed storage.

Ramifications of Approximate Posterior Inference for Bayesian Deep Learning in Adversarial and Out-of-Distribution Settings

John Mitros , Arjun Pakrashi , Brian Mac Namee

Comments: ARRW@ECCV2020

Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Deep neural networks have been successful in diverse discriminative

classification tasks, although they are often poorly calibrated, assigning high probability to misclassified predictions. This can undermine the trustworthiness and accountability of the models when deployed in real

applications, where predictions are evaluated based on their confidence scores.

Existing solutions suggest the benefits attained by combining deep neural

networks and Bayesian inference to quantify uncertainty over the models’

predictions for ambiguous datapoints. In this work we propose to validate and

test the efficacy of likelihood based models in the task of out of distribution

detection (OoD). Across different datasets and metrics we show that Bayesian

deep learning models on certain occasions marginally outperform conventional

neural networks and in the event of minimal overlap between in/out distribution

classes, even the best models exhibit a reduction in AUC scores in detecting

OoD data. Preliminary investigations indicate the potential inherent role of

bias due to choices of initialisation, architecture or activation functions. We

hypothesise that the sensitivity of neural networks to unseen inputs could be a

multi-factor phenomenon arising from the different architectural design choices

often amplified by the curse of dimensionality. Furthermore, we perform a study

to find the effect of the adversarial noise resistance methods on in and

out-of-distribution performance, as well as, also investigate adversarial noise

robustness of Bayesian deep learners.

Action and Perception as Divergence Minimization

Danijar Hafner , Pedro A. Ortega , Jimmy Ba , Thomas Parr , Karl Friston , Nicolas Heess

Comments: 13 pages, 10 figures

Subjects : Artificial Intelligence (cs.AI) ; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a unified objective for action and perception of intelligent

agents. Extending representation learning and control, we minimize the joint

divergence between the world and a target distribution. Intuitively, such

agents use perception to align their beliefs with the world, and use actions to

align the world with their beliefs. Minimizing the joint divergence to an

expressive target maximizes the mutual information between the agent’s

representations and inputs, thus inferring representations that are informative

of past inputs and exploring future inputs that are informative of the

representations. This lets us derive intrinsic objectives, such as

representation learning, information gain, empowerment, and skill discovery

from minimal assumptions. Moreover, interpreting the target distribution as a

latent variable model suggests expressive world models as a path toward highly

adaptive agents that seek large niches in their environments, while rendering

task rewards optional. The presented framework provides a common language for

comparing a wide range of objectives, facilitates understanding of latent

variables for decision making, and offers a recipe for designing novel

objectives. We recommend deriving future agent objectives from the joint

divergence to facilitate comparison, to point out the agent’s target

distribution, and to identify the intrinsic objective terms needed to reach

that distribution.

End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence

Nicolas Skatchkovsky , Hyeryung Jang , Osvaldo Simeone

Comments: To be presented at Asilomar 2020

Subjects : Neural and Evolutionary Computing (cs.NE) ; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

This paper introduces a novel “all-spike” low-power solution for remote

wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),

and Spiking Neural Networks (SNNs). In the proposed system, event-driven

neuromorphic sensors produce asynchronous time-encoded data streams that are

encoded by an SNN, whose output spiking signals are pulse modulated via IR and

transmitted over general frequency-selective channels, while the receiver’s

inputs are obtained via hard detection of the received signals and fed to an

SNN for classification. We introduce an end-to-end training procedure that

treats the cascade of encoder, channel, and decoder as a probabilistic

SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The

proposed system, termed NeuroJSCC, is compared to conventional synchronous

frame-based and uncoded transmissions in terms of latency and accuracy. The

experiments confirm that the proposed end-to-end neuromorphic edge architecture

provides a promising framework for efficient and low-latency remote sensing,

communication, and inference.

Error estimate for a universal function approximator of ReLU network with a local connection

Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)

Neural networks have shown highly successful performance in a wide range of tasks, but further studies are needed to improve their performance. We analyze the approximation error of a specific neural network architecture with local connections, which is more widely applicable than a fully connected one because locally connected networks can be used to describe diverse architectures such as CNNs. Our error estimate depends on two parameters: one

controlling the depth of the hidden layer, and the other, the width of the

hidden layers.

Zuckerli: A New Compressed Representation for Graphs

Luca Versari , Iulia M. Comsa , Alessio Conte , Roberto Grossi Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT)

Zuckerli is a scalable compression system meant for large real-world graphs.

Graphs are notoriously challenging structures to store efficiently due to their

linked nature, which makes it hard to separate them into smaller, compact

components. Therefore, effective compression is crucial when dealing with large

graphs, which can have billions of nodes and edges. Furthermore, a good

compression system should give the user fast and reasonably flexible access to

parts of the compressed data without requiring full decompression, which may be

unfeasible on their system. Zuckerli improves multiple aspects of WebGraph, the

current state-of-the-art in compressing real-world graphs, by using advanced

compression techniques and novel heuristic graph algorithms. It can produce

both a compressed representation for storage and one which allows fast direct

access to the adjacency lists of the compressed graph without decompressing the

entire graph. We validate the effectiveness of Zuckerli on real-world graphs

with up to a billion nodes and 90 billion edges, conducting an extensive

experimental evaluation of both compression density and decompression

performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller,

and more than 20% smaller in most cases, with a resource usage for decompression

comparable to that of WebGraph.

Quantum stabilizer codes, lattices, and CFTs

Anatoly Dymarsky , Alfred Shapere

Comments: 99 pages

Subjects : High Energy Physics – Theory (hep-th) ; Information Theory (cs.IT); Combinatorics (math.CO); Quantum Physics (quant-ph)

There is a rich connection between classical error-correcting codes,

Euclidean lattices, and chiral conformal field theories. Here we show that

quantum error-correcting codes, those of the stabilizer type, are related to

Lorentzian lattices and non-chiral CFTs. More specifically, real self-dual

stabilizer codes can be associated with even self-dual Lorentzian lattices, and

thus define Narain CFTs. We dub the resulting theories code CFTs and study

their properties. T-duality transformations of a code CFT, at the level of the

underlying code, reduce to code equivalences. By means of such equivalences,

any stabilizer code can be reduced to a graph code. We can therefore represent

code CFTs by graphs. We study code CFTs with small central charge \(c=n\leq 12\), and find many interesting examples. Among them is a non-chiral \(E_8\) theory, which is based on the root lattice of \(E_8\) understood as an even self-dual Lorentzian lattice. By analyzing all graphs with \(n\leq 8\) nodes we find many

pairs and triples of physically distinct isospectral theories. We also

construct numerous modular invariant functions satisfying all the basic

properties expected of the CFT partition function, yet which are not partition

functions of any known CFTs. We consider the ensemble average over all code

theories, calculate the corresponding partition function, and discuss its

possible holographic interpretation. The paper is written in a self-contained

manner, and includes an extensive pedagogical introduction and many explicit

examples.
