arXiv Paper Daily: Fri, 4 Sep 2020
source link: https://www.52ml.net/22514.html
Neural and Evolutionary Computing
Tree Neural Networks in HOL4
Thibault Gauthier Subjects : Neural and Evolutionary Computing (cs.NE)
We present an implementation of tree neural networks within the proof
assistant HOL4. Their architecture makes them naturally suited for
approximating functions whose domain is a set of formulas. We measure the
performance of our implementation and compare it with other machine learning
predictors on the tasks of evaluating arithmetical expressions and estimating
the truth of propositional formulas.
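As an illustration of the idea (a generic PyTorch sketch with assumed dimensions and operator set, not the HOL4 implementation), a tree neural network assigns a small combining network to each operator and recursively composes embeddings along the expression's tree structure:

```python
import torch
import torch.nn as nn

DIM = 16  # embedding dimension (illustrative assumption)

class TreeNN(nn.Module):
    def __init__(self, operators, n_outputs):
        super().__init__()
        # One combining network per operator, e.g. "+" and "*".
        self.ops = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(arity * DIM, DIM), nn.Tanh())
            for name, arity in operators.items()
        })
        self.leaf = nn.Embedding(10, DIM)      # embeddings for constants 0..9
        self.head = nn.Linear(DIM, n_outputs)  # task-specific output head

    def embed(self, tree):
        # A tree is either an int leaf or (operator_name, [subtrees]).
        if isinstance(tree, int):
            return self.leaf(torch.tensor(tree))
        name, children = tree
        kids = torch.cat([self.embed(c) for c in children])
        return self.ops[name](kids)

    def forward(self, tree):
        return self.head(self.embed(tree))

# Embed (2 + 3) * 4 and predict, e.g., its value in some finite range.
model = TreeNN({"+": 2, "*": 2}, n_outputs=4)
logits = model(("*", [("+", [2, 3]), 4]))
```

The same recursion applies to propositional formulas by swapping in logical connectives as operators and a truth-estimation head.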
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Comments: 9 pages, 4 figures, 2 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Training a deep neural network requires a large amount of single-task data
and involves a long time-consuming optimization phase. This is not scalable to
complex, realistic environments with unexpected changes. Humans, by contrast, can
perform fast incremental learning on the fly, and memory systems in the brain
play a critical role in this ability. We introduce Sparse Meta Networks — a meta-learning approach to
learn online sequential adaptation algorithms for deep neural networks, by
using deep neural networks. We augment a deep neural network with a
layer-specific fast-weight memory. The fast-weights are generated sparsely at
each time step and accumulated incrementally through time providing a useful
inductive bias for online continual adaptation. We demonstrate strong
performance on a variety of sequential adaptation scenarios, from simple
online reinforcement learning to large-scale adaptive language modelling.
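A rough sketch of the core mechanism — per-layer fast weights generated from the input, gated sparsely, and accumulated through time — might look as follows (the scalar gate and rank-1 update are simplifying assumptions, not the paper's exact parameterization):

```python
import torch
import torch.nn as nn

class FastWeightLayer(nn.Module):
    """Sketch: slow weights plus an additive fast-weight memory
    that is updated online at every time step."""
    def __init__(self, dim):
        super().__init__()
        self.slow = nn.Linear(dim, dim)
        self.to_update = nn.Linear(dim, 2 * dim)  # generates a rank-1 update
        self.gate = nn.Linear(dim, 1)             # decides when to write

    def forward(self, x, fast):            # fast: (dim, dim) accumulated memory
        g = torch.sigmoid(self.gate(x))    # near-zero most steps -> sparse writes
        a, b = self.to_update(x).chunk(2, dim=-1)
        fast = fast + g * torch.outer(torch.tanh(a), torch.tanh(b))
        return torch.tanh(self.slow(x) + fast @ x), fast

layer = FastWeightLayer(8)
fast = torch.zeros(8, 8)
for t in range(5):                         # online adaptation through time
    y, fast = layer(torch.randn(8), fast)
```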
Kai Dresia , Simon Jentzsch , Günther Waxenegger-Wilfing , Robson Hahn , Jan Deeken , Michael Oschwald , Fabio Mota Subjects : Neural and Evolutionary Computing (cs.NE) ; Systems and Control (eess.SY)
Identifying the optimal design of a new launch vehicle is critically important,
since design decisions made in the early development phase limit the vehicle's
later performance and determine the associated costs. Reusing the first stage
via retro-propulsive landing increases the complexity even more. Therefore, we
develop an optimization framework for partially reusable launch vehicles, which
enables multidisciplinary design studies. The framework contains suitable mass
estimates of all essential subsystems and a routine to calculate the needed
propellant for the ascent and landing maneuvers. For design optimization, the
framework can be coupled with a genetic algorithm. The overall goal is to
reveal the implications of different propellant combinations and objective
functions on the launcher’s optimal design for various mission scenarios. The
results show that the optimization objective influences the most suitable
propellant choice and the overall launcher design, concerning staging, weight,
size, and rocket engine parameters. In terms of gross lift-off weight, liquid
hydrogen seems to be favorable. When optimizing for a minimum structural mass
or a minimum expendable structural mass, hydrocarbon-based solutions show better
results. Finally, launch vehicles using a hydrocarbon fuel in the first stage
and liquid hydrogen in the upper stage are an appealing alternative, combining
both fuels’ benefits.
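For context, the coupling of such a framework with a genetic algorithm can be pictured with a toy loop like the one below; the design vector and fitness function are illustrative placeholders, not the framework's actual mass and trajectory models:

```python
import random

# Hypothetical 3-gene design vector: (stage-1 propellant, stage-2
# propellant, engine thrust scale), in arbitrary units.
def fitness(design):
    stage1_prop, stage2_prop, thrust = design
    gross_mass = 20.0 + 1.1 * stage1_prop + 1.2 * stage2_prop + 0.05 * thrust
    return -gross_mass                     # e.g. minimize gross lift-off weight

def mutate(design, sigma=0.1):
    return tuple(max(0.0, g + random.gauss(0, sigma)) for g in design)

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

pop = [tuple(random.uniform(1, 10) for _ in range(3)) for _ in range(30)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)    # rank designs by objective
    elite = pop[:10]                       # keep the best designs
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(20)]
best = max(pop, key=fitness)
```

Swapping the fitness function (gross lift-off weight, structural mass, or expendable structural mass) is what produces the different optimal designs discussed in the abstract.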
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, these methods fall short when changing
conditions affect the captured images. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
Physarum Multi-Commodity Flow Dynamics
Vincenzo Bonifaci , Enrico Facca , Frederic Folz , Andreas Karrenbauer , Pavel Kolev , Kurt Mehlhorn , Giovanna Morigi , Golnoosh Shahkarami , Quentin Vermande Subjects : Data Structures and Algorithms (cs.DS) ; Neural and Evolutionary Computing (cs.NE)
In wet-lab experiments [Nakagaki-Yamada-Toth, Tero-Takagi-etal], the
slime mold Physarum polycephalum has demonstrated its ability to solve shortest
path problems and to design efficient networks (see the wet-lab experiment
figures in the paper for illustrations). Physarum polycephalum is a slime mold in the
Mycetozoa group. For the shortest path problem, a mathematical model for the
evolution of the slime was proposed in [Tero-Kobayashi-Nakagaki] and its
biological relevance was argued. The model was shown to solve shortest path
problems, first in computer simulations and then by mathematical proof. It was
later shown that the slime mold dynamics can solve more general linear programs
and that many variants of the dynamics have similar convergence behavior. In
this paper, we introduce a dynamics for the network design problem. We
formulate network design as the problem of constructing a network that
efficiently supports a multi-commodity flow problem. We investigate the
dynamics in computer simulations and analytically. The simulations show that
the dynamics is able to construct efficient and elegant networks. In the
theoretical part we show that the dynamics minimizes an objective combining the
cost of the network and the cost of routing the demands through the network. We
also give an alternative characterization of the optimum solution.
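For reference, the classic single-commodity Physarum dynamics of [Tero-Kobayashi-Nakagaki] that this paper generalizes can be simulated in a few lines: edge conductivities evolve as dx/dt = |q| - x, where q is the flow induced by the current conductivities. A minimal numpy sketch (a shortest-path toy, not the paper's multi-commodity dynamics):

```python
import numpy as np

def physarum(n, edges, lengths, s, t, steps=2000, dt=0.1):
    x = np.ones(len(edges))                       # edge conductivities
    b = np.zeros(n); b[s], b[t] = 1.0, -1.0       # unit demand s -> t
    for _ in range(steps):
        L = np.zeros((n, n))                      # weighted graph Laplacian
        for k, (u, v) in enumerate(edges):
            w = x[k] / lengths[k]
            L[u, u] += w; L[v, v] += w
            L[u, v] -= w; L[v, u] -= w
        p = np.linalg.lstsq(L, b, rcond=None)[0]  # node potentials
        q = np.array([x[k] / lengths[k] * (p[u] - p[v])
                      for k, (u, v) in enumerate(edges)])
        x += dt * (np.abs(q) - x)                 # Physarum update dx/dt = |q| - x
    return x

# Triangle: direct edge 0-2 (length 1) vs. detour 0-1-2 (total length 2).
# Conductivity concentrates on the shortest path; the detour decays.
edges = [(0, 1), (1, 2), (0, 2)]
print(physarum(3, edges, [1.0, 1.0, 1.0], 0, 2).round(3))
```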
Computer Vision and Pattern Recognition
Flow-edge Guided Video Completion
Comments: ECCV 2020. Project: this http URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We present a new flow-based video completion algorithm. Previous flow
completion methods are often unable to retain the sharpness of motion
boundaries. Our method first extracts and completes motion edges, and then uses
them to guide piecewise-smooth flow completion with sharp edges. Existing
methods propagate colors among local flow connections between adjacent frames.
However, not all missing regions in a video can be reached in this way because
the motion boundaries form impenetrable barriers. Our method alleviates this
problem by introducing non-local flow connections to temporally distant frames,
enabling propagating video content over motion boundaries. We validate our
approach on the DAVIS dataset. Both visual and quantitative results show that
our method compares favorably against the state-of-the-art algorithms.
Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning
Comments: PhD Thesis. Versions of several chapters have previously appeared or been submitted under different titles
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Numerical Analysis (math.NA)
Leo Tolstoy opened his monumental novel Anna Karenina with the now famous
words: "Happy families are all alike; every unhappy family is unhappy in its own
way." A similar notion also applies to mathematical spaces: every flat space is
alike; every unflat space is unflat in its own way. However, rather than being
a source of unhappiness, we will show that the diversity of non-flat spaces
provides a rich area of study. The genesis of the so-called big data era and
the proliferation of social and scientific databases of increasing size have led
to a need for algorithms that can efficiently process, analyze, and even
generate high-dimensional data. However, the curse of dimensionality leads to
the fact that many classical approaches do not scale well with respect to the
size of these problems. One technique to avoid some of these ill-effects is to
exploit the geometric structure of coherent data. In this thesis, we will
explore geometric methods for shape processing and data analysis. More
specifically, we will study techniques for representing manifolds and signals
supported on them through a variety of mathematical tools including, but not
limited to, computational differential geometry, variational PDE modeling, and
deep learning. First, we will explore non-isometric shape matching through
variational modeling. Next, we will use ideas from parallel transport on
manifolds to generalize convolution and convolutional neural networks to
deformable manifolds. Finally, we conclude by proposing a novel auto-regressive
model for capturing the intrinsic geometry and topology of data. Throughout
this work, we will use the idea of computing correspondences as a through-line
to both motivate our work and analyze our results.
Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
Weijia Wu , Ning Lu , Enze Xie Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Deep learning-based scene text detection can achieve preferable performance,
powered with sufficient labeled training data. However, manual labeling is time
consuming and laborious. At the extreme, the corresponding annotated data are
unavailable. Exploiting synthetic data is a very promising solution except for
domain distribution mismatches between synthetic datasets and real datasets. To
address the severe domain distribution mismatch, we propose a synthetic-to-real
domain adaptation method for scene text detection, which transfers knowledge
from synthetic data (source domain) to real data (target domain). In this
paper, a text self-training (TST) method and adversarial text instance
alignment (ATA) for domain adaptive scene text detection are introduced. ATA
helps the network learn domain-invariant features by training a domain
classifier in an adversarial manner. TST diminishes the adverse effects of
false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels.
Both components have positive effects on improving the performance of scene text
detectors when adapting from synthetic-to-real scenes. We evaluate the proposed
method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The
results demonstrate the effectiveness of the proposed method with up to 10%
improvement, which has important exploration significance for domain adaptive
scene text detection. Code is available at
this https URL
MIPGAN — Generating Robust and High Quality Morph Attacks Using Identity Prior Driven GAN
Comments: Submitted to IEEE T-BIOM 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Cryptography and Security (cs.CR)
Face morphing attacks aim to circumvent Face Recognition Systems (FRS) by
employing face images derived from multiple data subjects (e.g., accomplices
and malicious actors). Morphed images can verify against contributing data
subjects with a reasonable success rate, given they have a high degree of
identity resemblance. The success of the morphing attacks is directly dependent
on the quality of the generated morph images. We present a new approach for
generating robust attacks extending our earlier framework for generating face
morphs. We present a new approach using an Identity Prior Driven Generative
Adversarial Network, which we refer to as MIPGAN (Morphing through
Identity Prior driven GAN). The proposed MIPGAN is derived from the StyleGAN
with a newly formulated loss function exploiting perceptual quality and
identity factor to generate a high quality morphed face image with minimal
artifacts and with higher resolution. We demonstrate the proposed approach’s
applicability to generate robust morph attacks by evaluating it against a
commercial Face Recognition System (FRS) and demonstrate the success rate of
attacks. Extensive experiments are carried out to assess the FRS’s
vulnerability against the proposed morphed face generation technique on three
types of data: digital images, re-digitized (printed and scanned) images, and
compressed images after re-digitization, all from the newly generated
MIPGAN Face Morph Dataset. The obtained results demonstrate that the
proposed approach of morph generation profoundly threatens the FRS.
Multi-Loss Weighting with Coefficient of Variations
Rick Groenendijk , Sezer Karaoglu , Theo Gevers , Thomas Mensink Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. Finding a good set of weights is often
done by adopting them into the set of hyper-parameters, which are set using an
extensive grid search. This is computationally expensive. In this paper, the
weights are defined based on properties observed while training the model,
including the specific batch loss, the average loss, and the variance for each
of the losses. An additional advantage is that the defined weights evolve
during training, instead of using static loss weights. In the literature, loss
weighting is mostly used in a multi-task learning setting, where the different
tasks obtain different weights. However, there is a plethora of single-task
multi-loss problems that can benefit from automatic loss weighting. In this
paper, it is shown that these multi-task approaches do not work on single
tasks. Instead, a method is proposed that automatically and dynamically tunes
loss weights throughout training specifically for single-task multi-loss
problems. The method incorporates a measure of uncertainty to balance the
losses. The validity of the approach is shown empirically for different tasks
on multiple datasets.
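A minimal sketch of the idea — weighting each loss by the coefficient of variation (std/mean) of its recent history, so losses that still vary a lot get more weight — could look like this (the running statistics and normalization are assumptions, not the paper's exact formulation):

```python
import numpy as np

class CoVWeighting:
    """Sketch: dynamic loss weights from exponentially decayed running
    statistics; warm-up and bias correction are omitted."""
    def __init__(self, n_losses, decay=0.99):
        self.mean = np.zeros(n_losses)
        self.var = np.zeros(n_losses)
        self.decay = decay

    def __call__(self, losses):
        losses = np.asarray(losses, dtype=float)
        self.mean = self.decay * self.mean + (1 - self.decay) * losses
        self.var = self.decay * self.var + (1 - self.decay) * (losses - self.mean) ** 2
        cov = np.sqrt(self.var) / (self.mean + 1e-8)  # coefficient of variation
        w = cov / (cov.sum() + 1e-8)                  # normalize weights to sum to 1
        return float((w * losses).sum())              # combined scalar loss

weighter = CoVWeighting(n_losses=2)
total = weighter([0.7, 2.3])   # feed this combined loss to the optimizer
```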
Future Frame Prediction of a Video Sequence
Comments: Acknowledgement: the contributions, support, and help of Sonam Gupta, PhD Scholar, VPLAB, Deptt. of CS&E, IIT Madras
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Predicting future frames of a video sequence has been a problem of high
interest in the field of Computer Vision as it caters to a multitude of
applications. The ability to predict, anticipate and reason about future events
is the essence of intelligence and one of the main goals of decision-making
systems such as human-machine interaction, robot navigation and autonomous
driving. However, the challenge lies in the ambiguous nature of the problem as
there may be multiple future sequences possible for the same input video shot.
A naively designed model averages multiple possible futures into a single
blurry prediction.
Recently, two distinct approaches have attempted to address this problem:
(a) latent variable models that represent the underlying stochasticity and
(b) adversarially trained models that aim to produce sharper images. A latent
variable model often struggles to produce realistic results, while an
adversarially trained model underutilizes latent variables and thus fails to
produce diverse predictions. These methods have revealed complementary
strengths and weaknesses. Combining the two approaches produces predictions
that appear more realistic and better cover the range of plausible futures.
This forms the basis and objective of study in this project work.
In this paper, we propose a novel multi-scale architecture combining both
approaches. We validate our proposed model through a series of experiments and
empirical evaluations on Moving MNIST, UCF101, and Penn Action datasets. Our
method outperforms the results obtained using the baseline methods.
Multi-domain semantic segmentation with pyramidal fusion
Comments: 2 pages, 3 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We present our submission to the semantic segmentation contest of the Robust
Vision Challenge held at ECCV 2020. The contest requires submitting the same
model to seven benchmarks from three different domains. Our approach is based
on the SwiftNet architecture with pyramidal fusion. We address inconsistent
taxonomies with a single-level 193-dimensional softmax output. We strive to
train with large batches in order to stabilize optimization of a hard
recognition problem, and to favour smooth evolution of batchnorm statistics. We
achieve this by implementing a custom backward step through log-sum-prob loss,
and by using small crops before freezing the population statistics. Our model
ranks first on the RVC semantic segmentation challenge as well as on the
WildDash 2 leaderboard. This suggests that pyramidal fusion is competitive not
only for efficient inference with lightweight backbones, but also in
large-scale setups for multi-domain application.
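The log-sum-prob idea can be sketched as follows: a benchmark class's probability is the sum of the softmax probabilities of the universal classes it maps to, evaluated stably in log space. The mapping and shapes below are illustrative assumptions, not the submission's exact implementation:

```python
import torch

def log_sum_prob_loss(logits, class_to_universal, target):
    """Sketch: negative log of the summed universal-class probabilities
    belonging to the target benchmark class."""
    log_p = torch.log_softmax(logits, dim=-1)   # (N, 193) universal classes
    ids = class_to_universal[target]            # universal ids of the target class
    # sum of probabilities == logsumexp of log-probabilities (stable form)
    return -torch.logsumexp(log_p[:, ids], dim=-1).mean()

logits = torch.randn(4, 193, requires_grad=True)
mapping = {0: [3, 17, 42]}                      # benchmark class 0 -> 3 universal classes
loss = log_sum_prob_loss(logits, mapping, target=0)
loss.backward()                                 # the "custom backward step" goes here
```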
Comments: 5 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
A simple modification method for single-stage generic object detection neural
networks, such as YOLO and SSD, is proposed, which allows for improving the
detection accuracy on video data by exploiting the temporal behavior of the
scene in the detection pipeline. It is shown that, using this method, the
detection accuracy of the base network can be considerably improved, especially
for occluded and hidden objects. It is shown that a modified network detects
hidden objects with higher confidence than an unmodified one. A
weakly supervised training method is proposed, which allows for training a
modified network without requiring any additional annotated data.
Few-shot Object Detection with Feature Attention Highlight Module in Remote Sensing Images
Zixuan Xiao , Ping Zhong , Yuan Quan , Xuping Yin , Wei Xue Subjects : Computer Vision and Pattern Recognition (cs.CV)
In recent years, object detection has found many applications in the remote
sensing field, and these demand a large amount of labeled data. However, in many
cases, data are extremely scarce. In this paper, we propose a few-shot object
detector which is designed for detecting novel objects based on only a few
examples. Through fully leveraging labeled base classes, our model that is
composed of a feature-extractor, a feature attention highlight module as well
as a two-stage detection backend can quickly adapt to novel classes. The
pre-trained feature extractor, whose parameters are shared, produces general
features, while the feature attention highlight module is designed to be
lightweight and simple in order to fit the few-shot setting. Although simple,
the information it provides in a serial way helps make the general features
specific to few-shot objects. Then the object-specific
features are delivered to the two-stage detection backend for the detection
results. The experiments demonstrate the effectiveness of the proposed method
for few-shot cases.
SCG-Net: Self-Constructing Graph Neural Networks for Semantic Segmentation
Comments: 11 pages, 5 figs. Draft version submitted to TGRS; code will be released soon
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Capturing global contextual representations by exploiting long-range
pixel-pixel dependencies has been shown to improve semantic segmentation
performance. However, how to do this efficiently is an open question, as current
approaches that utilise attention schemes or very deep models to increase the
model's field of view result in complex models with large memory consumption.
Inspired by recent work on graph neural networks, we propose the
Self-Constructing Graph (SCG) module that learns a long-range dependency graph
directly from the image and uses it to propagate contextual information
efficiently to improve semantic segmentation. The module is optimised via a
novel adaptive diagonal enhancement method and a variational lower bound that
consists of a customized graph reconstruction term and a Kullback-Leibler
divergence regularization term. When incorporated into a neural network
(SCG-Net), semantic segmentation is performed in an end-to-end manner and
competitive performance (mean F1-scores of 92.0% and 89.8% respectively) on the
publicly available ISPRS Potsdam and Vaihingen datasets is achieved, with much
fewer parameters, and at a lower computational cost compared to related pure
convolutional neural network (CNN) based models.
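A rough sketch of such a module — node embeddings pooled from CNN features, a reparameterized latent, and an adjacency A = ReLU(ZZ^T) with a KL regularizer — is given below; the adaptive diagonal enhancement and graph reconstruction terms are omitted, and all dimensions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfConstructingGraph(nn.Module):
    """Sketch: learn a dependency graph directly from image features."""
    def __init__(self, in_ch, n_nodes, d):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(int(n_nodes ** 0.5))  # nodes from pooling
        self.mu = nn.Linear(in_ch, d)
        self.logvar = nn.Linear(in_ch, d)

    def forward(self, feat):                             # feat: (B, C, H, W)
        x = self.pool(feat).flatten(2).transpose(1, 2)   # (B, n_nodes, C)
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        adj = F.relu(torch.bmm(z, z.transpose(1, 2)))    # (B, n, n) adjacency
        kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
        return adj, z, kl                                # adj drives GNN propagation

scg = SelfConstructingGraph(in_ch=64, n_nodes=16, d=8)
adj, nodes, kl = scg(torch.randn(2, 64, 32, 32))
```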
Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Convolutional neural networks (CNNs) require both intensive computation and
frequent memory access, which lead to a low processing speed and large power
dissipation. Although the characteristics of the different layers in a CNN are
frequently quite different, previous hardware designs have employed common
optimization schemes for them. This paper proposes a layer-specific design that
employs different organizations that are optimized for the different layers.
The proposed design employs two layer-specific optimizations: layer-specific
mixed data flow and layer-specific mixed precision. The mixed data flow aims to
minimize the off-chip access while demanding a minimal on-chip memory (BRAM)
resource of an FPGA device. The mixed precision quantization is to achieve both
a lossless accuracy and an aggressive model compression, thereby further
reducing the off-chip access. A Bayesian optimization approach is used to
select the best sparsity for each layer, achieving the best trade-off between
the accuracy and compression. This mixing scheme allows the entire network
model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip
access, and thereby achieves a significant performance enhancement. The model
size is reduced by 22.66-28.93 times compared to that in a full-precision
network with a negligible degradation of accuracy on VOC, COCO, and ImageNet
datasets. Furthermore, the combination of mixed dataflow and mixed precision
significantly outperforms previous works in terms of throughput,
off-chip access, and on-chip memory requirement.
DESC: Domain Adaptation for Depth Estimation via Semantic Consistency
Comments: BMVC20 (Oral). Code: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Accurate real depth annotations are difficult to acquire, requiring the use of
special devices such as a LiDAR sensor. Self-supervised methods try to overcome
this problem by processing video or stereo sequences, which may not always be
available. Instead, in this paper, we propose a domain adaptation approach to
train a monocular depth estimation model using a fully-annotated source dataset
and a non-annotated target dataset. We bridge the domain gap by leveraging
semantic predictions and low-level edge features to provide guidance for the
target domain. We enforce consistency between the main model and a second model
trained with semantic segmentation and edge maps, and introduce priors in the
form of instance heights. Our approach is evaluated on standard domain
adaptation benchmarks for monocular depth estimation and shows consistent
improvement upon the state-of-the-art.
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, these methods fall short when changing
conditions affect the captured images. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask
Comments: Winner of LVIS challenge 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This article introduces the solutions of the team lvisTraveler for LVIS
Challenge 2020. In this work, two characteristics of LVIS dataset are mainly
considered: the long-tailed distribution and high quality instance segmentation
mask. We adopt a two-stage training pipeline. In the first stage, we
incorporate EQL and self-training to learn generalized representation. In the
second stage, we utilize Balanced GroupSoftmax to promote the classifier, and
propose a novel proposal assignment strategy and a new balanced mask loss for
mask head to get more precise mask predictions. Finally, we achieve 41.5 and
41.2 AP on LVIS v1.0 val and test-dev splits respectively, outperforming the
baseline based on X101-FPN-MaskRCNN by a large margin.
Physics-based Shading Reconstruction for Intrinsic Image Decomposition
Comments: Submitted to Computer Vision and Image Understanding (CVIU)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We investigate the use of photometric invariance and deep learning to compute
intrinsic images (albedo and shading). We propose albedo and shading gradient
descriptors which are derived from physics-based models. Using the descriptors,
albedo transitions are masked out and an initial sparse shading map is
calculated directly from the corresponding RGB image gradients in a
learning-free unsupervised manner. Then, an optimization method is proposed to
reconstruct the full dense shading map. Finally, we integrate the generated
shading map into a novel deep learning framework to refine it and also to
predict the corresponding albedo image to achieve intrinsic image decomposition. By
doing so, we are the first to directly address the texture and intensity
ambiguity problems of the shading estimations. Large scale experiments show
that our approach, steered by physics-based invariant descriptors, achieves
superior results on MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant
Intrinsic Images, Spectral Intrinsic Images, As Realistic As Possible, and
competitive results on Intrinsic Images in the Wild datasets while achieving
state-of-the-art shading estimations.
Comments: 10 pages, 3 figures, submitted to BIBM2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Joint image-text embedding extracted from medical images and associated
contextual reports is the bedrock for most biomedical vision-and-language (V+L)
tasks, including medical visual question answering, clinical image-text
retrieval, and clinical report auto-generation. In this study, we adopt four
pre-trained V+L models: LXMERT, VisualBERT, UNITER, and PixelBERT to learn
multimodal representation from MIMIC-CXR radiographs and associated reports.
The extrinsic evaluation on OpenI dataset shows that in comparison to the
pioneering CNN-RNN model, the joint embedding learned by pre-trained V+L models
demonstrates a performance improvement in the thoracic findings classification
task. We conduct an ablation study to analyze the contribution of certain model
components and validate the advantage of joint embedding over text-only
embedding. We also visualize attention maps to illustrate the attention
mechanism of V+L models.
Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The ability to efficiently search for images over an indexed database is the
cornerstone for several user experiences. Incorporating user feedback through
multi-modal inputs provides flexible interaction to serve fine-grained
specificity in requirements. We specifically focus on text feedback, through
descriptive natural language queries. Given a reference image and textual user
feedback, our goal is to retrieve images that satisfy constraints specified by
both of these input modalities. The task is challenging as it requires
understanding the textual semantics from the text feedback and then applying
these changes to the visual representation. To address these challenges, we
propose a novel architecture TRACE which contains a hierarchical feature
aggregation module to learn the composite visio-linguistic representations.
TRACE achieves the SOTA performance on 3 benchmark datasets: FashionIQ, Shoes,
and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5%
respectively in R@K metric. Our extensive experiments and ablation studies show
that TRACE consistently outperforms the existing techniques by significant
margins both quantitatively and qualitatively.
Modeling Global Body Configurations in American Sign Language
Nicholas Wilkins , Beck Cordes Galbraith , Ifeoma Nwogu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
American Sign Language (ASL) is the fourth most commonly used language in the
United States and is the language most commonly used by Deaf people in the
United States and the English-speaking regions of Canada. Unfortunately, until
recently, ASL received little research attention. This is due, in part, to its delayed
recognition as a language until William C. Stokoe’s publication in 1960.
Limited data has been a long-standing obstacle to ASL research and
computational modeling. The lack of large-scale datasets has prohibited many
modern machine-learning techniques, such as Neural Machine Translation, from
being applied to ASL. In addition, the modality required to capture sign
language (i.e. video) is complex in natural settings (as one must deal with
background noise, motion blur, and the curse of dimensionality). Finally, when
compared with spoken languages, such as English, there has been limited
research conducted into the linguistics of ASL.
We realize a simplified version of Liddell and Johnson’s Movement-Hold (MH)
Model using a Probabilistic Graphical Model (PGM). We trained our model on
ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM
against other models to determine its ability to model ASL. Finally, we
interpret various aspects of the PGM and draw conclusions about ASL phonetics.
The main contributions of this paper are
Adherent Mist and Raindrop Removal from a Single Image Using Attentive Convolutional Network
Comments: 21 pages (including 4 pages of supplementary materials)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Temperature-difference-induced mist adhering to the windshield, camera lens,
etc. is often inhomogeneous and obscure, and can easily obstruct the vision
and degrade the image severely. Together with adherent raindrops, it brings
considerable challenges to various vision systems, yet has received little attention.
Recent methods for similar problems typically use hand-crafted priors to
generate spatial attention maps. In this work, we propose to visually remove
the adherent mist and raindrop jointly from a single image using attentive
convolutional neural networks. We apply classification activation map attention
to our model to strengthen the spatial attention without hand-crafted priors.
In addition, the smoothed dilated convolution is adopted to obtain a large
receptive field without spatial information loss, and the dual attention module
is utilized for efficiently selecting channels and spatial features. Our
experiments show our method achieves state-of-the-art performance, and
demonstrate that this underrated practical problem is critical to high-level
vision scenes.
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all the right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects, resulting in a significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
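The guiding idea can be sketched as a standard greedy NMS whose ranking score blends detector confidence with an expression-alignment score; the blending scheme below is an assumption, not the paper's exact integration:

```python
import numpy as np

def ref_nms(boxes, det_scores, expr_scores, iou_thr=0.5, alpha=0.5):
    """Sketch: greedy NMS ranked by a blend of detection confidence and
    a score measuring alignment with the expression's critical nouns."""
    scores = (1 - alpha) * det_scores + alpha * expr_scores
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current top box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i+1]) + area(boxes[order[1:]]) - inter)
        order = order[1:][iou <= iou_thr]   # suppress overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
keep = ref_nms(boxes, np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.9, 0.5]))
```

With alpha > 0, a box that the expression refers to can survive suppression even when its raw detection confidence is lower, which is the recall gain the abstract describes.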
Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
Comments: To appear in IEEE TPAMI, 18 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The traditional object retrieval task aims to learn a discriminative feature
representation with intra-similarity and inter-dissimilarity, which supposes
that the objects in an image are manually or automatically pre-cropped exactly.
However, in many real-world searching scenarios (e.g., video surveillance), the
objects (e.g., persons, vehicles, etc.) are seldom accurately detected or
annotated. Therefore, object-level retrieval becomes intractable without
bounding-box annotation, which leads to a new but challenging topic, i.e.
image-level search. In this paper, to address the image search issue, we first
introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A
Siamese architecture and an on-line pairing strategy for similar and dissimilar
objects in the given images are designed. 2) A novel on-line pairing (OLP) loss
is introduced with a dynamic feature dictionary, which alleviates the
multi-task training stagnation problem, by automatically generating a number of
negative pairs to restrict the positives. 3) A hard example priority (HEP)
based softmax loss is proposed to improve the robustness of classification task
by selecting hard categories. With the philosophy of divide and conquer, we
further propose an improved I-Net, called DC-I-Net, which makes two new
contributions: 1) two modules are tailored to handle different tasks separately
in the integrated framework, such that the task specification is guaranteed. 2)
A class-center guided HEP loss (C2HEP) by exploiting the stored class centers
is proposed, such that the intra-similarity and inter-dissimilarity can be
captured for ultimate retrieval. Extensive experiments on well-known image-level
search benchmark datasets demonstrate that the proposed DC-I-Net
outperforms the state-of-the-art tasks-integrated and tasks-separated image
search models.
Spatial Transformer Point Convolution
Yuan Fang , Chunyan Xu , Zhen Cui , Yuan Zong , Jian Yang Subjects : Computer Vision and Pattern Recognition (cs.CV)
Point clouds are unstructured and unordered in the embedded 3D space. In
order to produce consistent responses under different permutation layouts, most
existing methods aggregate local spatial points through maximum or summation
operation. But such an aggregation essentially belongs to the isotropic
filtering on all operated points therein, which tends to lose the information
of geometric structures. In this paper, we propose a spatial transformer point
convolution (STPC) method to achieve anisotropic convolution filtering on point
clouds. To capture and represent implicit geometric structures, we specifically
introduce a spatial direction dictionary to learn those latent geometric
components. To better encode unordered neighbor points, we design a sparse
deformer to transform them into the canonical ordered dictionary space by using
direction dictionary learning. In the transformed space, the standard
image-like convolution can be leveraged to generate anisotropic filtering,
which better expresses the finer variances of local regions.
Dictionary learning and encoding processes are encapsulated into a network
module and jointly learnt in an end-to-end manner. Extensive experiments on
several public datasets (including S3DIS, Semantic3D, SemanticKITTI)
demonstrate the effectiveness of our proposed method on the point cloud semantic
segmentation task.
Noise-Aware Texture-Preserving Low-Light Enhancement
Comments: Accepted by IEEE VCIP 2020. The final version will appear in IEEE VCIP 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Image and Video Processing (eess.IV)
A simple and effective low-light image enhancement method based on a
noise-aware texture-preserving retinex model is proposed in this work. The new
method, called NATLE, attempts to strike a balance between noise removal and
natural texture preservation through a low-complexity solution. Its cost
function includes an estimated piece-wise smooth illumination map and a
noise-free texture-preserving reflectance map. Afterwards, illumination is
adjusted to form the enhanced image together with the reflectance map.
Extensive experiments are conducted on common low-light image enhancement
datasets to demonstrate the superior performance of NATLE.
Towards Practical Implementations of Person Re-Identification from Full Video Frames
Comments: 7 pages, 9 figures, This paper is under consideration at Pattern Recognition Letters
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
With the major adoption of automation for city security, person
re-identification (Re-ID) has been extensively studied recently. In this paper,
we argue that the current way of studying person re-identification, i.e. by
trying to re-identify a person within already detected and pre-cropped images
of people, is not sufficient to implement practical security applications,
where the inputs to the system are the full frames of the video streams. To
support this claim, we introduce the Full Frame Person Re-ID setting (FF-PRID)
and define specific metrics to evaluate FF-PRID implementations. To improve
robustness, we also formalize the hybrid human-machine collaboration framework,
which is inherent to any Re-ID security application. To demonstrate the
importance of considering the FF-PRID setting, we build an experiment showing
that combining a good people detection network with a good Re-ID model does not
necessarily produce good results for the final application. This underlines a
failure of the current formulation in assessing the quality of a Re-ID model
and justifies the use of different metrics. We hope that this work will
motivate the research community to consider the full problem in order to
develop algorithms that are better suited to real-world scenarios.
NITES: A Non-Parametric Interpretable Texture Synthesis Method
Xuejing Lei , Ganning Zhao , C.-C. Jay Kuo Subjects : Computer Vision and Pattern Recognition (cs.CV)
A non-parametric interpretable texture synthesis method, called the NITES
method, is proposed in this work. Although automatic synthesis of visually
pleasant texture can be achieved by deep neural networks nowadays, the
associated generation models are mathematically intractable and their training
demands higher computational cost. NITES offers a new texture synthesis
solution to address these shortcomings. NITES is mathematically transparent and
efficient in training and inference. The input is a single exemplary texture
image. The NITES method crops out patches from the input and analyzes the
statistical properties of these texture patches to obtain their joint
spatial-spectral representations. Then, the probabilistic distributions of
samples in the joint spatial-spectral spaces are characterized. Finally,
numerous texture images that are visually similar to the exemplary texture
image can be generated automatically. Experimental results are provided to show
the superior quality of generated texture images and efficiency of the proposed
NITES method in terms of both training and inference time.
Robust Object Classification Approach using Spherical Harmonics
Ayman Mukhaimar , Ruwan Tennakoon , Chow Yin Lai , Reza Hoseinnezhad , Alireza Bab-Hadiashar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
In this paper, we present a robust spherical harmonics approach for the
classification of point cloud-based objects. Spherical harmonics have been used
for classification over the years, with several frameworks existing in the
literature. These approaches use a variety of spherical-harmonics-based
descriptors to classify objects. We first investigate these frameworks'
robustness against data augmentation, such as outliers and noise, as this had not
been studied before. Then we propose a spherical convolutional neural network
framework for robust object classification. The proposed framework uses the
voxel grid of concentric spheres to learn features over the unit ball. Our
proposed model learns features that are less sensitive to data augmentation due
to the selected sampling strategy and the designed convolution operation. We
tested our proposed model against several types of data augmentation, such as
noise and outliers. Our results show that the proposed model outperforms the
state-of-the-art networks in terms of robustness to data augmentation.
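For background, a classic rotation-invariant spherical-harmonics descriptor keeps the per-degree energies of the harmonic coefficients. The sketch below (plain scipy, without the paper's concentric-sphere voxel grid or learned convolutions) illustrates the idea:

```python
import numpy as np
from scipy.special import sph_harm

def sh_energy_descriptor(points, l_max=8):
    """Sketch: project the points' angular distribution onto harmonics
    Y_lm and keep per-degree energies ||c_l||^2, which are rotation
    invariant. Radial binning over concentric spheres is omitted."""
    v = points / np.linalg.norm(points, axis=1, keepdims=True)
    theta = np.arctan2(v[:, 1], v[:, 0]) % (2 * np.pi)  # azimuth in [0, 2pi)
    phi = np.arccos(np.clip(v[:, 2], -1, 1))            # polar angle in [0, pi]
    feats = []
    for l in range(l_max + 1):
        c = [np.mean(np.conj(sph_harm(m, l, theta, phi)))  # coefficient c_lm
             for m in range(-l, l + 1)]
        feats.append(np.sum(np.abs(c) ** 2))              # degree-l energy
    return np.array(feats)

desc = sh_energy_descriptor(np.random.randn(500, 3))
```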
Unsupervised Point Cloud Registration via Salient Points Analysis (SPA)
Comments: 7 pages, 5 figures, final version is accepted by IEEE International Conference on Visual Communications and Image Processing (VCIP) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
An unsupervised point cloud registration method, called salient points
analysis (SPA), is proposed in this work. The proposed SPA method can register
two point clouds effectively using only a small subset of salient points. It
first applies the PointHop++ method to point clouds, finds corresponding
salient points in two point clouds based on the local surface characteristics
of points and performs registration by matching the corresponding salient
points. The SPA method offers several advantages over the recent deep learning
based solutions for registration. Deep learning methods such as PointNetLK and
DCP train end-to-end networks and rely on full supervision (namely, ground
truth transformation matrix and class label). In contrast, the SPA is
completely unsupervised. Furthermore, SPA’s training time and model size are
much smaller. The effectiveness of the SPA method is demonstrated by experiments
on seen and unseen classes and noisy point clouds from the ModelNet-40 dataset.
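Once corresponding salient points are found, the final alignment is a textbook rigid-registration step. A sketch of the SVD-based Kabsch/Procrustes solution is shown below; the salient-point detection and matching via PointHop++ features are not shown:

```python
import numpy as np

def rigid_align(src, dst):
    """Sketch: recover rotation R and translation t so that
    R @ src[i] + t ~= dst[i], given matched point pairs."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct the sign so R is a proper rotation (no reflection).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

# Sanity check: recover a known rotation plus translation.
src = np.random.randn(20, 3)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
R, t = rigid_align(src, src @ R_true.T + 1.5)
```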
Unsupervised Feedforward Feature (UFF) Learning for Point Cloud Classification and Segmentation
Comments: 7 pages, 2 figures, the final version is accepted by VCIP 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In contrast to supervised backpropagation-based feature learning in deep
neural networks (DNNs), an unsupervised feedforward feature (UFF) learning
scheme for joint classification and segmentation of 3D point clouds is proposed
in this work. The UFF method exploits statistical correlations of points in a
point cloud set to learn shape and point features in a one-pass feedforward
manner through a cascaded encoder-decoder architecture. It learns global shape
features through the encoder and local point features through the concatenated
encoder-decoder architecture. The extracted features of an input point cloud
are fed to classifiers for shape classification and part segmentation.
Experiments are conducted to evaluate the performance of the UFF method. For
shape classification, the UFF is superior to existing unsupervised methods and
on par with state-of-the-art DNNs. For part segmentation, the UFF outperforms
semi-supervised methods and performs slightly worse than DNNs.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of camera
based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
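The regression style of screen calibration can be sketched as a least-squares polynomial fit from predicted gaze vectors to screen coordinates; the quadratic feature map below is an assumption, not the paper's exact calibration model:

```python
import numpy as np

def fit_screen_calibration(gaze, screen):
    """Sketch: fit a quadratic polynomial mapping (gx, gy) -> (sx, sy)
    by least squares on the user's calibration points."""
    gx, gy = gaze[:, 0], gaze[:, 1]
    A = np.stack([np.ones_like(gx), gx, gy, gx * gy, gx**2, gy**2], axis=1)
    coef, *_ = np.linalg.lstsq(A, screen, rcond=None)   # shape (6, 2)
    return coef

def apply_calibration(coef, g):
    gx, gy = g
    return np.array([1.0, gx, gy, gx * gy, gx**2, gy**2]) @ coef

# e.g. 9 calibration points: CNN gaze predictions vs. true screen targets.
gaze = np.random.rand(9, 2)
screen = gaze * [1920, 1080] + np.random.randn(9, 2) * 5
coef = fit_screen_calibration(gaze, screen)
point = apply_calibration(coef, gaze[0])   # predicted on-screen location
```

With 6 coefficients per axis, 9 calibration points already give an overdetermined fit, which matches the trade-off between calibration effort and accuracy studied in the paper.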
CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking
Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Thanks to its capability of acquiring full-view frames at multiple kilohertz,
ultrafast ultrasound imaging unlocked the analysis of rapidly changing physical
phenomena in the human body, with pioneering applications such as
ultrasensitive flow imaging in the cardiovascular system or shear-wave
elastography. The accuracy achievable with these motion estimation techniques
is strongly contingent upon two contradictory requirements: a high quality of
consecutive frames and a high frame rate. Indeed, the image quality can usually
be improved by increasing the number of steered ultrafast acquisitions, but at
the expense of a reduced frame rate and possible motion artifacts. To achieve
accurate motion estimation at uncompromised frame rates and immune to motion
artifacts, the proposed approach relies on single ultrafast acquisitions to
reconstruct high-quality frames and on only two consecutive frames to obtain
2-D displacement estimates. To this end, we deployed a convolutional neural
network-based image reconstruction method combined with a speckle tracking
algorithm based on cross-correlation. Numerical and in vivo experiments,
conducted in the context of plane-wave imaging, demonstrate that the proposed
approach is capable of estimating displacements in regions where the presence
of side lobe and grating lobe artifacts prevents any displacement estimation
with a state-of-the-art technique that relies on conventional delay-and-sum
beamforming. The proposed approach may therefore unlock the full potential of
ultrafast ultrasound, in applications such as ultrasensitive cardiovascular
motion and flow analysis or shear-wave elastography.
Comments: Submitted to the IEEE for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Limited view tomographic reconstruction aims to reconstruct a tomographic
image from a limited number of sinogram or projection views arising from sparse
view or limited angle acquisitions that reduce radiation dose or shorten
scanning time. However, such a reconstruction suffers from high noise and
severe artifacts due to the incompleteness of the sinogram. To derive a quality
reconstruction, previous state-of-the-art methods use UNet-like neural
architectures to directly predict the full view reconstruction from limited
view data; but these methods leave the deep network architecture issue largely
intact and cannot guarantee the consistency between the sinogram of the
reconstructed image and the acquired sinogram, leading to a non-ideal
reconstruction. In this work, we propose a novel recurrent reconstruction
framework that stacks the same block multiple times. The recurrent block
consists of a custom-designed residual dense spatial-channel attention network.
Further, we develop a sinogram consistency layer interleaved in our recurrent
framework in order to ensure that the sampled sinogram is consistent with the
sinogram of the intermediate outputs of the recurrent blocks. We evaluate our
methods on two datasets. Our experimental results on AAPM Low Dose CT Grand
Challenge datasets demonstrate that our algorithm achieves a consistent and
significant improvement over the existing state-of-the-art neural methods on
both limited angle reconstruction (over 5dB better in terms of PSNR) and sparse
view reconstruction (about 4dB better in terms of PSNR). In addition, our
experimental results on Deep Lesion datasets demonstrate that our method is
able to generate high-quality reconstruction for 8 major lesion types.
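The sinogram-consistency idea can be sketched with plain numpy/skimage (a non-differentiable stand-in, not the paper's learned layer): forward-project the intermediate reconstruction over all views, overwrite the acquired views with the measured data, and back-project:

```python
import numpy as np
from skimage.transform import radon, iradon

def sinogram_consistency(recon, measured, thetas, measured_idx):
    """Sketch: enforce agreement with the acquired sinogram views."""
    sino = radon(recon, theta=thetas)        # sinogram of the current estimate
    sino[:, measured_idx] = measured         # keep acquired views exact
    return iradon(sino, theta=thetas, output_size=recon.shape[0])

# Sparse-view toy example: acquire every 4th of 90 views of a square phantom.
img = np.zeros((64, 64)); img[24:40, 24:40] = 1.0
thetas = np.linspace(0.0, 180.0, 90, endpoint=False)
measured_idx = np.arange(0, 90, 4)
measured = radon(img, theta=thetas)[:, measured_idx]
naive = iradon(radon(img, theta=thetas[measured_idx]),
               theta=thetas[measured_idx], output_size=64)
refined = sinogram_consistency(naive, measured, thetas, measured_idx)
```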
Software Effort Estimation using parameter tuned Models
Comments: Nine Tables
Subjects:
Software Engineering (cs.SE)
; Computer Vision and Pattern Recognition (cs.CV)
Software estimation is one of the most important activities in the software
project. The software effort estimation is required in the early stages of
software life cycle. Project failure is a major problem frequently faced by
software project managers, and imprecise estimation is a key reason for it. As
software size grows, the system also becomes more complex, making it difficult
to accurately predict the cost of the software development process. The
greatest pitfall of the software industry has been the fast-changing
nature of software development, which has made it difficult to develop
parametric models that yield high accuracy for software development in all
domains. We need the development of useful models that accurately predict the
cost of developing a software product. This study presents the novel analysis
of various regression models with hyperparameter tuning to get the effective
model. Nine different regression techniques are considered for model
development.
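As a sketch of what such hyperparameter tuning looks like in practice (with placeholder data and grids, not the study's actual nine models or datasets):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Placeholder effort-estimation data: rows are projects, columns are
# hypothetical size/complexity metrics; the target is effort.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X.sum(axis=1) + rng.normal(0, 0.1, 100)

models = {
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100, 200],
                       "max_depth": [None, 5, 10]}),
    "svr": (SVR(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
}
best = {}
for name, (model, grid) in models.items():
    # Exhaustive grid search with 5-fold cross-validation per model.
    search = GridSearchCV(model, grid, cv=5,
                          scoring="neg_mean_absolute_error")
    search.fit(X, y)
    best[name] = (search.best_params_, -search.best_score_)  # best MAE
```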
Peyman Tahghighi , Reza A.Zoroofi , Sareh Saffi , Alireza Ramezani Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)
For medical diagnosis based on retinal images, a clear understanding of 3D
structure is often required but due to the 2D nature of images captured, we
cannot infer that information. However, by utilizing 3D reconstruction methods,
we can construct the 3D structure of the macula area on fundus images which can
be helpful for diagnosis and screening of macular disorders. Recent approaches
have used shading information for 3D reconstruction or heightmap prediction but
their output was not accurate since they ignored the dependency between nearby
pixels. Additionally, other methods were dependent on the availability of more
than one image of the eye which is not available in practice. In this paper, we
use conditional generative adversarial networks (cGANs) to generate images that
contain height information of the macula area on a fundus image. Results using
our dataset show a 0.6077 improvement in the Structural Similarity Index (SSIM)
and a 0.071 improvement in the Mean Squared Error (MSE) metric over the Shape
from Shading (SFS) method. Additionally, qualitative studies indicate that our
method outperforms recent approaches.
Multimodal brain tumor classification
Marvin Lerousseau , Eric Deutsch , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cancer is a complex disease that provides various types of information
depending on the scale of observation. While most tumor diagnostics are
performed by observing histopathological slides, radiology images should yield
additional knowledge towards the efficacy of cancer diagnostics. This work
investigates a deep learning method combining whole slide images and magnetic
resonance images to classify tumors. Experiments are prospectively conducted on
the 2020 Computational Precision Medicine challenge, in a 3-class unbalanced
classification task. We report cross-validation (resp. validation)
balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and
0.94). The complete code of the method is open-source at XXXX; it includes
histopathological data pre-processing and can therefore be used off-the-shelf
for other histopathological and/or radiological classification tasks.
Detection-Aware Trajectory Generation for a Drone Cinematographer
Comments: 8 pages, IROS 2020 accepted
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
This work investigates an efficient trajectory generation for chasing a
dynamic target, which incorporates the detectability objective. The proposed
method actively guides the motion of a cinematographer drone so that the color
of a target is well-distinguished against the colors of the background in the
view of the drone. For the objective, we define a measure of color
detectability given a chasing path. After computing a discrete path optimized
for the metric, we generate a dynamically feasible trajectory. The whole
pipeline can be updated on-the-fly to respond to the motion of the target. For
the efficient discrete path generation, we construct a directed acyclic graph
(DAG) for which a topological sorting can be determined analytically without
the depth-first search. The smooth path is obtained in a quadratic programming
(QP) framework. We validate the enhanced performance of state-of-the-art object
detection and tracking algorithms when the camera drone executes the trajectory
obtained from the proposed method.
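The discrete stage lends itself to a simple dynamic program once the graph is layered by time, which is what makes the topological order analytic. The sketch below is our own illustration under that assumption; the cost function and node encoding are placeholders, not the paper's detectability metric.

    # Shortest path over a time-layered DAG. Nodes are assumed unique across
    # layers; edge_cost(u, v) stands in for the (negated) detectability
    # measure plus motion penalties.
    def best_path(layers, edge_cost):
        cost = {n: 0.0 for n in layers[0]}
        parent = {}
        for prev, cur in zip(layers, layers[1:]):
            new_cost = {}
            for v in cur:
                u = min(prev, key=lambda p: cost[p] + edge_cost(p, v))
                new_cost[v] = cost[u] + edge_cost(u, v)
                parent[v] = u
            cost = new_cost
        node = min(cost, key=cost.get)      # cheapest end node
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path[::-1]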
Fundus Image Analysis for Age Related Macular Degeneration: ADAM-2020 Challenge Report
Sharath M Shankaranarayana Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV)
Age related macular degeneration (AMD) is one of the major causes of
blindness in the elderly population. In this report, we propose deep learning
based methods for retinal analysis using color fundus images for computer aided
diagnosis of AMD. We leverage the recent state of the art deep networks for
building a single fundus image based AMD classification pipeline. We also
propose methods for the other directly relevant and auxiliary tasks such as
lesions detection and segmentation, fovea detection and optic disc
segmentation. We propose the use of generative adversarial networks (GANs) for
the tasks of segmentation and detection. We also propose a novel method of
fovea detection using GANs.
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Multidimensional Projection is a fundamental tool for high-dimensional data
analytics and visualization. With very few exceptions, projection techniques
are designed to map data from a high-dimensional space to a visual space so as
to preserve some dissimilarity (similarity) measure, such as the Euclidean
distance for example. In fact, although adopting distinct mathematical
formulations designed to favor different aspects of the data, most
multidimensional projection methods strive to preserve dissimilarity measures
that encapsulate geometric properties such as distances or the proximity
relation between data objects. However, geometric relations are not the only
interesting property to be preserved in a projection. For instance, the
analysis of particular structures such as clusters and outliers could be more
reliably performed if the mapping process gives some guarantee as to
topological invariants such as connected components and loops. This paper
introduces TopoMap, a novel projection technique which provides topological
guarantees during the mapping process. In particular, the proposed method
performs the mapping from a high-dimensional space to a visual space, while
preserving the 0-dimensional persistence diagram of the Rips filtration of the
high-dimensional data, ensuring that the filtrations generate the same
connected components when applied to the original as well as projected data.
The presented case studies show that the topological guarantee provided by
TopoMap not only brings confidence to the visual analytic process but also can
be used to assist in the assessment of other projection methods.
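The 0-dimensional persistence that TopoMap preserves can be computed with a standard union-find sweep over edges sorted by length (equivalent to the merges of single-linkage clustering); the sketch below illustrates that computation only, not TopoMap's projection algorithm itself.

    # 0-dimensional persistence of a Rips filtration: every component is born
    # at scale 0 and dies when a sorted edge first merges it into another.
    import itertools
    import numpy as np

    def zero_dim_deaths(points: np.ndarray):
        n = len(points)
        edges = sorted(
            (np.linalg.norm(points[i] - points[j]), i, j)
            for i, j in itertools.combinations(range(n), 2))
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        deaths = []
        for d, i, j in edges:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                deaths.append(d)  # one connected component dies at scale d
        return deaths             # n-1 finite deaths; one class never dies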
TAP-Net: Transport-and-Pack using Reinforcement Learning
Journal-ref: ACM Transactions on Graphics 2020
Subjects:
Graphics (cs.GR)
; Computer Vision and Pattern Recognition (cs.CV)
We introduce the transport-and-pack (TAP) problem, a frequently encountered
instance of real-world packing, and develop a neural optimization solution
based on reinforcement learning. Given an initial spatial configuration of
boxes, we seek an efficient method to iteratively transport and pack the boxes
compactly into a target container. Due to obstruction and accessibility
constraints, our problem has to add a new search dimension, i.e., finding an
optimal transport sequence, to the already immense search space for packing
alone. Using a learning-based approach, a trained network can learn and encode
solution patterns to guide the solution of new problem instances instead of
executing an expensive online search. In our work, we represent the transport
constraints using a precedence graph and train a neural network, coined
TAP-Net, using reinforcement learning to reward efficient and stable packing.
The network is built on an encoder-decoder architecture, where the encoder
employs convolution layers to encode the box geometry and precedence graph and
the decoder is a recurrent neural network (RNN) which inputs the current
encoder output, as well as the current box packing state of the target
container, and outputs the next box to pack, as well as its orientation. We
train our network on randomly generated initial box configurations, without
supervision, via policy gradients to learn optimal TAP policies to maximize
packing efficiency and stability. We demonstrate the performance of TAP-Net on
a variety of examples, evaluating the network through ablation studies and
comparisons to baselines and alternative network designs. We also show that our
network generalizes well to larger problem instances, when trained on
small-sized inputs.
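To make the precedence constraint concrete, here is a toy encoding of our own: a box becomes transportable only once every box blocking it has been packed. TAP-Net consumes such a graph through its encoder; the dictionary format and greedy choice below are assumptions for illustration only.

    # Toy precedence handling; a learned policy would replace the greedy pick.
    def packable(precedence, packed):
        """precedence: dict box -> set of boxes that must be moved first."""
        return [b for b in precedence
                if b not in packed and precedence[b] <= packed]

    precedence = {0: set(), 1: {0}, 2: {0, 1}}
    packed, order = set(), []
    while len(packed) < len(precedence):
        box = packable(precedence, packed)[0]
        order.append(box)
        packed.add(box)
    print(order)  # [0, 1, 2]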
Dexterous Robotic Grasping with Object-Centric Visual Affordances
Priyanka Mandikal , Kristen Grauman Subjects : Robotics (cs.RO) ; Computer Vision and Pattern Recognition (cs.CV)
Dexterous robotic hands are appealing for their agility and human-like
morphology, yet their high degree of freedom makes learning to manipulate
challenging. We introduce an approach for learning dexterous grasping. Our key
idea is to embed an object-centric visual affordance model within a deep
reinforcement learning loop to learn grasping policies that favor the same
object regions favored by people. Unlike traditional approaches that learn from
human demonstration trajectories (e.g., hand joint sequences captured with a
glove), the proposed prior is object-centric and image-based, allowing the
agent to anticipate useful affordance regions for objects unseen during policy
learning. We demonstrate our idea with a 30-DoF five-fingered robotic hand
simulator on 40 objects from two datasets, where it successfully and
efficiently learns policies for stable grasps. Our affordance-guided policies
are significantly more effective, generalize better to novel objects, and train
3x faster than the baselines. Our work offers a step towards manipulation
agents that learn by watching how people use objects, without requiring state
and action information about the human body. Project website:
this http URL
Real Image Super Resolution Via Heterogeneous Model using GP-NAS
Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With advancements in deep neural networks (DNNs), recent state-of-the-art
(SOTA) image super-resolution (SR) methods have achieved impressive performance
using deep residual networks with dense skip connections. While these models
perform well on benchmark datasets where low-resolution (LR) images are constructed from
high-resolution (HR) references with known blur kernel, real image SR is more
challenging when both images in the LR-HR pair are collected from real cameras.
Based on existing dense residual networks, a Gaussian process based neural
architecture search (GP-NAS) scheme is utilized to find candidate network
architectures using a large search space by varying the number of dense
residual blocks, the block size and the number of features. A suite of
heterogeneous models with diverse network structures and hyperparameters is
selected for model ensembling to achieve outstanding performance in real image
SR. The proposed method won first place in all three tracks of the AIM 2020
Real Image Super-Resolution Challenge.
An Internal Cluster Validity Index Based on Distance-based Separability Measure
Comments: 8 pages, 4 figures. Accepted by ICTAI 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Evaluating clustering results is a significant part of cluster analysis.
Since clustering is typically an unsupervised learning task, no true class
labels are available, and a number of internal evaluation measures, which use
only the predicted labels and the data, have been created. These are known as
internal cluster validity indices (CVIs). Designing an effective CVI without
true labels is not simple, as it is nearly as difficult as creating a
clustering method. Having more CVIs is crucial because no universal CVI can
measure all datasets, and there is no specific method for selecting a proper
CVI for clusters without true labels. Therefore, applying multiple CVIs to
evaluate clustering results is necessary. In this paper, we propose a novel
CVI, called the Distance-based Separability Index (DSI), based on a data
separability measure. We compared the DSI against eight other internal CVIs,
ranging from early studies such as Dunn (1974) to the most recent CVDD (2019).
Using an external CVI as ground truth, we evaluated the clustering results of
five clustering algorithms on 12 real and 97 synthetic datasets. The results
show that DSI is an effective, unique, and competitive CVI compared to the
other CVIs. In addition, we summarize the general process for evaluating CVIs
and introduce a new method, rank difference, to compare the results of
different CVIs.
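The general evaluation process described can be sketched as follows: score a set of candidate clusterings with several internal CVIs and compare each CVI's ranking with the ranking induced by an external index used as ground truth. DSI itself is not reproduced here; silhouette and Calinski-Harabasz stand in as example internal CVIs, and Spearman correlation stands in for the paper's rank-difference method.

    from scipy.stats import spearmanr
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import (adjusted_rand_score,
                                 calinski_harabasz_score, silhouette_score)

    X, y_true = make_blobs(n_samples=300, centers=4, random_state=0)
    labelings = [KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
                 for k in range(2, 8)]

    # external index (needs true labels) serves as the ground-truth ranking
    external = [adjusted_rand_score(y_true, lab) for lab in labelings]
    for name, cvi in [("silhouette", silhouette_score),
                      ("calinski-harabasz", calinski_harabasz_score)]:
        internal = [cvi(X, lab) for lab in labelings]
        rho, _ = spearmanr(internal, external)
        print(f"{name}: rank agreement with ARI = {rho:.3f}")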
When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method
Comments: arXiv admin note: substantial text overlap with arXiv:2003.09210
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV)
Infrared and visible image fusion, as a hot topic in image processing and
image enhancement, aims to produce fused images retaining the detail texture
information in visible images and the thermal radiation information in infrared
images. In this paper, we propose a novel two-stream auto-encoder (AE) based
fusion network. The core idea is that the encoder decomposes an image into base
and detail feature maps with low- and high-frequency information, respectively,
and that the decoder is responsible for the original image reconstruction. To
this end, a well-designed loss function is established to make the base/detail
feature maps similar/dissimilar. In the test phase, base and detail feature
maps are respectively merged via a fusion module, and the fused image is
recovered by the decoder. Qualitative and quantitative results demonstrate that
our method can generate fused images containing highlighted targets and
abundant detail texture information with strong reproducibility, and is
meanwhile superior to the state-of-the-art (SOTA) approaches.
Artificial Intelligence
SEDRo: A Simulated Environment for Developmental Robotics
Aishwarya Pothula , Md Ashaduzzaman Rubel Mondol , Sanath Narasimhan , Sm Mazharul Islam , Deokgun Park Subjects : Artificial Intelligence (cs.AI)
Even with impressive advances in application-specific models, we still lack
knowledge about how to build a model that can learn in a human-like way and do
multiple tasks. To learn in a human-like way, we need to provide a diverse
experience that is comparable to humans. In this paper, we introduce our
ongoing effort to build a simulated environment for developmental robotics
(SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus
to a 12-month-old. A series of simulated tests based on developmental
psychology will be used to evaluate the progress of a learning model. We
anticipate SEDRo to lower the cost of entry and facilitate research in the
developmental robotics community.
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
Grounded Language Learning Fast and Slow
Felix Hill , Olivier Tieleman , Tamara von Glehn , Nathaniel Wong , Hamza Merzic , Stephen Clark Subjects : Artificial Intelligence (cs.AI)
Recent work has shown that large text-based neural language models, trained
with conventional supervised learning objectives, acquire a surprising
propensity for few- and one-shot learning. Here, we show that an embodied agent
situated in a simulated 3D world, and endowed with a novel dual-coding external
memory, can exhibit similar one-shot word learning when trained with
conventional reinforcement learning algorithms. After a single introduction to
a novel object via continuous visual perception and a language prompt (“This is
a dax”), the agent can re-identify the object and manipulate it as instructed
(“Put the dax on the bed”). In doing so, it seamlessly integrates short-term,
within-episode knowledge of the appropriate referent for the word “dax” with
long-term lexical and motor knowledge acquired across episodes (i.e. “bed” and
“putting”). We find that, under certain training conditions and with a
particular memory writing mechanism, the agent’s one-shot word-object binding
generalizes to novel exemplars within the same ShapeNet category, and is
effective in settings with unfamiliar numbers of objects. We further show how
dual-coding memory can be exploited as a signal for intrinsic motivation,
stimulating the agent to seek names for objects that may be useful for later
executing instructions. Together, the results demonstrate that deep neural
networks can exploit meta-learning, episodic memory and an explicitly
multi-modal environment to account for ‘fast-mapping’, a fundamental pillar of
human cognitive development and a potentially transformative capacity for
agents that interact with human users.
On Population-Based Algorithms for Distributed Constraint Optimization Problems
Comments: 7 Figures. arXiv admin note: text overlap with arXiv:1909.06254 , arXiv:2002.12001
Subjects:
Artificial Intelligence (cs.AI)
; Multiagent Systems (cs.MA)
Distributed Constraint Optimization Problems (DCOPs) are a widely studied
class of optimization problems in which the interactions among a set of
cooperative agents are modeled as a set of constraints. DCOPs are NP-hard, and
significant effort has been devoted to developing methods for finding
incomplete solutions. In this paper, we study an emerging class of such
incomplete algorithms that are broadly termed as population-based algorithms.
The main characteristic of these algorithms is that they maintain a population
of candidate solutions of a given problem and use this population to cover a
large area of the search space and to avoid local-optima. In recent years, this
class of algorithms has gained significant attention due to their ability to
produce high-quality incomplete solutions. With the primary goal of further
improving the quality of solutions compared to the state-of-the-art incomplete
DCOP algorithms, we present two new population-based algorithms in this paper.
Our first approach, Anytime Evolutionary DCOP or AED, exploits evolutionary
optimization meta-heuristics to solve DCOPs. We also present a novel anytime
update mechanism that gives AED its anytime property. In our second
contribution, we show that population-based approaches can be combined with
local search approaches. Specifically, we develop an algorithm called DPSA
based on the Simulated Annealing meta-heuristic. We empirically evaluate these
two algorithms to illustrate their respective effectiveness in different
settings against the state-of-the-art incomplete DCOP algorithms including all
existing population-based algorithms in a wide variety of benchmarks. Our
evaluation shows AED and DPSA markedly outperform the state-of-the-art and
produce up to 75% improved solutions.
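For readers unfamiliar with the meta-heuristic behind DPSA, a generic simulated-annealing skeleton is sketched below; the neighborhood, cost function, and cooling schedule are placeholder assumptions, and none of DPSA's distributed multi-agent machinery is shown.

    import math
    import random

    def simulated_annealing(init, cost, neighbor,
                            t0=1.0, cooling=0.995, steps=10_000):
        cur, cur_cost = init, cost(init)
        best, best_cost = cur, cur_cost
        t = t0
        for _ in range(steps):
            cand = neighbor(cur)
            delta = cost(cand) - cur_cost
            # accept improvements always, worse moves with Boltzmann prob.
            if delta <= 0 or random.random() < math.exp(-delta / t):
                cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
            t *= cooling
        return best, best_cost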
Derived metrics for the game of Go — intrinsic network strength assessment and cheat-detection
Comments: 16 pages, 12 figures, final version will be published elsewhere
Subjects:
Artificial Intelligence (cs.AI)
The widespread availability of superhuman AI engines is changing how we play
the ancient game of Go. The open-source software packages developed after the
AlphaGo series shifted focus from producing strong playing entities to
providing tools for analyzing games. Here we describe two ways in which the
innovations of the second generation engines (e.g. score estimates, variable
komi) can be used for defining new metrics that help deepen our understanding
of the game. First, we study how much information the search component
contributes in addition to the raw neural network policy output. This gives an
intrinsic strength measurement for the neural network. Second, we define the
effect of a move by the difference in score estimates. This gives a
fine-grained, move-by-move performance evaluation of a player. We use this in
combating the new challenge of detecting online cheating.
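The second metric reduces to a one-liner once per-move score estimates are available. In the sketch below, scores[i] is assumed to be the engine's score estimate after move i, with perspective handling done upstream.

    def move_effects(scores):
        """Effect of each move = change in the engine's score estimate."""
        return [after - before for before, after in zip(scores, scores[1:])]

    print(move_effects([0.5, 0.7, 0.2, 0.3]))  # approx. [0.2, -0.5, 0.1]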
Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a framework that allows one to certify the fairness degree of a model
based on an interactive and privacy-preserving test. The framework verifies any
trained model, regardless of its training process and architecture. Thus, it
allows us to evaluate any deep learning model on multiple fairness definitions
empirically. We tackle two scenarios, where either the test data is privately
available only to the tester or is publicly known in advance, even to the model
creator. We investigate the soundness of the proposed approach using
theoretical analysis and present statistical guarantees for the interactive
test. Finally, we provide a cryptographic technique to automate fairness
testing and certified inference with only black-box access to the model at hand
while hiding the participants’ sensitive data.
User Intention Recognition and Requirement Elicitation Method for Conversational AI Services
Comments: accepted as a full paper at IEEE ICWS 2020
Subjects:
Artificial Intelligence (cs.AI)
In recent years, chat-bots have become a new type of intelligent terminal that
guides users in consuming services. However, they are most often criticized
because the services they provide are not what users expect. This defect is
mostly due to two problems: first, users' requirement expressions are
incomplete and uncertain because of information asymmetry; second, the
diversity of service resources makes service selection difficult. Since a
conversational bot is a typical mesh device, guided multi-round Q&A is the
most effective way to elicit user requirements. Obviously, complex Q&A with
too many rounds is tedious and leads to a bad user experience. Therefore, we
aim to obtain user requirements as accurately as possible in as few rounds as
possible. To achieve this, a user intention recognition method based on a
Knowledge Graph (KG) was developed for fuzzy requirement inference, and a
requirement elicitation method based on Granular Computing was proposed for
dialog policy generation. Experimental results show that these two methods can
effectively reduce the number of conversation rounds and can quickly and
accurately identify the user intention.
Learning to Infer User Hidden States for Online Sequential Advertising
Comments: to be published in CIKM 2020
Subjects:
Artificial Intelligence (cs.AI)
To drive purchases in online advertising, it is of great interest to
advertisers to optimize the sequential advertising strategy, whose performance
and interpretability are both important. The lack of interpretability in
existing deep reinforcement learning methods makes it hard to understand, diagnose
and further optimize the strategy. In this paper, we propose our Deep Intents
Sequential Advertising (DISA) method to address these issues. The key part of
interpretability is to understand a consumer’s purchase intent which is,
however, unobservable (called hidden states). In this paper, we model this
intention as a latent variable and formulate the problem as a Partially
Observable Markov Decision Process (POMDP) where the underlying intents are
inferred based on the observable behaviors. Large-scale industrial offline and
online experiments demonstrate our method’s superior performance over several
baselines. The inferred hidden states are analyzed, and the results prove the
rationality of our inference.
FairXGBoost: Fairness-aware Classification in XGBoost
Srinivasan Ravichandran , Drona Khurana , Bharath Venkatesh , Narayanan Unny Edakunni Subjects : Artificial Intelligence (cs.AI)
Highly regulated domains such as finance have long favoured the use of
machine learning algorithms that are scalable, transparent, robust and yield
better performance. One of the most prominent examples of such an algorithm is
XGBoost. Meanwhile, there is also a growing interest in building fair and
unbiased models in these regulated domains and numerous bias-mitigation
algorithms have been proposed to this end. However, most of these
bias-mitigation methods are restricted to specific model families such as
logistic regression or support vector machine models, thus leaving modelers
with a difficult decision of choosing between fairness from the bias-mitigation
algorithms and scalability, transparency, performance from algorithms such as
XGBoost. We aim to leverage the best of both worlds by proposing a fair variant
of XGBoost that enjoys all the advantages of XGBoost, while also matching the
levels of fairness from the state-of-the-art bias-mitigation algorithms.
Furthermore, the proposed solution requires very little in terms of changes to
the original XGBoost library, thus making it easy to adopt. We provide an
empirical analysis of our proposed method on standard benchmark datasets used
in the fairness community.
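Since the abstract does not spell out the bias-mitigation term, the following is only one plausible reading of the minimal-change claim: a custom training objective that adds a regularizer shrinking the gap between group-wise mean scores. The regularizer form and strength are our assumptions; the custom-objective hook itself is standard XGBoost API.

    import numpy as np
    import xgboost as xgb

    def fair_logistic_obj(group, mu=1.0):
        """group: 0/1 array of the sensitive attribute per training row.
        Regularizer form is an illustrative assumption, not the paper's."""
        def obj(preds, dtrain):
            y = dtrain.get_label()
            p = 1.0 / (1.0 + np.exp(-preds))
            grad = p - y                     # standard logistic gradient
            hess = p * (1.0 - p)
            # penalize (mean score of group 1 - mean score of group 0)^2 / 2
            gap = preds[group == 1].mean() - preds[group == 0].mean()
            n1, n0 = (group == 1).sum(), (group == 0).sum()
            grad = grad + mu * gap * np.where(group == 1, 1.0 / n1, -1.0 / n0)
            return grad, hess                # regularizer's hessian omitted
        return obj

    # booster = xgb.train(params, dtrain, num_boost_round=100,
    #                     obj=fair_logistic_obj(sensitive))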
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Comments: 9 pages, 4 figures, 2 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Training a deep neural network requires a large amount of single-task data
and involves a long time-consuming optimization phase. This is not scalable to
complex, realistic environments with new unexpected changes. Humans can perform
fast incremental learning on the fly and memory systems in the brain play a
critical role. We introduce Sparse Meta Networks — a meta-learning approach to
learn online sequential adaptation algorithms for deep neural networks, by
using deep neural networks. We augment a deep neural network with a
layer-specific fast-weight memory. The fast-weights are generated sparsely at
each time step and accumulated incrementally through time providing a useful
inductive bias for online continual adaptation. We demonstrate strong
performance on a variety of sequential adaptation scenarios, from a simple
online reinforcement learning to a large scale adaptive language modelling.
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This can undermine the
trustworthiness and accountability of the models when deployed in real
applications, where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood based models in the task of out of distribution
detection (OoD). Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
to find the effect of the adversarial noise resistance methods on in and
out-of-distribution performance, as well as, also investigate adversarial noise
robustness of Bayesian deep learners.
HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Comments: arXiv admin note: substantial text overlap with arXiv:1811.08181
Subjects:
Databases (cs.DB)
; Artificial Intelligence (cs.AI)
To cope with the intractability of answering Conjunctive Queries (CQs) and
solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph
decompositions have been proposed — giving rise to different notions of width,
notably, plain, generalized, and fractional hypertree width (hw, ghw, and
fhw). Given the increasing interest in using such decomposition methods in
practice, a publicly accessible repository of decomposition software, as well
as a large set of benchmarks, and a web-accessible workbench for inserting,
analyzing, and retrieving hypergraphs are called for.
We address this need by providing (i) concrete implementations of hypergraph
decompositions (including new practical algorithms), (ii) a new, comprehensive
benchmark of hypergraphs stemming from disparate CQ and CSP collections, and
(iii) HyperBench, our new web-interface for accessing the benchmark and the
results of our analyses. In addition, we describe a number of actual
experiments we carried out with this new infrastructure.
Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
Weijia Wu , Ning Lu , Enze Xie Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Deep learning-based scene text detection can achieve strong performance when
powered with sufficient labeled training data. However, manual labeling is
time-consuming and laborious, and in extreme cases the corresponding annotated
data are unavailable. Exploiting synthetic data is a very promising solution,
except for the domain distribution mismatch between synthetic datasets and real
datasets. To address the severe domain distribution mismatch, we propose a synthetic-to-real
address the severe domain distribution mismatch, we propose a synthetic-to-real
domain adaptation method for scene text detection, which transfers knowledge
from synthetic data (source domain) to real data (target domain). In this
paper, a text self-training (TST) method and adversarial text instance
alignment (ATA) for domain adaptive scene text detection are introduced. ATA
helps the network learn domain-invariant features by training a domain
classifier in an adversarial manner. TST diminishes the adverse effects of
false positives~(FPs) and false negatives~(FNs) from inaccurate pseudo-labels.
The two components have positive effects on improving the performance of scene text
detectors when adapting from synthetic-to-real scenes. We evaluate the proposed
method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The
results demonstrate the effectiveness of the proposed method with up to 10%
improvement, which has important exploration significance for domain adaptive
scene text detection. Code is available at
this https URL
Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints
Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider the problem of constrained multi-objective blackbox optimization
using expensive function evaluations, where the goal is to approximate the true
Pareto set of solutions satisfying a set of constraints while minimizing the
number of function evaluations. For example, in aviation power system design
applications, we need to find the designs that trade-off total energy and the
mass while satisfying specific thresholds for motor temperature and voltage of
cells. This optimization requires performing expensive computational
simulations to evaluate designs. In this paper, we propose a new approach,
referred to as Max-value Entropy Search for Multi-objective Optimization with
Constraints (MESMOC), to solve this problem. MESMOC employs an output-space
entropy based acquisition function to efficiently select the sequence of inputs
for evaluation to uncover high-quality Pareto-set solutions while satisfying
constraints.
We apply MESMOC to two real-world engineering design applications to
demonstrate its effectiveness over state-of-the-art algorithms.
Multi-Loss Weighting with Coefficient of Variations
Rick Groenendijk , Sezer Karaoglu , Theo Gevers , Thomas Mensink Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Many interesting tasks in machine learning and computer vision are learned by
optimising an objective function defined as a weighted linear combination of
multiple losses. The final performance is sensitive to choosing the correct
(relative) weights for these losses. Finding a good set of weights is often
done by adopting them into the set of hyper-parameters, which are set using an
extensive grid search. This is computationally expensive. In this paper, the
weights are defined based on properties observed while training the model,
including the specific batch loss, the average loss, and the variance for each
of the losses. An additional advantage is that the defined weights evolve
during training, instead of using static loss weights. In literature, loss
weighting is mostly used in a multi-task learning setting, where the different
tasks obtain different weights. However, there is a plethora of single-task
multi-loss problems that can benefit from automatic loss weighting. In this
paper, it is shown that these multi-task approaches do not work on single
tasks. Instead, a method is proposed that automatically and dynamically tunes
loss weights throughout training specifically for single-task multi-loss
problems. The method incorporates a measure of uncertainty to balance the
losses. The validity of the approach is shown empirically for different tasks
on multiple datasets.
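A minimal sketch of the idea follows, assuming the weight of each loss is proportional to its coefficient of variation (standard deviation over mean) computed from the training history; the paper's exact running statistics and normalization may differ.

    import numpy as np

    def cov_weights(loss_history):
        """loss_history: array of shape (steps, num_losses)."""
        h = np.asarray(loss_history)
        cov = h.std(axis=0) / (h.mean(axis=0) + 1e-12)
        return cov / cov.sum()        # weights evolve as training proceeds

    w = cov_weights([[1.0, 10.0], [0.8, 9.9], [0.6, 10.1]])
    # a loss that varies a lot relative to its scale receives more weight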
Quasi-symplectic Langevin Variational Autoencoder
Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The variational autoencoder (VAE) is one of the most thoroughly investigated
generative models and is very popular in current neural learning research.
Leveraging VAEs in practical tasks with high dimensions and huge datasets often
runs into the problem of constructing low-variance evidence lower bounds.
Markov chain Monte Carlo (MCMC) is an effective approach to tighten the
evidence lower bound (ELBO) when approximating the posterior distribution. The
Hamiltonian Variational Autoencoder (HVAE) is one of the effective
MCMC-inspired approaches for constructing an unbiased low-variance ELBO that is
also amenable to the reparameterization trick. While this solution
significantly improves posterior estimation, a main drawback of HVAE is that
its leapfrog method needs to access the posterior gradient twice, which leads
to poor inference efficiency and a fairly large GPU memory requirement. This
flaw limits the application of Hamiltonian-based inference frameworks to
large-scale network inference. To tackle this problem, we propose a
Quasi-symplectic Langevin Variational Autoencoder (Langevin-VAE), which offers
a significant improvement in resource usage efficiency. We qualitatively and
quantitatively demonstrate the effectiveness of the Langevin-VAE compared to
state-of-the-art gradient-informed inference frameworks.
Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO
Comments: 6 pages, 5 figures
Subjects:
Signal Processing (eess.SP)
; Artificial Intelligence (cs.AI)
In massive multiple-input multiple-output (MIMO) systems, the large number of
antennas brings a great challenge for the acquisition of accurate channel
state information, especially in the frequency division duplex mode. To
overcome the bottleneck of the limited number of radio links in hybrid
beamforming, we utilize the neural networks (NNs) to capture the inherent
connection between the uplink and downlink channel data sets and extrapolate
the downlink channels from a subset of the uplink channel state information. We
study the antenna subset selection problem in order to achieve the best channel
extrapolation and decrease the data size of NNs. The probabilistic sampling
theory is utilized to approximate the discrete antenna selection as a
continuous and differentiable function, which makes the back propagation of the
deep learning feasible. Then, we design the proper off-line training strategy
to optimize both the antenna selection pattern and the extrapolation NNs.
Finally, numerical results are presented to verify the effectiveness of our
proposed massive MIMO channel extrapolation algorithm.
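One common way to realize such a continuous, differentiable relaxation of discrete selection is the Gumbel-softmax estimator, sketched below in PyTorch; whether the paper uses exactly this estimator is our assumption based on the abstract's description.

    import torch
    import torch.nn.functional as F

    class AntennaSelector(torch.nn.Module):
        def __init__(self, num_antennas: int, num_selected: int):
            super().__init__()
            # one learnable categorical distribution per selected antenna
            self.logits = torch.nn.Parameter(
                torch.zeros(num_selected, num_antennas))

        def forward(self, uplink: torch.Tensor, tau: float = 1.0):
            """uplink: (batch, num_antennas) -> (batch, num_selected)."""
            # soft one-hot rows while training (differentiable), hard at test
            sel = F.gumbel_softmax(self.logits, tau=tau,
                                   hard=not self.training)
            return uplink @ sel.t()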
Comments: 30 pages, 14 figures
Subjects:
Signal Processing (eess.SP)
; Artificial Intelligence (cs.AI)
To capture the communications gain of the massive radiating elements with low
power cost, the conventional reconfigurable intelligent surface (RIS) usually
works in passive mode. However, due to the cascaded channel structure and the
lack of signal processing ability, it is difficult for RIS to obtain the
individual channel state information and optimize the beamforming vector. In
this paper, we add signal processing units for a few antennas at RIS to
partially acquire the channels. To solve the crucial active antenna selection
problem, we construct an active antenna selection network that utilizes the
probabilistic sampling theory to select the optimal locations of these active
antennas. With this active antenna selection network, we further design two
deep learning (DL) based schemes, i.e., the channel extrapolation scheme and
the beam searching scheme, to enable the RIS communication system. The former
utilizes the selection network and a convolutional neural network to
extrapolate the full channels from the partial channels received by the active
RIS antennas, while the latter adopts a fully-connected neural network to
achieve the direct mapping between the partial channels and the optimal
beamforming vector with maximal transmission rate. Simulation results are
provided to demonstrate the effectiveness of the designed DL-based schemes.
Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The ability to efficiently search for images over an indexed database is the
cornerstone of several user experiences. Incorporating user feedback through
multi-modal inputs provides flexible interaction to serve fine-grained
specificity in requirements. We specifically focus on text feedback, through
descriptive natural language queries. Given a reference image and textual user
feedback, our goal is to retrieve images that satisfy constraints specified by
both of these input modalities. The task is challenging as it requires
understanding the textual semantics from the text feedback and then applying
these changes to the visual representation. To address these challenges, we
propose a novel architecture TRACE which contains a hierarchical feature
aggregation module to learn the composite visio-linguistic representations.
TRACE achieves the SOTA performance on 3 benchmark datasets: FashionIQ, Shoes,
and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5%
respectively in R@K metric. Our extensive experiments and ablation studies show
that TRACE consistently outperforms the existing techniques by significant
margins both quantitatively and qualitatively.
Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Algorithms for training residual networks (ResNets) typically require a forward
pass of the data, followed by backpropagation of the loss gradient to perform
parameter updates, which can take many hours or even days for networks with hundreds of
layers. Inspired by the penalty and augmented Lagrangian methods, a
layer-parallel training algorithm is proposed in this work to overcome the
scalability barrier caused by the serial nature of forward-backward propagation
in deep residual learning. Moreover, by viewing the supervised classification
task as a numerical discretization of the terminal control problem, we bridge
the concept of synthetic gradient for decoupling backpropagation with the
parareal method for solving differential equations, which not only offers a
novel perspective on the design of synthetic loss function but also performs
parameter updates with reduced storage overhead. Experiments on a preliminary
example demonstrate that the proposed algorithm achieves testing accuracy
comparable to or even better than the full serial backpropagation approach,
while the enabled layer-parallelism provides a speedup over traditional
layer-serial training methods.
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects, resulting in a significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
Computational prediction of RNA tertiary structures using machine learning methods
Comments: 20 pages, 2 figures. Chinese Physics B, Aug. 2020
Journal-ref: Chinese Physics B, Sept. 2020
Subjects:
Biological Physics (physics.bio-ph)
; Artificial Intelligence (cs.AI)
RNAs play crucial and versatile roles in biological processes. Computational
prediction approaches can help to understand RNA structures and their
stabilizing factors, thus providing information on their functions, and
facilitating the design of new RNAs. Machine learning (ML) techniques have made
tremendous progress in many fields in the past few years. Although their usage
in protein-related fields has a long history, the use of ML methods in
predicting RNA tertiary structures is new and rare. Here, we review the recent
advances of using ML methods on RNA structure predictions and discuss the
advantages and limitations, as well as the difficulties and potential of these
approaches when applied in the field.
Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
Comments: To appear in IEEE TPAMI, 18 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI)
The traditional object retrieval task aims to learn a discriminative feature
representation with intra-similarity and inter-dissimilarity, which supposes
that the objects in an image are manually or automatically pre-cropped exactly.
However, in many real-world searching scenarios (e.g., video surveillance), the
objects (e.g., persons, vehicles, etc.) are seldom accurately detected or
annotated. Therefore, object-level retrieval becomes intractable without
bounding-box annotation, which leads to a new but challenging topic, i.e.
image-level search. In this paper, to address the image search issue, we first
introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A
Siamese architecture and an on-line pairing strategy for similar and dissimilar
objects in the given images are designed. 2) A novel on-line pairing (OLP) loss
is introduced with a dynamic feature dictionary, which alleviates the
multi-task training stagnation problem, by automatically generating a number of
negative pairs to restrict the positives. 3) A hard example priority (HEP)
based softmax loss is proposed to improve the robustness of classification task
by selecting hard categories. With the philosophy of divide and conquer, we
further propose an improved I-Net, called DC-I-Net, which makes two new
contributions: 1) two modules are tailored to handle different tasks separately
in the integrated framework, such that the task specification is guaranteed. 2)
A class-center guided HEP loss (C2HEP) by exploiting the stored class centers
is proposed, such that the intra-similarity and inter-dissimilarity can be
captured for ultimate retrieval. Extensive experiments on famous image-level
search oriented benchmark datasets demonstrate that the proposed DC-I-Net
outperforms the state-of-the-art tasks-integrated and tasks-separated image
search models.
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these metrics are rough
proxies for what we really care about—summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
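The core of the reward-modelling step can be written compactly: given scalar rewards for a human-preferred and a rejected summary, the model is trained with a pairwise logistic loss. The sketch below is a hedged reading of the abstract; the model internals and data plumbing are elided.

    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor) -> torch.Tensor:
        # maximize log sigmoid(r_chosen - r_rejected) over comparison pairs
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    loss = preference_loss(torch.tensor([1.2]), torch.tensor([0.3]))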
Convolutional Speech Recognition with Pitch and Voice Quality Features
Comments: 5 pages
Subjects:
Audio and Speech Processing (eess.AS)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
The effects of adding pitch and voice quality features such as jitter and
shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are
studied in this work. Pitch features have been previously used for improving
classical HMM and DNN baselines, while jitter and shimmer parameters have
proven to be useful for tasks like speaker or emotion recognition. To the best
of our knowledge, this is the first work combining such pitch and voice quality
features with modern convolutional architectures, showing improvements of up to
2% absolute WER points on the publicly available Spanish Common Voice dataset.
In particular, our work combines these features with mel-frequency spectral
coefficients (MFSCs) to train a convolutional architecture with Gated Linear
Units (Conv GLUs). Such models have been shown to yield small word error rates,
while being very suitable for parallel processing for online streaming
recognition use cases. We have added pitch and voice quality functionality to
Facebook’s wav2letter speech recognition framework, and we provide the code and
recipes to the community to carry out further experiments.
Besides, to the best of our knowledge, our Spanish Common Voice recipe is the
first public Spanish recipe for wav2letter.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of camera
based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
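As a hedged sketch of the pure machine learning calibration variant: fit a small regressor from predicted gaze vectors to known on-screen targets collected at calibration time. The model class (ridge regression on polynomial features) is an assumption for illustration, not the paper's exact calibrator.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    gaze = np.random.rand(9, 3)     # CNN gaze vectors at 9 calibration points
    screen = np.random.rand(9, 2)   # the corresponding on-screen targets

    calib = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    calib.fit(gaze, screen)
    point = calib.predict(gaze[:1])  # map a new gaze vector to the screen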
Information Retrieval
Exploring Artist Gender Bias in Music Recommendation
Comments: To be presented at 2nd Workshop on the Impact of Recommender Systems (ImpactRS), at the 14th ACM Conference on Recommender Systems (RecSys 2020)
Subjects:
Information Retrieval (cs.IR)
Music Recommender Systems (mRS) are designed to give personalised and
meaningful recommendations of items (i.e. songs, playlists or artists) to a
user base, thereby reflecting and further complementing individual users’
specific music preferences. Whilst accuracy metrics have been widely applied to
evaluate recommendations in mRS literature, evaluating a user’s item utility
from other impact-oriented perspectives, including their potential for
discrimination, is still a novel evaluation practice in the music domain. In
this work, we center our attention on a specific phenomenon whose impact mRS
may exacerbate: gender bias. Our work presents an exploratory study, analyzing
the extent to which commonly deployed state-of-the-art Collaborative Filtering
(CF) algorithms may act to further increase or decrease artist gender bias. To
assess group biases introduced by CF, we deploy a
recently proposed metric of bias disparity on two listening event datasets: the
LFM-1b dataset, and the earlier constructed Celma’s dataset. Our work traces
the causes of disparity to variations in input gender distributions and
user-item preferences, highlighting the effect such configurations can have on
user’s gender bias after recommendation generation.
Comparing Fair Ranking Metrics
Amifa Raj , Connor Wood , Ananda Montoly , Michael D. Ekstrand Subjects : Information Retrieval (cs.IR)
Ranking is a fundamental aspect of recommender systems. However, ranked
outputs can be susceptible to various biases; some of these may cause
disadvantages to members of protected groups. Several metrics have been
proposed to quantify the (un)fairness of rankings, but there has not been to
date any direct comparison of these metrics. This complicates deciding what
fairness metrics are applicable for specific scenarios, and assessing the
extent to which metrics agree or disagree. In this paper, we describe several
fair ranking metrics in a common notation, enabling direct comparison of their
approaches and assumptions, and empirically compare them on the same
experimental setup and data set. Our work provides a direct comparative
analysis identifying the similarities and differences of the selected fair
ranking metrics.
Computation and Language
A Python Library for Exploratory Data Analysis and Knowledge Discovery on Twitter Data
Mario Graff , Daniela Moctezuma , Sabino Miranda-Jiménez , Eric S. Tellez Subjects : Computation and Language (cs.CL)
Twitter is perhaps the social medium most amenable to research. It requires
only a few steps to obtain information, and there are plenty of libraries that
can help in this regard. Nonetheless, knowing whether a particular event is
expressed on Twitter is a challenging task that requires a considerable
collection of tweets. This proposal aims to facilitate, for researchers
interested in Twitter data, the process of mining events on Twitter. The events could be
related to natural disasters, health issues, people’s mobility, among other
studies that can be pursued with the library proposed. Different applications
are presented in this contribution to illustrate the library’s capabilities,
starting from an exploratory analysis of the topics discovered in tweets,
following it by studying the similarity among dialects of the Spanish language,
and complementing it with a mobility report on different countries. In summary,
the Python library presented retrieves a plethora of information processed from
Twitter (since December 2015) in terms of words, bigrams of words, and their
frequencies by day for Arabic, English, Spanish, and Russian languages.
Finally, the mobility information considered is related to the number of
travels among locations for more than 245 countries or territories.
The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task
Comments: Submitted to the 2020 IWPT shared task on parsing Enhanced Universal Dependencies
Journal-ref: Proceedings of the 16th International Conference on Parsing
Technologies and the IWPT 2020 Shared Task (2020) 227-235
Subjects:
Computation and Language (cs.CL)
We describe the ADAPT system for the 2020 IWPT Shared Task on parsing
enhanced Universal Dependencies in 17 languages. We implement a pipeline
approach using UDPipe and UDPipe-future to provide initial levels of
annotation. The enhanced dependency graph is either produced by a graph-based
semantic dependency parser or is built from the basic tree using a small set of
heuristics. Our results show that, for the majority of languages, a semantic
dependency parser can be successfully applied to the task of parsing enhanced
dependencies.
Unfortunately, we did not ensure a connected graph as part of our pipeline
approach and our competition submission relied on a last-minute fix to pass the
validation script which harmed our official evaluation scores significantly.
Our submission ranked eighth in the official evaluation with a macro-averaged
coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented
our own graph-connecting fix which resulted in a score of 79.53 (language
average) or 79.76 (treebank average), which would have placed fourth in the
competition evaluation.
SRQA: Synthetic Reader for Factoid Question Answering
Comments: arXiv admin note: text overlap with arXiv:1809.00676
Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
Question answering systems can answer questions from various fields and in
various forms with deep neural networks, but they still lack effective ways of
handling multiple evidences. We introduce a new model called SRQA, which stands for Synthetic
Reader for Factoid Question Answering. This model enhances the question
answering system in the multi-document scenario from three aspects: model
structure, optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)
respectively. First, we propose a multilayer attention network to obtain a
better representation of the evidences. The multilayer attention mechanism
conducts interaction between the question and the passage within each layer,
making the token representations of evidences in each layer take the
requirements of the question into account. Second, we design a cross evidence
strategy to choose the answer span within more evidences. We improve the
optimization goal, considering all the answers’ locations in multiple evidences
as training targets, which leads the model to reason among multiple evidences.
Third, adversarial training is applied to high-level variables besides the
word embedding in our model. A new normalization method is also proposed for
adversarial perturbations so that we can jointly add perturbations to several
target variables. As an effective regularization method, adversarial training
enhances the model’s ability to process noisy data. Combining these three
strategies, we enhance the contextual representation and locating ability of
our model, which could synthetically extract the answer span from several
evidences. We perform SRQA on the WebQA dataset, and experiments show that our
model outperforms the state-of-the-art models (the best fuzzy score of our
model is up to 78.56%, with an improvement of about 2%).
Biomedical named entity recognition using BERT in the machine reading comprehension framework
Comments: 8 pages, 2 figures
Subjects:
Computation and Language (cs.CL)
Recognition of biomedical entities from literature is a challenging research
focus, which is the foundation for extracting a large amount of biomedical
knowledge existing in unstructured texts into structured formats. Using the
sequence labeling framework to implement biomedical named entity recognition
(BioNER) is currently a conventional method. This method, however, often cannot
take full advantage of the semantic information in the dataset, and the
performance is not always satisfactory. In this work, instead of treating the
BioNER task as a sequence labeling problem, we formulate it as a machine
reading comprehension (MRC) problem. This formulation can introduce more prior
knowledge by utilizing well-designed queries, and it no longer needs decoding
processes such as conditional random fields (CRFs). We conduct experiments on
six BioNER datasets, and the experimental results demonstrate the effectiveness
of our method. Our method achieves state-of-the-art (SOTA) performance on the
BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI Disease, BC2GM and JNLPBA datasets,
with F1-scores of 92.38%, 94.19%, 87.36%, 90.04%, 84.98% and 78.93%,
respectively.
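To make the MRC formulation concrete, the sketch below pairs each entity type with a natural-language query and extracts the best answer span from per-token start/end scores, replacing CRF decoding; the example queries and the greedy span search are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Hypothetical entity-type queries; the paper injects prior knowledge
# about each entity type through well-designed queries like these.
QUERIES = {
    "Chemical": "Find all chemical or drug names mentioned in the text.",
    "Disease": "Find all disease or syndrome names mentioned in the text.",
}

def best_span(start_scores, end_scores, max_len=10):
    """Pick the highest-scoring (start, end) span with end >= start.

    start_scores/end_scores are per-token scores produced by an MRC model
    for a (query, passage) pair; this decoding step replaces the CRF layer
    used in sequence-labeling formulations of BioNER.
    """
    best, best_score = None, -np.inf
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Toy scores over a 6-token passage:
print(best_span(np.array([0.1, 2.0, 0.0, 0.2, 0.1, 0.0]),
                np.array([0.0, 0.5, 1.8, 0.1, 0.0, 0.2])))
```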
orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions
Guy Lev , Michal Shmueli-Scheuer , Achiya Jerbi , David Konopnicki Subjects : Computation and Language (cs.CL)
Frequently Asked Questions (FAQ) webpages are created by organizations for
their users. FAQs are used in several scenarios, e.g., to answer user
questions. On the other hand, the content of FAQs is affected by user questions
by definition. In order to promote research in this field, several FAQ datasets
exist. However, we claim that being collected from community websites, they do
not correctly represent challenges associated with FAQs in an organizational
context. Thus, we release orgFAQ, a new dataset composed of 6,988 user
questions and 1,579 corresponding FAQs that were extracted from organizations’
FAQ webpages in the Jobs domain. In this paper, we provide an analysis of the
properties of such FAQs, and demonstrate the usefulness of our new dataset by
utilizing it in a relevant task from the Jobs domain. We also show the value of
the orgFAQ dataset in a task of a different domain – the COVID-19 pandemic.
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these are only rough
proxies for what we really care about: summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
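The reward-model step can be illustrated with a standard pairwise preference loss; the Bradley-Terry-style objective below is a common choice for learning from human comparisons and is our assumption, since the abstract does not spell out the exact loss.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_preferred, reward_rejected):
    """Pairwise loss for training a reward model from human comparisons.

    Given scalar rewards for the human-preferred and the rejected summary
    of the same post, maximize the log-probability that the preferred one
    wins under a Bradley-Terry model. This is a common formulation for
    learning from pairwise preferences; the paper's exact loss may differ
    in details.
    """
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

# Dummy rewards for a batch of 3 comparison pairs:
r_pref = torch.tensor([1.2, 0.3, -0.5])
r_rej = torch.tensor([0.7, 0.9, -1.0])
print(reward_model_loss(r_pref, r_rej))
```

The trained reward model then scores candidate summaries during reinforcement-learning fine-tuning of the summarization policy.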
A Simple Global Neural Discourse Parser
Yichu Zhou , Omri Koshorek , Vivek Srikumar , Jonathan Berant Subjects : Computation and Language (cs.CL)
Discourse parsing is largely dominated by greedy parsers with
manually-designed features, while global parsing is rare due to its
computational expense. In this paper, we propose a simple chart-based neural
discourse parser that does not require any manually-crafted features and is
based on learned span representations only. To overcome the computational
challenge, we propose an independence assumption between the label assigned to
a node in the tree and the splitting point that separates its children, which
results in tractable decoding. We empirically demonstrate that our model
achieves the best performance among global parsers, and comparable performance
to state-of-the-art greedy parsers, using only learned span representations.
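The independence assumption can be made concrete with a small dynamic program: because a node's label score does not depend on the chosen split point, each chart cell combines the best label with the best split, giving cubic-time exact decoding. The sketch below uses toy random span scores standing in for learned span representations.

```python
import numpy as np

def chart_parse(label_scores):
    """Global chart decoding under a label/split independence assumption.

    label_scores[i, j] holds one score per discourse label for the span
    (i, j). Because the label of a node is scored independently of its
    split point, each cell only needs the best label score plus the best
    split of its children, giving O(n^3) exact decoding.
    """
    n = label_scores.shape[0]
    best = np.zeros((n, n + 1))
    split = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = label_scores[i, j].max()  # best label, independent of split
            if length > 1:
                k_best = max(range(i + 1, j),
                             key=lambda k: best[i, k] + best[k, j])
                score += best[i, k_best] + best[k_best, j]
                split[(i, j)] = k_best
            best[i, j] = score
    return best[0, n], split

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5, 3))  # spans over 4 units, 3 labels
total, splits = chart_parse(scores)
print(total, splits)
```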
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading
Comments: 7 pages, 3 figures, 3 tables. For associated work, refer to this https URL
Subjects:
Computation and Language (cs.CL)
Automatic Short Answer Grading (ASAG) is the process of grading student
answers with computational approaches, given a question and the desired answer.
Previous works implemented the methods of concept mapping, facet mapping, and
some used the conventional word embeddings for extracting semantic features.
They extracted multiple features manually to train on the corresponding
datasets. We use pretrained embeddings of the transfer learning models, ELMo,
BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a
single feature, cosine similarity, extracted from the embeddings of these
models. We compare the RMSE scores and correlation measurements of the four
models with previous works on the Mohler dataset. Our work demonstrates that ELMo
outperformed the other three models. We also briefly describe the four
transfer learning models and conclude with possible causes of the poor results
of the transfer learning models.
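The single-feature setup is simple enough to sketch end to end: compute the cosine similarity between the embeddings of the student answer and the reference answer, then fit a linear map from similarity to grade. The random embeddings below are stand-ins for ELMo/BERT/GPT/GPT-2 outputs, so the fitted weights are only a demonstration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Single grading feature: cosine similarity between the embedding of
    the student answer and that of the reference answer."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 768))               # reference-answer embeddings
ans = ref + 0.1 * rng.normal(size=(5, 768))   # student-answer embeddings
grades = np.array([5.0, 4.5, 4.0, 3.5, 5.0])

# Fit grade = w * similarity + b by least squares on the toy training set.
sims = np.array([cosine_similarity(a, r) for a, r in zip(ans, ref)])
w, b = np.polyfit(sims, grades, 1)
print(w * sims + b)  # predicted grades
```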
Knowing What to Listen to: Early Attention for Deep Speech Representation Learning
Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Deep learning techniques have considerably improved speech processing in
recent years. Speech representations extracted by deep learning models are
being used in a wide range of tasks such as speech recognition, speaker
recognition, and speech emotion recognition. Attention models play an important
role in improving deep learning models. However, current attention mechanisms
are unable to attend to fine-grained information items. In this paper, we
propose the novel Fine-grained Early Frequency Attention (FEFA) for speech
signals. This model is capable of focusing on information items as small as
frequency bins. We evaluate the proposed model on two popular tasks of speaker
recognition and speech emotion recognition. Two widely used public datasets,
VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on
top of several prominent deep models as backbone networks to evaluate its
impact on performance compared to the original networks and other related work.
Our experiments show that by adding FEFA to different CNN architectures,
performance is consistently improved by substantial margins, even setting a new
state-of-the-art for the speaker recognition task. We also tested our model
against different levels of added noise showing improvements in robustness and
less sensitivity compared to the backbone networks.
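A minimal sketch of attention at the granularity of frequency bins is given below: a per-bin gate reweights each spectrogram frame before it enters the backbone network. The sigmoid gate and the single linear scoring layer are our assumptions about FEFA's exact form, which the abstract does not detail.

```python
import torch
import torch.nn as nn

class FrequencyBinAttention(nn.Module):
    """Early attention over frequency bins (a sketch in the spirit of FEFA).

    Given a spectrogram batch of shape (batch, time, freq_bins), a small
    network scores every frequency bin at every frame and the input is
    reweighted before it reaches the backbone network.
    """
    def __init__(self, n_bins):
        super().__init__()
        self.score = nn.Linear(n_bins, n_bins)

    def forward(self, spec):
        attn = torch.sigmoid(self.score(spec))  # one weight per bin
        return spec * attn

x = torch.randn(8, 100, 257)  # a batch of spectrogram frames
print(FrequencyBinAttention(257)(x).shape)  # unchanged shape, reweighted bins
```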
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
High-fidelity singing voices usually require a higher sampling rate (e.g.,
48kHz) to convey expression and emotion. However, a higher sampling rate widens
the frequency band and lengthens the waveform sequences, posing challenges
for singing voice synthesis (SVS) in both the frequency and time domains.
Conventional SVS systems that adopt a lower sampling rate cannot adequately address
these challenges. In this paper, we develop HiFiSinger, an SVS system for
high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic
model and a Parallel WaveGAN based vocoder to ensure fast training and
inference and also high voice quality. To tackle the difficulty of singing
modeling caused by high sampling rate (wider frequency band and longer
waveform), we introduce multi-scale adversarial training in both the acoustic
model and vocoder to improve singing modeling. Specifically, 1) To handle the
larger range of frequencies caused by higher sampling rate, we propose a novel
sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full
80-dimensional mel-frequency into multiple sub-bands and models each sub-band
with a separate discriminator. 2) To model longer waveform sequences caused by
higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform
generation to model different lengths of waveform sequences with separate
discriminators. 3) We also introduce several additional designs and findings in
HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)
and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate
window/hop size for mel-spectrogram, and increasing the receptive field in
vocoder for long vowel modeling. Experiment results show that HiFiSinger
synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44
MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
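The sub-band idea of SF-GAN can be illustrated by splitting the 80-dimensional mel-spectrogram along the frequency axis, with each sub-band fed to its own discriminator; the equal-width four-band split below is an assumption, as the abstract does not give the band layout.

```python
import torch

def split_mel_subbands(mel, n_subbands=4):
    """Split an 80-bin mel-spectrogram into frequency sub-bands, one per
    discriminator, in the spirit of SF-GAN. `mel` has shape
    (batch, 80, frames); the equal-width split is our assumption.
    """
    return torch.chunk(mel, n_subbands, dim=1)

mel = torch.randn(2, 80, 200)
for band in split_mel_subbands(mel):
    print(band.shape)  # (2, 20, 200): each band goes to its own discriminator
```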
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen , Wenbo Ma , Jun Xiao , Hanwang Zhang , Wei Liu , Shih-Fu Chang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
The prevailing framework for solving referring expression grounding is based
on a two-stage process: 1) detecting proposals with an object detector and 2)
grounding the referent to one of the proposals. Existing two-stage solutions
mostly focus on the grounding step, which aims to align the expressions with
the proposals. In this paper, we argue that these methods overlook an obvious
mismatch between the roles of proposals in the two stages: they generate
proposals solely based on the detection confidence (i.e., expression-agnostic),
hoping that the proposals contain all the right instances in the expression (i.e.,
expression-aware). Due to this mismatch, current two-stage methods suffer from
a severe performance drop between detected and ground-truth proposals. To this
end, we propose Ref-NMS, which is the first method to yield expression-aware
proposals at the first stage. Ref-NMS regards all nouns in the expression as
critical objects, and introduces a lightweight module to predict a score for
aligning each box with a critical object. These scores can guide the
NMS operation to filter out the boxes irrelevant to the expression, increasing
the recall of critical objects and resulting in significantly improved grounding
performance. Since Ref-NMS is agnostic to the grounding step, it can be easily
integrated into any state-of-the-art two-stage method. Extensive ablation
studies on several backbones, benchmarks, and tasks consistently demonstrate
the superiority of Ref-NMS.
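A sketch of expression-aware suppression follows: each box's detection confidence is blended with a predicted expression-relatedness score before standard greedy NMS, so expression-irrelevant boxes are filtered first. The alpha-weighted blend is our assumption; Ref-NMS obtains the relatedness score from its lightweight alignment module.

```python
import numpy as np

def box_area(b):
    return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

def ref_nms(boxes, det_scores, expr_scores, iou_thresh=0.5, alpha=0.5):
    """Greedy NMS ranked by a blend of detection confidence and a
    predicted expression-relatedness score (the blend is our assumption)."""
    scores = alpha * det_scores + (1 - alpha) * expr_scores
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection-over-union of the top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (box_area(boxes[i:i + 1]) + box_area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(ref_nms(boxes, np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.9, 0.5])))
```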
Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions
Sara Evensen , Chang Ge , Dongjin Choi , Çağatay Demiralp Subjects : Machine Learning (cs.LG) ; Computation and Language (cs.CL); Databases (cs.DB); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Data programming is a programmatic weak supervision approach to efficiently
curate large-scale labeled training data. Writing data programs (labeling
functions) requires, however, both programming literacy and domain expertise.
Many subject matter experts have neither programming proficiency nor time to
effectively write data programs. Furthermore, regardless of one’s expertise in
coding or machine learning, transferring domain expertise into labeling
functions by enumerating rules and thresholds is not only time consuming but
also inherently difficult. Here we propose a new framework, data programming by
demonstration (DPBD), to generate labeling rules using interactive
demonstrations of users. DPBD aims to relieve the burden of writing labeling
functions from users, enabling them to focus on higher-level semantics such as
identifying relevant signals for labeling tasks. We operationalize our
framework with Ruler, an interactive system that synthesizes labeling rules for
document classification by using span-level annotations of users on document
examples. We compare Ruler with conventional data programming through a user
study conducted with 10 data scientists creating labeling functions for
sentiment and spam classification tasks. We find that Ruler is easier to use
and learn and offers higher overall satisfaction, while providing
discriminative model performances comparable to ones achieved by conventional
data programming.
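For readers unfamiliar with data programming, labeling functions like the ones Ruler synthesizes can be as simple as keyword rules that vote for a label or abstain; the toy spam rules and majority vote below are illustrative only (the assignment expression requires Python 3.8+).

```python
# Labeling functions in the data-programming sense: simple rules that
# vote SPAM or HAM, or abstain (None). The rule contents are hypothetical.
SPAM, HAM, ABSTAIN = 1, 0, None

def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

def majority_vote(text, lfs):
    """Aggregate non-abstaining votes; real systems use a learned label model."""
    votes = [v for lf in lfs if (v := lf(text)) is not None]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(majority_vote("Claim your FREE prize now!",
                    [lf_contains_free, lf_has_greeting]))  # -> 1 (SPAM)
```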
Towards Earnings Call and Stock Price Movement
Comments: Accepted by KDD 2020 MLF workshop
Subjects:
Statistical Finance (q-fin.ST)
; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Earnings calls are hosted by management of public companies to discuss the
company’s financial performance with analysts and investors. Information
disclosed during an earnings call is an essential source of data for analysts
and investors to make investment decisions. Thus, we leverage earnings call
transcripts to predict future stock price dynamics. We propose to model the
language in transcripts using a deep learning framework, where an attention
mechanism is applied to encode the text data into vectors for the
discriminative network classifier to predict stock price movements. Our
empirical experiments show that the proposed model is superior to the
traditional machine learning baselines and earnings call information can boost
the stock price prediction performance.
Distributed, Parallel, and Cluster Computing
Fast Byzantine Gathering with Visibility in Graphs
Comments: Conference version appeared at ALGOSENSORS 2020
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Data Structures and Algorithms (cs.DS)
We consider the gathering task by a team of $m$ synchronous mobile robots in
a graph of $n$ nodes. Each robot has an identifier (ID) and runs its own
deterministic algorithm, i.e., there is no centralized coordinator. We consider
a particularly challenging scenario: there are $f$ Byzantine robots in the team
that can behave arbitrarily, and even have the ability to change their IDs to
any value at any time. There is no way to distinguish these robots from
non-faulty robots, other than perhaps observing strange or unexpected
behaviour. The goal of the gathering task is to eventually have all non-faulty
robots located at the same node in the same round. It is known that no
algorithm can solve this task unless there are at least $f+1$ non-faulty robots in
the team. In this paper, we design an algorithm that runs in polynomial time
with respect to $n$ and $m$, and that matches this bound, i.e., it works in a team
that has exactly $f+1$ non-faulty robots. In our model, we have equipped the
robots with sensors that enable each robot to see the subgraph (including
robots) within some distance $H$ of its current node. We prove that the
gathering task is solvable if this visibility range $H$ is at least the radius
of the graph, and not solvable if $H$ is any fixed constant.
Software-Distributed Shared Memory for Heterogeneous Machines: Design and Use Considerations
Loïc Cudennec (DACLE-LIST, DGA.MI) Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed shared memory (DSM) makes it possible to implement and deploy applications
onto distributed architectures using the convenient shared memory programming
model in which a set of tasks are able to allocate and access data despite
their remote localization. With the development of distributed heterogeneous
architectures in both HPC and embedded contexts, there is a renewal of interest
for systems such as DSM that ease the programmability of complex hardware. In
this report, some design considerations are given to build a complete
software-DSM (S-DSM). This S-DSM called SAT (Share Among Things) is developed
at CEA (the French Alternative Energies and Atomic Energy Commission) within
the framework of the European project M2DC (Modular Microserver DataCentre) to
tackle the problem of managing shared data over microserver architectures. The
S-DSM features the automatic decomposition of large data into atomic pieces
called chunks, the possibility to deploy multiple coherence protocols to manage
different chunks, a hybrid programming model based on event programming, and a
micro-sleep mechanism to decrease energy consumption on message reception.
Distributed Online Optimization via Gradient Tracking with Adaptive Momentum
Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper deals with a network of computing agents aiming to solve an online
optimization problem in a distributed fashion, i.e., by means of local
computation and communication, without any central coordinator. We propose the
gradient tracking with adaptive momentum estimation (GTAdam) distributed
algorithm, which combines a gradient tracking mechanism with first and second
order momentum estimates of the gradient. The algorithm is analyzed in the
online setting for strongly convex and smooth cost functions. We prove that the
average dynamic regret is bounded and that the convergence rate is linear. The
algorithm is tested on a time-varying classification problem, on a (moving)
target localization problem and in a stochastic optimization setup from image
classification. In these numerical experiments from multi-agent learning,
GTAdam outperforms state-of-the-art distributed optimization methods.
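The following NumPy sketch reconstructs the main loop suggested by the abstract: consensus mixing over a doubly stochastic matrix, a gradient tracker for the network-wide gradient, and Adam-style first/second moment estimates applied to the tracked gradient. The precise update order is our reconstruction, not the authors' pseudocode.

```python
import numpy as np

def gtadam(W, grad_fns, x0, steps=200, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of gradient tracking with Adam-like momentum (GTAdam-style).

    W is a doubly stochastic mixing matrix over the agent network; each
    agent i holds a local estimate x[i] and a tracker s[i] of the global
    gradient. The update order here is our reconstruction.
    """
    n, d = len(grad_fns), x0.shape[1]
    x = x0.copy()
    g = np.array([grad_fns[i](x[i]) for i in range(n)])
    s = g.copy()                        # gradient trackers
    m, v = np.zeros((n, d)), np.zeros((n, d))
    for t in range(1, steps + 1):
        m = b1 * m + (1 - b1) * s       # first moment of the tracked gradient
        v = b2 * v + (1 - b2) * s**2    # second moment
        step = (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
        x = W @ x - lr * step           # consensus mixing + adaptive descent
        g_new = np.array([grad_fns[i](x[i]) for i in range(n)])
        s = W @ s + g_new - g           # track the average gradient
        g = g_new
    return x

# Three agents with quadratic costs; the consensus optimum is the average
# of the local minimizers, roughly [1, 1].
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda z, c=c: z - c for c in targets]
W = np.full((3, 3), 1 / 3)
print(gtadam(W, grads, np.zeros((3, 2))))
```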
Comments: Accepted for publication in IEEE Transaction on Circuit and System for Video Technology
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Convolutional neural networks (CNNs) require both intensive computation and
frequent memory access, which lead to a low processing speed and large power
dissipation. Although the characteristics of the different layers in a CNN are
frequently quite different, previous hardware designs have employed common
optimization schemes for them. This paper proposes a layer-specific design that
employs different organizations that are optimized for the different layers.
The proposed design employs two layer-specific optimizations: layer-specific
mixed data flow and layer-specific mixed precision. The mixed data flow aims to
minimize the off-chip access while demanding a minimal on-chip memory (BRAM)
resource of an FPGA device. The mixed precision quantization is to achieve both
a lossless accuracy and an aggressive model compression, thereby further
reducing the off-chip access. A Bayesian optimization approach is used to
select the best sparsity for each layer, achieving the best trade-off between
the accuracy and compression. This mixing scheme allows the entire network
model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip
access, and thereby achieves a significant performance enhancement. The model
size is reduced by 22.66-28.93 times compared to that in a full-precision
network with a negligible degradation of accuracy on VOC, COCO, and ImageNet
datasets. Furthermore, the combination of mixed dataflow and mixed precision
significantly outperforms previous works in terms of throughput,
off-chip access, and on-chip memory requirements.
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control
Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)
The Internet of Vehicles (IoV) enables real-time data exchange among vehicles
and roadside units and thus provides a promising solution to alleviate traffic
jams in the urban area. Meanwhile, better traffic management via efficient
traffic light control can benefit the IoV as well by enabling a better
communication environment and decreasing the network load. As such, IoV and
efficient traffic light control can formulate a virtuous cycle. Edge computing,
an emerging technology to provide low-latency computation capabilities at the
edge of the network, can further improve the performance of this cycle.
However, while the collected information is valuable, an efficient solution for
better utilization and faster feedback has yet to be developed for
edge-empowered IoV. To this end, we propose a Decentralized Reinforcement
Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits
the ubiquity of the IoV to accelerate the collection of traffic data and its
interpretation towards alleviating congestion and providing better traffic
light control. DRLE operates within the coverage of the edge servers and uses
aggregated data from neighboring edge servers to provide city-scale traffic
light control. DRLE decomposes the highly complex problem of large-area
control into a decentralized multi-agent problem. We prove its global optimality
with concrete mathematical reasoning. The proposed decentralized reinforcement
learning algorithm running at each edge node adapts the traffic lights in real
time. We conduct extensive evaluations and demonstrate the superiority of this
approach over several state-of-the-art algorithms.
Local Fast Rerouting with Low Congestion: A Randomized Approach
Gregor Bankhamer , Robert Elsässer , Stefan Schmid Subjects : Networking and Internet Architecture (cs.NI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Most modern communication networks include fast rerouting mechanisms,
implemented entirely in the data plane, to quickly recover connectivity after
link failures. By relying on local failure information only, these data plane
mechanisms provide very fast reaction times, but at the same time introduce an
algorithmic challenge in case of multiple link failures: failover routes need
to be robust to additional but locally unknown failures downstream.
This paper presents local fast rerouting algorithms which not only provide a
high degree of resilience against multiple link failures, but also ensure a low
congestion on the resulting failover paths. We consider a randomized approach
and focus on networks which are highly connected before the failures occur. Our
main contributions are three simple algorithms which come with provable
guarantees and provide interesting resilience-load tradeoffs, significantly
outperforming any deterministic fast rerouting algorithm with high probability.
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Comments: appear as a conference paper in FCCM 2019
Subjects:
Machine Learning (cs.LG)
; Distributed, Parallel, and Cluster Computing (cs.DC)
Decision trees are machine learning models commonly used in various
application scenarios. In the era of big data, traditional decision tree
induction algorithms are not suitable for learning large-scale datasets due to
their stringent data storage requirement. Online decision tree learning
algorithms have been devised to tackle this problem by concurrently training
with incoming samples and providing inference results. However, even the most
up-to-date online tree learning algorithms still suffer from either high memory
usage or high computational intensity with dependency and long latency, making
them challenging to implement in hardware. To overcome these difficulties, we
introduce a new quantile-based algorithm to improve the induction of the
Hoeffding tree, one of the state-of-the-art online learning models. The
proposed algorithm is light-weight in terms of both memory and computational
demand, while still maintaining high generalization ability. A series of
optimization techniques dedicated to the proposed algorithm have been
investigated from the hardware perspective, including coarse-grained and
fine-grained parallelism, dynamic and memory-based resource sharing, and pipelining
with data forwarding. We further present a high-performance, hardware-efficient
and scalable online decision tree learning system on a field-programmable gate
array (FPGA) with system-level optimization techniques. Experimental results
show that our proposed algorithm outperforms the state-of-the-art Hoeffding
tree learning method, leading to 0.05% to 12.3% improvement in inference
accuracy. Real implementation of the complete learning system on the FPGA
demonstrates a 384x to 1581x speedup in execution time over the
state-of-the-art design.
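For context, the Hoeffding tree referenced above splits a leaf only when the observed gain gap between the two best attributes exceeds the Hoeffding bound; this statistical test is the core that online quantile-based variants also rely on. A minimal sketch:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound used to decide when an online tree may split.

    With probability 1 - delta, the true mean of a random variable with
    range `value_range` is within epsilon of the mean observed over n
    samples. A Hoeffding tree splits when the observed gain gap between
    the two best attributes exceeds epsilon.
    """
    return math.sqrt(value_range**2 * math.log(1.0 / delta) / (2.0 * n))

# Split check after 500 samples: information gain lies in [0, 1] for
# binary labels, so value_range = 1.
epsilon = hoeffding_bound(1.0, delta=1e-6, n=500)
best_gain, second_gain = 0.42, 0.25
print(best_gain - second_gain > epsilon)  # True -> safe to split
```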
Learning
Physics-Consistent Data-driven Waveform Inversion with Adaptive Data Augmentation
Renán Rojas-Gómez , Jihyun Yang , Youzuo Lin , James Theiler , Brendt Wohlberg Subjects : Machine Learning (cs.LG) ; Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Seismic full-waveform inversion (FWI) is a nonlinear computational imaging
technique that can provide detailed estimates of subsurface geophysical
properties. Solving the FWI problem can be challenging due to its ill-posedness
and high computational cost. In this work, we develop a new hybrid
computational approach to solve FWI that combines physics-based models with
data-driven methodologies. In particular, we develop a data augmentation
strategy that can not only improve the representativity of the training set but
also incorporate important governing physics into the training process and
therefore improve the inversion accuracy. To validate the performance, we apply
our method to synthetic elastic seismic waveform data generated from a
subsurface geologic model built on a carbon sequestration site at Kimberlina,
California. We compare our physics-consistent data-driven inversion method to
both purely physics-based and purely data-driven approaches and observe that
our method yields higher accuracy and greater generalization ability.
Comments: 32 pages
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Current deep learning research is dominated by benchmark evaluation. A method
is regarded as favorable if it empirically performs well on the dedicated test
set. This mentality is seamlessly reflected in the resurfacing area of
continual learning, where consecutively arriving sets of benchmark data are
investigated. The core challenge is framed as protecting previously acquired
representations from being catastrophically forgotten due to the iterative
parameter updates. However, comparison of individual methods is nevertheless
treated in isolation from real world application and typically judged by
monitoring accumulated test set performance. The closed world assumption
remains predominant. It is assumed that during deployment a model is guaranteed
to encounter data that stems from the same distribution as used for training.
This poses a massive challenge as neural networks are well known to provide
overconfident false predictions on unknown instances and break down in the face
of corrupted data. In this work we argue that notable lessons from open set
recognition, the identification of statistically deviating data outside of the
observed dataset, and the adjacent field of active learning, where data is
incrementally queried such that the expected performance gain is maximized, are
frequently overlooked in the deep learning era. Based on these forgotten
lessons, we propose a consolidated view to bridge continual learning, active
learning and open set recognition in deep neural networks. Our results show
that this not only benefits each individual paradigm, but highlights the
natural synergies in a common framework. We empirically demonstrate
improvements in alleviating catastrophic forgetting, querying data in active
learning, and selecting task orders, while exhibiting robust open-world behavior
where previously proposed methods fail.
Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints
Comments: 2 figure, 1 table. arXiv admin note: text overlap with arXiv:2008.07029
Subjects:
Machine Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider the problem of constrained multi-objective blackbox optimization
using expensive function evaluations, where the goal is to approximate the true
Pareto set of solutions satisfying a set of constraints while minimizing the
number of function evaluations. For example, in aviation power system design
applications, we need to find the designs that trade-off total energy and the
mass while satisfying specific thresholds for motor temperature and voltage of
cells. This optimization requires performing expensive computational
simulations to evaluate designs. In this paper, we propose a new approach
referred to as Max-value Entropy Search for Multi-objective Optimization with
Constraints (MESMOC) to solve this problem. MESMOC employs an output-space
entropy based acquisition function to efficiently select the sequence of inputs
for evaluation to uncover high-quality Pareto-set solutions while satisfying
constraints.
We apply MESMOC to two real-world engineering design applications to
demonstrate its effectiveness over state-of-the-art algorithms.
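Independent of the acquisition function, the end product of constrained multi-objective optimization is the feasible Pareto set; the sketch below filters already-evaluated designs down to the non-dominated feasible ones, assuming all objectives are minimized and a constraint is satisfied when its value is non-positive (our conventions, not necessarily the paper's).

```python
import numpy as np

def feasible_pareto_front(objectives, constraints):
    """Return indices of evaluated designs that satisfy all constraints
    and are not dominated by any other feasible design."""
    feasible = np.all(constraints <= 0, axis=1)
    idx = np.where(feasible)[0]
    front = []
    for i in idx:
        dominated = any(
            np.all(objectives[j] <= objectives[i]) and
            np.any(objectives[j] < objectives[i])
            for j in idx if j != i
        )
        if not dominated:
            front.append(int(i))
    return front

obj = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 1.0], [3.0, 3.0]])
con = np.array([[-1.0], [-0.5], [-0.2], [0.3]])  # last design is infeasible
print(feasible_pareto_front(obj, con))  # [0, 1, 2]
```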
CAGNN: Cluster-Aware Graph Neural Networks for Unsupervised Graph Representation Learning
Comments: 21 pages, in submission to ACM TIST
Subjects:
Machine Learning (cs.LG)
; Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Unsupervised graph representation learning aims to learn low-dimensional node
embeddings without supervision while preserving graph topological structures
and node attributive features. Previous graph neural networks (GNN) require a
large number of labeled nodes, which may not be accessible in real-world graph
data. In this paper, we present a novel cluster-aware graph neural network
(CAGNN) model for unsupervised graph representation learning using
self-supervised techniques. In CAGNN, we perform clustering on the node
embeddings and update the model parameters by predicting the cluster
assignments. Moreover, we observe that graphs often contain inter-class edges,
which mislead the GNN model to aggregate noisy information from neighborhood
nodes. We further refine the graph topology by strengthening intra-class edges
and reducing node connections between different classes based on cluster
labels, which better preserves cluster structures in the embedding space. We
conduct comprehensive experiments on two benchmark tasks using real-world
datasets. The results demonstrate the superior performance of the proposed
model over existing baseline methods. Notably, our model gains over 7%
improvement in accuracy on node clustering over state-of-the-art methods.
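The self-supervised loop can be sketched as: cluster the current node embeddings, treat the assignments as pseudo-labels, train the GNN to predict them, and repeat. KMeans below is a stand-in for whatever clustering step CAGNN actually uses; the alternation with a cross-entropy update is implied by the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(embeddings, n_clusters=7):
    """Self-supervision signal in the spirit of CAGNN: cluster the current
    node embeddings and return the assignments as pseudo-labels. The GNN
    parameters would then be updated with a cross-entropy loss on these
    assignments, alternating with re-clustering.
    """
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

emb = np.random.default_rng(0).normal(size=(100, 16))  # toy node embeddings
print(cluster_pseudo_labels(emb, n_clusters=3)[:10])
```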
Yet Meta Learning Can Adapt Fast, It Can Also Break Easily
Comments: Meta Learning Robustness
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Meta learning algorithms have been widely applied in many tasks for efficient
learning, such as few-shot image classification and fast reinforcement
learning. During meta training, the meta learner develops a common learning
strategy, or experience, from a variety of learning tasks. Therefore, during
meta test, the meta learner can use the learned strategy to quickly adapt to
new tasks even with a few training samples. However, there is still a dark side
about meta learning in terms of reliability and robustness. In particular, is
meta learning vulnerable to adversarial attacks? In other words, would a
well-trained meta learner utilize its learned experience to build wrong or
likely useless knowledge, if an adversary unnoticeably manipulates the given
training set? Without the understanding of this problem, it is extremely risky
to apply meta learning in safety-critical applications. Thus, in this paper, we
perform the initial study about adversarial attacks on meta learning under the
few-shot classification problem. In particular, we formally define key elements
of adversarial attacks unique to meta learning and propose the first attacking
algorithm against meta learning under various settings. We evaluate the
effectiveness of the proposed attacking strategy as well as the robustness of
several representative meta learning algorithms. Experimental results
demonstrate that the proposed attacking strategy can easily break the meta
learner and meta learning is vulnerable to adversarial attacks. The
implementation of the proposed framework will be released upon the acceptance
of this paper.
MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance
Comments: Work done as part of internship at MDSR
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Training a classification model on a dataset where the instances of one class
outnumber those of the other class is a challenging problem. Such imbalanced
datasets are standard in real-world situations such as fraud detection, medical
diagnosis, and computational advertising. We propose an iterative data
augmentation method, MixBoost, which intelligently selects (Boost) and then
combines (Mix) instances from the majority and minority classes to generate
synthetic hybrid instances that have characteristics of both classes. We
evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing
approaches, and test its efficacy through significance testing. We also present
ablation studies to analyze the impact of the different components of MixBoost.
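The Mix step can be illustrated with simple convex combinations of minority and majority instances; the Beta-sampled mixing weights below follow the usual mixup recipe, and the uniform pairing replaces MixBoost's learned Boost selection, which the abstract does not detail.

```python
import numpy as np

def mix_instances(x_minority, x_majority, rng, alpha=0.4):
    """Generate synthetic hybrid instances by interpolating minority and
    majority instances, in the spirit of MixBoost's Mix step (the Boost
    step that selects which instances to combine is omitted; pairs are
    sampled uniformly for illustration).
    """
    n = len(x_minority)
    lam = rng.beta(alpha, alpha, size=(n, 1))
    lam = np.maximum(lam, 1 - lam)  # keep hybrids closer to the minority class
    picks = x_majority[rng.integers(0, len(x_majority), size=n)]
    return lam * x_minority + (1 - lam) * picks

rng = np.random.default_rng(0)
minority = rng.normal(2.0, 1.0, size=(10, 5))
majority = rng.normal(0.0, 1.0, size=(500, 5))
print(mix_instances(minority, majority, rng).shape)  # (10, 5) synthetic rows
```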
Can AutoML outperform humans? An evaluation on popular OpenML datasets using AutoML Benchmark
Marc Hanussek , Matthias Blohm , Maximilien Kintz Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In the last few years, Automated Machine Learning (AutoML) has gained much
attention. With that said, the question arises whether AutoML can outperform
results achieved by human data scientists. This paper compares four AutoML
frameworks on 12 different popular datasets from OpenML; six of them supervised
classification tasks and the other six supervised regression ones.
Additionally, we consider a real-life dataset from one of our recent projects.
The results show that the automated frameworks perform better than or equal to the
machine learning community in 7 out of 12 OpenML tasks.
Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs
Comments: 8 pages, 4 figures, conference
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
This paper proposes an approach to analyze an event log of a business process
in order to generate case-level recommendations of treatments that maximize the
probability of a given outcome. Users classify the attributes in the event log
into controllable and non-controllable, where the former correspond to
attributes that can be altered during an execution of the process (the possible
treatments). We use an action rule mining technique to identify treatments that
co-occur with the outcome under some conditions. Since action rules are
generated based on correlation rather than causation, we then use a causal
machine learning technique, specifically uplift trees, to discover subgroups of
cases for which a treatment has a high causal effect on the outcome after
adjusting for confounding variables. We test the relevance of this approach
using an event log of a loan application process and compare our findings with
recommendations manually produced by process mining experts.
Sample-Efficient Automated Deep Reinforcement Learning
Jörg K.H. Franke , Gregor Köhler , André Biedenkapp , Frank Hutter Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Despite significant progress in challenging problems across various domains,
applying state-of-the-art deep reinforcement learning (RL) algorithms remains
challenging due to their sensitivity to the choice of hyperparameters. This
sensitivity can partly be attributed to the non-stationarity of the RL problem,
potentially requiring different hyperparameter settings at different stages of
the learning process. Additionally, in the RL setting, hyperparameter
optimization (HPO) requires a large number of environment interactions,
hindering the transfer of the successes in RL to real-world applications. In
this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We
propose a population-based automated RL (AutoRL) framework to meta-optimize
arbitrary off-policy RL algorithms. In this framework, we optimize the
hyperparameters, including architecture hyperparameters while simultaneously
training the agent. By sharing the collected experience across the population,
we substantially increase the sample efficiency of the meta-optimization. We
demonstrate the capabilities of our sample-efficient AutoRL approach in a case
study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we
reduce the number of environment interactions needed for meta-optimization by
up to an order of magnitude compared to population-based training.
Bounded Risk-Sensitive Markov Game and Its Inverse Reward Learning Problem
Ran Tian , Liting Sun , Masayoshi Tomizuka Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Classical game-theoretic approaches for multi-agent systems in both the
forward policy learning/design problem and the inverse reward learning problem
often make strong rationality assumptions: agents are perfectly rational
expected utility maximizers. Specifically, the agents are risk-neutral to all
uncertainties, maximize their expected rewards, and have unlimited computation
resources to explore such policies. Such assumptions, however, are substantially
at odds with many observed human behaviors, such as satisficing with
sub-optimal policies and making risk-seeking or loss-averse decisions. In this paper,
we investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and
its inverse reward learning problem. Instead of assuming unlimited computation
resources, we consider the influence of bounded intelligence by exploiting
iterative reasoning models in BRSMG. Instead of assuming agents maximize their
expected utilities (a risk-neutral measure), we consider the impact of
risk-sensitive measures such as the cumulative prospect theory. Convergence
analysis of BRSMG for both the forward policy learning and the inverse reward
learning are established. The proposed forward policy learning and inverse
reward learning algorithms in BRSMG are validated through a navigation
scenario. Simulation results show that the behaviors of agents in BRSMG
demonstrate both risk-averse and risk-seeking phenomena, which are consistent
with observations from humans. Moreover, in the inverse reward learning task,
the proposed bounded risk-sensitive inverse learning algorithm outperforms the
baseline risk-neutral inverse learning algorithm.
Explainable Empirical Risk Minimization
A. Jung Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
The widespread use of modern machine learning methods in decision making
crucially depends on their interpretability or explainability. The human users
(decision makers) of machine learning methods are often not only interested in
getting accurate predictions or projections. Rather, as a decision-maker, the
user also needs a convincing answer (or explanation) to the question of why a
particular prediction was delivered. Explainable machine learning might be a
legal requirement when used for decision making with an immediate effect on the
health of human beings. As an example consider the computer vision of a
self-driving car whose predictions are used to decide if to stop the car. We
have recently proposed an information-theoretic approach to construct
personalized explanations for predictions obtained from ML. This method was
model-agnostic and only required some training samples of the model to be
explained along with a user feedback signal. This paper uses an
information-theoretic measure for the quality of an explanation to learn
predictors that are intrinsically explainable to a specific user. Our approach
is not restricted to a particular hypothesis space, such as linear maps or
shallow decision trees, whose predictor maps are considered as explainable by
definition. Rather, we regularize an arbitrary hypothesis space using a
personalized measure for the explainability of a particular predictor.
Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning
Jordan T. Bishop , Marcus Gallagher Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Learning classifier systems (LCSs) are population-based predictive systems
that were originally envisioned as agents to act in reinforcement learning (RL)
environments. These systems can suffer from population bloat and so are
amenable to compaction techniques that try to strike a balance between
population size and performance. A well-studied LCS architecture is XCSF, which
in the RL setting acts as a Q-function approximator. We apply XCSF to a
deterministic and stochastic variant of the FrozenLake8x8 environment from
OpenAI Gym, with its performance compared in terms of function approximation
error and policy accuracy to the optimal Q-functions and policies produced by
solving the environments via dynamic programming. We then introduce a novel
compaction algorithm (Greedy Niche Mass Compaction – GNMC) and study its
operation on XCSF’s trained populations. Results show that given a suitable
parametrisation, GNMC preserves or even slightly improves function
approximation error while yielding a significant reduction in population size.
Reasonable preservation of policy accuracy also occurs, and we link this metric
to the commonly used steps-to-goal metric in maze-like environments,
illustrating how the metrics are complementary rather than competitive.
Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Qi Sun , Hexing Dong , Zewei Chen , Weizhen Dian , Jiacheng Sun , Yitong Sun , Zhenguo Li , Bin Dong Subjects : Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Algorithms for training residual networks (ResNets) typically require a forward
pass of the data, followed by backpropagation of the loss gradient to perform parameter
updates, which can take many hours or even days for networks with hundreds of
layers. Inspired by the penalty and augmented Lagrangian methods, a
layer-parallel training algorithm is proposed in this work to overcome the
scalability barrier caused by the serial nature of forward-backward propagation
in deep residual learning. Moreover, by viewing the supervised classification
task as a numerical discretization of the terminal control problem, we bridge
the concept of synthetic gradient for decoupling backpropagation with the
parareal method for solving differential equations, which not only offers a
novel perspective on the design of synthetic loss function but also performs
parameter updates with reduced storage overhead. Experiments on a preliminary
example demonstrate that the proposed algorithm achieves comparable or even
better testing accuracy than the full serial backpropagation approach, while
the layer-parallelism it enables provides a speedup over traditional
layer-serial training methods.
Error estimate for a universal function approximator of ReLU network with a local connection
Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)
Neural networks have shown highly successful performance in a wide range of
tasks, but further studies are needed to improve their performance. We analyze
the approximation error of a neural network architecture with local
connections, which has broader applicability than fully connected ones
because locally connected networks can be used to explain diverse neural
networks such as CNNs. Our error estimate depends on two parameters: one
controlling the depth of the hidden layers, and the other the width of the
hidden layers.
Enyan Dai , Suhang Wang Subjects : Machine Learning (cs.LG)
Graph neural networks (GNNs) have shown great power in modeling graph
structured data. However, similar to other machine learning models, GNNs may
make predictions biased on protected sensitive attributes, e.g., skin color,
gender, and nationality. This is because machine learning algorithms, including
GNNs, are trained to faithfully reflect the distribution of the training data,
which often contains historical bias towards sensitive attributes. In addition, the
discrimination in GNNs can be magnified by graph structures and the
message-passing mechanism. As a result, the applications of GNNs in sensitive
domains such as crime rate prediction would be largely limited. Though
extensive studies of fair classification have been conducted on i.i.d data,
methods to address the problem of discrimination on non-i.i.d data are rather
limited. Furthermore, the practical scenario of sparse annotations in sensitive
attributes is rarely considered in existing works. Therefore, we study the
novel and important problem of learning fair GNNs with limited sensitive
attribute information. FairGNN is proposed to eliminate the bias of GNNs whilst
maintaining high node classification accuracy by leveraging graph structures
and limited sensitive information. Our theoretical analysis shows that FairGNN
can ensure the fairness of GNNs under mild conditions given limited nodes with
known sensitive attributes. Extensive experiments on real-world datasets also
demonstrate the effectiveness of FairGNN in debiasing and keeping high
accuracy.
Algebraic Neural Networks: Stability Properties
Alejandro Parada-Mayorga , Alejandro Ribeiro Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
In this work we study the stability of algebraic neural networks (AlgNNs)
with commutative algebras which unify CNNs and GNNs under the umbrella of
algebraic signal processing. An AlgNN is a stacked layered structure where each
layer is conformed by an algebra $\mathcal{A}$, a vector space $\mathcal{M}$
and a homomorphism $\rho: \mathcal{A} \rightarrow \text{End}(\mathcal{M})$, where
$\text{End}(\mathcal{M})$ is the set of endomorphisms of $\mathcal{M}$. Signals
in each layer are modeled as elements of $\mathcal{M}$ and are processed by
elements of $\text{End}(\mathcal{M})$ defined according to the structure of
$\mathcal{A}$ via $\rho$. This framework provides a general scenario that
covers several types of neural network architectures where formal convolution
operators are being used. We obtain stability conditions with respect to
perturbations which are defined as distortions of $\rho$, reaching general
results whose particular cases are consistent with recent findings in the
literature for CNNs and GNNs. We consider conditions on the domain of the
homomorphisms in the algebra that lead to stable operators. Interestingly, we
found that these conditions are related to the uniform boundedness of the
Fréchet derivative of a function
$p: \text{End}(\mathcal{M}) \rightarrow \text{End}(\mathcal{M})$ that maps the
images of the generators of $\mathcal{A}$ on $\text{End}(\mathcal{M})$ into a
power series representation that defines the filtering of elements in
$\mathcal{M}$. Additionally, our results show that stability is universal to
convolutional architectures whose algebraic signal model uses the same algebra.
It's Hard for Neural Networks To Learn the Game of Life
Comments: 12 pages, 6 figures
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Efforts to improve the learning abilities of neural networks have focused
mostly on the role of optimization methods rather than on weight
initializations. Recent findings, however, suggest that neural networks rely on
lucky random initial weights of subnetworks called “lottery tickets” that
converge quickly to a solution. To investigate how weight initializations
affect performance, we examine small convolutional networks that are trained to
predict n steps of the two-dimensional cellular automaton Conway’s Game of
Life, the update rules of which can be implemented efficiently in a 2n+1 layer
convolutional network. We find that networks of this architecture trained on
this task rarely converge. Rather, networks require substantially more
parameters to consistently converge. In addition, near-minimal architectures
are sensitive to tiny changes in parameters: changing the sign of a single
weight can cause the network to fail to learn. Finally, we observe a critical
value d_0 such that training minimal networks with examples in which cells are
alive with probability d_0 dramatically increases the chance of convergence to
a solution. We conclude that training convolutional neural networks to learn
the input/output function represented by n steps of Game of Life exhibits many
characteristics predicted by the lottery ticket hypothesis, namely, that the
networks required to learn this function are often significantly larger than
the minimal network required to implement it.
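For reference, the input/output function the networks are trained to learn is a single Game of Life step, which itself reduces to a 3x3 convolution (neighbor count) followed by elementwise logic; the sketch below implements the target function directly, rather than the paper's 2n+1-layer network construction.

```python
import numpy as np
from scipy.signal import convolve2d

def life_step(grid):
    """One step of Conway's Game of Life. A 3x3 convolution counts the
    live neighbors of each cell; a cell is alive next step iff it has
    exactly 3 live neighbors, or is alive and has exactly 2.
    """
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbors = convolve2d(grid, kernel, mode="same", boundary="wrap")
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

glider = np.zeros((8, 8), dtype=int)
glider[1, 2] = glider[2, 3] = glider[3, 1] = glider[3, 2] = glider[3, 3] = 1
print(life_step(glider))
```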
A Partial Regularization Method for Network Compression
Comments: arXiv admin note: substantial text overlap with arXiv:1912.05078
Subjects:
Machine Learning (cs.LG)
; Machine Learning (stat.ML)
Deep Neural Networks have achieved remarkable success relying on the
developing availability of GPUs and large-scale datasets with increasing
network depth and width. However, due to the expensive computation and
intensive memory, researchers have concentrated on designing compression
methods in order to make them practical for constrained platforms. In this
paper, we propose an approach of partial regularization rather than the
original form of penalizing all parameters, which is said to be full
regularization, to conduct model compression at a higher speed. It is
reasonable and feasible according to the existence of the permutation invariant
property of neural networks. Experimental results show that, as expected, the
computational complexity is reduced, as observed through shorter running times in
almost all situations. This is likely because the partial regularization
method involves fewer elements in the calculation. Surprisingly, it also
helps to improve some important metrics, such as regression fitting results and
classification accuracy, in both the training and test phases on multiple datasets,
suggesting that the pruned models have better performance and generalization
ability. Furthermore, we analyze the results and conclude that an
optimal network structure must exist and depends on the input data.
A Heaviside Function Approximation for Neural Network Binary Classification
Nathan Tsoi , Yofti Milkessa , Marynel Vázquez Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Neural network binary classifiers are often evaluated on metrics like
accuracy and $F_1$-score, which are based on confusion matrix values (True
Positives, False Positives, False Negatives, and True Negatives). However,
these classifiers are commonly trained with a different loss, e.g., log loss.
While it is preferable to perform training on the same loss as the evaluation
metric, this is difficult in the case of confusion matrix based metrics because
set membership is a step function without a derivative useful for
backpropagation. To address this challenge, we propose an approximation of the
step function that adheres to the properties necessary for effective training
of binary networks using confusion matrix based metrics. This approach allows
for end-to-end training of binary deep neural classifiers via batch gradient
descent. We demonstrate the flexibility of this approach in several
applications with varying levels of class imbalance. We also demonstrate how
the approximation allows balancing between precision and recall in the
appropriate ratio for the task at hand.
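As a sketch of the general recipe (using a plain scaled sigmoid as the step approximation; the paper proposes its own approximation with specific properties), soft confusion-matrix entries yield a differentiable F1 surrogate:

    import torch

    def soft_heaviside(z, tau=0.1):
        # Smooth step: approaches the Heaviside function as tau -> 0,
        # while keeping a useful gradient for backpropagation.
        return torch.sigmoid(z / tau)

    def soft_f1_loss(logits, targets, tau=0.1):
        """Differentiable surrogate for 1 - F1, built from soft confusion counts."""
        p = soft_heaviside(logits, tau)       # soft "predicted positive" membership
        tp = (p * targets).sum()
        fp = (p * (1 - targets)).sum()
        fn = ((1 - p) * targets).sum()
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
        return 1 - f1

Weighting the fp and fn terms differently gives the precision/recall trade-off mentioned in the abstract.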
Comments: 18 pages, 9 Figures
Subjects:
Machine Learning (cs.LG)
Intensive care clinicians need reliable clinical practice tools to preempt
unexpected critical events that might harm their patients in intensive care
units (ICU), to pre-plan timely interventions, and to keep the patient’s family
well informed. The conventional statistical models are built by curating only a
limited number of key variables, which means a vast unknown amount of
potentially precious data remains unused. Deep learning models (DLMs) can be
leveraged to learn from large complex datasets and construct predictive
clinical tools. This retrospective study was performed using 42,818 hospital
admissions involving 35,348 patients, which is a subset of the MIMIC-III
dataset. Natural language processing (NLP) techniques were applied to build
DLMs to predict in-hospital mortality (IHM) and length of stay >=7 days (LOS).
Over 75 million events across multiple data sources were processed, resulting
in over 355 million tokens. DLMs for predicting IHM using data from all sources
(AS) and chart data (CS) achieved an AUC-ROC of 0.9178 and 0.9029,
respectively, and PR-AUC of 0.6251 and 0.5701, respectively. DLMs for
predicting LOS using AS and CS achieved an AUC-ROC of 0.8806 and 0.8642,
respectively, and PR-AUC of 0.6821 and 0.6575, respectively. The observed
AUC-ROC difference between models was found to be significant for both IHM and
LOS at p=0.05. The observed PR-AUC difference between the models was found to
be significant for IHM and statistically insignificant for LOS at p=0.05. In
this study, deep learning models were constructed using data combined from a
variety of sources in Electronic Health Records (EHRs) such as chart data,
input and output events, laboratory values, microbiology events, procedures,
notes, and prescriptions. It is possible to predict in-hospital mortality with
much better confidence and higher reliability from models built using all
sources of data.
Change Point Detection by Cross-Entropy Maximization
Comments: Preprint
Subjects:
Machine Learning (cs.LG)
; Signal Processing (eess.SP); Machine Learning (stat.ML)
Many offline unsupervised change point detection algorithms rely on
minimizing a penalized sum of segment-wise costs. We extend this framework by
proposing to minimize a sum of discrepancies between segments. In particular,
we propose to select the change points so as to maximize the cross-entropy
between successive segments, balanced by a penalty for introducing new change
points. We propose a dynamic programming algorithm to solve this problem and
analyze its complexity. Experiments on two challenging datasets demonstrate the
advantages of our method compared to three state-of-the-art approaches.
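For orientation, the standard penalized optimal-partitioning recursion that this framework extends can be sketched as follows; note that the authors' objective sums discrepancies (cross-entropies) between successive segments, which this generic segment-cost version does not capture:

    import numpy as np

    def optimal_partition(signal, segment_cost, penalty):
        """Classic penalized optimal-partitioning DP (O(T^2) cost evaluations).

        F[t] = min_{s < t} F[s] + segment_cost(signal[s:t]) + penalty
        """
        T = len(signal)
        F = np.full(T + 1, np.inf)
        F[0] = -penalty
        last = np.zeros(T + 1, dtype=int)
        for t in range(1, T + 1):
            for s in range(t):
                c = F[s] + segment_cost(signal[s:t]) + penalty
                if c < F[t]:
                    F[t], last[t] = c, s
        # Backtrack to recover the change points.
        cps, t = [], T
        while t > 0:
            t = last[t]
            if t > 0:
                cps.append(t)
        return sorted(cps)

    # Example with an illustrative variance-based segment cost.
    cost = lambda seg: len(seg) * np.log(np.var(seg) + 1e-8)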
An Internal Cluster Validity Index Based on Distance-based Separability Measure
Comments: 8 pages, 4 figures. Accepted by ICTAI 2020
Subjects:
Machine Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Evaluating clustering results is a significant part of cluster analysis. Since
clustering is a typical unsupervised learning task, there are usually no true
class labels. Thus, a number of internal evaluation measures, which use
predicted labels and data, have been created; these are also called internal
cluster validity indices (CVIs). Without true labels, designing an effective
CVI is not simple, because it is similar to creating a clustering method.
Having more CVIs is crucial because there is no universal CVI that can be used
to measure all datasets, and no specific method for selecting a proper CVI for
clusters without true labels. Therefore, applying more CVIs to evaluate
clustering results is necessary. In this paper, we propose a novel CVI, called
the Distance-based Separability Index (DSI), based on a data separability
measure. We applied the DSI and eight other internal CVIs, ranging from early
studies (Dunn, 1974) to the most recent (CVDD, 2019), for comparison. We used
an external CVI as ground truth for clustering results of five clustering
algorithms on 12 real and 97 synthetic datasets. Results show that the DSI is
an effective, unique, and competitive CVI compared to the other CVIs. In
addition, we summarize the general process for evaluating CVIs and create a
new method, rank difference, to compare the results of CVIs.
Understanding the wiring evolution in differentiable neural architecture search
Sirui Xie , Shoukang Hu , Xinjiang Wang , Chunxiao Liu , Jianping Shi , Xunying Liu , Dahua Lin Subjects : Machine Learning (cs.LG) ; Machine Learning (stat.ML)
Controversy exists on whether differentiable neural architecture search
methods discover wiring topology effectively. To understand how wiring topology
evolves, we study the underlying mechanism of several existing differentiable
NAS frameworks. Our investigation is motivated by three observed searching
patterns of differentiable NAS: 1) they search by growing instead of pruning;
2) wider networks are preferred over deeper ones; 3) no edges are selected
in bi-level optimization. To anatomize these phenomena, we propose a unified
view on searching algorithms of existing frameworks, transferring the global
optimization to local cost minimization. Based on this reformulation, we
conduct empirical and theoretical analyses, revealing implicit inductive biases
in the cost’s assignment mechanism and evolution dynamics that cause the
observed phenomena. These biases indicate strong discrimination towards certain
topologies. To this end, we pose questions that future differentiable methods
for neural wiring discovery need to confront, hoping to evoke a discussion and
rethinking on how much bias has been enforced implicitly in existing NAS
methods.
Knowing What to Listen to: Early Attention for Deep Speech Representation Learning
Amirhossein Hajavi , Ali Etemad Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Deep learning techniques have considerably improved speech processing in
recent years. Speech representations extracted by deep learning models are
being used in a wide range of tasks such as speech recognition, speaker
recognition, and speech emotion recognition. Attention models play an important
role in improving deep learning models. However, current attention mechanisms
are unable to attend to fine-grained information items. In this paper we
propose the novel Fine-grained Early Frequency Attention (FEFA) for speech
signals. This model is capable of focusing on information items as small as
frequency bins. We evaluate the proposed model on two popular tasks of speaker
recognition and speech emotion recognition. Two widely used public datasets,
VoxCeleb and IEMOCAP, are used for our experiments. The model is implemented on
top of several prominent deep models as backbone networks to evaluate its
impact on performance compared to the original networks and other related work.
Our experiments show that by adding FEFA to different CNN architectures,
performance is consistently improved by substantial margins, even setting a new
state-of-the-art for the speaker recognition task. We also tested our model
against different levels of added noise showing improvements in robustness and
less sensitivity compared to the backbone networks.
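The abstract does not spell out FEFA's formulation, but the core idea, scoring and reweighting individual frequency bins before the backbone, can be sketched as follows (shapes and the scoring function are our assumptions):

    import torch
    import torch.nn as nn

    class FrequencyBinAttention(nn.Module):
        """Hypothetical early attention over spectrogram frequency bins.

        Given (batch, freq, time) features, scores each frequency bin and
        reweights the input before it reaches the backbone network.
        """
        def __init__(self, n_freq_bins: int):
            super().__init__()
            self.score = nn.Linear(n_freq_bins, n_freq_bins)

        def forward(self, spec: torch.Tensor) -> torch.Tensor:
            energy = spec.mean(dim=2)                    # (batch, freq) summary
            weights = torch.softmax(self.score(energy), dim=1)
            return spec * weights.unsqueeze(2)           # reweight each bin

    spec = torch.randn(8, 64, 200)                       # a batch of spectrograms
    attended = FrequencyBinAttention(64)(spec)           # feed to any CNN backbone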
CNN-Based Ultrasound Image Reconstruction for Ultrafast Displacement Tracking
Comments: Main text: 10 pages (3 figures). Animation and slideshow of figure 3 are provided as ancillary files. This work has been submitted to the IEEE Transactions on Medical Imaging for possible publication
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Thanks to its capability of acquiring full-view frames at multiple kilohertz,
ultrafast ultrasound imaging unlocked the analysis of rapidly changing physical
phenomena in the human body, with pioneering applications such as
ultrasensitive flow imaging in the cardiovascular system or shear-wave
elastography. The accuracy achievable with these motion estimation techniques
is strongly contingent upon two contradictory requirements: a high quality of
consecutive frames and a high frame rate. Indeed, the image quality can usually
be improved by increasing the number of steered ultrafast acquisitions, but at
the expense of a reduced frame rate and possible motion artifacts. To achieve
accurate motion estimation at uncompromised frame rates and immune to motion
artifacts, the proposed approach relies on single ultrafast acquisitions to
reconstruct high-quality frames and on only two consecutive frames to obtain
2-D displacement estimates. To this end, we deployed a convolutional neural
network-based image reconstruction method combined with a speckle tracking
algorithm based on cross-correlation. Numerical and in vivo experiments,
conducted in the context of plane-wave imaging, demonstrate that the proposed
approach is capable of estimating displacements in regions where the presence
of side lobe and grating lobe artifacts prevents any displacement estimation
with a state-of-the-art technique that relies on conventional delay-and-sum
beamforming. The proposed approach may therefore unlock the full potential of
ultrafast ultrasound, in applications such as ultrasensitive cardiovascular
motion and flow analysis or shear-wave elastography.
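The tracking stage is standard normalized cross-correlation speckle tracking between two consecutive frames; a minimal sketch (patch size and search range are illustrative, and the patch is assumed to lie away from the image borders):

    import numpy as np
    from scipy.signal import correlate2d

    def estimate_displacement(frame0, frame1, y, x, patch=16, search=8):
        """2-D displacement of the speckle patch at (y, x) between two frames,
        found as the argmax of normalized cross-correlation over a search window."""
        ref = frame0[y:y + patch, x:x + patch]
        win = frame1[y - search:y + patch + search, x - search:x + patch + search]
        ref = (ref - ref.mean()) / (ref.std() + 1e-8)
        win = (win - win.mean()) / (win.std() + 1e-8)
        cc = correlate2d(win, ref, mode="valid")     # (2*search+1, 2*search+1) map
        dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
        return dy - search, dx - search              # displacement in pixels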
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This can undermine the
trustworthiness and accountability of the models when deployed in real
applications, where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood-based models on the task of out-of-distribution
(OoD) detection. Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
of the effect of adversarial noise resistance methods on in- and
out-of-distribution performance, and also investigate the adversarial noise
robustness of Bayesian deep learners.
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
Private Weighted Random Walk Stochastic Gradient Descent
Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)
We consider a decentralized learning setting in which data is distributed
over nodes in a graph. The goal is to learn a global model on the distributed
data without involving any central entity that needs to be trusted. While
gossip-based stochastic gradient descent (SGD) can be used to achieve this
learning objective, it incurs high communication and computation costs, since
it has to wait for all the local models at all the nodes to converge. To speed
up the convergence, we propose instead to study random walk based SGD in which
a global model is updated based on a random walk on the graph. We propose two
algorithms based on two types of random walks that achieve, in a decentralized
way, uniform sampling and importance sampling of the data. We provide a
non-asymptotic analysis on the rate of convergence, taking into account the
constants related to the data and the graph. Our numerical results show that
the weighted random walk based algorithm has a better performance for
high-variance data. Moreover, we propose a privacy-preserving random walk
algorithm that achieves local differential privacy based on a Gamma noise
mechanism that we propose. We also give numerical results on the convergence of
this algorithm and show that it outperforms additive Laplace-based privacy
mechanisms.
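A sketch of the uniform-sampling variant (one of the two walks; the Metropolis-Hastings acceptance step is a standard way to obtain a uniform stationary distribution, and the authors' exact construction, importance weights, and Gamma noise mechanism are not reproduced here):

    import random

    def random_walk_sgd(graph, data, w, grad, lr=0.01, steps=10_000):
        """Uniform-sampling random-walk SGD sketch.

        graph: dict node -> list of neighbors; data: dict node -> local samples.
        A single global model w travels along the walk; each visited node updates
        it with a gradient on its own local data.
        """
        node = random.choice(list(graph))
        for _ in range(steps):
            x, y = random.choice(data[node])
            w -= lr * grad(w, x, y)               # local SGD update at this node
            nxt = random.choice(graph[node])      # propose a neighbor
            # Accept with min(1, deg(node)/deg(nxt)) -> uniform stationary law.
            if random.random() < min(1.0, len(graph[node]) / len(graph[nxt])):
                node = nxt
        return w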
Computational Analysis of Deformable Manifolds: from Geometric Modelling to Deep Learning
Comments: PhD Thesis. Versions of several chapters have previously appeared or been submitted under different titles
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Numerical Analysis (math.NA)
Leo Tolstoy opened his monumental novel Anna Karenina with the now famous
words: “Happy families are all alike; every unhappy family is unhappy in its
own way.” A similar notion also applies to mathematical spaces: every flat
space is alike; every unflat space is unflat in its own way. However, rather than being
a source of unhappiness, we will show that the diversity of non-flat spaces
provides a rich area of study. The genesis of the so-called big data era and
the proliferation of social and scientific databases of increasing size have
led to a need for algorithms that can efficiently process, analyze, and even
generate high-dimensional data. However, the curse of dimensionality leads to
the fact that many classical approaches do not scale well with respect to the
size of these problems. One technique to avoid some of these ill-effects is to
exploit the geometric structure of coherent data. In this thesis, we will
explore geometric methods for shape processing and data analysis. More
specifically, we will study techniques for representing manifolds and signals
supported on them through a variety of mathematical tools including, but not
limited to, computational differential geometry, variational PDE modeling, and
deep learning. First, we will explore non-isometric shape matching through
variational modeling. Next, we will use ideas from parallel transport on
manifolds to generalize convolution and convolutional neural networks to
deformable manifolds. Finally, we conclude by proposing a novel auto-regressive
model for capturing the intrinsic geometry and topology of data. Throughout
this work, we will use the idea of computing correspondences as a through-line
to both motivate our work and analyze our results.
Quantum Long Short-Term Memory
Samuel Yen-Chi Chen , Shinjae Yoo , Yao-Lung L. Fang Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG)
Long short-term memory (LSTM) is a kind of recurrent neural network (RNN) for
modeling sequential data with temporal dependencies, and its effectiveness has
been extensively established. In this work, we propose a hybrid
quantum-classical model of LSTM, which we dub QLSTM. We demonstrate that the
proposed model successfully learns several kinds of temporal data. In
particular, we show that for certain testing cases, this quantum version of
LSTM converges faster, or equivalently, reaches a better accuracy, than its
classical counterpart. Due to the variational nature of our approach, the
requirements on qubit counts and circuit depth are eased, and our work thus
paves the way toward implementing machine learning algorithms for sequence
modeling on noisy intermediate-scale quantum (NISQ) devices.
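For reference, the classical LSTM cell that such a hybrid would modify computes the following (our rendering of the standard equations; which affine maps are replaced by variational quantum circuits is not specified in the abstract):

    \begin{aligned}
    f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f), &\quad i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i),\\
    \tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c), &\quad o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o),\\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &\quad h_t &= o_t \odot \tanh(c_t).
    \end{aligned}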
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , Tie-Yan Liu Subjects : Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
High-fidelity singing voices usually require a higher sampling rate (e.g.,
48kHz) to convey expression and emotion. However, a higher sampling rate
results in a wider frequency band and longer waveform sequences, which poses
challenges for singing voice synthesis (SVS) in both the frequency and time
domains. Conventional SVS systems that adopt a small sampling rate cannot
adequately address these challenges. In this paper, we develop HiFiSinger, an
SVS system towards high-fidelity singing voices. HiFiSinger consists of a
FastSpeech based acoustic model and a Parallel WaveGAN based vocoder to ensure
fast training and inference as well as high voice quality. To tackle the
difficulty of modeling singing at a high sampling rate (wider frequency band
and longer waveforms), we introduce multi-scale adversarial training in both
the acoustic model and the vocoder. Specifically, 1) To handle the
larger range of frequencies caused by higher sampling rate, we propose a novel
sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full
80-dimensional mel-frequency into multiple sub-bands and models each sub-band
with a separate discriminator. 2) To model longer waveform sequences caused by
higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform
generation to model different lengths of waveform sequences with separate
discriminators. 3) We also introduce several additional designs and findings in
HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch)
and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate
window/hop size for mel-spectrogram, and increasing the receptive field in
vocoder for long vowel modeling. Experiment results show that HiFiSinger
synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44
MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
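The SF-GAN idea of judging each mel sub-band with its own discriminator can be sketched as follows (band boundaries and discriminator depth are illustrative; the paper's actual architecture is not reproduced):

    import torch
    import torch.nn as nn

    class SubFrequencyDiscriminators(nn.Module):
        """Sketch: split an 80-bin mel-spectrogram into sub-bands and score
        each with its own small convolutional discriminator."""
        def __init__(self, bands=((0, 27), (27, 54), (54, 80))):
            super().__init__()
            self.bands = bands
            self.discs = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(16, 1, 3, padding=1),
                ) for _ in bands
            ])

        def forward(self, mel):                  # mel: (batch, 80, time)
            x = mel.unsqueeze(1)                 # add a channel dimension
            return [d(x[:, :, lo:hi, :]) for d, (lo, hi) in zip(self.discs, self.bands)]

    scores = SubFrequencyDiscriminators()(torch.randn(4, 80, 240))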
Distributed Online Optimization via Gradient Tracking with Adaptive Momentum
Guido Carnevale , Francesco Farina , Ivano Notarnicola , Giuseppe Notarstefano Subjects : Optimization and Control (math.OC) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This paper deals with a network of computing agents aiming to solve an online
optimization problem in a distributed fashion, i.e., by means of local
computation and communication, without any central coordinator. We propose the
gradient tracking with adaptive momentum estimation (GTAdam) distributed
algorithm, which combines a gradient tracking mechanism with first and second
order momentum estimates of the gradient. The algorithm is analyzed in the
online setting for strongly convex and smooth cost functions. We prove that the
average dynamic regret is bounded and that the convergence rate is linear. The
algorithm is tested on a time-varying classification problem, on a (moving)
target localization problem and in a stochastic optimization setup from image
classification. In these numerical experiments from multi-agent learning,
GTAdam outperforms state-of-the-art distributed optimization methods.
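A sketch of the update structure, combining consensus mixing, dynamic-average gradient tracking, and Adam-style moments on the tracked gradient, under our own placement of the steps (the paper's exact recursion may differ):

    import numpy as np

    def gtadam(W, grads, x0, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, iters=500):
        """W: doubly stochastic mixing matrix (N x N); grads: per-agent gradient
        functions; x: per-agent iterates (N x d); s: gradient trackers that
        converge to the network-average gradient. Constants are illustrative."""
        N, d = W.shape[0], x0.shape[1]
        x = x0.copy()
        g = np.stack([grads[i](x[i]) for i in range(N)])
        s = g.copy()                       # trackers start at the local gradients
        m, v = np.zeros((N, d)), np.zeros((N, d))
        for _ in range(iters):
            m = b1 * m + (1 - b1) * s      # first-moment estimate of tracked grad
            v = b2 * v + (1 - b2) * s**2   # second-moment estimate
            x = W @ x - lr * m / (np.sqrt(v) + eps)   # consensus + Adam-like step
            g_new = np.stack([grads[i](x[i]) for i in range(N)])
            s = W @ s + g_new - g          # dynamic-average gradient tracking
            g = g_new
        return x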
Online Community Detection for Event Streams on Networks
Comments: 38 pages
Subjects:
Social and Information Networks (cs.SI)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
A common goal in network modeling is to uncover the latent community
structure present among nodes. For many real-world networks, observed
connections consist of events arriving as streams, which are then aggregated to
form edges, ignoring the temporal dynamic component. A natural way to take
account of this temporal dynamic component of interactions is to use point
processes as the foundation of the network models for community detection.
Computational complexity hampers the scalability of such approaches to large
sparse networks. To circumvent this challenge, we propose a fast online
variational inference algorithm for learning the community structure underlying
dynamic event arrivals on a network using continuous-time point process latent
network models. We provide regret bounds on the loss function of this
procedure, giving theoretical guarantees on performance. The proposed algorithm
is illustrated, using both simulation studies and real data, to have
performance comparable to non-online variants in terms of community recovery.
Our proposed framework can also be readily modified to
incorporate other popular network structures.
Bayesian Perceptron: Towards fully Bayesian Neural Networks
Comments: Accepted for publication at the 59th IEEE Conference on Decision and Control (CDC) 2020
Subjects:
Machine Learning (stat.ML)
; Machine Learning (cs.LG)
Artificial neural networks (NNs) have become the de facto standard in machine
learning. They allow learning highly nonlinear transformations in a plethora of
applications. However, NNs usually only provide point estimates without
systematically quantifying corresponding uncertainties. In this paper a novel
approach towards fully Bayesian NNs is proposed, where training and predictions
of a perceptron are performed within the Bayesian inference framework in
closed-form. The weights and the predictions of the perceptron are considered
Gaussian random variables. Analytical expressions for predicting the
perceptron’s output and for learning the weights are provided for commonly used
activation functions like sigmoid or ReLU. This approach requires no
computationally expensive gradient calculations and further allows sequential
learning.
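One standard closed-form device for Gaussian weights with a sigmoid activation is the probit approximation E[sigmoid(a)] ≈ sigmoid(mu / sqrt(1 + pi·var/8)); a sketch of the resulting predictive mean (the paper's exact expressions may differ):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def perceptron_predictive_mean(mu_w, Sigma_w, x):
        """Moment-matched predictive mean of sigmoid(w^T x) for Gaussian weights
        w ~ N(mu_w, Sigma_w), via the classic probit approximation."""
        mu_a = x @ mu_w                    # mean of the pre-activation
        var_a = x @ Sigma_w @ x            # variance of the pre-activation
        return sigmoid(mu_a / np.sqrt(1.0 + np.pi * var_a / 8.0))

    mu_w = np.array([0.5, -1.0])
    Sigma_w = 0.1 * np.eye(2)
    print(perceptron_predictive_mean(mu_w, Sigma_w, np.array([1.0, 2.0])))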
On the study of the Beran estimator for generalized censoring indicators
Mikael Escobar-Bach , Olivier Goudet Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Along with the analysis of time-to-event data, it is common to assume that
only partial information is available. In the presence of right-censored data
with covariates, the conditional Kaplan-Meier estimator (also referred to as
the Beran estimator) is known to provide a consistent estimate of the
conditional survival function of the lifetimes. However, a necessary condition
is clear knowledge of whether each individual is censored or not, and this
information might be incomplete or even totally absent in practice. We thus
propose a study on the Beran estimator when the censoring indicator is not
clearly specified. From this, we provide a new estimator for the conditional
survival function and establish its asymptotic normality under mild conditions.
We further study the supervised learning problem where the conditional survival
function is to be predicted with no censorship indicators. To this aim, we
investigate various approaches estimating the conditional expectation for the
censoring indicator. Along with the theoretical results, we illustrate how the
estimators work for small samples by means of a simulation study and show their
practical applicability with the analysis of synthetic data and the study of
real data for the prognosis of monoclonal gammopathy.
Simulation of an Elevator Group Control Using Generative Adversarial Networks and Related AI Tools
Tom Peetz , Sebastian Vogt , Martin Zaefferer , Thomas Bartz-Beielstein Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Testing new, innovative technologies is a crucial task for safety and
acceptance. But how can new systems be tested if no historical real-world data
exist? Simulation provides an answer to this important question. Classical
simulation tools such as event-based simulation are well accepted. But most of
these established simulation models require the specification of many
parameters. Furthermore, simulation runs, e.g., CFD simulations, are very time
consuming. Generative Adversarial Networks (GANs) are powerful tools for
generating new data for a variety of tasks. Currently, their most frequent
application domain is image generation. This article investigates the
applicability of GANs for imitating simulations. We compare the
simulation output of a technical system with the output of a GAN. To exemplify
this approach, a well-known multi-car elevator system simulator was chosen. Our
study demonstrates the feasibility of this approach. It also discusses pitfalls
and technical problems that occurred during the implementation. Although we
were able to show that in principle, GANs can be used as substitutes for
expensive simulation runs, we also show that they cannot be used “out of the
box”. Fine tuning is needed. We present a proof-of-concept, which can serve as
a starting point for further research.
Quasi-symplectic Langevin Variational Autoencoder
Zihao Wang , Hervé Delingette Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The variational autoencoder (VAE) is one of the most thoroughly investigated
generative models in current neural learning research. Leveraging VAEs in
practical tasks with high-dimensional data and large datasets often faces the
problem of constructing low-variance evidence lower bounds (ELBOs). Markov
chain Monte Carlo (MCMC) is an effective approach for tightening the ELBO when
approximating the posterior distribution. The Hamiltonian Variational
Autoencoder (HVAE) is one of the effective MCMC-inspired approaches for
constructing an unbiased low-variance ELBO that is also amenable to the
reparameterization trick. This solution significantly improves posterior
estimation, yet a main drawback of HVAE is that the leapfrog method needs to
access the posterior gradient twice, which leads to poor inference efficiency
and a fairly large GPU memory requirement. This flaw limits the application of
Hamiltonian-based inference frameworks to large-scale network inference. To
tackle this problem, we propose a Quasi-symplectic Langevin Variational
Autoencoder (Langevin-VAE), which offers a significant improvement in resource
usage efficiency. We qualitatively and quantitatively demonstrate the
effectiveness of the Langevin-VAE compared to state-of-the-art
gradient-informed inference frameworks.
Learning Unknown Physics of non-Newtonian Fluids
Brandon Reyes , Amanda A. Howard , Paris Perdikaris , Alexandre M. Tartakovsky Subjects : Computational Physics (physics.comp-ph) ; Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
We extend the physics-informed neural network (PINN) method to learn
viscosity models of two non-Newtonian systems (polymer melts and suspensions of
particles) using only velocity measurements. The PINN-inferred viscosity models
agree with the empirical models for shear rates with large absolute values but
deviate for shear rates near zero where the analytical models have an
unphysical singularity. Once a viscosity model is learned, we use the PINN
method to solve the momentum conservation equation for non-Newtonian fluid flow
using only the boundary conditions.
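The structure of such a PINN loss, a data term on velocities plus a physics residual in which an unknown viscosity network appears, can be sketched for a 1-D channel flow (the geometry, equation form, and networks are illustrative assumptions, not the paper's setup):

    import torch
    import torch.nn as nn

    # 1-D steady channel flow driven by pressure gradient G:
    # momentum residual  d/dy( eta(du/dy) * du/dy ) + G = 0.
    # u_net fits velocity measurements; eta_net is the unknown viscosity model
    # (Softplus keeps the predicted viscosity positive).
    u_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    eta_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())
    G = 1.0

    def pinn_loss(y_data, u_data, y_col):
        data_loss = ((u_net(y_data) - u_data) ** 2).mean()   # velocity data only
        y = y_col.requires_grad_(True)
        u = u_net(y)
        du = torch.autograd.grad(u.sum(), y, create_graph=True)[0]
        flux = eta_net(du) * du                              # shear stress
        dflux = torch.autograd.grad(flux.sum(), y, create_graph=True)[0]
        physics_loss = ((dflux + G) ** 2).mean()             # momentum residual
        return data_loss + physics_loss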
A free web service for fast COVID-19 classification of chest X-Ray images
Comments: 14 pages, 12 figures
Subjects:
Image and Video Processing (eess.IV)
; Machine Learning (cs.LG)
The coronavirus outbreak became a major concern for society worldwide.
Technological innovation and ingenuity are essential to fight COVID-19 pandemic
and bring us one step closer to overcome it. Researchers over the world are
working actively to find available alternatives in different fields, such as
the healthcare system, pharmaceutics, and health prevention, among others. With
the rise of artificial intelligence (AI) in the last 10 years, AI-based
applications have become the prevalent solution in different areas because of
their higher capability, and they are now being adopted to help combat COVID-19. This
work provides a fast detection system of COVID-19 characteristics in X-Ray
images based on deep learning (DL) techniques. This system is available as a
free web-deployed service for fast patient classification, alleviating the high
demand for standard COVID-19 diagnosis methods. It consists of two deep
learning models: one to differentiate between X-Ray and non-X-Ray images based
on the Mobile-Net architecture, and another to identify chest X-Ray images
with characteristics of COVID-19 based on the DenseNet architecture. For
real-time inference, a pair of dedicated GPUs is provided, which reduces the
computational time. The whole system can filter out non-chest X-Ray images, and
detect whether the X-Ray presents characteristics of COVID-19, highlighting the
most sensitive regions.
SRQA: Synthetic Reader for Factoid Question Answering
Comments: arXiv admin note: text overlap with arXiv:1809.00676
Journal-ref: Knowledge-Based Systems, Volume 193, 6 April 2020, 105415
Subjects:
Computation and Language (cs.CL)
; Machine Learning (cs.LG)
The question answering system can answer questions from various fields and
forms with deep neural networks, but it still lacks effective ways of handling
multiple evidences. We introduce a new model called SRQA, which stands for
Synthetic Reader for Factoid Question Answering. This model enhances the question
answering system in the multi-document scenario from three aspects: model
structure, optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)
respectively. First, we propose a multilayer attention network to obtain a
better representation of the evidences. The multilayer attention mechanism
conducts interaction between the question and the passage within each layer,
making the token representations of the evidences in each layer take the
requirements of the question into account. Second, we design a cross-evidence
strategy to choose the answer span across multiple evidences. We improve the
optimization goal, considering all the answers’ locations in multiple evidences
as training targets, which leads the model to reason among multiple evidences.
Third, adversarial training is applied to high-level variables in addition to
the word embeddings in our model. A new normalization method is also proposed for
adversarial perturbations so that we can jointly add perturbations to several
target variables. As an effective regularization method, adversarial training
enhances the model’s ability to process noisy data. Combining these three
strategies, we enhance the contextual representation and locating ability of
our model, which could synthetically extract the answer span from several
evidences. We perform SRQA on the WebQA dataset, and experiments show that our
model outperforms the state-of-the-art models (the best fuzzy score of our
model is up to 78.56%, with an improvement of about 2%).
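The adversarial-training ingredient follows the usual gradient-direction recipe; a single-variable sketch for orientation (the paper's contribution is a new normalization for jointly perturbing several high-level variables, which this does not show):

    import torch

    def adversarial_perturb(var, loss, eps=1.0):
        """FGSM-style perturbation of one variable in the computation graph:
        r = eps * g / ||g||, with g the gradient of the loss w.r.t. the variable.
        `var` must participate in the graph that produced `loss`."""
        g = torch.autograd.grad(loss, var, retain_graph=True)[0]
        return var + eps * g / (g.norm() + 1e-12)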
Multimodal brain tumor classification
Marvin Lerousseau , Eric Deutsh , Nikos Paragios Subjects : Image and Video Processing (eess.IV) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cancer is a complex disease that provides various types of information
depending on the scale of observation. While most tumor diagnostics are
performed by observing histopathological slides, radiology images should yield
additional knowledge towards the efficacy of cancer diagnostics. This work
investigates a deep learning method combining whole slide images and magnetic
resonance images to classify tumors. Experiments are prospectively conducted on
the 2020 Computational Precision Medicine challenge, in a 3-class unbalanced
classification task. We report cross-validation (resp. validation)
balanced-accuracy, kappa and f1 of 0.913, 0.897 and 0.951 (resp. 0.91, 0.90 and
0.94). The complete code of the method is open-source at XXXX. It includes
histopathological data pre-processing and can therefore be used off-the-shelf
for other histopathological and/or radiological classification tasks.
Large Dimensional Analysis and Improvement of Multi Task Learning
Malik Tiomoko , Romain Couillet , Hafiz Tiomoko Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
Multi Task Learning (MTL) efficiently leverages useful information contained
in multiple related tasks to help improve the generalization performance of all
tasks. This article conducts a large-dimensional analysis of a simple but, as
we shall see, extremely powerful (when carefully tuned) Least Square Support
Vector Machine (LSSVM) version of MTL, in the regime where the dimension (p) of
the data and their number (n) grow large at the same rate.
Under mild assumptions on the input data, the theoretical analysis of the
MTL-LSSVM algorithm first reveals the “sufficient statistics” exploited by the
algorithm and their interaction at work. These results demonstrate, as a
striking consequence, that the standard approach to MTL-LSSVM is largely
suboptimal, can lead to severe effects of negative transfer but that these
impairments are easily corrected. These corrections are turned into an improved
MTL-LSSVM algorithm which can only benefit from additional data, and the
theoretical performance of which is also analyzed.
As evidenced and theoretically sustained in numerous recent works, these
large dimensional results are robust to broad ranges of data distributions,
which our present experiments corroborate. Specifically, the article reports a
systematically close behavior between theoretical and empirical performances on
popular datasets, which is strongly suggestive of the applicability of the
proposed carefully tuned MTL-LSSVM method to real data. This fine-tuning is
fully based on the theoretical analysis and does not in particular require any
cross validation procedure. Besides, the reported performances on real datasets
almost systematically outperform much more elaborate and less intuitive
state-of-the-art multi-task and transfer learning methods.
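For context, a single-task LSSVM reduces to one linear solve; the paper analyzes and retunes a multi-task coupling of such problems. A standard single-task sketch:

    import numpy as np

    def lssvm_train(K, y, gamma=1.0):
        """Standard single-task LSSVM dual solve:
        [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
        The paper's MTL version couples several such problems and tunes them
        via random-matrix-theoretic analysis."""
        n = len(y)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:], A[1:, 0] = 1.0, 1.0
        A[1:, 1:] = K + np.eye(n) / gamma
        rhs = np.concatenate([[0.0], y])
        sol = np.linalg.solve(A, rhs)
        return sol[0], sol[1:]          # bias b, dual coefficients alpha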
Auto-Classifier: A Robust Defect Detector Based on an AutoML Head
Comments: 12 pages, 2 figures. Published in ICONIP2020, proceedings published in the Springer’s series of Lecture Notes in Computer Science
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The dominant approach for surface defect detection is the use of hand-crafted
feature-based methods. However, this falls short when conditions that affect
the extracted images vary. So, in this paper, we sought to determine how well
several state-of-the-art Convolutional Neural Networks perform in the task of
surface defect detection. Moreover, we propose two methods: CNN-Fusion, that
fuses the prediction of all the networks into a final one, and Auto-Classifier,
which is a novel proposal that improves a Convolutional Neural Network by
modifying its classification component using AutoML. We carried out experiments
to evaluate the proposed methods in the task of surface defect detection using
different datasets from DAGM2007. We show that the use of Convolutional Neural
Networks achieves better results than traditional methods, and also, that
Auto-Classifier outperforms all other methods, achieving 100% accuracy and
100% AUC results throughout all the datasets.
Automated identification of metamorphic test scenarios for an ocean-modeling application
Comments: Short paper: 2 pages, 2020 IEEE International Conference On Artificial Intelligence Testing (AITest)
Subjects:
Software Engineering (cs.SE)
; Machine Learning (cs.LG)
Metamorphic testing seeks to validate software in the absence of test
oracles. Our application domain is ocean modeling, where test oracles often do
not exist, but where symmetries of the simulated physical systems are known. In
this short paper we present work in progress for automated generation of
metamorphic test scenarios using machine learning. Metamorphic testing may be
expressed as f(g(X))=h(f(X)) with f being the application under test, with
input data X, and with the metamorphic relation (g, h). Automatically generated
metamorphic relations can be used for constructing regression tests, and for
comparing different versions of the same software application. Here, we
restrict to h being the identity map. Then, the task of constructing tests
means finding different g which we tackle using machine learning algorithms.
These algorithms typically minimize a cost function. As one possible g is
already known to be the identity map, for finding a second possible g, we
construct the cost function to minimize for g being a metamorphic relation and
to penalize for g being the identity map. After identifying the first
metamorphic relation, the procedure is repeated with a cost function rewarding
g that are orthogonal to previously found metamorphic relations. For
experimental evaluation, two implementations of an ocean-modeling application
will be subjected to the proposed method, with the objective of demonstrating
the use of metamorphic relations for testing these implementations.
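A minimal hand-written instance of the f(g(X)) = f(X) (h = identity) setting, with toy symmetries standing in for the relations the paper learns:

    import numpy as np

    def metamorphic_test(f, g, X, atol=1e-6):
        """Check f(g(X)) == f(X), i.e. the h = identity case from the paper."""
        return np.allclose(f(g(X)), f(X), atol=atol)

    # Toy application under test: mean kinetic energy after one diffusion step
    # with periodic boundaries (mirror- and translation-symmetric physics).
    def diffuse_energy(field):
        nxt = field + 0.1 * (np.roll(field, 1, -1) + np.roll(field, -1, -1) - 2 * field)
        return (nxt ** 2).mean()

    g_flip = lambda field: np.flip(field, axis=-1)      # mirror symmetry
    g_shift = lambda field: np.roll(field, 3, axis=-1)  # translation symmetry

    X = np.random.rand(4, 64)
    assert metamorphic_test(diffuse_energy, g_flip, X)
    assert metamorphic_test(diffuse_energy, g_shift, X)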
Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Shahar Segal , Yossi Adi , Benny Pinkas , Carsten Baum , Chaya Ganesh , Joseph Keshet Subjects : Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a framework that allows one to certify the fairness degree of a model
based on an interactive and privacy-preserving test. The framework verifies any
trained model, regardless of its training process and architecture. Thus, it
allows us to evaluate any deep learning model on multiple fairness definitions
empirically. We tackle two scenarios, where either the test data is privately
available only to the tester or is publicly known in advance, even to the model
creator. We investigate the soundness of the proposed approach using
theoretical analysis and present statistical guarantees for the interactive
test. Finally, we provide a cryptographic technique to automate fairness
testing and certified inference with only black-box access to the model at hand
while hiding the participants’ sensitive data.
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects
Comments: under review
Subjects:
Software Engineering (cs.SE)
; Machine Learning (cs.LG)
Machine learning is nowadays a standard technique for data analysis within
software applications. Software engineers need quality assurance techniques
that are suitable for these new kinds of systems. Within this article, we
discuss the question of whether standard software testing techniques that have
been part of textbooks for decades are also useful for the testing of machine
learning software. Concretely, we try to determine generic smoke tests that can
be used to assert that basic functions can be executed without crashing. We
found that we can derive such tests using techniques similar to equivalence
classes and boundary value analysis. Moreover, we found that these concepts can
also be applied to hyperparameters, to further improve the quality of the smoke
tests. Even though our approach is almost trivial, we were able to find bugs in
all three machine learning libraries that we tested and severe bugs in two of
the three libraries. This demonstrates that common software testing techniques
are still valid in the age of machine learning and that they are suitable to
find and prevent severe bugs, even in mature machine learning libraries.
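A sketch of what such smoke tests look like in practice: equivalence classes of inputs, each asserted merely not to crash (the library and input classes below are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def smoke_inputs(n=20, d=3):
        rng = np.random.default_rng(0)
        yield "ordinary floats", rng.normal(size=(n, d))
        yield "all zeros", np.zeros((n, d))
        yield "huge magnitudes", np.full((n, d), 1e300)
        yield "tiny magnitudes", np.full((n, d), 1e-300)
        yield "single feature", rng.normal(size=(n, 1))

    def run_smoke_tests():
        y = np.random.default_rng(1).integers(0, 2, 20)
        for name, X in smoke_inputs():
            try:
                LogisticRegression(max_iter=50).fit(X, y).predict(X)
                print(f"ok:    {name}")
            except Exception as exc:   # a crash here is a smoke-test finding
                print(f"CRASH: {name}: {exc!r}")

    run_smoke_tests()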
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Harish Doraiswamy , Julien Tierny , Paulo J. S. Silva , Luis Gustavo Nonato , Claudio Silva Subjects : Graphics (cs.GR) ; Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Multidimensional Projection is a fundamental tool for high-dimensional data
analytics and visualization. With very few exceptions, projection techniques
are designed to map data from a high-dimensional space to a visual space so as
to preserve some dissimilarity (similarity) measure, such as the Euclidean
distance for example. In fact, although adopting distinct mathematical
formulations designed to favor different aspects of the data, most
multidimensional projection methods strive to preserve dissimilarity measures
that encapsulate geometric properties such as distances or the proximity
relation between data objects. However, geometric relations are not the only
interesting property to be preserved in a projection. For instance, the
analysis of particular structures such as clusters and outliers could be more
reliably performed if the mapping process gives some guarantee as to
topological invariants such as connected components and loops. This paper
introduces TopoMap, a novel projection technique which provides topological
guarantees during the mapping process. In particular, the proposed method
performs the mapping from a high-dimensional space to a visual space, while
preserving the 0-dimensional persistence diagram of the Rips filtration of the
high-dimensional data, ensuring that the filtrations generate the same
connected components when applied to the original as well as projected data.
The presented case studies show that the topological guarantee provided by
TopoMap not only brings confidence to the visual analytic process but also can
be used to assist in the assessment of other projection methods.
DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control
Pengyuan Zhou , Xianfu Chen , Zhi Liu , Tristan Braud , Pan Hui , Jussi Kangasharju Subjects : Multiagent Systems (cs.MA) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)
The Internet of Vehicles (IoV) enables real-time data exchange among vehicles
and roadside units and thus provides a promising solution to alleviate traffic
jams in the urban area. Meanwhile, better traffic management via efficient
traffic light control can benefit the IoV as well by enabling a better
communication environment and decreasing the network load. As such, IoV and
efficient traffic light control can formulate a virtuous cycle. Edge computing,
an emerging technology to provide low-latency computation capabilities at the
edge of the network, can further improve the performance of this cycle.
However, while the collected information is valuable, an efficient solution for
better utilization and faster feedback has yet to be developed for
edge-empowered IoV. To this end, we propose a Decentralized Reinforcement
Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits
the ubiquity of the IoV to accelerate the collection of traffic data and its
interpretation towards alleviating congestion and providing better traffic
light control. DRLE operates within the coverage of the edge servers and uses
aggregated data from neighboring edge servers to provide city-scale traffic
light control. DRLE decomposes the highly complex problem of large area
control into a decentralized multi-agent problem. We prove its global optimality
with concrete mathematical reasoning. The proposed decentralized reinforcement
learning algorithm running at each edge node adapts the traffic lights in real
time. We conduct extensive evaluations and demonstrate the superiority of this
approach over several state-of-the-art algorithms.
Modeling Global Body Configurations in American Sign Language
Nicholas Wilkins , Beck Cordes Galbraith , Ifeoma Nwogu Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
American Sign Language (ASL) is the fourth most commonly used language in the
United States and is the language most commonly used by Deaf people in the
United States and the English-speaking regions of Canada. Unfortunately, until
recently, ASL received little research attention. This is due, in part, to its delayed
recognition as a language until William C. Stokoe’s publication in 1960.
Limited data has been a long-standing obstacle to ASL research and
computational modeling. The lack of large-scale datasets has prohibited many
modern machine-learning techniques, such as Neural Machine Translation, from
being applied to ASL. In addition, the modality required to capture sign
language (i.e. video) is complex in natural settings (as one must deal with
background noise, motion blur, and the curse of dimensionality). Finally, when
compared with spoken languages, such as English, there has been limited
research conducted into the linguistics of ASL.
We realize a simplified version of Liddell and Johnson’s Movement-Hold (MH)
Model using a Probabilistic Graphical Model (PGM). We trained our model on
ASLing, a dataset collected from three fluent ASL signers. We evaluate our PGM
against other models to determine its ability to model ASL. Finally, we
interpret various aspects of the PGM and draw conclusions about ASL phonetics.
The main contributions of this paper are
Decision Tree Based Hardware Power Monitoring for Run Time Dynamic Power Management in FPGA
Comments: published as a conference paper in FPL 2017
Subjects:
Hardware Architecture (cs.AR)
; Machine Learning (cs.LG)
Fine-grained runtime power management techniques could be promising solutions
for power reduction. Therefore, it is essential to establish accurate power
monitoring schemes to obtain dynamic power variation in a short period (i.e.,
tens or hundreds of clock cycles). In this paper, we leverage a
decision-tree-based power modeling approach to establish fine-grained hardware
power monitoring on FPGA platforms. A generic and complete design flow is
developed to implement the decision tree power model which is capable of
precisely estimating dynamic power in a fine-grained manner. A flexible
architecture of the hardware power monitoring is proposed, which can be
instrumented in any RTL design for runtime power estimation, dispensing with
the need for extra power measurement devices. Experimental results of applying
the proposed model to benchmarks with different resource types reveal an
average error of up to 4% for dynamic power estimation. Moreover, the overheads of
area, power and performance incurred by the power monitoring circuitry are
extremely low. Finally, we apply our power monitoring technique to the power
management using phase shedding with an on-chip multi-phase regulator as a
proof of concept and the results demonstrate 14% efficiency enhancement for the
power supply of the FPGA internal logic.
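The modeling step itself is ordinary supervised regression from per-window signal switching activities to measured dynamic power; a sketch with synthetic data (the feature choices, window count, and tree depth are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Features: per-window toggle rates of monitored signals; target: measured
    # dynamic power. On the FPGA the trained tree is then mapped to simple
    # comparator/LUT logic for runtime estimation.
    rng = np.random.default_rng(0)
    n_windows, n_signals = 2000, 24
    toggle_rates = rng.uniform(0, 1, size=(n_windows, n_signals))
    true_weights = rng.uniform(0.1, 1.0, n_signals)
    power = toggle_rates @ true_weights + 0.05 * rng.normal(size=n_windows)

    model = DecisionTreeRegressor(max_depth=6)   # shallow tree -> cheap hardware
    model.fit(toggle_rates[:1500], power[:1500])
    err = np.abs(model.predict(toggle_rates[1500:]) - power[1500:]).mean()
    print(f"mean absolute error: {err:.3f}")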
An Ensemble Learning Approach for In-situ Monitoring of FPGA Dynamic Power
Comments: published as a journal (TCAD) paper in 2018
Subjects:
Hardware Architecture (cs.AR)
; Machine Learning (cs.LG)
As field-programmable gate arrays become prevalent in critical application
domains, their power consumption is of high concern. In this paper, we present
and evaluate a power monitoring scheme capable of accurately estimating the
runtime dynamic power of FPGAs in a fine-grained timescale, in order to support
emerging power management techniques. In particular, we describe a novel and
specialized ensemble model which can be decomposed into multiple customized
decision-tree-based base learners. To aid in model synthesis, a generic
computer-aided design flow is proposed to generate samples, select features,
tune hyperparameters and train the ensemble estimator. Besides this, a hardware
realization of the trained ensemble estimator is presented for on-chip
real-time power estimation. In the experiments, we first show that a single
decision tree model can achieve prediction error within 4.51% of a commercial
gate-level power estimation tool, which is 2.41–6.07x lower than provided by
the commonly used linear model. More importantly, we study the extra gains in
inference accuracy using the proposed ensemble model. Experimental results
reveal that the ensemble monitoring method can further improve the accuracy of
power predictions to within a maximum error of 1.90%. Moreover, the lookup
table (LUT) overhead of the ensemble monitoring hardware employing up to 64
base learners is within 1.22% of the target FPGA, indicating its light-weight
and scalable characteristics.
Comments: 28 pages, 9 figures
Subjects:
Computational Engineering, Finance, and Science (cs.CE)
; Machine Learning (cs.LG)
A common workflow for many engineering design problems requires the
evaluation of the design system to be investigated under a range of conditions.
These conditions usually involve a combination of several parameters. To
perform a complete evaluation of a single candidate configuration, it may be
necessary to perform hundreds to thousands of simulations. This can be
computationally very expensive, particularly if several configurations need to
be evaluated, as in the case of the mathematical optimization of a design
problem. Although the simulations are extremely complex, generally, there is a
high degree of redundancy in them, as many of the cases vary only slightly from
one another. This redundancy can be exploited by omitting some simulations that
are uninformative, thereby reducing the number of simulations required to
obtain a reasonable approximation of the complete system. The decision of which
simulations are useful is made through the use of machine learning techniques,
which allow us to estimate the results of “yet-to-be-performed” simulations
from the ones that are already performed. In this study, we present the results
of one such technique, namely active learning, to provide an approximate result
of an entire offshore riser design simulation portfolio from a subset that is
80% smaller than the original one. These results are expected to facilitate a
significant speed-up in the offshore riser design.
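A sketch of the active-learning loop with uncertainty sampling (the surrogate model and acquisition rule are our assumptions; the study's actual setup may differ):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def active_learning(simulate, candidates, init=10, budget=200):
        """Run only the most informative simulations from a portfolio.

        simulate: expensive function (one riser-design case -> response);
        candidates: array of parameter combinations in the full portfolio.
        A Gaussian-process surrogate predicts the remaining cases."""
        rng = np.random.default_rng(0)
        idx = list(rng.choice(len(candidates), init, replace=False))
        X = candidates[idx]
        y = np.array([simulate(x) for x in X])
        gp = GaussianProcessRegressor().fit(X, y)
        while len(idx) < budget:
            mu, sd = gp.predict(candidates, return_std=True)
            sd[idx] = -np.inf                   # never re-run finished cases
            j = int(np.argmax(sd))              # highest predictive uncertainty
            idx.append(j)
            X = np.vstack([X, candidates[j]])
            y = np.append(y, simulate(candidates[j]))
            gp.fit(X, y)
        return gp, idx   # the surrogate then approximates the other ~80%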
Learning from Protein Structure with Geometric Vector Perceptrons
Bowen Jing , Stephan Eismann , Patricia Suriana , Raphael J.L. Townshend , Ron Dror Subjects : Biomolecules (q-bio.BM) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning on 3D structures of large biomolecules is emerging as a distinct
area in machine learning, but there has yet to emerge a unifying network
architecture that simultaneously leverages the graph-structured and geometric
aspects of the problem domain. To address this gap, we introduce geometric
vector perceptrons, which extend standard dense layers to operate on
collections of Euclidean vectors. Graph neural networks equipped with such
layers are able to perform both geometric and relational reasoning on efficient
and natural representations of macromolecular structure. We demonstrate our
approach on two important problems in learning from protein structure: model
quality assessment and computational protein design. Our approach improves over
existing classes of architectures, including state-of-the-art graph-based and
voxel-based methods.
P6: A Declarative Language for Integrating Machine Learning in Visual Analytics
Comments: Accepted for presentation at IEEE VIS 2020
Subjects:
Software Engineering (cs.SE)
; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Programming Languages (cs.PL)
We present P6, a declarative language for building high performance visual
analytics systems through its support for specifying and integrating machine
learning and interactive visualization methods. As data analysis methods based
on machine learning and artificial intelligence continue to advance, a visual
analytics solution can leverage these methods for better exploiting large and
complex data. However, integrating machine learning methods with interactive
visual analysis is challenging. Existing declarative programming libraries and
toolkits for visualization lack support for coupling machine learning methods.
By providing a declarative language for visual analytics, P6 can empower more
developers to create visual analytics applications that combine machine
learning and visualization methods for data analysis and problem solving.
Through a variety of example applications, we demonstrate P6’s capabilities and
show the benefits of using declarative specifications to build visual analytics
systems. We also identify and discuss the research opportunities and challenges
for declarative visual analytics.
Real Image Super Resolution Via Heterogeneous Model using GP-NAS
Comments: This is a manuscript related to our algorithm that won the ECCV AIM 2020 Real Image Super-Resolution Challenge
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With advances in deep neural networks (DNNs), recent state-of-the-art (SOTA)
image super-resolution (SR) methods have achieved impressive performance using
deep residual networks with dense skip connections. While these models perform
well on benchmark datasets, where low-resolution (LR) images are constructed from
high-resolution (HR) references with a known blur kernel, real image SR is more
challenging when both images in the LR-HR pair are collected from real cameras.
Based on existing dense residual networks, a Gaussian process based neural
architecture search (GP-NAS) scheme is utilized to find candidate network
architectures using a large search space by varying the number of dense
residual blocks, the block size, and the number of features. A suite of
heterogeneous models with diverse network structures and hyperparameters is
selected for model ensembling to achieve outstanding performance in real image
SR. The proposed method won the first place in all three tracks of the AIM 2020
Real Image Super-Resolution Challenge.
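The sketch below conveys the flavour of GP-based architecture search: encode sampled (number of blocks, block size, number of features) triples, fit a Gaussian-process surrogate to observed validation scores, and keep the best-predicted candidates for the ensemble. The search space and scores are random placeholders, not the paper's actual GP-NAS procedure.

```python
# Illustrative stand-in for GP-based neural architecture search over a
# dense-residual search space, followed by top-k selection for an ensemble.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def encode(arch):
    return np.array(arch, dtype=float)   # (num_blocks, block_size, num_feats)

space = [(b, s, f) for b in (4, 8, 12, 16)
         for s in (2, 3, 4) for f in (32, 48, 64)]

# Pretend a few candidates were trained; scores are random placeholders.
tried = rng.choice(len(space), 8, replace=False)
X = np.stack([encode(space[i]) for i in tried])
y = rng.normal(size=len(tried))          # stand-in validation PSNR

gp = GaussianProcessRegressor().fit(X, y)
pred = gp.predict(np.stack([encode(a) for a in space]))
ensemble = [space[i] for i in np.argsort(pred)[-5:]]   # top-5 candidates
print(ensemble)
```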
Robust Object Classification Approach using Spherical Harmonics
Ayman Mukhaimar , Ruwan Tennakoon , Chow Yin Lai , Reza Hoseinnezhad , Alireza Bab-Hadiashar Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
In this paper, we present a robust spherical harmonics approach for the
classification of point cloud-based objects. Spherical harmonics have been used
for classification over the years, with several frameworks existing in the
literature. These approaches use a variety of spherical-harmonics-based
descriptors to classify objects. We first investigate the robustness of these
frameworks against data perturbations such as outliers and noise, which has not
been studied before. We then propose a spherical convolutional neural network
framework for robust object classification. The proposed framework uses a
voxel grid of concentric spheres to learn features over the unit ball. Our
model learns features that are less sensitive to such perturbations, owing
to the selected sampling strategy and the designed convolution operation. We
tested the model against several types of perturbation, such as noise and
outliers, and our results show that it outperforms state-of-the-art networks
in terms of robustness.
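A generic rotation-robust spherical-harmonics descriptor for point clouds (not the paper's exact network input) can be sketched as follows: project points onto the unit sphere and pool the harmonic energy per degree.

```python
# Spherical-harmonics point-cloud descriptor: per-degree |Y_lm| energy is
# invariant to rotations of the cloud (illustrative descriptor only).
import numpy as np
from scipy.special import sph_harm

def sh_descriptor(points, l_max=4):
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    theta = np.arctan2(p[:, 1], p[:, 0]) % (2 * np.pi)   # azimuth
    phi = np.arccos(np.clip(p[:, 2], -1, 1))             # polar angle
    feats = []
    for l in range(l_max + 1):
        coeffs = [np.mean(sph_harm(m, l, theta, phi)) for m in range(-l, l + 1)]
        feats.append(np.sum(np.abs(coeffs) ** 2))        # per-degree energy
    return np.array(feats)

cloud = np.random.default_rng(0).normal(size=(500, 3))
print(sh_descriptor(cloud))
```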
Cost-aware Feature Selection for IoT Device Classification
Comments: 32 Pages, 8 figures
Subjects:
Networking and Internet Architecture (cs.NI)
; Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Classification of IoT devices into different types is of paramount
importance, from multiple perspectives, including security and privacy aspects.
Recent works have explored machine learning techniques for fingerprinting (or
classifying) IoT devices, with promising results. However, existing works have
assumed that the features used for building the machine learning models are
readily available or can be easily extracted from the network traffic; in other
words, they do not consider the costs associated with feature extraction. In
this work, we take a more realistic approach, and argue that feature extraction
has a cost, and the costs are different for different features. We also take a
step forward from the current practice of considering the misclassification
loss as a binary value, and make a case for different losses based on the
misclassification performance. Thereby, and more importantly, we introduce the
notion of risk for IoT device classification. We define and formulate the
problem of cost-aware IoT device classification. This being a combinatorial
optimization problem, we develop a novel algorithm to solve it in a fast and
effective way using the Cross-Entropy (CE) based stochastic optimization
technique. Using traffic of real devices, we demonstrate the capability of the
CE based algorithm in selecting features with minimal risk of misclassification
while keeping the cost for feature extraction within a specified limit.
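A minimal sketch of the Cross-Entropy method for cost-constrained feature subset selection, with a placeholder risk function and random per-feature costs standing in for the paper's definitions:

```python
# Cross-Entropy (CE) stochastic optimization: sample feature subsets from
# independent Bernoullis, keep the elite subsets, update the probabilities.
import numpy as np

rng = np.random.default_rng(0)
n_feats = 20
cost = rng.uniform(1, 5, n_feats)          # per-feature extraction cost
budget = 25.0
value = rng.uniform(0, 1, n_feats)         # stand-in usefulness of features

def risk(mask):
    # Placeholder misclassification risk: lower when useful features are kept.
    return 1.0 - value @ mask / value.sum()

p = np.full(n_feats, 0.5)                  # Bernoulli sampling probabilities
for _ in range(50):
    samples = (rng.random((200, n_feats)) < p).astype(float)
    feasible = samples @ cost <= budget    # enforce the cost budget
    scores = np.array([risk(m) for m in samples])
    scores[~feasible] = np.inf
    elite = samples[np.argsort(scores)[:20]]    # best 10% of the samples
    p = 0.9 * p + 0.1 * elite.mean(axis=0)      # smoothed CE update

selected = p > 0.5
print(selected.nonzero()[0], cost[selected].sum())
```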
Non-parametric generalized linear model
Matthew Dowling , Yuan Zhao , Il Memming Park Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG)
A fundamental problem in statistical neuroscience is to model how neurons
encode information by analyzing electrophysiological recordings. A popular and
widely-used approach is to fit the spike trains with an autoregressive point
process model. These models are characterized by a set of convolutional
temporal filters, whose subsequent analysis can help reveal how neurons encode
stimuli, interact with each other, and process information. In practice a
sufficiently rich but small ensemble of temporal basis functions needs to be
chosen to parameterize the filters. However, obtaining a satisfactory fit often
requires burdensome model selection and fine-tuning of the form of the basis
functions and their temporal span. In this paper we propose a nonparametric
approach for jointly inferring the filters and hyperparameters using the
Gaussian process framework. Our method is computationally efficient, taking
advantage of the sparse variational approximation, while being flexible and rich
enough to characterize arbitrary filters over continuous time lags. Moreover, our
method automatically learns the temporal span of the filter. For the particular
application in neuroscience, we designed priors for stimulus and history
filters useful for the spike trains. We compare and validate our method on
simulated and real neural spike train data.
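For context, the baseline this method builds on, an autoregressive point-process GLM with a fixed temporal basis, can be sketched as follows on simulated data; the paper's contribution is to replace the hand-chosen basis with GP-inferred filters.

```python
# Fit a Poisson GLM with a lagged stimulus design matrix, the standard
# fixed-basis baseline for spike-train filter estimation (simulated data).
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
T, lags = 2000, 20
stim = rng.normal(size=T)
true_filt = np.exp(-np.arange(lags) / 5.0)     # ground-truth temporal filter

# Each row of X holds the recent stimulus history at that time step.
X = np.stack([np.roll(stim, k) for k in range(lags)], axis=1)
X[:lags] = 0.0
rate = np.exp(0.2 * (X @ true_filt) - 1.0)     # conditional intensity
spikes = rng.poisson(rate)

glm = PoissonRegressor(alpha=1e-3).fit(X, spikes)
est_filt = glm.coef_                           # recovered temporal filter
```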
Bid Shading in The Brave New World of First-Price Auctions
Comments: In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM’20), October 19-23, 2020, Virtual Event, Ireland
Subjects:
Computer Science and Game Theory (cs.GT)
; Machine Learning (cs.LG); Machine Learning (stat.ML)
Online auctions play a central role in online advertising, and are one of the
main reasons for the industry’s scalability and growth. With major changes in
how auctions are organized, such as the shift from second- to first-price
auctions, advertisers and demand platforms are compelled to adapt to a new
volatile environment. Bid shading is a known technique for preventing
overpaying in auction systems that can help maintain the strategy equilibrium
in first-price auctions, tackling one of its greatest drawbacks. In this study,
we propose a machine learning approach of modeling optimal bid shading for
non-censored online first-price ad auctions. We clearly motivate the approach
and extensively evaluate it in both offline and online settings on a major
demand side platform. The results demonstrate the superiority and robustness of
the new approach as compared to the existing approaches across a range of
performance metrics.
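The core computation behind bid shading can be sketched as choosing the bid that maximizes expected surplus under a win-rate model; the logistic curve below is a placeholder for a model fit on non-censored auction logs.

```python
# Bid shading for a first-price auction: pick the bid maximizing
# (value - bid) * P(win | bid); the winner pays their own bid, so bidding
# the full value would yield zero surplus.
import numpy as np

def win_prob(bid, a=8.0, b=4.0):
    # Placeholder win-rate model; in practice fit from auction outcomes.
    return 1.0 / (1.0 + np.exp(-(a * bid - b)))

value = 1.0                                  # advertiser's impression value
bids = np.linspace(0.0, value, 1001)
surplus = (value - bids) * win_prob(bids)
optimal_bid = bids[np.argmax(surplus)]       # shaded below the true value
print(optimal_bid)
```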
Learning to summarize from human feedback
Nisan Stiennon , Long Ouyang , Jeff Wu , Daniel M. Ziegler , Ryan Lowe , Chelsea Voss , Alec Radford , Dario Amodei , Paul Christiano Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As language models become more powerful, training and evaluation are
increasingly bottlenecked by the data and metrics used for a particular task.
For example, summarization models are often trained to predict human reference
summaries and evaluated using ROUGE, but both of these metrics are rough
proxies for what we really care about—summary quality. In this work, we show
that it is possible to significantly improve summary quality by training a
model to optimize for human preferences. We collect a large, high-quality
dataset of human comparisons between summaries, train a model to predict the
human-preferred summary, and use that model as a reward function to fine-tune a
summarization policy using reinforcement learning. We apply our method to a
version of the TL;DR dataset of Reddit posts and find that our models
significantly outperform both human reference summaries and much larger models
fine-tuned with supervised learning alone. Our models also transfer to CNN/DM
news articles, producing summaries nearly as good as the human reference
without any news-specific fine-tuning. We conduct extensive analyses to
understand our human feedback dataset and fine-tuned models. We establish that
our reward model generalizes to new datasets, and that optimizing our reward
model results in better summaries than optimizing ROUGE according to humans. We
hope the evidence from our paper motivates machine learning researchers to pay
closer attention to how their training loss affects the model behavior they
actually want.
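A minimal sketch of the reward-modelling step on synthetic data: fit a scorer so that the human-preferred summary in each comparison receives the higher reward, via the standard pairwise logistic (Bradley-Terry) loss. The features stand in for a learned representation of (post, summary).

```python
# Train a linear reward model from pairwise human preferences; the fitted
# reward can then drive RL fine-tuning of the summarization policy.
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 16, 500
w_true = rng.normal(size=d)
A, B = rng.normal(size=(n_pairs, d)), rng.normal(size=(n_pairs, d))
pref_a = (A @ w_true > B @ w_true).astype(float)   # simulated human labels

w = np.zeros(d)
for _ in range(500):                               # gradient ascent on log-lik
    p = 1.0 / (1.0 + np.exp(-(A - B) @ w))         # P(A preferred over B)
    w += 0.1 * (A - B).T @ (pref_a - p) / n_pairs

# r(x) = w @ features(x) now serves as a reward signal for fine-tuning.
print(np.mean((( A - B) @ w > 0) == pref_a))       # training accuracy
```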
Towards Earnings Call and Stock Price Movement
Comments: Accepted by KDD 2020 MLF workshop
Subjects:
Statistical Finance (q-fin.ST)
; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Earnings calls are hosted by management of public companies to discuss the
company’s financial performance with analysts and investors. Information
disclosed during an earnings call is an essential source of data for analysts
and investors to make investment decisions. Thus, we leverage earnings call
transcripts to predict future stock price dynamics. We propose to model the
language in transcripts using a deep learning framework, where an attention
mechanism is applied to encode the text data into vectors for the
discriminative network classifier to predict stock price movements. Our
empirical experiments show that the proposed model is superior to the
traditional machine learning baselines and earnings call information can boost
the stock price prediction performance.
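A minimal sketch of attention-based encoding for classification, assuming token vectors from some upstream encoder; the dimensions and the linear classifier are illustrative, not the paper's architecture.

```python
# Attention pooling over a transcript: score each token vector, softmax the
# scores into weights, and classify the weighted average.
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(120, 32))      # encoded transcript, 120 tokens
q = rng.normal(size=32)                  # learned attention query (assumed)
W = rng.normal(size=(2, 32))             # up/down movement classifier

scores = tokens @ q
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax attention weights
doc_vec = weights @ tokens               # attention-pooled document vector
logits = W @ doc_vec
print(logits.argmax())                   # predicted price-movement class
```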
Convolutional Speech Recognition with Pitch and Voice Quality Features
Comments: 5 pages
Subjects:
Audio and Speech Processing (eess.AS)
; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
The effects of adding pitch and voice quality features such as jitter and
shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are
studied in this work. Pitch features have been previously used for improving
classical HMM and DNN baselines, while jitter and shimmer parameters have
proven to be useful for tasks like speaker or emotion recognition. To our
knowledge, this is the first work combining such pitch and voice quality
features with modern convolutional architectures, showing improvements of up to
2% absolute WER on the publicly available Spanish Common Voice dataset.
Particularly, our work combines these features with mel-frequency spectral
coefficients (MFSCs) to train a convolutional architecture with Gated Linear
Units (Conv GLUs). Such models have been shown to yield small word error rates
while being well suited to parallel processing for online streaming
recognition use cases. We have added pitch and voice quality functionality to
Facebook’s wav2letter speech recognition framework, and we provide the code
and recipes to the community to support further experiments.
Besides, to the best of our knowledge, our Spanish Common Voice recipe is the
first public Spanish recipe for wav2letter.
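A sketch of assembling such an input representation, combining mel spectral features with frame-level pitch and a jitter-like measure; librosa is assumed, the bundled example audio is a stand-in for speech, and this frame-level jitter estimate simplifies the cycle-to-cycle definition used in speech analysis.

```python
# Stack mel-frequency spectral features with pitch and a jitter-like
# measure into one per-frame feature matrix for a conv acoustic model.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))        # stand-in for speech audio
mfsc = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                 frame_length=2048, hop_length=512)

period = 1.0 / np.maximum(f0, 1e-6)
jitter = np.abs(np.diff(period)) / period[:-1]     # relative period change
jitter = np.concatenate([jitter, jitter[-1:]])     # pad to frame count

n = min(mfsc.shape[1], len(f0))
features = np.vstack([mfsc[:, :n], f0[None, :n], jitter[None, :n]])
print(features.shape)                              # (42, frames) model input
```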
Micro-entries: Encouraging Deeper Evaluation of Mental Models Over Time for Interactive Data Systems
Comments: 10 pages, submitted to BELIV 2020 Workshop
Subjects:
Human-Computer Interaction (cs.HC)
; Machine Learning (cs.LG)
Many interactive data systems combine visual representations of data with
embedded algorithmic support for automation and data exploration. To
effectively support transparent and explainable data systems, it is important
for researchers and designers to know how users understand the system. We
discuss the evaluation of users’ mental models of system logic. Mental models
are challenging to capture and analyze. While common evaluation methods aim to
approximate the user’s final mental model after a period of system usage, user
understanding continuously evolves as users interact with a system over time.
In this paper, we review many common mental model measurement techniques,
discuss tradeoffs, and recommend methods for deeper, more meaningful evaluation
of mental models when using interactive data analysis and visualization
systems. We present guidelines for evaluating mental models over time that
reveal the evolution of specific model updates and how they may map to the
particular use of interface features and data queries. By asking users to
describe what they know and how they know it, researchers can collect
structured, time-ordered insight into a user’s conceptualization process while
also helping guide users to their own discoveries.
Clustering of Nonnegative Data and an Application to Matrix Completion
C. Strohmeier , D. Needell Subjects : Machine Learning (stat.ML) ; Machine Learning (cs.LG); Signal Processing (eess.SP)
In this paper, we propose a simple algorithm to cluster nonnegative data
lying in disjoint subspaces. We analyze its performance in relation to a
certain measure of correlation between said subspaces. We use our clustering
algorithm to develop a matrix completion algorithm which can outperform
standard matrix completion algorithms on data matrices satisfying certain
natural conditions.
Efficiency in Real-time Webcam Gaze Tracking
Comments: Awarded Best Paper at European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Efficiency and ease of use are essential for practical applications of
camera-based eye/gaze-tracking. Gaze tracking involves estimating where a person is
looking on a screen based on face images from a computer-facing camera. In this
paper we investigate two complementary forms of efficiency in gaze tracking: 1.
The computational efficiency of the system which is dominated by the inference
speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is
determined by the tediousness of the mandatory calibration of the gaze-vector
to a computer screen. To do so, we evaluate the computational speed/accuracy
trade-off for the CNN and the calibration effort/accuracy trade-off for screen
calibration. For the CNN, we evaluate the full face, two-eyes, and single eye
input. For screen calibration, we measure the number of calibration points
needed and evaluate three types of calibration: 1. pure geometry, 2. pure
machine learning, and 3. hybrid geometric regression. Results suggest that a
single eye input and geometric regression calibration achieve the best
trade-off.
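The machine-learning flavour of screen calibration can be sketched as a small regression from predicted gaze vectors to screen coordinates, fit on a handful of calibration targets; the gaze vectors below are synthetic placeholders.

```python
# Ridge-regression screen calibration: map gaze vectors to 2D screen points
# using a 9-point calibration grid (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
targets = np.array([[x, y] for x in (0.1, 0.5, 0.9)
                    for y in (0.1, 0.5, 0.9)])     # 9-point calibration grid
gaze = targets @ rng.normal(size=(2, 3)) + 0.02 * rng.normal(size=(9, 3))

calib = Ridge(alpha=1e-3).fit(gaze, targets)       # gaze vector -> screen xy
screen_pred = calib.predict(gaze)
print(np.abs(screen_pred - targets).mean())        # mean calibration error
```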
Quantum Discriminator for Binary Classification
Prasanna Date Subjects : Quantum Physics (quant-ph) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Quantum computers operate in high-dimensional tensor product spaces and
are known to outperform classical computers on many problems. They are poised
to accelerate machine learning tasks in the future. In this work, we operate in
the quantum machine learning (QML) regime where a QML model is trained using a
quantum-classical hybrid algorithm and inferencing is performed using a quantum
algorithm. We leverage the traditional two-step machine learning workflow,
where features are extracted from the data in the first step and a
discriminator acting on the extracted features is used to classify the data in
the second step. Assuming that the binary features have been extracted from the
data, we propose a quantum discriminator for binary classification. The quantum
discriminator takes as input the binary features of a data point and a
prediction qubit in the zero state, and outputs the correct class of the data
point. The quantum discriminator is defined by a parameterized unitary matrix
\(U_\Theta\) containing \(\mathcal{O}(N)\) parameters, where \(N\) is the number of
data points in the training data set. Furthermore, we show that the quantum
discriminator can be trained in \(\mathcal{O}(N \log N)\) time using
\(\mathcal{O}(N \log N)\) classical bits and \(\mathcal{O}(\log N)\) qubits. We
also show that inference for the quantum discriminator can be done in
\(\mathcal{O}(N)\) time using \(\mathcal{O}(\log N)\) qubits. Finally, we use the
quantum discriminator to classify the XOR problem on the IBM Q universal
quantum computer with 100% accuracy.
Detecting Parkinson's Disease from Speech-task in an accessible and interpretable manner
Wasifur Rahman , Sangwu Lee , Md. Saiful Islam , Abdullah Al Mamun , Victor Antony , Harshil Ratnu , Mohammad Rafayet Ali , Ehsan Hoque Subjects : Audio and Speech Processing (eess.AS) ; Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Every nine minutes a person is diagnosed with Parkinson’s Disease (PD) in the
United States. However, studies have shown that between 25% and 80% of
individuals with Parkinson’s Disease (PD) remain undiagnosed. An online,
in-the-wild audio recording application has the potential to help screen for the
disease if risk can be accurately assessed. In this paper, we collect data from
726 unique subjects (262 PD and 464 Non-PD) uttering the “quick brown fox jumps
over the lazy dog ….” to conduct automated PD assessment. We extracted both
standard acoustic features and deep learning based embedding features from the
speech data and trained several machine learning algorithms on them. Our models
achieved 0.75 AUC by modeling the standard acoustic features through the
XGBoost model. We also provide an explanation of our model’s decisions and show
that it focuses mostly on the widely used MFCC features and on a subset of
dysphonia features previously used for detecting PD from verbal phonation tasks.
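A sketch of the modelling step, gradient-boosted trees on tabular acoustic features evaluated by AUC, with random placeholders for the extracted MFCC and dysphonia features:

```python
# Gradient-boosted trees on acoustic features with AUC evaluation
# (features and labels are random placeholders, illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(726, 60))             # e.g. MFCC + dysphonia features
y = (X[:, :5].sum(axis=1) + rng.normal(size=726) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)
print(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```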
Ultra Lightweight Image Super-Resolution with Multi-Attention Layers
Comments: ECCVW AIM2020
Subjects:
Image and Video Processing (eess.IV)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Lightweight image super-resolution (SR) networks have the utmost significance
for real-world applications. There are several deep learning based SR methods
with remarkable performance, but their memory and computational cost are
hindrances in practical usage. To tackle this problem, we propose a
Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN
consists of proposed feature fusion groups (FFGs) that serve as a feature
extraction block. Each FFG contains a stack of proposed multi-attention blocks
(MAB) that are combined in a novel feature fusion structure. Further, the MAB
with a cost-efficient attention mechanism (CEA) helps us to refine and extract
the features using multiple attention mechanisms. Comprehensive experiments
show the superiority of our model over the existing state of the art. We
participated in AIM 2020 efficient SR challenge with our MAFFSRN model and won
1st, 3rd, and 4th places in memory usage, floating-point operations (FLOPs) and
number of parameters, respectively.
Information Theory
Private Weighted Random Walk Stochastic Gradient Descent
Ghadir Ayache , Salim El Rouayheb Subjects : Information Theory (cs.IT) ; Machine Learning (cs.LG)
We consider a decentralized learning setting in which data is distributed
over nodes in a graph. The goal is to learn a global model on the distributed
data without involving any central entity that needs to be trusted. While
gossip-based stochastic gradient descent (SGD) can be used to achieve this
learning objective, it incurs high communication and computation costs, since
it has to wait for all the local models at all the nodes to converge. To speed
up the convergence, we propose instead to study random walk based SGD in which
a global model is updated based on a random walk on the graph. We propose two
algorithms based on two types of random walks that achieve, in a decentralized
way, uniform sampling and importance sampling of the data. We provide a
non-asymptotic analysis on the rate of convergence, taking into account the
constants related to the data and the graph. Our numerical results show that
the weighted random walk based algorithm has a better performance for
high-variance data. Moreover, we propose a privacy-preserving random walk
algorithm that achieves local differential privacy based on a Gamma noise
mechanism that we propose. We also give numerical results on the convergence of
this algorithm and show that it outperforms additive Laplace-based privacy
mechanisms.
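A minimal sketch of random-walk SGD on a ring graph: a walker hops between neighbouring nodes and the global model is updated with the local gradient at each visit. The uniform next-hop choice is a stand-in for the paper's weighted designs that achieve uniform or importance sampling of the data.

```python
# Random-walk SGD: one global model, updated with the local least-squares
# gradient at each node the walker visits (decentralized, no coordinator).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 10, 5
adj = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
X = rng.normal(size=(n_nodes, 20, d))              # local data per node
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=(n_nodes, 20))

w, node, lr = np.zeros(d), 0, 0.01
for step in range(2000):
    Xi, yi = X[node], y[node]
    grad = 2 * Xi.T @ (Xi @ w - yi) / len(yi)      # local gradient
    w -= lr * grad
    node = int(rng.choice(adj[node]))              # hop to a random neighbour
```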
Optimal Streaming of 360 VR Videos with Perfect, Imperfect and Unknown FoV Viewing Probabilities
Comments: 6 pages, 5 figures, to appear in GLOBECOM 2020
Subjects:
Information Theory (cs.IT)
In this paper, we investigate wireless streaming of multi-quality tiled 360
virtual reality (VR) videos from a multi-antenna server to multiple
single-antenna users in a multi-carrier system. To capture the impact of
field-of-view (FoV) prediction, we consider three cases of FoV viewing
probability distributions, i.e., perfect, imperfect and unknown FoV viewing
probability distributions, and use the average total utility, worst average
total utility and worst total utility as the respective performance metrics. We
adopt rate splitting with successive decoding for efficient transmission of
multiple sets of tiles of different 360 VR videos to their requesting users. In
each case, we optimize the encoding rates of the tiles, minimum encoding rates
of the FoVs, rates of the common and private messages and transmission
beamforming vectors to maximize the total utility. The problems in the three
cases are all challenging nonconvex optimization problems. We successfully
transform the problem in each case into a difference of convex (DC) programming
problem with a differentiable objective function, and obtain a suboptimal
solution using the concave-convex procedure (CCCP). Finally, numerical results
demonstrate that the proposed solutions achieve notable gains over existing schemes
in all three cases. To the best of our knowledge, this is the first work
revealing the impact of FoV prediction and its accuracy on the performance of
streaming of multi-quality tiled 360 VR videos.
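For reference, the generic CCCP step for a difference-of-convex objective (standard material, not specific to the paper's utility functions) is:

```latex
% CCCP for a DC program: minimize f(x) = g(x) - h(x) with g, h convex.
% Linearize the concave part -h around the current iterate:
\[
  x^{(k+1)} \in \operatorname*{arg\,min}_{x} \;
  g(x) - \nabla h\big(x^{(k)}\big)^{\top} x .
\]
% Each subproblem is convex and f(x^{(k)}) is non-increasing, so the
% iterates converge to a stationary point of the DC objective.
```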
A Design Framework for Epsilon-Private Data Disclosure
Comments: 16 pages, 2 figures
Subjects:
Information Theory (cs.IT)
In this paper, we study a stochastic disclosure control problem using
information-theoretic methods. The useful data to be disclosed depend on
private data that should be protected. Thus, we design a privacy mechanism to
produce new data which maximizes the disclosed information about the useful
data under a strong \(\chi^2\)-privacy criterion. For sufficiently small leakage,
the privacy mechanism design problem can be geometrically studied in the space
of probability distributions by a local approximation of the mutual
information. By using methods from Euclidean information geometry, the original
highly challenging optimization problem can be reduced to a problem of finding
the principal right-singular vector of a matrix, which characterizes the
optimal privacy mechanism. In two extensions we first consider a noisy
disclosure channel and then we look for a mechanism which finds \(U\) based on
observing \(X\), maximizing the mutual information between \(U\) and \(Y\) while
satisfying the privacy criterion on \(U\) and \(Z\) under the Markov chain
\((Z,Y)-X-U\).
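The final computational step that the abstract reduces the design to can be sketched as follows; the construction of the matrix from the joint distribution is omitted here, and the matrix below is only a placeholder.

```python
# Given the matrix B that encodes the local (Euclidean) geometry of the
# problem, the optimal mechanism direction is B's principal right-singular
# vector (B itself is a placeholder; its construction is problem-specific).
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 4))               # stand-in for the derived matrix
_, _, Vt = np.linalg.svd(B)
principal_direction = Vt[0]               # characterizes the mechanism
print(principal_direction)
```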
Comments: 14 pages, 5 figures, major revision, IEEE Transations on Multimedia. arXiv admin note: substantial text overlap with arXiv:2001.01906
Subjects:
Information Theory (cs.IT)
In this paper, we investigate optimal wireless streaming of a
multi-quality tiled 360 virtual reality (VR) video from a server to multiple
users. To this end, we propose to maximally exploit potential multicast
opportunities by effectively utilizing characteristics of multi-quality tiled
360 VR videos and computation resources at the users’ side. In particular, we
consider two requirements for quality variation in one field-of-view (FoV),
i.e., the absolute smoothness requirement and the relative smoothness
requirement, and two video playback modes, i.e., the direct-playback mode
(without user transcoding) and transcode-playback mode (with user transcoding).
Besides natural multicast opportunities, we introduce two new types of
multicast opportunities, namely, relative smoothness-enabled multicast
opportunities, which allow a flexible tradeoff between viewing quality and
communication resource consumption, and transcoding-enabled multicast
opportunities, which allow a flexible tradeoff between computation and
communication resource consumption. Then, we establish a novel mathematical
model that reflects the impacts of natural, relative smoothness-enabled and
transcoding-enabled multicast opportunities on the average transmission energy
and transcoding energy. Based on this model, we optimize the transmission
resource allocation, playback quality level selection and transmission quality
level selection to minimize the energy consumption in the four cases with
different requirements for quality variation and video playback modes. By
comparing the optimal values in the four cases, we prove that the energy
consumption decreases when more multicast opportunities can be utilized. Finally,
numerical results show substantial gains of the proposed solutions over
existing schemes, and demonstrate the importance of effective exploitation of
the three types of multicast opportunities.
On the Size of the Giant Component in Inhomogeneous Random K-out Graphs
Comments: To appear in 9th IEEE Conference on Decision and Control. arXiv admin note: substantial text overlap with arXiv:1911.05147
Subjects:
Information Theory (cs.IT)
; Probability (math.PR)
Inhomogeneous random K-out graphs were recently introduced to model
heterogeneous sensor networks secured by random pairwise key predistribution
schemes. First, each of the \(n\) nodes is classified as type-1 (respectively,
type-2) with probability \(0<\mu<1\) (respectively, \(1-\mu\)) independently from
each other. Next, each type-1 (respectively, type-2) node draws 1 arc towards a
node (respectively, \(K_n\) arcs towards \(K_n\) distinct nodes) selected uniformly
at random, and then the orientation of the arcs is ignored. It was recently
established that this graph, denoted by \(\mathbb{H}(n;\mu,K_n)\), is connected
with high probability (whp) if and only if \(K_n=\omega(1)\). In other words, if
\(K_n=O(1)\), then \(\mathbb{H}(n;\mu,K_n)\) has a positive probability of being
not connected as \(n\) gets large. Here, we study the size of the largest
connected subgraph of \(\mathbb{H}(n;\mu,K_n)\) when \(K_n=O(1)\). We show that
the trivial condition of \(K_n \geq 2\) for all \(n\) is sufficient to ensure that
the inhomogeneous K-out graph has a connected component of size \(n-O(1)\) whp. Put
differently, even with \(K_n=2\), all but finitely many nodes will form a
connected sub-network in this model under any \(0<\mu<1\). We present an upper
bound on the probability that more than \(M\) nodes are outside of the largest
component, and show that this decays as \(O(1)\exp\{-M(1-\mu)(K_n-1)\} + o(1)\).
Numerical results are presented to demonstrate the size of the largest
connected component when the number of nodes is finite.
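The construction described above is easy to simulate; the sketch below samples an inhomogeneous K-out graph and measures its largest connected component with a union-find, using illustrative values of \(n\), \(\mu\), and \(K\).

```python
# Sample the inhomogeneous random K-out graph: type-1 nodes draw 1 arc,
# type-2 nodes draw K distinct arcs; orientations are then ignored.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
n, mu, K = 500, 0.4, 2

edges = set()
for u in range(n):
    k = 1 if rng.random() < mu else K
    others = np.array([v for v in range(n) if v != u])
    for v in rng.choice(others, size=k, replace=False):
        edges.add((min(u, int(v)), max(u, int(v))))   # drop orientation

parent = list(range(n))                    # union-find over the nodes
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]      # path halving
        x = parent[x]
    return x
for u, v in edges:
    parent[find(u)] = find(v)

sizes = Counter(find(x) for x in range(n))
print(sizes.most_common(1)[0][1], "of", n, "nodes in the giant component")
```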
Service Rate Region: A New Aspect of Coded Distributed System Design
Mehmet Aktas , Gauri Joshi , Swanand Kadhe , Fatemeh Kazemi , Emina Soljanin Subjects : Information Theory (cs.IT) ; Discrete Mathematics (cs.DM); Performance (cs.PF)
Erasure coding has been recently employed as a powerful method to mitigate
delays due to slow or straggling nodes in distributed systems. In this work, we
show that erasure coding of data objects can flexibly handle skews in the
request rates. Coding can help boost the service rate region, that is, increase
the overall volume of data access requests that can be handled by the system.
The goal of this paper is to postulate the service rate region as an important
consideration in the design of erasure coded distributed systems. We highlight
several open problems that can be grouped into two broad threads: 1)
characterizing the service rate region of a given code and finding the optimal
request allocation, and 2) designing the underlying erasure code for a given
service rate region. As contributions along the first thread, we characterize
the rate regions of maximum-distance-separable, locally repairable, and Simplex
codes. In terms of code design, we show the effectiveness of hybrid codes that
combine replication and erasure coding, and also discover fundamental
connections between multi-set batch codes and the problem of maximizing the
service rate region.
Secure Strong Coordination
Journal-ref: IEEE WPS 2020 – International Workshop on Privacy and Security for
Information Systems
Subjects:
Information Theory (cs.IT)
We consider a network of two nodes separated by a noisy channel, in which the
source and its reconstruction have to be strongly coordinated, while
simultaneously satisfying the strong secrecy condition with respect to an
outside observer of the noisy channel. In the case of non-causal encoding and
decoding, we propose a joint source-channel coding scheme for the secure strong
coordination region. Furthermore, we provide a complete characterization of the
secure strong coordination region when the decoder has to reliably reconstruct
the source sequence and the legitimate channel is more capable than the channel
of the eavesdropper.
Remote Joint Strong Coordination and Reliable Communication
Journal-ref: 2020 IEEE International Symposium on Information Theory (ISIT)
Subjects:
Information Theory (cs.IT)
We consider a three-node network, in which two agents wish to communicate
over a noisy channel, while controlling the distribution observed by a third
external agent. We use strong coordination to constrain the distribution, and
we provide a complete characterization of the “remote strong coordination and
reliable communication” region.
Smart Meter Data Privacy
Giulio Giaconi , Deniz Gunduz , H. Vincent Poor Subjects : Information Theory (cs.IT)
Smart grids (SGs) promise to deliver dramatic improvements compared to
traditional power grids thanks primarily to the large amount of data being
exchanged and processed within the grid, which enables the grid to be monitored
more accurately and at a much faster pace. The smart meter (SM) is one of the
key devices that enable the SG concept by monitoring a household’s electricity
consumption and reporting it to the utility provider (UP), i.e., the entity
that sells energy to customers, or to the distribution system operator (DSO),
i.e., the entity that operates and manages the grid, with high accuracy and at
a much faster pace compared to traditional meters. However, the very
availability of rich and high-frequency household electricity consumption data,
which enables a very efficient power grid management, also opens up
unprecedented challenges on data security and privacy. To counter these
threats, it is necessary to develop techniques that keep SM data private, and,
for this reason, SM privacy has become a very active research area. The aim of
this chapter is to provide an overview of the most significant
privacy-preserving techniques for SM data, highlighting their main benefits and
disadvantages.
Algebraic geometry codes and some applications
Comments: Survey chapter to appear in “A Concise Encyclopedia of Coding Theory”, W.C. Huffman, J.-L. Kim, and P. Sole’ Eds., CRC Press
Subjects:
Information Theory (cs.IT)
; Cryptography and Security (cs.CR); Algebraic Geometry (math.AG); Number Theory (math.NT)
This article surveys the development of the theory of algebraic geometry
codes since their discovery in the late 70’s. We summarize the major results on
various problems such as: asymptotic parameters, improved estimates on the
minimum distance, and decoding algorithms. In addition, we present various
modern applications of these codes such as public-key cryptography, algebraic
complexity theory, multiparty computation or distributed storage.
Comments: ARRW@ECCV2020
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Deep neural networks have been successful in diverse discriminative
classification tasks, although they are often poorly calibrated, assigning high
probability to misclassified predictions. This undermines the trustworthiness
and accountability of such models when they are deployed in real applications,
where predictions are evaluated based on their confidence scores.
Existing solutions suggest the benefits attained by combining deep neural
networks and Bayesian inference to quantify uncertainty over the models’
predictions for ambiguous datapoints. In this work we propose to validate and
test the efficacy of likelihood based models in the task of out of distribution
detection (OoD). Across different datasets and metrics we show that Bayesian
deep learning models on certain occasions marginally outperform conventional
neural networks and in the event of minimal overlap between in/out distribution
classes, even the best models exhibit a reduction in AUC scores in detecting
OoD data. Preliminary investigations indicate the potential inherent role of
bias due to choices of initialisation, architecture or activation functions. We
hypothesise that the sensitivity of neural networks to unseen inputs could be a
multi-factor phenomenon arising from the different architectural design choices
often amplified by the curse of dimensionality. Furthermore, we perform a study
of the effect of adversarial noise resistance methods on in- and
out-of-distribution performance, and also investigate the adversarial noise
robustness of Bayesian deep learners.
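A minimal sketch of uncertainty-based OoD scoring evaluated with AUC, using an ensemble's mean predictive entropy; the random linear "models" are placeholders for trained Bayesian deep networks.

```python
# OoD scoring by predictive entropy of an ensemble's mean class probabilities,
# evaluated with AUC on in- vs out-of-distribution inputs (illustrative).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x_in = rng.normal(size=(200, 10))
x_out = rng.normal(loc=3.0, size=(200, 10))        # shifted OoD inputs
models = [rng.normal(size=(10, 5)) for _ in range(8)]   # stand-in ensemble

def entropy_score(x):
    probs = np.mean([softmax(x @ W) for W in models], axis=0)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

scores = np.concatenate([entropy_score(x_in), entropy_score(x_out)])
labels = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = OoD
print(roc_auc_score(labels, scores))
```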
Action and Perception as Divergence Minimization
Comments: 13 pages, 10 figures
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a unified objective for action and perception of intelligent
agents. Extending representation learning and control, we minimize the joint
divergence between the world and a target distribution. Intuitively, such
agents use perception to align their beliefs with the world, and use actions to
align the world with their beliefs. Minimizing the joint divergence to an
expressive target maximizes the mutual information between the agent’s
representations and inputs, thus inferring representations that are informative
of past inputs and exploring future inputs that are informative of the
representations. This lets us derive intrinsic objectives, such as
representation learning, information gain, empowerment, and skill discovery
from minimal assumptions. Moreover, interpreting the target distribution as a
latent variable model suggests expressive world models as a path toward highly
adaptive agents that seek large niches in their environments, while rendering
task rewards optional. The presented framework provides a common language for
comparing a wide range of objectives, facilitates understanding of latent
variables for decision making, and offers a recipe for designing novel
objectives. We recommend deriving future agent objectives from the joint
divergence to facilitate comparison, to point out the agent’s target
distribution, and to identify the intrinsic objective terms needed to reach
that distribution.
End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence
Comments: To be presented at Asilomar 2020
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper introduces a novel “all-spike” low-power solution for remote
wireless inference that is based on neuromorphic sensing, Impulse Radio (IR),
and Spiking Neural Networks (SNNs). In the proposed system, event-driven
neuromorphic sensors produce asynchronous time-encoded data streams that are
encoded by an SNN, whose output spiking signals are pulse modulated via IR and
transmitted over general frequency-selective channels, while the receiver’s
inputs are obtained via hard detection of the received signals and fed to an
SNN for classification. We introduce an end-to-end training procedure that
treats the cascade of encoder, channel, and decoder as a probabilistic
SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The
proposed system, termed NeuroJSCC, is compared to conventional synchronous
frame-based and uncoded transmissions in terms of latency and accuracy. The
experiments confirm that the proposed end-to-end neuromorphic edge architecture
provides a promising framework for efficient and low-latency remote sensing,
communication, and inference.
Error estimate for a universal function approximator of ReLU network with a local connection
Jae-Mo Kang , Sunghwan Moon Subjects : Machine Learning (cs.LG) ; Information Theory (cs.IT); Machine Learning (stat.ML)
Neural networks have shown highly successful performance in a wide range of
tasks, but further studies are needed to improve their performance. We analyze
the approximation error of a specific neural network architecture with a
local connection, which has wider applicability than a fully connected one
because locally connected networks can be used to explain diverse neural
networks such as CNNs. Our error estimate depends on two parameters: one
controlling the depth of the hidden layers, and the other the width of the
hidden layers.
Zuckerli: A New Compressed Representation for Graphs
Luca Versari , Iulia M. Comsa , Alessio Conte , Roberto Grossi Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT)
Zuckerli is a scalable compression system meant for large real-world graphs.
Graphs are notoriously challenging structures to store efficiently due to their
linked nature, which makes it hard to separate them into smaller, compact
components. Therefore, effective compression is crucial when dealing with large
graphs, which can have billions of nodes and edges. Furthermore, a good
compression system should give the user fast and reasonably flexible access to
parts of the compressed data without requiring full decompression, which may be
infeasible on their system. Zuckerli improves multiple aspects of WebGraph, the
current state-of-the-art in compressing real-world graphs, by using advanced
compression techniques and novel heuristic graph algorithms. It can produce
both a compressed representation for storage and one which allows fast direct
access to the adjacency lists of the compressed graph without decompressing the
entire graph. We validate the effectiveness of Zuckerli on real-world graphs
with up to a billion nodes and 90 billion edges, conducting an extensive
experimental evaluation of both compression density and decompression
performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller
than WebGraph’s, with savings above 20% in most cases, and with decompression
resource usage comparable to that of WebGraph.
Quantum stabilizer codes, lattices, and CFTs
Comments: 99 pages
Subjects:
High Energy Physics – Theory (hep-th)
; Information Theory (cs.IT); Combinatorics (math.CO); Quantum Physics (quant-ph)
There is a rich connection between classical error-correcting codes,
Euclidean lattices, and chiral conformal field theories. Here we show that
quantum error-correcting codes, those of the stabilizer type, are related to
Lorentzian lattices and non-chiral CFTs. More specifically, real self-dual
stabilizer codes can be associated with even self-dual Lorentzian lattices, and
thus define Narain CFTs. We dub the resulting theories code CFTs and study
their properties. T-duality transformations of a code CFT, at the level of the
underlying code, reduce to code equivalences. By means of such equivalences,
any stabilizer code can be reduced to a graph code. We can therefore represent
code CFTs by graphs. We study code CFTs with small central charge \(c=n\leq 12\),
and find many interesting examples. Among them is a non-chiral \(E_8\) theory,
which is based on the root lattice of \(E_8\) understood as an even self-dual
Lorentzian lattice. By analyzing all graphs with \(n\leq 8\) nodes we find many
pairs and triples of physically distinct isospectral theories. We also
construct numerous modular invariant functions satisfying all the basic
properties expected of the CFT partition function, yet which are not partition
functions of any known CFTs. We consider the ensemble average over all code
theories, calculate the corresponding partition function, and discuss its
possible holographic interpretation. The paper is written in a self-contained
manner, and includes an extensive pedagogical introduction and many explicit
examples.