15 Selected Frontier Papers on Prompt Learning

 2 years ago
source link: https://mp.weixin.qq.com/s?__biz=MjM5ODkzMzMwMQ%3D%3D&%3Bmid=2650429675&%3Bidx=4&%3Bsn=011ec8e1aa99e12c99797142b7034be9

1. Conditional Prompt Learning for Vision-Language Models

Title: Conditional Prompt Learning for Vision-Language Models

Published: 2022-03-10

Url: http://arxiv.org/abs/2203.05557v1

Authors: Kaiyang Zhou,Jingkang Yang,Chen Change Loy,Ziwei Liu

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning, a recent trend in NLP, to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset, and yields stronger domain generalization performance as well. Code is available at https://github.com/KaiyangZhou/CoOp.
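
The input-conditional token described in this abstract can be illustrated with a small sketch. It assumes that the lightweight network is a two-layer bottleneck and that its output is simply added to every context vector; the names `meta_net` and `conditional_context`, the toy sizes, and the weights are hypothetical, and plain Python lists stand in for tensors and for the frozen CLIP encoders.

```python
def meta_net(image_feature, w1, w2):
    """Hypothetical bottleneck network: linear -> ReLU -> linear."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, image_feature))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def conditional_context(context, image_feature, w1, w2):
    """Add the image-conditioned token to every learnable context vector."""
    token = meta_net(image_feature, w1, w2)
    return [[v + t for v, t in zip(vector, token)] for vector in context]


# Toy sizes (assumed): 3 context vectors of dimension 4, bottleneck width 2.
context = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]       # learnable, shared by all images
w1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]         # learnable, 2 x 4
w2 = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.0, 0.0]]     # learnable, 4 x 2

# Stand-ins for features produced by a frozen image encoder.
image_a = [1.0, 0.0, 0.5, -0.5]
image_b = [0.0, 1.0, -0.5, 0.5]

prompt_a = conditional_context(context, image_a, w1, w2)
prompt_b = conditional_context(context, image_b, w1, w2)
```

In this sketch the two images produce different context vectors from the same learned parameters, which is the sense in which the prompt is dynamic rather than static.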

2. Prompt Learning for Few-Shot Dialogue State Tracking

Title: Prompt Learning for Few-Shot Dialogue State Tracking

Published: 2022-02-25

Url: http://arxiv.org/abs/2201.05780v2

Authors: Yuting Yang,Wenqiang Lei,Pei Huang,Juan Cao,Jintao Li,Tat-Seng Chua

Collecting dialogue state labels, slots and values, for learning dialogue state tracking (DST) models can be costly, especially with the wide application of dialogue systems in newly emerging domains. In this paper, we focus on how to learn a DST model efficiently with limited labeled data. We design a prompt learning framework for few-shot DST, which consists of two main components: a value-based prompt and an inverse prompt mechanism. This framework aims to utilize the language understanding and generation ability of pre-trained language models (PLM). First, we design value-based prompt functions to probe the DST-related knowledge from the PLM, which do not rely on the known ontology of slots. Further, an inverse prompt mechanism is utilized to self-check the "prompted" knowledge and help the PLM understand the essence of the DST task further. Experiments show that our model can generate unseen slots and outperforms existing state-of-the-art few-shot methods. It indicates that DST-related knowledge can be probed from the PLM and utilized to address low-resource DST efficiently with the help of prompt learning.
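
A rough sketch of the two components named in this abstract follows. The template wording, the function names, and the canned `toy_plm` are all hypothetical stand-ins; the self-check is applied here as an inference-time filter, which is a simplification and may differ from how the paper uses the inverse prompt.

```python
def value_based_prompt(dialogue, value):
    """Ask the model which slot a candidate value fills (template is assumed)."""
    return f"{dialogue} belief states: value = {value}, slot ="


def inverse_prompt(dialogue, slot):
    """Ask the model to recover the value from a generated slot (template is assumed)."""
    return f"{dialogue} belief states: slot = {slot}, value ="


def toy_plm(prompt):
    """Stand-in for a pre-trained language model; a real PLM would generate text."""
    canned = {
        "value = cheap, slot =": "price range",
        "slot = price range, value =": "cheap",
    }
    for suffix, completion in canned.items():
        if prompt.endswith(suffix):
            return completion
    return ""


def track_state(dialogue, candidate_values, plm):
    """Keep a slot-value pair only if the inverse prompt recovers the same value."""
    state = {}
    for value in candidate_values:
        slot = plm(value_based_prompt(dialogue, value))
        recovered = plm(inverse_prompt(dialogue, slot)) if slot else ""
        if slot and recovered == value:
            state[slot] = value
    return state


dialogue = "User: I need a cheap restaurant."
state = track_state(dialogue, ["cheap", "tomorrow"], toy_plm)
```

Because the slot name is generated from the value rather than selected from a fixed list, this style of prompt does not need a predefined slot ontology.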

3. Prompt Learning for Short Text Classification

Title: Prompt-Learning for Short Text Classification

Published: 2022-02-23

Url: http://arxiv.org/abs/2202.11345v1

Authors: Yi Zhu,Xinke Zhou,Jipeng Qiang,Yun Li,Yunhao Yuan,Xindong Wu

In short text, the extremely short length, feature sparsity and high ambiguity pose huge challenges to classification tasks. Recently, as an effective method for tuning Pre-trained Language Models for specific downstream tasks, prompt-learning has attracted a vast amount of attention and research. The main intuition behind prompt-learning is to insert a template into the input and convert the text classification task into an equivalent cloze-style task. However, most prompt-learning methods expand label words manually or only consider the class name for knowledge incorporation in cloze-style prediction, which will inevitably incur omissions and bias in classification tasks. In this paper, we propose a simple short text classification approach that makes use of prompt-learning based on knowledgeable expansion, which can consider both the short text itself and the class name when expanding the label word space. Specifically, the top N concepts related to the entity in the short text are retrieved from an open Knowledge Graph like Probase, and we further refine the expanded label words by the distance calculation between selected concepts and the class label. Experimental results show that our approach obtains obvious improvement compared with other fine-tuning, prompt-learning and knowledgeable prompt-tuning methods, outperforming the state-of-the-art by up to 6 Accuracy points on three well-known datasets.
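
The cloze-style conversion and the expanded label word space can be sketched as below. The template text, the label words, and the `mask_probs` dictionary (standing in for a masked language model's predictions at the `[MASK]` position) are made-up illustrations, and averaging over label words is one common aggregation choice rather than necessarily the paper's.

```python
TEMPLATE = "{text} This topic is about [MASK]."  # assumed template wording


def build_cloze_input(text):
    """Wrap the short text in a template so classification becomes a cloze task."""
    return TEMPLATE.format(text=text)


def classify(mask_probs, verbalizer):
    """Score each class by the mean [MASK] probability of its label words."""
    scores = {
        label: sum(mask_probs.get(word, 0.0) for word in words) / len(words)
        for label, words in verbalizer.items()
    }
    return max(scores, key=scores.get), scores


# Expanded label words per class (hypothetical; the paper retrieves concepts
# from a knowledge graph and refines them against the class label).
verbalizer = {
    "sports": ["sports", "football", "athlete"],
    "business": ["business", "finance", "company"],
}

cloze_input = build_cloze_input("Messi scores twice in final")

# Made-up probabilities a masked language model might assign at [MASK].
mask_probs = {"sports": 0.20, "football": 0.30, "athlete": 0.10,
              "finance": 0.05, "company": 0.05}

label, scores = classify(mask_probs, verbalizer)
```

With only the class names as label words, the prediction would depend on two tokens; the expanded sets let related words such as "football" contribute evidence.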

4. LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5

Title: LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5

Published: 2022-02-17

Url: http://arxiv.org/abs/2110.07298v2

Authors: Chengwei Qin,Shafiq Joty

Existing approaches to lifelong language learning rely on plenty of labeled data for learning a new task, which is hard to obtain in most real scenarios. Considering that humans can continually learn new tasks from a handful of examples, we expect the models also to be able to generalize well on new few-shot tasks without forgetting the previous ones. In this work, we define this more challenging yet practical problem as Lifelong Few-shot Language Learning (LFLL) and propose a unified framework for it based on prompt tuning of T5. Our framework called LFPT5 takes full advantage of PT's strong few-shot learning ability, and simultaneously trains the model as a task solver and a data generator. Before learning a new domain of the same task type, LFPT5 generates pseudo (labeled) samples of previously learned domains, and later gets trained on those samples to alleviate forgetting of previous knowledge as it learns the new domain. In addition, a KL divergence loss is minimized to achieve label consistency between the previous and the current model. While adapting to a new task type, LFPT5 includes and tunes additional prompt embeddings for the new task. With extensive experiments, we demonstrate that LFPT5 can be applied to various different types of tasks and significantly outperforms previous methods in different LFLL settings.
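
The label-consistency term mentioned in this abstract is a KL divergence between two predictive distributions. The sketch below computes it for a single example; the direction KL(previous || current), the `eps` smoothing, and the toy distributions are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy label distributions on one pseudo sample (made-up numbers).
previous_model = [0.7, 0.2, 0.1]   # model before learning the new domain
current_model = [0.5, 0.3, 0.2]    # model while learning the new domain

consistency_loss = kl_divergence(previous_model, current_model)
no_drift_loss = kl_divergence(previous_model, previous_model)
```

The loss is zero when the current model reproduces the previous model's label distribution and grows as the two drift apart, which is why minimizing it discourages forgetting.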

5. Domain Adaptation via Prompt Learning

Title: Domain Adaptation via Prompt Learning

Published: 2022-02-14

Url: http://arxiv.org/abs/2202.06687v1

Authors: Chunjiang Ge,Rui Huang,Mixue Xie,Zihang Lai,Shiji Song,Shuang Li,Gao Huang

Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given. Current UDA approaches learn domain-invariant features by aligning source and target feature spaces. Such alignments are imposed by constraints such as statistical discrepancy minimization or adversarial training. However, these constraints could lead to the distortion of semantic feature structures and loss of class discriminability. In this paper, we introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL). In contrast to prior works, our approach makes use of pre-trained vision-language models and optimizes only very few parameters. The main idea is to embed domain information into prompts, a form of representations generated from natural language, which is then used to perform classification. This domain information is shared only by images from the same domain, thereby dynamically adapting the classifier according to each domain. By adopting this paradigm, we show that our model not only outperforms previous methods on several cross-domain benchmarks but also is very efficient to train and easy to implement.

6. Learning to Prompt for Vision-Language Models

Title: Learning to Prompt for Vision-Language Models

Published: 2022-02-06

Url: http://arxiv.org/abs/2109.01134v3

Authors: Kaiyang Zhou,Jingkang Yang,Chen Change Loy,Ziwei Liu

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to any downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while the entire pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts with a decent margin and is able to gain significant improvements when using more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
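
The difference between the two implementations named in this abstract, unified context and class-specific context, comes down to how many context vectors are learned. The sketch below builds both prompt layouts with plain lists; the function names, the toy dimensions, and the placement of the class token at the end of the prompt are assumptions for illustration.

```python
def unified_prompts(context, class_tokens):
    """One shared context for every class: [V1 .. VM][CLASS]."""
    return {name: context + [token] for name, token in class_tokens.items()}


def class_specific_prompts(contexts, class_tokens):
    """A separate context per class: [V1 .. VM]_c [CLASS]."""
    return {name: contexts[name] + [token] for name, token in class_tokens.items()}


def count_learnable(vectors):
    """Number of scalar parameters in a list of context vectors."""
    return sum(len(vector) for vector in vectors)


dim, n_ctx = 4, 2  # toy embedding size and context length (assumed)

# Frozen word embeddings of the class names (toy values).
class_tokens = {"cat": [1.0] * dim, "dog": [2.0] * dim}

shared = [[0.0] * dim for _ in range(n_ctx)]
per_class = {name: [[0.0] * dim for _ in range(n_ctx)] for name in class_tokens}

uni = unified_prompts(shared, class_tokens)
csc = class_specific_prompts(per_class, class_tokens)

uni_params = count_learnable(shared)                                # n_ctx * dim
csc_params = sum(count_learnable(c) for c in per_class.values())    # classes * n_ctx * dim
```

In both layouts only the context vectors would receive gradients; the class-name embeddings and the encoders stay fixed, as the abstract states.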

7. Co-training Improves Prompt-based Learning for Large Language Models

Title: Co-training Improves Prompt-based Learning for Large Language Models

Published: 2022-02-02

Url: http://arxiv.org/abs/2202.00828v1

Authors: Hunter Lang,Monica Agrawal,Yoon Kim,David Sontag

We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data. While prompting has emerged as a promising paradigm for few-shot and zero-shot learning, it is often brittle and requires much larger models compared to the standard supervised setup. We find that co-training makes it possible to improve the original prompt model and at the same time learn a smaller, downstream task-specific model. In the case where we only have partial access to a prompt model (e.g., output probabilities from GPT-3 (Brown et al., 2020)) we learn a calibration model over the prompt outputs. When we have full access to the prompt model's gradients but full finetuning remains prohibitively expensive (e.g., T0 (Sanh et al., 2021)), we learn a set of soft prompt continuous vectors to iteratively update the prompt model. We find that models trained in this manner can significantly improve performance on challenging datasets where there is currently a large gap between prompt-based learning and fully-supervised models.

8. PADA: Example-based Prompt Learning for On-the-fly Adaptation to Unseen Domains

Title: PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains

Published: 2022-01-27

Url: http://arxiv.org/abs/2102.12206v4

Authors: Eyal Ben-David,Nadav Oved,Roi Reichart

Natural Language Processing algorithms have made incredible progress, but they still struggle when applied to out-of-distribution examples. We address a challenging and underexplored version of this domain adaptation problem, where an algorithm is trained on several source domains, and then applied to examples from unseen domains that are unknown at training time. Particularly, no examples, labeled or unlabeled, or any other knowledge about the target domain are available to the algorithm at training time. We present PADA: An example-based autoregressive Prompt learning algorithm for on-the-fly Any-Domain Adaptation, based on the T5 language model. Given a test example, PADA first generates a unique prompt for it and then, conditioned on this prompt, labels the example with respect to the NLP prediction task. PADA is trained to generate a prompt which is a token sequence of unrestricted length, consisting of Domain Related Features (DRFs) that characterize each of the source domains. Intuitively, the generated prompt is a unique signature that maps the test example to a semantic space spanned by the source domains. In experiments with 3 tasks (text classification and sequence tagging), for a total of 14 multi-source adaptation scenarios, PADA substantially outperforms strong baselines.

9. Ontology-enhanced Prompt-tuning for Few-shot Learning

Title: Ontology-enhanced Prompt-tuning for Few-shot Learning

Published: 2022-01-27

Url: http://arxiv.org/abs/2201.11332v1

Authors: Hongbin Ye,Ningyu Zhang,Shumin Deng,Xiang Chen,Hui Chen,Feiyu Xiong,Xi Chen,Huajun Chen

Few-shot Learning (FSL) is aimed to make predictions based on a limited number of samples. Structured data such as knowledge graphs and ontology libraries has been leveraged to benefit the few-shot setting in various tasks. However, the priors adopted by the existing methods suffer from challenging knowledge missing, knowledge noise, and knowledge heterogeneity, which hinder the performance for few-shot learning. In this study, we explore knowledge injection for FSL with pre-trained language models and propose ontology-enhanced prompt-tuning (OntoPrompt). Specifically, we develop the ontology transformation based on the external knowledge graph to address the knowledge missing issue, which fulfills and converts structure knowledge to text. We further introduce span-sensitive knowledge injection via a visible matrix to select informative knowledge to handle the knowledge noise issue. To bridge the gap between knowledge and text, we propose a collective training algorithm to optimize representations jointly. We evaluate our proposed OntoPrompt in three tasks, including relation extraction, event extraction, and knowledge graph completion, with eight datasets. Experimental results demonstrate that our approach can obtain better few-shot performance than baselines.

10. Learning to Compose Diversified Prompts for Image Emotion Classification

Title: Learning to Compose Diversified Prompts for Image Emotion Classification

Published: 2022-01-26

Url: http://arxiv.org/abs/2201.10963v1

Authors: Sinuo Deng,Lifang Wu,Ge Shi,Lehao Xing,Meng Jian

Contrastive Language-Image Pre-training (CLIP) represents the latest incarnation of pre-trained vision-language models. Although CLIP has recently shown its superior power on a wide range of downstream vision-language tasks like Visual Question Answering, it is still underexplored for Image Emotion Classification (IEC). Adapting CLIP to the IEC task has three significant challenges: tremendous training objective gap between pretraining and IEC, shared suboptimal and invariant prompts for all instances. In this paper, we propose a general framework that shows how CLIP can be effectively applied to IEC. We first introduce a prompt tuning method that mimics the pretraining objective of CLIP and thus can leverage the rich image and text semantics entailed in CLIP. Then we automatically compose instance-specific prompts by conditioning them on the categories and image contents of instances, diversifying prompts and avoiding suboptimal problems. Evaluations on six widely-used affective datasets demonstrate that our proposed method outperforms the state-of-the-art methods by a large margin (i.e., up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks, with only a few parameters trained. Our codes will be publicly available for research purposes.

11. Context-Tuning: Learning Contextualized Prompts for Natural Language Generation

Title: Context-Tuning: Learning Contextualized Prompts for Natural Language Generation

Published: 2022-01-21

Url: http://arxiv.org/abs/2201.08670v1

Authors: Tianyi Tang,Junyi Li,Wayne Xin Zhao

Recently, pretrained language models (PLMs) have had exceptional success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful mechanism is to use prompts, in the form of either discrete tokens or continuous embeddings. In existing studies, manual prompts are time-consuming and require domain expertise, while continuous prompts are typically independent of the inputs. To address this issue, we propose a novel continuous prompting approach, called Context-Tuning, to fine-tuning PLMs for natural language generation. Firstly, the prompts are derived based on the input text, so that they can elicit useful knowledge from PLMs for generation. We refer to such prompts as contextualized prompts. Secondly, to further enhance the relevance of the generated text to the inputs, we utilize continuous inverse prompting to refine the process of natural language generation by modeling an inverse generation process from output to input. Moreover, we propose a lightweight Context-Tuning, fine-tuning only 0.4% of parameters while retaining good performance.

12. Black-box Prompt Learning for Pre-trained Language Models

Title: Black-box Prompt Learning for Pre-trained Language Models

Published: 2022-01-21

Url: http://arxiv.org/abs/2201.08531v1

Authors: Shizhe Diao,Xuechun Li,Yong Lin,Zhichao Huang,Tong Zhang

Domain-specific fine-tuning strategies for large pre-trained models have received vast attention in recent years. In previously studied settings, the model architectures and parameters are tunable or at least visible, which we refer to as white-box settings. This work considers a new scenario, where we do not have access to a pre-trained model, except for its outputs given inputs, and we call this problem black-box fine-tuning. To illustrate our approach, we first introduce the black-box setting formally on text classification, where the pre-trained model is not only frozen but also invisible. We then propose our solution black-box prompt, a new technique in the prompt-learning family, which can leverage the knowledge learned by pre-trained models from the pre-training corpus. Our experiments demonstrate that the proposed method achieved state-of-the-art performance on eight datasets. Further analyses on different human-designed objectives, prompt lengths, and intuitive explanations demonstrate the robustness and flexibility of our method.

13. PROMPT: Learning Dynamic Resource Allocation Policies for Edge-Network Applications

Title: PROMPT: Learning Dynamic Resource Allocation Policies for Edge-Network Applications

Published: 2022-01-19

Url: http://arxiv.org/abs/2201.07916v1

Authors: Drew Penney,Bin Li,Jaroslaw Sydir,Charlie Tai,Eoin Walsh,Thomas Long,Stefan Lee,Lizhong Chen

A growing number of service providers are exploring methods to improve server utilization, reduce power consumption, and reduce total cost of ownership by co-scheduling high-priority latency-critical workloads with best-effort workloads. This practice requires strict resource allocation between workloads to reduce resource contention and maintain Quality of Service (QoS) guarantees. Prior resource allocation works have been shown to improve server utilization under ideal circumstances, yet often compromise QoS guarantees or fail to find valid resource allocations in more dynamic operating environments. Further, prior works are fundamentally reliant upon QoS measurements that can, in practice, exhibit significant transient fluctuations, thus stable control behavior cannot be reliably achieved. In this paper, we propose a novel framework for dynamic resource allocation based on proactive QoS prediction. These predictions help guide a reinforcement-learning-based resource controller towards optimal resource allocations while avoiding transient QoS violations due to fluctuating workload demands. Evaluation shows that the proposed method incurs 4.3x fewer QoS violations, reduces severity of QoS violations by 3.7x, improves best-effort workload performance, and improves overall power efficiency compared with prior work.

14. Instance-aware Prompt Learning for Language Understanding and Generation

Title: Instance-aware Prompt Learning for Language Understanding and Generation

Published: 2022-01-18

Url: http://arxiv.org/abs/2201.07126v1

Authors: Feihu Jin,Jinliang Lu,Jiajun Zhang,Chengqing Zong

Recently, prompt learning has become a new paradigm to utilize pre-trained language models (PLMs) and achieves promising results in downstream tasks with a negligible increase of parameters. The current usage of discrete and continuous prompts assumes that the prompt is fixed for a specific task and all samples in the task share the same prompt. However, a task may contain quite diverse samples in which some are easy and others are difficult, and diverse prompts are desirable. In this paper, we propose an instance-aware prompt learning method that learns a different prompt for each instance. Specifically, we suppose that each learnable prompt token has a different contribution to different instances, and we learn the contribution by calculating the relevance score between an instance and each prompt token. The contribution weighted prompt would be instance aware. We apply our method to both unidirectional and bidirectional PLMs on both language understanding and generation tasks. Extensive experiments demonstrate that our method obtains considerable improvements compared to strong baselines. Especially, our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
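
The contribution-weighted prompt described in this abstract can be sketched as follows. The abstract does not say how the relevance score is computed, so this sketch assumes a dot product between the instance representation and each prompt token, squashed with a sigmoid; the function names and toy vectors are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def instance_aware_prompt(prompt_tokens, instance_vec):
    """Scale each learnable prompt token by its relevance to the instance."""
    weights = [
        sigmoid(sum(t * x for t, x in zip(token, instance_vec)))
        for token in prompt_tokens
    ]
    weighted = [[w * t for t in token] for w, token in zip(weights, prompt_tokens)]
    return weights, weighted


# Two learnable prompt tokens of dimension 2 (toy values).
prompt_tokens = [[1.0, 0.0], [0.0, 2.0]]

# Stand-ins for pooled representations of two different inputs.
neutral_instance = [0.0, 0.0]
skewed_instance = [2.0, -2.0]

neutral_weights, neutral_prompt = instance_aware_prompt(prompt_tokens, neutral_instance)
skewed_weights, skewed_prompt = instance_aware_prompt(prompt_tokens, skewed_instance)
```

The same prompt parameters yield different effective prompts for the two inputs, in contrast to a fixed prompt that is shared by every sample in the task.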

15. Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation

Title: Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation

Published: 2022-01-14

Url: http://arxiv.org/abs/2109.06513v2

Authors: Chujie Zheng,Minlie Huang

Dialog models can be greatly strengthened through grounding on various external information, but grounded dialog corpora are usually not naturally accessible. In this work, we focus on the few-shot learning for grounded dialog generation (GDG). We first propose a simple prompting method for GDG tasks, where different constructs of model input, such as the grounding source and the conversation context, are distinguished through continuous or discrete prompts. On three typical GDG tasks, we empirically demonstrate and analyze in-depth the effectiveness of our method. We then conduct extensive experiments to thoroughly investigate how our prompting method works with different pre-trained models. We show that prompted language models perform superiorly to conversational models, and further analyze various factors that influence the effects of prompting. Overall, our work introduces a prompt-based perspective to the few-shot learning for GDG tasks, and provides valuable findings and insights for future research.
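
As an illustration of the discrete-prompt variant mentioned in this abstract, the sketch below marks each construct of the model input with a textual prefix. The prefix strings ("Knowledge:", "User:", "System:"), the function name, and the example dialog are hypothetical; the continuous variant would use learned embeddings in place of these words.

```python
def build_gdg_input(grounding, turns, speakers=("User", "System")):
    """Mark the grounding source and each context turn with a discrete prompt."""
    lines = [f"Knowledge: {grounding}"]
    for i, turn in enumerate(turns):
        lines.append(f"{speakers[i % 2]}: {turn}")
    # End with the next speaker's prefix so the model continues from there.
    lines.append(f"{speakers[len(turns) % 2]}:")
    return "\n".join(lines)


model_input = build_gdg_input(
    "The Eiffel Tower is 330 metres tall.",
    ["How tall is the Eiffel Tower?"],
)
```

The resulting string would be passed to a pre-trained language model, which generates the grounded response after the final prefix.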
