NMF can be applied to the statistical analysis of multivariate data as follows. Given a set of multivariate n-dimensional data vectors, the vectors are placed in the columns of an \(n\times m\) matrix V, where m is the number of examples in the dataset. This matrix is then approximately factorized into an \(n\times r\) matrix W and an \(r\times m\) matrix H. Usually r is chosen to be smaller than n or m, so that W and H are smaller than the original matrix V. The result is a compressed version of the original data matrix.
The significance of the approximation in Equation (1) is that it can be rewritten column by column as \(v\approx Wh\), where v and h are the corresponding columns of V and H. That is, each data vector v is approximated by a linear combination of the columns of W, weighted by the components of h. Therefore W can be regarded as containing a set of basis vectors optimized for the linear approximation of the data in V. Since relatively few basis vectors are used to represent many data vectors, a good approximation can only be achieved if the basis vectors discover structure that is latent in the data.
Here we discuss two NMF algorithms based on iteratively updating W and H. Because these algorithms are easy to implement and their convergence is guaranteed, they are very practical in real-world applications. Other algorithms may be more efficient in overall computation time, but they are harder to implement and difficult to generalize to different cost functions. Algorithms similar to ours have previously been used for the deconvolution of emission tomography and astronomical images.

At each iteration of our algorithms, the new value of W or H is found by multiplying the current value by some factor that depends on the quality of the approximation in Equation (1). We can prove that the approximation error decreases monotonically as these multiplicative update rules are applied repeatedly. This means that repeated iteration of the update rules is guaranteed to converge the matrix factorization to a local optimum.
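As a hedged sketch of this idea, the NumPy code below implements the widely used multiplicative updates for the Euclidean cost \(\|V-WH\|^2\), in the style popularized by Lee and Seung; the iteration count and the small epsilon added to the denominators are illustrative choices, not part of the original formulation.

import numpy as np

def nmf_euclidean(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimizing ||V - WH||^2 with W, H >= 0."""
    n, m = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise multiply and divide
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(100, 500)))   # made-up non-negative data
W, H = nmf_euclidean(V, r=10)
print(np.linalg.norm(V - W @ H) ** 2)   # cost after the updates; it only decreases across iterations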
Cost functions
To find an approximate factorization \(V \approx WH\), we first need to define a cost function that quantifies the quality of the approximation. Such a cost function can be constructed from some measure of distance between two non-negative matrices A and B. One useful measure is simply the square of the Euclidean distance between A and B, \(\|A - B\|^2 = \sum_{ij}(A_{ij} - B_{ij})^2\).
Another useful measure is the divergence \(D(A\|B) = \sum_{ij}\left(A_{ij}\log\frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij}\right)\). Like the Euclidean distance, it is lower bounded by zero and vanishes when A = B. But it cannot be called a "distance", because the expression is not symmetric in A and B, so we refer to it as the "divergence" of A from B. It reduces to the Kullback-Leibler divergence, or relative entropy, when \(\sum_{ij}A_{ij}=\sum_{ij}B_{ij}=1\), so that A and B can be regarded as normalized probability distributions.
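Both measures are easy to compute directly; the sketch below assumes strictly positive matrix entries so that the logarithm in the divergence is well defined (zero entries would need the usual 0 log 0 = 0 convention).

import numpy as np

def euclidean_cost(A, B):
    """Squared Euclidean distance ||A - B||^2 = sum_ij (A_ij - B_ij)^2."""
    return np.sum((A - B) ** 2)

def divergence(A, B):
    """D(A||B) = sum_ij (A_ij log(A_ij / B_ij) - A_ij + B_ij), assuming positive entries."""
    return np.sum(A * np.log(A / B) - A + B)

rng = np.random.default_rng(0)
A = rng.random((4, 5)) + 0.1
B = rng.random((4, 5)) + 0.1

print(euclidean_cost(A, B))
print(divergence(A, B), divergence(B, A))   # not symmetric in general
print(divergence(A, A))                     # 0 when A == B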
We can now formulate NMF as either of the following two optimization problems:
Problem 1: Minimize \(\|V - WH\|^2\) with respect to W and H, subject to the constraints \(W, H \geq 0\).
Problem 2: Minimize \(D(V\|WH)\) with respect to W and H, subject to the constraints \(W, H \geq 0\).
Although \(\|V - WH\|^2\) and \(D(V\|WH)\) are convex in W only or in H only, they are not convex in both variables together. Therefore it is unrealistic to expect an algorithm that solves these two optimization problems by finding the global minimum. However, there are many numerical optimization techniques that can be applied to find local minima.
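As a small illustration of this point (using scikit-learn's NMF solver rather than the algorithms discussed here, and made-up data), restarting the optimization from different random initializations generally converges to different local minima, so the achieved cost depends on the seed:

import numpy as np
from sklearn.decomposition import NMF

V = np.abs(np.random.default_rng(0).normal(size=(100, 500)))   # made-up non-negative data

costs = []
for seed in range(5):
    model = NMF(n_components=10, init='random', random_state=seed, max_iter=500)
    model.fit(V)
    costs.append(model.reconstruction_err_)   # Frobenius-norm reconstruction error

print(costs)   # typically slightly different values: different local minima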
Depth-first search preferentially expands the unexpanded node of greatest depth, whereas breadth-first search expands the nodes at level K+1 only after all nodes at level K have been expanded. Here we apply depth-first search.

Assume that initially no vertex of the graph has been visited. Starting from some vertex i, we visit it and then perform a depth-first traversal from each of i's unvisited adjacent vertices in turn, until every vertex connected to i by a path has been visited. If unvisited vertices remain, another unvisited vertex is chosen as a new starting point and the process is repeated until every vertex in the graph has been visited. During the traversal, care must be taken to backtrack correctly and to avoid visiting any node more than once.
The main difficulties in the search are preventing nodes that have already been searched from being searched again, and backtracking between nodes. Combining the incidence matrix with depth-first search: when node j is reached from node i and node j has already been searched, the matrix element linking node i and node j is set to 0, and the search returns to node i to explore another node connected to it. When node j is the last node of the current branch, the corresponding matrix element is likewise updated, and the search returns to node i to explore another branch connected to it. This continues until all nodes have been traversed, that is, until every element of the incidence matrix has become 0, at which point the search stops. While the nodes are being traversed, coordinates are assigned to them according to the visiting order and the set of branches. The logic of the search process is illustrated in the figure below:
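Below is a minimal, simplified sketch of the traversal just described. It uses a node-by-node adjacency matrix in place of the branch incidence matrix, zeroes an entry once the corresponding connection has been traversed, and records the visiting order that would later drive the coordinate assignment; the example graph is an assumption for illustration.

from typing import List

def dfs_zeroing(adj: List[List[int]]) -> List[int]:
    """Depth-first traversal that zeroes matrix entries once a connection is used."""
    n = len(adj)
    visited = [False] * n
    order = []                      # visiting order, used later for node placement

    def visit(i: int) -> None:
        visited[i] = True
        order.append(i)
        for j in range(n):
            if adj[i][j]:
                adj[i][j] = adj[j][i] = 0    # this connection is now consumed
                if not visited[j]:
                    visit(j)                 # go deeper; returning here is the backtracking step

    for start in range(n):          # restart from an unvisited node if any remain
        if not visited[start]:
            visit(start)
    return order

# Small example graph with edges 0-1, 0-2, 1-3, 2-3 (symmetric adjacency matrix).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
print(dfs_zeroing(adj))   # e.g. [0, 1, 3, 2]
print(adj)                # all entries are now 0, so the search stops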
<?php
/**
 * file_cache_example.php
 *
 * Demonstrates usage of phpFastCache with "file system" adapter
 */

// Include composer autoloader
require __DIR__ . '/vendor/autoload.php';

use phpFastCache\CacheManager;

// Init default configuration for "files" adapter
CacheManager::setDefaultConfig([
    "path" => __DIR__ . "/cache"
]);

// Get instance of files cache
$objFilesCache = CacheManager::getInstance('files');

$key = "welcome_message";

// Try to fetch cached item with "welcome_message" key
$CachedString = $objFilesCache->getItem($key);

if (is_null($CachedString->get())) {
    // The cached entry doesn't exist
    $numberOfSeconds = 60;
    $CachedString->set("This website uses PhpFastCache!")->expiresAfter($numberOfSeconds);
    $objFilesCache->save($CachedString);
    echo "Not in cache yet, we set it in cache and try to get it from cache!</br>";
    echo "The value of welcome_message: " . $CachedString->get();
} else {
    // The cached entry exists
    echo "Already in cache!</br>";
    echo "The value of welcome_message: " . $CachedString->get();
}
Pretty easy to get started with, right? You can try running this program yourself to see the result.
The first time you run it, you should see the following output:
Not in cache yet, we set it in cache and try to get it from cache!
The value of welcome_message: This website uses PhpFastCache!
On subsequent runs, the output will be:
Already in cache!
The value of welcome_message: This website uses PhpFastCache!
<?php
/**
 * redis_cache_example.php
 *
 * Demonstrates usage of phpFastCache with "redis" adapter
 *
 * Make sure the php-redis extension is installed along with a Redis server.
 */

// Include composer autoloader
require __DIR__ . '/vendor/autoload.php';

use phpFastCache\CacheManager;

// Init default configuration for "redis" adapter
CacheManager::setDefaultConfig([
    "host" => '127.0.0.1',
    "port" => 6379
]);

// Get instance of redis cache
$objRedisCache = CacheManager::getInstance('redis');

$key = "welcome_message";

// Try to fetch cached item with "welcome_message" key
$CachedString = $objRedisCache->getItem($key);

if (is_null($CachedString->get())) {
    // The cached entry doesn't exist
    $numberOfSeconds = 60;
    $CachedString->set("This website uses PhpFastCache!")->expiresAfter($numberOfSeconds);
    $objRedisCache->save($CachedString);
    echo "Not in cache yet, we set it in cache and try to get it from cache!</br>";
    echo "The value of welcome_message: " . $CachedString->get();
} else {
    // The cached entry exists
    echo "Already in cache!</br>";
    echo "The value of welcome_message: " . $CachedString->get();
}
Conceptually, the filters in the lower layers capture primitive sentence information (h-grams, analogous to edges in image processing), while the filters in the higher layers capture more sophisticated linguistic features such as semantic and syntactic structure (analogous to image parts in image processing). This bottom-up architecture builds the representation vector h by hierarchically stacking text segments (h-grams) as building blocks. The approach is similar in spirit to modeling the syntactic structure of language with concrete syntax trees [26]. However, instead of specifying a syntactic structure in advance (e.g., for English), we extract this structure with a multi-layer CNN.
2.2 Deconvolutional decoder
We apply the deconvolution operation (i.e., convolutional transpose), a strided variant of convolution, to decode the representation vector h back into the original text domain. As the deconvolution proceeds, the spatial dimension of the feature maps expands progressively, as shown in Figure 1, mirroring in reverse the convolution operations described earlier. The spatial dimension is first expanded to match that of the (L-1)-th convolutional layer, and then grows as \(T^{(l+1)} = (T^{(l)}-1) * r^{(l)} + h\) for \(l = 1, \dots\), up to the L-th deconvolutional layer, which corresponds to the input layer of the convolutional encoder. The output of the L-th deconvolutional layer aims to reconstruct the word-embedding matrix \(\hat{X}\). As with \(W_e\), every column of \(\hat{X}\) is l2-normalized.
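As a quick numerical check of the expansion rule, the short sketch below propagates the spatial length through successive deconvolutional layers using \(T^{(l+1)} = (T^{(l)}-1) * r^{(l)} + h\); the filter size h, the strides \(r^{(l)}\), and the starting length are made-up values.

# Spatial length after each deconvolutional layer: T_{l+1} = (T_l - 1) * r_l + h
def deconv_lengths(t0, strides, h):
    lengths = [t0]
    for r in strides:
        lengths.append((lengths[-1] - 1) * r + h)
    return lengths

# Made-up values: filter size h = 5, stride 2 at every layer, starting length 1.
print(deconv_lengths(t0=1, strides=[2, 2, 2], h=5))   # [1, 5, 13, 29]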
Let \(\hat{w}^t\) denote the t-th word in the reconstructed sentence \(\hat{s}\); the probability that \(\hat{w}^t\) is word v can be expressed as:
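The expression itself is not reproduced in this excerpt. As a hedged sketch of how such a probability can be computed, the code below scores the reconstructed embedding column against every column of the word-embedding matrix \(W_e\) by cosine similarity and normalizes the scores with a softmax; the temperature tau and all sizes are assumptions for illustration.

import numpy as np

def word_probabilities(x_hat_t, W_e, tau=0.1):
    """Softmax over cosine similarities between a reconstructed embedding column
    x_hat_t of shape (d,) and each vocabulary embedding, the columns of W_e (d, V)."""
    x = x_hat_t / np.linalg.norm(x_hat_t)
    E = W_e / np.linalg.norm(W_e, axis=0, keepdims=True)   # l2-normalize columns
    scores = (E.T @ x) / tau                               # cosine similarity scaled by 1/tau
    scores -= scores.max()                                 # numerical stability
    p = np.exp(scores)
    return p / p.sum()                                     # p[v] approximates P(w_hat^t = v)

rng = np.random.default_rng(0)
d, V = 16, 1000                      # embedding size and vocabulary size (made up)
W_e = rng.normal(size=(d, V))
x_hat_t = rng.normal(size=d)
p = word_probabilities(x_hat_t, W_e)
print(p.sum(), p.argmax())           # probabilities sum to 1; index of the most likely word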
We have proposed a general framework for text modeling that uses only convolutional and deconvolutional operations. Because it performs no sequential conditional generation, it avoids teacher-forcing training and the exposure-bias problem. The method compresses a paragraph entirely into a latent representation vector, from which the original input sequence can be decompressed and reconstructed. Overall, the approach achieves high-quality reconstruction of long paragraphs and outperforms existing methods on spelling correction, semi-supervised sequence classification, and text summarization, while reducing computational cost.
References
[1] Andrew M Dai and Quoc V Le. Semi-supervised sequence learning. In
NIPS, 2015.
[2] Quoc Le and Tomas Mikolov. Distributed representations of
sentences and documents. In ICML, 2014.
[3] Rie Johnson and Tong Zhang. Supervised and Semi-Supervised Text
Categorization using LSTM for Region Embeddings. arXiv, February
2016.
[4] Takeru Miyato, Andrew M Dai, and Ian Goodfellow. Adversarial
Training Methods for Semi-Supervised Text Classification. In ICLR, May
2017.
[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural
Machine Translation by Jointly Learning to Align and Translate. In ICLR,
2015.
[6] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry
Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning
phrase representations using rnn encoder-decoder for statistical machine
translation. In EMNLP, 2014.
[7] Fandong Meng, Zhengdong Lu, Mingxuan Wang, Hang Li, Wenbin Jiang,
and Qun Liu. Encoding source language with convolutional neural network
for machine translation. In ACL, 2015.
[8] Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David
Vandyke, and Steve Young. Semantically conditioned lstm-based natural
language generation for spoken dialogue systems. arXiv, 2015.
[9] Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao,
and Dan Jurafsky. Deep reinforcement learning for dialogue generation.
arXiv, 2016.
[10] Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, and Dan
Jurafsky. Adversarial learning for neural dialogue generation.
arXiv:1701.06547, 2017.
[11] Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar
Gulcehre, and Bing Xiang. Abstractive Text Summarization Using
Sequence-to-Sequence RNNs and Beyond. In CoNLL, 2016.
[12] Shashi Narayan, Nikos Papasarantopoulos, Mirella Lapata, and
Shay B Cohen. Neural Extractive Summarization with Side Information.
arXiv, April 2017.
[13] Alexander M Rush, Sumit Chopra, and Jason Weston. A Neural
Attention Model for Abstractive Sentence Summarization. In EMNLP,
2015.
[14] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to
sequence learning with neural networks. In NIPS, 2014.
[15] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and
Sanjeev Khudanpur. Recurrent neural network based language model. In
INTERSPEECH, 2010.
[16] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.
In Neural computation, 1997.
[17] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua
Bengio. Empirical evaluation of gated recurrent neural networks on
sequence modeling. arXiv, 2014.
[18] Ronald J Williams and David Zipser. A learning algorithm for
continually running fully recurrent neural networks. Neural computation,
1(2):270–280, 1989.
[19] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer.
Scheduled sampling for sequence prediction with recurrent neural
networks. In NIPS, 2015.
[20] Ferenc Huszár. How (not) to train your generative model:
Scheduled sampling, likelihood, adversary? arXiv, 2015.
[21] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A
convolutional neural network for modelling sentences. In ACL, 2014.
[22] Yoon Kim. Convolutional neural networks for sentence
classification. In EMNLP, 2014.
[23] Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga,
Francesco Visin, David Vazquez, and Aaron Courville. Pixelvae: A latent
variable model for natural images. arXiv, 2016.
[24] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised
representation learning with deep convolutional generative adversarial
networks. arXiv, 2015.
[25] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve
restricted boltzmann machines. In ICML, pages 807–814, 2010.
[26] Ian Chiswell and Wilfrid Hodges. Mathematical logic, volume 3.
OUP Oxford, 2007.
[27] Emil Julius Gumbel and Julius Lieblein. Statistical theory of
extreme values and some practical applications: a series of lectures.
1954.
[28] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen,
Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost)
from scratch. In JMLR, 2011.
[29] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan
Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cudnn: Efficient
primitives for deep learning. arXiv, 2014.
[30] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and
Eduard Hovy. Hierarchical attention networks for document classification.
In NAACL, 2016.
[31] Adji B Dieng, Chong Wang, Jianfeng Gao, and John Paisley.
TopicRNN: A Recurrent Neural Network with Long-Range Semantic
Dependency. In ICLR, 2016.
[32] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and
Max Welling. Semi-supervised learning with deep generative models. In
NIPS, 2014.
[33] Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li,
Andrew Stevens, and Lawrence Carin.
Variational autoencoder for deep learning of images, labels and
captions. In NIPS, 2016.
[34] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen
Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning
long-term dependencies, 2001.
[35] Richard Socher, Jeffrey Pennington, Eric H Huang, Andrew Y Ng,
and Christopher D Manning. Semisupervised recursive autoencoders for
predicting sentiment distributions. In EMNLP. Association for
Computational Linguistics, 2011.
[36] Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal
Jozefowicz, and Samy Bengio. Generating sentences from a continuous
space. arXiv, 2015.
[37] Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, and Taylor
Berg-Kirkpatrick. Improved Variational Autoencoders for Text Modeling
using Dilated Convolutions. arXiv, February 2017.
[38] Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen.
Convolutional neural network architectures for matching natural language
sentences. In NIPS, 2014.
[39] Rie Johnson and Tong Zhang. Effective use of word order for text
categorization with convolutional neural networks. In NAACL HLT,
2015.
[40] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and
Martin Riedmiller. Striving for simplicity: The all convolutional net.
arXiv, 2014.
[41] Karen Simonyan and Andrew Zisserman. Very deep convolutional
networks for large-scale image recognition. In ICLR, 2015.
[42] Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. A
Hybrid Convolutional Variational Autoencoder for Text Generation. arXiv,
February 2017.
[43] Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den
Oord, Alex Graves, and Koray Kavukcuoglu. Neural machine translation in
linear time. arXiv, 2016.
[44] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier.
Language Modeling with Gated Convolutional Networks. arXiv, December
2016.
[45] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin.
Convolutional Sequence to Sequence Learning. arXiv, May 2017.
[46] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol
Vinyals, Alex Graves, et al. Conditional image generation with pixelcnn
decoders. In NIPS, pages 4790–4798, 2016.
[47] Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. A hierarchical
neural autoencoder for paragraphs and documents. In ACL, 2015.
[48] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and
Jeff Dean. Distributed representations of words and phrases and their
compositionality. In NIPS, 2013.
[49] Kam-Fai Wong, Mingli Wu, and Wenjie Li. Extractive summarization
using supervised and semi-supervised learning. In ICCL. Association for
Computational Linguistics, 2008.
[50] Chin-Yew Lin. Rouge: A package for automatic evaluation of
summaries. In ACL workshop, 2004.
[51] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
Bleu: a method for automatic evaluation of machine translation. In ACL.
Association for Computational Linguistics, 2002.
[52] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level
convolutional networks for text classification. In NIPS, pages 649–657,
2015.
[53] Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal,
Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. An
actor-critic algorithm for sequence prediction. arXiv, 2016.
[54] JP Woodard and JT Nelson. An information theoretic measure of
speech recognition performance. In Workshop on standardisation for
speech I/O, 1982.
[55] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic
optimization. In ICLR, 2015.
[56] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of
training deep feedforward neural networks. In AISTATS, 2010.