#Deep Learning for NLP

A list of resources dedicated to deep learning for natural language processing tasks

##Word Vector

  1. Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3(6):1137-1155.
    —— Introduces a neural language model that jointly learns a distributed representation for each word and the probability function for word sequences.

  2. Morin F, Bengio Y. Hierarchical probabilistic neural network language model[J]. Aistats, 2005.
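    —— Introduces the hierarchical softmax: the output layer is organized as a binary tree over the vocabulary, which gives an exponential speed-up when computing conditional probabilities (on the order of log |V| operations per word instead of |V|).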

  3. Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[J]. Computer Science, 2013.
    —— CBOW and Skip-gram are two log-linear model architectures for learning distributed representations of words. They can learn high-quality word vectors from huge data sets with billions of words and vocabularies of millions of words.

  4. Gutmann M U, Hyvärinen A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics[J]. Journal of Machine Learning Research, 2012, 13(1):307-361.
    —— Noise-Contrastive Estimation (NCE) is an objective function for estimating both normalized and unnormalized models. A simplified version of NCE, negative sampling (NSC), is used in word2vec to speed up training.

  5. Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and their Compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26:3111-3119.
    —— Describes the architecture behind Google's word2vec: an extension of the Skip-gram model with subsampling of frequent words and negative sampling (NSC) as an alternative to the hierarchical softmax.

  6. Goldberg Y, Levy O. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method[J]. Eprint Arxiv, 2014.
    —— A detailed derivation of the negative-sampling objective used in word2vec.

  7. Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Conference on Empirical Methods in Natural Language Processing. 2014.
    —— GloVe is a global log-bilinear regression model that combines the advantages of the two major model families: global matrix factorization and local context window methods.

  8. Bojanowski P, Grave E, Joulin A, et al. Enriching Word Vectors with Subword Information[J]. 2016.
    —— The model behind Facebook's fastText. It extends the Skip-gram model with a different scoring function that takes the internal structure of words into account by representing each word as a bag of character n-grams.

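  9. word2vec中的数学原理详解 (A Detailed Explanation of the Mathematical Principles in word2vec)
    —— A Chinese-language blog series on the mathematics behind word2vec.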
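
As a rough illustration of the negative-sampling objective covered in entries 4 to 6 above, the sketch below computes the loss for a single (center, context) pair using only the Python standard library. The function names and the toy vectors are made up for illustration. Drawing the negatives from a noise distribution (the word2vec paper uses the unigram distribution raised to the 3/4 power) is left out to keep the example short.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center, context, negatives):
    """Loss for one (center, context) pair with k negative words.

    loss = -log sigmoid(u_ctx . v_c) - sum_k log sigmoid(-u_neg_k . v_c)
    """
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


# Toy 2-dimensional vectors (illustrative values only).
v_center = [1.0, 0.0]
u_context = [2.0, 0.0]                      # aligned with the center vector
u_negatives = [[-2.0, 0.0], [-1.0, 1.0]]    # pointing away from it

# A true context word scored against dissimilar negatives gives a low loss.
good = negative_sampling_loss(v_center, u_context, u_negatives)
# Swapping the roles (a dissimilar "context", a similar "negative") gives a high loss.
bad = negative_sampling_loss(v_center, u_negatives[0], [u_context])
print(good < bad)
```

Each pair costs only k + 1 dot products, rather than one per vocabulary word as in a full softmax, which is where the training speed-up comes from.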


  1. Gensim is a free Python library designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible. It can also be used to train word2vec models.
  2. TensorFlow is an open source software library for machine intelligence. Its flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

