# NLP and Word Embeddings

## Word Embeddings

[0, 0, 0, ..., 1, ..., 0, 0]


The 1 marks the word's index in the dictionary; we denote this vector $O_{index}$. In the example above, man has index 5791 in the dictionary, so its representation is $O_{5791}$. Note that the inner product between the vectors of any two different words is 0. This means all words are mutually orthogonal: even when two words are related, the system cannot generalize from one to the other, for example

I want a glass of orange juice
I want a glass of apple ____
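A quick sketch of this orthogonality problem (the vocabulary size and word indices here are made up for illustration):

```python
import numpy as np

# toy vocabulary size; real dictionaries are much larger
vocab_size = 10000

def one_hot(index, size=vocab_size):
    """Return the one-hot vector O_index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

o_orange = one_hot(6257)   # hypothetical index for "orange"
o_apple = one_hot(456)     # hypothetical index for "apple"

# the inner product of any two distinct one-hot vectors is 0,
# so "orange" and "apple" look completely unrelated
print(o_orange @ o_apple)  # 0.0
```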


x = ["Sally", "Johnson", "is", "an", "orange", "farmer"]
y = [1, 1, 0, 0, 0, 0]


## Word2Vec

Word2Vec is a comparatively efficient algorithm for learning word embeddings. The rough idea: pick a context word, e.g. "orange", and a target word, e.g. "juice", then build a neural network that maps the context word to the target word, learning the embedding matrix in the process. The target word is usually a word near the context word, obtained by skipping a random number of words forward or backward from the context word.
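One way to generate such (context, target) pairs, sketched with a made-up window size (this deterministic variant pairs each context word with every neighbor; word2vec samples one at random):

```python
def skipgram_pairs(tokens, window=2):
    """Pair each context word with every target word
    within +/- window positions around it."""
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is never its own target
                pairs.append((context, tokens[j]))
    return pairs

sentence = "I want a glass of orange juice".split()
print(skipgram_pairs(sentence, window=2))
# includes pairs like ("orange", "juice") and ("orange", "of")
```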

```python
import torch
from torch import nn

# for each input word, predict the context words surrounding it
class SkipGram(nn.Module):
    def __init__(self, n_vocab, n_embed):
        super().__init__()
        self.embedding = nn.Embedding(n_vocab, n_embed)  # word index -> dense vector
        self.fc = nn.Linear(n_embed, n_vocab)            # score every word in the vocab
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.embedding(x)
        x = self.fc(x)
        x = self.softmax(x)
        return x
```
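A quick smoke test of the model above (the vocabulary and embedding sizes here are arbitrary; the class is repeated so the snippet is self-contained):

```python
import torch
from torch import nn

class SkipGram(nn.Module):  # as defined above
    def __init__(self, n_vocab, n_embed):
        super().__init__()
        self.embedding = nn.Embedding(n_vocab, n_embed)
        self.fc = nn.Linear(n_embed, n_vocab)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        return self.softmax(self.fc(self.embedding(x)))

model = SkipGram(n_vocab=5000, n_embed=300)
batch = torch.tensor([42, 7, 3019])      # a batch of 3 word indices
log_probs = model(batch)                 # log-probabilities over the vocab
print(log_probs.shape)                   # torch.Size([3, 5000])
# the learned word vectors live in model.embedding.weight
```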


`nn.Embedding` is just a lookup table: it stores one trainable vector per word index, and a forward pass simply selects rows of its weight matrix by index.
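A minimal illustration of that lookup behavior (sizes are arbitrary):

```python
import torch
from torch import nn

embed = nn.Embedding(10, 4)   # 10 words, 4-dimensional vectors
idx = torch.tensor([3, 7])
out = embed(idx)              # selects rows 3 and 7 of the weight matrix
print(torch.equal(out, embed.weight[idx]))  # True
```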

## Negative Sampling

Negative sampling reframes learning as binary classification: a true (context, target) pair is labeled 1, and pairs formed with randomly sampled words are labeled 0.

x1: (orange, juice), y1: 1
x2: (orange, king), y2: 0


context_embed = nn.Embedding(n_vocab, n_embed)
target_embed = nn.Embedding(n_vocab, n_embed)

$P(y=1 \mid c, t) = \sigma(\theta_t^\top e_c)$, where $e_c$ = `context_embed(c)` and $\theta_t$ = `target_embed(t)`.
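A minimal sketch of this model in PyTorch. The sizes, word indices, and the use of `binary_cross_entropy_with_logits` as the loss are my choices for illustration:

```python
import torch
from torch import nn

n_vocab, n_embed = 5000, 300
context_embed = nn.Embedding(n_vocab, n_embed)
target_embed = nn.Embedding(n_vocab, n_embed)

def pair_logits(c, t):
    """Dot product theta_t . e_c for each (context, target) pair."""
    return (context_embed(c) * target_embed(t)).sum(dim=1)

# one positive pair and one negative pair, as in the example above
c = torch.tensor([0, 0])        # "orange", "orange" (hypothetical indices)
t = torch.tensor([1, 2])        # "juice", "king"
y = torch.tensor([1.0, 0.0])

logits = pair_logits(c, t)
prob = torch.sigmoid(logits)    # P(y=1 | c, t)
loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
```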


| context | word  | target? |
|---------|-------|---------|
| orange  | juice | 1       |
| orange  | king  | 0       |
| orange  | book  | 0       |
| orange  | the   | 0       |
| orange  | of    | 0       |
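The negative words in the table above are drawn at random from the corpus; in the original word2vec paper they are sampled in proportion to the unigram frequency raised to the 3/4 power, which damps very frequent words like "the" and "of". A sketch with invented word counts:

```python
import numpy as np

# hypothetical corpus word counts
counts = {"the": 5000, "of": 3000, "book": 120, "king": 40, "juice": 30}
words = list(counts)
freq = np.array([counts[w] for w in words], dtype=float)

# word2vec heuristic: P(w) proportional to count(w)^0.75
p = freq ** 0.75
p /= p.sum()

rng = np.random.default_rng(0)
negatives = rng.choice(words, size=4, p=p)
print(negatives)
```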


TBD