Course notes for Stanford CS224n: Natural Language Processing with Deep Learning, Winter 2019 (using PyTorch)
Some general notes I'll write in my Deep Learning Practice repository
Course Related Links
- Lecture
- Assignment
- Project
- Paper reading
- Derivation
Put the `glove.6B.*d.txt` files into the `embedding/GloVe` directory.
Outline
CS 168: The Modern Algorithmic Toolbox (for SVD)
mentioned CS103, CS228
- N-gram Language Model
- Fixed-window Neural Language Model
- Vanilla RNN
  - vanishing gradient =>
- LSTM and GRU
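A minimal PyTorch sketch of an RNN language model in this spirit (sizes and names are my own illustrative assumptions, not the course's exact model):

```python
import torch.nn as nn

class RNNLM(nn.Module):
    """Minimal recurrent language model: embed -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM gating mitigates the vanishing gradients of a vanilla RNN
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):              # tokens: (batch, seq_len) int64 ids
        output, _ = self.rnn(self.embed(tokens))
        return self.proj(output)            # next-token logits at every position
```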
- Training method: Teacher Forcing
  - During training, we feed the gold (a.k.a. reference) target sentence into the decoder, regardless of what the decoder predicts (see the first sketch after this list).
- During testing (decoding): Beam Search vs. Greedy Decoding
  - Decoding algorithm: an algorithm you use to generate text from your language model
  - Greedy Decoding => lack of backtracking
    - on each step, take the most probable word (i.e. argmax)
    - use that as the next word, and feed it as input on the next step
    - keep going until you produce <END> or reach some max length
  - Beam Search: aims to find a high-probability sequence by tracking multiple possible sequences at once (see the decoding sketch after this list)
    - on each step of the decoder, keep track of the k (beam size) most probable partial sequences (hypotheses)
    - after you reach some stopping criterion (you have n complete hypotheses, where each hypothesis stops when it reaches max depth or produces <END>), choose the sequence with the highest probability (with score normalization)
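A minimal sketch of teacher forcing, assuming a hypothetical seq2seq interface where `model(src, tgt_prefix)` returns per-position next-token logits (not the assignment's actual API):

```python
def train_step(model, src, tgt, optimizer, criterion):
    """One teacher-forced step: the decoder always sees the gold prefix."""
    optimizer.zero_grad()
    logits = model(src, tgt[:, :-1])                 # feed gold tokens y_1..y_{T-1}
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     tgt[:, 1:].reshape(-1))         # predict y_2..y_T
    loss.backward()
    optimizer.step()
    return loss.item()
```

And hedged sketches of the two decoding algorithms, assuming a hypothetical `model.step(src, prefix)` that returns logits over the vocabulary for the next token:

```python
import torch
import torch.nn.functional as F

def greedy_decode(model, src, bos_id, eos_id, max_len=100):
    """Greedy decoding: argmax at each step, no backtracking."""
    ys = [bos_id]
    for _ in range(max_len):
        next_id = model.step(src, torch.tensor([ys])).argmax(dim=-1).item()
        ys.append(next_id)
        if next_id == eos_id:                        # stop on <END>
            break
    return ys

def beam_search(model, src, bos_id, eos_id, k=5, max_len=100):
    """Track the k most probable partial hypotheses at every step."""
    beams = [([bos_id], 0.0)]                        # (tokens, cumulative log-prob)
    complete = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            log_probs = F.log_softmax(model.step(src, torch.tensor([tokens])), dim=-1)
            top_lp, top_id = log_probs.squeeze(0).topk(k)
            for lp, idx in zip(top_lp.tolist(), top_id.tolist()):
                candidates.append((tokens + [idx], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            # a hypothesis that produced <END> is complete; others stay on the beam
            (complete if tokens[-1] == eos_id else beams).append((tokens, score))
            if len(beams) == k:
                break
        if len(complete) >= k or not beams:          # stopping criterion
            break
    # score normalization: divide by length so longer hypotheses aren't penalized
    return max(complete or beams, key=lambda c: c[1] / len(c[0]))[0]
```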
ELMo, BERT
guest lecture
Self-attention, Transformer
Vanishing Gradient, LSTM, GRU (again)
Some more attention; mentioned CS 276: Information Retrieval and Web Search
Quick notes about QA:
mentioned CS231n: Convolutional Neural Networks for Visual Recognition
Lots of common techniques (nowadays)
fastText
Outline
Softmax temperature: another way to control diversity
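A quick demo of temperature scaling (plain PyTorch, illustrative values):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5])

# Dividing logits by a temperature T before the softmax:
# T > 1 flattens the distribution (more diverse samples),
# T < 1 sharpens it (closer to greedy/argmax).
for T in (0.5, 1.0, 2.0):
    print(T, F.softmax(logits / T, dim=-1))
```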
Outline
Related
Others' Answer
- `python3 parser_transitions.py part_c`: check the correctness of the transition mechanics (see the sketch after this list)
- `python3 parser_transitions.py part_d`: check the correctness of minibatch parsing
- `python3 run.py`
  - `debug=True` to test the process (`debug_out.log`)
  - `debug=False` to train on the entire dataset (`train_out.log`)
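For reference, a simplified sketch of those transition mechanics (an arc-standard shift-reduce parser; this is my own condensed version, not the assignment's exact code):

```python
class PartialParse:
    """Arc-standard transition system: SHIFT, LEFT-ARC, RIGHT-ARC."""
    def __init__(self, sentence):
        self.stack = ["ROOT"]
        self.buffer = list(sentence)     # words not yet processed
        self.dependencies = []           # (head, dependent) pairs

    def parse_step(self, transition):
        if transition == "S":            # SHIFT: move the next buffer word onto the stack
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":         # LEFT-ARC: second-to-top becomes a dependent of top
            dependent = self.stack.pop(-2)
            self.dependencies.append((self.stack[-1], dependent))
        elif transition == "RA":         # RIGHT-ARC: top becomes a dependent of second-to-top
            dependent = self.stack.pop()
            self.dependencies.append((self.stack[-1], dependent))
```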
Outline
Others' Answer
The BLEU metric is computed with `nltk.translate.bleu_score`.
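A tiny usage example (my own toy sentences; `corpus_bleu` returns a value in [0, 1], so a "Corpus BLEU: 22.36..." style report corresponds to that value scaled by 100):

```python
from nltk.translate.bleu_score import corpus_bleu

# Each hypothesis is paired with a list of tokenized references.
references = [[["the", "cat", "sits", "on", "the", "mat"]]]
hypotheses = [["the", "cat", "sits", "on", "the", "rug"]]

print(corpus_bleu(references, hypotheses) * 100)
```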
- `python3 sanity_check.py 1d`: check the correctness of the encode procedure (including `utils.pad_sents`)
- `python3 sanity_check.py 1e`: check the correctness of the decode procedure (including the step function)
- `sh run.sh vocab`: get the necessary vocabulary
- Train locally with `sh run.sh train_local`; test with `sh run.sh test_local`
- Train with `sh run.sh train`; test with `sh run.sh test` (the model is saved to `model.bin`, the optimizer's state to `model.bin.optim`)
  - `epoch 13, iter 86000, cum. loss 28.94, cum. ppl 5.13 cum. examples 64000`
  - => Corpus BLEU: 22.36579929869114
- `vim -dO outputs/test_outputs.txt en_es_data/test.en`: diff the model's output against the English reference
- `vim -o outputs/test_outputs.txt en_es_data/test.en en_es_data/test.es`: view the output, reference, and Spanish source in split windows
Others' Answer
Build a character-level ConvNet
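A hedged sketch of such a character-level convolutional word encoder (sizes are illustrative assumptions; the actual assignment also adds a highway layer on top, omitted here):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level ConvNet: char embeddings -> Conv1d -> max-pool => word vector."""
    def __init__(self, num_chars, char_embed=50, word_embed=256, kernel_size=5):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_embed)
        self.conv = nn.Conv1d(char_embed, word_embed, kernel_size)

    def forward(self, chars):                        # chars: (batch, max_word_len)
        x = self.char_embed(chars).transpose(1, 2)   # (batch, char_embed, max_word_len)
        x = torch.relu(self.conv(x))                 # (batch, word_embed, L_out)
        return x.max(dim=-1).values                  # pool over positions => (batch, word_embed)
```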
- `sh run.sh vocab`
  - `vocab_tiny_q1.json`: generated vocabulary, source 132 words, target 132 words
  - `vocab_tiny_q2.json`: generated vocabulary, source 26 words, target 32 words
  - `vocab.json`: generated vocabulary, source 50004 words, target 50002 words
- `python3 sanity_check.py [part]`
- `sh run.sh train_local_q1`: this will run 100 epochs
  - `epoch 100, iter 500, cum. loss 0.31, cum. ppl 1.02 cum. examples 200`
  - `validation: iter 500, dev. ppl 1.003381`
- `sh run.sh test_local_q1`: the model should overfit => Corpus BLEU: 99.29792465574434 (> 99)
  - output: `outputs/test_outputs_local_q1.txt`
- `sh run.sh train_local_q2`
  - `epoch 200, iter 1000, cum. loss 0.26, cum. ppl 1.01 cum. examples 200`
  - `validation: iter 1000, dev. ppl 1.003469`
- `sh run.sh test_local_q2`: the model should overfit => Corpus BLEU: 99.29792465574434
  - output: `outputs/test_outputs_local_q2.txt`
- `sh run.sh train`, then test the performance with `sh run.sh test`
  - `epoch 29, iter 196330, avg. loss 90.37, avg. ppl 147.15 cum. examples 10537, speed 3512.25 words/sec, time elapsed 29845.45 sec`
  - `reached maximum number of epochs!`
  - => Corpus BLEU: 24.20035238301319
TODO: (... <unk> words)
SQuAD is NOT a Natural Language Generation task (since the answer is extracted from the text).
Default final project
Recommended in Lecture 11
PyTorch notes
- Element-wise multiplication: `A * B`, `torch.mul(A, B)`, `A.mul(B)`
- Matrix multiplication: `A @ B`, `torch.matmul(A, B)`, `torch.mm`, `torch.bmm`, ...
- Calling `.view()` on a non-contiguous tensor => error (only on CPU, because `tensor.cuda()` automatically makes the tensor contiguous)
  - `.contiguous().view()` => okay
  - `.reshape()` => okay
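A quick demo of the contiguity behavior:

```python
import torch

A = torch.arange(6).view(2, 3)
B = A.t()                        # transpose: a non-contiguous view of the same storage

# B.view(-1)                     # would raise a RuntimeError on CPU (non-contiguous)
print(B.contiguous().view(-1))   # okay: copy into contiguous memory first
print(B.reshape(-1))             # okay: reshape copies only when it has to
```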