Title | Bert Embedding - Intro |
---|---|
Author | Isanka Rajapaksha |
Course | Communication Systems and Networks |
Institution | University of Moratuwa |
Introduction to BERT embeddings
BERT Embedding

Difference between BERT and Word2Vec

BERT offers an advantage over models like Word2Vec: while Word2Vec assigns each word a fixed representation regardless of the context in which the word appears, BERT produces word representations that are dynamically informed by the surrounding words. For example, given the two sentences:

"The man was accused of robbing a bank."
"The man went fishing by the bank of the river."

Word2Vec would produce the same word embedding for "bank" in both sentences, while under BERT the embedding for "bank" would be different in each. Beyond capturing obvious differences like polysemy, context-informed word embeddings capture other forms of information, which results in more accurate feature representations and, in turn, better model performance.

The current list of classes provided for fine-tuning by Hugging Face:

1. BertModel
2. BertForPreTraining
3. BertForMaskedLM
4. BertForNextSentencePrediction
5. BertForSequenceClassification
6. BertForTokenClassification
7. BertForQuestionAnswering
Input Formatting

1. A special token, [SEP], to mark the end of a sentence, or the separation between two sentences
2. A special token, [CLS], at the beginning of our text. This token is used for classification tasks, but BERT expects it no matter what your application is.
3. Tokens that conform with the fixed vocabulary used in BERT
4. The token IDs for the tokens, from BERT's tokenizer
5. Mask IDs to indicate which elements in the sequence are tokens and which are padding elements
6. Segment IDs used to distinguish different sentences
7. Positional embeddings used to show token position within the sequence
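The formatting steps above can be sketched in plain Python. The tiny vocabulary below is hypothetical, standing in for BERT's real 30,522-token WordPiece vocabulary; in practice `BertTokenizer` performs all of these steps:

```python
# Sketch of BERT input formatting with a toy, hypothetical vocabulary.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
         "the": 1, "man": 2, "robbed": 3, "a": 4, "bank": 5}

sentence = ["the", "man", "robbed", "a", "bank"]
max_len = 10  # fixed sequence length; shorter inputs are padded

# Steps 1-3: wrap the tokens with [CLS] at the start and [SEP] at the end.
tokens = ["[CLS]"] + sentence + ["[SEP]"]

# Step 4: map tokens to their IDs, then pad to max_len with [PAD].
input_ids = [vocab[t] for t in tokens]
input_ids += [vocab["[PAD]"]] * (max_len - len(input_ids))

# Step 5: mask IDs -- 1 for real tokens, 0 for padding elements.
attention_mask = [1 if i != vocab["[PAD]"] else 0 for i in input_ids]

# Step 6: segment IDs -- all 0 for a single sentence (a second sentence
# in a pair would get 1). Step 7 (positional embeddings) is handled
# inside the model itself from the token positions.
token_type_ids = [0] * max_len

print(input_ids)       # [101, 1, 2, 3, 4, 5, 102, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```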
Output

The model returns:

1. last_hidden_state
2. pooler_output
3. hidden_states
4. attentions

last_hidden_state: the sequence of hidden states at the output of the last layer of the model. Shape: (batch_size, sequence_length, hidden_size).

hidden_states: the hidden states of the model at every layer (the initial embedding output plus the output of each of the 12 layers, 13 tensors in total). Shape of each: (batch_size, sequence_length, hidden_size).
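The shapes above can be illustrated without downloading a model by simulating the output tuple with NumPy. The sizes are those of bert-base (hidden size 768, 12 layers); the values here are random stand-ins, not real model outputs:

```python
import numpy as np

batch_size, sequence_length, hidden_size = 2, 10, 768
num_layers = 12

# Simulated `hidden_states`: the initial embedding output plus one
# tensor per encoder layer, i.e. 13 tensors in total.
hidden_states = [np.random.rand(batch_size, sequence_length, hidden_size)
                 for _ in range(num_layers + 1)]

# `last_hidden_state` is simply the final entry of `hidden_states`.
last_hidden_state = hidden_states[-1]

print(len(hidden_states))       # 13
print(last_hidden_state.shape)  # (2, 10, 768)
```

With a real model, `hidden_states` is only returned when the model is called with `output_hidden_states=True`.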
Experiments

Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6878, test_f1: 0.6121 | test_acc: 0.6779, test_f1: 0.6154
Initial input embedding | test_acc: 0.6603, test_f1: 0.5796 | |
Sum of last four hidden | test_acc: 0.6891, test_f1: 0.6185 | |

Baseline models:

Model | test_acc | test_f1
---|---|---
GCN | 0.6297 | 0.5678
SDGCN | 0.67814 | 0.61214
Dear sir,

The progress of the last week is as follows: we have tested the remaining word-embedding strategies for both the bert-base-uncased model and the pretrained_BERT_further_trained_with_criminal_corpus model. The evaluation results are shown in the table below:

Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6981, test_f1: 0.6340 | test_acc: 0.6781, test_f1: 0.6263
Initial input embedding | test_acc: 0.6603, test_f1: 0.5796 | test_acc: 0.6670, test_f1: 0.5705
Summation of last four hidden | test_acc: 0.6958, test_f1: 0.6451 | test_acc: 0.6864, test_f1: 0.6395
Concatenation of last four hidden | test_acc: 0.6799, test_f1: 0.6377 | test_acc: 0.6930, test_f1: 0.6365
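The "summation" and "concatenation of last four hidden" strategies in the tables can be sketched as follows. The hidden states here are random NumPy stand-ins; an actual run would take `hidden_states` from the model output (with `output_hidden_states=True`):

```python
import numpy as np

batch_size, sequence_length, hidden_size = 2, 10, 768

# Stand-in for the 13 hidden-state tensors returned by bert-base.
hidden_states = [np.random.rand(batch_size, sequence_length, hidden_size)
                 for _ in range(13)]

# Summation of the last four hidden layers: stays 768-dimensional.
summed = sum(hidden_states[-4:])

# Concatenation of the last four hidden layers: 4 * 768 = 3072 dims.
concatenated = np.concatenate(hidden_states[-4:], axis=-1)

print(summed.shape)        # (2, 10, 768)
print(concatenated.shape)  # (2, 10, 3072)
```

Summation keeps the downstream classifier's input size unchanged, while concatenation quadruples it but preserves the individual layers' information.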
Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6561, test_f1: 0.5830 | test_acc: 0.6258, test_f1: 0.5763
Summation of last four hidden | test_acc: 0.6530, test_f1: 0.6170 | test_acc: 0.6364, test_f1: 0.5861
Concatenation of last four hidden | test_acc: 0.6409, test_f1: 0.5877 | test_acc: 0.6500, test_f1: 0.5874
AEN BERT - Real values...