Title | Bert Embedding - Intro |
---|---|
Author | Isanka Rajapaksha |
Course | Communication Systems and Networks |
Institution | University of Moratuwa |
Introduction to BERT embeddings
BERT Embedding

Difference between BERT and Word2Vec

BERT offers an advantage over models like Word2Vec: while Word2Vec assigns each word a fixed representation regardless of the context in which the word appears, BERT produces word representations that are dynamically informed by the surrounding words. For example, given the two sentences:

"The man was accused of robbing a bank."
"The man went fishing by the bank of the river."

Word2Vec would produce the same word embedding for "bank" in both sentences, while under BERT the embedding for "bank" would be different in each. Beyond capturing obvious differences like polysemy, context-informed word embeddings capture other forms of information, which results in more accurate feature representations and, in turn, better model performance.

The current list of classes provided for fine-tuning by Hugging Face:

1. BertModel
2. BertForPreTraining
3. BertForMaskedLM
4. BertForNextSentencePrediction
5. BertForSequenceClassification
6. BertForTokenClassification
7. BertForQuestionAnswering
Input Formatting

1. A special token, [SEP], to mark the end of a sentence, or the separation between two sentences
2. A special token, [CLS], at the beginning of our text. This token is used for classification tasks, but BERT expects it no matter what your application is.
3. Tokens that conform with the fixed vocabulary used in BERT
4. The token IDs for the tokens, from BERT's tokenizer
5. Mask IDs to indicate which elements in the sequence are tokens and which are padding elements
6. Segment IDs used to distinguish different sentences
7. Positional embeddings used to show token position within the sequence
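The formatting steps above can be sketched in plain Python. The tiny vocabulary below is hypothetical, standing in for BERT's real 30,522-token WordPiece vocabulary; in practice `BertTokenizer` performs all of these steps:

```python
# Sketch of BERT input formatting with a toy, hypothetical vocabulary.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
         "the": 1, "man": 2, "robbed": 3, "a": 4, "bank": 5}

sentence = ["the", "man", "robbed", "a", "bank"]
max_len = 10  # fixed sequence length; shorter inputs are padded

# Steps 1-3: wrap the tokens with [CLS] at the start and [SEP] at the end.
tokens = ["[CLS]"] + sentence + ["[SEP]"]

# Step 4: map tokens to their IDs, then pad to max_len with [PAD].
input_ids = [vocab[t] for t in tokens]
input_ids += [vocab["[PAD]"]] * (max_len - len(input_ids))

# Step 5: mask IDs -- 1 for real tokens, 0 for padding elements.
attention_mask = [1 if i != vocab["[PAD]"] else 0 for i in input_ids]

# Step 6: segment IDs -- all 0 for a single sentence (a second sentence
# in a pair would get 1). Step 7 (positional embeddings) is handled
# inside the model itself from the token positions.
token_type_ids = [0] * max_len

print(input_ids)       # [101, 1, 2, 3, 4, 5, 102, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```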
Output

The model returns:

1. last_hidden_state
2. pooler_output
3. hidden_states
4. attentions

last_hidden_state: the sequence of hidden states at the output of the last layer of the model. Shape: (batch_size, sequence_length, hidden_size).

hidden_states: the hidden states of the model at every layer (the initial embedding output plus the output of each of the 12 layers, 13 tensors in total). Shape of each: (batch_size, sequence_length, hidden_size).
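The shapes above can be illustrated without downloading a model by simulating the output tuple with NumPy. The sizes are those of bert-base (hidden size 768, 12 layers); the values here are random stand-ins, not real model outputs:

```python
import numpy as np

batch_size, sequence_length, hidden_size = 2, 10, 768
num_layers = 12

# Simulated `hidden_states`: the initial embedding output plus one
# tensor per encoder layer, i.e. 13 tensors in total.
hidden_states = [np.random.rand(batch_size, sequence_length, hidden_size)
                 for _ in range(num_layers + 1)]

# `last_hidden_state` is simply the final entry of `hidden_states`.
last_hidden_state = hidden_states[-1]

print(len(hidden_states))       # 13
print(last_hidden_state.shape)  # (2, 10, 768)
```

With a real model, `hidden_states` is only returned when the model is called with `output_hidden_states=True`.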
Experiments

Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6878, test_f1: 0.6121 | test_acc: 0.6779, test_f1: 0.6154
Initial input embedding | test_acc: 0.6603, test_f1: 0.5796 | |
Sum of last four hidden | test_acc: 0.6891, test_f1: 0.6185 | |

Baseline models:

Model | test_acc | test_f1
---|---|---
GCN | 0.6297 | 0.5678
SDGCN | 0.67814 | 0.61214
Dear sir,

The progress of the last week is as follows: we have tested the remaining word-embedding strategies for both the bert-base-uncased model and the pretrained_BERT_further_trained_with_criminal_corpus model. The evaluation results are shown in the table below:

Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6981, test_f1: 0.6340 | test_acc: 0.6781, test_f1: 0.6263
Initial input embedding | test_acc: 0.6603, test_f1: 0.5796 | test_acc: 0.6670, test_f1: 0.5705
Summation of last four hidden | test_acc: 0.6958, test_f1: 0.6451 | test_acc: 0.6864, test_f1: 0.6395
Concatenation of last four hidden | test_acc: 0.6799, test_f1: 0.6377 | test_acc: 0.6930, test_f1: 0.6365
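The "summation" and "concatenation of last four hidden" strategies in the tables can be sketched as follows. The hidden states here are random NumPy stand-ins; an actual run would take `hidden_states` from the model output (with `output_hidden_states=True`):

```python
import numpy as np

batch_size, sequence_length, hidden_size = 2, 10, 768

# Stand-in for the 13 hidden-state tensors returned by bert-base.
hidden_states = [np.random.rand(batch_size, sequence_length, hidden_size)
                 for _ in range(13)]

# Summation of the last four hidden layers: stays 768-dimensional.
summed = sum(hidden_states[-4:])

# Concatenation of the last four hidden layers: 4 * 768 = 3072 dims.
concatenated = np.concatenate(hidden_states[-4:], axis=-1)

print(summed.shape)        # (2, 10, 768)
print(concatenated.shape)  # (2, 10, 3072)
```

Summation keeps the downstream classifier's input size unchanged, while concatenation quadruples it but preserves the individual layers' information.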
Embedding strategy | bert-base-uncased | pretrained_BERT_further_trained_with_criminal_corpus
---|---|---
last_hidden_state | test_acc: 0.6561, test_f1: 0.5830 | test_acc: 0.6258, test_f1: 0.5763
Summation of last four hidden | test_acc: 0.6530, test_f1: 0.6170 | test_acc: 0.6364, test_f1: 0.5861
Concatenation of last four hidden | test_acc: 0.6409, test_f1: 0.5877 | test_acc: 0.6500, test_f1: 0.5874
AEN BERT - Real values...