Summary Observation

Hyperparameters Considered

Models

GLUE Benchmark Description

Benchmark Comparisons
| Benchmark | Dataset Size | Metric | Comments |
|---|---|---|---|
| CoLA | 10.657k | Matthews Corr | Grammaticality; single sentences drawn from 23 linguistics publications |
| SST-2 | 70.045k | Accuracy | Sentiment |
| MRPC | 6.212k | F1/Acc | Sentence-pair paraphrase |
| STS-B | 8.631k | Pearson/Spearman Corr | Sentence-pair similarity |
| QQP | ~795k | F1/Acc | Sentence pair (Quora question pairs) |
| MNLI | 431.997k | Accuracy | Multi-NLI; matched and mismatched |
| QNLI | 116.672k | Accuracy | Question NLI |
| RTE | 5.770k | Accuracy | Recognizing Textual Entailment |
| WNLI | 0.858k | Accuracy | Winograd NLI |
| SNLI | 569.036k | Accuracy | Stanford NLI (not part of GLUE) |
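
The sizes above appear to be totals across the train/validation/test splits. As a quick sketch of how any of these tasks and its official metric can be pulled down (assuming the Hugging Face `datasets` and `evaluate` libraries; the task name "cola" is just an example):

```python
from datasets import load_dataset
import evaluate

# The second argument selects the GLUE task: "cola", "sst2", "mrpc",
# "stsb", "qqp", "mnli", "qnli", "rte", or "wnli".
task = "cola"
dataset = load_dataset("glue", task)
print({split: len(dataset[split]) for split in dataset})  # split sizes

# Each task ships with its official metric: Matthews corr for CoLA,
# F1/accuracy for MRPC and QQP, Pearson/Spearman for STS-B, accuracy elsewhere.
metric = evaluate.load("glue", task)
print(metric.compute(predictions=[0, 1, 1], references=[0, 1, 0]))
```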

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
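
DeBERTa keeps separate vectors for a token's content and its relative position, and scores attention as the sum of content-to-content, content-to-position, and position-to-content terms (the position-to-position term is dropped). A heavily simplified PyTorch sketch of that decomposition, leaving out the per-pair relative-distance gathering the real model performs:

```python
import math
import torch

def disentangled_scores(Hq, Hk, Pq, Pk, d):
    """Schematic DeBERTa attention scores.
    H*: content projections, P*: relative-position projections.
    The real model indexes P by the relative distance of each (i, j)
    pair; here the three additive terms are shown in simplest form."""
    c2c = Hq @ Hk.transpose(-1, -2)  # content-to-content
    c2p = Hq @ Pk.transpose(-1, -2)  # content-to-position
    p2c = Pq @ Hk.transpose(-1, -2)  # position-to-content
    return (c2c + c2p + p2c) / math.sqrt(3 * d)  # 3 terms -> sqrt(3d) scaling

d = 64
Hq, Hk, Pq, Pk = (torch.randn(2, 10, d) for _ in range(4))
attn = torch.softmax(disentangled_scores(Hq, Hk, Pq, Pk, d), dim=-1)
print(attn.shape)  # torch.Size([2, 10, 10])
```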

Funnel Transformer

* Filtering out Sequential Redundancy for Efficient Language Processing (see the pooling sketch below)
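
The core idea: each encoder block pools the query sequence with strided mean pooling, so the sequence length shrinks block by block and the compute saved can be reinvested in depth/width. A minimal sketch of the pooling step, assuming PyTorch (the real model also keeps keys/values at full length in the first attention layer after each pooling so information from all positions can still flow in):

```python
import torch
import torch.nn.functional as F

def pool_queries(hidden, stride=2):
    """Strided mean pooling along the sequence axis: the funnel's
    compression step, halving the query length per block (stride=2)."""
    # hidden: (batch, seq_len, d_model) -> (batch, seq_len // stride, d_model)
    return F.avg_pool1d(hidden.transpose(1, 2), kernel_size=stride,
                        stride=stride).transpose(1, 2)

x = torch.randn(4, 128, 768)  # one block's input
q = pool_queries(x)           # queries pooled to length 64
print(q.shape)                # torch.Size([4, 64, 768])
```
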
Commentary on GLUE Results
Model Comparisons and Parameters
| Model | # Parameters | Comments |
|---|---|---|
| BERT-base | 110M | |
| BERT-large | 345M | |
| DeBERTa-1.5B | 1.5B | Surpasses human performance on SuperGLUE |
| DeBERTa-base | 134M | |
| DeBERTa-large | 390M | |
| Benchmark | Model / Hyperparameters | Eval Acc |
|---|---|---|
| WNLI | bert-base-uncased, 20 epochs, LR=2e-5, batch=32 | 0.338 |
| WNLI | bert-base-uncased, 5 epochs, LR=2e-4, batch=32 | 0.437 |
| WNLI | bert-base-uncased, 5 epochs, LR=2e-6, batch=32 | 0.563 |
| WNLI | bert-base-uncased, 5 epochs, LR=2e-7, batch=32 | 0.563 |
| WNLI | textattack/bert-base-uncased-WNLI, 5 epochs, LR=2e-5, batch=32 | 0.5 |
| WNLI | textattack/bert-base-uncased-WNLI, 5 epochs, LR=5e-5, batch=64 | 0.5 |
| WNLI | textattack/bert-base-uncased-WNLI, 5 epochs, LR=5e-5, batch=32, MaxSeq=256 | 0.5 |
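
A sketch of how one of the rows above can be reproduced with the Hugging Face `Trainer` (the output path is a placeholder, and the actual runs may have used the `run_glue.py` script instead, so treat this as an equivalent setup rather than the exact command):

```python
import numpy as np
from datasets import load_dataset
import evaluate
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
raw = load_dataset("glue", "wnli")
tok = AutoTokenizer.from_pretrained(model_name)

def encode(batch):
    # WNLI is a sentence-pair task
    return tok(batch["sentence1"], batch["sentence2"], truncation=True)

data = raw.map(encode, batched=True)
metric = evaluate.load("glue", "wnli")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1),
                          references=labels)

args = TrainingArguments(
    output_dir="wnli-bert",          # placeholder path
    num_train_epochs=20,             # swept: 5 or 20
    learning_rate=2e-5,              # swept: 2e-4 down to 2e-7
    per_device_train_batch_size=32,  # swept: 32 or 64
)
trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained(model_name,
                                                             num_labels=2),
    args=args, train_dataset=data["train"], eval_dataset=data["validation"],
    compute_metrics=compute_metrics, tokenizer=tok)
trainer.train()
print(trainer.evaluate())  # reports eval accuracy as in the table
```
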
Attempts at the large models: GLUE (Shibainu), 5 epochs, batch size = 8 unless noted
| Benchmark | Model | Eval Acc/F1 |
|---|---|---|
| MRPC | deberta-large | 0.911/0.936 |
| MRPC | funnel-transformer-large (batch=8) | 0.909/0.935 |
| MRPC | ernie-2.0-large | 0.895/0.925 |
| MRPC | funnel-transformer-large (batch=16) | 0.889/0.919 |
| MRPC | funnel-transformer-xlarge (batch=8) | 0.887/0.919 |
| MRPC | bert-large-uncased | 0.882/0.917 |
| MRPC | deberta-base (3 epochs) | 0.877/0.913 |
| MRPC | albert-large-v1 | 0.870/0.907 |
| MRPC | albert-large-v2 | 0.850/0.894 |
| MRPC | electra-large-discriminator | 0.684/0.812 |
| MRPC | roberta-large | 0.683/0.812 |
| Benchmark | Model | Eval Acc |
|---|---|---|
| WNLI | deberta-large (batch=8, 5 epochs) | 0.563 |
| WNLI | funnel-transformer-xlarge | 0.563 |
| WNLI | deberta-large (batch=8, 15 epochs) | 0.493 |
| WNLI | ernie-2.0-large | 0.423 |
| Benchmark | Model | Eval Pearson | Eval Spearman |
|---|---|---|---|
| STS-B | ernie-2.0-large | 0.924 | 0.921 |
| STS-B | funnel-transformer-xlarge | 0.921 | 0.920 |
| STS-B | deberta-large | 0.904 | 0.910 |
| STS-B | bert-large-uncased | 0.907 | 0.904 |
| Benchmark | Model | Eval Acc | Train Time |
|---|---|---|---|
| SST-2 | bert-large-uncased | 0.934 | 20,086 |
| SST-2 | ernie-2.0-large | 0.928 | 18,922 |
| SST-2 | deberta-large | 0.512 | |
| SST-2 | funnel-transformer-xlarge | 0.509 | |
| SST-2 | funnel-transformer-large (batch=16) | 0.509 | |
| Benchmark | Model | Eval Acc |
|---|---|---|
| RTE | funnel-transformer-xlarge | 0.902 |
| RTE | ernie-2.0-large | 0.830 |
| RTE | bert-large-uncased (batch=8) | 0.729 |
| RTE | deberta-large | 0.472 |
| Benchmark | Model | Eval Matthews Corr |
|---|---|---|
| CoLA | funnel-transformer-xlarge | 0.663 |
| CoLA | ernie-2.0-large | 0.660 |
| CoLA | bert-large | 0.619 |
| CoLA | deberta-large | 0 (??) |
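
For reference, all of the metrics reported above can be computed directly. A sketch with scikit-learn and scipy on illustrative inputs (note that a Matthews correlation of 0, as in the deberta-large row, means predictions no better than chance, e.g. a collapsed single-class output):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = np.array([1, 0, 1, 1, 0])        # illustrative labels
y_pred = np.array([1, 0, 0, 1, 0])        # illustrative predictions

print(accuracy_score(y_true, y_pred))     # SST-2, MNLI, QNLI, RTE, WNLI
print(f1_score(y_true, y_pred))           # MRPC, QQP (paired with accuracy)
print(matthews_corrcoef(y_true, y_pred))  # CoLA; 0 = no better than chance

s_true = np.array([4.2, 1.0, 3.5, 2.8])   # illustrative STS-B similarity scores
s_pred = np.array([4.0, 1.5, 3.0, 3.1])
print(pearsonr(s_true, s_pred)[0])        # STS-B Pearson
print(spearmanr(s_true, s_pred)[0])       # STS-B Spearman
```
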
Attempts at the small models: GLUE (Shibainu), batch size = 32
| Benchmark | Model |
|---|---|
| MRPC | funnel-transformer-intermediate (batch=32) |
| MRPC | funnel-transformer-intermediate (batch=48) |
| MRPC | deberta-base (3 epochs) |