Transformer-related optimization, including BERT and GPT
Fix some bugs in v5.2
Fix the bug in the T5 model parallelism setting in v5.1