PhoBERT: Pre-trained language models for Vietnamese



Pre-trained models are available at:

Pre-trained PhoBERT models are the state-of-the-art language models for Vietnamese (Pho, i.e. “Phở”, is a popular food in Vietnam):

  • The two PhoBERT versions, “base” and “large”, are the first public large-scale monolingual language models pre-trained for Vietnamese. The PhoBERT pre-training approach is based on RoBERTa, which optimizes the BERT pre-training method for more robust performance.

  • PhoBERT outperforms previous monolingual and multilingual approaches, obtaining new state-of-the-art performance on three downstream Vietnamese NLP tasks: part-of-speech tagging, named-entity recognition, and natural language inference.

We release our PhoBERT models in popular open-source libraries, hoping that PhoBERT can serve as a strong baseline for future Vietnamese NLP research and applications.
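For readers who want to try the released models from Python, here is a minimal sketch of loading one through the Hugging Face `transformers` library. The model identifier `vinai/phobert-base`, the helper names `load_phobert` and `encode_sentence`, and the use of PyTorch are assumptions for illustration and are not stated in this post. Input is also assumed to be word-segmented already, with the syllables of a multi-syllable word joined by underscores.

```python
def load_phobert(model_name="vinai/phobert-base"):
    """Return (tokenizer, model) for a PhoBERT checkpoint.

    The default identifier is an assumption about where the checkpoint
    is hosted; replace it with the location given by the authors.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def encode_sentence(tokenizer, model, segmented_sentence):
    """Return the last-layer hidden states for one word-segmented sentence."""
    import torch

    inputs = tokenizer(segmented_sentence, return_tensors="pt")
    with torch.no_grad():  # no gradients are needed for feature extraction
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

Calling `load_phobert()` downloads the weights on first use, so it needs network access and enough disk space. The tensor returned by `encode_sentence` has one vector per subword token.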


Thank you for this project :smile: I hope to see a guide for loading this model through huggingface soon!


I have a question: when I use the word “vậy” I get an error, but with other words, for example “thế”, there is no error.

How can I fix this error? Thank you.


If you have any issues, please post them on the PhoBERT GitHub!


Problem solved!!!


I guess the word is not in the dictionary.
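To illustrate that guess (not the confirmed cause of the error above), here is a toy sketch of how a strict dictionary lookup fails on a word that is not in the dictionary, and how falling back to an unknown-token id avoids the failure. The vocabulary, the `<unk>` symbol, and the function names are hypothetical and are not taken from PhoBERT.

```python
UNK = "<unk>"

# Hypothetical toy dictionary; a real model's dictionary is far larger.
toy_vocab = {UNK: 0, "thế": 1, "nào": 2}


def strict_ids(tokens, vocab):
    """Map tokens to ids; raises KeyError for any token missing from vocab."""
    return [vocab[token] for token in tokens]


def safe_ids(tokens, vocab, unk=UNK):
    """Map tokens to ids, using the unknown-token id for missing tokens."""
    return [vocab.get(token, vocab[unk]) for token in tokens]
```

With this toy dictionary, `strict_ids(["vậy"], toy_vocab)` raises a `KeyError`, while `safe_ids(["vậy"], toy_vocab)` returns the id of `<unk>`.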