By: Lucas Cruz and Caio Santos
In: Legal and AI
March 17, 2023

Introducing LIBERT - Legal Inference with Bidirectional Encoder Representations from Transformers

In recent years, artificial intelligence advancements have significantly impacted various fields, including law. The increasing availability of legal data has opened new avenues for automating and streamlining legal processes, with a particular interest in applying natural language processing (NLP) techniques to analyze legal documents [1, 2, 3].

Deep learning breakthroughs have led to exceptional results in NLP tasks, with bidirectional transformers like BERT [5] and later work [6, 7, 8, 9] trained on large datasets to capture intricate relationships between words and their context. This project aims to harness these models to analyze initial petitions and predict lawsuit outcomes.

The LIBERT project uses state-of-the-art large language models (LLMs) to predict the outcomes of civil lawsuits from their initial petitions. By giving a legal team insights and decision-making support, the project seeks to optimize lawsuit negotiation, improve efficiency, and reduce financial risk.

The project has three primary objectives: (1) develop a deep learning algorithm that accurately predicts civil lawsuit outcomes, (2) produce high-quality academic research detailing the methodology and findings, and (3) build the team's expertise in legal concepts, data extraction from legal documents, and the application of LLMs to specific tasks.

By applying LLMs to civil lawsuit outcome prediction, LIBERT could save legal teams time and resources while improving prediction accuracy. If successful, the project will advance AI and deep learning applications in the legal sector and lay a foundation for future research and innovation.


A company serving thousands of customers faces significant legal risk exposure, and managing that exposure often requires extensive professional hours and resources. Such companies must represent themselves and manage legal processes, which is both time-consuming and costly. Furthermore, compensation claims can greatly affect a company's financial standing.

To tackle this challenge, the company must establish a clear and effective system for managing legal processes and responding to claims, utilizing the scalable power of AI [3, 4]. This involves automating and enhancing tasks to improve management, generate insights, and increase agility within the legal team responsible for handling civil lawsuits. By accomplishing this objective, the company can minimize time spent on potential losses, negotiate claims proactively, optimize human resources, and focus more on critical processes within the workflow.

The company in our case study faces a specific version of this problem: because of financial constraints, its legal team can negotiate only a limited number of lawsuits, which can have negative long-term consequences. Conserving resources may be necessary in the short term, but leaving certain lawsuits unsettled could result in higher damages in the long run. The company therefore needs to optimize its lawsuit settlement process and identify which cases to negotiate within its limited budget.
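The budget trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in LIBERT: the `Case` fields, the example figures, and the greedy ranking rule are all hypothetical, and `p_loss` stands in for whatever probability an outcome-prediction model would supply. The sketch ranks cases by expected saving per unit of settlement cost and settles them in that order until the budget runs out.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # All fields are hypothetical inputs chosen for illustration.
    case_id: str
    p_loss: float           # predicted probability of losing in court
    damages_if_lost: float  # estimated damages if the case is lost
    settlement_cost: float  # estimated cost of settling now

    @property
    def expected_saving(self) -> float:
        # Expected court loss avoided, minus what the settlement costs.
        return self.p_loss * self.damages_if_lost - self.settlement_cost


def select_cases_to_settle(cases, budget):
    """Greedy heuristic: prefer the best expected saving per unit of budget.

    This is not guaranteed to be optimal; an exact answer would require
    solving a knapsack problem.
    """
    # Only cases where settling is expected to be cheaper than litigating.
    candidates = [
        c for c in cases if c.expected_saving > 0 and c.settlement_cost > 0
    ]
    candidates.sort(
        key=lambda c: c.expected_saving / c.settlement_cost, reverse=True
    )
    chosen, remaining = [], budget
    for case in candidates:
        if case.settlement_cost <= remaining:
            chosen.append(case)
            remaining -= case.settlement_cost
    return chosen


# Made-up example portfolio and budget.
cases = [
    Case("A", p_loss=0.9, damages_if_lost=10_000, settlement_cost=4_000),
    Case("B", p_loss=0.2, damages_if_lost=8_000, settlement_cost=3_000),
    Case("C", p_loss=0.7, damages_if_lost=6_000, settlement_cost=2_000),
    Case("D", p_loss=0.6, damages_if_lost=20_000, settlement_cost=9_000),
]
selected = select_cases_to_settle(cases, budget=7_000)
print([c.case_id for c in selected])
```

In this made-up portfolio, case B is skipped because settling it is expected to cost more than litigating, and case D is skipped because it does not fit in the remaining budget. The quality of such a selection depends on how well calibrated the predicted probabilities are.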

What we’ve accomplished so far

The project's timeline is divided into three main phases: (1) developing a proof of concept, (2) refining and improving data, and (3) completing model training and validation. We’re currently in the second phase, improving data quality to further improve model performance.


  [1] Sharma, S., Gamoura, S., Prasad, D., & Aneja, A. (2021). Emerging Legal Informatics Towards Legal Innovation: Current Status and Future Challenges and Opportunities. Legal Information Management, 21(3-4), 218-235. doi:10.1017/S1472669621000384 Tag: sharma2021

  [2] Hongdao, Q., Bibi, S., Khan, A., Ardito, L., & Khaskheli, M. (2019). Legal Technologies in Action: The Future of the Legal Market in Light of Disruptive Innovations. Sustainability, 11(4), 1015. MDPI AG. Retrieved from Tag: hongdao2019

  [3] Armour, J., & Sako, M. (2020). AI-enabled business models in legal services: from traditional law firms to next-generation law companies? Journal of Professions and Organization, 7(1), 27-46. Tag: armour2020

  [4] The Law Society (2019). Lawtech Adoption Research Report. Available at: (Accessed: March 20, 2023). Tag: law2019

  [5] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Tag: devlin2018

  [6] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. Tag: sanh2019

  [7] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692. Tag: liu2019

  [8] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Annual Meeting of the Association for Computational Linguistics. Tag: lewis2019

  [9] Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2019). SpanBERT: Improving Pre-training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics, 8, 64-77. Tag: joshi2019

  [10] Butcher, H., & Georgiou, C. (2021). Leading in-house legal teams develop roadmap for the future. Ashurst. Available at: (Accessed: March 20, 2023). Tag: butcher2021

Lucas Cruz

Research and development engineer
Electrical and Computer Engineering - UFRJ - Federal University of Rio de Janeiro

Caio Santos

Research and development engineer
Nuclear Engineering - UFRJ - Federal University of Rio de Janeiro