Sequence Extraction in Legal Contracts
Anyone who has ever had to extract relevant information from loads of unstructured data (corporate or government documents, invoices, forms, etc.) knows how tedious, mundane and frustrating it can be, and has wished there were a quicker way to do it all.
Contracts are one such form of unstructured data, and extracting key information from them can be difficult for an untrained eye. It can take a lawyer a good 20 minutes to go through a single contract thoroughly, and god forbid one has to process hundreds of such documents. We tried to solve this problem by building custom sequence extraction models for legal contracts that analyse a contract and extract more than 22 key pieces of information in under a minute.
Through this talk, I hope to shed light on how a custom sequence extraction model can be built to extract relevant information from a legal document, and will go into depth on the following points:
- The unique nature of legal contracts and legal lingo, the steps taken to tackle the problem, and how the training data was prepared (the kinds of feature engineering done).
- A description and comparison of the various methods used (sliding-window + SVM, sliding-window + logistic regression, etc.), and how and why deep learning (BiLSTM + CRF) gave the best results.
- How the model can be generalised to other use cases.
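To give a flavour of the sliding-window baselines mentioned above: they frame extraction as per-token classification, where each token is described by features drawn from a small window of neighbouring tokens and a classifier (SVM, logistic regression, etc.) predicts a label per token. The sketch below is purely illustrative; the function name, feature set, and example sentence are assumptions, not SpotDraft's actual pipeline.

```python
def window_features(tokens, i, k=2):
    """Build a feature dict for token i from a window of +/- k tokens."""
    feats = {"word": tokens[i].lower(), "is_title": tokens[i].istitle()}
    for offset in range(-k, k + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad the window at sentence boundaries.
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"word[{offset:+d}]"] = word
    return feats

tokens = "This Agreement is made on 1 January 2020".split()
# One feature dict per token; these would be vectorised and fed to
# an SVM or logistic regression for per-token label prediction.
feats = [window_features(tokens, i) for i in range(len(tokens))]
```

A per-token classifier like this has no notion of label transitions, which is one reason a BiLSTM + CRF, whose CRF layer models dependencies between adjacent labels, can outperform it on entities of varying length.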
Attend this talk to learn how we built a custom sequence extractor that extracts entities of vastly varying lengths with more than 90% accuracy.
Basic understanding of Machine Learning.
Here at SpotDraft I have been working with various NLP methods, using tools like Stanford CoreNLP and spaCy, to provide meaningful insights from contracts. I have been involved in building an entity extraction model (BiLSTM + CRF), a multi-class text classifier (CNN), and object detection and segmentation models.
I have had the privilege of being involved in the entire life cycle of model creation: from annotating data using Prodigy, to building and improving the model, to serving model predictions by deploying a Docker container to Kubernetes clusters.
I recently gave a talk at PyData Delhi titled "From 'Hello World' to Production: Running Machine Learning Models in Production Using TensorFlow Serving and Kubernetes".