Semantic Segmentation of a clinical chart with Machine Learning using Python
Tahir Ahmad (~tahir)
In this talk I will discuss how a clinical chart can be segmented into semantically related components using Python and spaCy. A medical provider documents a patient's clinical chart in SOAP (Subjective, Objective, Assessment and Plan) format. Each SOAP segment contains semantically related information: the Subjective part contains the patient's history, Objective contains vital signs such as BMI and BP, Assessment contains the diagnosis given by the doctor, and Plan contains items such as prescribed medications.
Segmenting an unstructured chart into SOAP classes has many use cases. One of the major ones is helping medical practitioners review only the relevant section of a clinical chart, which can save time when they receive large clinical charts for a patient.
The dataset contains 1,000 de-identified clinical charts annotated by subject matter experts. Each chart is split into sentences, and each sentence is annotated with the SOAP class it belongs to. A sentence that doesn’t belong to any SOAP class is annotated as the NONSoap class.
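As a rough illustration of the sentence-level annotation described above, the sketch below shows one possible in-memory representation and a helper that groups sentences by class, which is the kind of lookup a reviewer would need to read only one section. The names AnnotatedSentence, chart_id and group_by_section are hypothetical, and the example sentences are invented for illustration; none of them come from the actual dataset.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five sentence-level classes named in the abstract.
LABELS = ("Subjective", "Objective", "Assessment", "Plan", "NONSoap")


@dataclass(frozen=True)
class AnnotatedSentence:
    chart_id: str  # hypothetical chart identifier
    text: str      # one sentence from the chart
    label: str     # one of LABELS


def group_by_section(sentences):
    """Collect sentence texts per label, keeping their original order."""
    sections = defaultdict(list)
    for sentence in sentences:
        if sentence.label not in LABELS:
            raise ValueError(f"unknown label: {sentence.label!r}")
        sections[sentence.label].append(sentence.text)
    return dict(sections)


# Invented sentences for illustration only.
chart = [
    AnnotatedSentence("chart-001", "Patient reports chest pain for two days.", "Subjective"),
    AnnotatedSentence("chart-001", "BP 140/90, BMI 27.", "Objective"),
    AnnotatedSentence("chart-001", "Essential hypertension.", "Assessment"),
    AnnotatedSentence("chart-001", "Start lisinopril 10 mg daily.", "Plan"),
    AnnotatedSentence("chart-001", "Electronically signed by provider.", "NONSoap"),
]

sections = group_by_section(chart)
print(sections["Plan"])
```

A reviewer interested only in the treatment plan would read sections["Plan"] and skip the rest of the chart.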
Proposed model: This work was done using two approaches:
Machine learning using the Random Forest algorithm, and deep learning using the spaCy text categorizer.
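A minimal sketch of what the Random Forest approach could look like with scikit-learn is given here. The abstract does not say which features or hyperparameters were used, so the TF-IDF features, the n-gram range, the number of trees and the toy training sentences are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented toy sentences; the real dataset is far larger.
train_texts = [
    "Patient complains of headache since last week.",
    "She denies fever or chills.",
    "BP 120/80, pulse 72, BMI 24.",
    "Temperature 98.6 F, weight 70 kg.",
    "Type 2 diabetes mellitus without complications.",
    "Acute bronchitis.",
    "Start metformin 500 mg twice daily.",
    "Follow up in two weeks.",
    "Electronically signed by provider.",
    "Page 2 of 5.",
]
train_labels = [
    "Subjective", "Subjective",
    "Objective", "Objective",
    "Assessment", "Assessment",
    "Plan", "Plan",
    "NONSoap", "NONSoap",
]

# Assumed feature choice: TF-IDF over unigrams and bigrams.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_labels)

# Classify a new sentence into one of the five classes.
predicted = model.predict(["Blood pressure 130/85 today."])
print(predicted[0])
```

With so few training sentences the prediction is not meaningful; the point is only the shape of the pipeline: vectorize each sentence, then classify it.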
I will discuss how a spaCy ensemble model based on a CNN and bag of words outperformed the Random Forest algorithm on this task.
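The following is a hedged sketch of training a spaCy text categorizer on sentence-level SOAP labels. It assumes the spaCy v3 API, which may differ from the version used in this work, and it relies on the pipeline's default "textcat" model; in the v3 releases I am aware of, that default combines a CNN-based token-to-vector layer with a bag-of-words model, but this should be checked against the installed version. The training sentences, the number of passes and the to_example helper are invented for illustration.

```python
import random

import spacy
from spacy.training import Example

LABELS = ["Subjective", "Objective", "Assessment", "Plan", "NONSoap"]

# Invented toy sentences; not taken from the dataset.
TRAIN = [
    ("Patient complains of headache since last week.", "Subjective"),
    ("BP 120/80, pulse 72, BMI 24.", "Objective"),
    ("Type 2 diabetes mellitus without complications.", "Assessment"),
    ("Start metformin 500 mg twice daily.", "Plan"),
    ("Electronically signed by provider.", "NONSoap"),
]

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # default model; assumed to be the ensemble
for label in LABELS:
    textcat.add_label(label)


def to_example(text, label):
    """Build a training example with exactly one positive category."""
    cats = {name: float(name == label) for name in LABELS}
    return Example.from_dict(nlp.make_doc(text), {"cats": cats})


examples = [to_example(text, label) for text, label in TRAIN]
optimizer = nlp.initialize(lambda: examples)

random.seed(0)
for _ in range(10):  # a handful of passes, only to show the training loop
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("Continue metformin 500 mg twice daily.")
best_label = max(doc.cats, key=doc.cats.get)
print(best_label, doc.cats[best_label])
```

Each sentence receives a score per class in doc.cats, and the highest-scoring class is taken as the predicted segment.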
Machine Learning, Python Programming
I am working as a Data Scientist at Episource LLC. My interests revolve around solving problems in the fields of Data Science and Computer Science. Currently at Episource I work extensively on Information and Entity Extraction using Machine Learning and Deep Learning techniques in Python. I have also worked as a Research Assistant in the Center for Data Science at IIIT-Bangalore. My work in the lab was on Data Mining; we developed an open-source Ruby gem called Akshaya, which implements various semantic mining algorithms. I worked to improve the performance of latent semantic text mining algorithms based on a large term co-occurrence graph and to improve the performance of an open-source graph database, Agama. I have also worked on medical information retrieval problems in the lab at IIIT-Bangalore and have developed a framework, UCliDSS, for medical decision support. At IIIT-Bangalore I have also worked on the streaming graph partitioning problem using Game Theory.