Exploring CoNLL-U Annotation Schema for Linguistic Structures with Python

Siddharth Gupta (~sidgupta234) | 26 Jul, 2023

Description:

We will explore the CoNLL-U annotation schema for describing linguistic structures and its applications in natural language processing. We will take a practical approach and use the CoNLL-U Python library to analyze a real-world CoNLL-U annotated corpus.

We will start by providing an overview of the CoNLL-U schema and its structure, followed by a deep dive into the functionalities provided by the CoNLL-U library. We will explore the library's capabilities to parse the data, access its various features, and perform linguistic analysis using it. We will showcase examples of using the library to extract meaningful information from annotated text data and how it can be leveraged for a variety of NLP tasks.
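
To make the structure concrete before any library is involved, here is a minimal sketch of what reading CoNLL-U involves. It relies only on the format itself: sentences are separated by blank lines, comment lines start with `#`, and each token line carries ten tab-separated fields. The sample sentence and the names `FIELDS`, `parse_feats`, and `parse_conllu` are illustrative assumptions, not the API of the library covered in the talk.

```python
# Hand-rolled sketch for illustration only; a dedicated library handles
# many more details (validation, trees, serialization).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Invented sample sentence in CoNLL-U format.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = The dog barks.\n"
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "3\tbarks\tbark\tVERB\tVBZ\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def parse_feats(value):
    """Turn 'Number=Sing|Person=3' into a dict; '_' means no features."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_conllu(text):
    """Return a list of sentences, each with 'metadata' and 'tokens'."""
    sentences = []
    metadata, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"metadata": metadata, "tokens": tokens})
            metadata, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines such as '# text = ...' hold sentence metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(f"expected 10 fields, got {len(columns)}")
            # IDs and heads stay strings: multiword tokens use ranges
            # like '1-2' and empty nodes use decimals like '1.1'.
            token = dict(zip(FIELDS, columns))
            token["feats"] = parse_feats(token["feats"])
            tokens.append(token)
    if tokens:  # the final sentence may lack a trailing blank line
        sentences.append({"metadata": metadata, "tokens": tokens})
    return sentences


sentences = parse_conllu(SAMPLE)
for token in sentences[0]["tokens"]:
    print(token["id"], token["form"], token["upos"], token["feats"])
```

Keeping each token as a plain dictionary keyed by the ten field names mirrors the column layout of the format, which makes it easy to compare against the raw file while learning the schema.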

By the end of this talk, you will have gained a comprehensive understanding of the CoNLL-U schema and the CoNLL-U Python library. You will be able to confidently apply these tools to conduct linguistic analysis and process annotated text data for a range of NLP tasks.

Talk breakdown:

Introduction (0:00 - 2:00)
- Explanation of the talk's topic and goals
- Brief overview of the CoNLL-U annotation schema and its applications in NLP

Overview of CoNLL-U Schema (2:00 - 7:00)
- Detailed explanation of the CoNLL-U schema's structure and components
- Examples of linguistic structures and features that can be described using the schema

CoNLL-U Python Library (7:00 - 18:00)
- Introduction to the CoNLL-U Python library and its functionalities
- Examples of how the library can be used to parse data and access its features
- Explanation of how to perform linguistic analysis using the library
- Examples of using the library to extract meaningful information from annotated text data

Applications of CoNLL-U in NLP (18:00 - 21:00)
- Examples of how the CoNLL-U schema and Python library can be leveraged for a variety of NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing

Conclusion (21:00 - 25:00)
- Recap of the talk's main points and takeaways

Q&A (25:00 - 30:00)
- Invitation for questions from the audience
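
As a rough illustration of the kind of analysis the breakdown refers to, the sketch below counts part-of-speech tags and lists dependency relations for one annotated sentence. The sentence is invented, the variable names are hypothetical, and the code uses only the standard library rather than the library's own API.

```python
from collections import Counter

# Invented sentence in CoNLL-U format (ten tab-separated fields per token).
SENTENCE = "\n".join([
    "1\tShe\tshe\tPRON\tPRP\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\treads\tread\tVERB\tVBZ\tNumber=Sing|Person=3\t0\troot\t_\t_",
    "3\told\told\tADJ\tJJ\tDegree=Pos\t4\tamod\t_\t_",
    "4\tbooks\tbook\tNOUN\tNNS\tNumber=Plur\t2\tobj\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
])

# Keep only the columns needed here. Integer conversion is safe because
# this sample has no multiword tokens or empty nodes.
tokens = []
for line in SENTENCE.splitlines():
    cols = line.split("\t")
    tokens.append({
        "id": int(cols[0]),
        "form": cols[1],
        "lemma": cols[2],
        "upos": cols[3],
        "head": int(cols[6]),
        "deprel": cols[7],
    })

# Part-of-speech distribution over the sentence.
upos_counts = Counter(token["upos"] for token in tokens)

# Dependency triples (head form, relation, dependent form); head 0 is the root.
forms = {token["id"]: token["form"] for token in tokens}
forms[0] = "ROOT"
dependencies = [
    (forms[token["head"]], token["deprel"], token["form"])
    for token in tokens
]

print(upos_counts)
for head, relation, dependent in dependencies:
    print(f"{head} --{relation}--> {dependent}")
```

The same two passes (a tag counter and a head lookup) extend naturally to a whole corpus by looping over its sentences.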

Prerequisites:

Familiarity with the Python programming language and its syntax is recommended.

A basic understanding of NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing, along with some knowledge of linguistic concepts such as word tokens, part-of-speech tags, syntactic dependencies, and morphological features, would be helpful.

Speaker Info:

Siddharth is a Data Analyst at Godrej Capital with an interest in programming, deep learning, and academia! They write Twitter threads across the three topics. When not consumed with work, they post YouTube videos, make Discord bots, play around on GitHub, or try threading some words together on their Medium blog.

Speaker Links:

GitHub, LinkedIn

Section:	Data Science, AI & ML
Type:	Talks
Target Audience:	Intermediate
Last Updated:	27 Jul, 2023
