Build a Serverless Python NLP App

Senthil Kumar (~senthilkumarm1901)



Description:

I. What is the goal of this workshop?

In this workshop, we will guide participants hands-on through the process of building a Serverless Natural Language Processing (NLP) pipeline using Python. The pipeline will be capable of redacting sensitive information such as email addresses, phone numbers, and names (PII) from email bodies. Additionally, it will extract important details like sender, recipient, and subject from the emails and upload relevant metrics to a table. We will use examples from Enron Email data to test the above pipeline.
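
To make the goal concrete, here is a minimal sketch of the two text-level tasks described above: redacting email addresses and phone numbers, and pulling the sender, recipient, and subject out of a raw email. It is not the workshop's actual code. The regular expressions, the `[EMAIL]` and `[PHONE]` placeholders, and the function names are assumptions chosen for illustration, and the phone pattern only covers simple 3-3-4 digit formats.

```python
import re
from email import message_from_string

# Deliberately simple patterns for illustration (assumptions, not the workshop's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_email_and_phone(text: str) -> str:
    """Replace email addresses and 3-3-4 style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def extract_headers(raw_email: str) -> dict:
    """Read sender, recipient, and subject from the headers of a raw email."""
    msg = message_from_string(raw_email)
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
    }


# A made-up email used only to exercise the two functions.
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Q3 numbers\n"
    "\n"
    "Call me at 713-555-0142 or write to carol@example.com."
)

headers = extract_headers(sample)
body = message_from_string(sample).get_payload()
redacted_body = redact_email_and_phone(body)
```

Redacting names needs more than regular expressions, which is why the workshop uses a pre-trained model for that step.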

The pipeline is designed to pass on what we learned while building a real-world application with serverless services at Toyota Connected India.

Broadly, there are two major types of NLP pipelines:

  • 1. NLP pipeline in the ML world: Text pre-processing >> Feature Embeddings >> Prediction Modelling on Numericalized Embeddings
  • 2. NLP pipeline in the Data Engineering world: Transforming raw text data into useful outputs in a sequence of steps.

Of these two definitions, the second, the Data Engineering pipeline, is what we will build in this workshop.
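
The Data Engineering view of a pipeline, raw text passed through a sequence of steps, can be sketched in a few lines of plain Python. The step functions and the `run_pipeline` helper below are hypothetical names used only to illustrate the idea. In the workshop itself, each step runs as a separate AWS service rather than as a local function.

```python
import re
from functools import reduce
from typing import Callable, Iterable

# A step is any function that takes text and returns transformed text.
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda acc, step: step(acc), steps, text)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def mask_digits(text: str) -> str:
    """Replace every digit with '#'."""
    return re.sub(r"\d", "#", text)


result = run_pipeline("  Call   555  now ", [normalize_whitespace, mask_digits])
```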

We will build the PII redaction pipeline described above from scratch on AWS serverless services, using the AWS CLI.

Workshop participants are expected to take home the following:

  • Building with the AWS CLI helps us understand each cloud service we use in the NLP pipeline.

We will be using AWS services such as S3, API Gateway, Lambda and Step Functions for this workshop.

*Other common, reproducible ways of creating cloud services that we are NOT covering in this workshop: AWS Chalice, AWS SAM, AWS CDK, AWS CloudFormation and Terraform.

II. Outline of the Workshop

Total Time: 3 hours

  • 1. Setting up AWS Free Tier Account and Credentials, installing prerequisites, warming up to the repo structure (45 min)
  • 2. Building the Pipeline in the AWS CLI Way (A + B + C + D + E = 1 hour 45 min)
    • A. Creating an S3 bucket and adding a REST API endpoint | Sending data through the REST API with cURL (15 min)
    • B. Creating a simple Step Functions state machine using Amazon States Language and invoking it from a Lambda (30 min)
    • C. Creating a "Simple Lambda" (see the architecture diagram) that replaces phone numbers and email addresses (20 min)
    • D. Creating a "Container Lambda" that identifies and replaces names using a spaCy pre-trained model (20 min)
    • (if time permits) E. Creating a "Layer Lambda" that uploads the metrics generated to a table (20 min)
  • 3. Conclusion and Discussions (30 min)
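
As a rough preview of step B, the sketch below writes a two-state Amazon States Language definition as a Python dictionary and serializes it to JSON. The state names, the function names, and the `REGION` and `ACCOUNT_ID` parts of the ARNs are placeholders invented for this example, not the workshop's real resources. The small `execution_order` helper is also hypothetical and only handles a linear chain of Task states.

```python
import json

# Placeholder ARNs: REGION, ACCOUNT_ID and the function names are hypothetical.
SIMPLE_LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:simple-lambda"
CONTAINER_LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:container-lambda"

# Amazon States Language: each Task state names a resource and either
# points to the next state or marks the end of the execution.
state_machine_definition = {
    "Comment": "Sketch of a two-step PII redaction flow",
    "StartAt": "RedactEmailAndPhone",
    "States": {
        "RedactEmailAndPhone": {
            "Type": "Task",
            "Resource": SIMPLE_LAMBDA_ARN,
            "Next": "RedactNames",
        },
        "RedactNames": {
            "Type": "Task",
            "Resource": CONTAINER_LAMBDA_ARN,
            "End": True,
        },
    },
}


def execution_order(definition: dict) -> list:
    """Walk a linear state machine from StartAt until a state with End set."""
    order = []
    name = definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


# JSON text that could be written to a file for the AWS CLI to read.
definition_json = json.dumps(state_machine_definition, indent=2)
```

Saved to a file, JSON like this is the kind of input that `aws stepfunctions create-state-machine` accepts through its `--definition` option, alongside a name and an IAM role ARN.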

III. Pipeline to build in the AWS CLI

Project Workflow:

In case the above image is not rendered clearly, please visit https://github.com/senthilkumarm1901/aws_serverless_python_app/blob/main/docs/images/demo_pipeline_2.png

Complete Architecture we will build:

In case the above image is not rendered clearly, please visit https://github.com/senthilkumarm1901/serverless_python_nlp_app/blob/main/docs/images/pii_redaction_pipeline_architecture.png

Prerequisites:

  • A beginner knowledge in NLP
  • Before the workshop starts, the code will be available to git clone or download. Please have the following ready:
    • AWS Free Tier Account | source
    • Install AWS CLI v2 | How to setup AWS credentials | Follow this source
    • Install Taskfile | We are using this to run the AWS CLI commands as tasks
    • Install Docker (I use the open source Rancher Desktop; Installation Link)
    • Install jp (optional) | A command-line tool such as jp, which evaluates JMESPath expressions, can come in handy when reviewing the output of AWS services. More details.

Speaker Info:

Speaker 1:

Senthil Kumar is currently working as a Senior ML Engineer with Toyota Connected India (TCIN). He has spent 10+ years as a Data Scientist specializing in building Natural Language Processing applications using ML and DL.

At Toyota Connected India, he co-develops Speech and NLP applications in the AWS cloud with an awesome team of software engineers. Prior to that, he worked with Ford, where he contributed to building ML applications with NLP experts from the US Ford AI team in Michigan. At LatentView, his first analytics firm, he worked as a Text Data Analyst who pored over huge volumes of social media data to unearth insights.

Over the years, he has played the roles of an NLP trainer, a Python tutor, and a technical mentor. In the years to come, he strives to keep improving his skills in cloud engineering and MLOps while staying up to date in applied ML.

He has written 20+ blogs since 2020 here. Many of them are long-form blog posts.

Speaker 2:

Ayush is a dedicated professional capable of architecting and optimising cutting-edge solutions in Machine Learning, Artificial Intelligence, Cloud Computing, and Python development. His proficiency in automating processes, implementing best practices, and ensuring production readiness makes him a valuable asset in driving organisational growth and success. Ayush holds a master's degree from the Indian Institute of Technology, Gandhinagar.

Speaker Links:

Section: Cloud Computing
Type: Workshops
Target Audience: Beginner
Last Updated: