End-to-end project on predicting collective sentiment for programming languages using StackOverflow answers

sayakpaul


Description:

With a plethora of programming languages and a diverse population of developers working with them, an interesting question arises: “How happy are the developers of any given language?”

Sentiment about a language often creeps into the StackOverflow answers that users write. By performing sentiment analysis on those answers, we can aggregate the average sentiment per language, which answers our question of interest.

The presenters build an end-to-end project that begins with pulling data from the StackOverflow API, continues with building the collective sentiment prediction model, and ends with deploying it as an API on GCP Compute Engine.
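
To make the aggregation idea concrete, here is a minimal sketch of averaging per-answer sentiment by language. The sample pairs, the score range of -1 to 1, and the function name are assumptions made for illustration; they are not the workshop's data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (language, sentiment score) pairs. The [-1, 1] score range is
# an assumed convention, not something taken from the workshop material.
scored_answers = [
    ("python", 0.8),
    ("python", 0.4),
    ("java", -0.2),
    ("java", 0.6),
    ("rust", 0.9),
]


def average_sentiment_by_language(pairs):
    """Group per-answer scores by language and return the mean of each group."""
    grouped = defaultdict(list)
    for language, score in pairs:
        grouped[language].append(score)
    return {language: mean(scores) for language, scores in grouped.items()}


collective = average_sentiment_by_language(scored_answers)
for language, score in sorted(collective.items()):
    print(f"{language}: {score:.2f}")
```

In a real pipeline the scores would come from the sentiment model rather than a hand-written list.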

Outline/Structure of the workshop:

  • Scraping data from StackOverflow using the StackOverflow API
  • Investigating the data and preprocessing it as necessary
  • Serializing the data into a usable format (such as .csv) for further usage
  • Discussion on the basics of NLP and some of the classical NLP techniques like count vectorization, TF-IDF, etc.
  • Preparing the collective sentiment prediction model
  • Evaluating the model
  • Deploying the model as a REST API on GCP Compute Engine
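
As an illustration of the classical techniques named in the outline, the sketch below implements count vectorization and TF-IDF in plain Python. It uses the textbook weighting tf × log(N / df); libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. The sample answers and all function names are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_vectorize(documents):
    """Return the sorted vocabulary and one term-count Counter per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, counts


def tf_idf(documents):
    """Return the vocabulary and one TF-IDF vector per document.

    tf is the term count divided by the document length, and
    idf is log(number of documents / number of documents containing the term).
    """
    vocabulary, counts = count_vectorize(documents)
    n_docs = len(documents)
    # Each Counter contributes each of its terms once, giving document frequency.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append([
            (c[term] / total) * math.log(n_docs / doc_freq[term]) if total else 0.0
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical answers, made up for this example.
answers = [
    "I love how clean this Python code is",
    "This Java code is verbose",
    "I love Rust",
]
vocabulary, vectors = tf_idf(answers)
```

Terms that appear in only one answer, such as "rust", receive the highest weights, while terms shared across answers are down-weighted.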

Learning Outcome:

This tutorial presents the typical workflow of a full-stack data science project. The learning outcomes, in brief:

  • Collecting data for a given problem statement when the data is not directly available
  • Investigating the data from a Data Scientist's perspective
  • Building simple NLP models
  • Deploying a model as an API on the web
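
The last outcome can be sketched as a framework-independent request handler: a function that takes a JSON request body and returns an HTTP status code with a JSON response. The word lists, the scoring rule, the "text" field name, and the function names are all stand-ins assumed for this example; the workshop's actual model and API shape are not specified here.

```python
import json

# Stand-in lexicon (hypothetical). A trained model would replace this.
POSITIVE = {"love", "great", "clean"}
NEGATIVE = {"hate", "verbose", "broken"}


def score_text(text):
    """Count positive words minus negative words in the text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def handle_predict(raw_body):
    """Turn a JSON request body into (HTTP status code, JSON response body)."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' field"})
    if not isinstance(text, str):
        return 400, json.dumps({"error": "'text' must be a string"})
    score = score_text(text)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return 200, json.dumps({"sentiment": label, "score": score})


status, body = handle_predict(b'{"text": "I love clean code"}')
print(status, body)
```

Keeping the handler as a pure function means it can be tested without a running server and then wired into whichever web framework is used for deployment.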

Prerequisites:

  • StackOverflow Ninja
  • Python3
  • Familiarity with NumPy, Pandas, NLTK, spaCy
  • Understanding of basic web concepts

Speaker Info:

Speaker 1: Anubhav Singh

"A Web Developer since before Bootstrap was born, I began my journey in the field of computer science in my 8th grade with my first two projects being - a search engine and a social network right from scratch using LAMP stack. Currently an active Machine Learning explorer, I've used Python for all reasons and seasons over the last 4 years including Data Science to ACM ICPC Regional Finals for 2 consecutive years, where I fell prey to a precision loss in Python - and hence the will to master it.

I'm an active speaker at Google Developers Group Kolkata talking often about machine learning and cloud computing, including the GDG DevFest Kolkata 2018. I'm also a speaker for the Elastic Search Kolkata Community and Neo4j Community.

I am among the youngest instructors at DataCamp, which is a global platform for learning data science. I'm also currently authoring 2 books on Deep Learning using Python expected to be published in late 2019 by Packt Publications."

Speaker 2: Sayak Paul

"Hi there. I am Sayak (সায়ক). In my current role at DataCamp, I develop projects for DataCamp Project. My first DataCamp project Predicting Credit Card Approvals got launched recently. I create exercises for DataCamp Practice. I also write technical tutorials for DataCamp Community on a daily basis. Prior to DataCamp, I have worked at TCS Research and Innovation (TRDDC) as a developer where the domain of work was Cyber Security (specifically Data Privacy). There, I was a part of TCS's critically acclaimed GDPR solution called Crystal Ball. Prior to that, I have worked as a Web Services Developer at TCS (Kolkata area). Recently, I became an Intel Software Innovator. I am also working with Dr. Anupam Ghosh and my beloved college juniors for Applied Machine Learning research/tinkering. Currently, we are working on the application of machine learning in Phonocardiogram classification.

My subject of interest broadly lies in areas like Machine Learning Interpretability, Full-Stack Data Science. I aspire for a career in Data Science where I should be able to interpret models and communicate the results effectively. Recently,"

Speaker Links:

Anubhav Singh:

Sayak Paul:

Id: 1126
Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Intermediate
Last Updated: