End-to-end project on predicting collective sentiment for programming languages using StackOverflow answers

sayakpaul | 11 May, 2019

Description:

With a plethora of programming languages and a diverse population of developers working in them, an interesting question arises: “How happy are the developers of any given language?”

A developer's sentiment toward a language often creeps into the StackOverflow answers they write. By running sentiment analysis on those answers and averaging the resulting scores per language, we can answer our question of interest.
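
The aggregation step can be sketched as follows. This is only an illustration, not the presenters' code: the function name average_sentiment_by_language and the sample scores are hypothetical, and it assumes each answer has already been given a sentiment score in the range [-1, 1] by some model.

```python
from collections import defaultdict
from statistics import mean


def average_sentiment_by_language(scored_answers):
    """Group (language, score) pairs by language and average the scores."""
    scores = defaultdict(list)
    for language, score in scored_answers:
        scores[language].append(score)
    return {language: mean(values) for language, values in scores.items()}


# Hypothetical per-answer scores; a real pipeline would get these from a model.
sample = [
    ("python", 0.5),
    ("python", 1.0),
    ("java", -0.5),
    ("java", 0.0),
]

averages = average_sentiment_by_language(sample)
print(averages)
```

A plain mean is the simplest choice here; weighting answers by votes or length would be a reasonable variation, but the proposal does not say which the workshop uses.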

The presenters build an end-to-end project that begins with pulling data from the StackOverflow API, continues with building the collective sentiment prediction model, and ends with deploying it as an API on GCP Compute Engine.

Outline/Structure of the workshop:

Scraping data from StackOverflow using the StackOverflow API
Investigating the data and preprocessing it as necessary
Serializing the data into a usable format (such as .csv) for further usage
Discussing the basics of NLP and some classical NLP techniques such as count vectorization and TF-IDF
Preparing the collective sentiment prediction model
Evaluating the model
Deploying the model as a REST API on GCP Compute Engine
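
To make the two classical techniques named in the outline concrete, here is a minimal sketch of count vectorization and TF-IDF using only the Python standard library. It is an assumption-laden illustration, not the workshop's implementation: the helper names (tokenize, count_vectorize, tf_idf) and the two sample sentences are hypothetical, tokenization is a plain whitespace split, and the weighting is the textbook form tf = count / document length and idf = log(N / document frequency). Libraries typically apply smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_vectorize(documents):
    """Return a sorted vocabulary and one row of term counts per document."""
    vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
    rows = []
    for doc in documents:
        term_counts = Counter(tokenize(doc))
        rows.append([term_counts[term] for term in vocabulary])
    return vocabulary, rows


def tf_idf(documents):
    """Weight each count by term frequency times inverse document frequency."""
    vocabulary, rows = count_vectorize(documents)
    n_docs = len(documents)
    # Number of documents in which each vocabulary term appears.
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocabulary))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for row in rows:
        total = sum(row)
        weighted.append(
            [(count / total) * idf[j] if total else 0.0 for j, count in enumerate(row)]
        )
    return vocabulary, weighted


# Hypothetical answers, made up for illustration.
docs = ["python is great", "java is verbose"]
vocabulary, counts = count_vectorize(docs)
_, weights = tf_idf(docs)
print(vocabulary)
print(counts)
print(weights)
```

Note how the word "is", which occurs in both documents, gets a weight of zero: TF-IDF discounts terms that do not help tell documents apart, which raw counts cannot do.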

Learning Outcome:

This tutorial presents the typical workflow of a full-stack data science project. The learning outcomes, in brief:

Collecting data for a given problem statement when the data is not directly available
Investigating the data from a Data Scientist's perspective
Building simple NLP models
Deploying a model as an API on the web

Prerequisites:

StackOverflow Ninja
Python3
Familiarity with NumPy, Pandas, NLTK, spaCy
Understanding of basic web concepts

Content URLs:

Speaker Info:

Speaker 1: Anubhav Singh

"A Web Developer since before Bootstrap was born, I began my journey in the field of computer science in my 8th grade with my first two projects being - a search engine and a social network right from scratch using LAMP stack. Currently an active Machine Learning explorer, I've used Python for all reasons and seasons over the last 4 years including Data Science to ACM ICPC Regional Finals for 2 consecutive years, where I fell prey to a precision loss in Python - and hence the will to master it.

I'm an active speaker at Google Developers Group Kolkata talking often about machine learning and cloud computing, including the GDG DevFest Kolkata 2018. I'm also a speaker for the Elastic Search Kolkata Community and Neo4j Community.

I am among the youngest instructors at DataCamp, which is a global platform for learning data science. I'm also currently authoring 2 books on Deep Learning using Python expected to be published in late 2019 by Packt Publications."

Speaker 2: Sayak Paul

"Hi there. I am Sayak (সায়ক). In my current role at DataCamp, I develop projects for DataCamp Project. My first DataCamp project Predicting Credit Card Approvals got launched recently. I create exercises for DataCamp Practice. I also write technical tutorials for DataCamp Community on a daily basis. Prior to DataCamp, I have worked at TCS Research and Innovation (TRDDC) as a developer where the domain of work was Cyber Security (specifically Data Privacy). There, I was a part of TCS's critically acclaimed GDPR solution called Crystal Ball. Prior to that, I have worked as a Web Services Developer at TCS (Kolkata area). Recently, I became an Intel Software Innovator. I am also working with Dr. Anupam Ghosh and my beloved college juniors for Applied Machine Learning research/tinkering. Currently, we are working on the application of machine learning in Phonocardiogram classification.

My subject of interest broadly lies in areas like Machine Learning Interpretability, Full-Stack Data Science. I aspire for a career in Data Science where I should be able to interpret models and communicate the results effectively. Recently,"

Speaker Links:

Anubhav Singh:

Personal Website - To know as much about me as is on the Internet.
LinkedIn - If you wish to connect with me professionally
DataCamp Instructor Profile - My love for spreading knowledge, here!
GitHub - To explore my projects!
Blog Article on Reinforcement Learning - My Article on DataCamp
Previous Presentations - Previous presentations

Sayak Paul:

Section:	Data Science, Machine Learning and AI
Type:	Workshop
Target Audience:	Intermediate
Last Updated:	11 May, 2019
