Optimizations in Web Development: Journey from a college project to a professional product
Megha Sharma (~megha480)
There is a huge gap between a website built as a hobby or college project and one built for professional use. Crossing it takes database optimizations, a consistent look and feel, an efficient cache layer and many other things! Before I delved into the open source world, my code screamed that it was written by a college kid. But things changed once I interned with Wikimedia (under the Outreachy program). I want to share with my audience how a few gotchas and design decisions can bring about this transition. In this talk, I'll touch upon some of these areas, mainly those that deal with the backend and the database. My talk will summarize what I learned from using Django in an application that was built for Wikipedia and is capable of handling huge amounts of Wikipedia's data.
To give a bit of background: I built this application for Wikipedia under Outreachy Round 15 (https://www.outreachy.org/). The app summarizes the contributions of Wikipedia editors and presents them in a CV-like format. The biggest development challenge was dealing with millions of edits and doing all the related computations within seconds. Without any optimizations, the webpage took 3 hours to load. In my talk, I want to walk through the journey from 3 hours to 3 seconds!
The broad outline of my talk is as follows:
- Why Django: It's very important to understand why and when to use Django. I'll mainly touch upon the scalability aspect and how Django is a full package when it comes to web development.
- Reducing the response time: When one is dealing with a database as huge as Wikipedia's, response time becomes paramount. Optimizations like implementing a cache layer, using cron jobs, sessions, etc. will be discussed. Design choices will also be compared, such as a cache layer backed by the database vs. sessions in Python.
- Database Optimizations: Here I'll cover how the choice of database and query optimizations affect performance when dealing with large datasets.
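To make the last two points concrete, here is a minimal sketch of both ideas using only the Python standard library. It is not code from the WikiCV repository: the `edits` table, the `edit_summary` function and the `TTLCache` class are hypothetical names invented for illustration. It shows one aggregate query replacing a per-row loop in Python, and a small time-limited cache placed in front of that query.

```python
import sqlite3
import time


def make_db():
    """Build a tiny in-memory database (hypothetical schema, one row per edit)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edits (editor TEXT, page TEXT, bytes_changed INTEGER)"
    )
    # An index on the filter column lets lookups by editor avoid a full scan.
    conn.execute("CREATE INDEX idx_edits_editor ON edits (editor)")
    conn.executemany(
        "INSERT INTO edits VALUES (?, ?, ?)",
        [
            ("alice", "Python", 120),
            ("alice", "Django", -30),
            ("bob", "Python", 45),
            ("alice", "Python", 10),
        ],
    )
    return conn


def edit_summary(conn, editor):
    """Let the database aggregate instead of fetching every row into Python."""
    row = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT page), COALESCE(SUM(bytes_changed), 0) "
        "FROM edits WHERE editor = ?",
        (editor,),
    ).fetchone()
    return {"edits": row[0], "pages": row[1], "net_bytes": row[2]}


class TTLCache:
    """Cache-aside helper: reuse a computed value until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive computation
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    conn = make_db()
    cache = TTLCache(ttl_seconds=300)
    # The first call runs the query; the second is served from the cache.
    print(cache.get_or_compute("alice", lambda: edit_summary(conn, "alice")))
    print(cache.get_or_compute("alice", lambda: edit_summary(conn, "alice")))
```

In a Django project the same roles would more likely be played by the ORM's `aggregate()` with `Count` and `Sum`, and by Django's cache framework; the standard-library version above is only meant to show the shape of the two techniques.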
I hope you find this talk interesting. :)
Basic knowledge of Python, Django and querying RDBMS is required.
Slide links: https://docs.google.com/presentation/d/1mWpEUEN4K9TTAyGFTyvyHvv08W2N58FtsT2yThkBKCU/edit?usp=sharing
GitHub Repository: https://github.com/MeghaSharma21/WikiCV
Project details: https://phabricator.wikimedia.org/T178688
Link to the tool: https://tools.wmflabs.org/outreachy-wikicv/wiki-cv/
I'm a final-year student pursuing a B.Tech at Punjab Engineering College. College made me fall in love with coding, and since then there has been no looking back. I've been an Outreachy (https://www.outreachy.org/) intern and am currently a part of Google Summer of Code. When it comes to the open source world, I'm a regular contributor to Wikimedia. Other than coding, I love reading, writing and trying out new things.
- Blog: https://medium.com/@meghasharma4910
- GitHub: https://github.com/MeghaSharma21
- Outreachy project: https://github.com/MeghaSharma21/WikiCV
- Google Summer of Code project: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018