Introducing DataSwissKnife - A tool to automate data science operations
Ramshankar Yadhunath (~ramshankar)
There is no doubt that data science has recently seen a surge in the number of people who want to work in it. From school students to experienced professionals, everybody is trying to find a way into this fast-changing field born out of the confluence of multiple disciplines. However, it is no secret that, as in any field, getting into data science and learning to leverage it for your business does not happen overnight. It takes time to learn and apply these concepts, and it takes even longer for individuals who lack the technical expertise to readily analyse data. The problem with this delay is that it postpones the moment when a person gets to see how impactful data science can be for their work.
WHY IS DSK IMPORTANT?
Are you a data science beginner who wishes you had a tool that lets you watch data science in action without writing code? Or are you someone who understands the importance of data in your business, but lacks the technical grounding to code? Or are you an expert-level researcher or data scientist who just wants to do some preliminary analysis or build baseline models without spending time writing code?
If you fall into any of these categories, the DataSwissKnife project (abbreviated as DSK) will be of help to you. DSK is software built to help anybody who has the necessary domain expertise do preliminary data science.
DSK lets users load a raw block of tabular data and then asks them relevant questions about the kind of work they want to do with it. Based on the answers, DSK performs data cleaning and pre-processing, auto-generates visualizations and even does some preliminary baseline modelling. Because DSK relies only on these question-response interactions, users can perform preliminary data science without writing any code.
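The question-response flow described above can be pictured with a minimal sketch. This is not DSK's actual code: every function name, the `answers` keys and the sample data are hypothetical, and the sketch uses only the Python standard library rather than whatever DSK uses internally. It assumes the user's answers have already been collected (for example through prompts) into a dictionary.

```python
import csv
import io
import statistics


def load_table(text):
    """Parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, column, strategy):
    """Handle missing values in one numeric column, as chosen by the user."""
    if strategy == "drop":
        # Remove every row where the column is empty.
        return [r for r in rows if r[column] != ""]
    if strategy == "mean":
        # Fill empty cells with the mean of the values that are present.
        present = [float(r[column]) for r in rows if r[column] != ""]
        fill = str(statistics.mean(present))
        return [{**r, column: r[column] or fill} for r in rows]
    raise ValueError(f"unknown strategy: {strategy}")


def baseline_prediction(rows, target):
    """A very simple baseline model: always predict the mean of the target."""
    return statistics.mean(float(r[target]) for r in rows)


def run_pipeline(text, answers):
    """Drive the whole flow from the user's answers (hypothetical keys)."""
    rows = load_table(text)
    rows = clean(rows, answers["target"], answers["missing"])
    return {
        "rows": len(rows),
        "baseline": baseline_prediction(rows, answers["target"]),
    }


# Hypothetical sample data: one row has a missing income value.
SAMPLE = "age,income\n25,100\n30,\n35,200\n"

if __name__ == "__main__":
    # In an interactive tool these answers would come from the user.
    print(run_pipeline(SAMPLE, {"target": "income", "missing": "drop"}))
```

The point of the design is that the user only supplies answers such as which column to predict and how to treat missing values, while the tool decides which operations to run.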
A more in-depth explanation of DSK can be found in our project repository.
The final project report submitted at university can be found here
- It will help if the user knows how to set up a virtual machine and run DSK inside it
Ramshankar Yadhunath is a data-loving, soon-to-graduate CSE student at Amrita University, India. He is currently interning as the Data Analyst at People for Animals, Bangalore. He loves to work on projects with a cause, and when he is not burdened by academic coursework, he experiments with data science projects and blogs.