SQL on Text
Anant Gupta (~anant79)
SQL is a powerful tool and one of the simplest ways to analyse a dataset. Recently, however, unstructured data has been getting a lot of attention, and a lot of effort is spent converting it into structured data.
- By a commonly quoted estimate, about 80% of data is unstructured
- As more people come online, more unstructured data will be generated. The count currently sits at 3 billion people, so there is a lot of room for data overload in the coming days
- SQL is one of the world's easiest and most used programming languages. It is so widely used because of its simplicity and power
What I want to propose is a tool that lets analysts use SQL directly on text data. This will be more than just applying NLTK functions to text inside SQL. It will involve the following components:
- Data structures (similar to those of an RDBMS)
- The ability to join, and similar relational operations
- The entire world of text data will be open to people with basic SQL skills. This will not only improve productivity but also help integrate business and technology seamlessly
- Cross-functional text data can be analysed easily
- Injecting populated knowledge graphs and similar resources will make it easy to add new information
- SQL will make reporting and logic storage very easy
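To make the idea concrete, here is a minimal sketch of what "SQL on text" could look like. It is not the proposed tool. It assumes the text is stored as one row per token in an in-memory SQLite database and that a small lookup table stands in for a knowledge graph. The sample documents, the table names (`tokens`, `entities`), the labels and the regex tokeniser are all hypothetical choices made for illustration.

```python
import re
import sqlite3

# Hypothetical sample documents; not taken from the proposal.
DOCS = {
    1: "The bank approved the loan.",
    2: "The loan was rejected by the bank manager.",
}

# Hypothetical lookup table standing in for a small knowledge graph.
ENTITIES = [("bank", "ORGANISATION"), ("loan", "PRODUCT"), ("manager", "ROLE")]


def build_db(docs, entities):
    """Load documents into relational tables, one row per token."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (doc_id INTEGER, position INTEGER, token TEXT)")
    conn.execute("CREATE TABLE entities (token TEXT PRIMARY KEY, label TEXT)")
    for doc_id, text in docs.items():
        # Naive regex tokeniser (an assumption); a real tool might use NLTK here.
        words = re.findall(r"[a-z]+", text.lower())
        conn.executemany(
            "INSERT INTO tokens VALUES (?, ?, ?)",
            [(doc_id, pos, word) for pos, word in enumerate(words)],
        )
    conn.executemany("INSERT INTO entities VALUES (?, ?)", entities)
    return conn


conn = build_db(DOCS, ENTITIES)

# Plain SQL over text: count labelled mentions per document using a join.
QUERY = """
SELECT t.doc_id, e.label, COUNT(*) AS mentions
FROM tokens AS t
JOIN entities AS e ON e.token = t.token
GROUP BY t.doc_id, e.label
ORDER BY t.doc_id, e.label
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Once the text is in tables like these, an analyst with basic SQL skills can filter, group and join it like any other dataset, and adding new knowledge is just another insert into the lookup table.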
I am a data scientist at Morgan Stanley. I have been working in the analytics domain for the past 7 years. I love applied machine learning and have been working in this capacity for the past 3 years.