Enhancing Data Integrity in Engineering: Python's Role in Automated Data Quality Checks





Abstract

In the rapidly evolving landscape of data engineering, maintaining high-quality data is essential for organizations to make informed decisions and stay competitive. This talk explores the challenges that poor data quality creates for data engineers and end users, and shows how Python can be used to automate data quality checks across on-premise, Azure Databricks, and AWS environments. By integrating Python-based data quality monitoring into their pipelines, organizations can improve their data health and obtain more reliable analytics and business insights.

Introduction

- Brief overview of the importance of data quality in data engineering.
- Introduction to Python's versatility and its application in data quality monitoring.

Challenges in Data Processing

- Discuss common challenges faced by data engineers, including data inconsistency, volume management, and integration issues across different platforms.
- Highlight the impact of these challenges on data quality and subsequent decision-making processes.

Impact of Poor Data Quality on End Users

- Explore how analysts, data scientists, and business users are affected by poor data quality, leading to inaccurate analytics and potentially costly business decisions.
- Examples of real-world consequences of poor data quality in decision-making.

Python's Role in Automated Data Quality Checks

- Introduction to Python libraries and frameworks that facilitate data quality checks (e.g., Pandas for data manipulation, Great Expectations for data testing).
- How Python can be used to automate data quality checks in different environments: on-premise, Azure Databricks, and AWS.
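As a rough illustration of the kind of check this section refers to, the sketch below uses Pandas to test a small table for completeness and key uniqueness. It is not taken from the talk: the function name `run_quality_checks`, the check names, and the sample `orders` data are all hypothetical. Frameworks such as Great Expectations offer comparable checks in a declarative form.

```python
import pandas as pd


def run_quality_checks(df: pd.DataFrame, key: str, required: list[str]) -> dict[str, bool]:
    """Return a pass/fail flag per check (hypothetical check names)."""
    results = {}

    # Completeness: every required column must exist.
    missing = [col for col in required if col not in df.columns]
    results["required_columns_present"] = not missing

    # Completeness: required columns that exist must contain no nulls.
    present = [col for col in required if col in df.columns]
    results["no_nulls_in_required"] = not df[present].isna().any().any()

    # Uniqueness: the key column must exist and have no duplicate values.
    results["unique_key"] = key in df.columns and not df[key].duplicated().any()

    return results


# Illustrative data with one null amount and one duplicated order_id.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [10.0, None, 5.5, 7.25],
})

report = run_quality_checks(orders, key="order_id", required=["order_id", "amount"])
failed = [name for name, ok in report.items() if not ok]
print(failed)
```

Because the function only depends on a DataFrame, the same checks could in principle run in a scheduled on-premise job, a Databricks notebook, or an AWS job, with a non-empty `failed` list used to stop the pipeline or raise an alert.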

Case Studies

- Present case studies where Python was successfully implemented for data quality monitoring, highlighting the before and after scenarios.
- Discuss the specific Python tools and methodologies used in these case studies.

Benefits of Automated Data Quality Checks

- Detailed discussion on how automated data quality checks can save time and resources for data engineering teams.
- The role of good data quality in enhancing the overall data health of an organization, leading to more accurate analytics and better business decisions.

Speaker Info:

Speaker: Ishan Shrivastava
Location: Dubai, United Arab Emirates
Contact: er.shrivastav@gmail.com


Ishan Shrivastava is a results-driven Data Analytics Leader with a robust background in Business Intelligence, Artificial Intelligence, and Stakeholder Engagement. With a dynamic professional journey spanning various industries and roles, Ishan has demonstrated expertise in crafting and implementing advanced analytics solutions to drive organizational performance and achieve strategic goals. He is a certified Generative AI expert, proficient in leveraging cutting-edge technologies to deliver impactful business outcomes.

Currently serving as the Data Analytics Manager at Damac Group in Dubai, Ishan leads the Performance Management and Analytics division, where he has been instrumental in refining key performance indicators (KPIs) across departments to align with organizational objectives. His work involves leveraging Power BI for dynamic reporting, implementing AI solutions using Azure OpenAI Service, and driving continuous improvement through rigorous data analysis.

Ishan's prior roles include Senior Analytics Consultant at Etihad Airways and various analytics leadership positions, where he has consistently used Python, SQL, Power BI, Azure, and AWS to transform raw data into valuable business insights. His proficiency in solution lifecycle management and Agile/Scrum methodologies, together with his strong problem-solving skills, makes him an effective communicator and collaborator across cross-functional teams.

With a Master of Science in Information Technology Management from the University of Sunderland and a Bachelor of Engineering in Electronics and Communications Engineering, Ishan's educational background complements his extensive professional experience. His technical skills encompass Python, SQL, MS Power BI, Azure, Databricks, AWS, and more, making him well-versed in a wide range of analytics and data engineering tools.

Ishan's proposal for PyCon India focuses on leveraging Python for Data Quality Monitoring in the Data Engineering process, addressing challenges faced by data engineers and end-users due to poor data quality, and showcasing how automated data quality checks can significantly improve organizational data health.

Speaking Experience

Ishan brings a wealth of experience in analytics and data engineering, having led numerous projects that demonstrate the power of Python in enhancing data quality and integrity. His ability to translate complex technical concepts into actionable insights makes him an engaging and informative speaker, capable of connecting with audiences across various levels of technical expertise.

Section: Python in Platform Engineering and Developer Operations
Type: Talk
Target Audience: Advanced
Last Updated: