Accelerating India's Open Science Journey with Python

Jyoti Bhogal (~jyoti8)


Description:


Open Science holds the promise of transforming the scientific process by making it more transparent, inclusive, and democratic. It is increasingly recognised as a crucial accelerator for achieving the United Nations Sustainable Development Goals (SDGs) and for bridging gaps in science, technology, and innovation, thereby upholding the human right to science.

The UNESCO Recommendation on Open Science provides an international framework for open science policy and practice, recognising disciplinary and regional differences in open science perspectives. It emphasises academic freedom and gender-transformative approaches, and addresses the unique challenges faced by scientists and other open science actors, particularly in developing countries. This framework aims to reduce the digital, technological, and knowledge divides both between and within countries.

India acknowledges the need to implement an open science policy to align with the global shift towards openness. For instance, the Confederation of Open Access Repositories (COAR) recently held its Asia OA Meeting, titled "The Promise and Practice of Open Science", in New Delhi, India. This meeting focused on advancing open science in India in a sustainable and equitable way. It provided an opportunity to hear from key open science stakeholders in India about their activities and the challenges they face. This event followed the G20 Chief Science Advisers’ Roundtable Meeting, which emphasised the importance of new methods to provide immediate and free access to publicly funded research.

This talk will explore how Python, with its extensive libraries and community support, can play a pivotal role in accelerating India's journey towards Open Science, enhancing transparency, inclusivity, and collaboration in the scientific community.

Key takeaways:

  • What is Open Science and why is it needed?
  • How can Python help in achieving Open Science?

Outline of the talk:

1. What is Open Science and why is it needed? (12 minutes)

  • The world is changing rapidly, and new problems emerge every day that require groundbreaking scientific discoveries to solve. To keep up, science must become both faster and more accurate so that it can deliver the transformative breakthroughs our future depends on.

  • Closed science, characterised by hoarding information and resources and maintaining silos of knowledge, holds science back by limiting participation. We need more voices working together, sharing knowledge and resources. Only through collaboration and open sharing will we find new and better solutions.

  • Open Science broadens participation, increases accessibility to knowledge, and embraces new technologies that can respond to these changes on a large scale. It is increasingly recognised as a critical accelerator for achieving the United Nations Sustainable Development Goals. Open Science bridges the gaps in science, technology, and innovation, and fulfils the human right to science.

  • Some key aspects of open science are: open tools and software, open data, open code, and open results.

1.1 Open tools and software:

  • Open tools and software refer to freely available and openly licensed tools that researchers can use, modify, and share. These tools ensure that scientific research is accessible and reproducible by providing the means to conduct experiments, analyse data, and share findings without proprietary restrictions.

  • Example:

  • Jupyter Notebook: An open-source web application that allows researchers to create and share documents containing live code, equations, visualisations, and narrative text.

1.2 Open data:

  • Open data refers to data that is freely available to anyone to use, reuse, and redistribute, subject at most to the requirement to attribute and share alike. Open data promotes transparency, allowing other researchers to validate findings, conduct meta-analyses, and build upon existing work. It accelerates scientific discovery and innovation by making data universally accessible.

  • Examples:

  • GenBank: A database of publicly available nucleotide sequences and their protein translations.
  • DataONE: A network that provides access to a vast amount of environmental and ecological data.

1.3 Open code:

  • Open code involves sharing the source code of software and algorithms developed during research. It is often made available under open-source licences that permit anyone to view, modify, and distribute the code. Open code ensures reproducibility and transparency in research. By sharing the algorithms and methods used, other researchers can verify results, identify errors, and adapt the code for their own research purposes.
  • Examples:
  • GitHub: A platform for hosting and sharing code repositories, allowing collaborative development and version control.
  • GitLab: Another platform for managing and sharing code, similar to GitHub, with integrated CI/CD (Continuous Integration/Continuous Deployment) tools.

1.4 Open results:

  • Open results refer to the practice of making the outcomes of research, such as publications, datasets, and findings, freely accessible to the public. This includes sharing both positive and negative results to provide a complete picture of the research. Open results enhance the dissemination and impact of research by making findings accessible to a wider audience. They foster transparency and trust in science by allowing others to scrutinise and build upon the results.

  • Examples:

  • PLOS ONE: An open-access journal that publishes research across all areas of science and medicine.
  • arXiv: A repository of electronic preprints (known as e-prints) that are approved for posting after moderation, where researchers can freely share their results before formal peer review.

2. How can Python help in achieving open science? (12 minutes)

2.1 Python's features make it an ideal language for promoting open science in India:

  • Ease of use: Python is simple to learn and use, making it accessible for researchers from various backgrounds. Its readability reduces errors and simplifies debugging.
  • Versatility: As an object-oriented, interpreted, and cross-platform language, Python is highly versatile. It supports several programming paradigms and has a broad standard library.
  • Open source: Python is free and open-source, encouraging widespread adoption and modification to suit specific research needs.
  • Community support: The Python Software Foundation provides extensive community support, fostering collaboration and resource sharing among researchers.
  • Reproducibility: Reproducible code is crucial for open science. Because Python scripts and notebooks can be shared together with a list of the package versions they depend on, other researchers can rerun and replicate experiments and analyses.
  • Data accessibility: Python facilitates making datasets available to all, enhancing transparency and collaboration in research.
  • Adoption of existing tools: Researchers can leverage existing open-source tools without needing to contribute to open source, easing the transition to open science.
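
The reproducibility point above can be illustrated with a minimal sketch. It uses only the standard library and assumes a hypothetical `run_experiment` function standing in for a real analysis; the seed value and the environment fields recorded are illustrative choices, not a prescribed standard.

```python
import json
import platform
import random
import sys


def run_experiment(seed: int, n: int = 5) -> list[float]:
    """Hypothetical analysis step: draw n pseudo-random measurements."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(n)]


def describe_environment(seed: int) -> dict:
    """Collect details another researcher would need to rerun the code."""
    return {
        "seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.system(),
    }


if __name__ == "__main__":
    seed = 2024  # arbitrary value chosen for illustration
    first = run_experiment(seed)
    second = run_experiment(seed)
    print("identical across runs:", first == second)
    print(json.dumps(describe_environment(seed), indent=2))
```

Sharing the seed and the environment record alongside the code gives a reader a reasonable chance of regenerating the same numbers on their own machine.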

2.2 Some Python packages:

  1. NumPy: Essential for numerical computations and handling large datasets.
  2. Pandas: Excellent for data manipulation and analysis.
  3. Matplotlib: Used for creating static, interactive, and animated visualisations.
  4. SciPy: Provides modules for optimisation, integration, and statistics.
  5. Scikit-learn: A library for machine learning, providing tools for data mining and data analysis.
  6. TensorFlow and PyTorch: Popular libraries for machine learning and deep learning applications.
  7. Keras: Simplifies building and training neural network models.
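
As a rough illustration of how two of these packages fit together, the sketch below summarises a tiny table with pandas and applies vectorised arithmetic with NumPy. The column names and values are made up for the example and are not real measurements.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not real measurements.
readings = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "temperature_c": [21.5, 22.5, 30.0, 32.0],
    }
)

# pandas: group rows by site and average each group.
site_means = readings.groupby("site")["temperature_c"].mean()

# NumPy: vectorised arithmetic on the underlying array (Celsius to Kelvin).
kelvin = readings["temperature_c"].to_numpy() + 273.15

print(site_means)
print(np.round(kelvin, 2))
```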

2.3 Supporting some key aspects of open science:

  1. Open Tools/Software: Python-based tools like Jupyter Notebook provide an open environment for interactive computing and sharing research workflows.
  2. Open Code: Version control systems like Git, combined with platforms like GitHub, facilitate sharing and collaboration on open code.
  3. Open Data: Pandas and DVC (Data Version Control) help manage and share open datasets efficiently.
  4. Open Results: Libraries like Matplotlib and Seaborn enable researchers to create open and reproducible visualisations of their results, making findings accessible and understandable.
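
To make the open data item more concrete, here is a small sketch of publishing a dataset with a checksum and a licence note. It deliberately uses only the standard library as a stand-in and does not show the pandas or DVC APIs; the function name `publish_dataset`, the file names, and the metadata fields are all assumptions made for illustration.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def publish_dataset(rows: list[dict], out_dir: Path, licence: str) -> dict:
    """Write rows to a CSV file plus a JSON sidecar describing that file."""
    columns = list(rows[0].keys())
    data_path = out_dir / "data.csv"  # hypothetical file name
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # A checksum lets anyone confirm their copy matches the published one.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    metadata = {
        "file": data_path.name,
        "sha256": digest,
        "licence": licence,
        "columns": columns,
        "row_count": len(rows),
    }
    sidecar = out_dir / "data.meta.json"  # hypothetical file name
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata


if __name__ == "__main__":
    sample = [{"site": "A", "count": 3}, {"site": "B", "count": 5}]  # made-up rows
    with tempfile.TemporaryDirectory() as tmp:
        print(publish_dataset(sample, Path(tmp), licence="CC-BY-4.0"))
```

Stating the licence next to the data tells a reader what reuse is permitted, and the checksum gives them a way to detect a corrupted or altered download.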

By leveraging these features and packages, Python can significantly contribute to advancing open science in India, promoting transparency, collaboration, and accessibility in scientific research.

3. Q&A (6 minutes)

Prerequisites:

None.

Video URL:

https://drive.google.com/file/d/1BPpwsiEwWisP5YTSPuTl-oUJYYEAFyom/view?usp=share_link

Speaker Info:

Jyoti is a trained statistician who has worked as a Software Quality Engineer at a Contract Research Organisation and as a Data Modeller. She helps solve statistical and data analytics problems using tools such as Python, R, JavaScript, and MS Excel. She has 4+ years of experience in clinical sciences, with expertise in the principles of a good software development life cycle and in the clinical trials R&D cycle. She co-founded the RSE Asia Association to create awareness of, and ultimately establish, the profession of Research Software Engineer across the Asian region, taking inspiration from the global RSE movement. She is an open source enthusiast and is keen on giving young minds exposure to unconventional yet essential roles in tech by speaking at events like the bi-annual APAN meetings. She is a PyData Impact Scholar (2021), attended the PyCon India 2023 conference in Hyderabad, and is always immensely inspired by the entire Python community!

Speaker Links:

  1. LinkedIn: https://www.linkedin.com/in/jyoti-bhogal-a20705163/
  2. GitHub: https://github.com/jyoti-bhogal
  3. Website: https://jyoti-bhogal.github.io/about-me/
  4. Spoke at PythonPune February 2024 Meetup: https://docs.google.com/presentation/d/15Dvn0gt19aNC3CCQI3O5cBeIz-yWlzy_11LMM32vov4/edit#slide=id.g2a313145b94_0_0
  5. Email: bhogaljyoti1@gmail.com

Section: Python in Education and Research
Type: Talk
Target Audience: Beginner