Towards Domain Specific Languages for Computational Chemistry
Rohit Goswami (~HaoZeke)
Python is increasingly considered the "tool of choice" for computational materials science and computational chemistry. This is driven by the large number of scientific software packages adopting Python bindings (e.g. NWChem, Psi4, LAMMPS, ESPResSo, etc.). A number of workflow management systems across a spectrum of flexibility have stepped up to deal with the challenge of tracking scientific data provenance for reproducibility. These range from the familiar (Snakemake as a Make extension) to the intuitive (Pyiron for materials modeling) to incredibly flexible options like AiiDA.
Concomitant with the explosive growth of such tools, there have been attempts to tap into the underlying commonality of these programs, both at the level of parsing output files (e.g. cclib) and in terms of a unified lexicon like the Quantum Chemistry Schema. In this talk, I shall focus on introducing the PEG parsing logic and dataset generation capabilities of the wailord package, which I developed for generating datasets using ORCA.
wailord was designed around parsimonious as a rational backbone for validating computational chemistry outputs. Additionally, the generated workflows leverage siuba. In particular, I will highlight the challenges and progress towards the ultimate goal of using Python to create a Domain Specific Language (DSL) of sorts for computational chemistry.
- Introduction: 1 Min
- Python for Computational Chemistry -- Calculations: 1 Min
- Computational Chemistry, HPC and workflows: 3 Min
- Output parsing and validation: 3 Min
- Common keywords and structures, a cclib perspective: 3 Min
- From outputs to dataframes and plots with plotnine: 5 Min
- Input generation and PEG parsers for data handling: 3 Min
- Validation goals and augments: 2 Min
- Towards a DSL for Computational Chemistry: 3 Min
- Conclusions and Future Directions: 2 Min
- Q&A: 5 Min
The following are optional prerequisites which might make for more germane audience interaction:
- An understanding of computational chemistry
- Data provenance and HPC workload management
- Dataset generation and validation / exploration
- An understanding of DSLs and parsing libraries
wailord is on PyPI and has a site as well:
I'm Rohit Goswami, a doctoral researcher at the Science Institute of the University of Iceland, supported by a fellowship from Rannis (Iceland Research Fund) for "Magnetic interactions of itinerant electrons modeled using Bayesian machine learning". I have over ten years of FOSS development experience spanning projects in a veritable host of languages. I am an OSI (Open Source Initiative) advocate member, and am also a member of and contributor to other scientific and FOSS programming communities (e.g. the Carpentries). I have an eclectic set of interests, mostly centered around HPC algorithmic efficiency, compilers, and reproducible science. In the past, I have been associated with IIT Kanpur, specifically the Chemical Engineering department, the HPC division, and the Department of Chemistry. I am also the co-developer and author of the reproducible FOSS project d-SEAMS. I have a history of open source pedagogy as well, having been a CS106A Code in Place section leader in 2020 and a teacher mentor and section leader for the 2021 session, and I co-taught a course on computational chemistry at the middle-school level in 2020. Language interoperability is a passion of mine, and I have been an invited speaker for an IOP workshop on interoperating C++ and Python. I am also currently working on LFortran, an interactive LLVM-based Fortran compiler, under the aegis of the Fortran-lang organization and Google Summer of Code. I was also honored to speak at PyCon IN 2020 about scalable reproducible workflows using Nix, Renku and Papermill.