Towards Domain Specific Languages for Computational Chemistry

Rohit Goswami (~HaoZeke)



Python is increasingly considered the "tool of choice" for computational materials science and computational chemistry. This is driven by the large number of scientific software packages adopting Python bindings (e.g. NWChem [1], Psi4 [2], LAMMPS [3], ESPResSo [4]). A number of workflow management systems across a spectrum of flexibility have stepped up to handle the challenge of scientific data provenance tracking for reproducibility. These range from the familiar (Snakemake [5] as a Make extension) to the intuitive (Pyiron [6] for materials modeling) to incredibly flexible options like AiiDA [7].

Concomitant with the explosive growth of such tools, there have been attempts to tap into the underlying commonality of these programs, both at the level of parsing output files (e.g. cclib [8]) and in terms of a unified lexicon like the Quantum Chemistry Schema [9]. In this talk, I shall focus on introducing the PEG (parsing expression grammar) parsing logic and dataset generation capacity of the wailord [10] package, which I developed for generating datasets using ORCA. wailord was designed around parsimonious as a rational backbone for validating computational chemistry outputs. Additionally, the generated workflows leverage plotnine and siuba. In particular, I will highlight the challenges and progress towards the ultimate goal of using Python to create a Domain Specific Language (DSL) of sorts for computational chemistry.
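To give a flavor of what PEG-based validation of an output line can look like, here is a minimal sketch. It is not wailord's grammar, which is not shown here, and to stay dependency-free it uses a few hand-rolled PEG-style combinators on top of the standard `re` module rather than parsimonious itself. The helper names (`literal`, `regex`, `sequence`, `choice`, `parse_energy`) are hypothetical; the only ORCA-specific assumption is the `FINAL SINGLE POINT ENERGY` label that ORCA prints before the energy value.

```python
import re

# Hypothetical stand-in for a PEG grammar; none of these helpers come from
# wailord or parsimonious. Each parser takes (text, position) and returns
# (value, new_position) on success or None on failure.

def literal(text):
    """Match an exact string at the current position."""
    def parse(s, pos):
        if s.startswith(text, pos):
            return text, pos + len(text)
        return None
    return parse

def regex(pattern):
    """Match a regular expression anchored at the current position."""
    compiled = re.compile(pattern)
    def parse(s, pos):
        m = compiled.match(s, pos)
        if m is not None:
            return m.group(0), m.end()
        return None
    return parse

def sequence(*parsers):
    """PEG sequence: every sub-parser must match, one after the other."""
    def parse(s, pos):
        values = []
        for p in parsers:
            result = p(s, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(s, pos):
        for p in parsers:
            result = p(s, pos)
            if result is not None:
                return result
        return None
    return parse

# Grammar for one energy line. The float alternative is listed first because
# ordered choice would otherwise let the integer rule match only "-76" of "-76.5".
ws = regex(r"[ \t]+")
number = choice(regex(r"-?\d+\.\d+"), regex(r"-?\d+"))
energy_line = sequence(literal("FINAL SINGLE POINT ENERGY"), ws, number)

def parse_energy(line):
    """Validate a line against the grammar and return the energy as a float."""
    stripped = line.strip()
    result = energy_line(stripped, 0)
    if result is None or result[1] != len(stripped):
        raise ValueError(f"not a valid energy line: {line!r}")
    values, _ = result
    return float(values[-1])
```

Calling `parse_energy("FINAL SINGLE POINT ENERGY      -76.123456789")` yields the energy as a float, while a malformed line raises `ValueError` instead of silently producing a bad data point. That fail-loudly behavior is the main appeal of grammar-based validation over ad hoc string splitting.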


  1. Introduction: 1 Min
  2. Python for Computational Chemistry -- Calculations: 1 Min
  3. Computational Chemistry, HPC and workflows: 3 Min
  4. Output parsing and validation: 3 Min
  5. Common keywords and structures, a cclib perspective: 3 Min
  6. From outputs to dataframes and plots with siuba+plotnine: 5 Min
  7. Input generation and PEG parsers for data handling: 3 Min
  8. Validation goals and augments: 2 Min
  9. Towards a DSL for Computational Chemistry: 3 Min
  10. Conclusions and Future Directions: 2 Min

Q&A: 5 Min


The following are optional prerequisites which might make for more germane audience interaction:

  • An understanding of computational chemistry
  • Data provenance and HPC workload management
  • Dataset generation and validation / exploration
  • An understanding of DSLs and parsing libraries

Video URL:

Content URLs:

wailord is on PyPI and has a site as well:

Speaker Info:

I'm Rohit Goswami, a doctoral researcher at the Science Institute of the University of Iceland, supported by a fellowship from Rannis (Iceland Research Fund) for "Magnetic interactions of itinerant electrons modeled using Bayesian machine learning". I have over ten years of FOSS development experience spanning projects in a veritable host of languages. I am an OSI (Open Source Initiative) advocate member, and a member of and contributor to other scientific and FOSS programming communities (e.g. the Carpentries). I have an eclectic set of interests, mostly centered around HPC algorithmic efficiency, compilers, and reproducible science. In the past, I have been associated with IIT Kanpur, specifically the Chemical Engineering department, the HPC division, and the Department of Chemistry. I am also the co-developer and author of the reproducible FOSS project d-SEAMS. I have a history of open source pedagogy as well: I was a CS106A Code in Place section leader in 2020 and a teacher mentor and section leader for the 2021 session, and I co-taught a course on computational chemistry at the middle-school level in 2020. Language interoperability is a passion of mine, and I have been an invited speaker for an IOP workshop on interoperability between C++ and Python. I am also currently working on the LFortran compiler, an interactive LLVM-based Fortran compiler, under the aegis of the Fortran-lang organization and Google Summer of Code. I was also honored to speak at PyCon IN 2020 about scalable reproducible workflows using Nix, Renku and Papermill.

Speaker Links:

Other links, including ORCiD, Publons, etc., can be found on the landing page of my Homepage and Blog.

Section: Scientific Computing
Type: Talks
Target Audience: Advanced
Last Updated: