Writing Performant Python

Scientists, engineers, and professionals increasingly work with large, complex datasets and computational workflows. Effective analysis requires both understanding the data and writing code that uses time, memory, and hardware efficiently. This course introduces a practical workflow for writing performant Python: understand the problem, benchmark the code, profile bottlenecks, optimize the right parts, parallelize where appropriate, and use accelerators when they genuinely help.

Students will learn how Python executes code, why some programs become slow, and how to make evidence-based performance improvements. The course uses realistic examples to introduce benchmarking, profiling, algorithmic improvement, vectorized array programming, memory-aware computation, parallelization with Dask, and performance boosting with tools such as Cython, Numba, Pythran, Transonic, and Numexpr.

By the end of the course, participants will be able to diagnose performance problems, choose appropriate tools, and improve Python programs without sacrificing correctness, readability, or reproducibility.
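As a minimal sketch of the benchmark-then-profile steps in the workflow described above, the example below uses only the standard library (`timeit` and `cProfile`). The two functions and the helper names `benchmark` and `profile_report` are hypothetical illustrations, not course code. Which version is faster depends on the interpreter and the machine, so the point is to measure rather than assume.

```python
import cProfile
import io
import pstats
import timeit


def loop_version(n):
    """Hypothetical example: build a list of squares, then sum it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def generator_version(n):
    """Same result without materializing the intermediate list."""
    return sum(i * i for i in range(n))


def benchmark(func, n, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    times = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    # The minimum is the least disturbed by other activity on the machine.
    return min(times) / number


def profile_report(func, n, top=5):
    """Run func(n) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(n)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    n = 10_000
    # Check correctness before comparing speed.
    assert loop_version(n) == generator_version(n)
    for func in (loop_version, generator_version):
        print(f"{func.__name__}: {benchmark(func, n):.6f} s per call")
    print(profile_report(loop_version, n))
```

Checking that both versions return the same value before timing them keeps the comparison honest: a faster function that gives a different answer is not an optimization.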

Prerequisites

  • Basic experience with Python

  • Basic experience working in a Linux-like terminal

  • Some prior experience working with datasets, large or small

Learning outcomes

This material is for researchers and engineers who work with datasets of any size and who want to learn tools and best practices for writing more performant, parallelized, robust, and reproducible data analysis pipelines.

By the end of this module, learners should be able to:

  • Explain common sources of Python performance bottlenecks.

  • Design and run reproducible benchmarks.

  • Use profiling tools to identify expensive functions, lines, and memory behavior.

  • Apply algorithmic, vectorization, memory-layout, and I/O improvements.

  • Use parallel and accelerated Python tools when they fit the workload.

  • Communicate performance tradeoffs clearly and keep optimized code reproducible.
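To make two of these outcomes concrete (reproducible benchmarks and algorithmic improvement), here is a small sketch using only the standard library. The functions `make_data`, `common_items_list`, and `common_items_set` are hypothetical names chosen for illustration. The fixed random seed is an assumption about what "reproducible" means here: the same input data on every run, even though timings will still vary between machines.

```python
import random
import timeit


def make_data(n, seed=0):
    """Generate two lists of n distinct integers from a fixed seed."""
    rng = random.Random(seed)
    a = rng.sample(range(10 * n), n)
    b = rng.sample(range(10 * n), n)
    return a, b


def common_items_list(a, b):
    """Quadratic in the worst case: each `in` test scans the list b."""
    return [x for x in a if x in b]


def common_items_set(a, b):
    """Linear on average: build a set once, then use hash lookups."""
    b_set = set(b)
    return [x for x in a if x in b_set]


if __name__ == "__main__":
    a, b = make_data(2_000)
    # The optimized version must give the same answer as the original.
    assert common_items_list(a, b) == common_items_set(a, b)
    for func in (common_items_list, common_items_set):
        best = min(timeit.repeat(lambda: func(a, b), repeat=3, number=5)) / 5
        print(f"{func.__name__}: {best:.6f} s per call")
```

The change here is to the data structure, not to low-level details of the loop, which is why this kind of improvement is usually worth trying before reaching for compiled or parallel tools.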

Credit

Don’t forget to check out additional course materials from XXX. Please contact us if you want to reuse these course materials in your teaching. You can also join the XXX channel to share your experience and get more help from the community.

This course incorporates and adapts open training material from ENCCS/python-perf, ENCCS/hpda-python, ENCCS/word-count-hpda, coderefinery/word-count, CodeRefinery reproducible research, and HPC Carpentry Python material.

It also includes images and descriptions from lectures.scientific-python.org, Project Jupyter images under the BSD 3-Clause license, images from The Noun Project under CC-BY 3.0, and nbabel example code from https://github.com/paugier/nbabel under GPLv2.

License

Note

To module authors: for code, you may use any OSI-approved license listed at https://spdx.org/licenses/, such as Apache License 2.0, GNU GPLv3, or MIT. Please update the deed above and the LICENSE.code file accordingly.