
The Open Source Computational Biology Master’s

created and maintained by Dillon Gavlock

Foundations • Exploration • Basic Area Skills • Advanced Area Skills • Datasets • Resources


Welcome to the Open Source Comp Bio Master’s! This repo is meant to be a guide for anyone who wishes to gain skills in Computational Biology so they can follow their passions in biology, math, and computer science. The idea behind the Open Source Comp Bio Master’s is not original: it came after exploring GitHub’s repositories for Data Science libraries and stumbling upon the Open Source Data Science Master’s repository maintained by Clare Corthell. If you’re interested in Data Science alone, I couldn’t recommend the Open Source Data Science Master’s more. It’s excellently curated, with categories ranging from the fundamentals of Data Science to advanced concepts. This repository will in fact share some of the foundations of the OSDSM curriculum, as many of the mathematical and computational foundations are the same.


One could look at Computational Biology as an applied data science where the domain knowledge is biological phenomena. Biology is an extremely dense body of knowledge, and published research adds new information to it all the time. We could of course have many, many repositories for all the various subfields of biology. In this curriculum we look to put biological phenomena in the context of math and computation. The foundational courses give a basic introduction to biology; then, when branching into specific skill areas, we will introduce the concepts needed for that skill set. My hope here is to provide pathways that lead people to find their interests in the various computational biology subspaces and to learn only the skills that serve those interests. All paths will be designed to facilitate specific job titles in computational biology.

Courses & Course Structure

At first, courses will consist mostly of links to resources where one can learn the topics along the path at little or no cost. Over time, we will work to provide free materials in each repository, in the form of Jupyter Notebooks that cover a select topic and contain supplemental homework (problem sets related to the lectures) and projects. All projects (there could be several per course) will focus on showcasing what you’ve learned over the course in a unique way, so no one will have the same work as you. The program is divided into four phases according to the diagram below.


With the phases in mind we can see the various available paths one might take to follow their interests. At each level you gain more specialized skills in one area of computational biology.


Here we will leverage open data and other open resources. We also look to integrate best practices wherever possible. This will be most evident in our lessons and in the code itself, where we plan to use up-to-date practices for building and maintaining quality code bases. At first all examples will be in Python. The reasoning is simple: Python is widely used, it supports great open-source packages for everything we need to learn the basic skills, and it’s the language the sole developer of the Open Source Computational Biology Master’s knows. We will of course be using Git and GitHub for version control of our work. This will give your GitHub profile some activity and teach you the basics of version control systems.

Communication & Building Community

The importance of this portion of learning cannot be overstated. Finding a group of people who share your passion for Computational Biology will be hard if you are not in a university program, so we here at Open Source Comp Bio Master’s want to provide a way for people to find their community. This will be facilitated by a Discord server with a general channel, a channel for each course, and possibly more channels for job and collaboration postings as we go along.

Community rules:

  1. As a Computational Biologist you should have a lifelong-learning attitude toward yourself and others. With that in mind, there is no place on this Discord server for putting down other learners in any way. That means refraining from statements like “You don’t know that?” or “That’s simple, it’s…”. I know that seems nuanced, but any new learner can tell you that even qualifiers like these make you feel small for not knowing. Boasting also falls under this category. Please be considerate of others.
  2. Constructive criticism is great; always give it when you can. It looks something like this: “Hey, it’s great that you’re thinking about the problem that way. You could also try this way!” or “You didn’t get the right answer here, but I found the error you made. The error is this, and you can find more on that topic here. Keep trying!”
  3. The server is for discussing the curriculum and Computational Biology related materials. Please refrain from posting material that won’t do any of the following:
    • Help others through the curriculum
    • Grow the curriculum in a meaningful way
    • Point others to resources that can supplement the curriculum
    • Help others grow as Computational Biologists
  4. Follow all of the Discord Community Guidelines.

Check out the Server Here

Note on Curriculum and Development: Right now there’s only a single admin working on this project. He is still learning himself and doesn’t claim to know everything about computational biology. There will therefore be mistakes early on, and perhaps those mistakes will lead better computational biologists to join in and build this curriculum into a solid, accurate place for anyone to learn computational biology. The admin also acknowledges that his training at the University of Pittsburgh has played a major role in how the courses are organized and which materials are used. With that said, he would like to thank the University of Pittsburgh Computational & Systems Biology department for its role in training him, as well as for the inspiration for creating this website. The major areas stem from the PhD program breakdown, and many of the material recommendations are those used in the courses there, though we try to stay with materials that are freely available and open source.

Big Thanks List

Dr. Ivet Bahar
Dr. David Koes
Dr. Jim Faeder
Dr. Robin Lee
Dr. Takis Benos
Dr. Carlos Camacho
Dr. Chakra Chennubhotla
Dr. Maria Chikina
Dr. Lans Taylor
Dr. Mark Miedel
Dr. Larry Vernetti
Dr. Albert Gough
Dr. Mark Schurdak
Dr. Tim Lezon