intended audience

  • Rachel has a master’s degree in cell biology and now works in a research hospital doing cell assays.
  • She learned a bit of R in an undergrad biostatistics course and has been through the Carpentries lesson on the Unix shell.
  • Rachel is thinking about becoming a data scientist and would like to understand how data is stored and managed.
  • Her work schedule is unpredictable and highly variable, so she needs to be able to learn a bit at a time.

prerequisites

  • basic Unix command line: cd, ls, * wildcard
  • basic tabular data analysis: filtering rows, aggregating within groups

learning outcomes

  • Explain the difference between a database and a database manager.
  • Write SQL to select, filter, sort, group, and aggregate data.
  • Define tables and insert, update, and delete records.
  • Describe different types of join and write queries that use them to combine data.
  • Use window functions to operate on adjacent rows.
  • Explain what transactions are and write queries that roll back when constraints are violated.
  • Explain what triggers are and write SQL to create them.
  • Manipulate JSON data using SQL.
  • Interact with a database using Python directly, from a Jupyter notebook, and via an ORM.
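
To give a sense of where these outcomes lead, here is a minimal sketch of three of them together: using a database directly from Python, grouping and aggregating, and rolling back a transaction when a constraint is violated. It assumes Python's built-in sqlite3 module and an in-memory database; the assay table, its columns, and the demo function are hypothetical names invented for this illustration and are not taken from the lessons.

```python
import sqlite3


def demo():
    # In-memory database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")

    # Hypothetical table: the check constraint rejects negative readings.
    conn.execute(
        """
        create table assay (
            id integer primary key,
            cell_line text not null,
            reading real not null check (reading >= 0)
        )
        """
    )
    conn.executemany(
        "insert into assay (cell_line, reading) values (?, ?)",
        [("HeLa", 1.5), ("HeLa", 2.5), ("CHO", 4.0)],
    )
    conn.commit()

    # Filter, group, aggregate, and sort in a single query.
    averages = conn.execute(
        """
        select cell_line, avg(reading)
        from assay
        where reading > 1.0
        group by cell_line
        order by cell_line
        """
    ).fetchall()

    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises, so neither insert survives.
    try:
        with conn:
            conn.execute(
                "insert into assay (cell_line, reading) values ('CHO', 5.0)"
            )
            conn.execute(
                "insert into assay (cell_line, reading) values ('CHO', -1.0)"
            )
    except sqlite3.IntegrityError:
        pass  # the check constraint on reading was violated

    row_count = conn.execute("select count(*) from assay").fetchone()[0]
    conn.close()
    return averages, row_count


if __name__ == "__main__":
    print(demo())
```

The second insert in the try block violates the check constraint, so the whole transaction is undone and the table still holds only the three original rows.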