This introduction to babypandas is used in DSC 10 @ UCSD.
babypandas is an opinionated proper subset of the popular pandas package
designed with the novice data scientist in mind.
DSC 10 is an introductory data science course which develops the core ideas of statistics via programming and simulation instead of the manipulation of mathematical formulae. At the same time, the course does not assume that the reader has any experience in programming; instead, we learn "just enough" programming to do data science.
- Building
- Making Changes
- Publishing
- Project Structure
- Extensions
- Reader-Friendly Jupyter Notebooks
- License
This project's dependencies are managed with
poetry. First, make sure that you have poetry
installed. Before building the project for the first time, run poetry install
in the repository root to install the dependencies. Then, to build the project,
run poetry run make. The contents will be placed in the src/_build/html
directory, and opening src/_build/html/index.html in a browser will display
the front page of the notes.
⚠️ Nix is used to ensure reproducibility of the build, but it is more difficult to set up. Using Poetry is recommended for most use cases.
For a more reproducible build, the Python and Poetry dependencies are also specified using Nix. The instructions below assume that you have installed a recent version of Nix with the "flake" feature enabled. For instructions, see the Nix Wiki.
To build the notes, run nix develop in the current directory to enter the
development environment; this will install both python and poetry and will
download the needed python dependencies using poetry. Next, run make to build
the project. The results will again appear in src/_build/html.
- Before working on the notes for the first time, run make init in the project's root. This will install git pre-commit hooks which clean the notebook pages of output.
- Create a new branch to hold your changes with git checkout -b <branch_name>.
- Make your changes by editing the appropriate file (usually in the src/ directory).
- Build and preview the updated notes using the instructions above.
- Push your branch to the dsc-courses/dsc10-notes repository and submit a pull request.
Notes are automatically built and published when changes are pushed to the
main branch on GitHub. The build and publication are managed by this GitHub
workflow.
The workflow builds the main branch with Nix and copies the HTML output to the
gh-pages branch published to GitHub Pages.
It also builds "clean" versions of the Jupyter Notebooks that are used as pages
by removing directive cells. These notebooks are what students will see when
they click the "Launch in JupyterHub" link at the top of a page. These cleaned
notebooks are kept in the notebooks branch of the repository. This branch
should not be manually edited, as all changes will be overwritten by the
workflow. For more information, see Reader-Friendly
Notebooks below.
These notes are written using JupyterBook along with several custom extensions and scripts. Pages are either Jupyter notebooks or MyST markdown files.
The src/ directory contains the pages. src/_config.yml contains
important configuration variables, such as the URL of the JupyterHub that will
be used to launch notebooks interactively.
extensions/ contains the extensions which define custom directives. See
Extensions below.
scripts/ contains various scripts used in the development and building of the
notes, such as the script which generates the "cleaned" version of pages that
appear in the notebooks/book_pages directory. These scripts should generally
not be invoked manually.
Several extensions of MyST markdown are used in these notes. These extensions
are defined by the files in extensions/. Their usage is described below.
The "hiddenanswer" directive
This directive provides a way of quickly "quizzing" readers for understanding. It is invoked as follows:
```{hiddenanswer}
---
question: This is the question.
answer: This is the answer.
```
This will create a "tabbed" container. The first tab will show the question. Clicking on the second tab shows the answer.
Long answers or code can be included using the standard YAML syntax for multi-line strings. For example:
````{hiddenanswer}
---
question: |
    This is the question,
    which will be

    1. Parsed *as* [MyST](#)
    2. With paragraph breaks preserved

    Like this.
answer: |
    ```
    def func(arg):
        return 42
    ```
````
(Note that an additional backtick has been used in the directive code fence so that a code block can be nested in the answer.)
The jupytertip directive creates an admonition box intended to display a tip
related to Jupyter notebooks. The jupytertiplist directive creates a list of
all of the tips.
Example:
```{jupytertip}
Select `Kernel -> Restart and Run All` to restart the kernel and run all of
the notebook's cells from top to bottom.
```
Sometimes it is useful to place a link to open a notebook in JupyterHub within the text. This can be done with the "jupyterhublink" directive:
```{jupyterhublink} path/to/notebook/relative/to/repo/root.ipynb
```
The URL used for the links is configured in src/_config.yml.
Many pages are written as Jupyter Notebooks which are then converted to HTML by
the build process. Readers can click the rocket icon on the HTML page to launch
the notebook version in a JupyterHub session. However, the Jupyter notebooks
used as source documents may include directives and other content that might
confuse readers. Therefore, as part of the build process, the source notebooks
are transformed by a GitHub workflow into "reader-friendly" notebooks and stored
in the notebooks branch (these are the notebooks that are launched when the
user clicks the rocket link to interact with a page). The notebooks are made
"reader-friendly" by several mechanisms described below.
Cells containing an admonition, such as
```{warning}
This is a warning.
```
are automatically identified and converted to Markdown:
**Warning**
This is a warning.
Note: This conversion occurs only if the directive is the only content of
the cell. See ./scripts/make_reader_friendly_notebooks.py for more
information.
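The "only content of the cell" rule can be illustrated with a minimal sketch. This is not the code in scripts/make_reader_friendly_notebooks.py; the function name `admonition_to_markdown`, the regular expression, and the set of admonition names are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the whole cell must be a single fenced directive.
# The real script may recognize directives differently.
DIRECTIVE_CELL = re.compile(
    r"\A\s*(`{3,})\{(?P<name>[A-Za-z-]+)\}[^\n]*\n(?P<body>.*?)\n?\1\s*\Z",
    re.DOTALL,
)

# Assumed set of admonition names; the actual list may differ.
ADMONITIONS = {"note", "warning", "tip", "important", "caution"}


def admonition_to_markdown(cell_source):
    """Return plain Markdown for an admonition-only cell, else the source unchanged."""
    match = DIRECTIVE_CELL.match(cell_source)
    if match is None or match.group("name") not in ADMONITIONS:
        # Other text in the cell, or a non-admonition directive: leave it alone.
        return cell_source
    title = match.group("name").capitalize()
    return f"**{title}**\n\n{match.group('body').strip()}"
```

In this sketch, a cell that mixes ordinary text with a directive does not match the pattern and is returned unchanged, which mirrors the behavior described in the note above.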
Hiddenanswer Cells
Cells containing a hidden answer directive, such as
```{hiddenanswer}
---
question: This is the question
answer: This is the answer
```
are automatically identified and converted to Markdown:
**Question**: This is the question
**Answer**: This is the answer
This conversion occurs only if the directive is the only content of the cell.
See ./scripts/make_reader_friendly_notebooks.py for more information.
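As a rough illustration of this conversion, the sketch below handles only the simplest case: a cell whose sole content is a hiddenanswer directive with single-line question and answer options. The function name `hiddenanswer_to_markdown` and the parsing approach are assumptions, not the actual script; multi-line YAML values are deliberately left untouched here.

```python
import re

# Hypothetical pattern: the whole cell must be one hiddenanswer directive.
HIDDENANSWER_CELL = re.compile(
    r"\A\s*(`{3,})\{hiddenanswer\}\s*\n(?P<options>.*?)\n?\1\s*\Z",
    re.DOTALL,
)


def hiddenanswer_to_markdown(cell_source):
    """Convert a hiddenanswer-only cell to Markdown, else return it unchanged."""
    match = HIDDENANSWER_CELL.match(cell_source)
    if match is None:
        return cell_source
    fields = {}
    for line in match.group("options").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("question", "answer"):
            fields[key.strip()] = value.strip()
    # Give up on missing keys or multi-line YAML values ("|" or ">").
    if set(fields) != {"question", "answer"}:
        return cell_source
    if any(value in ("", "|", ">") for value in fields.values()):
        return cell_source
    return f"**Question**: {fields['question']}\n\n**Answer**: {fields['answer']}"
```

A fuller version would parse the options with a YAML library so that multi-line questions and answers are supported.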
Cells can be hidden in the reader-friendly version by adding a
hide-from-reader tag to the cell.
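Cell tags are stored in a notebook's JSON under each cell's `metadata.tags` list, so filtering on the tag can be sketched as follows. The function name `drop_hidden_cells` is hypothetical, and the project's own script may implement this step differently.

```python
import copy

HIDE_TAG = "hide-from-reader"


def drop_hidden_cells(notebook):
    """Return a copy of a notebook dict without cells tagged hide-from-reader."""
    cleaned = copy.deepcopy(notebook)  # leave the source notebook untouched
    cleaned["cells"] = [
        cell
        for cell in cleaned.get("cells", [])
        if HIDE_TAG not in cell.get("metadata", {}).get("tags", [])
    ]
    return cleaned
```

The input is the dictionary obtained by loading an .ipynb file with `json.load`; the result can be written back out with `json.dump`.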
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.
