Version control
with git and GitHub
Karl Broman
Biostatistics & Medical Informatics, UW–Madison
[Link]
[Link]/kbroman
@kwbroman
Course web: [Link]/AdvData
Slides prepared with Sam Younkin
[Link]
2
Methods for tracking versions
▶ Don’t keep track
▶ Save numbered zip files
▶ Formal version control
3
Suppose it stops working…
▶ Don’t keep track
– good luck!
▶ Save numbered zip files
– Unzip versions and diff
▶ Formal version control
– Easy to study changes back in time
– Easy to jump back and test
3
Why use formal version control?
▶ History of changes
▶ Able to go back
▶ No worries about breaking things that work
▶ Merging changes from multiple people
4
Example repository
5
Example history
6
Example commit
7
What is git?
▶ Formal version control system
▶ Developed by Linus Torvalds (developer of Linux)
– used to manage the source code for Linux
▶ Tracks any content (but mostly plain text files)
– source code
– data analysis projects
– manuscripts
– websites
– presentations
8
Why use git?
▶ It’s fast
▶ You don’t need access to a server
▶ Amazingly good at merging simultaneous changes
▶ Everyone’s using it
9
What is GitHub?
▶ A home for git repositories
▶ Interface for exploring git repositories
▶ Real open source
– immediate, easy access to the code
▶ Like Facebook for programmers
▶ Free 2-year “Pro” account for students
– [Link]
▶ ([Link] is an alternative)
– free private repositories
10
Why use GitHub?
▶ It takes care of the server aspects of git
▶ Graphical user interface for git
– Exploring code and its history
– Tracking issues
▶ Facilitates:
– Learning from others
– Seeing what people are up to
– Contributing to others’ code
▶ Lowers the barrier to collaboration
– “There’s a typo in your documentation.” vs.
“Here’s a correction for your documentation.”
11
Basic use
▶ Change some files
▶ See what you’ve changed
git status
git diff
git log
▶ Indicate what changes to save
git add
▶ Commit to those changes
git commit
▶ Push the changes to GitHub
git push
▶ Pull changes from your collaborator
git pull
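The local part of this cycle can be tried in a throwaway repository. This is a minimal sketch, not a prescribed workflow: the file name notes.txt, the identity, and the commit messages are hypothetical, and a scratch directory from mktemp is assumed so that no real project is touched.

```shell
# Scratch repository so nothing real is touched (all names are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"            # identity for this repository only
git config user.email "jane@example.com"

# Change some files.
echo "first draft" > notes.txt

# See what you've changed: a new file is listed as untracked.
git status --short

# Indicate what changes to save, then commit to them.
git add notes.txt
git commit -q -m "Add first draft of notes"

# Change the file again and look at the unstaged difference.
echo "second paragraph" >> notes.txt
git diff
git add notes.txt
git commit -q -m "Add second paragraph to notes"

# Review the history, newest commit first.
git log --oneline
n_commits="$(git rev-list --count HEAD)"
```

git push and git pull are left out here because they need a remote repository.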
12
Basic use
▶ Pull changes from your collaborator in two steps (git pull is a fetch followed by a merge)
git fetch
git merge
12
Initialize repository
▶ Create (and cd to) a working directory
– For example, ~/Docs/Talks/Graphs
▶ Initialize it to be a git repository
– git init
– Creates subdirectory ~/Docs/Talks/Graphs/.git
$ mkdir ~/Docs/Talks/Graphs
$ cd ~/Docs/Talks/Graphs
$ git init
Initialized empty Git repository in ~/Docs/Talks/Graphs/.git/
13
Produce content
▶ Create a [Link] file
## Talk on “How to display data badly”
These are slides for a talk that I give as often as possible,
because it's fun.
This was inspired by Howard Wainer's article, whose title I
stole: H Wainer (1984) How to display data badly.
American Statistician 38:137-147
A recent PDF is
[here](http://[Link]/~kbroman/talks/[Link]).
14
Incorporate into repository
▶ Stage the changes using git add
$ git add [Link]
15
Incorporate into repository
▶ Now commit using git commit
$ git commit -m "Initial commit of [Link] file"
[master (root-commit) 32c9d01] Initial commit of [Link] file
1 file changed, 14 insertions(+)
create mode 100644 [Link]
▶ The -m argument allows one to enter a message
▶ Without -m, git will spawn a text editor
▶ Use a meaningful message
▶ Message can have multiple lines, but make 1st line an overview
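One way to write a multi-line message without opening an editor is to pass -m more than once; git joins the pieces as separate paragraphs, so the first becomes the overview line. A small sketch, assuming a scratch repository and a hypothetical file notes.md:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"            # hypothetical identity, this repo only
git config user.email "jane@example.com"

echo "# Talk notes" > notes.md
git add notes.md

# First -m: one-line overview. Second -m: longer explanation.
git commit -q -m "Add outline for the talk notes" \
              -m "Start with a single heading; sections will follow in later commits."

subject="$(git log -1 --format=%s)"   # overview line only
body="$(git log -1 --format=%b)"      # everything after the blank line
echo "$subject"
echo "$body"
```

Tools such as git log --oneline display only the overview line, which is why it should summarize the change by itself.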
16
A few points on commits
▶ Use frequent, small commits
▶ Don’t get out of sync with your collaborators
▶ Commit the sources, not the derived files
(R code not images)
▶ Use a .gitignore file to indicate files to be ignored
*~
[Link]
Figs/*.pdf
.RData
.Rhistory
*.Rout
*.aux
*.log
*.out
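To check that ignore patterns behave as intended, git check-ignore reports which paths are matched. The sketch below uses a few of the patterns above in a scratch repository; the file names (analysis.R, Figs/fig1.pdf, and so on) are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# A subset of the patterns from the slide.
printf '%s\n' '*~' 'Figs/*.pdf' '.RData' '*.Rout' > .gitignore

# Hypothetical project files: one source file plus derived or temporary ones.
mkdir Figs
touch analysis.R analysis.R~ analysis.Rout Figs/fig1.pdf .RData

# check-ignore prints only the paths that some ignore rule matches.
ignored="$(git check-ignore analysis.R analysis.R~ analysis.Rout Figs/fig1.pdf .RData)"
echo "$ignored"

# The ignored files are absent from the status listing.
git status --short
```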
17
Using git on an existing project
▶ git init
▶ Set up .gitignore file
▶ git status (did you miss any?)
▶ git add . (or name files individually)
▶ git status (did you miss any?)
▶ git commit
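These steps might look as follows for a small existing directory. This is a sketch under assumptions: the project contents are invented, and git add --dry-run is used as one extra, optional check of what "git add ." would stage.

```shell
# A pre-existing project directory (contents are hypothetical).
proj="$(mktemp -d)"
cd "$proj"
echo 'x <- rnorm(10)' > analysis.R
echo 'scratch' > analysis.R~          # editor backup that should not be tracked

git init -q
git config user.name "Jane Doe"       # identity for this repository only
git config user.email "jane@example.com"

echo '*~' > .gitignore
git status --short                    # did you miss any files to ignore?
git add --dry-run .                   # lists what would be staged, stages nothing
git add .
git status --short                    # everything listed is now staged
git commit -q -m "Initial commit of existing project"

tracked="$(git ls-files)"
echo "$tracked"
```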
18
Removing/moving files
For files that are being tracked by git:
Use git rm instead of just rm
Use git mv instead of just mv
$ git rm myfile
$ git mv myfile newname
$ git mv myfile SubDir/
$ git commit
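The point of git rm and git mv is that the deletion or rename is staged in the same step. A short sketch in a scratch repository; the file named obsolete and the target name SubDir/newname are illustrative only.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"       # hypothetical identity, this repo only
git config user.email "jane@example.com"

echo "draft" > myfile
echo "old" > obsolete
git add myfile obsolete
git commit -q -m "Add two files"

mkdir SubDir
git mv myfile SubDir/newname          # move and rename; the change is staged
git rm -q obsolete                    # delete the file and stage the deletion

git status --short                    # both changes are already staged
git commit -q -m "Move myfile into SubDir and remove obsolete"
tracked="$(git ls-files)"
```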
19
First use of git
$ git config --global [Link] "Jane Doe"
$ git config --global [Link] "janedoe@[Link]"
$ git config --global [Link] true
$ git config --global [Link] emacs
$ git config --global [Link] ~/.gitignore_global
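The key names in the commands above did not survive in this copy of the slides. The sketch below assumes they were the standard git settings user.name, user.email, color.ui, core.editor and core.excludesfile; that mapping is a guess, and the email address is a placeholder. HOME is redirected to a scratch directory so the experiment does not alter a real ~/.gitconfig.

```shell
# Sandbox only: point HOME at a scratch directory so the real ~/.gitconfig
# is left alone. Omit these two lines to configure your actual account.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Assumed keys (not confirmed by the slides).
git config --global user.name "Jane Doe"
git config --global user.email "janedoe@example.com"
git config --global color.ui true
git config --global core.editor emacs
git config --global core.excludesfile "$HOME/.gitignore_global"

# Inspect what was written.
git config --global --list
name="$(git config --global --get user.name)"
editor="$(git config --global --get core.editor)"
```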
20
Set up GitHub repository
▶ Get a GitHub account
▶ Click the “Create a new repo” button
▶ Give it a name and description
▶ Click the “Create repository” button
▶ Back at the command line:
git remote add origin [Link]
git push -u origin master
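The two command-line steps can be rehearsed without a GitHub account by letting a local bare repository stand in for the empty GitHub repository. This is only a sketch: the paths are hypothetical, and the branch name is read from git rather than assumed to be master, since newer installations may default to a different name.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/hub.git"    # stand-in for the new, empty GitHub repo

git init -q "$work/talk"
cd "$work/talk"
git config user.name "Jane Doe"       # identity for this repository only
git config user.email "jane@example.com"
echo "# Talk" > notes.md
git add notes.md
git commit -q -m "Initial commit"

branch="$(git symbolic-ref --short HEAD)"   # current branch name

git remote add origin "$work/hub.git"
git push -q -u origin "$branch"       # -u records origin/<branch> as the upstream

upstream="$(git rev-parse --abbrev-ref "@{upstream}")"
git remote -v
```

Because of -u, later calls to git push and git pull on this branch need no arguments.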
21
Configuration file
Part of a .git/config file:
[remote "origin"]
url = https://[Link]/kbroman/[Link]
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
[remote "brian"]
url = git://[Link]/byandell/[Link]
fetch = +refs/heads/*:refs/remotes/brian/*
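Entries like these are normally written by git itself rather than by hand. As an illustration, adding a remote produces the matching section, and git config can read individual values back. The URL below is a made-up placeholder, and nothing is contacted over the network.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical URL; adding a remote only records it in .git/config.
git remote add brian "git://example.invalid/byandell/project.git"

# Read single values back by section, name and key.
url="$(git config --get remote.brian.url)"
fetchspec="$(git config --get remote.brian.fetch)"
echo "$url"
echo "$fetchspec"

# The same information as stored on disk.
cat .git/config
```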
22
Destroy it and start over
▶ Why?
– You committed something you shouldn’t have (large and/or private)
– You are embarrassed by your repository’s history
– You can’t figure out the mess you’ve made
▶ Pick the repository you like and destroy the other one
– For example, get your local directory in the state you like and destroy everything
else
▶ Local repository
– If you delete the .git subdirectory, it’ll no longer be a git repository
▶ GitHub repository
– Go to the settings for the repository and head down to the Danger Zone
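For the local side, a sketch of what deleting .git does, run in a scratch directory with a hypothetical file. The history is discarded for good while the working files stay in place, so this is worth trying only on a copy.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"       # hypothetical identity, this repo only
git config user.email "jane@example.com"
echo "keep me" > notes.txt
git add notes.txt
git commit -q -m "A commit to be discarded"

rm -rf .git          # irreversible: all history is gone, working files remain
git init -q          # start again with an empty history
git status --short   # notes.txt is untracked again
```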
23
Branching and merging
▶ Use branches to test out new features without breaking the working
code.
git branch devel
git branch
git checkout devel
▶ When you’re happy with the work, merge it back into your master
branch.
git checkout master
git merge devel
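A complete round trip through a devel branch, as a sketch in a scratch repository. The file name and messages are hypothetical, and the main branch name is read from git instead of being assumed to be master.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"       # identity for this repository only
git config user.email "jane@example.com"

echo "working code" > code.txt
git add code.txt
git commit -q -m "Add working code"
main_branch="$(git symbolic-ref --short HEAD)"

# Try out a new feature on a separate branch.
git branch devel
git checkout -q devel
echo "experimental feature" >> code.txt
git commit -q -a -m "Try out a new feature"

# Happy with the work: merge it back.
git checkout -q "$main_branch"
git merge -q devel
git branch --merged                   # devel is now listed as merged
```

Until the merge, the main branch still holds the original code.txt, which is what makes the experiment safe.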
24
Issues and pull requests
▶ Problem with or suggestion for someone’s code?
– Point it out as an Issue
▶ Even better: Provide a fix
– Fork
– Clone
– Modify
– Commit
– Push
– Submit a Pull Request
25
Suggest a change to a repo
▶ Go to the repository:
[Link]
▶ Fork the repository
Click the “Fork” button
▶ Clone your version of it
git clone [Link]
▶ Change things locally, git add, git commit
▶ Push your changes to your GitHub repository
git push
▶ Go to your GitHub repository
▶ Click “Pull Requests” and “New pull request”
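The command-line part of this workflow (clone, modify, commit, push) can be practiced locally. In the sketch below a bare repository plays the role of your fork on GitHub; forking and opening the pull request happen on the GitHub website and are not reproduced. All paths and file contents are hypothetical.

```shell
work="$(mktemp -d)"

# Stand-in for your fork: a bare repository seeded with one commit.
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Original Author"
git config user.email "author@example.com"
echo "teh documentation" > README.txt
git add README.txt
git commit -q -m "Add README"
git clone -q --bare "$work/seed" "$work/fork.git"

# Clone your version of it.
git clone -q "$work/fork.git" "$work/clone"
cd "$work/clone"
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# Change things locally, git add, git commit.
echo "the documentation" > README.txt
git add README.txt
git commit -q -m "Fix typo in README"

# Push to origin, which is the repository that was cloned.
git push -q
pushed="$(git --git-dir="$work/fork.git" log -1 --format=%s)"
```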
26
Pulling a friend’s changes
▶ Add a connection
git remote add friend git://[Link]/friend/repo
▶ If you trust them, just pull the changes
git pull friend master
▶ Alternatively, fetch the changes, test them, and then merge them.
git fetch friend master
git branch -a
git checkout remotes/friend/master
git checkout -b friend
git checkout master
git merge friend
▶ Push them back to your GitHub repo
git push
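A variation on the fetch-then-merge route is to inspect the fetched commits before merging, without checking them out. The sketch below assumes a friend's repository at a local path created only for the demonstration.

```shell
work="$(mktemp -d)"

# My repository.
git init -q "$work/mine"
cd "$work/mine"
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "shared line" > notes.txt
git add notes.txt
git commit -q -m "Start notes"
main_branch="$(git symbolic-ref --short HEAD)"

# The friend's copy, with one extra commit (hypothetical).
git clone -q "$work/mine" "$work/friend"
(
  cd "$work/friend" &&
  git config user.name "A Friend" &&
  git config user.email "friend@example.com" &&
  echo "friend's line" >> notes.txt &&
  git commit -q -a -m "Add a line from the friend"
)

# Add a connection, fetch, and look before merging.
git remote add friend "$work/friend"
git fetch -q friend
git log --oneline "$main_branch..friend/$main_branch"   # commits I do not have yet
git diff "$main_branch" "friend/$main_branch"           # what the merge would change
git merge -q "friend/$main_branch"
```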
27
Merge conflicts
Sometimes after git pull friend master
Auto-merging [Link]
CONFLICT (content): Merge conflict in [Link]
Automatic merge failed; fix conflicts and then commit the result.
Inside the file you’ll see:
<<<<<<< HEAD
A line in my file.
=======
A line in my friend's file
>>>>>>> 031389f2cd2acde08e32f0beb084b2f7c3257fff
Edit, add, commit, push, submit pull request.
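To see a conflict and its resolution end to end, the sketch below provokes one on purpose in a scratch repository. A local branch named friend stands in for the friend's repository; file.txt and the resolved wording are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"       # identity for this repository only
git config user.email "jane@example.com"

echo "A line." > file.txt
git add file.txt
git commit -q -m "Add file"
main_branch="$(git symbolic-ref --short HEAD)"

# Two incompatible edits to the same line.
git checkout -q -b friend
echo "A line in my friend's file" > file.txt
git commit -q -a -m "Friend's edit"
git checkout -q "$main_branch"
echo "A line in my file." > file.txt
git commit -q -a -m "My edit"

# The merge stops and reports the conflict.
git merge friend || echo "merge stopped with conflicts"
conflicted="$(git diff --name-only --diff-filter=U)"   # files still unmerged
cat file.txt                                           # contains the conflict markers

# Edit the file to its final content, then add and commit to finish the merge.
echo "A line in our file." > file.txt
git add file.txt
git commit -q -m "Merge friend's edit, resolving conflict in file.txt"
```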
28
git/GitHub with RStudio
29
Open source means everyone can see my stupid mistakes.
Version control means everyone can see every stupid mistake I’ve ever
made.
[Link]/stupidcode
30