COMP 1238
Comparing and versioning text files
COMP 1238 – Introduction to Data Management
Starting at 1:05
Monday, Oct 7
◎ AtKlass: XXXX
In previous episodes …
◎ Representation of numbers and characters on stone,
clay, paper and in computer memory
◎ Printing, typewriters and all the terminology we
inherited from them
◎ Bits, bytes and ASCII encoding
◎ Tools: Keyboard, Text editor, CLI
Comparing and versioning text files
◎ Diffing tools and methods
◎ Versioning - basic git workflow
◎ Working with GitHub
Objective: Lay the foundation for using comparison and
versioning tools
Comparing files and keeping track of versions is hard
mysite_new.zip
mysite_final.zip
mysite_final_v2.zip
mysite_final_v2 (1).zip
mysite_final_v2 (2).zip
We need to do better than this
Managing versions in other industries
◎ Where else do we need to strictly manage versions?
◎ Mechanical engineering – blueprints
◎ Legal documents
Comparing text files is easier than comparing technical drawings.
A comparison might look like this:
Comparing two files
[Link]                             [Link]
1 # Rice with Onions and Miso      1 # Rice with Onions and Miso
2 ## Ingredients:                  2 ## Ingredients:
3 * 1 cup of rice                  3 for 2 servings
4 * 1 medium onion, chopped        4 * 1 cup of rice
5 * 2 tbsp of miso paste           5 * 1 medium onion, chopped
6 * 2 cups of water                6 * 2 tbsp of miso paste
7 * 1 tbsp of olive oil            7 * 2 cups of water
8 * Salt and pepper to taste       8 * 1 tbsp of vegetable oil
9 * Optional greens for garnish    9 * Salt and pepper to taste
Comparison in VSCode
[Screenshot: the two files compared side by side in VSCode]
diff / patch
diff [Link] [Link]
2a3
> for 2 servings
7c8
< * 1 tbsp of olive oil
---
> * 1 tbsp of vegetable oil
9d9
< * Optional greens for garnish
Comparing two smaller files
product: Gelato      product: Gelato
type: Banana         type: Vanilla
amount: 15           amount: 10
unit: kg             unit: kg
diff [Link] [Link]
2,3c2,3
< type: Banana
< amount: 15
---
> type: Vanilla
> amount: 10
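The same comparison can be reproduced in a script. This is a minimal sketch using Python's standard difflib module, assuming the two versions are held as lists of lines. The names old.txt and new.txt are placeholders, not the files from the slide. Note that difflib.unified_diff prints the "unified" format (lines prefixed with - and +), not the < and > format shown above.

```python
import difflib

# The two versions of the small file, one list entry per line.
old = ["product: Gelato", "type: Banana", "amount: 15", "unit: kg"]
new = ["product: Gelato", "type: Vanilla", "amount: 10", "unit: kg"]

# lineterm="" because our lines carry no trailing newline characters.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", lineterm="")
)

for line in diff_lines:
    print(line)
```

Lines starting with - exist only in the old version, lines starting with + only in the new one, and lines starting with a space are unchanged context.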
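Sending patches over email
This was a common practice in the old days, especially for big projects
like the Linux kernel. The sender saves the output of diff to a file and
emails it; the recipient applies it to their own copy of the old file:
patch < [Link]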
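To see what applying a patch involves, here is a simplified sketch that applies a diff in the < / > format to a list of lines. The helper apply_normal_diff is hypothetical and written only for illustration. The real patch tool understands more formats and also checks that the old lines really match before changing anything.

```python
import re

# Hunk headers look like "2a3", "7c8", "9d9" or "2,3c2,3".
HUNK_HEADER = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old_lines, diff_lines):
    """Return the new file's lines, given the old lines and a normal-format diff."""
    result = []
    copied = 0  # how many old lines have been copied or dropped so far
    i = 0
    while i < len(diff_lines):
        header = HUNK_HEADER.match(diff_lines[i])
        if header is None:
            raise ValueError(f"expected a hunk header, got: {diff_lines[i]!r}")
        start = int(header.group(1))
        end = int(header.group(2) or start)
        command = header.group(3)
        i += 1

        # Collect the "> " lines of this hunk; "< " lines and "---" are skipped.
        new_lines = []
        while i < len(diff_lines) and not HUNK_HEADER.match(diff_lines[i]):
            if diff_lines[i].startswith("> "):
                new_lines.append(diff_lines[i][2:])
            i += 1

        if command == "a":
            # "2a3": keep old lines up to and including line 2, then add.
            result.extend(old_lines[copied:start])
            copied = start
        else:
            # "c" and "d": keep old lines before the range, drop the range itself.
            result.extend(old_lines[copied:start - 1])
            copied = end
        result.extend(new_lines)

    result.extend(old_lines[copied:])  # untouched tail of the old file
    return result


old = ["product: Gelato", "type: Banana", "amount: 15", "unit: kg"]
diff = ["2,3c2,3", "< type: Banana", "< amount: 15", "---",
        "> type: Vanilla", "> amount: 10"]
patched = apply_normal_diff(old, diff)
print("\n".join(patched))
```

The diff is much smaller than the file it changes, which is why emailing patches was practical for large projects.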
Version Control Tools
But then git took over
◎ Git was initially developed by Linus Torvalds in
2005 for the Linux Kernel after an unfortunate
conflict related to the previous tool they used
◎ It was initially perceived as an overly complex tool
optimized for large open-source projects
◎ But then it gradually took over
Git
◎ Open-source software tool installed locally on your
computer, first released in 2005
◎ Deals with the files on your disk
◎ Your local git installation communicates with GitHub
(or another git server)
GitHub
◎ Commercial service in the cloud. Launched in 2008 as
a startup but currently owned by Microsoft
◎ Provides a place to store your work in the cloud and
share it with others
◎ Lots of extra functionality on top - GitHub Pages, GitHub
Projects …
◎ Git can be overwhelming
◎ Most git documentation gets
into details very quickly
◎ We will look at the basic
workflow and some terminology
to get you started
Using git on your computer: many options
◎ Command line
◎ Integrated in VSCode & WebStorm
◎ GitHub Desktop client
◎ GitUI – text-mode UI for git
◎…
Local and remote repositories
◎ We created a repository on GitHub
◎ But when you “clone” it to your computer, you get a complete
copy including ALL the history with all previous versions of the
files
Typical coding cycle – workflow
◎ The work on a mature software project looks like this:
1. Get the latest version of the source files
2. Modify them to change something, or fix some issue
3. Save the latest version of the files where our coworkers
can access them
4. Repeat
Same cycle with git
When first joining a project, we “clone” the repo to get a local copy of it
1. “Pull” the latest version from the remote repo
2. Modify files using any editor
3. “Commit” your changes to your local repo
4. “Push” your commit to the remote repo
5. Repeat
“Commit” as a verb means to create a version; “commit” as a noun is a synonym of “version”.
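On the command line, this cycle corresponds to git pull, git add, git commit and git push. A minimal sketch of one pass through it, driven from Python, is below. The helper names git and work_cycle, the repo name and the commit message are made up for illustration. The final lines only record the commands instead of running them, so the sketch works without a real repository.

```python
import shlex
import subprocess


def git(*args, cwd, run=subprocess.run):
    """Run one git command inside `cwd`; check=True raises if git reports an error."""
    return run(["git", *args], cwd=cwd, check=True)


def work_cycle(repo_dir, message, edit, run=subprocess.run):
    """One pass through the cycle, for a repo that has already been cloned."""
    git("pull", cwd=repo_dir, run=run)                    # 1. get the latest version
    edit(repo_dir)                                        # 2. modify files
    git("add", "--all", cwd=repo_dir, run=run)            # 3. choose what to commit ...
    git("commit", "-m", message, cwd=repo_dir, run=run)   #    ... and create the version
    git("push", cwd=repo_dir, run=run)                    # 4. send it to the remote repo


# Dry run: record the commands instead of executing them.
recorded = []


def dry_run(command, cwd, check):
    recorded.append(command)


work_cycle("my-project", "Fix typo in docs", edit=lambda repo_dir: None, run=dry_run)
for command in recorded:
    print(shlex.join(command))
```

With the default run=subprocess.run the same function would call the real git program. In that case git commit fails when there is nothing to commit, and the error stops the cycle before the push.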
Your files in a directory called “my-project”
my-project
    docs
        [Link]
        user_docs.docx
    src
        [Link]
        [Link]
        [Link]
        [Link]
Your files in your local git repository
my-project
    .git
    docs
        [Link]
        user_docs.docx
    src
        [Link]
        [Link]
        [Link]
        [Link]
When you have a git repository, you have an additional directory called .git, which
holds a mini-filesystem.
This file system keeps all your data, plus the bells and whistles that git needs to do
its job.
All this sits on your local machine.
This mini-filesystem is highly optimized and pretty complicated. Don’t try to read it directly.
The job of the git client is to manage this for you.
You can alternate between multiple git clients on the same repo. For example, the VSCode
integrated client and the CLI one.
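Because every client looks for the same .git entry, a script can locate a repository the same way. A minimal sketch follows; find_repo_root is a hypothetical helper, and the temporary my-project folder only imitates the layout from the slides, with an empty .git directory standing in for a real one.

```python
import tempfile
from pathlib import Path


def find_repo_root(start):
    """Walk upward from `start` until a directory containing `.git` is found."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        if (directory / ".git").exists():
            return directory
    return None  # not inside a git repository


# Build a throwaway folder shaped like the slide's example and search from src.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"
    (project / ".git").mkdir(parents=True)
    (project / "src").mkdir()
    found = find_repo_root(project / "src")
    print(found.name)
```

This is why git commands work from any subdirectory of the project: the repository is wherever the nearest .git is found.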
Your workflow
◎ You edit your local files directly.
○ You can edit, add and delete files using whatever
tools you like
○ This doesn’t change the repo, so now your repo is
behind
A Commit
When you do a “commit”, you record your changes into the local repo.
The mini-fs of the repo is “append-only”. Nothing is ever over-written there, so
everything you ever commit can be recovered.
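The append-only idea can be illustrated with a toy model. ToyRepo below is an invented, in-memory stand-in and not how git actually stores data: real git identifies commits by hashes rather than numbers and keeps its objects on disk. The point it shows is that committing only ever adds a snapshot, so older versions stay recoverable.

```python
class ToyRepo:
    """A toy stand-in for the local repo: commits are only ever appended."""

    def __init__(self):
        self._commits = []  # grows over time; entries are never changed or removed

    def commit(self, files, message):
        """Record a snapshot of `files` (name -> text) and return its number."""
        self._commits.append({"message": message, "files": dict(files)})
        return len(self._commits) - 1

    def checkout(self, commit_number):
        """Return a copy of the files exactly as they were at that commit."""
        return dict(self._commits[commit_number]["files"])

    def log(self):
        """Commit messages, oldest first."""
        return [entry["message"] for entry in self._commits]


repo = ToyRepo()
working_files = {"order.txt": "type: Banana\namount: 15\n"}
first = repo.commit(working_files, "Initial order")

# Editing the working files does not touch what was already committed.
working_files["order.txt"] = "type: Vanilla\namount: 10\n"
second = repo.commit(working_files, "Switch to vanilla")

print(repo.log())
print(repo.checkout(first)["order.txt"])
```

Even after the second commit, the first version of order.txt can still be retrieved unchanged.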
Synchronizing with the server - push
[Diagram: a “push” arrow from the .git directory on your local machine to a server
somewhere on the internet, eg. [Link]]
At the end of each work session, you need to save your changes on the server.
This is called a push.
Now all your data is backed up.
• You can retrieve it, on your machine or some other machine.
• Your co-workers can retrieve it
Synchronizing with the server - pull
[Diagram: “pull” arrows from a server somewhere on the internet, eg. [Link], into the
.git directory and the files on your local machine]
To retrieve your data from the server, you do a “pull”. A “pull” takes the data from
the server and puts it both in your local mini-fs in the .git directory and in your
ordinary files.
If your local file has changed, git will merge the changes if possible. If it can’t
figure out how to do the merge, you will get an error message. We'll learn how to deal
with these later.
The whole picture
[Diagram: you edit and commit on your local machine; “push” sends commits from .git to a
server somewhere on the internet, eg. [Link]; “pull” brings them back into .git and your files]
In some clients there is a “Sync” button combining “push” and “pull” into a single operation
Your workflow
sync → edit → commit → edit → commit → edit → commit → sync
Best practice: commit your work whenever you’ve gotten one part of your problem
working, or before trying something that might fail.
If your new stuff is messed up, you can always “revert” to your last good commit.
Your workflow with a partner
1. You: sync → edit → commit → edit → commit → edit → commit → sync
2. Your partner (or you on another computer): sync → edit → commit → edit → commit → edit → commit → sync
   Your partner gets your work from the server
3. You: sync → edit → commit → edit → commit → edit → commit → sync
   You get your partner’s work from the server
Working in parallel – merges
◎ Sync = push & pull
◎ Whoever pushed first – ok
◎ Second
○ push declined
○ asked to pull first
○ merge the versions – usually goes ok automatically
○ push
[Diagram: you and a co-worker both sync from the server, each edit and commit in parallel,
then both sync back to the server]
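The “second push is declined” rule can be sketched with a toy model. ToyServer is invented for illustration and leaves out almost everything a real git server does. It accepts a push only when the pushed history starts with everything the server already has. In this sketch, putting the co-worker's commit on top of the pulled history stands in for the merge step.

```python
class ToyServer:
    """Toy remote: keeps one list of commits and accepts only pushes that extend it."""

    def __init__(self):
        self.history = []

    def pull(self):
        """Hand out a copy of the current history."""
        return list(self.history)

    def push(self, local_history):
        """Accept only if the local history begins with the server's history."""
        if local_history[:len(self.history)] != self.history:
            return False  # declined: pull first
        self.history = list(local_history)
        return True


server = ToyServer()

# Both people start from the same (empty) state and commit in parallel.
you = server.pull() + ["you: fix typo"]
coworker = server.pull() + ["co-worker: add recipe"]

first_push = server.push(you)         # whoever pushes first is fine
second_push = server.push(coworker)   # the second push is declined

# The co-worker pulls, places their own commit on top, and pushes again.
coworker = server.pull() + ["co-worker: add recipe"]
retry_push = server.push(coworker)

print(first_push, second_push, retry_push)
print(server.history)
```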
Merge conflicts
◎ How git merges automatically
○ Different files changed – no problem
○ Different parts of the same file changed – no problem
◎ Changes around the same place in a file = conflict
○ Git notifies you of the conflict and asks to resolve it
manually (we will talk about this later)
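The rules above can be sketched as a line-by-line three-way merge: compare each side against the common starting version (the base). This is a deliberately simplified sketch, not git's actual merge algorithm. The helper merge_lines is hypothetical, and it assumes both sides only changed existing lines, so all three versions have the same number of lines.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of three equally long lists of lines.

    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (same change, or no change at all)
            merged.append(o)
        elif o == b:      # only they changed this line
            merged.append(t)
        elif t == b:      # only we changed this line
            merged.append(o)
        else:             # both changed it differently: a person has to decide
            merged.append(o)
            conflicts.append(number)
    return merged, conflicts


base = ["product: Gelato", "type: Banana", "amount: 15", "unit: kg"]
ours = ["product: Gelato", "type: Vanilla", "amount: 15", "unit: kg"]
theirs = ["product: Gelato", "type: Banana", "amount: 10", "unit: kg"]

# Different parts of the same file changed: merges cleanly.
merged, conflicts = merge_lines(base, ours, theirs)
print(merged, conflicts)

# Both sides changed line 2 in different ways: reported as a conflict.
mango = ["product: Gelato", "type: Mango", "amount: 15", "unit: kg"]
_, clash = merge_lines(base, ours, mango)
print(clash)
```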
Tech tools don’t replace the need for human-to-human communication
◎ Talk to your coworkers, discuss any plans for large scale
changes
◎ Even if it’s a small change, but one that affects many files, or
parts that someone else was working on lately – talk to them
Talk to people, or merge conflicts will
escalate into human conflicts
Working with git history
◎ View history
◎ Blame
◎ Commit messages
◎ Live demo …
Branching
Branches – why use them
◎ Go back in time and create an alternative history line
◎ Release a version
◎ The team continues working on future features
◎ But the released version needs occasional urgent bug fixes
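One way to picture this: a branch is a named pointer to the latest commit on one line of history. The toy model below uses invented commit ids (c1 to c4) and messages, whereas real git uses hashes as ids. It shows a production branch created at the release, after which development and production each get their own commits.

```python
commits = {}    # commit id -> (parent id, message)
branches = {}   # branch name -> id of its latest commit


def commit(branch, commit_id, message):
    """Add a commit on top of `branch` and move the branch pointer to it."""
    commits[commit_id] = (branches.get(branch), message)
    branches[branch] = commit_id


def history(branch):
    """Messages on `branch`, newest first, following the parent links."""
    messages, current = [], branches[branch]
    while current is not None:
        parent, message = commits[current]
        messages.append(message)
        current = parent
    return messages


commit("development", "c1", "Build feature A")
commit("development", "c2", "Release 1.0")
branches["production"] = branches["development"]   # branch off at the release
commit("development", "c3", "Start feature B")     # the team keeps working
commit("production", "c4", "Urgent bug fix")       # the release gets a fix

print(history("development"))
print(history("production"))
```

Both branches share the history up to the release and differ only in what came after it.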
[Diagram: a development branch and a production branch running in parallel]
Questions?
Links and references
◎ Git Tutorial for Beginners - Git & GitHub Fundamentals In Depth
by @TechWithTim
◎ Other long tutorials
○ Git and GitHub Tutorial for Beginners by Kevin Stratvert
○ Git and GitHub for Beginners - Crash Course
by Gwen Faraday of @faradayacademy
○ GitHub Basics Made Easy: A Fast Beginner's Tutorial!
◎ Git (not really) explained in 100 seconds by Fireship
◎ Download GitHub Desktop
◎ Brief history of Git
◎ GitHub Learning Lab
◎ The Git Book (free online, advanced)
◎ Wikipedia article about patch
DRAFTS
Git basic workflow
◎ Create or download a repository
○ git clone (or git init)
◎ Track changes
○ git add, git commit
◎ View history
○ git log
◎ Upload changes
○ git push
Staging area
◎ How to commit some but not all of the files you changed?
◎ You add them to the commit you are preparing. This is called staging.
The staging area is not a folder you can browse; it is git’s record of
what will go into the next commit
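On the command line, staging is done with git add and the staged files are then recorded with git commit. A minimal sketch follows; the helper commit_only and the commit message are made up for illustration, and the last lines record the commands rather than running them, so no repository is needed.

```python
import subprocess


def commit_only(paths, message, cwd, run=subprocess.run):
    """Stage just the given files, then commit whatever is staged."""
    run(["git", "add", "--", *paths], cwd=cwd, check=True)       # stage these files only
    run(["git", "commit", "-m", message], cwd=cwd, check=True)   # commit the staged files


# Dry run: collect the commands instead of executing them.
staged_commands = []
commit_only(
    ["docs/user_docs.docx"],
    "Update user docs",
    cwd="my-project",
    run=lambda command, cwd, check: staged_commands.append(command),
)
print(staged_commands)
```

Other changed files stay in your working directory, uncommitted, until you stage them too. Anything that was already staged earlier would also be included in the commit.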
How is it done in parliament?
Example from Bill C-11 (Online Streaming Act)
[Link]
Comparing two simple files
Left:
1 # Rice with Onions and Miso
2 ## Ingredients:
3 * 1 cup of rice
4 * 1 medium onion, chopped
5 * 2 tbsp of miso paste
6 * 2 cups of water
7 * 1 tbsp of vegetable oil
8 * Salt and pepper to taste
9 * Optional greens for garnish
Right:
<h1>Rice with Onions and Miso</h1>
<h2>Ingredients for 2 servings:</h2>
<ul>
<li>1 cup of rice</li>
<li>1 medium onion, chopped</li>
<li>2 tablespoons of miso paste</li>
<li>2 cups of water</li>
<li>1 tablespoon of vegetable oil</li>
<li>Salt and pepper to taste</li>
</ul>