Git introduction

What is a version control system?

A version control system is a tool that keeps track of file changes, effectively creating different versions of our files, and keeps useful metadata about them. The complete history of commits (which are like snapshots) for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

History

Automated version control systems are nothing new. Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities.
More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

Git has become the de-facto standard when it comes to version control tools. It was created by Linus Torvalds during the development of the Linux operating system. It is not to be confused with GitHub, which is a commercial website hosting git repositories.

Fundamentals

The fundamental feature git offers is the ability to track the state of all the files in a folder, allowing you to freely move between all the saved snapshots.

Staging

From git’s perspective, a change goes through three stages: the working directory, the staging area and the commit. The working directory represents the current state of your files, the way you see them when you open them with any software. Files can then be moved to the staging area, which indicates they are ready to be committed. Finally, creating a commit takes a snapshot of the staging area and stores all the changes in a new commit that can be restored at any time down the line.

Tip

You can think of the staging area as a “snapshot in progress”. When you are satisfied with it, you finalise the snapshot, creating a permanent immutable commit.

Note

If you modify a file again after adding it to the staging area, the new changes won’t be included: the “snapshot in progress” was taken before them. If you want to commit them too, you need to add the file again, updating the “snapshot in progress” before the commit.
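The behaviour described in this note can be observed directly. The following is a minimal sketch in a throwaway repository; the file name notes.txt and its contents are made up for illustration, and `git show :notes.txt` is used to print the copy of the file held in the staging area.

```shell
# Work in a temporary folder so no real project is touched
repo="$(mktemp -d)" && cd "$repo"
git init -q

echo "first version" > notes.txt
git add notes.txt                   # the snapshot in progress holds "first version"
echo "second version" > notes.txt   # modify the file again, without re-adding it

# The staging area still holds the old content...
staged="$(git show :notes.txt)"
# ...while the working directory has the new one
working="$(cat notes.txt)"
echo "staged:  $staged"
echo "working: $working"

# Adding the file again updates the snapshot in progress
git add notes.txt
restaged="$(git show :notes.txt)"
echo "staged after second add: $restaged"
```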

Git branches

Branch early, branch often

Basic commands

Global configuration

It is a good idea to start by creating a global configuration for git. The minimum required information is the user’s name and email. They can be configured through the command line with the following commands:

# General syntax: git config --global <config-key> <config-value>
# Configure username
git config --global user.name "My name"
# Configure email
git config --global user.email "myemail@myemail.com"
# Configure the line endings
git config --global core.autocrlf input
# Configure the editor
git config --global core.editor "nano -w"
# List all the configurations
git config --list
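As a complement, a single value can be read back with `--get`, and a value set without `--global` applies only to the current repository, where it takes precedence over the global one. This is a small sketch in a throwaway repository; the address work@example.com is a placeholder.

```shell
# Work in a temporary repository so the real configuration is untouched
repo="$(mktemp -d)" && cd "$repo"
git init -q

# Without --global the value is stored in this repository's .git/config
git config user.email "work@example.com"

# Read back a single value
email="$(git config --get user.email)"
echo "$email"

# List every value together with the file it comes from
git config --list --show-origin
```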

Available editors

Editor	Configuration command
Atom	`git config --global core.editor "atom --wait"`
nano	`git config --global core.editor "nano -w"`
BBEdit (Mac, with command line tools)	`git config --global core.editor "bbedit -w"`
Sublime Text (Mac)	`git config --global core.editor "/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl -n -w"`
Sublime Text (Windows)	`git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"`
Notepad++ (Windows)	`git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"`
Kate (Linux)	`git config --global core.editor "kate"`
Gedit (Linux)	`git config --global core.editor "gedit --wait --new-window"`
Scratch (Linux)	`git config --global core.editor "scratch-text-editor"`
Emacs	`git config --global core.editor "emacs"`
Vim	`git config --global core.editor "vim"`
VS Code	`git config --global core.editor "code --wait"`

Initialise the repository

To be able to use git in a project, we first need to create a new folder and initialise the repository with the git init command.

# Create a new directory called "recipes" (linux)
mkdir recipes

# Move to that directory
cd recipes

# Initialise a git repository
git init
#> Initialized empty Git repository in /home/user/recipes/.git/

A hidden .git folder will be created. Inside it, git will store all the data it needs to track the state of the project (i.e. file changes).

# Trying to list the files in the folder won't show anything
ls
#>

# By adding the -a flag, the .git folder will appear
ls -a
#> . .. .git

Warning

Deinitialising a git repository is as simple as deleting the .git folder. The state of all other files will remain unchanged, but you will lose all your commits.

Git will also create a default branch, called main (or, historically, master). You can check the status of the repository with git status.

# Visualise the status of the repository
git status
#> On branch main
#>
#> No commits yet
#>
#> nothing to commit (create/copy files and use "git add" to track)

Tracking changes

Git notices any new file added to the folder and reports it as untracked until we explicitly add it.

# Create a new text file containing "New file"
echo "New file" > new_file.txt

# Check the status of the repository
git status
#> On branch main
#>
#> No commits yet
#>
#> Untracked files:
#>   (use "git add <file>..." to include in what will be committed)
#> 	new_file.txt
#>
#> nothing added to commit but untracked files present (use "git add" to track)

If we want to take a snapshot of this change, we first add it to the staging area and then create a new commit with a message describing our change.

# Add the file to the staging area
git add new_file.txt

# Check the status of the repository
git status
#> On branch main
#>
#> No commits yet
#>
#> Changes to be committed:
#>   (use "git rm --cached <file>..." to unstage)
#> 	new file:   new_file.txt
#>

# Create a commit (snapshot)
git commit -m "Added new file called new_file.txt"
#> [main (root-commit) ce52f8c] Added new file called new_file.txt
#>  1 file changed, 1 insertion(+)
#>  create mode 100644 new_file.txt

# Check the status of the repository
git status
#> On branch main
#> nothing to commit, working tree clean

Tip

To remove all files from the staging area (unstage) use the git reset command. It is possible to provide the name of the specific files you want to unstage: git reset new_file.txt
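A minimal sketch of unstaging, run in a throwaway repository (the user name and email are placeholders set only for that repository). It shows that `git reset` empties the staging area while leaving the edit in the working directory untouched.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

# Stage a further change...
echo "Another line" >> new_file.txt
git add new_file.txt
before="$(git diff --staged --name-only)"

# ...then take it out of the staging area again
git reset -q -- new_file.txt
after="$(git diff --staged --name-only)"

echo "staged before reset: '$before'"
echo "staged after reset:  '$after'"
# The edit itself is still in the working directory
cat new_file.txt
```

Recent versions of git also offer `git restore --staged <file>` for the same purpose, which is the command suggested by `git status` once the repository has at least one commit.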

If the file is modified further (let’s say we add some lines to it), the git status will reflect that.

# Add a line to new_file.txt
echo "Another line" >> new_file.txt

# Check the status of the repository
git status
#> On branch main
#> Changes not staged for commit:
#>   (use "git add <file>..." to update what will be committed)
#>   (use "git restore <file>..." to discard changes in working directory)
#> 	modified:   new_file.txt
#>
#> no changes added to commit (use "git add" and/or "git commit -a")

# Trying to commit with an empty staging area will raise an error
git commit -m "I forgot to 'git add new_file.txt'"
#> On branch main
#> Changes not staged for commit:
#>   (use "git add <file>..." to update what will be committed)
#>   (use "git restore <file>..." to discard changes in working directory)
#> 	modified:   new_file.txt
#>
#> no changes added to commit (use "git add" and/or "git commit -a")

# Instead, first stage all the changes you want to commit
git add new_file.txt

# And then commit them
git commit -m "Added a line to new_file.txt"
#> [main 3a25799] Added a line to new_file.txt
#>  1 file changed, 1 insertion(+)

Tip

Git does not keep track of empty folders. If you want to override this behaviour, just add an empty file to the folder. The convention is to name the empty file .gitkeep.

# Create an empty .gitkeep file
touch .gitkeep
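The effect of the placeholder file can be checked with `git status`. In this sketch the folder name results is hypothetical, and `--porcelain` is used only to get a compact, script-friendly output.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q

mkdir results
# An empty folder is invisible to git: the status is empty
empty_status="$(git status --porcelain)"
echo "status with empty folder: '$empty_status'"

touch results/.gitkeep
# With a placeholder file inside, the folder shows up as untracked
kept_status="$(git status --porcelain)"
echo "status with .gitkeep:     '$kept_status'"
```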

Not tracking changes

There are some files you may not want to track with a version control system. These typically include binary blobs, cache files, large videos or images and logs. To let git know it should ignore these files, create a .gitignore file and specify everything that should be ignored.

# .gitignore file

# Ignore the "executable.exe" file
executable.exe
# Ignore files with the extensions ".mp3", ".mp4" and ".avi"
*.mp3
*.mp4
*.avi
# Ignore the folder "cache"
cache/
# Ignore all ".png" files under the "docs" folder
docs/**/*.png
# Always track ".gitkeep", even if it matches a previous rule
# (this cannot re-include files inside an ignored folder such as "cache/")
!.gitkeep
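To verify which rule, if any, applies to a given path, the `git check-ignore` command can help. The sketch below writes a shortened version of the .gitignore above into a throwaway repository; the paths song.mp3, docs/img/logo.png and notes.txt are made up and do not need to exist.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
printf '%s\n' '*.mp3' 'cache/' 'docs/**/*.png' > .gitignore

# -v prints the file, line number and pattern responsible for each match
git check-ignore -v song.mp3 docs/img/logo.png

# With -q nothing is printed: the exit status is 0 when the path
# is ignored and 1 when it is not
if git check-ignore -q notes.txt; then
    notes_ignored=yes
else
    notes_ignored=no
fi
echo "notes.txt ignored: $notes_ignored"
```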

Repository history

The history of the repository, with all its commits, is available through the git log command.

# Show the history of the repository
git log
#> commit 3a2579938cc2d2934e24d971f9738894a60c18ad (HEAD -> main)
#> Author: Ernesto Casablanca <myemail@myemail.com>
#> Date:   Tue Oct 22 11:29:05 2024 +0100
#>
#>     Added a line to new_file.txt
#>
#> commit ce52f8cef8a35df3bc1e848a579668de263d5a6a
#> Author: Ernesto Casablanca <myemail@myemail.com>
#> Date:   Tue Oct 22 11:19:02 2024 +0100
#>
#>     Added new file called new_file.txt

# Show the history in a more compact way
git log --oneline
#> 3a25799 (HEAD -> main) Added a line to new_file.txt
#> ce52f8c Added new file called new_file.txt

# Show the history keeping track of branches
git log --oneline --graph
#> * 3a25799 (HEAD -> main) Added a line to new_file.txt
#> * ce52f8c Added new file called new_file.txt

# Show exhaustive but compact information about the history of the repository
git log --all --decorate --oneline --graph --date=relative --pretty=tformat:'%C(auto)%h%Creset -%C(auto)%d%Creset %s %Cgreen(%an %ad)%Creset'
#> * 3a25799 - (HEAD -> main) Added a line to new_file.txt (Ernesto Casablanca 28 minutes ago)
#> * ce52f8c - Added new file called new_file.txt (Ernesto Casablanca 38 minutes ago)
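Long invocations like the last one are tedious to retype. One possible approach is to store them as a git alias. In this sketch the alias name lg is an arbitrary choice, the alias holds a shortened version of the command, and it is set only for a throwaway repository; adding `--global` would make it available everywhere.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Added new file called new_file.txt"

# Define the alias ("lg" is a made-up name)
git config alias.lg "log --all --decorate --oneline --graph"

# From now on "git lg" expands to the full command
graph="$(git lg)"
echo "$graph"
```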

Compare differences

With the git diff command, git can produce a compact and informative comparison between the current state of the working directory and the staging area (which, when nothing is staged, matches the latest commit).

# Add another line to new_file.txt
echo "And another one" >> new_file.txt

# Show the differences
git diff
#> diff --git a/new_file.txt b/new_file.txt
#> index 20a8ff1..7785ace 100644
#> --- a/new_file.txt
#> +++ b/new_file.txt
#> @@ -1,2 +1,3 @@
#>  New file
#>  Another line
#> +And another one

# We can also be selective with the files we want to see the diff of
git diff new_file.txt
#> diff --git a/new_file.txt b/new_file.txt
#> index 20a8ff1..7785ace 100644
#> --- a/new_file.txt
#> +++ b/new_file.txt
#> @@ -1,2 +1,3 @@
#>  New file
#>  Another line
#> +And another one

# Staged files won't appear in the diff anymore
git add new_file.txt
git diff
#>

# To see the differences between the staging area and the latest commit,
# add the --staged flag
git diff --staged
#> diff --git a/new_file.txt b/new_file.txt
#> index 20a8ff1..7785ace 100644
#> --- a/new_file.txt
#> +++ b/new_file.txt
#> @@ -1,2 +1,3 @@
#>  New file
#>  Another line
#> +And another one

# Committing the changes will update the reference for future diffs
git commit -m "Added yet another line to new_file.txt"
#> [main 1b7d1e4] Added yet another line to new_file.txt
#>  1 file changed, 1 insertion(+)

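The three comparisons git diff can make are easier to tell apart when one change is staged and another is not. This is a sketch in a throwaway repository; the lines alpha and beta are placeholders, and `git diff HEAD` (working directory against the latest commit) is shown in addition to the two forms used above.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

echo "alpha" >> new_file.txt
git add new_file.txt           # "alpha" is staged
echo "beta" >> new_file.txt    # "beta" is not

# Working directory vs staging area: only "beta" is an addition
unstaged="$(git diff)"
# Staging area vs latest commit: only "alpha" is an addition
staged="$(git diff --staged)"
# Working directory vs latest commit: both lines are additions
both="$(git diff HEAD)"
echo "$both"
```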

Working with a remote

One of git’s most appreciated features is its ability to synchronise the local repository with a remote git server. You could host your own git server, or use a public one, such as GitHub or GitLab.

Adding a remote

To let the local repository know of the remote one, we use the git remote add command. The URL we use should match the one provided by the remote git server.

# Add a new remote called "origin" at the provided url
git remote add origin git@github.com:TendTo/my-repo.git

# Check the remote was correctly added
git remote -v
#> origin git@github.com:TendTo/my-repo.git (fetch)
#> origin git@github.com:TendTo/my-repo.git (push)

Remote authentication

Most services will require some form of authentication before allowing you to push any changes to the remote git repository. Using GitHub as a reference, the two available protocols are HTTPS and SSH. SSH is usually preferable: it requires some additional configuration, but it is a security protocol widely used by many applications. HTTPS does not avoid that configuration either, since GitHub does not permit HTTPS access to repositories using your account login and password, and instead requires the creation of a personal access token (PAT).

Generating ssh keys

To generate a pair of ssh keys, use the ssh-keygen command. You will be asked for a path where to save the keys. By default they will be saved under the .ssh folder in your home directory. If you choose the default path, ssh (and therefore git) will be able to find the key without any further indication. You will also be given the option to set a passphrase that you will need to input every time you use the key. Press enter without writing anything to avoid setting one.

# Generate a new key pair.
# ed25519 is the key type (a public-key signature algorithm).
# You can put a comment, usually your email, to identify the key
ssh-keygen -t ed25519 -C "myemail@myemail.com"
#> Generating public/private ed25519 key pair.
#> Enter file in which to save the key (/home/user/.ssh/id_ed25519):
#> Enter passphrase (empty for no passphrase):
#> Enter same passphrase again:
#> Your identification has been saved in /home/user/.ssh/id_ed25519
#> Your public key has been saved in /home/user/.ssh/id_ed25519.pub
#> The key fingerprint is:
#> SHA256:1DYSxuTTogM8nrNkWeXXbpd9MAe6fiUDd6vlB0y0gHg myemail@myemail.com
#> The key's randomart image is:
#> +--[ED25519 256]--+
#> |       o=. .. o  |
#> |   .   =o+E. + o |
#> |    + . B.* + * o|
#> |   . * o * o * B.|
#> |    B o S   + Bo+|
#> |   o o .   o .+=.|
#> |    .       .....|
#> |             .  .|
#> |                 |
#> +----[SHA256]-----+
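The same key pair can also be generated without any prompt, which may be convenient in scripts. The sketch below writes a demonstration key into a temporary folder rather than ~/.ssh; the file name demo_key is made up, and the empty passphrase is chosen only so the example runs unattended.

```shell
keydir="$(mktemp -d)"

# -f sets the output path, -N "" sets an empty passphrase,
# -q suppresses the informational output
ssh-keygen -q -t ed25519 -C "myemail@myemail.com" -f "$keydir/demo_key" -N ""

# Two files are produced: the private key and the public key (.pub)
ls "$keydir"

# Print the fingerprint of the public key
ssh-keygen -l -f "$keydir/demo_key.pub"
```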

Publishing the ssh keys

Now we need to publish our public key on the remote server. Read the public key (the file with the .pub extension) and paste the whole output on your remote of choice.

# Read the public key
cat ~/.ssh/id_ed25519.pub
#> ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDmRA3d51X0uu9wXek559gfn6UFNF69yZjChyBIU2qKI myemail@myemail.com

For instance, on GitHub, click on your profile icon in the top right corner to get the drop-down menu. Click Settings, then on the settings page, click SSH and GPG keys in the Account settings menu on the left side. Click the New SSH key button on the right side. Now, you can add a title to help you remember the key, paste your public key into the field, and click the Add SSH key button to complete the setup.

Pushing changes
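
As a rough sketch of what pushing looks like, the example below uses `git push` with the `-u` flag, which records the remote branch as the upstream of the local one. The remote here is a bare repository in a temporary folder standing in for a hosted server (an assumption made so that the example runs offline); with GitHub, the remote would be the URL provided by the service.

```shell
# Stand-in for a hosted server: a bare repository in a temporary folder
remote="$(mktemp -d)/my-repo.git"
git init -q --bare "$remote"

repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called "main"
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

git remote add origin "$remote"
# -u sets origin/main as the upstream of main,
# so that later a plain "git push" is enough
git push -q -u origin main

local_head="$(git rev-parse main)"
remote_head="$(git rev-parse origin/main)"
echo "local:  $local_head"
echo "remote: $remote_head"
```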

Git workflows

Branches

It is generally a good practice to keep the main work safe from experimental changes we are working on. To do this we can use branches to work on separate tasks in parallel without changing our current branch, main.

We can list all branches and create new ones with the git branch command.

# List all branches
git branch
#> * main

# Create a new branch called experiment
git branch experiment

# List all branches
# The '*' indicates the active branch
git branch
#>   experiment
#> * main

Creating a branch just adds a new pointer to the last commit. To ensure that future commits are appended to the new branch, we need to check it out.

# Switch to the experiment branch
git checkout experiment
#> Switched to branch 'experiment'

# We can double check that this is the case with git log
# HEAD points to the active branch
git log --oneline
#> 1b7d1e4 (HEAD -> experiment, main) Added yet another line to new_file.txt
#> 3a25799 Added a line to new_file.txt
#> ce52f8c Added new file called new_file.txt
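That a branch is only a pointer can be checked with `git rev-parse`, which prints the commit a name refers to. The following is a sketch in a throwaway repository with placeholder file contents: right after creation both branches resolve to the same commit, and only the checked out branch moves when a new commit is made.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called "main"
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

# A new branch points to the same commit as the current one
git branch experiment
main_commit="$(git rev-parse main)"
experiment_commit="$(git rev-parse experiment)"
echo "main:       $main_commit"
echo "experiment: $experiment_commit"

# After checking it out, new commits move only the experiment pointer
git checkout -q experiment
echo "Experimental change" >> new_file.txt
git commit -q -a -m "Experimental change"
main_after="$(git rev-parse main)"
experiment_after="$(git rev-parse experiment)"
echo "main:       $main_after"
echo "experiment: $experiment_after"
```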

We can create and switch to a new branch in a single step with the checkout command or with the newer switch command.

# Create and check out the new branch 'new'
git checkout -b new
#> Switched to a new branch 'new'

# or, equivalently

# Create and switch to the new branch 'new'
git switch -c new
#> Switched to a new branch 'new'

We can delete a branch we don’t need anymore by using the -d flag in the branch command. We cannot delete a currently checked out branch.

# Delete the branch "experiment"
git branch -d experiment
#> Deleted branch experiment (was 1b7d1e4).

Warning

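The -d flag refuses to delete a branch that is not fully merged. Forcing the deletion with -D goes ahead anyway, and all commits reachable only from that branch will be lost, unless there is another branch keeping track of them.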
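The difference between a safe and a forced deletion can be seen in this sketch, run in a throwaway repository with placeholder content. The experiment branch holds a commit that main does not contain, so `-d` declines to delete it while `-D` does not.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called "main"
git config user.name "Example User"
git config user.email "user@example.com"
echo "New file" > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

# Create a commit that exists only on "experiment"
git checkout -q -b experiment
echo "Experimental change" >> new_file.txt
git commit -q -a -m "Experimental change"
git checkout -q main

# -d refuses: "experiment" is not fully merged
if git branch -d experiment; then
    safe_delete=done
else
    safe_delete=refused
fi
echo "git branch -d: $safe_delete"

# -D deletes the branch regardless
git branch -q -D experiment
remaining="$(git branch --list experiment)"
echo "branches named experiment left: '$remaining'"
```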

We can make some changes and commit them on the new branch.

# Make some changes
echo "Branch new" > branch_file.txt
# Add and commit them
git add branch_file.txt
git commit -m "Add branch_file.txt"
#> [new 091a043] Add branch_file.txt
#>  1 file changed, 1 insertion(+)
#>  create mode 100644 branch_file.txt

When we are satisfied with the changes, we can merge them back into the main branch.

# Checkout back to the main branch
git checkout main # or git switch main
#> Switched to branch 'main'

# Merge the changes from the "new" branch
git merge new
#> Updating 1b7d1e4..091a043
#> Fast-forward
#>  branch_file.txt | 1 +
#>  1 file changed, 1 insertion(+)
#>  create mode 100644 branch_file.txt

Merge conflicts

Git employs a lot of heuristics to automatically handle merges between different branches in a non-disruptive way. That being said, there are some cases where there is no obvious solution. In those cases git will trust you to decide which changes should take priority over the others.

# Switch to the new branch "left" and make some changes
git checkout -b left
echo "left" >> branch_file.txt
git add branch_file.txt
git commit -m "Left changes"

# Go back to the main branch
git checkout main

# Repeat the same procedure for the "right" branch
git checkout -b right
echo "right" >> branch_file.txt
git add branch_file.txt
git commit -m "Right changes"

# Go back to the main branch
git checkout main

# Show the state of the repository
git log --oneline --graph --all
#> * c5c0519 (right) Right changes
#> | * 9ba1fdc (left) Left changes
#> |/
#> * 091a043 (HEAD -> main, new) Add branch_file.txt
#> * 1b7d1e4 Added yet another line to new_file.txt
#> * 3a25799 Added a line to new_file.txt
#> * ce52f8c Added new file called new_file.txt

Trying to merge both branches into main will raise a merge conflict.

# Merge the left branch into main; no problems
git checkout main
git merge left
#> Updating 091a043..9ba1fdc
#> Fast-forward
#>  branch_file.txt | 1 +
#>  1 file changed, 1 insertion(+)


# Merge the right branch; merge conflict
git merge right
#> Auto-merging branch_file.txt
#> CONFLICT (content): Merge conflict in branch_file.txt
#> Automatic merge failed; fix conflicts and then commit the result.

In other words, git does not know how to handle the divergence of the file automatically, hence it needs your guidance. If you open the branch_file.txt, you will notice git has modified it.

# Read "branch_file.txt"
cat branch_file.txt
#> Branch new
#> <<<<<<< HEAD
#> left
#> =======
#> right
#> >>>>>>> right

The section between <<<<<<< HEAD and ======= is the content of the file in the current branch, while the section between ======= and >>>>>>> right is what the branch we are merging would change it to. It is up to us to edit the file and solve the conflict according to our needs. When we are satisfied, we should delete the marker lines added by git, stage the file and commit the result.

# Commit the solved merge
git add branch_file.txt
git commit -m "Merge branch 'right'"
#> [main d9a26e8] Merge branch 'right'

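The whole procedure can be reproduced from scratch. This is a sketch in a throwaway repository that mirrors the left and right branches above; the resolution chosen here (keeping both lines) is just one possibility, and `git diff --name-only --diff-filter=U` is used to list the files that still have conflicts.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called "main"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Branch new" > branch_file.txt
git add branch_file.txt
git commit -q -m "Add branch_file.txt"

# Two branches append a different line to the same file
git checkout -q -b left
echo "left" >> branch_file.txt
git commit -q -a -m "Left changes"
git checkout -q main
git checkout -q -b right
echo "right" >> branch_file.txt
git commit -q -a -m "Right changes"
git checkout -q main

git merge -q left                              # fast-forward, no conflict
git merge right || echo "merge stopped: conflict"

# List the files with unresolved conflicts
conflicted="$(git diff --name-only --diff-filter=U)"
echo "conflicted: $conflicted"

# Resolve by keeping both lines, then stage and commit
printf '%s\n' "Branch new" "left" "right" > branch_file.txt
git add branch_file.txt
git commit -q -m "Merge branch 'right'"

# A merge commit has two parents
parents="$(git log -1 --pretty=%P | wc -w | tr -d ' ')"
echo "parents of the merge commit: $parents"
```

If the conflict turns out to be harder than expected, `git merge --abort` returns the repository to the state it was in before the merge started.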