Using git

Initial setup

Configure your name and email using the following commands to ensure that your commits are attributed to you properly.

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

To use gedit as the editor, run:

git config --global core.editor "gedit --wait --new-window"

Cloning a repository

git clone <url> [<directory>]

When cloning from GitHub, the URL can be either HTTPS or SSH (for SSH, public-key authentication has to be set up on GitHub):

https://github.com/user/repo.git
git@github.com:user/repo.git

Introduction to commits

A commit in git is an elementary unit of work in a project. It represents a snapshot of all the tracked project files. Among the information recorded in the commit are: a reference to the previous commit (called the parent commit), from which the set of changes can be derived, the date and time, the author details and the commit message. Each commit is identified by a cryptographic (SHA-1) hash of all the above information.

Commit workflow

1. Make some changes to the working tree (the directory containing the checked out files), then review them using the git GUI:

git gui

If missing, install with sudo apt install git-gui. Another recommended tool for viewing commits is gitg.

2. Before committing, changes have to be added to the staging area (also known as the cache or index). To do this from the git GUI, select files in the upper left pane (which shows unstaged changes) and press Ctrl-T. You can also add the changes on a per-line or per-hunk basis by right-clicking in the right (preview) pane of git-gui and selecting the appropriate option.

Alternatively, to stage changes from the CLI, use git add <file with changes>. The list of files with unstaged changes can be viewed with git status. To see a diff listing of unstaged changes, run git diff (to see the staged changes, add the parameter --cached). If you want to stage all changes in tracked files (untracked files have to be explicitly git add-ed), call git add -u.

3. After you have staged all the changes that you want to go into the commit (keep in mind that every commit should be a sensible and self-contained set of changes), invoke git commit. The editor you have configured will open and ask you to enter the commit message. If you leave it empty, the commit will be aborted (the changes will remain in the staging area).
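The three steps above can be sketched from the CLI as follows. This is only an illustration: the scratch repository, the file name notes.txt and the commit messages are made up, and -m is used so that no editor opens.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical example data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "first line" > notes.txt
git status --short                     # notes.txt is listed as untracked
git add notes.txt                      # untracked files must be added explicitly
git --no-pager diff --cached --stat    # review what is staged
git commit -q -m "Add notes file"

echo "second line" >> notes.txt
git --no-pager diff                    # unstaged change in a tracked file
git add -u                             # stage all changes in tracked files
git commit -q -m "Extend notes file"

commit_count=$(git rev-list --count HEAD)
git --no-pager log --oneline
```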

Clearing changes

The staging area can be cleared with git reset – this will unstage the changes, but they will remain in the working tree. Changes in the working tree can be discarded by calling git reset --hard or saved for later with git stash (use git stash pop to recover them).
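A minimal sketch of these commands in a throwaway repository (the file name and contents are hypothetical). The variables only capture the state after each step so that it can be inspected.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "v2" >> file.txt
git add file.txt
git reset -q                              # unstage; the edit stays in the working tree
staged_after_reset=$(git diff --cached --name-only)
unstaged_after_reset=$(git diff --name-only)

git stash -q                              # put the edit aside for later
content_while_stashed=$(cat file.txt)
git stash pop -q                          # bring the edit back

git reset -q --hard                       # discard the edit for good
content_after_hard=$(cat file.txt)
```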

Amending a commit

A simple way of changing the last commit is amending it. If you have just made a commit and still haven't pushed it to a public/shared repository (more on this later in the subsection about rewriting history), additional changes can be added into that commit. These changes should first be made in the working tree and then staged as described above. Then, the following command should be called:

git commit --amend

…which will prompt git to open the message of the last commit for editing. After editing is finished, the commit will be regenerated in order to incorporate the newly staged changes along with the ones which were previously included in the commit.
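As an illustration, the sketch below adds a forgotten file to the last commit in a scratch repository. The file names are hypothetical, and --no-edit is used to keep the existing message so that no editor opens; leave it out to edit the message as described above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "one" > a.txt
git add a.txt
git commit -q -m "Add text files"
before=$(git rev-parse HEAD)

echo "two" > b.txt                    # the file that was forgotten
git add b.txt
git commit -q --amend --no-edit       # fold the staged change into the last commit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)    # still a single commit
files=$(git ls-tree --name-only HEAD) # now contains both files
```

Note that the hash changes: the amended commit is a new commit that replaces the old one.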

Commit message style

When writing commit messages, it is preferred to follow some guidelines regarding language and styling, which are in use in several large git projects – such as git itself, or the Linux kernel. For example:

  • the message is split into two parts: a title (ideally around 50 characters) and body (possibly several paragraphs)
  • the first word in the title should be a verb in the present simple/imperative tense, such as add, fix, implement, etc.
  • the body should be manually wrapped to 72 characters per line
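A message following these guidelines could look like the one in the sketch below. The change itself (settings.conf) is invented for the example, and the message is passed on standard input with -F - only to keep the sketch non-interactive.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "timeout=30" > settings.conf
git add settings.conf

# Title in the imperative, blank line, body wrapped at 72 characters.
git commit -q -F - <<'EOF'
Add default connection timeout

Set the timeout to 30 seconds so that stalled connections are
dropped instead of blocking the worker indefinitely.
EOF

title=$(git log -1 --format=%s)
```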

Working with branches

In git, as in most VCS (version control system) software, the concept of branches is used to represent different versions of code, usually taking different development directions. By convention, there is usually a master branch which represents the main version, ideally in a state ready for release.

Branches in git can be visualized as pointers to commits. The commits themselves form a graph, where every commit (except the initial one) depends on a parent commit.

Switching between branches can be done with the command

git checkout <branch name>

This means that a jump is performed to the tip of the requested branch – where “jumping” can be more precisely described as moving a special pointer, associated with the tip of the currently checked out branch, called HEAD.

New branches (pointing to the same place as the current HEAD) can be created by calling

git branch <new branch name>

A shorthand for creating a new branch and immediately checking it out is

git checkout -b <new branch name>

Existing branches can be, after they have been merged into another branch, deleted with the command git branch -d merged-branch. If the branch in question has not been merged, capitalize the option (-D). Git requires this in order to warn you that, after this operation, commits from the unmerged branch will be left lingering until the git garbage collection deletes them.
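The branch commands above can be tried out in a scratch repository, as sketched here. The branch names feature and experiment are placeholders, and git symbolic-ref is used only to make sure the first branch is called master regardless of the local git configuration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "base" > file.txt
git add file.txt
git commit -q -m "Add file"

git branch feature               # new branch at the current HEAD, master stays checked out
git checkout -q -b experiment    # create another branch and switch to it
echo "try" >> file.txt
git commit -q -am "Try an experiment"

git checkout -q master
git branch -d feature            # allowed: feature has nothing that master lacks
git branch -D experiment         # unmerged work, so the capital -D is needed

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
```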

Reset - manually moving the tip

Naturally, when a new commit is made, the tip of the currently checked out branch (i.e. the HEAD pointer) is advanced one step in order to point to the new commit. If one needs to move the tip somewhere else manually, this can be done with git reset <new tip reference>. The reference to a commit which will become the new tip of the branch can be specified in multiple ways:

  • the beginning of the commit's SHA hash (a 40-character hexadecimal unique identifier) - at least 4 characters from the beginning
  • name of a branch
  • the special HEAD pointer, usually in combination with a parent modifier - see the next paragraph

All commit references can be modified by adding the ^ or ~ modifier characters to the end of the reference in order to refer to a parent commit. A single ^ character means “the parent commit”, two (^^) mean “the parent of the parent”, etc. The ~ means the same as ^, but can be followed by a number, which indicates how many parent commits earlier in the tree the reference points to. A common use of this is git reset HEAD^ to discard the last commit. (If the commit remains a dead end and doesn't end up as a part of some branch, it will linger around in the repository for a few weeks until git runs garbage collection. The same applies when amending and rebasing commits - old versions of the commits will still be available for some time, by their hash identifiers.)
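The following sketch shows that the two modifiers name the same commit, and what git reset HEAD^ does. The four commits are made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

for n in 1 2 3 4; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

via_caret=$(git rev-parse HEAD^^)   # parent of the parent
via_tilde=$(git rev-parse HEAD~2)   # two parents back: the same commit

git reset -q HEAD^                  # move the tip back by one commit
subject=$(git log -1 --format=%s)   # the tip is now "Commit 3"
leftover=$(git diff --name-only)    # the content of "Commit 4" is still in the working tree
```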

Soft vs. hard reset

Note that, by default, git reset only moves the tip of the branch, i.e. the HEAD pointer, and resets the staging area; it does not change files in the working tree. This means that after the reset, the working tree will still be in the same state as before and git status will now show the difference between the working tree and the new HEAD as unstaged changes. This default is known as a mixed reset (--mixed). A soft reset (--soft) leaves the staging area untouched as well, so the difference shows up as staged changes instead. On the other hand, a hard reset (--hard) will also check out the working tree, so that it matches the state of the new HEAD commit.

As mentioned before, git reset --hard (without a commit reference) can be used to discard all uncommitted changes to tracked files, both staged and unstaged.
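To compare the reset modes side by side, the sketch below resets the same two-commit scratch repository three times and records where the difference ends up each time. The file name and contents are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -am "Second"
tip=$(git rev-parse HEAD)

git reset -q --soft HEAD^          # soft: only the tip moves
soft_staged=$(git diff --cached --name-only)     # the change is still staged
git reset -q --soft "$tip"         # back to the starting point

git reset -q HEAD^                 # default (mixed): tip and staging area
mixed_staged=$(git diff --cached --name-only)    # nothing staged
mixed_unstaged=$(git diff --name-only)           # the change is unstaged
git reset -q --hard "$tip"         # back to the starting point

git reset -q --hard HEAD^          # hard: tip, staging area and working tree
hard_content=$(cat file.txt)       # the file is back at its first version
```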

Rewriting history

An important thing to consider when rewriting history is that a git repository, by definition, is a shared resource, designed to be used in a collaborative setting. This means that one should not needlessly inconvenience others by rewriting common history without prior agreement, because this will cause all kinds of conflicts while using git.

With that in mind, rewriting history can be a very powerful tool. Commit history can be cleaned up and polished by reordering commits, squashing them together, editing their content or rewording their commit messages. Keep in mind that this means that new commits are generated from the old ones. Even if we want to edit only one commit, all its child commits will have to be regenerated. This is because the content of a commit depends on its parent - if the parent is changed, the descendant commit needs to be regenerated too, because it depends on the hash identifier of the parent commit.

Interactive rebase

The interactive rebase is a powerful command which can perform all the above actions. It is invoked in the following way:

git rebase -i <commit reference>

The commit reference is usually formed as a parent commit of the HEAD commit. E.g. if one wants to edit the last 10 commits, the commit reference passed to the interactive rebase command will be HEAD~10. Git will then open a “command list” text file with the following contents:

pick <id of the commit HEAD~9>
pick <id of the commit HEAD~8>
...
pick <id of the commit HEAD~1>
pick <id of the commit HEAD>

After the file is closed, git will check out the commit HEAD~10. This means that HEAD becomes HEAD~10. During the rebase operation, the previous HEAD is saved in another special pointer called ORIG_HEAD. Git will now follow the command list and apply (“cherry-pick”) all the individual commits from ORIG_HEAD~9 up to ORIG_HEAD (in git's range notation, ORIG_HEAD~10..ORIG_HEAD). If the command list has not been modified in any way, git-rebase will effectively do nothing: it will figure out that no commits actually need to be regenerated, and HEAD will again point to the same commit as before. However, besides pick, we have the following commands at our disposal (either the full form or the one-letter abbreviation can be used):

  • p, pick = use commit
  • r, reword = use commit, but edit the commit message
  • e, edit = use commit, but stop for amending
  • s, squash = use commit, but meld into previous commit
  • f, fixup = like “squash”, but discard this commit's log message
  • x, exec = run command (the rest of the line) using shell
  • d, drop = remove commit

Git will always print this list as a reminder in the comments at the end of the rebase commands file. The commits can also be reordered from here by simply changing the order in the rebase commands file.
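Normally you edit the command list by hand. For a reproducible demonstration, the sketch below lets a sed command do the editing through the GIT_SEQUENCE_EDITOR environment variable: it turns the second pick into fixup, which melds the last commit into the one before it. The three commits are invented, and GNU sed is assumed (the -i option behaves differently in other sed implementations).

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

for n in 1 2 3; do
  echo "line $n" >> file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

# The command list for HEAD~2 has two lines: "Commit 2" and "Commit 3".
# sed rewrites line 2 from "pick" to "fixup" before git reads the list.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/fixup/'" git rebase -q -i HEAD~2

count=$(git rev-list --count HEAD)    # one commit fewer than before
subject=$(git log -1 --format=%s)     # fixup keeps the message of "Commit 2"
git --no-pager log --oneline
```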

Editing a commit

Editing a commit is done by changing the pick command in its line to an edit command. When git-rebase reaches that line, it will apply that commit (i.e. HEAD will then be pointing to that commit, as if it has just been committed). Next, the rebase is paused and the user is returned to the terminal. The commit can then be edited by amending it as described above (make the changes, stage them, amend the commit). You can also insert another commit at this point by simply committing it as described in the subsection about making commits. Afterwards, to continue the rebase, run git rebase --continue.

Helpful note: the commit amending step can actually be skipped. It is enough to stage the changes; running git rebase --continue will then trigger an amend.

Resolving conflicts

Resolving conflicts is usually tricky and has no universal recipe. In general, rebasing will stop when git cannot apply a commit's patch. The files with conflicts can be viewed using git status. After resolving them manually, the conflict resolution needs to be marked by git add-ing the files with conflicts. After that, rebasing is continued with git rebase --continue.

Git GUI (git gui) can also be helpful when resolving conflicts, with the ability to select a particular version of the file by right-clicking in the preview pane. Keep in mind that Git GUI calls the so far rebased commits “local”, while the incoming commit being currently picked is the “remote” version.

When a rebase goes wrong

When you feel that the current rebase is going the wrong way, you can always abort it by running git rebase --abort instead of continuing, and the old tip of the branch will be restored (it was saved in ORIG_HEAD when the rebase was started).

If you wish to undo a bad rebase which has already been completed, run git reflog to see a history of the HEAD pointer. Find the commit it was pointing to when the rebase was started (the old tip of the branch) and perform a git reset as described in the earlier subsection.

A shortcut for the history of the HEAD pointer is HEAD@{n}, where n is a number of steps back in history. Every time an action is made – such as making a commit, rebasing, amending, checking out different branches – an entry in the history of the HEAD pointer is made.

Although rebasing can be a destructive operation in the long term, you don't need to fear that running a rebase will immediately and irreparably break something. You are always one git reset away from undoing everything.
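The recovery procedure can be sketched as follows. To keep the example short, the “accident” is simulated with a hard reset that throws away two commits rather than with a real rebase; the recovery through the reflog is the same. All commits are invented.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email johndoe@example.com

for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done
good_tip=$(git rev-parse HEAD)

git reset -q --hard HEAD~2         # simulated accident: two commits are gone
git --no-pager reflog -3           # HEAD@{0} is the reset, HEAD@{1} the old tip

git reset -q --hard 'HEAD@{1}'     # point the branch back at the old tip
restored=$(git rev-parse HEAD)
```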

Rebasing a branch

Besides the interactive rebase, which is used for tidying up the commit history, another important use case of git rebase is keeping a private branch up to date with upstream changes.

Imagine that you are developing a new feature in your private experiment branch. You made a commit C4 at the time when C2 had been the most recent commit in the master branch – the so-called base of the experiment branch. However, new activity happens in the meantime on the master branch and someone produces the C3 commit. Your branch is now behind the master branch by one commit, and simultaneously ahead by one commit. Integrating your feature as it is by merging it in the master branch would yield a non-linear history of the master branch. You want to update the base of the experiment branch to be the new tip of the master branch, C3 – the operation is called rebasing.

The operation is performed by executing the following command (experiment has to be already checked out):

git rebase master

This will generate a new commit, C4', which will have C3 as the parent commit, and this will be the new tip of the experiment branch. The experiment branch can now neatly be merged into the master branch - more on this in the subsection on merging.
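The scenario with C2, C3 and C4 can be reproduced in a scratch repository, as in this sketch. The helper commit_file and the file names are hypothetical; the commit messages follow the names used above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email johndoe@example.com

# Hypothetical helper: commit_file <file> <message>
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "C1"
commit_file base.txt "C2"
git checkout -q -b experiment
commit_file feature.txt "C4"       # experiment is based on C2
git checkout -q master
commit_file other.txt "C3"         # master moves on in the meantime

git checkout -q experiment
git rebase -q master               # regenerate C4 on top of C3

base=$(git merge-base master experiment)
master_tip=$(git rev-parse master)
git --no-pager log --oneline
```

After the rebase, the base of experiment is the tip of master, so the history reads C1, C2, C3, C4'.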

Cherry-picking

If, for some reason, you want to apply the changes from one or more other commits, no matter how they are related to the current HEAD, this can be done with the cherry-pick command:

git cherry-pick <commit reference>

If applying more commits, the commit reference can be given a series of n commits, specified in the following format: commit0..commitn.

Keep in mind that commit0 is not included among the commits to be applied. A common trick is to specify the sequence in the following manner: commit1^..commitn. This works because the ^ modifier changes the first commit reference to be the parent commit of commit1.

Should conflicts arise, they are resolved the same way as when rebasing: manually resolve, mark the resolution with git add and run git cherry-pick --continue.
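The commit1^..commitn trick can be sketched like this. The topic branch and its three commits are invented; only the last two of them are applied to master.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email johndoe@example.com

# Hypothetical helper: commit_file <file> <message>
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "Add base"
git checkout -q -b topic
commit_file a.txt "Add a"
commit_file b.txt "Add b"
commit_file c.txt "Add c"
git checkout -q master

# topic~1 is "Add b". The ^ moves the excluded start of the range to its
# parent, so that "Add b" itself is included along with "Add c".
git cherry-pick topic~1^..topic > /dev/null

count=$(git rev-list --count HEAD)
```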

Merging

When work on a feature branch is finished, it is time to merge that work into a public main branch. The commits in the feature branch should by then preferably be tidied up, polished and rebased on top of the master branch. This enables a so-called “fast-forward” merge, which results in a linear history of the main branch – merging effectively becomes moving the tip of the master branch to the tip of the feature branch.

Suppose that new commits have been made in the master branch since the feature branch was created. The following sequence of commands describes the above workflow of integrating the feature branch which results in a linear history:

git checkout feature
git rebase master     # include new commits from master

git checkout master
git merge feature     # merge feature into master

If the development of the feature branch happens during a longer time period, it might be a good idea to perform an occasional rebase to integrate the changes that are happening on the master branch. However, if this is not strictly your private branch, this needs to be done in agreement with all the team members who are working on the branch, since you will need to do a force push to the remote (more on this later), and everyone will have to do a hard reset/rebase of their local copies of the branch.

No-fast-forward merge

When integrating larger features which have been developed in many commits, one might want to preserve the separation of these commits, which shows that they all originated in a common, separate branch.

This can be achieved by adding the --no-ff parameter when calling git merge. This forces a creation of a so-called merge commit, a special type of commit which has more than one parent. In this case, the parents of the merge commit will be the tip and the base of the feature branch. This is a clean merge from which we can still observe the development of the feature branch with a clear beginning and an end, e.g. in a graphical utility such as gitg or gitk.
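A sketch of such a merge in a scratch repository follows. The feature branch and its two commits are made up, and -m supplies the merge message so that no editor opens. The two parents of the merge commit are then compared with the base and the tip of the feature branch.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"
base=$(git rev-parse HEAD)

git checkout -q -b feature
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Add feature, part 1"
echo "two" >> feature.txt
git commit -q -am "Add feature, part 2"
feature_tip=$(git rev-parse HEAD)

git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

first_parent=$(git rev-parse HEAD^1)    # the base of the feature branch
second_parent=$(git rev-parse HEAD^2)   # the tip of the feature branch
git --no-pager log --graph --oneline
```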

Please note that pull requests on GitHub use no-fast-forward merging by default. In order to avoid clutter, you might not want this when the pull request is a minor fix which shouldn't have a separate merge commit. Thus, if a fast-forward merge is desired, the “rebase” option should be selected when merging a pull request.

Three-way merging

In a more traditional workflow where feature branches are not rebased, a so called three-way merge happens if new commits on the master branch were made in the meantime. The most recent of these commits is going to be a parent of the merge commit. This is in contrast to the clean situation when doing a no-fast forward merge of a rebased feature branch, where one parent is the base of the branch.

If the development of the feature branch is prolonged, new commits from the master branch can be merged into the feature branch, instead of rebasing the feature branch as previously described. The benefit of the three-way merging workflow is that commits on the feature branch are never regenerated, which makes collaboration somewhat easier. However, a significant drawback is that the commit history is not linear, which can make finding regressions and general understanding of the commit history more difficult. This is why we have opted to build this tutorial around the rebasing workflow, with three-way merging only briefly mentioned here as an alternative.

Working with remotes

Git is a distributed VCS (version control system). This means that the repository on your hard drive is self-contained – you don't need internet access to make commits and work with git in general. However, when you want to share your work, the commits and branches are uploaded to (and downloaded from) a server with its own copy of the git repository, called a remote in git lingo. By default, the location from where a repository is cloned is added as a remote named origin.

Pushing your commits

After making some commits on the master branch, they can be uploaded to the origin remote using the following command:

git push origin master

However, origin master can be omitted, because by default, git will push the currently checked out branch (for example master) to the corresponding tracked branch on the remote, which is automatically set up when doing a checkout for the first time. So, most of the time, you can use just:

git push

Pushing new branches

However, when a brand new, locally created branch (here new-branch) is pushed to a remote (here origin) for the first time, the following command should be used:

git push -u origin new-branch

This will set up the local branch to track the appropriate remote branch. This has a few benefits, such as git status showing how many commits the local branch is ahead of or behind the remote branch. Also, further calls to git push can omit the name of the remote and the branch, as mentioned above.

A remote branch stale-branch can be deleted from the remote origin using the following command:

git push --delete origin stale-branch
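These push commands can be tried without a server: in the sketch below, a bare repository in a temporary directory stands in for the origin remote. The paths and the branch content are hypothetical.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"     # plays the role of the server
git clone -q "$work/origin.git" "$work/clone" 2>/dev/null   # hides the empty-repository warning
cd "$work/clone"
git config user.name "John Doe"
git config user.email johndoe@example.com

git checkout -q -b new-branch
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"

git push -q -u origin new-branch          # first push: also sets up tracking
upstream=$(git rev-parse --abbrev-ref 'new-branch@{upstream}')

git push -q --delete origin new-branch    # remove the branch from the remote again
remote_branches=$(git ls-remote --heads origin)
```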

Adding a new remote

To add a new remote, for example upstream, call

git remote add upstream remote_url 

The remote URL can be an HTTPS or SSH URL, or a path to another repo in the local filesystem.

Getting new commits from a remote

The following command contacts a remote, in this example upstream, and grabs new commits, if any:

git fetch upstream

As with push, the remote name can be omitted and git will fetch from the appropriate remote that the currently checked out branch is tracking.

To integrate the changes from the remote master branch, assuming that master is already checked out, run:

git rebase upstream/master

Instead of rebase, you could also run git reset upstream/master to redirect the tip of your local branch to the newest commit of the remote branch. However, if you have made commits that you haven't pushed, and in the meantime there were commits on the remote branch, you would lose these commits by just doing a reset. Your local branch needs to be rebased on top of the remote branch – and only then can you push these commits to your own remote. The rebase command will handle both cases correctly, even if all it had to do was a reset (when your local branch isn't ahead of the remote).

The previous two commands, fetch and rebase, can be executed in one step with the following command:

git pull --rebase upstream master

(If using the three-way merge workflow, git rebase is replaced with git merge, and git pull is run without the --rebase parameter.)
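The fetch and rebase sequence can be sketched with two clones of a local bare repository that plays the role of upstream. The names alice and bob, the helper new_clone and all file names are hypothetical; clone -o upstream is used so that the remote is called upstream instead of origin.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git -C "$work/upstream.git" symbolic-ref HEAD refs/heads/master

# Hypothetical helper: clone into $work/<name> and enter the clone.
new_clone() {
  git clone -q -o upstream "$work/upstream.git" "$work/$1" 2>/dev/null
  cd "$work/$1"
  git config user.name "$1"
  git config user.email "$1@example.com"
}

new_clone alice
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"
git push -q upstream master

new_clone bob                         # bob starts from "Add base"
echo "local" > bob.txt
git add bob.txt
git commit -q -m "Local change"       # not pushed yet

cd "$work/alice"
echo "remote" > alice.txt
git add alice.txt
git commit -q -m "Upstream change"
git push -q upstream master           # upstream moves ahead of bob

cd "$work/bob"
git fetch -q upstream                 # grab the new commit
git rebase -q upstream/master         # replay "Local change" on top of it
# In one step: git pull --rebase upstream master
git --no-pager log --oneline
```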

Force-pushing

If you have mistakenly pushed a commit and then amended or rebased it locally, an ordinary git push of the new version of the branch will fail, because your local branch and the remote branch have diverged. Since you want to change the history of the remote branch, you need to use the --force option with the push command.

Be warned that this:

  • can and will cause data loss if you mistakenly force push an old version of your local branch
  • can and will inconvenience others so they have to rebase their work and fix conflicts

To conclude: be careful when doing a force push. Try not to make it a habit. If unsure, consult someone else, because this action can lead to data loss.

A safer version of force-push

Instead of using -f/--force, you may use --force-with-lease. This makes force pushing after e.g. rebasing much safer - if someone else has pushed in the meantime (since you last fetched), git will refuse to force push. This is a sign that you need to perform a fetch and redo the rebase with their commits included (e.g. perform a hard reset of your local branch to the remote branch and then rebase again).
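The sketch below shows an ordinary push being rejected after an amend, followed by a push with --force-with-lease that succeeds because nobody else has pushed in the meantime. A local bare repository stands in for the remote; the paths and the commit are hypothetical.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/master
git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"
git push -q -u origin master

# Rewrite the commit that has already been pushed.
git commit -q --amend -m "Add file with a better message"

if git push -q origin master 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected                # local and remote have diverged
fi

git push -q --force-with-lease origin master
remote_tip=$(git ls-remote origin refs/heads/master | cut -f1)
local_tip=$(git rev-parse HEAD)
```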

Dealing with a rebased remote branch

git rebase -i can be helpful here.

Preventing pushing of private branches to public remotes

Keep the names of private branches and private remotes both prefixed with private. Then, add the following script to .git/hooks/, name it pre-push and make it executable (chmod +x .git/hooks/pre-push):

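#!/bin/bash
# pre-push hook: refuse to push private branches to non-private remotes.
# Git passes the remote name and URL as arguments and writes one line
# per ref being pushed to the standard input of the hook.
remote="$1"
url="$2"    # not used by this check

# An all-zero local hash means that the remote ref is being deleted.
z40=0000000000000000000000000000000000000000

while read -r local_ref local_sha remote_ref remote_sha
do
  if [ "$local_sha" = "$z40" ]
  then
    # Handle delete: nothing to check
    :
  else
    if [[ ("$local_ref" =~ heads/private || "$remote_ref" =~ heads/private) \
      && ! "$remote" =~ ^private ]]; then
      echo "Refusing to push private branch $local_ref to public remote $remote" >&2
      exit 1
    fi
  fi
done

exit 0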
software/git.txt · Last modified: 2017/08/25 21:25 by juraj