Git is another tool, one that tracks and manages your source code. It is a Version Control System (VCS), also known as Source Code Management (SCM) software; the two terms are used interchangeably.
A company called BitMover Inc. created a capable version control system called BitKeeper. The community behind the Linux kernel started using BitKeeper in 2002.
BitKeeper was proprietary (closed source) but had a community version available for free. The Linux kernel project decided to use it, although it was advised not to because it was closed source. In 2005, BitMover canceled the free version and BitKeeper became fully commercial. As a result, Linus Torvalds and the Linux community decided to develop a new VCS called Git.
Since 2016, BitKeeper has been released as an open source VCS.
Git works best with plain text files. Files that need to be interpreted by other programs, like PDF, Word or PSD files, can be stored but cannot be compared line by line in a meaningful way, and most of those formats have their own tracking tools. This article focuses on how to use Git professionally with your programming projects.
If you don’t have Git installed on your system, head over to the installation link and set it up. I advise you to try the examples throughout the article yourself.
Note: if you’re using Windows, make sure the git executable is added to your system’s environment variables.
You can configure Git by setting your name, email address and other optional things like your favorite text editor (full list of settings from here). For now, we only care about setting the username and email.
There are three levels of Git configuration:
Project: stored in .git/config in the project directory and only applies to the current project.
git config user.name "John Doe"
Global (user): stored in ~/.gitconfig and applies to all projects of the logged-in user.
git config --global user.name "John Doe"
System: stored in /etc/gitconfig and applies system wide to all users and projects.
git config --system user.name "John Doe"
Git also lets you set the author name and email through the environment variables GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL. The order of precedence when Git reads these settings is:
environment variables > project > global (user) > system
Note: if you’re managing many projects, it may be easier to set the configuration globally at first and override it at the project level when needed.
Note: if you use Vagrant or Docker, it’s more convenient to set the Git configuration at the project scope. This avoids setting it on both the host and the guest machines, which adds extra work and confusion.
Now, to set the username and email at the project level, open the Git shell and run these commands:
git config user.name "John Doe"
git config user.email [email protected]
To check your configuration, run
git config --list. To set the text editor, Nano for example, run
git config core.editor "nano". To use colors in the console (which is enabled by default), run
git config color.ui true. Run these before continuing with the rest of the article.
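Here is a minimal sketch of the project-level setup and of the precedence rule, run in a throwaway repository. The temporary directory, the email address and the name "Jane Roe" are placeholders chosen for illustration, not values from a real project.

```shell
# Throwaway repository so nothing on your machine is affected
repo=$(mktemp -d)
cd "$repo" && git init -q

# Project-level settings end up in .git/config
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

git config user.name          # read a single value back
git config --local --list     # list only the project-level settings

# An environment variable overrides user.name for the author of a commit
echo "hello" > foo.txt
git add foo.txt
GIT_AUTHOR_NAME="Jane Roe" git commit -q -m "author name taken from the environment"
git log -1 --format='%an'     # shows the author recorded in the last commit
```

The last command should report the name from the environment variable, while git config user.name still returns the project-level value.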
Git has a lot of commands. To learn more about them, run
git help (remembered as GET HELP). This command lists the most commonly used commands with some basic information about each one.
To get detailed information on a specific command, run
git help [command]. Running
git help commit, for example, opens the detailed documentation for the commit command (a local web page in your browser on Windows, usually a manual page in the terminal on Linux and macOS).
A repository is just a folder that contains your project files and is tracked for changes by Git.
To create a new repository, navigate to the project folder and run
git init. This command initializes a repository and creates a folder inside it called
.git. This folder contains the tracking information and the project-level configuration for the repo.
Git stores everything it tracks in the .git folder at the root directory where the repository was initialized. To remove Git and turn the repo back into a normal folder, you simply delete this .git folder.
The .git folder is managed by Git, and you rarely need to change files inside it, except the
.git/config file. This is where the project-level configuration lives.
Let’s make a change in our empty repository. Create a text file foo.txt within the repository folder, open it and write something like “this is a demo text file”.
The repository has some changes now, so it is no longer “clean”. A repository is clean when no changes have been made since the last commit or since initialization.
To check the status of a repository, from anywhere inside the repository folder, run
git status. This shows either a message stating that the repository is clean, or a list of the changes that were made since the last commit. Take a look at the following:
git status (says clean).
echo "this is a demo text file" >> foo.txt (creates a new file foo.txt within the repository).
git status (says there is a new untracked file; the repository is not clean).
Once we are sure that adding foo.txt is what’s required (maybe it contributes to the next release or fixes a bug), the next step is to commit the change, which makes the current state of the project the base for further work. Basically, a commit is like saving the changes.
Before committing, we need to mark foo.txt as tracked by Git and add it to something called the staging index. Both happen at once with the git add [filename] command.
git add ./foo.txt (foo.txt is on the staging index now).
The staging index is a place where we put the changes we want in the next commit; it lets us select only part of the changes made to be committed.
git status (shows that foo.txt was added to the staging index and that no other changes were made).
git commit -m "added file foo.txt"
A commit needs a message that describes what it does, added with the -m option.
git status (now the repository is clean again).
To see all the commits in a repository, run
git log. A list of the commits will be shown.
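The steps above can be put together as one small session. This is only a sketch in a temporary repository (the directory and the email address are placeholders); git status --short is used instead of plain git status to keep the output compact.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

git status --short                 # prints nothing: no changes yet
echo "this is a demo text file" >> foo.txt
git status --short                 # ?? foo.txt  (new, untracked file)
git add ./foo.txt
git status --short                 # A  foo.txt  (on the staging index)
git commit -q -m "added file foo.txt"
git status --short                 # prints nothing again: clean
git log --oneline                  # one line per commit
```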
In Git, the changes you make to the project files can be in one of three places: the working tree, the staging index or the repository tree. Initially, when you’re making changes to a file, you are working within the working tree. Then you add part or all of the changes to the staging index when preparing for a commit, and finally you make a commit, which passes the staged changes over to the repository.
If you’re familiar with SVN, it has a similar structure but with only two trees: the working tree and the repository tree. In SVN you usually make some changes in the working tree and then apply them to the actual repository; all changed, deleted and added files are committed together to the repository tree.
Git adds a third layer, the staging tree or staging index. What’s the benefit of another tree? The following example demonstrates it:
Let’s say you are working on a fix for some project. You end up altering about 5 different files and manage to fix 2 unrelated bugs by changing those files. If you’re working with SVN, you commit all those changes together in one commit. But an important rule of good practice is that a commit should fix one bug or add one feature at a time. With Git and the additional staging tree, you can stage the 2 files related to bug 1 and make the first commit (with a good message), and then stage the other 3 files that fix bug 2 and make a second commit (with a different message).
To sum up, here’s the workflow:
git add foo.txt: foo.txt has changed in the working directory; this command adds the change to the staging index, preparing it for a commit.
git commit -m "fixed bug x": this commits all the changes in the staging index, with a commit message, to the actual repository.
git checkout foo.txt: this reverts the changes to foo.txt in the working tree by restoring the last staged version of the file, which is the version from the latest commit if nothing has been staged since.
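To make the two-bug example concrete, here is a sketch with five hypothetical files (a.txt to e.txt) in a temporary repository. The file names, commit messages and email address are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

# Five tracked files in an initial commit
for f in a.txt b.txt c.txt d.txt e.txt; do echo "v1" > "$f"; done
git add . && git commit -q -m "initial commit"

# Edit all five files while fixing two unrelated bugs
for f in a.txt b.txt c.txt d.txt e.txt; do echo "fix" >> "$f"; done

# Commit 1: stage only the two files that belong to bug 1
git add a.txt b.txt
git commit -q -m "fixed bug 1"

# Commit 2: stage the remaining three files, which belong to bug 2
git add c.txt d.txt e.txt
git commit -q -m "fixed bug 2"

git log --oneline --stat     # each fix shows up as its own commit
```

Because the staging index sits between the working tree and the repository, one editing session ends up as two focused commits.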
In the following explanations, we will be referring to the repository tree from the three-tree architecture.
A Git repository is structured as a tree of commit nodes. Each commit is identified by a 40-character SHA-1 hash. The hash is used for data integrity and is computed from all the data in the commit; it is often referred to as the commit ID.
Each commit has a pointer to the previous commit, its parent. This makes a sequence of commits in the tree structure. The pointer is just the hash of the parent commit.
A commit also records its author (the user name and email from the Git configuration) and a commit message describing what the commit does (adding a feature, fixing a bug, etc.).
And finally, each commit points to a snapshot of the project files as they were staged for that commit. Files that did not change are not stored again; the snapshot simply refers to the copy that is already in the repository.
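If you want to see these parts for yourself, git cat-file -p prints the raw content of a commit. The sketch below assumes a temporary repository with two small commits; the file name and messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "one" > foo.txt  && git add foo.txt && git commit -q -m "first commit"
echo "two" >> foo.txt && git add foo.txt && git commit -q -m "second commit"

git rev-parse HEAD        # the full hash (commit ID) of the latest commit
git cat-file -p HEAD      # raw commit: tree, parent, author, committer, message
git rev-parse HEAD~1      # hash of the parent, same as the "parent" line above
```

The tree line is the pointer to the snapshot, and the parent line is the pointer to the previous commit.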
What’s on the file system (what other programs like the browser see) is the working directory, not the repository.
One thing to always remember is to make each commit serve one specific goal. For example:
Among the files you have modified, stage only the changes related to one goal or task so they are ready for the next commit, and leave the rest unstaged. Commit with a message A, then stage the other changes and commit with another message B describing the reason for that change.
The repository tree is made of branches of commit nodes. At first you get only one branch, called the master branch, and it is the selected branch. The tip of the selected branch is called the HEAD of the branch. The HEAD is just an alias for the pointer to the last added commit node. The value of this pointer is stored at
You can make new branches on the repository tree when adding features, fixing bugs or trying something new. Working with branches lets you separate your work into different contexts. Let’s say I am working on a website. I start with a master branch, then create a development branch, leaving the master branch for production only. I then create a third branch to work on a different theme for the website. This makes the project more manageable and easier to work with, and keeps your mind clear and focused on one thing at a time.
Branches can be started from the master branch or from any other branch (we’ll get to this later).
Eventually, branches with complete and tested work need to be merged into the originating branch, so that the work is applied to the branch that holds the running state of the project and can be deployed to production.
To show the difference between the working directory, the staging index and the repository, run git status. This also shows the branch you’re currently working on (the context). Here are the possible outputs of git status that you need to understand:
$ git status
On branch master
nothing to commit, working tree clean
Adding a new file foo.txt to the working tree.
In this case, you are usually expected to add the file to the staging index and then commit.
$ git status
On branch master
Initial commit
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	foo.txt
nothing added to commit but untracked files present (use "git add" to track)
Then you modify an existing file (a tracked file)
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
	modified:   foo.txt
no changes added to commit (use "git add" and/or "git commit -a")
After adding the file with
git add *
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
	modified:   foo.txt
You can preview the changes made to a file in the working directory (not staged) by using the command
git diff. This command compares the working directory against the staging index, so it only shows changes that have not been staged yet. When nothing is staged, that is the same as comparing against the commit HEAD points to.
git diff --staged compares the staging index against the repository (the commit HEAD points to).
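A quick way to see the difference between the two commands is to watch one change move from the working directory to the staging index. This is a sketch in a temporary repository; --stat is added only to keep the output short.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "line 1" > foo.txt && git add foo.txt && git commit -q -m "added foo.txt"

echo "line 2" >> foo.txt       # the change exists only in the working directory
git diff --stat                # lists foo.txt
git diff --staged --stat       # prints nothing: nothing is staged yet

git add foo.txt                # move the change to the staging index
git diff --stat                # prints nothing now
git diff --staged --stat       # lists foo.txt
```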
If you need to remove a file that is already committed to the repository, there are basically two ways:
Delete the file with the operating system, stage the deletion, and then commit with git commit -m "I removed file foo.txt"
Run git rm foo.txt, which deletes the file and automatically adds the change to the staging index. Next you just commit what you’ve done with
git commit -m "I removed foo.txt using git!"
The same goes for moving and renaming files. If you use the Git way (git mv), the changes are automatically staged, but if you do these with the operating system, you have to stage the changes yourself.
There is a shortcut:
git commit -a -m "a message". This stages all changes to tracked files and commits them to the repository at once. But be aware that:
It only picks up modifications and deletions of files that are already tracked; new, untracked files are not included.
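Here is a short sketch of the Git way of removing and renaming, followed by the -a shortcut. The file names (foo.txt, bar.txt, baz.txt, untracked.txt) are placeholders, and everything runs in a temporary repository.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "a" > foo.txt; echo "b" > bar.txt
git add . && git commit -q -m "added foo.txt and bar.txt"

git rm -q foo.txt                 # deletes the file and stages the deletion
git mv bar.txt baz.txt            # renames the file and stages the rename
git status --short                # both changes are already staged
git commit -q -m "removed foo.txt, renamed bar.txt to baz.txt"

echo "more" >> baz.txt            # modification of a tracked file
echo "new" > untracked.txt        # brand-new file that was never added
git commit -q -a -m "updated baz.txt"
git status --short                # ?? untracked.txt  (not picked up by -a)
```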
To view all of the commits in the repository, use the
git log command. This opens the less program and shows all commits as a list. To quit, just type the letter q.
A sample output:
$ git log
commit d07d3ebeab4e68b7cd9f014f1c9221d6ade59ca3
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:28:25 2017 +0300
    added foo.txt again!
commit cca36acfa2287388e2c8dd2f1c8e9eb1b6dd092c
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:27:59 2017 +0300
    removed foo.txt
commit 2fead6cdfb3be04152e88480127708f31d9df148
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:27:21 2017 +0300
    Initial commit
We get more options by providing different arguments like so:
git log --grep="Init"
git log --since=2012-06-12
git log --until=2012-06-12
git log -n 3 (shows the last three commits)
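To try these options, you need a few commits to filter. The sketch below creates four placeholder commits in a temporary repository; --oneline is an extra option, not mentioned above, that prints each commit on a single line.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

for n in 1 2 3 4; do
  echo "$n" >> foo.txt
  git add foo.txt
  git commit -q -m "change number $n"
done

git log --oneline                         # all commits, newest first
git log -n 2 --oneline                    # only the two most recent commits
git log --grep="number 3" --oneline       # commits whose message matches
git log --since="1 hour ago" --oneline    # commits made within the last hour
```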
Whether they are in the working directory, in the staging index or already committed to the repository, changes can be reverted. Let’s take a look at the following scenarios:
Case scenario: you accidentally deleted some text in a file (page.html), and you haven’t staged or committed yet. The solution is to get the saved version back with the command
git checkout -- page.html.
What git checkout does is replace what’s in the working directory with what’s in the repository. If you give it a file or folder name it replaces that, and if you give it a branch name it switches the working directory to what’s in that branch. To avoid ambiguity between branch names and file or folder names, we put -- before the name when it is a file or a folder. It is not required, but it is good practice, for example when you have a branch called foo and also a file called foo.
Case scenario: we changed a file and added it to the staging index for the next commit, but we changed our mind and need to remove it from the staging index (we still want the change in the working directory, we just don’t want it staged). This is used most often when you’re trying to put together a commit.
For example, say we modified the resources.html file and added it to the staging index. We can remove it from the staging index using
git reset HEAD resources.html (more on reset later).
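Both scenarios can be replayed in a temporary repository. This is only a sketch; page.html and its contents are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "original" > page.html
git add page.html && git commit -q -m "added page.html"

# Scenario 1: an unwanted edit that is not staged yet
echo "oops" > page.html
git checkout -- page.html       # restore the saved version
cat page.html                   # original

# Scenario 2: a staged edit that should stay in the working directory only
echo "draft" >> page.html
git add page.html
git reset -q HEAD page.html     # unstage; the file on disk is untouched
git status --short              #  M page.html  (modified, not staged)
```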
Commits in a repo form a series, with each commit pointing to its parent commit. The hash is generated from the data in the commit (the changes, message, user, etc.), so we cannot change a commit without changing its hash and breaking the series of commits that follow it. Git doesn’t want us to do that. However, we can change the most recent commit, because no other commit points to it (it is not a parent).
Changes to the last commit are made using the --amend option of git commit.
Case scenario: we made changes to the resources.html file, staged them and committed. Now we want to add one more change to that same commit (it serves the same goal), or we just want to change the message of the last commit, without creating a whole new commit for a one-letter change. So we amend the last commit.
git commit --amend -m "new message" (this replaces the last commit instead of adding a new one on top of it)
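The sketch below amends a commit in a temporary repository and compares the commit ID before and after. The variable names before and after, the file content and the messages are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "v1" > resources.html
git add resources.html && git commit -q -m "added resources page"
before=$(git rev-parse HEAD)

# One more small change that belongs to the same commit
echo "v2" >> resources.html
git add resources.html
git commit -q --amend -m "added resources page with links"
after=$(git rev-parse HEAD)

git log --oneline          # still a single commit, now with the new message
[ "$before" != "$after" ] && echo "the amended commit has a new hash"
```

The history still contains one commit, but its hash changes because the hash is computed from the commit data.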
Case scenario: we want the resources.html file from a past commit (not the most recent commit).
git log to see all commits and choose the commit to retrieve the file from.
git checkout [hash or part of it] -- resources.html (beware, this puts the old version in the staging index as well as in the working directory).
git diff --staged to see the difference; sometimes we like to do that just in case.
git reset HEAD resources.html to unstage it, so the old version remains only in the working directory.
git checkout -- resources.html to cancel all of this and get the file from the last commit back into the working directory.
Note: undoing a whole commit is done with the revert command (see next).
A complete revert of a commit is made by creating a mirror commit with
git revert [hash]. This command makes a new commit that reverses the one specified. You can add an option (
-n) if you want the reverting changes to be put in the staging index only, so that you can make the commit manually.
So far we assumed you reverted with a clean workspace. If you revert while having other changes of your own, meaning your working tree is not clean, there may be conflicts, and if there are, you will have to merge (more on merging later).
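Here is a minimal sketch of a revert in a clean temporary repository. The option --no-edit is an addition that keeps the default revert message so no editor opens; the file name and contents are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "v1" > foo.txt && git add foo.txt && git commit -q -m "added foo.txt"
echo "v2" >> foo.txt && git commit -q -a -m "changed foo.txt"

git revert --no-edit HEAD      # new commit that undoes "changed foo.txt"
cat foo.txt                    # v1
git log --oneline              # three commits: nothing was removed from history
```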
Moving the branch back to an earlier commit is done with the
git reset [target commit hash] command.
Warning: before making resets, copy the output of git log (the hashes of all commits) to a text file outside of the project folder (the Git root directory). We might need to refer to them later, because git log no longer shows the commits that come after the one we reset to. If, for example, we reset 3 commits back, the saved copy still gives us the hashes of those latest 3 commits.
There are 3 options for git reset:
git reset --soft [hash]
This is the safest type of reset. The working tree and the staging index are kept the same (the actual files in the project folder stay the same),
but the repository is changed, so when we run
git diff --staged we will be comparing against a commit from the past.
This type of reset only moves the HEAD pointer to a past commit. The more recent commits are not deleted; they are still there and we can reset (navigate) to any of them again using their hashes.
git reset --mixed [hash] or
git reset [hash]
This changes what is in the repo and the staging index to what was in a past commit.
The working directory is kept the same, so it is safe; we don’t lose our recent project files. It’s not a big deal that it removes recently staged changes, because we can still stage what is in the working directory.
git reset --hard [hash]
Everything in the working directory and the staging index is thrown out and matched with the commit we are resetting to.
Note: after a hard reset, tracked files in the working directory always match that commit; untracked files are left alone.
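The three reset types are easier to compare side by side. The sketch below builds three versions of a placeholder file in a temporary repository and saves the newest hash in a variable called tip (a name chosen for this example) before resetting, in the spirit of the warning above.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

for n in 1 2 3; do
  echo "v$n" > foo.txt
  git add foo.txt
  git commit -q -m "version $n"
done
tip=$(git rev-parse HEAD)        # remember the newest commit before resetting

git reset -q --soft HEAD~1       # HEAD moves back; index and file keep "v3"
git status --short               # M  foo.txt  (the change is still staged)

git reset -q --mixed HEAD~1      # HEAD and index move back; the file keeps "v3"
git status --short               #  M foo.txt  (modified, not staged)

git reset -q --hard HEAD         # working directory and index now match HEAD
cat foo.txt                      # v1

git reset -q --hard "$tip"       # the saved hash still leads back to version 3
cat foo.txt                      # v3
```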
To remove all files that are untracked by Git (files that were created in the working tree but never staged or committed), run
git clean -n to show which files would be deleted, and
git clean -f to actually delete all untracked files.
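A small sketch of the dry run followed by the real clean, using a placeholder scratch file (notes.tmp) in a temporary repository:

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "tracked" > foo.txt && git add foo.txt && git commit -q -m "added foo.txt"
echo "scratch" > notes.tmp      # untracked file

git clean -n                    # Would remove notes.tmp
git clean -f                    # Removing notes.tmp
ls                              # foo.txt
```

Running the -n variant first is a cheap safety check, since files deleted by git clean were never committed and cannot be recovered from the repository.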
By default, any new files you add to the repo are untracked by Git until you
git add them for the first time. A lot of the time we need to have files in a repository that we don’t want to be tracked.
The .gitignore file contains file names and wildcard patterns (glob patterns, not regular expressions) for the files and directories that we don’t want Git to track. We usually ignore files that are not part of the main project, such as directories downloaded by development tools (only keeping a meta file as a reference), like node packages installed with npm, and also project build artifacts. Only the needed files are tracked by Git.
Here’s more examples of what to ignore:
The .gitignore file is created at the Git root directory, at the same level as the .git folder. It is easier to create the .gitignore file from the command line.
Patterns can use basic wildcards: * ? [aeiou] [0-9].
! is used to negate a pattern (as in “don’t ignore this part”). For example, the pattern *.html followed by !index.html ignores all HTML files except index.html.
Comment lines start with #. Blank lines are ignored.
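Here is a sketch of a small .gitignore that uses a wildcard, a negation, a directory pattern and a comment. The file names are placeholders, and git check-ignore is used only to ask Git which pattern matches a given path.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

cat > .gitignore <<'EOF'
# comment lines start with '#'; blank lines are skipped

*.html
!index.html
node_modules/
EOF

touch about.html index.html
mkdir node_modules && touch node_modules/pkg.js

git status --short                                    # lists only .gitignore and index.html
git check-ignore -v about.html node_modules/pkg.js    # shows the pattern that matched each path
```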
You can find ready-made .gitignore files for common project types from here. They are called gitignore templates.
You can also ignore files in all repositories at the user level. This is similar to configuring Git at different levels. We do this with a global file that can be stored anywhere and named anything you want. For example:
git config --global core.excludesfile ~/.gitignore_global
For instance, you can use this to ignore the recycled folder on Windows systems.
Previously, we only ignored files that we had just created and that were not tracked yet.
We cannot ignore files that are already tracked (committed) unless we untrack them first; to be ignored, files must be untracked.
So to untrack the file, we can use
git rm file.txt, but this also deletes the file from the working directory and, after the next commit, from the repository.
What if we want to keep the file and just untrack it?
git rm --cached file.txt only removes it from the staging index, which makes it untracked once the change is committed. The file stays in the working directory and in the earlier commits of the repository.
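Put together, untracking a committed file and then ignoring it could look like the sketch below. It runs in a temporary repository, and the file content and commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

echo "local settings" > file.txt
git add file.txt && git commit -q -m "added file.txt by mistake"

git rm -q --cached file.txt        # stage the removal, keep the file on disk
echo "file.txt" >> .gitignore      # ignore it from now on
git add .gitignore
git commit -q -m "stop tracking file.txt"

ls file.txt                        # still in the working directory
git ls-files                       # .gitignore  (file.txt is no longer tracked)
git status --short                 # prints nothing: file.txt is ignored now
```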
Git does not track empty directories, so an empty directory is never committed. If you want to keep an empty directory in the repository for some reason, create an empty file inside it and, by convention, call it .gitkeep. This keeps the folder in the repository.
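A short sketch of the .gitkeep convention, using a hypothetical logs directory in a temporary repository:

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

mkdir logs
git status --short          # prints nothing: the empty directory is invisible to Git

touch logs/.gitkeep         # empty placeholder file; the name is only a convention
git add logs
git commit -q -m "keep the logs directory"
git ls-files                # logs/.gitkeep
```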