Oct. 27, 2017

Git for Developers: Part 1

Git is a tool for tracking and managing your source code. It is a Version Control System (VCS), also called Source Code Management (SCM) software; the two terms are used interchangeably.

Some History

A company called BitMover Inc. created a version control system called BitKeeper. The community behind the Linux kernel started using BitKeeper in 2002.

BitKeeper was proprietary (closed source) but had a community version available for free. The Linux kernel developers decided to use it even though they were advised not to because it was closed source. Years later, BitMover withdrew the free version and BitKeeper became fully commercial. As a result, Linus Torvalds and the Linux community decided to develop a new VCS called Git.

Since 2016, BitKeeper has been released as an open source VCS.

Git works best with plain-text files. It can store files that need to be interpreted by other programs, such as PDF, Word or PSD files, but it cannot show meaningful differences for them, and those formats mostly have their own tracking tools. This article focuses on how to use Git professionally with your programming projects.

If you don’t have Git installed on your system, head over to the installation link and set it up. I advise you to try the examples throughout the article yourself.

Note: if you’re using Windows, make sure the git executable is added to your system’s PATH environment variable.

Configuring Git

You can configure Git by setting your name, email address and other optional things like your favorite text editor (full list of settings from here). For now, we only care about setting the username and email.

There are three levels of Git configuration:

project

Stored in .git/config in the project directory and only available to the current project.

git config user.name "John Doe"

global

Stored in ~/.gitconfig and available to all projects of the logged-in user.

git config --global user.name "John Doe"

system

Stored in /etc/gitconfig and available system-wide to all users and projects.

git config --system user.name "John Doe"

Git also lets you set the author name and email through the environment variables GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL. Git reads these sources in the following order of precedence:

environment variables > project > global (user) > system
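The precedence can be checked in a scratch repository. The following is only a sketch: the temporary directory, the second name Jane Roe and the address john@example.com are assumptions made for the illustration, not values from this article.

```shell
# Scratch repository so that none of your real projects is touched
cd "$(mktemp -d)" && git init -q

# Project-level identity, stored in .git/config (the email is a placeholder)
git config user.name "John Doe"
git config user.email "john@example.com"

# First commit: no environment variable is set, so the project-level name is used
echo "one" > a.txt && git add a.txt
git commit -q -m "first"
first_author=$(git log -1 --format=%an)

# Second commit: GIT_AUTHOR_NAME overrides the project-level name for this command only
echo "two" > b.txt && git add b.txt
GIT_AUTHOR_NAME="Jane Roe" git commit -q -m "second"
second_author=$(git log -1 --format=%an)

echo "$first_author / $second_author"
```

The first commit is authored by John Doe and the second by Jane Roe, which shows the environment variable winning over the project setting.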

Note: if you’re managing many projects, it may be easier to set the configs globally at first and override them at the project level where needed.

Note: if you are a user of Vagrant or Docker, it’s more convenient to set the Git configuration at the project scope. This avoids setting it on both the host and the guest machines, which may add extra work and confusion.

Now, to set the username and email at the project level, open the Git shell and run these commands:

git config user.name "John Doe"
git config user.email [email protected]

To check your configuration, run git config --list. To set the text editor, Nano for example, run git config core.editor "nano". To use colors in the console (enabled by default), run git config color.ui true. Run these commands now so that you can follow the rest of the article.

Getting help

Git has a large set of commands. To get an overview, run git help (remembered as GET HELP). It lists the most commonly used commands with a short description of each; git help -a lists all of them.

To get detailed information on a specific command, run git help [command]. Running git help commit, for example, shows the full documentation of the commit command: on Windows it typically opens a local web page in your browser, while on Linux and macOS it usually opens the manual page in the terminal.

Creating a git repository

A repository is just a folder that contains your project files and is tracked for changes by Git.

To create a new repository, navigate to the project folder and run git init. This command initializes a repository and creates a folder inside it called .git, which contains the tracking information and the project-level configuration of the repo.

Git keeps everything it tracks for the project in the .git folder of the directory where it was initialized. To remove Git and turn the repo back into a normal folder, you simply delete this .git folder. Be aware that this also deletes the entire history.

The .git folder is managed by Git, and you rarely need to change files inside it, except for .git/config, which is where the project-level configuration lives.

Making an initial commit

Let’s make a change in our empty repository: create a text file foo.txt in the repository folder, open it and write something like “this is a demo text file”.

The repository has some changes now, so it is no longer “clean”. A repository is clean when no changes have been made since the last commit or since initialization.

To check the status of a repository, run git status from anywhere inside the repository folder. It shows either a message stating that the repository is clean, or a list of the changes that were made since the last commit. Take a look at the following:

  1. cd ~/project
  2. git init
  3. git status (says there is nothing to commit).
  4. echo "this is a demo text file" >> foo.txt (creates a new file foo.txt in the repository).
  5. git status (lists foo.txt as an untracked file; the repository is not clean).

Once we are sure that adding foo.txt is what we want (maybe it contributes to the next release or fixes a bug), the next step is to commit the change, which makes the current state of the project the base state for further work. Basically, a commit is like saving the changes.

Before committing we need to mark the file foo.txt as tracked by git and add it to something called a staging index. This is done at once with the git add [filename] command.

git add ./foo.txt (foo.txt is on the staging index now).

This adds the file to the staging index. The staging index is the place where we put the changes we want in the next commit; it lets us commit only a selected part of the changes we made.

git status (shows that foo.txt is added to the staging index and that no other changes were made).

git commit -m "added file foo.txt"

A commit needs a message to describe what it does, added with the -m argument.

git status (now the repository is clean again).

To see all the commits for a repository, run git log. A list of the commits will be shown.
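The whole sequence can be run as one short session. This is a sketch under a few assumptions: it works in a temporary directory instead of ~/project, it sets a placeholder identity so that the commit succeeds on a machine without any Git configuration, and it uses git status --short and git rev-list --count only to keep the output compact.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity

echo "this is a demo text file" >> foo.txt
git status --short                  # foo.txt is listed as untracked (??)

git add ./foo.txt
git status --short                  # foo.txt is listed as staged (A)

git commit -q -m "added file foo.txt"
status_after=$(git status --short)           # empty: the repository is clean again
commit_count=$(git rev-list --count HEAD)    # number of commits reachable from HEAD
echo "commits: $commit_count"
```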

Concepts you need to understand

The three tree architecture

In Git, the changes you make to the project files can be in one of three places: the working tree, the staging index or the repository tree. Initially, when you’re making changes to a file, you are working in the working tree. Then you add part or all of the changes to the staging index when preparing a commit, and finally you make a commit, which passes them over to the repository.

If you’re familiar with SVN, it has a similar structure but with only two trees: the working tree and the repository tree. In SVN you usually make some changes in the working tree and then apply them to the repository, and all changed, deleted and added files are committed together.

Git adds a third layer, the staging tree or staging index. What’s the benefit of another tree? The following example demonstrates it:

Let’s say you are working on a project and end up altering 5 different files, which together fix 2 unrelated bugs. With SVN, you would typically commit all those changes together in one commit. But an important best practice is that each commit should fix one bug or add one feature only. With Git and the additional staging tree, you can stage the 2 files related to bug 1 and make the first commit (with a good message), then stage the other 3 files fixing bug 2 and make a second commit (with a different message).

To summarize, here is the workflow:

git add foo.txt: foo.txt is changed in the working directory; this command adds the change to the staging index, preparing for a commit.

git commit -m "fixed bug x": this commits all the changes in the staging index to the repository, with a commit message.

git checkout foo.txt: this reverts the changes to foo.txt in the working tree by restoring the last staged version of the file, which is the version from the latest commit if nothing has been staged.
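The two-bug example can be sketched as follows. The five file names, their contents and the commit messages are hypothetical, and the repository lives in a temporary directory with a placeholder identity.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity

# Five hypothetical files, committed once so that Git tracks them
for f in a b c d e; do echo "v1" > "$f.txt"; done
git add . && git commit -q -m "initial state"

# Edit all five: a and b fix bug 1; c, d and e fix bug 2
for f in a b c d e; do echo "fix" >> "$f.txt"; done

# Stage and commit only the files that belong to bug 1
git add a.txt b.txt
git commit -q -m "fixed bug 1"

# Then stage and commit the files that belong to bug 2
git add c.txt d.txt e.txt
git commit -q -m "fixed bug 2"

# Files touched by each of the two commits
bug1_files=$(git diff --name-only HEAD~2 HEAD~1)
bug2_files=$(git diff --name-only HEAD~1 HEAD)
```

Each commit now contains exactly the files of one fix, so either fix can later be inspected or reverted on its own.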

The structure of a commit

In the following explanations, we will be referring to the repository tree from the three tree architecture.

A Git repository is basically structured as a tree of commit nodes. Each commit is identified by a 40-character SHA-1 hash, which is computed from the content of the commit (the snapshot, the parent, the author and the message) and is used for data integrity. The hash is sometimes referred to as the commit ID.

Each commit has a pointer to the previous commit, called the parent commit. This makes a sequence of commits in the tree structure. The pointer is just the hash of the parent commit.

A commit is also bound to the author (the user name from the Git configuration) and to a commit message describing what the commit does (adding a feature, fixing a bug, etc.).

And finally, each commit points to a snapshot of the project files as they were staged at the time of the commit. Files that did not change are not copied again; the snapshot simply refers to the version that is already stored.

What’s on the file system (what other programs, like the browser, see) is the working directory and not the repository.

One thing to always remember is to make each commit serve one specific goal. For example:

Within the files you have modified, stage only the changes that are related to one goal or task, and leave the rest unstaged. Make a commit with some message A, then stage the other changes and make a commit with another message B describing the reason for that change.

The branch

The repository tree is made of branches of commit nodes. At first you get only one branch, called the master branch, and it is the selected branch. The tip of the selected branch is called the HEAD. HEAD is a pointer to the selected branch, and through it to the last commit added on that branch. Its value is stored in the .git/HEAD file.
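A quick way to see this is to read .git/HEAD directly and compare it with what Git reports. This is a sketch in a temporary repository with a placeholder identity; the branch name it prints depends on your Git setup (master in this article, possibly main on newer installations).

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "hello" > foo.txt && git add foo.txt && git commit -q -m "Initial commit"

head_file=$(cat .git/HEAD)                           # text of the form "ref: refs/heads/<branch>"
branch=$(git symbolic-ref --short HEAD)              # name of the selected branch
head_commit=$(git rev-parse HEAD)                    # commit hash that HEAD resolves to
branch_commit=$(git rev-parse "refs/heads/$branch")  # commit hash at the tip of that branch

echo "$head_file"
echo "$head_commit"
```

HEAD and the tip of the selected branch resolve to the same commit hash.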

You can make new branches on the repository tree when building new features, fixing bugs or trying something new. Working with branches lets you separate your work into different contexts. Let’s say I am working on a website. I start with a master branch, then start a development branch, leaving the master branch for production only. I then create a third branch to work on a different theme for the website. This makes the project more manageable and easier to work with, and keeps your mind clear and focused on one thing at a time.

Branches can be started from the master branch or from any other branch (we’ll get to this later).

Merging Branches

Eventually, branches with complete and tested work need to be merged into the branch they originated from, so that the work is applied to the branch that holds the run-time state of the project and can be deployed to production.

Making changes to files

To show the differences between the working directory, the staging index and the repository, run git status. It also shows the branch you’re currently working on (the context). Here are the possible outputs of git status that you need to understand:

  1. The working directory is clean and there are no untracked files. This means that all three trees are exactly the same, and you haven’t made any changes since the last commit.

$ git status
On branch master
nothing to commit, working tree clean

Adding a new file foo.txt to the working tree.

  2. There are no modified files, but there are untracked files. By default, a new file that you add to the working tree is not tracked by Git until you git add it for the first time.

So in this case, you are mostly expected to add the file to the staging index and then commit.

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        foo.txt

nothing added to commit but untracked files present (use "git add" to track)

Then you modify an existing file (a tracked file).

  3. There are changed files that are not staged.

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")

After adding the file with git add *

  4. There are changed files that are staged and ready to be committed.

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   foo.txt

Viewing the changes

You can preview the changes made to files in the working directory that are not yet staged with the command git diff. This command compares the working directory to the staging index, which matches the commit HEAD points to as long as nothing is staged. Note that git diff shows only unstaged changes, not staged ones.

git diff --staged compares the staging index against the commit that HEAD points to in the repository.
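The difference between the two commands shows up as soon as a change is staged. This is a sketch in a temporary repository with a placeholder identity; --name-only is used only to keep the output to file names.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "line 1" > foo.txt && git add foo.txt && git commit -q -m "Initial commit"

# Modify the tracked file without staging it
echo "line 2" >> foo.txt
unstaged_before=$(git diff --name-only)            # foo.txt
staged_before=$(git diff --staged --name-only)     # empty

# Stage the change: it moves from one diff to the other
git add foo.txt
unstaged_after=$(git diff --name-only)             # empty
staged_after=$(git diff --staged --name-only)      # foo.txt
```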

Deleting files that are already committed

If you need to remove a file that is already committed to the repository tree, there are basically two ways:

  1. Remove the file from the working tree with rm foo.txt (this is a change), then add the change to the staging index with git add foo.txt, and finally commit with a message: git commit -m "I removed file foo.txt"
  2. The simpler way is to remove the file using Git itself with git rm foo.txt. This automatically adds the change to the staging index, so next you just commit what you’ve done with git commit -m "I removed foo.txt using git!"

The same goes for moving and renaming files. If you use the Git way (git mv), the changes are automatically staged, but if you do these with the operating system, you have to stage the changes yourself.
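The Git way of deleting and renaming can be sketched like this. The file names old.txt and new.txt are hypothetical, and the repository is a temporary one with a placeholder identity.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "to be deleted" > foo.txt
echo "to be renamed" > old.txt
git add . && git commit -q -m "Initial commit"

git rm -q foo.txt          # deletes the file and stages the deletion
git mv old.txt new.txt     # renames the file and stages the rename
git status --short         # both changes are already staged

git commit -q -m "removed foo.txt and renamed old.txt"
tracked=$(git ls-files)    # files Git tracks after the commit
```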

A handy shortcut

There is a shortcut: git commit -a -m "a message". It stages changes and commits them to the repository in one step. But be aware that:

  1. it stages all modified and deleted files that Git already tracks
  2. it does not include files that are not tracked (new files)

So it only works for files that have been added to Git at least once.
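What the shortcut picks up can be checked with one tracked file that is modified, one tracked file that is deleted and one brand-new file. The file names are hypothetical, and the repository is a temporary one with a placeholder identity.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "v1" > modified.txt
echo "v1" > deleted.txt
git add . && git commit -q -m "Initial commit"

echo "v2" >> modified.txt       # change to a tracked file
rm deleted.txt                  # deletion of a tracked file
echo "new" > untracked.txt      # new file that was never added

git commit -q -a -m "a message"
remaining=$(git status --short) # whatever the shortcut did not commit
echo "$remaining"
```

Only the untracked file is left over after the commit; the modification and the deletion are both included.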

Viewing commits

To view all of the commits on the repository tree, use the git log command. This opens the less program and shows all commits as a list, newest first. To quit, just type the letter “q”.

A sample output:

$ git log
commit d07d3ebeab4e68b7cd9f014f1c9221d6ade59ca3
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:28:25 2017 +0300

    added foo.txt again!

commit cca36acfa2287388e2c8dd2f1c8e9eb1b6dd092c
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:27:59 2017 +0300

    removed foo.txt

commit 2fead6cdfb3be04152e88480127708f31d9df148
Author: osama abuomar <[email protected]>
Date:   Sat Oct 21 21:27:21 2017 +0300

    Initial commit

We get more options by providing different arguments like so:

  • git log --grep="Init"
  • git log --since=2012-06-12
  • git log --until=2012-06-12
  • git log -n 3 (shows last three commits)
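Two of these options can be tried on a small history that mirrors the sample output above. This is a sketch in a temporary repository with a placeholder identity; --format=%s is an extra option used here only to print the commit messages.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity

# Recreate the three commits from the sample output
echo "demo" > foo.txt && git add foo.txt && git commit -q -m 'Initial commit'
git rm -q foo.txt && git commit -q -m 'removed foo.txt'
echo "demo" > foo.txt && git add foo.txt && git commit -q -m 'added foo.txt again!'

last_two=$(git log -n 2 --format=%s)          # the two most recent messages, newest first
matching=$(git log --grep="Init" --format=%s) # only commits whose message contains "Init"
echo "$last_two"
echo "$matching"
```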

Undoing changes

Whether they are in the working directory, in the staging index or even committed to the repository, changes can be reverted. Let’s take a look at the following scenarios:

Undoing changes in the working directory

Case scenario: you accidentally deleted some text in a file (page.html), and you haven’t staged or committed yet. The solution is to get the saved version back from the repository with the command git checkout -- page.html.

Note: git checkout replaces what’s in the working directory with what’s in the repository. If you give it a file or folder name, it replaces that; if you give it a branch name, it switches to what’s in that branch. To distinguish a file or folder name from a branch name, we put -- before a file or folder. It is not required, but it is good practice, for example when you have a branch called foo and also a file called foo.

Unstaging a file

Case scenario: we changed a file and added it to the staging index for the next commit, but we changed our mind and need to remove it from the staging index (we still want the change in the working directory, we just don’t want it staged). This is used most often when you’re trying to put together a commit.

For example, if we modified the resources.html file and added it to the staging index, we can remove it from the staging index using git reset HEAD resources.html (more on reset later).
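The unstaging scenario can be sketched as follows, in a temporary repository with a placeholder identity and made-up file contents.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "<p>resources</p>" > resources.html
git add resources.html && git commit -q -m "Initial commit"

# Change the file and stage it, then change our mind
echo "<p>more</p>" >> resources.html
git add resources.html
git reset -q HEAD resources.html

staged=$(git diff --staged --name-only)   # empty: nothing is staged any more
unstaged=$(git diff --name-only)          # resources.html: the edit is still in the working directory
```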

Changing the last commit

Commits in a repo form a series, with each commit pointing to its parent commit. The hash is generated from the data in the commit (the changes, the message, the user, etc.), so changing a commit changes its hash and breaks the link from every commit that points to it. Git doesn’t want us to do that. However, we can change the most recent commit, because no other commit points to it (it is not a parent).

Changes to the last commit are made using the --amend option of git commit.

Case scenario: we made changes to the resources.html file, then staged and committed them. Now we want to add a change to that same commit (it serves the same goal), or we just want to change the message of the last commit, without creating a whole new commit for a small one-letter change. So we amend the last commit.

  1. changed file resources.html in working directory
  2. stage, commit (new commit created)
  3. another additional change
  4. stage, commit with amend, using git commit --amend -m "new message" (this replaces the last commit instead of adding a new one after it; the amended commit gets a new hash)
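The four steps can be sketched in a temporary repository. The identity, file contents and first commit messages are placeholders; git rev-list --count is used only to count the commits.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "<p>resources</p>" > resources.html
git add resources.html && git commit -q -m "Initial commit"

# Steps 1 and 2: change, stage, commit
echo "<p>first change</p>" >> resources.html
git add resources.html && git commit -q -m "updated resources"
before=$(git rev-parse HEAD)

# Steps 3 and 4: additional change, stage, amend the last commit
echo "<p>additional change</p>" >> resources.html
git add resources.html
git commit -q --amend -m "new message"
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)     # still two commits in total
subject=$(git log -1 --format=%s)      # message of the amended commit
```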

Retrieving old versions

Case scenario: we want the resources.html file from a past commit (not the most recent commit).

  1. git log to see all commits, choose the required commit to retrieve from.
  2. copy the hash/ID of the commit or part of it (the first 5 or 6 characters are usually enough)
  3. git checkout [Hash or part of it] -- resources.html (beware, this puts the old version in the staging index as well as in the working directory)
  4. now maybe do git diff --staged to see the difference; sometimes we like to do that just in case.
  5. git reset HEAD resources.html to unstage it, so that the old version remains only as an unstaged change in the working directory
  6. git checkout -- resources.html to cancel all of this and get the file from the last commit back into the working directory.

Note: undoing a whole commit is done with the revert command (see next).
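The steps above can be sketched as a short session. It assumes a temporary repository with a placeholder identity and made-up file contents, and it reads the old commit's hash with git rev-parse instead of copying it from git log.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "old version" > resources.html
git add resources.html && git commit -q -m "first version"
old=$(git rev-parse --short HEAD)          # abbreviated hash of the past commit

echo "a newer version" > resources.html
git commit -q -a -m "second version"

# Step 3: bring back the file as it was in the old commit
git checkout "$old" -- resources.html
content_old=$(cat resources.html)            # old version
staged=$(git diff --staged --name-only)      # resources.html (the old version is staged)

# Steps 5 and 6: unstage it, then restore the file from the last commit
git reset -q HEAD resources.html
git checkout -- resources.html
content_new=$(cat resources.html)            # a newer version
```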

Reverting a commit

git revert [Hash] undoes a commit completely by making a mirror commit: a new commit that reverses the changes of the one specified. Add the option -n if you want the reverting changes to be placed in the staging index only, so that you can make the commit yourself.

This assumes you reverted with a clean workspace. If you revert while you have other changes of your own, meaning your working tree is not clean, there may be conflicts, and if there are, you will have to merge (more on merging later).
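A revert on a clean working tree can be sketched like this. The repository is temporary, the identity and file contents are placeholders, and --no-edit is added so that Git uses its default revert message instead of opening an editor.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "good" > page.html
git add page.html && git commit -q -m "good version"

echo "bad" >> page.html
git commit -q -a -m "bad change"
bad=$(git rev-parse HEAD)                  # hash of the commit to undo

# Create the mirror commit that reverses the bad one
git revert --no-edit "$bad" > /dev/null

content=$(cat page.html)                   # back to the good version
count=$(git rev-list --count HEAD)         # three commits: good, bad, revert
```

The bad commit stays in the history; the revert adds a third commit on top of it.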

Using reset to undo many commits (dangerous)

This is done with the git reset [Target commit Hash] command.

Warning: before making resets, copy the output of git log (the hashes of all commits) to a text file outside of the project folder (the Git root directory). We might need to refer to them later, because git log stops showing the commits that come after the reset target. If, for example, we reset 3 commits back, the saved copy is how we still have the hashes of the latest 3 commits.

There are 3 options for git reset:

  1. Soft reset: git reset --soft [Hash]

This is the safest type of reset. The working tree and the staging index are kept the same (the actual files in the project folder stay the same).

But the repository is changed, so when we run git diff --staged we will be comparing against a commit from the past.

This type of reset moves the HEAD pointer to a past commit. The more recent commits are not deleted; they are still there, and we can reset (navigate) to any of them again as long as we know their hashes.

  2. Mixed reset (default): git reset --mixed [Hash] or git reset [Hash]

Changes what is in the repo and the staging index to what was in a past commit.

The working directory is kept the same, so it is safe: we don’t lose our recent project files. It’s not a big deal that it removes recently staged changes, because we can still stage what is in the working directory.

  3. Hard reset: git reset --hard [Hash]

Everything in the working directory and the staging index is thrown out and matched with the commit we are resetting to.

Note: after a hard reset, tracked files always match the target commit; untracked files are left alone.
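The three modes can be compared on a history of two commits. This is a sketch in a temporary repository with a placeholder identity and made-up file contents; the hashes are kept in shell variables, which plays the role of the text file recommended in the warning above.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "v1" > foo.txt
git add foo.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo "v2 with more text" > foo.txt
git commit -q -a -m "second"
second=$(git rev-parse HEAD)

# Soft: only HEAD moves, so the v2 content now shows up as a staged change
git reset -q --soft "$first"
soft_staged=$(git diff --staged --name-only)      # foo.txt
git reset -q --soft "$second"                     # the newer commit is still reachable by its hash

# Mixed: HEAD and the staging index move, the working directory keeps v2
git reset -q --mixed "$first"
mixed_staged=$(git diff --staged --name-only)     # empty
mixed_unstaged=$(git diff --name-only)            # foo.txt

# Hard: the working directory is overwritten as well
git reset -q --hard "$first"
hard_content=$(cat foo.txt)                       # v1
```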

Removing untracked files

To remove all files that are untracked by Git, meaning files that were created in the working tree but never staged or committed, run git clean -n to show which files would be deleted, and git clean -f to actually delete them.
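A dry run followed by the real clean-up can be sketched like this. The file names tracked.txt and notes.tmp are hypothetical, and the repository is a temporary one with a placeholder identity.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "keep me" > tracked.txt
git add tracked.txt && git commit -q -m "Initial commit"

echo "scratch" > notes.tmp       # untracked file

preview=$(git clean -n)          # dry run: only reports what would be removed
still_there=$(ls notes.tmp)      # the file still exists after the dry run

git clean -q -f                  # actually deletes the untracked file
```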

Ignoring files

By default, any new files you add to the repo are untracked by Git until you git add them for the first time. A lot of the time, we need to have files in a repository that we don’t want to be tracked.

What to ignore

The .gitignore file contains file names and wildcard (glob) patterns for the files and directories that we don’t want Git to track (ignored). We usually ignore files that are not part of the main project, such as directories downloaded by development tools (keeping only a meta file as reference), like node packages installed with npm, and also project build artifacts. Only the needed files are tracked by Git.

Here are more examples of what to ignore:

  1. compiled source code
  2. packages and compressed files (.zip .gz …etc.)
  3. logs and databases
  4. operating system generated files
  5. user uploaded assets (images, pdfs, videos)

Writing a .gitignore file

The .gitignore file is created in the Git root directory, at the same level as the .git folder. It is easier to create the .gitignore with the command line.

  1. We use simple wildcard (glob) patterns, not full regular expressions, to write .gitignore files, like: * ? [aeiou] [0-9].
  2. ! negates a pattern (as in “don’t ignore this part”). For example, this ignores all html files except index.html:

*.html
!index.html

  3. To ignore all files in a directory, use a trailing slash at the end.

Source/media/

  4. Comment lines in a .gitignore file start with #. Blank lines are ignored.

You can find .gitignore files that are already written for some common projects from here. They are called git ignore templates.
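The rules above can be tried out in a scratch repository. In this sketch the file names about.html and logo.png are hypothetical, and git status --short is used to list the files Git still sees as candidates for tracking.

```shell
cd "$(mktemp -d)" && git init -q

# Write a .gitignore that uses a comment, a wildcard, a negation and a directory rule
printf '%s\n' \
  '# ignore html files except the index' \
  '*.html' \
  '!index.html' \
  '' \
  'Source/media/' > .gitignore

mkdir -p Source/media
touch about.html index.html Source/media/logo.png

visible=$(git status --short)    # untracked files that are NOT ignored
echo "$visible"
```

Only .gitignore itself and index.html are listed; about.html and everything under Source/media/ are ignored.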

Globally ignoring files

We can also ignore files in all repositories at the user level. This is similar to configuring Git at different levels. We do this with a global ignore file, which can be stored anywhere and named anything you want. For example:

git config --global core.excludesfile ~/.gitignore_global

For instance, you can use this to ignore the Recycle Bin folder on Windows systems.

Ignoring tracked files

So far, we have only ignored files that were never tracked.

We cannot ignore files that are already tracked (committed) unless we untrack them first; to be ignored, files must be untracked.

To untrack a file, we can use git rm file.txt, but this also deletes the file from the working directory.

What if we want to keep the file and just untrack it? git rm --cached file.txt removes it only from the staging index, which makes it untracked once the change is committed. The file stays in the working directory, and earlier commits in the repository still contain it.
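Untracking a file and then ignoring it can be sketched as follows, in a temporary repository with a placeholder identity and made-up file contents.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity
echo "local settings" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# Untrack the file but keep it on disk, then ignore it from now on
git rm -q --cached file.txt
echo "file.txt" > .gitignore
git add .gitignore
git commit -q -m "stop tracking file.txt"

tracked=$(git ls-files)          # only .gitignore is tracked now
status=$(git status --short)     # empty: file.txt is ignored, not reported as untracked
```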

Tracking empty directories

Git does not track empty directories: a directory with no files in it is never recorded in a commit, so it will be missing when the project is cloned or checked out elsewhere. If you want to keep an empty directory for some reason, create an empty file inside it and call it .gitkeep (the name is only a convention). This keeps the folder in the repository.
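The effect of the placeholder file can be seen in a scratch repository. The directory name logs is hypothetical, and the identity is a placeholder.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "John Doe"            # placeholder identity
git config user.email "john@example.com"   # placeholder identity

mkdir logs
before=$(git status --short)     # empty: Git does not report the empty directory at all

touch logs/.gitkeep              # empty placeholder file
git add logs/.gitkeep
git commit -q -m "keep the logs directory"

tracked=$(git ls-files)          # logs/.gitkeep
```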