Git 101 - how can we automate in Ops ?

this course is taken from “School of SRE” repository

Ansible 101 tutorial coverst the following topics:

Prerequisites

What to expect from this course

What is not covered under this course

Git Basics

Creating a Git Repo

Tracking a File

More About a Commit

Adding More Changes

Are commits really linked?

The Version Control part of Git

Reference

References and The Magic

Little Adventure

Working With Branches

Merges

Short quiz to repeat the topic

Prerequisites

Have Git installed https://git-scm.com/downloads
Know the basisc of git usage.
For some basics opening the page (on another tab) The Official Git Docs

What to expect from this course

As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today like SVN, Mercurial, etc, Git perhaps is the most used one and this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts known by you with details covering what is happening under the hood as you execute various git commands. So that next time you run a git command, you will be able to press enter more confidently!

What is not covered under this course

Advanced usage and specifics of internal implementation details of Git.

Git Basics

Though you might be aware already, let’s revisit why we need a version control system. As the project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains the history of the changes happening with the codebase.

Creating a Git Repo

Any folder can be converted into a git repository. After executing the following command, we will see a .git folder within the folder, which makes our folder a git repository. All the magic that git does, .git folder is the enabler for the same.

# creating an empty folder and changing current dir to it
$ cd /tmp
$ mkdir akf
$ cd akf/

# initialize a git repo
$ git init
Initialized empty Git repository in /private/tmp/akf/.git/

As the output says, an empty git repo has been initialized in our folder. Let’s take a look at what is there.

$ ls .git/
HEAD        config      description hooks       info        objects     refs

There are a bunch of folders and files in the .git folder. As I said, all these enables git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository.

Tracking a File

Now as you might already know, let us create a new file in our repo (we will refer to the folder as repo now.) And see git status

$ echo "I am file 1" > file1.txt
$ git status
On branch master

No commits yet

Untracked files:
 (use "git add <file>..." to include in what will be committed)

       file1.txt

nothing added to commit but untracked files present (use "git add" to track)

The current git status says No commits yet and there is one untracked file. Since we just created the file, git is not tracking that file. We explicitly need to ask git to track files and folders. (also checkout gitignore, in short .gitignore file excludes files) And how we do that is via git add command as suggested in the above output. Then we go ahead and create a commit.

$ git add file1.txt
$ git status
On branch master

No commits yet

Changes to be committed:
 (use "git rm --cached <file>..." to unstage)

       new file:   file1.txt

$ git commit -m "adding file 1"
[master (root-commit) df2fb7a] adding file 1
1 file changed, 1 insertion(+)
create mode 100644 file1.txt

Here you will be asked for setting the user.email & user.name

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

after setting these values the commit will work.

Notice how after adding the file, git status says Changes to be committed:. What it means is whatever is listed there, will be included in the next commit. Then we go ahead and create a commit, with an attached messaged via -m.

More About a Commit

Commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of repo (the folder) is taken and saved. Each commit has a unique ID. (df2fb7a for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the .git folder. This is where all this snapshot or versions are stored in an efficient manner.

Adding More Changes

Let us create one more file and commit the change. It would look the same as the previous commit we made.

$ echo "I am file 2" > file2.txt
$ git add file2.txt
$ git commit -m "adding file 2"
[master 7f3b00e] adding file 2
1 file changed, 1 insertion(+)
create mode 100644 file2.txt

A new commit with ID 7f3b00e has been created. You can issue git status at any time to see the state of the repository.

   **IMPORTANT: Note that commit IDs are long string (SHA) but we can refer to a commit by its initial few (8 or more) characters too. We will interchangeably using shorter and longer commit IDs.**

Now that we have two commits, let’s visualize them:

$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1

git log, as the name suggests, prints the log of all the git commits. Here you see two additional arguments, --oneline prints the shorter version of the log, ie: the commit message only and not the person who made the commit and when. --graph prints it in graph format.

Now at this moment the commits might look like just one in each line but all commits are stored as a tree like data structure internally by git. That means there can be two or more children commits of a given commit. And not just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history:

   df2fb7a ===> 7f3b00e

Are commits really linked?

As I just said, the two commits we just made are linked via tree like data structure and we saw how they are linked. But let’s actually verify it. Everything in git is an object. Newly created files are stored as an object. Changes to file are stored as an objects and even commits are objects. To view contents of an object we can use the following command with the object’s ID. We will take a look at the contents of the second commit

$ git cat-file -p 7f3b00e
tree ebf3af44d253e5328340026e45a9fa9ae3ea1982
parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a
author Author <author@mail.com> 1603273316 -0700
committer Author <author@mail.com> 1603273316 -0700

adding file 2

Take a note of parent attribute in the above output. It points to the commit id of the first commit we made. So this proves that they are linked! Additionally you can see the second commit’s message in this object. As I said all this magic is enabled by .git folder and the object to which we are looking at also is in that folder.

$ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9
.git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9

It is stored in .git/objects/ folder. All the files and changes to them as well are stored in this folder.

The Version Control part of Git

We already can see two commits (versions) in our git log. One thing a version control tool gives you is ability to browse back and forth in history. For example: some of your users are running an old version of code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit

# Current contents, two files present
$ ls
file1.txt file2.txt

# checking out to (an older) commit
$ git checkout df2fb7a
Note: checking out 'df2fb7a'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

 git checkout -b <new-branch-name>

HEAD is now at df2fb7a adding file 1

# checking contents, can verify it has old contents
$ ls
file1.txt

So this is how we would get access to old versions/snapshots. All we need is a reference to that snapshot. Upon executing git checkout ..., what git does for you is use the .git folder, see what was the state of things (files and folders) at that version/reference and replace the contents of current directory with those contents. The then-existing content will no longer be present in the local dir (repo) but we can and will still get access to them because they are tracked via git commit and .git folder has them stored/tracked.

Reference

I mention in the previous section that we need a reference to the version. By default, git repo is made of tree of commits. And each commit has a unique IDs. But the unique ID is not the only thing we can reference commits via. There are multiple ways to reference commits. For example: HEAD is a reference to current commit. Whatever commit your repo is checked out at, HEAD will point to that. HEAD~1 is reference to previous commit. So while checking out previous version in section above, we could have done git checkout HEAD~1.

Similarly, master is also a reference (to a branch). Since git uses tree like structure to store commits, there of course will be branches. And the default branch is called master. Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out to the previous commit in out repo, master still points to the latest commit. And we can get back to the latest version by checkout at master reference

$ git checkout master
Previous HEAD position was df2fb7a adding file 1
Switched to branch 'master'

# now we will see latest code, with two files
$ ls
file1.txt file2.txt

Note, instead of master in above command, we could have used commit’s ID as well.

References and The Magic

Let’s look at the state of things. Two commits, master and HEAD references are pointing to the latest commit

$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1

The magic? Let’s examine these files:

$ cat .git/refs/heads/master
7f3b00eaa957815884198e2fdfec29361108d6a9

Viola! Where master is pointing to is stored in a file. Whenever git needs to know where master reference is pointing to, or if git needs to update where master points, it just needs to update the file above. So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit’s ID.

Similary, for HEAD reference:

$ cat .git/HEAD
ref: refs/heads/master

We can see HEAD is pointing to a reference called refs/heads/master. So HEAD will point where ever the master points.

Little Adventure

We discussed how git will update the files as we execute commands. But let’s try to do it ourselves, by hand, and see what happens.

$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1

Now let’s change master to point to the previous/first commit.

$ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master
$ git log --oneline --graph
* df2fb7a (HEAD -> master) adding file 1

# RESETTING TO ORIGINAL
$ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master
$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1

We just edited the master reference file and now we can see only the first commit in git log. Undoing the change to the file brings the state back to original. Not so much of magic, is it?

Working With Branches

Coming back to our local repo which has two commits. So far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may have a need to work on two different features in parallel in the same repo. Now one option here could be making a new folder/repo with the same code and use that for another feature development. But there’s a better way. Use branches. Since git follows tree like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created and branches can also be merged.

Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. Checking out, as we discussed earlier, would simply mean replacing contents of the directory (repo) with the snapshot at the checked out version.

Let’s create a branch and see how it looks like:

$ git branch b1
$ git log --oneline --graph
* 7f3b00e (HEAD -> master, b1) adding file 2
* df2fb7a adding file 1

We create a branch called b1. Git log tells us that b1 also points to the last commit (7f3b00e) but the HEAD is still pointing to master. If you remember, HEAD points to the commit/reference wherever you are checkout to. So if we checkout to b1, HEAD should point to that. Let’s confirm:

$ git checkout b1
Switched to branch 'b1'
$ git log --oneline --graph
* 7f3b00e (HEAD -> b1, master) adding file 2
* df2fb7a adding file 1

b1 still points to the same commit but HEAD now points to b1. Since we create a branch at commit 7f3b00e, there will be two lines of histories starting this commit. Depending on which branch you are checked out on, the line of history will progress.

At this moment, we are checked out on branch b1, so making a new commit will advance branch reference b1 to that commit and current b1 commit will become its parent. Let’s do that.

# Creating a file and making a commit
$ echo "I am a file in b1 branch" > b1.txt
$ git add b1.txt
$ git commit -m "adding b1 file"
[b1 872a38f] adding b1 file
1 file changed, 1 insertion(+)
create mode 100644 b1.txt

# The new line of history
$ git log --oneline --graph
* 872a38f (HEAD -> b1) adding b1 file
* 7f3b00e (master) adding file 2
* df2fb7a adding file 1
$

Do note that master is still pointing to the old commit it was pointing to. We can now checkout to master branch and make commits there. This will result in another line of history starting from commit 7f3b00e.

# checkout to master branch
$ git checkout master
Switched to branch 'master'

# Creating a new commit on master branch
$ echo "new file in master branch" > master.txt
$ git add master.txt
$ git commit -m "adding master.txt file"
[master 60dc441] adding master.txt file
1 file changed, 1 insertion(+)
create mode 100644 master.txt

# The history line
$ git log --oneline --graph
* 60dc441 (HEAD -> master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1

Notice how branch b1 is not visible here since we are on the master. Let’s try to visualize both to get the whole picture:

$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1

Above tree structure should make things clear. Notice a clear branch/fork on commit 7f3b00e. This is how we create branches. Now they both are two separate lines of history on which feature development can be done independently.

To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies contents of our repo.

Merges

Now say the feature you were working on branch b1 is complete and you need to merge it on master branch, where all the final version of code goes. So first you will checkout to branch master and then you pull the latest code from upstream (eg: GitHub). Then you need to merge your code from b1 into master. There could be two ways this can be done.

Here is the current history:

$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1

Option 1: Directly merge the branch. Merging the branch b1 into master will result in a new merge commit. This will merge changes from two different lines of history and create a new commit of the result.

$ git merge b1
Merge made by the 'recursive' strategy.
b1.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 b1.txt
$ git log --oneline --graph --all
*   8fc28f9 (HEAD -> master) Merge branch 'b1'
|\
| * 872a38f (b1) adding b1 file
* | 60dc441 adding master.txt file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1

You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, this result will end-up with a lot of merge commits. Which looks ugly compared to a single line of history of development. So let’s look at an alternative approach

First let’s reset our last merge and go to the previous state.

$ git reset --hard 60dc441
HEAD is now at 60dc441 adding master.txt file
$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1

Option 2: Rebase. Now, instead of merging two branches which has a similar base (commit: 7f3b00e), let us rebase branch b1 on to current master. What this means is take branch b1 (from commit 7f3b00e to commit 872a38f) and rebase (put them on top of) master (60dc441).

# Switch to b1
$ git checkout b1
Switched to branch 'b1'

# Rebase (b1 which is current branch) on master
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: adding b1 file

# The result
$ git log --oneline --graph --all
* 5372c8f (HEAD -> b1) adding b1 file
* 60dc441 (master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1

You can see b1 which had 1 commit. That commit’s parent was 7f3b00e. But since we rebase it on master (60dc441). That becomes the parent now. As a side effect, you also see it has become a single line of history. Now if we were to merge b1 into master, it would simply mean change master to point to 5372c8f which is b1. Let’s try it:

# checkout to master since we want to merge code into master
$ git checkout master
Switched to branch 'master'

# the current history, where b1 is based on master
$ git log --oneline --graph --all
* 5372c8f (b1) adding b1 file
* 60dc441 (HEAD -> master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1


# Performing the merge, notice the "fast-forward" message
$ git merge b1
Updating 60dc441..5372c8f
Fast-forward
b1.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 b1.txt

# The Result
$ git log --oneline --graph --all
* 5372c8f (HEAD -> master, b1) adding b1 file
* 60dc441 adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1

Now you see both b1 and master are pointing to the same commit. Your code has been merged to the master branch and it can be pushed. Also we have clean line of history! :D

Tutorials main page

Short quiz to repeat the topic

After selecting the correct options, press the Gonder button. You can see the results true and in false colours.

(It is not mentioned on the course) Who is the creator of git?

( ) Elon Musk
( ) Steve Jobs
(x) Linus Trovalds
( ) Bill Gates

What does “$ git add .” will add to the changes?

( ) add the hidden files
(x) adds only the changed files
( ) adds the current directory
( ) adds / directory and all files

which command is used to create a new git repo with the name “my-git-repo”?

( ) mkdir my-git-repo
( ) git create my-git-repo
( ) git my-git-repo
(x) git init

you need to exclude some files like “.venv” or “.env-database” to be excluded from your changes. what is the command ?

( ) git dont-add “.venv,.env-database”
( ) git add –exclude .venv –exclude .env-database
(x) create a file .gitignore and add the exluding files inside
( ) git commit -x “.venv,.env-database”