Getting Started with GitPython

A gentle introduction with a brief code example to jump into GitPython.

GitPython is one of the most popular Python libraries for interacting with Git. It offers two ways to work: pure Python functions, or an implementation that wraps git commands.

Installation

To install GitPython, open a terminal and run:

$ pip install gitpython

Creating New Project

The first thing every git workflow needs is a repository, so let's create one.

Create an awesome-project directory, then pass its path to the Repo.init() function:

>>> import git
>>> # Initialize a git repository
>>> repo = git.Repo.init("~/awesome-project")
>>> repo
<git.Repo "/home/khawarizmi/awesome-project/.git">

You can skip creating the directory manually by setting the mkdir argument to True:

>>> repo = git.Repo.init("~/great-project/", mkdir=True)

Importing Existing Project

If you already have a git repository, you can create a Repo instance for it by calling the Repo constructor (Repo.__init__()):

>>> # Create a Repo instance
>>> repo = git.Repo("~/old-project/")

If you can't pass the root directory of your project, you can pass any of its subdirectories together with search_parent_directories=True, and GitPython will find the root directory for you:

>>> repo = git.Repo("~/awesome-project/docs", search_parent_directories=True)
>>> repo
<git.Repo "/home/khawarizmi/awesome-project/.git">

Making changes

In the introduction, we learned that GitPython offers both pure Python functions and a git command implementation. The latter is faster but more resource-intensive. In this step, I will use both so you can see how to do things each way.

Using Git Command Implementation

Our new repository doesn't contain anything yet:

>>> repo.git.status()
'On branch master\n\nNo commits yet\n\nnothing to commit (create/copy files and use "git add" to track)'

So let's add a file named hello.py. If we check the repository status now, GitPython tells us that we have a new, untracked file:

>>> status = repo.git.status()
>>> print(status)
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	hello.py

nothing added to commit but untracked files are present (use "git add" to track)

Now let's add it to the repository index using repo.git.add():

>>> repo.git.add("hello.py")

If we check the repository status again, GitPython tells us that hello.py has been added to the repository index:

>>> status = repo.git.status()
>>> print(status)
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   hello.py

There is nothing more to add to hello.py, so it's time to commit our changes:

>>> repo.git.commit(m="first commit")
'[master (root-commit) 0e0f6c2] first commit\n 1 file changed, 1 insertion(+)\n create mode 100644 hello.py'

Using Pure Python Function

Now let's repeat the steps in a fresh, empty repository. As far as I know, GitPython's pure Python API has no direct equivalent of git status, but we can combine repo.untracked_files with index.diff():

>>> # list of untracked files
>>> repo.untracked_files
[]

>>> # diff between the index and the working tree
>>> repo.index.diff(None)
[]

>>> # diff between the index and the commit’s tree
>>> repo.index.diff(repo.head.commit)

The last command raises an error because there are no commits yet.

After creating hello.py, we can check for untracked files:

>>> repo.untracked_files
['hello.py']

Now let's add it to the repository index:

>>> repo.index.add('hello.py')

>>> # as always, you can inspect the return value
>>> add = repo.index.add('hello.py')
>>> add
[(100644, 8cde7829c178ede96040e03f17c416d15bdacd01, 0, hello.py)]

Let's check what we have added:

>>> len(repo.untracked_files)
0

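>>> # get staged files
>>> # (repo.index.diff("HEAD") needs HEAD to point at a commit;
>>> # before the first commit, inspect the index entries instead)
>>> staged = repo.index.entries
>>> len(staged)
1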

We have staged one file and there are no untracked files left, so the next step is to create a commit:

>>> repo.index.commit("first commit")
<git.Commit "b645f6e5584dce8dadeb268f731d7eb99ab01422">

Exploring The History

To see the history (log) of your project, you can use repo.git.log():

>>> log = repo.git.log()
>>> print(log)
commit b645f6e5584dce8dadeb268f731d7eb99ab01422
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 09:32:19 2020 +0700

    first commit

A roughly equivalent pure Python approach is to read the reflog of the master branch:

>>> master = repo.heads.master
>>> log = master.log()
>>> log[0]
0000000000000000000000000000000000000000 b645f6e5584dce8dadeb268f731d7eb99ab01422 azzamsa <azzamsa@example.com> 1581820339 +0700	commit (initial): first commit

Info: whenever you're unsure which attributes an object has, use the `dir()` function.

Let's check which attributes a log entry has:

>>> dir(log[0])
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_re_hexsha_only', 'actor', 'count', 'format', 'from_line', 'index', 'message', 'new', 'newhexsha', 'oldhexsha', 'time']

Then use them in our code:

>>> log[0].message
'commit (initial): first commit'
>>> log[0].newhexsha
'b645f6e5584dce8dadeb268f731d7eb99ab01422'

+++

Besides log objects, you can also use commit objects to explore your history:

>>> commits = list(repo.iter_commits("master", max_count=5))

>>> commits[0].author
<git.Actor "azzamsa <azzamsa@example.com>">

>>> commits[0].committed_datetime
datetime.datetime(2020, 2, 16, 9, 32, 19, tzinfo=<git.objects.util.tzoffset object at 0x7f8463f32cf8>)

>>> commits[0].hexsha
'b645f6e5584dce8dadeb268f731d7eb99ab01422'

>>> commits[0].message
'first commit'

Managing branches

To list your branches you can use:

>>> repo.branches
[<git.Head "refs/heads/master">, <git.Head "refs/heads/second-branch">, <git.Head "refs/heads/third">]

>>> # or
>>> repo.heads
[<git.Head "refs/heads/master">, <git.Head "refs/heads/second-branch">, <git.Head "refs/heads/third">]

To see your active branch:

>>> repo.active_branch
<git.Head "refs/heads/master">

Then you can check out your branch using:

>>> repo.heads.third.checkout()
<git.Head "refs/heads/third">

>>> # or using git command implementation
>>> repo.git.checkout("third")
''

The caveat is that you can't use attribute access for a branch whose name contains a dash: Python parses repo.heads.second-branch.checkout() as a subtraction. In this situation you can fall back to the git command implementation: repo.git.checkout("second-branch").

Advanced Usage

Unwrapped git functionality

If you find that GitPython is missing some git functionality, you can always fall back to its git command implementation. First, work out what the command and its parameters look like in plain git; then pass those same parameters to the matching repo.git method. Some examples:

Git log --oneline

$ git log --oneline b645f6e..86f3c62
86f3c62 (HEAD -> master) third commit
6240bd6 (third, second-branch) second commit

>>> logs = repo.git.log("--oneline", "b645f6e..86f3c62")
>>> logs
'86f3c62 third commit\n6240bd6 second commit'

>>> logs.splitlines()
['86f3c62 third commit', '6240bd6 second commit']
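If you need the hash and the subject separately, each line can be split at its first space. parse_oneline() below is a hypothetical helper, not part of GitPython. It assumes decorations are off, as in the output above; if your git config turns them on, passing "--no-decorate" to repo.git.log() should keep the format the same:

```python
def parse_oneline(log_output):
    """Hypothetical helper: turn `git log --oneline` output into (sha, subject) pairs."""
    pairs = []
    for line in log_output.splitlines():
        if not line.strip():
            continue  # skip blank lines, if any
        # The abbreviated hash ends at the first space; the rest is the subject.
        sha, _, subject = line.partition(" ")
        pairs.append((sha, subject))
    return pairs


logs = "86f3c62 third commit\n6240bd6 second commit"  # output from the example above
print(parse_oneline(logs))
# [('86f3c62', 'third commit'), ('6240bd6', 'second commit')]
```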

Git show current content

$ git show 86f3c62:hello.py
print("hello world")
print("")
print("")

>>> content = repo.git.show("86f3c62:hello.py")
>>> print(content)
print("hello world")
print("")
print("")

Getting diffs:

$ git show 6240bd6 hello.py
commit 6240bd6a9111df3aa624f781ac8bad2cea551f8e (third, second-branch)
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 10:13:14 2020 +0700

    second commit

diff --git a/hello.py b/hello.py
index 8cde782..057280e 100644
--- a/hello.py
+++ b/hello.py
@@ -1 +1,2 @@
 print("hello world")
+print("")

>>> diff = repo.git.show("6240bd6", "hello.py")
>>> print(diff)
commit 6240bd6a9111df3aa624f781ac8bad2cea551f8e
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 10:13:14 2020 +0700

    second commit

diff --git a/hello.py b/hello.py
index 8cde782..057280e 100644
--- a/hello.py
+++ b/hello.py
@@ -1 +1,2 @@
 print("hello world")
+print("")

Git show name only

>>> repo.git.show("--pretty=", "--name-only", "86f3c62")
'hello.py'

Config writer

You can use config_writer() to change repository configuration.

For example, to change the committer name and email:

repo.config_writer().set_value("user", "name", "khwārizmī").release()
repo.config_writer().set_value("user", "email", "khwarizmi@example.com").release()

Project Examples

Here are some useful functions that I extracted from my previous project, lupv:

# GPL-3.0
# These are methods; self._student_repo is assumed to be a git.Repo.
def read_file(self, filename, sha):
    """Get the content of `filename` as it was at commit `sha`."""
    current_file = self._student_repo.git.show("{}:{}".format(sha, filename))
    return current_file

def read_diff(self, filename, sha):
    """Get the diff that commit `sha` introduced in `filename`."""
    diff = self._student_repo.git.show(sha, filename)
    return diff

def is_exists(self, filename, sha):
    """Check whether `filename` is among the files changed in commit `sha`."""
    files = self._student_repo.git.show("--pretty=", "--name-only", sha)
    # Compare whole lines so "hello.py" doesn't match "other_hello.py".
    return filename in files.splitlines()

You can see another useful gist in my StackOverflow answers:

Additional Resources

We haven't covered everything here. You can dive deeper by reading the GitPython documentation. My favorite piece of documentation is the test file; it covers many of the basics you need to get started.


If you liked this article, please support my work. It will definitely be rewarding and motivating. Thanks for the support!

Notes

  • Checking out a branch whose name contains a dash will raise an error in GitPython. This situation has been acknowledged by the maintainer.
  • The steps in this tutorial are adapted from the git tutorial.