GitPython is one of the most popular Python libraries for interacting with Git. You can work either through its pure-Python functions or through its git command implementation.

Installation #

To install GitPython, open a terminal and run:

$ pip install gitpython

Creating New Project #

Every git workflow starts with a repository, so let’s create one.

Create an awesome-project directory, then pass its path to Repo.init():

>>> import git
>>> # Initialize a git repository
>>> repo = git.Repo.init("~/awesome-project")
>>> repo
<git.Repo "/home/khawarizmi/awesome-project/.git">

You don’t have to create the directory by hand. The mkdir argument (True by default) tells Repo.init() to create the directory when it doesn’t exist yet:

>>> repo = git.Repo.init("~/great-project/", mkdir=True)

Importing Existing Project #

If you already have a git repository, you can create a Repo instance by passing its path to the Repo constructor:

>>> # Create a Repo instance
>>> repo = git.Repo("~/old-project/")

If you don’t know the root directory of your project, you can pass any sub-directory together with search_parent_directories=True. GitPython will find the root directory for you.

>>> repo = git.Repo("~/awesome-project/docs", search_parent_directories=True)
>>> repo
<git.Repo "/home/khawarizmi/awesome-project/.git">

Making changes #

As mentioned in the introduction, GitPython offers both pure-Python functions and a git command implementation. The latter is faster but more resource-intensive. In this section, I will use both so you can see how to do the same things each way.

Using Git Command Implementation #

Our new repository is still empty:

>>> repo.git.status()
'On branch master\n\nNo commits yet\n\nnothing to commit (create/copy files and use "git add" to track)'

Let’s start by adding a file named hello.py to the project directory. If we check the repository status now, GitPython tells us that we have a new, untracked file:

>>> status = repo.git.status()
>>> print(status)
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        hello.py

nothing added to commit but untracked files present (use "git add" to track)

Now let’s add it to the repository index using repo.git.add():

>>> repo.git.add("hello.py")

If we check the repository status again, GitPython tells us that hello.py has been added to the index:

>>> status = repo.git.status()
>>> print(status)
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   hello.py

We are sure there is nothing more to add to hello.py, so it’s time to commit our changes:

>>> repo.git.commit(m="first commit")
'[master (root-commit) 0e0f6c2] first commit\n 1 file changed, 1 insertion(+)\n create mode 100644 hello.py'
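
The status text above is written for people, which makes it awkward to process in code. One option is to ask git for its machine-readable format with repo.git.status("--porcelain") and split the result yourself. The parser below is a rough sketch: the function name parse_porcelain and the file notes.txt are made up for illustration, and renamed files (which git prints as "old -> new") are not handled.

```python
def parse_porcelain(status_text):
    """Turn `git status --porcelain` output into (code, path) pairs."""
    entries = []
    for line in status_text.splitlines():
        if not line:
            continue
        # Two status characters, one space, then the path.
        code, path = line[:2], line[3:]
        entries.append((code, path))
    return entries


# A string like the one repo.git.status("--porcelain") returns:
# "??" marks an untracked file, "A " a file newly added to the index.
sample = "?? hello.py\nA  notes.txt"
entries = parse_porcelain(sample)
```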

Using Pure Python Function #

As far as I know, GitPython’s pure-Python API has no direct equivalent of git status. But we can get the same information from repo.untracked_files and index.diff():

>>> # list of untracked files
>>> repo.untracked_files
[]

>>> # diff between the index and the working tree
>>> repo.index.diff(None)
[]

>>> # diff between the index and the commit’s tree
>>> repo.index.diff(repo.head.commit)

The last command raises an error because we don’t have any commits yet.
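
The three checks above can be folded into one helper that also copes with a repository that has no commits yet. This is only a sketch: summarize_status is a name I made up, and repo is assumed to be a git.Repo. It relies on repo.head.is_valid() to detect the missing first commit, and falls back to listing everything in the index in that case.

```python
def summarize_status(repo):
    """Collect untracked, unstaged and staged paths from a git.Repo."""
    summary = {
        "untracked": list(repo.untracked_files),
        # Index vs. working tree: modified but not yet staged.
        "unstaged": [diff.a_path for diff in repo.index.diff(None)],
    }
    if repo.head.is_valid():
        # Index vs. last commit: staged changes.
        summary["staged"] = [diff.a_path for diff in repo.index.diff("HEAD")]
    else:
        # No commit yet, so everything in the index counts as staged.
        # The keys of index.entries are (path, stage) tuples.
        summary["staged"] = [path for path, _stage in repo.index.entries]
    return summary
```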

After adding hello.py we can check for any untracked files:

>>> repo.untracked_files
['hello.py']

Now let’s add it to our repository index. Passing a list of paths works across GitPython versions:

>>> repo.index.add(['hello.py'])

>>> # as always, you can inspect the return value
>>> add = repo.index.add(['hello.py'])
>>> add
[(100644, 8cde7829c178ede96040e03f17c416d15bdacd01, 0, hello.py)]

Let’s check what we have added:

>>> len(repo.untracked_files)
0

>>> # get staged files (diffing against "HEAD" needs at least one existing commit)
>>> staged = repo.index.diff("HEAD")
>>> len(staged)
1

We have added one file, and no untracked files are left. The next step is to create a commit:

>>> repo.index.commit("first commit")
<git.Commit "b645f6e5584dce8dadeb268f731d7eb99ab01422">
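
Adding and committing usually go together, so here is a sketch of a helper that does both through the pure-Python index API. The name stage_and_commit is hypothetical and repo is assumed to be a git.Repo. It skips the commit when nothing ended up staged, which I find handy in scripts.

```python
def stage_and_commit(repo, paths, message):
    """Stage `paths` and commit them; return the new commit or None."""
    if isinstance(paths, str):
        paths = [paths]  # always hand index.add() a list of paths
    repo.index.add(paths)

    # Once a first commit exists, avoid creating an empty commit.
    if repo.head.is_valid() and not repo.index.diff("HEAD"):
        return None
    return repo.index.commit(message)
```

For example, stage_and_commit(repo, "hello.py", "first commit") returns the new git.Commit object.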

Exploring The History #

To see the history (log) of your project, you can use repo.git.log():

>>> log = repo.git.log()
>>> print(log)
commit b645f6e5584dce8dadeb268f731d7eb99ab01422
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 09:32:19 2020 +0700

    first commit

The pure-Python equivalent reads the reflog of the master branch:

>>> master = repo.heads.master
>>> log = master.log()
>>> log[0]
0000000000000000000000000000000000000000 b645f6e5584dce8dadeb268f731d7eb99ab01422 azzamsa <azzamsa@example.com> 1581820339 +0700 commit (initial): first commit

Info: Whenever you are unsure which interesting attributes an object has, use the dir() function.

Let’s check what interesting attributes a log entry has:

>>> dir(log[0])
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_re_hexsha_only', 'actor', 'count', 'format', 'from_line', 'index', 'message', 'new', 'newhexsha', 'oldhexsha', 'time']

Then use them in our code:

>>> log[0].message
'commit (initial): first commit'
>>> log[0].newhexsha
'b645f6e5584dce8dadeb268f731d7eb99ab01422'

Besides using log objects to see your history, you can use commit objects:

>>> commits = list(repo.iter_commits("master", max_count=5))

>>> commits[0].author
<git.Actor "azzamsa <azzamsa@example.com>">

>>> commits[0].committed_datetime
datetime.datetime(2020, 2, 16, 9, 32, 19, tzinfo=<git.objects.util.tzoffset object at 0x7f8463f32cf8>)

>>> commits[0].hexsha
'b645f6e5584dce8dadeb268f731d7eb99ab01422'

>>> commits[0].message
'first commit'
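
To turn those commit attributes into something you can print or serialize, you could collect them into plain dictionaries. This is just a sketch: recent_commits is a made-up name, repo is assumed to be a git.Repo, and commit.summary is the first line of the commit message.

```python
def recent_commits(repo, rev="master", limit=5):
    """Return the newest commits on `rev` as a list of plain dicts."""
    rows = []
    for commit in repo.iter_commits(rev, max_count=limit):
        rows.append({
            "sha": commit.hexsha[:7],  # abbreviated hash
            "author": commit.author.name,
            "date": commit.committed_datetime.isoformat(),
            "summary": commit.summary,  # first line of the message
        })
    return rows
```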

Managing branches #

To list your branches you can use:

>>> repo.branches
[<git.Head "refs/heads/master">, <git.Head "refs/heads/second-branch">,
<git.Head "refs/heads/third">]

>>> # or
>>> repo.heads
[<git.Head "refs/heads/master">, <git.Head "refs/heads/second-branch">, <git.Head "refs/heads/third">]

To see your active branch:

>>> repo.active_branch
<git.Head "refs/heads/master">

Then you can check out a branch using:

>>> repo.heads.third.checkout()
<git.Head "refs/heads/third">

>>> # or using git command implementation
>>> repo.git.checkout("third")
''

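The caveat is that attribute access doesn’t work for a branch whose name contains a dash: Python parses repo.heads.second-branch.checkout() as a subtraction, so it fails. In this situation you can look the branch up by name with repo.heads["second-branch"].checkout(), or leverage the git command with repo.git.checkout("second-branch").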
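
Here is a sketch of a helper that looks branches up by name instead of by attribute, so a name like second-branch is treated like any other. The name checkout_branch and the create flag are my own additions; repo is assumed to be a git.Repo, and repo.create_head() is used to create the branch when asked to.

```python
def checkout_branch(repo, name, create=False):
    """Check out the local branch `name`, optionally creating it first."""
    existing = [head.name for head in repo.heads]
    if name in existing:
        # Indexing by name works for any branch name, dashes included.
        head = repo.heads[name]
    elif create:
        head = repo.create_head(name)
    else:
        raise ValueError("no local branch named {!r}".format(name))
    return head.checkout()
```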

Advanced Usage #

Unwrapped git functionality #

If you find that GitPython is missing some git functionality, you can always fall back to its git command implementation. First, work out what the command and its parameters look like in git; then pass those parameters to the corresponding GitPython git command. Some examples:

Git log --oneline

$ git log --oneline b645f6e..86f3c62
86f3c62 (HEAD -> master) third commit
6240bd6 (third, second-branch) second commit
>>> logs = repo.git.log("--oneline", "b645f6e..86f3c62")
>>> logs
'86f3c62 third commit\n6240bd6 second commit'

>>> logs.splitlines()
['86f3c62 third commit', '6240bd6 second commit']
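
If you need the hash and the subject separately, you can split each line once more. A small sketch, with parse_oneline being a name I made up:

```python
def parse_oneline(log_text):
    """Split `git log --oneline` output into (sha, subject) pairs."""
    commits = []
    for line in log_text.splitlines():
        # Everything before the first space is the abbreviated hash.
        sha, _, subject = line.partition(" ")
        commits.append((sha, subject))
    return commits


# The same string that repo.git.log("--oneline", ...) returned above.
logs = "86f3c62 third commit\n6240bd6 second commit"
pairs = parse_oneline(logs)
```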

Git show current content

$ git show 86f3c62:hello.py
print("hello world")
print("")
print("")
>>> content = repo.git.show("86f3c62:hello.py")
>>> print(content)
print("hello world")
print("")
print("")

Getting diffs:

$ git show 6240bd6 hello.py
commit 6240bd6a9111df3aa624f781ac8bad2cea551f8e (third, second-branch)
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 10:13:14 2020 +0700

    second commit

diff --git a/hello.py b/hello.py
index 8cde782..057280e 100644
--- a/hello.py
+++ b/hello.py
@@ -1 +1,2 @@
 print("hello world")
+print("")
>>> diff = repo.git.show("6240bd6", "hello.py")
>>> print(diff)
commit 6240bd6a9111df3aa624f781ac8bad2cea551f8e
Author: azzamsa <azzamsa@example.com>
Date:   Sun Feb 16 10:13:14 2020 +0700

    second commit

diff --git a/hello.py b/hello.py
index 8cde782..057280e 100644
--- a/hello.py
+++ b/hello.py
@@ -1 +1,2 @@
 print("hello world")
+print("")

Git show name only

>>> repo.git.show("--pretty=", "--name-only", "86f3c62")
'hello.py'
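
If you prefer the pure-Python API here, the commit object can give you the same list of file names through its stats attribute. This is a sketch under the assumption that repo is a git.Repo; changed_files is a hypothetical name.

```python
def changed_files(repo, rev):
    """Return the paths touched by the commit `rev`, sorted by name."""
    # stats.files is a dict that maps each changed path to its line counts.
    return sorted(repo.commit(rev).stats.files)
```

With the repository from this section, changed_files(repo, "86f3c62") should give the same file that git show --name-only printed.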

Config writer #

You can use config_writer() to change repository configuration.

For example, to change the committer’s user name and email:

repo.config_writer().set_value("user", "name", "khwārizmī").release()
repo.config_writer().set_value("user", "email", "khwarizmi@example.com").release()
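
The config writer can also be used as a context manager, which releases it for you when the block ends. A minimal sketch, assuming repo is a git.Repo; the helper name set_identity is made up:

```python
def set_identity(repo, name, email):
    """Write user.name and user.email to the repository's config file."""
    # Leaving the with-block releases the writer and its file lock.
    with repo.config_writer() as writer:
        writer.set_value("user", "name", name)
        writer.set_value("user", "email", email)
```

Both values are written with a single writer here, so the config file is only locked once.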

Project Examples #

Here are some useful functions that I extracted from my previous project, lupv:

# GPL-3.0
# Methods of a class that keeps a git.Repo in self._student_repo.
def read_file(self, filename, sha):
    """Get the content of the file as of the given commit."""
    current_file = self._student_repo.git.show("{}:{}".format(sha, filename))
    return current_file

def read_diff(self, filename, sha):
    """Get the diff of the file in the given commit."""
    diff = self._student_repo.git.show(sha, filename)
    return diff

def is_exists(self, filename, sha):
    """Check whether filename exists in the given record."""
    files = self._student_repo.git.show("--pretty=", "--name-only", sha)
    # Compare whole lines so that "a.py" does not match "data.py",
    # and always return a bool instead of falling through to None.
    return filename in files.splitlines()

You can find more useful snippets in my StackOverflow answers.

Additional Resources #

We didn’t cover everything here. You can dive deeper by reading the GitPython documentation. My favorite documentation is the test files; they cover many basic things to get you started.

Notes #

  • Checking out a branch whose name contains a dash via attribute access raises an error in GitPython. This situation has been acknowledged by the maintainer
  • The steps in this tutorial are adapted from the git tutorial