JGit Internal: Reference and RevObject

Today, I want to discuss Git internal mechanism with you: How Git resolves references? How JGit, a pure Java implementation of Git, resolves references in Java? Then, how to use them via class RevO...

Today, I want to discuss Git internal mechanism with you: How Git resolves references? How JGit, a pure Java implementation of Git, resolves references in Java? Then, how to use them via class RevObject and its subtypes.

Git References

In Git, files containing SHA-1 values in .git/refs directory are called “references”, or “refs”. References are used as pointers, so that you don’t need to remember the raw SHA-1 value. For example, checkout master instead of checkout a commit SHA-1. There’re different types of reference stored in the Git repository. You can explore them using a find command:

$ find .git/refs -type d
.git/refs
.git/refs/heads
.git/refs/tags
.git/refs/remotes
.git/refs/remotes/origin

As you can see, there’re head, tag, and remote references. I’ll explain them one by one. But before that, let’s see the equivalent in JGit.

JGit References

In JGit, the references are stored in RefDatabase:

public abstract class RefDatabase {
  /**
   * Order of prefixes to search when using non-absolute references.
   * <p>
   * The implementation's {@link #getRef(String)} method must take this search
   * space into consideration when locating a reference by name. The first
   * entry in the path is always {@code ""}, ensuring that absolute references
   * are resolved without further mangling.
   */
  protected static final String[] SEARCH_PATH = { "", //$NON-NLS-1$
      Constants.R_REFS, //
      Constants.R_TAGS, //
      Constants.R_HEADS, //
      Constants.R_REMOTES //
  };

  ...
}

Resolving HEAD

The HEAD file is a symbolic reference to the branch you’re currently on. By symbolic reference, it means that id doesn’t generally contain a SHA-1 value, but rather a pointer to another reference. The reference of HEAD can be retrieved from .git/HEAD in Git. Here, it points to jgit, the current branch where I’m writing this blog post. If you check the file stored in path refs/heads/jgit, you can see the object pointed by this reference is a commit.

$ git show-ref HEAD --head --heads
eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc HEAD

$ cat .git/HEAD
ref: refs/heads/jgit

$ cat .git/refs/heads/jgit
eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc

$ git cat-file -t eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc
commit

The same logic can be done in JGit: after peeling the intermediate reference, the HEAD resolution points to a commit, which is the latest commit of the current branch:

ObjectId commitId = repo.resolve(Constants.HEAD);
try (RevWalk walk = new RevWalk(repo)) {
  RevObject object = walk.parseAny(commitId);
  // Do sth ...
}

Resolving Branch

Resolving branch is almost the same as resolving HEAD, but you need to distinguish local branches and remote branches. Local branches are stored in directory .git/refs/heads and remote branches are stored in .git/refs/remote. Actually, using the word “remote branch” are incorrect, the right word is “remote reference”. All the remote references (tags, heads) are stored in .git/refs/remote.

Resolve the master branch from reference database using Git:

$ git show-ref master --heads
eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc refs/heads/master

$ cat .git/refs/heads/master
eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc

$ git cat-file -t eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc
commit

In JGit:

ObjectId commitId = repo.resolve(Constants.MASTER);
try (RevWalk walk = new RevWalk(repo)) {
  RevCommit commit = walk.parseCommit(commitId);
  // Do sth ...
}

Resolving Tag

Lightweight Tag

Lightweight tag is basically the commit checksum stored in a file—no other information is kept. To create a light weighting tag, don’t supply any of the -a, -s, or -m options, just provide a tag name. Then you can verify the creation by using git-show-ref command, and verify the information stored by using the git-cat-file command:

$ git tag 1.0-lw

$ git show-ref 1.0-lw --tags
eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc refs/tags/1.0-lw

$ git cat-file -t eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc
commit

In JGit, resolving a light weight can be done as following. Note that I’m using the method RevWalk#parseCommit(ObjectId), because lightweight tag points to commit.

// Create an lightweight tag (annotated=false)
git.tag().setAnnotated(false).setName("1.0-lw").call();

ObjectId tagId = repo.resolve("1.0-lw");
try (RevWalk walk = new RevWalk(repo)) {
  RevCommit commit = walk.parseCommit(tagId);
  // Do sth ...
}

Annotated Tag

Different from lightweight tags, annotated tags are stored as full objects in the Git database. They’re checksum, contain the information of tagger (name, email, date), have a tagging message, and can be signed and verified with GNU Privacy Guard (GPG).

$ git tag -a 1.0 -m "Release 1.0"

$ git show-ref 1.0 --tags
0e63979ea6b60f077f7e8f7234d9a7448b1243aa refs/tags/1.0

$ git cat-file -t 0e63979ea6b60f077f7e8f7234d9a7448b1243aa
tag

Unlike lightweight, the resolution of an annotated tag points to a tag object. Therefore, you should parse it as RevTag:

// Create an annotated tag (annotated=true)
Ref tagRef = git.tag().setAnnotated(true).setName("1.0").call();

ObjectId tagId = repo.resolve("1.0");
try (RevWalk walk = new RevWalk(repo)) {
  RevTag tag = walk.parseTag(tagId);
  // Do sth ...
}

If you’re not interested by tags, you can also peel them, and find the commit directly using the RevWalk#parseCommit(ObjectId) method:

ObjectId tagId = repo.resolve("1.0");
try (RevWalk walk = new RevWalk(repo)) {
  RevCommit commit = walk.parseCommit(tagId);
  // Do sth ...
}

Resolving Tree

Tree objects look like UNIX directory entries. A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree. For example, the tree of a commit looks like:

$ git cat-file -p eb6c6a5d1f06da34e88eb28c7d51560d721bf0dc^{tree}
100644 blob c009c0f4ec2a635473c5be36b3d0f55740b69d9d	.gitignore
100755 blob 4bf3f7d8fca35a944e38e273e6ff3da2e292d81b	404.html
100644 blob f7041372d8f983b48eb96586909282bbc5f142c2	README.md
...

In JGit, you can use method RevWalk#parseTree(ObjectId) to parse a tree:

String revStr = initialCommit.name() + "^{tree}";
ObjectId treeId = repo.resolve(revStr);
try (RevWalk walk = new RevWalk(repo)) {
  RevTree tree = walk.parseTree(treeId);
  // Do sth ...
}

"Pro Git (2nd Edition)" contains everything you need to know about Git, written by Scott Chacon and Ben Straub. The print version is available on Amazon: https://amzn.to/31CKi27


References