In this post we’ll cover the Git `cat-file` command. Previous posts in this series are
The Git `cat-file` command lets you dump the contents of a Git object. You cannot view Git object files directly, since they are zlib-compressed and carry a header, as we saw in the second post on `hash-object`. The `cat-file` command is essentially the inverse of the `hash-object` command.
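
To make the "inverse of `hash-object`" idea concrete, here is a minimal sketch of splitting an already decompressed object into its header and content. It assumes the usual loose-object layout of `<type> <size>`, a null byte, then the content. The function name `parse_object` is hypothetical and is not taken from the code in this series; the zlib decompression step is left out so the example only needs the standard library.

```rust
/// Splits a decompressed Git object into (type, declared size, content).
/// Returns None if the header is malformed or the size does not match.
/// `parse_object` is a hypothetical helper name used for illustration.
fn parse_object(raw: &[u8]) -> Option<(&str, usize, &[u8])> {
    // The header ends at the first null byte.
    let nul = raw.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&raw[..nul]).ok()?;
    // The header is "<type> <size>", e.g. "blob 6".
    let (kind, size) = header.split_once(' ')?;
    let size: usize = size.parse().ok()?;
    let content = &raw[nul + 1..];
    // Only accept the object if the declared size matches the content.
    (content.len() == size).then_some((kind, size, content))
}

fn main() {
    let raw = b"blob 6\0hello\n";
    if let Some((kind, size, content)) = parse_object(raw) {
        println!("type: {kind}");
        println!("size: {size}");
        print!("{}", String::from_utf8_lossy(content));
    }
}
```

With the header parsed, the type and size options only need the first two values, and pretty-printing only needs the content.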
With `cat-file` you have four primary options with respect to viewing the object:
- `-p`: pretty-print the content
- `-t`: print the type of the object
- `-s`: print the size of the object
- `-e`: check for the existence of the object
Each option is exclusive of the others; you can only specify one. I will show how this is achieved using `clap` (well, one way). There are other things you can do, such as batch (i.e. bulk) operations, but I will not cover those in this post. You can learn more here.
Handling the command-line options
Preparatory Work
As mentioned, the `-e` option just tests for the existence of the object, and the process return code should be 0 if found, 1 if there’s a problem (standard Unix). At the same time I wanted to keep the details local to the command and keep the `main()` function ignorant of them. There was also some code duplication creeping in with respect to the error handling, so I added the `thiserror` crate and defined my own error enum.
So far the commands I have implemented have only had two kinds of errors: either an underlying I/O error (e.g. couldn’t open a file) or something wrong with the object id provided by the user. The `thiserror` crate makes it easy to define custom error types. I wrote a blog post on it, so I won’t go into details here. Check it out if you like.
For convenience I also created some type aliases for custom `Result` types, rather than using `std::io::Result` everywhere as I had been doing.
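
As a rough illustration of the error enum and alias described above, here is a standard-library-only sketch. It writes out by hand the `Display` and `From` implementations that `thiserror` can derive, so it is not the code from this series. The variant names, the helpers `validate_object_id` and `read_len`, and the 4 to 40 hex digit check are all assumptions made for the example; only the alias name `GitResult` comes from the post.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type covering the two failure modes described above.
#[derive(Debug)]
enum GitError {
    Io(io::Error),
    InvalidObjectId(String),
}

impl fmt::Display for GitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GitError::Io(err) => write!(f, "I/O error: {err}"),
            GitError::InvalidObjectId(id) => write!(f, "invalid object id: {id}"),
        }
    }
}

impl std::error::Error for GitError {}

// Lets `?` convert an io::Error into a GitError automatically.
impl From<io::Error> for GitError {
    fn from(err: io::Error) -> Self {
        GitError::Io(err)
    }
}

/// Alias so signatures do not have to repeat the error type.
type GitResult<T> = Result<T, GitError>;

/// Hypothetical check: accept 4 to 40 hex digits (assumes SHA-1 ids).
fn validate_object_id(id: &str) -> GitResult<&str> {
    let is_hex = id.bytes().all(|b| b.is_ascii_hexdigit());
    if (4..=40).contains(&id.len()) && is_hex {
        Ok(id)
    } else {
        Err(GitError::InvalidObjectId(id.to_string()))
    }
}

/// Hypothetical helper showing `?` on an I/O call inside a GitResult.
fn read_len(path: &str) -> GitResult<usize> {
    Ok(std::fs::read(path)?.len())
}

fn main() {
    for id in ["ce0136", "not-hex"] {
        match validate_object_id(id) {
            Ok(id) => println!("ok: {id}"),
            Err(err) => println!("error: {err}"),
        }
    }
    if let Err(err) = read_len("/definitely/not/a/real/path") {
        println!("error: {err}");
    }
}
```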
The `main()` function changed to return `std::process::ExitCode`, and I return the exit code explicitly based on the result.
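
One way this can look is sketched below. It assumes that `GitCommandResult` is a `Result<(), GitError>`, which the post does not spell out, and it uses a stand-in `GitError` plus hypothetical `run` and `status_code` helpers so the example is self-contained.

```rust
use std::process::ExitCode;

/// Stand-in error type for this sketch; a real one would carry more detail.
#[derive(Debug)]
struct GitError(String);

/// Assumed shape of the alias: commands either succeed or report an error.
type GitCommandResult = Result<(), GitError>;

/// Hypothetical dispatcher: `main` only sees the result, not the details.
fn run(args: &[String]) -> GitCommandResult {
    match args.first().map(String::as_str) {
        Some("cat-file") => Ok(()),
        Some(other) => Err(GitError(format!("unknown command: {other}"))),
        None => {
            println!("usage: <command> [args]");
            Ok(())
        }
    }
}

/// Maps a command result to the conventional Unix status: 0 or 1.
fn status_code(result: &GitCommandResult) -> u8 {
    match result {
        Ok(()) => 0,
        Err(_) => 1,
    }
}

fn main() -> ExitCode {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let result = run(&args);
    if let Err(GitError(message)) = &result {
        eprintln!("error: {message}");
    }
    ExitCode::from(status_code(&result))
}
```

Keeping the mapping in one small function means each command handler only has to decide between `Ok` and `Err`.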
I then changed the return signatures of the various command handlers to return `GitCommandResult`.
Info
At some point I will probably refactor these command entry points so that there is a trait for a command, to make it explicit that each handler should take `args` and return a `GitCommandResult`, but it’s not worth it at this point.
I also went through and replaced `io::Result` with `GitResult` pretty much everywhere.
Clap Changes
With that work out of the way, I updated the `CatFileArgs` struct so that the `-e`/`-p`/`-s`/`-t` options are mutually exclusive. To do this in `clap` you can use groups. One way to do that is to add a `group` option to the `#[arg]` definition of each option.
Here I added `group = "operation"` to group these mutually exclusive options together. You can read more about the `clap` group support here.
As you may have noticed if you read the earlier posts, I also added doc comments for each struct field. This is one way to tell `clap` what to print as the help text. Here I just copied the text from the Git `cat-file` help.
Pretty-Print Option
All of the options start with the same up-front work: finding the object based on its object id. All but `-e` then read it, and at that point the type and size options are trivial. For existence, we don’t even need to read the file; we just need to see if we can find it. So I short-circuit that in `cat_file_command()`.
If it’s not the `-e` option, then I read the file. At that point, the only interesting thing is printing the content. For the size and type options, I just print those values.
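
The existence check can be sketched without reading any object data. The example below assumes loose objects stored under `objects/<first two hex digits>/<remaining digits>` and full 40-digit SHA-1 ids. The helper names `loose_object_path` and `object_exists` are hypothetical, and packed objects are not handled.

```rust
use std::path::{Path, PathBuf};

/// Builds the path of a loose object from a full 40-digit hex id.
/// Returns None for anything that is not a full SHA-1 hex string.
fn loose_object_path(git_dir: &Path, object_id: &str) -> Option<PathBuf> {
    let is_hex = object_id.bytes().all(|b| b.is_ascii_hexdigit());
    if object_id.len() != 40 || !is_hex {
        return None;
    }
    // The first two hex digits name the directory, the rest name the file.
    let (dir, file) = object_id.split_at(2);
    Some(git_dir.join("objects").join(dir).join(file))
}

/// The `-e` style check: only look for the file, never open it.
fn object_exists(git_dir: &Path, object_id: &str) -> bool {
    loose_object_path(git_dir, object_id).map_or(false, |path| path.is_file())
}

fn main() {
    let git_dir = Path::new(".git");
    let id = "ce013625030ba8dba906f756967f9e9ca394464a";
    println!("path:   {:?}", loose_object_path(git_dir, id));
    println!("exists: {}", object_exists(git_dir, id));
}
```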
So let’s focus on the pretty-printing. Here, the `blob` and `commit` types are the same and trivial: the content is just printed out. Where it gets more involved is handling the `tree` object, because each entry in a `tree` object is variable length, and so the overall object is too. Each entry contains the file mode (standard Unix octal format), a space, and a filename. The filename is what makes each entry variable length. It is terminated by a null byte, which is followed by the 20-byte object hash in raw binary form.
The pretty-printed output of a tree object has one row per entry, in the form `<mode> <type> <object id>`, then a tab, then the file name.
Note that the output contains the object type (second column), but that’s not part of the content of the tree object entries themselves. So we have to look up each given object to find the object type, to be able to properly output each row.
Here’s the full implementation:
The code iterates through the `content` buffer, maintaining an index `consumed` as data gets parsed. First the file mode and name are found by locating the null byte that ends the file name and then splitting the preceding bytes on the `b' '` separator. Next (line 14) the object id gets parsed. Given the object id, the code looks up the object to get the object type. The extracted tree info is printed to `stdout`. Repeat while there is still content.
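
The parsing loop described above can be sketched as follows. This is an illustrative version written for this explanation, not the listing from the post: `TreeEntry` and `parse_tree` are hypothetical names, the entries are collected instead of printed, and the object type lookup is left out so that the example needs no repository.

```rust
/// One parsed row of a tree object (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct TreeEntry {
    mode: String,      // octal mode, left-padded to six digits
    name: String,      // file name
    object_id: String, // 40-digit hex form of the 20-byte hash
}

/// Parses the content of a tree object (the part after the header).
/// Returns None if an entry is truncated or malformed.
fn parse_tree(content: &[u8]) -> Option<Vec<TreeEntry>> {
    let mut entries = Vec::new();
    let mut consumed = 0;
    while consumed < content.len() {
        let rest = &content[consumed..];
        // "<mode> <name>" runs up to the null byte.
        let nul = rest.iter().position(|&b| b == 0)?;
        let space = rest[..nul].iter().position(|&b| b == b' ')?;
        let mode = std::str::from_utf8(&rest[..space]).ok()?;
        let name = String::from_utf8_lossy(&rest[space + 1..nul]).into_owned();
        // The 20 raw hash bytes follow the null byte.
        let hash = rest.get(nul + 1..nul + 21)?;
        let object_id: String = hash.iter().map(|b| format!("{b:02x}")).collect();
        entries.push(TreeEntry {
            mode: format!("{mode:0>6}"),
            name,
            object_id,
        });
        consumed += nul + 21;
    }
    Some(entries)
}

fn main() {
    // Build a one-entry tree by hand: mode, space, name, null, 20 hash bytes.
    let mut content = b"100644 README.md\0".to_vec();
    content.extend_from_slice(&[0xce; 20]);
    if let Some(entries) = parse_tree(&content) {
        for entry in entries {
            // The type column is omitted here; it is not stored in the entry.
            println!("{} {}\t{}", entry.mode, entry.object_id, entry.name);
        }
    }
}
```

Returning `None` on a short buffer keeps the loop from indexing past the end when the 20 hash bytes are missing.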
This is a pretty straightforward, brute-force approach. Could it be optimized? Perhaps. In particular, extracting the object type currently requires reading the entire referenced object (line 17) so it can parse the header; the rest of the object content is discarded.
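
One possible optimization, offered as a sketch and not as what this series does, is to derive the type from the entry's mode instead of reading the referenced object. It relies on the convention that a directory mode (`040000`) marks a tree, the gitlink mode (`160000`) marks a commit (a submodule), and everything else, including symlinks, is stored as a blob. The function name `object_type_from_mode` is hypothetical.

```rust
/// Guesses the object type of a tree entry from its octal mode string.
/// Returns None if the mode is not valid octal.
fn object_type_from_mode(mode: &str) -> Option<&'static str> {
    let bits = u32::from_str_radix(mode, 8).ok()?;
    // Mask off the permission bits and keep only the file-type bits.
    Some(match bits & 0o170000 {
        0o040000 => "tree",   // directory
        0o160000 => "commit", // gitlink, i.e. a submodule
        _ => "blob",          // regular files and symlinks
    })
}

fn main() {
    for mode in ["40000", "100644", "100755", "120000", "160000"] {
        println!("{mode:0>6} {:?}", object_type_from_mode(mode));
    }
}
```

The trade-off is that this trusts the mode stored in the tree, whereas looking up the object also confirms that it exists.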
That said, this implementation of the `cat-file` command works for `blob`, `commit` and `tree` objects and the `-e`, `-s`, `-t` and `-p` options.
Depending on how you try this out, you might find the current implementation unable to find an object you specify, or one that’s referenced in a `tree` object.
And in fact, if I look in `.git/objects`, there is no such file.
This particular object was “packed” up into a pack file.
Also, if you try other Git ways to specify objects, for example `HEAD` or similar human-readable names, my code doesn’t work.
Why? Well, it’s not smart enough yet :). The first issue, with using the object hash, has to do with how Git optimizes space with “pack” files.
The issue with something like `HEAD` has to do with the notion of references, which I mentioned in the first post. If you recall, `HEAD` is a file, `.git/HEAD`, which normally names a branch ref such as `refs/heads/<branch-name>`. That ref is in turn a file under `.git`, and it contains an object hash.
You can refer to the `gitrevisions` page to see details on this.
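
A minimal sketch of following `HEAD` to an object hash might look like the code below. It assumes `HEAD` either contains `ref: <path>` pointing at a loose ref file or a bare hash (a detached `HEAD`). Refs stored in `packed-refs` and chains of symbolic refs are not handled, and `resolve_head` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Resolves HEAD to an object hash by following at most one "ref: " hop.
fn resolve_head(git_dir: &Path) -> io::Result<String> {
    let head = fs::read_to_string(git_dir.join("HEAD"))?;
    let head = head.trim();
    match head.strip_prefix("ref: ") {
        // Symbolic ref: the named file holds the hash.
        Some(ref_name) => {
            let hash = fs::read_to_string(git_dir.join(ref_name))?;
            Ok(hash.trim().to_string())
        }
        // Detached HEAD: the file holds the hash directly.
        None => Ok(head.to_string()),
    }
}

fn main() {
    match resolve_head(Path::new(".git")) {
        Ok(hash) => println!("HEAD -> {hash}"),
        Err(err) => eprintln!("could not resolve HEAD: {err}"),
    }
}
```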
In the next post I will talk about pack files and enhance the `cat-file` command to be able to find packed objects. I will probably also add support for ref names like `HEAD`.
As always, if you have any suggestions on how to improve my Rust coding skills, please comment.