Why you should (probably) not use submodules

Edit: made the title less misleading and clickbaity

Spoiler alert: I do not hate submodules. I do, however, have an instant uh-oh response when people mention they want to solve a problem with Git submodules. I think this comes from the fact that, in most of my experiences, Git submodules were introduced for the wrong reasons.

What are submodules?

I am not going to write a large dissertation on submodules; there is plenty of good information online on how to use them. Atlassian has excellent tutorials, or you could buy my book Practical Git, which also covers submodules in brief.

In short, submodules allow you to have a folder in your repository be populated from another Git repository. The submodule is pinned to a specific commit, and you can optionally configure a branch for it to follow when you update. This is the basic intuition.
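To make that intuition a bit more concrete, here is a minimal Python sketch that reads a .gitmodules file, which is where Git declares the folder-to-repository mapping. The path, URL and the helper name list_submodules are made up for illustration, and I am assuming the file is simple enough for Python's configparser to read. Note that the pinned commit itself is not stored in .gitmodules; the parent repository records it in its own history.

```python
import configparser

# Hypothetical .gitmodules content; the path and URL are invented for this example.
GITMODULES = """\
[submodule "libs/logging"]
    path = libs/logging
    url = https://example.com/acme/logging.git
    branch = main
"""


def list_submodules(text):
    """Return one dict per submodule declared in .gitmodules-style text."""
    # Interpolation is disabled so that a '%' in a URL is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    submodules = []
    for section in parser.sections():
        entry = parser[section]
        submodules.append({
            "path": entry["path"],           # folder inside the parent repository
            "url": entry["url"],             # repository that populates the folder
            "branch": entry.get("branch"),   # optional; None when no branch is configured
        })
    return submodules


if __name__ == "__main__":
    for submodule in list_submodules(GITMODULES):
        print(submodule)
```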

Dependency management is hard

I believe that handling our dependencies in a way that allows us to decouple the way we work is one of the most difficult architectural challenges in software. It seems we underestimate how difficult and valuable it is to maintain explicit and stable interfaces. This leads to tons of frustration and horrible DevEx. Neither monolithic nor polylithic repository organizations solve this problem. My opinion is that most software organizations do not have the discipline to successfully maintain their implicit dependencies in a monolithic repo. For most of them, working in a polylithic world will therefore be a forcing function that brings dependency pains to the surface before they become a very expensive mess to correct.

The delivery mechanism implies collaboration mode

If you hand me source code, I expect to tinker with it and to have a high-bandwidth, close collaboration with you. If, on the other hand, I am consuming your service as an API, my expectations of our collaboration might be more along the lines of a help desk and a Google search. There are more levels in between. The unit that we are sharing could typically be (in increasing order of abstraction):

  • Source Code
  • Library
  • Executable
  • Container
  • SaaS

All of these levels are valid; I simply propose that the unit of delivery should match the collaboration mode. This is heavily inspired by Team Topologies.

Version your stuff

I don’t know if it is because I have worked so much in the heavy industries, but I am fond of a Bill of Materials (BOM) way of thinking, both as a way to enable variance and as a way to force ourselves to think about explicit and stable APIs. Please refer to semver.org for my preferred versioning scheme. BOMs also allow us to decouple much of our workflow from our source code, which I think is a power move that most underestimate. Versioning our sub-components also hints at a boundary that should mean decoupling.
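As a rough illustration of BOM thinking combined with semantic versioning, the Python sketch below lists components with the versions they were built against and checks whether the versions on offer are compatible. The component names, the BOM layout and the helper functions are all hypothetical, and the version handling is deliberately simplified: it only understands plain MAJOR.MINOR.PATCH strings and ignores pre-release and build metadata.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_compatible(required: str, candidate: str) -> bool:
    """True if candidate should be a drop-in replacement for required under semver."""
    need, have = parse(required), parse(candidate)
    if need.major == 0:
        # 0.y.z is initial development: anything may change, so demand an exact match.
        return have == need
    # Same major version means no breaking changes; tuples compare field by field.
    return have.major == need.major and have >= need


# Hypothetical bill of materials: component name -> version we built against.
BOM = {"logging-lib": "2.4.1", "protocol-lib": "1.0.3"}


def incompatible_components(bom, available):
    """Names of components whose available version does not satisfy the BOM."""
    return sorted(
        name for name, required in bom.items()
        if not is_compatible(required, available[name])
    )


if __name__ == "__main__":
    print(incompatible_components(BOM, {"logging-lib": "2.5.0", "protocol-lib": "2.0.0"}))
```

The point is less the check itself than the fact that the BOM lives outside the source tree: the list of versions can change without touching the code of either side.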

Cognitive Load is key

One could also say that developer productivity is the most important thing, but I like to be a bit more specific than that. The problems that we are trying to solve through (software) engineering are hard enough in themselves that we can’t afford to splurge on accidental complexity when we should be spending all our effort on essential complexity. For an excellent and brief walkthrough of accidental vs. essential complexity, watch this presentation by J.B. Rainsberger.

Where am I working?

When I am working with submodules, all source is equal. It is not obvious to me whether I am working in the context of my project or one of its dependencies. I know I can figure it out; it is not completely inscrutable. It just isn’t obvious. When things are not obvious, I have to spend mental capacity on them rather than just grokking them. This leaves less mental capacity for solving the actual problem.
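To show what "figuring it out" can look like, here is a small Python sketch that walks up from a path until it finds a .git entry. It relies on an assumption about how Git usually lays things out: in a submodule checkout created by a reasonably modern Git, .git is a small file pointing into the parent repository, while in an ordinary clone it is a directory. Linked worktrees also use a .git file, so treat the answer as a hint rather than proof. The function name nearest_repo_root is my own.

```python
from pathlib import Path


def nearest_repo_root(start: Path):
    """Walk upwards from start and return (root, kind) for the nearest .git entry.

    kind is "gitfile" when .git is a file (typical for submodules and linked
    worktrees), "directory" when it is a normal repository, and "none" when
    no .git entry is found.
    """
    current = start.resolve()
    for candidate in [current, *current.parents]:
        marker = candidate / ".git"
        if marker.is_file():
            return candidate, "gitfile"
        if marker.is_dir():
            return candidate, "directory"
    return None, "none"


if __name__ == "__main__":
    root, kind = nearest_repo_root(Path.cwd())
    print(f"nearest repository root: {root} ({kind})")
```

That I need a helper like this at all is rather the point: the information is there, but it is not in my face while I edit.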

Gitting Nuclear

Unlike some centralized version control systems such as ClearCase, Git does not have a concept of atomic commits across repositories. This can make the workflow of updating dependencies and then pulling those updates into your repository clunky and unintuitive. Again, this creates friction that I would rather not spend energy on, preserving those neurons for the interesting problem.

Again, this is possible to work around with Git submodules, and you can get it working in a way, but it requires discipline and Git proficiency, and it doesn’t create the tension that should push us to treat our dependencies as dependencies.

Using submodules at the right time

I believe submodules are a superior solution to copy-pasting code between repositories, so if that is your alternative, please use submodules.

I know some programming ecosystems, such as C++, do not have a default way of handling libraries and dependencies, and in that case it might be good to look into submodules.

But I really want to stress that this should be the last resort. I always recommend assuming that the best way to solve these kinds of dependencies is to build libraries or similar, and then resolve them through whatever build engine you are using. Even if you are building for multiple targets, it is unlikely that you will end up building the artifacts more often than if you share them as code.

Submodules as a decomposing tool

If you have a monolithic repository, and perhaps not the most modern build system, submodules can be a great way to experiment with decomposing it. These kinds of situations are often difficult to experiment with, because everything depends on things being in the right place on our disks, and submodules can solve that for us. Of course, we shouldn’t just stop there.

Our plan for decomposing through submodules should be like so:

  • Move the folder containing the component into its own repository
  • Add a submodule pointing to the component repository
  • Now we have decoupled the workflows and integrations of the two repositories, so it is a start
  • Figure out how to build the component as a library and publish it alongside the source code
  • Start consuming the artifact
  • Deprecate the submodule and complete the transition to the artifact
  • Remember to profit!
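
The first two steps of the plan can be sketched as a list of Git commands. The Python snippet below only builds and prints the commands; it does not run anything. The folder name, the URL and the branch names are hypothetical, and I am assuming that the new component repository already exists, is empty, and uses main as its branch. Using git subtree split to carry the folder's history along is one option among several, so adapt it to your own setup before trying it on a real repository.

```python
import shlex


def decomposition_commands(component_dir, component_url, split_branch="component-split"):
    """Return the Git commands for the first two plan steps as argv lists.

    Nothing is executed here; the caller decides whether and how to run them.
    """
    return [
        # Step 1: extract the folder's history onto its own branch ...
        ["git", "subtree", "split", f"--prefix={component_dir}", "-b", split_branch],
        # ... and push that branch to the new (assumed empty) component repository.
        ["git", "push", component_url, f"{split_branch}:main"],
        # Step 2: remove the folder from the monolith. Untracked files left behind
        # in the folder would have to be cleaned up by hand before the next command.
        ["git", "rm", "-r", component_dir],
        # Re-add the same path as a submodule pointing at the component repository.
        ["git", "submodule", "add", component_url, component_dir],
        ["git", "commit", "-m", f"Replace {component_dir} with a submodule"],
    ]


if __name__ == "__main__":
    # Hypothetical folder and URL, for illustration only.
    commands = decomposition_commands(
        "components/logging", "https://example.com/acme/logging.git"
    )
    for argv in commands:
        print(shlex.join(argv))
```

Because the submodule lands in the same path as the old folder, the rest of the build keeps finding things in the right place on disk, which is what makes the experiment cheap.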

In conclusion

Please, if there is any dependency management in your chosen platform, use it. As described above, there are a bunch of caveats and concerns regarding the use of submodules.

I also believe that in order to work efficiently with Git submodules, you have to be quite proficient with Git, more proficient than is commonly a good investment for an average dev team. Either invest enough in tooling around your submodules to abstract them away, or consider whether a different dependency management solution would serve you better.

If you need to work with submodules, respect the tricky workflows around updating and be diligent; then you’ll be able to do well.

If I forgot anything, reach out to me. If there is something you disagree with, please let me know; I might be wrong 😀 And finally, it is a spectrum: your mileage will vary.
