Modern software does not exist in a vacuum: developers always rely on other developers or engineers to provide parts of their systems. Even if you write hello world in the programming language of your choice, you likely still depend on your language’s standard library. Even if you were to write hello world in the raw binary format of your machine’s instructions, you would still depend on the operating system. Even if you wrote your own unikernel to print hello world on the console, you would likely rely on vendor-provided firmware to initialize the processor or memory. And if you are one of the blessed few who can completely fabricate your own hardware and don’t need to rely on vendor-provided firmware, you likely don’t have the time or expertise to write efficient higher-level systems, and you need others to provide software to run on your hardware. Therefore the question is not whether you will use dependencies, but how you will use them responsibly.

Types of Dependencies and When to Introduce Them

There are three major kinds of dependencies: build dependencies, link dependencies, and run dependencies. These dependency types affect which portion of the code execution process a dependency is needed for. For example, build dependencies, if they are not also link or run dependencies, need not be installed on the final system where the code will be executed.

When considering which dependencies to adopt, you should weigh both their costs and their benefits. The costs of a dependency can include the difficulty for a user to configure or install it, its portability to different architectures and operating systems, the time or size of the installation, the complexity of the dependency or the complexity that it adds to your application, and its rate of change. Each of these costs has to be accounted for when adopting a dependency. Dependencies often provide benefits in one of two ways: they either provide some feature, or they reduce the maintenance or development burden of some other feature that you may have.

For example, MPI can be either a lightweight or a heavy dependency for an HPC code, depending on where the code runs. Many HPC centers provide an installation of MPI, meaning that configuring or installing the dependency has little to no cost. MPI is portable to a wide variety of distributed systems, even heterogeneous ones. And implementations of MPI often have a sufficiently stable interface that they can be adopted at low cost. However, on an average user’s laptop, where an installed and configured MPI is less likely to be available, MPI can be a much heavier dependency. It is therefore essential to consider the context in which a dependency will be used to truly measure its costs.

When considering the cost of a dependency and how it integrates with a larger application, here are some factors to consider. First is the interface boundary: how the dependency interacts with the rest of your system. Dependencies are more easily interchangeable if the interface boundary is small and if the number of points of integration is also small. These two measures are independent. For example, if a dependency provides only a simple function such as left-padding a string, its interface boundary may be very small, yet that operation may be used pervasively across a user-facing application. This specific example became notorious after a developer in the JavaScript ecosystem deleted a package that provided this functionality and in turn broke a wide variety of JavaScript packages that used left-pad either directly or via a transitive dependency. This deletion was ultimately an expensive debacle, likely resulting in thousands of hours of wasted developer time.
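One way to keep the number of points of integration small is to route every use of a dependency through a single wrapper that you own. The following is a minimal sketch in Python, not a recommendation for any particular package: the names `left_pad` and `format_invoice_number` are hypothetical, and the built-in `str.rjust` stands in for the third-party call.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is `width` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    # str.rjust is built into Python. If this were a call into a third-party
    # package, this wrapper would be the only place that needs to change
    # when that package is replaced or disappears.
    return text.rjust(width, fill)


def format_invoice_number(number: int) -> str:
    """Hypothetical application code: it only ever calls the wrapper."""
    return "INV-" + left_pad(str(number), 6, "0")
```

With this arrangement the padding operation can still be used pervasively, but the dependency itself is touched in exactly one place.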

The next aspect to consider is whether a dependency should be optional or required. If the functionality provided by a dependency is likely to go unused by a substantial portion of your user base, it may be valuable to require that dependency only conditionally, for the subpopulation that actually uses the functionality. However, this choice comes at a cost. Conditional dependencies create a combinatorial explosion of different configurations of your software. While one conditional dependency may be easy to work around, if you have eight conditional dependencies then you have 2^8 = 256 different possible ways to build your software. This can impose a larger cognitive load on users seeking to adopt your project. A good example of this is the LibPressio compression library. LibPressio allows you to provide optional dependencies on a variety of compressors that can be adopted by users. By making each compressor an optional dependency, users are only required to install the compressors that they actually intend to use. However, users may often unintentionally install a copy of LibPressio lacking the specific compressor that they intend to use.
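The trade-off above can be sketched in a few lines of Python. This is only an illustration of the general pattern and is not how LibPressio itself is implemented: the function names and the default candidate list are my own assumptions, and the entry `hypothetical_compressor` is a deliberately nonexistent module.

```python
import importlib
import importlib.util

# Hypothetical list of optional backends; the last name does not exist.
DEFAULT_CANDIDATES = ("zlib", "bz2", "lzma", "hypothetical_compressor")


def available_backends(candidates=DEFAULT_CANDIDATES):
    """Return the candidate modules that are importable in this environment."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


def require_backend(name, candidates=DEFAULT_CANDIDATES):
    """Import an optional backend, or fail with a message the user can act on."""
    present = available_backends(candidates)
    if name not in present:
        raise RuntimeError(
            f"backend {name!r} is not installed; available backends: {present}"
        )
    return importlib.import_module(name)


def configuration_count(optional_dependencies: int) -> int:
    """Each optional dependency is either present or absent."""
    return 2 ** optional_dependencies
```

Reporting which backends are present in the error message is one way to soften the failure mode where a user installs a build that lacks the compressor they wanted.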

Additionally, when adopting a dependency you may wish to consider how to test the integration of this dependency. Like most software, your dependencies evolve over time. It is important to catch changes in the invariants that you expect of your dependencies where they would have a meaningful impact on your software. However, this comes at a cost: these tests must be written and then maintained as the project evolves over time.
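As a minimal sketch of such a test, the following Python pins down two behaviors that an application might expect of a dependency. The standard-library `json` module stands in for the dependency here, and the class and function names are hypothetical.

```python
import json
import unittest


class JsonDependencyInvariants(unittest.TestCase):
    """Invariants this (hypothetical) application expects of its dependency."""

    def test_round_trip_preserves_data(self):
        record = {"name": "run-1", "values": [1, 2.5, None]}
        self.assertEqual(json.loads(json.dumps(record)), record)

    def test_sorted_keys_give_stable_output(self):
        # The application is assumed to compare serialized output byte for byte.
        text = json.dumps({"b": 1, "a": 2}, sort_keys=True)
        self.assertEqual(text, '{"a": 2, "b": 1}')


def run_invariant_tests() -> bool:
    """Run the invariant tests and report whether all of them passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(JsonDependencyInvariants)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Tests like these are cheap to run on every upgrade of the dependency, which is exactly when an expected invariant is most likely to change.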

Lastly, you should consider how the dependency will integrate with your build system. Almost all software projects over a given size use some build system that generates code, resolves the dependencies that the code adopts, or both. Each build system handles this task in a slightly different way. You should consider how adopting a dependency will affect the complexity required in your build system to handle either variations in your dependency or the sheer number of dependencies.

How to Integrate a Dependency

There are several methods of integrating a dependency into your application: embedding or vendoring, expecting ambient installation, package management, or requiring external installation.

Vendoring or embedding a dependency simply copies the entire source of the dependency into your repository. The most famous example of this is Google, which uses a single “monorepo” for nearly all of its software. This approach has many benefits if you can remain inside its ecosystem. With a monolithic repository, there is no risk that your dependencies change unexpectedly, and the installation of these dependencies can be handled automatically by your build system. It also allows you to perform atomic updates across the entire transitive set of dependencies at once, without multiple or two-stage commits. However, I have found that very few can truly operate in a monolithic repository. It requires an organization of sufficient scale to manage the security fixes and other maintenance that may need to be performed on a dependency that is developed outside your organization. Additionally, in some languages, having multiple versions of a dependency installed by different subcomponents of your application can be problematic, resulting at best in a spectacular segmentation fault and at worst in silent undefined behavior.

Another option for dependency management is expecting ambient installation. This method assumes that the dependency is already installed. In my experience this works for some specific stable and widely used dependencies. For example, depending on the C standard library in this way is likely a safe choice. Anyone who is using your library almost certainly has a copy of a C standard library of a suitable version to provide the specific functions that you are likely to use. Choosing this method simply punts the entire dependency management process to chance, and it requires practically no effort when it works, as it does for dependencies such as the C standard library.

Another popular option for dependency management is to use a package manager. Package managers are tools that sit alongside your build system to provide the dependencies that the build system will consume. However, it seems to me that package management systems are some of the most frequently reimplemented pieces of software. Your operating system, your programming language, and your users may all use different packaging systems. Maintaining separate packages for each of these systems can be a substantial maintenance burden. However, when these systems work, ensuring a reproducible build experience across systems can be as simple as maintaining a short text file that lists your dependencies.

Lastly, you can require external installation. External installation is unlike expecting ambient installation in that the developer does not expect, or cannot reasonably expect, that the dependency they wish to use is available on the user’s system. This is often an easy choice for developers. They may implement it by telling users in a README file to go and separately follow the instructions to install a particular dependency before continuing to install the program. However, this is often a terrible experience for users.

How to Be a Good Dependency

I will close with a discussion of what it means to be a good dependency. A good dependency provides API stability, discoverability, portability, consistency, and a namespace, and it avoids both unnecessary side effects and an unnecessarily broad API.

API stability means different things to different people. There are the concepts of denotational, semantic, and binary stability. Denotational stability is the simplest: from one version of the software to a later version, the same functions still exist with at least the same signatures. However, this does not ensure that the program will continue to function. The authors of the dependency may have left the signatures the same but changed what the functions do in a way that causes the program to malfunction or crash. To remedy this, semantic stability is required. Semantic stability requires that the meaning of a program has not changed from one version to a later version. However, even this is not enough for some users. Some users further require binary stability. In this form of stability, even the machine’s representation of a particular function and its calling conventions cannot change from one version to a later version. This particularly strict form of stability is useful for critical security dependencies such as the SSL provider. Requiring binary stability ensures that system administrators can swap implementations of the SSL provider without reconfiguring or recompiling the applications that depend upon it. Choosing what level of stability, or lack thereof, you intend to provide and convey to users is a critical aspect of deciding how to maintain your library and be a good dependency. As Linus Torvalds has said, “don’t break user space.”
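The gap between denotational and semantic stability can be seen in a small Python sketch. The two functions below are hypothetical versions of the same library function: their signatures are identical, so a caller still runs, but the treatment of missing values has changed underneath it.

```python
import inspect


def mean_v1(values, skip_missing=False):
    """Hypothetical version 1: None counts as zero unless skip_missing is set."""
    if skip_missing:
        cleaned = [v for v in values if v is not None]
    else:
        cleaned = [0 if v is None else v for v in values]
    return sum(cleaned) / len(cleaned)


def mean_v2(values, skip_missing=False):
    """Hypothetical version 2: same signature, but None is now always dropped."""
    cleaned = [v for v in values if v is not None]
    return sum(cleaned) / len(cleaned)


# Denotationally stable: the signatures compare equal.
SAME_SIGNATURE = inspect.signature(mean_v1) == inspect.signature(mean_v2)

# Not semantically stable: the same call gives a different answer.
SAMPLE = [2, None, 4]
RESULT_V1 = mean_v1(SAMPLE)  # (2 + 0 + 4) / 3
RESULT_V2 = mean_v2(SAMPLE)  # (2 + 4) / 2
```

A check on signatures alone would pass here, which is why tests of behavior are needed to detect this kind of break.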

Discoverability is also a key aspect of being a good dependency. Discoverability means many things, including being accessible from your language’s or system’s primary package management system, but also providing sufficient documentation and a high-level overview of how your software is used, so that a developer can quickly understand how to use the dependency. Improving discoverability can be a challenging undertaking, requiring careful attention to tedious tasks such as writing good documentation.

Portability is likewise important. Here are some common areas of portability that I think developers often miss: first, they use features with restrictive requirements, for example a particularly bleeding-edge version of a compiler or an operating system; second, they use hard-coded paths extensively in either the build system or the software itself; or, in the worst case, they have written their own nonportable build system that does not interoperate with other tools and is not flexible. For the first problem, in some cases your tooling can warn you when you attempt to use functions that are not available on all platforms. Hard-coded paths can often be solved with a central file generated by the build system that provides default paths where needed. Lastly, use a well-established modern build system to build your software. Those who come to use your software will thank you.
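The idea of one central source of default paths can be sketched in Python as follows. In a compiled project this file would typically be generated by the build system; here a single module plays that role. The tool name, the default directory, and the environment variable `MYTOOL_DATA_DIR` are all hypothetical.

```python
import os
from pathlib import Path

# The one place where a default path is written down. In a real project the
# build system would fill in this value at configure time (assumption).
DEFAULT_DATA_DIR = Path("/usr/local/share/mytool")


def data_dir(env=None) -> Path:
    """Return the data directory, letting the user override the default."""
    if env is None:
        env = os.environ
    override = env.get("MYTOOL_DATA_DIR")
    return Path(override) if override else DEFAULT_DATA_DIR


def resource_path(name: str, env=None) -> Path:
    """Every other module asks here instead of spelling out a path itself."""
    return data_dir(env) / name
```

Because no other module contains a literal path, relocating the installation means changing one value rather than searching the whole code base.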

Consistency requires careful attention to detail to maintain. There are two types of consistency: internal and external. Internal consistency requires that similar functions and similar concepts behave in similar ways within your code. External consistency is more difficult to achieve and requires using names that are consistent across your domain or field, so that your functions carry similar denotational and semantic meanings to their counterparts elsewhere. A good example of programs that are not externally consistent are the APT and DNF package managers. In APT, the update command does not actually update any software. In DNF, the update command actually updates software. I would argue that APT missed the boat here. This requires users to learn new paradigms when switching between Linux distributions.

Providing a namespace that is consistently employed further improves the usability of your software. In languages like C, all functions are by default included in the global namespace. This means that different libraries or dependencies can provide definitions of the same function, which unfortunately can behave in different ways. The easiest way to solve this is to prefix all publicly exposed functions with some project-specific prefix to ensure that functions from different libraries do not collide. In C++ and other languages that came after, this process is much easier and is often handled using a system often called modules. Modules provide both a mechanism to import definitions and a way to namespace them after they have been imported. Many systems additionally provide a way to rename a module if its name would conflict with another name that the user already uses or wishes to provide.
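Python’s module system illustrates both points. In this small sketch, two standard-library modules each define a function named `sqrt` with different behavior; the module namespace keeps them apart, and `import ... as` renames one of them on import. The alias `complex_math` and the helper `real_root_or_none` are my own choices for illustration.

```python
import cmath as complex_math  # renamed on import to avoid a name clash
import math

# Two functions named sqrt coexist because each lives in its own namespace.
REAL_ROOT = math.sqrt(4.0)
COMPLEX_ROOT = complex_math.sqrt(-4.0)


def real_root_or_none(x: float):
    """math.sqrt rejects negative input, unlike its namesake in cmath."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None
```

Had both functions been placed in a single global namespace, one definition would have silently replaced the other.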

Dependency authors should avoid unnecessary side effects. While some kinds of side effects, such as printing, are relatively obvious and mostly benign, other forms of side effects can be hazardous. For example, your method of error handling can introduce side effects into your user’s code. If your code calls assert on something that could otherwise be a runtime error, you deprive your user of the agency to determine how they would like to handle the error, introducing the unintended side effect of program termination. Likewise, functions that refer to global variables can also have unintended side effects. These functions can behave inconsistently if multiple dependencies that an application uses attempt to change these global variables in different ways simultaneously.
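The difference in agency can be sketched in Python. Python has no direct equivalent of a C assert aborting the process, so `sys.exit` is used below as a stand-in for a library that terminates the program; all function and class names are hypothetical.

```python
import sys


class ParseError(ValueError):
    """Raised by the library so that the caller decides what happens next."""


def parse_port_terminating(text: str) -> int:
    """Anti-pattern: the library decides to end the caller's program."""
    try:
        return int(text)
    except ValueError:
        sys.exit(f"invalid port: {text!r}")


def parse_port(text: str) -> int:
    """Preferred: report the problem and leave the decision to the caller."""
    try:
        return int(text)
    except ValueError as exc:
        raise ParseError(f"invalid port: {text!r}") from exc


def port_or_default(text: str, default: int = 8080) -> int:
    """Caller code: here the application chooses to fall back to a default."""
    try:
        return parse_port(text)
    except ParseError:
        return default
```

Another caller might instead log the error or prompt the user again, which is only possible because the library did not make that choice for them.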

Lastly, dependencies should avoid exposing unnecessarily broad APIs. The broader an API is, the harder it is to maintain and to implement consistently. In the vein of Occam’s razor, an API should be sufficiently complex to master its domain and nothing else. A good example of how this can go awry is when a library exposes a function for some primitive operation like max, which can unintentionally be imported into a different context and result in inconsistent semantics.

I hope this short document has helped you think about when to adopt a dependency, how to integrate one, and how to write code that will itself be a good dependency. Happy programming.

Changelog

  • 2022-07-20 first version