13
votes

I have a project that depends on several third-party libraries. The project itself is packaged as a jar and distributed to other developers as a library. Those developers add the dependencies to their classpath and use my library in their code.

Recently I had an issue with one of the third-party dependencies, the Apache Commons Codec library. The problem is this:

byte[] arr = "hi".getBytes();
// Codec version 1.4
Base64.encodeBase64String(arr).equals("aGk=\r\n") // true

// Codec version 1.6
Base64.encodeBase64String(arr).equals("aGk=") // true

As you can see the output of the method has changed with the minor version bump.

My question is this: I don't want to force the users of my library onto a specific minor version of a third-party library. Assuming I know about the change in the dependency, is there any way I can recognize which library version is on the classpath and behave accordingly? Alternatively, what is considered best practice for this kind of scenario?

P.S - I know that for the above example I can just use new String(Base64.encodeBase64(data, false)) which is backwards compatible, this is a more general question.

7 Answers

13
votes

You ask what is the "best practice" for this problem. I'm going to assume that by "this problem" you mean the problem of 3rd party library upgrades, and specifically, these two questions:

  1. When should you upgrade?

  2. What should you do to protect yourself against bad upgrades (like the commons-codec bug mentioned in your example)?

To answer the first question, "when should you upgrade?", many strategies exist in industry. In most of the commercial Java world, I believe the dominant practice is "you should upgrade when you are ready to." In other words, as the developer, you first need to notice that a new version of a library is available (for each of your libraries!), then integrate it into your project, and you are the one who makes the final go/no-go decision based on your own test bed: JUnit, regression, manual testing, or whatever else you do to ensure quality. Maven facilitates this approach (I call it version "pinning") by making multiple versions of most popular libraries available for automatic download into your build system, and by tacitly fostering this "pinning" tradition.

But other practices do exist. For example, within the Debian Linux distribution it is theoretically possible to delegate a lot of this work to the Debian package maintainers. You would simply dial in your comfort level according to the 4 levels Debian makes available, choosing newness over risk, or vice versa. The 4 levels are: OLDSTABLE, STABLE, TESTING, UNSTABLE. Unstable is remarkably stable, despite its name, and OLDSTABLE offers libraries that may be as much as 3 years out of date compared to the latest-and-greatest versions available on their original "upstream" project websites.

As for the 2nd question, how to protect yourself, I think the current 'best practice' in industry is twofold: choose your libraries based on reputation (Apache's is generally pretty good), and wait a little while before upgrading, e.g., don't always rush to be on the latest-and-greatest. Maybe choose a public release of the library that has already been available 3 to 6 months, in the hope that any critical bugs have been flushed out and patched since the initial release.

You could go further by writing JUnit tests that specifically protect the behaviours you rely on in your dependencies. That way, when you bring down a newer version of a library, your JUnit tests would fail right away, warning you of the problem. But I don't see a lot of people doing that, in my experience. And it's often difficult to be aware of the precise behaviour you are relying on.
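Here is a minimal sketch of such a guard, written as plain Java so that it runs without JUnit or commons-codec on the classpath. The class name EncoderContract and the Function parameter are made up for illustration. In a real project you would pass the third-party method you depend on (for example Base64::encodeBase64String from commons-codec) and wrap the check in a JUnit test; the JDK's java.util.Base64 is only a stand-in here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.function.Function;

/** Hypothetical guard that pins the exact encoder output this code relies on. */
class EncoderContract {

    /** Returns true only if the encoder yields exactly the expected string. */
    static boolean holds(Function<byte[], String> encoder, String input, String expected) {
        byte[] bytes = input.getBytes(StandardCharsets.US_ASCII);
        return expected.equals(encoder.apply(bytes));
    }

    public static void main(String[] args) {
        // Stand-in encoder from the JDK; swap in the library method you actually use.
        Function<byte[], String> encoder = b -> Base64.getEncoder().encodeToString(b);

        // The behaviour relied on in this sketch: no trailing line separator.
        if (!holds(encoder, "hi", "aGk=")) {
            throw new AssertionError("encoder output changed; review before upgrading");
        }
        System.out.println("encoder contract holds");
    }
}
```

The point of comparing the full string, rather than decoding and comparing bytes, is that the check fails on exactly the kind of "immaterial" difference discussed in this thread.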

And, by the way, I'm Julius, the guy responsible for this bug! Please accept my apologies for this problem. Here's why I think it happened. I will speak only for myself. To find out what others on the apache commons-codec team think, you'll have to ask them yourself (e.g., ggregory, sebb).

  1. When I was working on Base64 in versions 1.4 and 1.5, I was very much focused on the main problem of Base64, that is, encoding binary data into the lower-127 ASCII, and then decoding it back to binary.

  2. So in my mind (and here's where I went wrong) the difference between "aGk=\r\n" and "aGk=" is immaterial. They both decode to the same binary result!

  3. But thinking about it in a broader sense after reading your Stack Overflow post, I realize there is probably a very popular use case that I never considered: password checking against a table of encrypted passwords in a database. In that use case you probably do the following:

    // a.  store user's password in the database
    //     using encryption and salt, and finally,
    //     commons-codec-1.4.jar (with "\r\n").
    //

    // b.  every time the user logs in, encrypt their
    //     password using appropriate encryption alg., plus salt,
    //     finally base64 encode using latest version of commons-codec.jar,
    //     and then check against encrypted password in the database
    //     to see if it matches.

So of course this use case fails if commons-codec.jar changes its encoding behaviour, even in ways that are immaterial according to the Base64 spec. I'm very sorry!
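One way to make a check like step b robust against cosmetic encoding differences is to compare the decoded bytes rather than the encoded strings. The following is only a sketch under assumptions: it uses the JDK's java.util.Base64 MIME decoder (which ignores line separators) instead of commons-codec, it assumes both values are otherwise valid Base64, and the class and method names are invented for illustration.

```java
import java.security.MessageDigest;
import java.util.Base64;

/** Hypothetical helper: compares two Base64 values by their decoded content. */
class EncodedValueComparer {

    /** True if both strings decode to the same bytes, regardless of CRLF line breaks. */
    static boolean sameBytes(String storedBase64, String candidateBase64) {
        // The MIME decoder skips characters outside the Base64 alphabet, such as \r and \n.
        Base64.Decoder decoder = Base64.getMimeDecoder();
        byte[] stored = decoder.decode(storedBase64);
        byte[] candidate = decoder.decode(candidateBase64);
        // MessageDigest.isEqual compares the two byte arrays for equality.
        return MessageDigest.isEqual(stored, candidate);
    }

    public static void main(String[] args) {
        // Value as written by the older encoder vs. value produced by the newer one.
        System.out.println(sameBytes("aGk=\r\n", "aGk="));
    }
}
```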

I think even with all of the "best practices" I spelled out at the beginning of this post, there's still a high probability of getting screwed on this one. Debian Testing already contains commons-codec-1.5, the version with the bug, and fixing this bug essentially means screwing people who used version 1.5 instead of version 1.4, as you did. But I will try to put some documentation on the Apache website to warn people. Thanks for mentioning it here on Stack Overflow (am I right about the use case?).

P.S. I thought Paul Grime's solution was pretty neat, but I suspect it relies on projects putting version info in the jar's META-INF/MANIFEST.MF file. I think all Apache Java libraries do this, but other projects might not. The approach is a nice way to pin yourself to versions at build time though: instead of realizing that you depend on the "\r\n", and writing the JUnit test that protects against that, you can instead write a much easier JUnit test: assertTrue(desiredLibVersion.equals(actualLibVersion)).

(This assumes run-time libs don't change compared to build-time libs!)
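A rough sketch of that version-pin check, allowing for jars that declare no version at all. The class and method names are hypothetical; Package.getImplementationVersion() is the standard JDK call and returns null when the manifest carries no such entry. String.class is used below only as a stand-in so the example runs without any third-party jar; in practice you would pass a class from the library itself.

```java
/** Hypothetical helper for pinning a dependency to one expected version. */
class LibraryVersionPin {

    /** Returns the Implementation-Version declared for the class's package, or null if none. */
    static String actualVersion(Class<?> anyClassFromLibrary) {
        Package p = anyClassFromLibrary.getPackage();
        return p == null ? null : p.getImplementationVersion();
    }

    /** A missing version never matches, so an unversioned jar fails the pin loudly. */
    static boolean matches(String desired, String actual) {
        return actual != null && actual.equals(desired);
    }

    public static void main(String[] args) {
        // Stand-in class; replace with e.g. a class from the pinned library.
        String actual = actualVersion(String.class);
        System.out.println("declared version: " + actual);
        System.out.println("pinned to 1.4: " + matches("1.4", actual));
    }
}
```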

6
votes
package stackoverflow;

import java.nio.charset.StandardCharsets;

import org.apache.commons.codec.binary.Base64;

public class CodecTest {
    public static void main(String[] args) {
        byte[] arr = "hi".getBytes(StandardCharsets.US_ASCII);
        String s = Base64.encodeBase64String(arr);
        // The quotes make any trailing line separator visible
        System.out.println("'" + s + "'");
        // Version info is read from the jar's manifest; the getters return null if it is absent
        Package package_ = Base64.class.getPackage();
        System.out.println(package_);
        System.out.println("specificationVersion: " + package_.getSpecificationVersion());
        System.out.println("implementationVersion: " + package_.getImplementationVersion());
    }
}

Produces (for v1.6):

'aGk='
package org.apache.commons.codec.binary, Commons Codec, version 1.6
specificationVersion: 1.6
implementationVersion: 1.6

Produces (for v1.4):

'aGk=
'
package org.apache.commons.codec.binary, Commons Codec, version 1.4
specificationVersion: 1.4
implementationVersion: 1.4

So you could use the package object to test.
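To "behave accordingly" you also need to compare version strings numerically, since a plain string comparison would rank "1.10" below "1.6". A small sketch, assuming purely numeric dotted versions (qualifiers such as "-SNAPSHOT" are not handled). The class name is invented, and the 1.6 threshold is taken from the question, which only establishes the behaviour of 1.4 and 1.6.

```java
/** Hypothetical helper for branching on a dotted numeric library version. */
class VersionGate {

    /** Compares versions such as "1.4" and "1.10" part by part; missing parts count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // In real code this would come from package_.getImplementationVersion() and may be null.
        String actual = "1.4";
        String encoded = "aGk=\r\n"; // output the question reports for 1.4

        // Assumed threshold: versions below 1.6 are treated as emitting CRLF.
        if (actual != null && compare(actual, "1.6") < 0) {
            encoded = encoded.replace("\r\n", "");
        }
        System.out.println("'" + encoded + "'");
    }
}
```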

But I would say that it's a bit naughty for the API to have changed the way it did.

EDIT Here is the reason for the change - https://issues.apache.org/jira/browse/CODEC-99.

1
votes

You could calculate an MD5 sum of the actual class file and compare it to the expected one. It could work like this:

// needs: java.io.InputStream, java.math.BigInteger, java.security.MessageDigest
String classname = "java.util.Random"; // fill in your class
MessageDigest digest = MessageDigest.getInstance("MD5");
Class<?> test = Class.forName(classname);
// try-with-resources closes the stream even if reading fails
try (InputStream in = test.getResourceAsStream("/" + classname.replace('.', '/') + ".class")) {
    byte[] buffer = new byte[8192];
    int read;
    // read() returns -1 at end of stream
    while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
    }
}
byte[] md5sum = digest.digest();
// Pad to 32 hex digits; BigInteger.toString(16) would drop leading zeros
String output = String.format("%032x", new BigInteger(1, md5sum));
System.out.println(output);

Or maybe you could iterate over the filenames in the classpath. Of course, this only works if the devs use the original filenames.

String classpath = System.getProperty("java.class.path");
// File.pathSeparator is ";" on Windows and ":" on Unix-like systems
for (String path : classpath.split(File.pathSeparator)) {
    File o = new File(path);
    if (o.isDirectory()) {
        ....
    }
}
1
votes

Asaf, I solve this problem by using Maven. Maven has nice versioning support for all the artifacts you use in your project. On top of that, I use the excellent Maven Shade Plugin, which gives you the ability to package all third-party libraries (Maven artifacts) in a single JAR file, ready for deployment. All other solutions are just inferior. I am talking from personal experience: I've been there, done that, even wrote my own plugin manager. Use Maven, that is my friendly advice.

0
votes

Could replacing the line separator with an empty string be a solution?

Base64.encodeBase64String(arr).replace("\r\n","");
0
votes

I would create two or more versions of the library, one for each supported version of the third-party library, and document which one to use. Each would probably need its own correct POM.

0
votes

To resolve your problem, I think the best way is to use an OSGi container, so you can choose your version of the third-party dependency and other libraries can safely use another version without any conflict.

If you cannot rely on an OSGi container, then you can use the implementation version in the MANIFEST.MF.
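As a rough illustration of reading that entry directly, here is a sketch using the JDK's java.util.jar.Manifest. The class name and the in-memory sample manifest are invented for the example; in practice the stream would come from the META-INF/MANIFEST.MF of the jar you care about, and the value is null if the jar does not declare an Implementation-Version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper that extracts Implementation-Version from a manifest stream. */
class ManifestVersionReader {

    /** Returns the Implementation-Version main attribute, or null if it is not declared. */
    static String implementationVersion(InputStream manifestStream) throws IOException {
        Manifest manifest = new Manifest(manifestStream);
        return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
    }

    public static void main(String[] args) throws IOException {
        // Made-up manifest content standing in for a real jar's META-INF/MANIFEST.MF.
        String sample = "Manifest-Version: 1.0\r\nImplementation-Version: 1.6\r\n\r\n";
        try (InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(implementationVersion(in));
        }
    }
}
```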

Maven is a great tool, but it cannot resolve your problem on its own.