Software Thrives Unless You Kill it First: Premature Optimization and a Tale of Java GC

Written by wasteofserver | Published 2024/04/06

TL;DR: Don't go overboard with optimizations; let the language work for you. Tell a story with your code. Java upgrades give a performance boost for free. Benchmark, always.

Will a LinkedList be faster? Should I swap the `for each` with an `iterator`? Should this `ArrayList` be an `Array`? This article came to be in response to an optimization so malevolent it has permanently etched itself into my memory.

Before going head-on into Java and the ways to tackle interference, either from the garbage collector or from context switching, let's first glance over the fundamentals of writing code for your future self.

Premature optimization is the root of all evil.

You've heard it before; premature optimization is the root of all evil. Well, sometimes. When writing software, I'm a firm believer in being:

  1. as descriptive as possible; you should try to narrate intentions as if you were writing a story.

  2. as optimal as possible; which means that you should know the fundamentals of the language and apply them accordingly.

As Descriptive as Possible

Your code should speak intention, and a lot of it pertains to the way you name methods and variables.

int[] array1 = new int[10];        // bad
int[] numItems = new int[10];      // better
int[] backPackItems = new int[10]; // great

Just by the variable name, you can already infer functionality.

While numItems is abstract, backPackItems tells you a lot about expected behavior.

Or say you have this method:

List<Countries> visitedCountries() {
    if (noCountryVisitedYet) {
        return new ArrayList<>(0);
    }
    // (...)
    return listOfVisitedCountries;
}

As far as code goes, this looks more or less ok.

Can we do better? We definitely can!

List<Countries> visitedCountries() {
    if (noCountryVisitedYet) {
        return Collections.emptyList();
    }
    // (...)
    return listOfVisitedCountries;
}

Reading Collections.emptyList() is much more descriptive than new ArrayList<>(0).

Imagine you're reading the above code for the first time and stumble on the guard clause that checks whether the user has actually visited any countries. Imagine, too, that this is buried in a lengthy class. Reading Collections.emptyList() is definitely more descriptive than new ArrayList<>(0), and the returned list is immutable, so client code can't modify it.
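To make the immutability point concrete, here is a minimal sketch. The class name EmptyListDemo and the helper tryToAdd are made up for illustration; only Collections.emptyList() and new ArrayList<>(0) come from the text above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyListDemo {

    // Hypothetical helper: returns true if the list accepted the element,
    // false if the list rejected the modification.
    static boolean tryToAdd(List<String> countries, String country) {
        try {
            countries.add(country);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(0);
        List<String> immutable = Collections.emptyList();

        System.out.println(tryToAdd(mutable, "Portugal"));   // true
        System.out.println(tryToAdd(immutable, "Portugal")); // false
    }
}
```

The caller that tries to add to the empty result fails fast instead of silently mutating a list nobody else will ever see.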

As Optimal as Possible

Know your language, and use it accordingly. If you need a double, there's no need to wrap it in a Double object. The same goes for using a List if all you actually need is an Array.
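As a rough illustration of the difference, the sketch below sums the same values from a primitive array and from a boxed list. The class and method names are invented for this example, and it assumes Java 9 or later for List.of.

```java
import java.util.List;

public class PrimitiveVsBoxed {

    // Sums a primitive array: the values are read directly,
    // with no wrapper objects involved.
    static double sum(double[] values) {
        double total = 0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    // Sums a list of boxed values: each element is a Double object
    // that has to be unboxed on every iteration.
    static double sum(List<Double> values) {
        double total = 0;
        for (Double value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] primitives = {1.5, 2.5, 3.0};
        List<Double> boxed = List.of(1.5, 2.5, 3.0);

        System.out.println(sum(primitives)); // 7.0
        System.out.println(sum(boxed));      // 7.0
    }
}
```

Both versions return the same result; what changes is the allocation and indirection behind the scenes, which is the kind of thing to measure rather than guess.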

Know that you should concatenate Strings in a loop using StringBuilder, or StringBuffer if you're sharing state between threads:

// don't do this
String votesByCounty = "";
for (County county : counties) {
    votesByCounty += county.toString();
}

// do this instead
StringBuilder votesByCounty = new StringBuilder();
for (County county : counties) {
    votesByCounty.append(county.toString());
}

Know how to index your database. Anticipate bottlenecks and cache accordingly. All the above are optimizations. They are the kind of optimizations that you should be aware of and implement as first-class citizens.
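As one possible reading of "cache accordingly", here is a small memoization sketch. Everything in it is hypothetical: VoteCache, loadVotes and the fake cost model are stand-ins for whatever expensive lookup your application really performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class VoteCache {

    private final Map<String, Integer> votesByCounty = new ConcurrentHashMap<>();
    private final AtomicInteger expensiveLookups = new AtomicInteger();

    // Stand-in for an expensive call (database query, remote service, ...).
    // The returned number is made up; only the call count matters here.
    private int loadVotes(String county) {
        expensiveLookups.incrementAndGet();
        return county.length() * 1000;
    }

    // Returns the cached value, loading it only on the first request.
    int votesFor(String county) {
        return votesByCounty.computeIfAbsent(county, this::loadVotes);
    }

    int expensiveLookups() {
        return expensiveLookups.get();
    }

    public static void main(String[] args) {
        VoteCache cache = new VoteCache();
        cache.votesFor("Lisbon");
        cache.votesFor("Lisbon");
        System.out.println(cache.expensiveLookups()); // 1
    }
}
```

A real cache would also need an eviction or invalidation policy, which this sketch leaves out on purpose.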

How Do You Kill It First?

I'll never forget a hack I read about a couple of years ago. Truth be told, the author backtracked quickly, but it goes to show how a lot of evil can spring from good intentions.

// do not do this, ever!
int i = 0;
while (i < 10000000) {
    // business logic

    if (i % 3000 == 0) { // prevent long gc
        try {
            Thread.sleep(0);
        } catch (InterruptedException ignored) { }
    }
    i++;
}

A garbage collector hack from hell!

You can read more on why and how the above code works in the original article and, while the exploit is definitely interesting, this is one of those things you should never ever do.

  • It works by side effect; Thread.sleep(0) has no apparent purpose in this block
  • It works by exploiting a deficiency of code downstream
  • For anyone inheriting this code, it's obscure and magical

Only start forging something a bit more involved if, after writing with all the default optimizations the language provides, you've hit a bottleneck. But steer away from concoctions such as the above.

How to Tackle That Garbage Collector?

If, after all is said and done, the garbage collector is still the piece that's offering resistance, these are some of the things you may try:

  • If your service is so latency-sensitive that you can't allow for GC, run with "Epsilon GC" and avoid GC altogether.
    -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC

    This will obviously grow your memory until you get an OutOfMemoryError, so either it's a short-lived scenario or your program is optimized not to create objects.

  • If your service is somewhat latency-sensitive, but the allowed tolerance permits some leeway, run G1 and feed it something like -XX:MaxGCPauseMillis=100 (the default is 200 ms).

  • If the issue stems from external libraries, say one of them calls System.gc() or Runtime.getRuntime().gc(), which request a full and typically stop-the-world collection, you can override the offending behavior by running with -XX:+DisableExplicitGC.

VERSION START | VERSION END | DEFAULT GC
Java 1        | Java 4      | Serial Garbage Collector
Java 5        | Java 8      | Parallel Garbage Collector
Java 9        | ongoing     | G1 Garbage Collector

Note 1: since Java 15, ZGC is production-ready, but you still have to explicitly activate it with -XX:+UseZGC.

Note 2: The VM considers a machine server-class if it detects at least two processors and at least 1792 MB of physical memory. If the machine is not server-class, the VM defaults to the Serial GC.
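Since the default depends on both the Java version and the machine, it can be worth asking the running JVM which collector it actually picked. The sketch below uses the standard java.lang.management API; the class name ActiveGc and the helper collectorNames are made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveGc {

    // Names of the collectors the running JVM registered, for example
    // "G1 Young Generation" and "G1 Old Generation" under G1.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints each collector with how often it ran and the
        // accumulated collection time reported by the JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it with and without a flag such as -XX:+UseZGC is a quick way to confirm that the option you passed really took effect.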

In essence, opt for GC tuning when it's clear that the application's performance constraints are directly tied to garbage collection behavior and you have the necessary expertise to make informed adjustments. Otherwise, trust the JVM's default settings and focus on optimizing application-level code.

u/shiphe - you'll want to read the full comment

Other Relevant Libraries You May Want to Explore:

Java Microbenchmark Harness (JMH)

If you're optimizing out of feeling without any real benchmarking, you're doing yourself a disservice. JMH is the de facto Java library to test your algorithms' performance. Use it.

Java-Thread-Affinity

Pinning a thread to a specific core may improve cache hits. It will depend on the underlying hardware and how your routine is dealing with data. Nonetheless, this library makes it so easy to implement that, if a CPU-intensive method is dragging you down, you'll want to test it.

LMAX Disruptor

This is one of those libraries that, even if you don't need it, you'll want to study. The idea is to allow for ultra-low latency concurrency. But the way it's implemented, from mechanical sympathy to the ring buffer, brings a lot of new concepts. I still remember when I first discovered it, seven years ago, pulling an all-nighter to digest it.

Netflix jvmquake

The premise of jvmquake is that when things go sideways with the JVM, you want it to die and not hang. A couple of years ago, I was running simulations on an HTCondor cluster that was on tight memory constraints, and sometimes, jobs would get stuck due to "out of memory" errors.

This library forces the JVM to die, allowing you to deal with the actual error. In this specific case, HTCondor would auto-re-schedule the job.

Final Thoughts

The code that made me write this post? I've written way worse. I still do. The best we can hope for is to continuously mess up less.

I'm expecting to be disgruntled looking at my own code a few years down the road.

And that's a good sign.



Also published on wasteofserver.com


Written by wasteofserver | I can still remember when you could activate CPU turbo by pressing a button on the case.
Published by HackerNoon on 2024/04/06