Billion-Dollar Mistake in Go?

The following sample code is from Go's standard library documentation:

data := make([]byte, 100)
count, err := file.Read(data)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("read %d bytes: %q\n", count, data[:count])

It looks fine. It must be correct because it's from the official documentation of the standard library, right?

Let's spend a few seconds figuring out what's wrong with it before reading the documentation of io.Reader, which declares the Read function.

The if statement in the sample should have been written like this (at least):

if err != nil && err != io.EOF {

Did I trick you (and myself)? Why didn't we check the File.Read function's documentation? Isn't it the correct one? Well, it shouldn't be the only one.

What good are interfaces if we cannot really hide the implementation details behind them? The interface should set its semantics, not the implementer, as File.Read did. What happens to the code above when the interface implementer is something other than File, but still an io.Reader? It exits too early when Read returns data and io.EOF together, which is allowed for all io.Reader implementers.
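The early exit can be reproduced with a minimal sketch. It assumes the standard library's testing/iotest.DataErrReader as a stand-in for an implementer that returns the final bytes and io.EOF from the same call; naiveRead, tolerantRead and eofWithData are hypothetical helper names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// naiveRead mirrors the documentation sample: any non-nil error, io.EOF
// included, makes it drop the bytes returned by the same call.
func naiveRead(r io.Reader) (string, error) {
	buf := make([]byte, 100)
	n, err := r.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

// tolerantRead applies the corrected condition and keeps the data that
// arrives together with io.EOF.
func tolerantRead(r io.Reader) (string, error) {
	buf := make([]byte, 100)
	n, err := r.Read(buf)
	if err != nil && err != io.EOF {
		return "", err
	}
	return string(buf[:n]), nil
}

// eofWithData returns a reader over s that reports io.EOF in the same call
// as the final bytes, which the io.Reader contract permits.
func eofWithData(s string) io.Reader {
	return iotest.DataErrReader(strings.NewReader(s))
}

func main() {
	got, err := naiveRead(eofWithData("hello"))
	fmt.Printf("naive:    %q, err: %v\n", got, err)

	got, err = tolerantRead(eofWithData("hello"))
	fmt.Printf("tolerant: %q, err: %v\n", got, err)
}
```

With this reader the naive version reports an error and loses the five bytes, while the tolerant version keeps them.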

Interface vs Implementer

In Go, you don't need to mark an implementer of the interface explicitly. It's a powerful feature. But does it mean that we should always use interface semantics according to the static type? For example, should the following Copy function use io.Reader semantics?

func Copy(dst Writer, src Reader) (written int64, err error) {
	src.Read() // now read semantics come from io.Reader?
	...
}

But should this version use only os.File semantics? (Note, these are just dummy examples.)

func Copy(dst *os.File, src *os.File) (written int64, err error) {
	src.Read() // and now read semantics come from os.File's Read function?
	...
}

Practice has taught us that it's always better to rely on interface semantics than to bind yourself to an implementation: the famous loose coupling.

Problems with io.Reader

The interface has the following problems:

1. You cannot safely use any implementation of the Read function without studying the documentation of io.Reader.
2. You cannot implement the Read function without closely studying the documentation of io.Reader.
3. The interface is not intuitive, complete, or idiomatic because it lacks the error distinction.

The previous problems multiply because io.Reader is an interface. That creates a cross-package dependency between every implementer of io.Reader and every caller of the Read function.

There are many other examples in the standard library itself where callers of the io.Reader interface misuse it.

According to this issue, the standard library and especially its tests are tied to the if err != nil idiom, which prevents optimizations in Read implementations.

For instance, you cannot return io.EOF immediately when it's detected (i.e. together with the remaining data) without breaking some of the callers. The reason is apparent: the reader interface documentation allows two different types of implementations.

When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call.
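A caller that wants to work with both permitted behaviours has to process the returned bytes before it looks at the error, which is what the io.Reader documentation asks callers to do. The following is a minimal sketch of such a loop. drain, lateEOF and earlyEOF are hypothetical names; strings.NewReader stands in for an implementer that reports io.EOF in a subsequent call, and testing/iotest.DataErrReader for one that reports it together with the last bytes.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// drain reads r to the end. It consumes the n bytes of every call before
// it considers err, so both permitted EOF behaviours give the same result.
func drain(r io.Reader) (string, error) {
	var sb strings.Builder
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		sb.Write(buf[:n]) // data first, error second
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return sb.String(), err
		}
	}
}

// lateEOF reports io.EOF with n == 0 in a call after the final bytes.
func lateEOF(s string) io.Reader {
	return strings.NewReader(s)
}

// earlyEOF reports io.EOF in the same call as the final bytes.
func earlyEOF(s string) io.Reader {
	return iotest.DataErrReader(strings.NewReader(s))
}

func main() {
	a, errA := drain(lateEOF("hello world"))
	b, errB := drain(earlyEOF("hello world"))
	fmt.Printf("late EOF:  %q, err: %v\n", a, errA)
	fmt.Printf("early EOF: %q, err: %v\n", b, errB)
}
```

The loop is correct, but nothing in the type system forces a caller to write it this way, which is the point the text makes.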

Interfaces should be intuitive and formally defined in the programming language itself, so that you cannot implement them incorrectly or misuse them. You should not need to read the documentation to be able to do the necessary error propagation.

It's problematic that the interface function is allowed multiple (two in this case) different explicit behaviours. The whole idea of interfaces is that they hide the implementation details and enable loose coupling.

The most obvious problem is that the io.Reader interface is neither intuitive nor idiomatic with respect to Go's typical error handling. It also breaks the reasoning about separate control paths: normal and error. The interface uses the error transport mechanism for something which isn't an actual error.

EOF is the error returned by Read when no more input is available. Functions should return EOF only to signal a graceful end of input. If the EOF occurs unexpectedly in a structured data stream, the appropriate error is either ErrUnexpectedEOF or some other error giving more detail.
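The difference between a graceful and an unexpected end of input can be seen with io.ReadFull from the standard library, which returns io.EOF only when no bytes were read and io.ErrUnexpectedEOF when the input ends partway through the buffer. The sketch below is only an illustration: classify is a hypothetical helper, and the 4-byte record size is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// classify tries to read one fixed-size 4-byte record from input and
// describes how the read ended.
func classify(input string) string {
	buf := make([]byte, 4)
	_, err := io.ReadFull(strings.NewReader(input), buf)
	switch err {
	case nil:
		return "complete"
	case io.EOF:
		return "graceful end" // no bytes at all: the stream simply ended
	case io.ErrUnexpectedEOF:
		return "truncated" // the stream ended in the middle of a record
	default:
		return "other error"
	}
}

func main() {
	for _, in := range []string{"abcd", "", "ab"} {
		fmt.Printf("%q: %s\n", in, classify(in))
	}
}
```

Even here the caller must inspect an error value to learn about a situation that is not a failure, namely that the stream has ended normally.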

Errors as Discriminated Unions

The io.Reader interface and io.EOF show what is missing from Go's current error handling: the error distinction. For example, Swift and Rust don't allow partial failure. The function call either succeeds or it fails. That's one of the problems with Go's error return values: the compiler cannot offer any support for the distinction. It's the same well-known problem as with C's non-standard error returns, where the error return channel overlaps with the normal one.

Herb Sutter put it well in his C++ proposal, Zero-overhead deterministic exceptions: Throwing values:

“Normal” vs. “error” [control flow] is a fundamental semantic distinction, and probably the most important distinction in any programming language even though this is commonly underappreciated.

Solution

Go's current io.Reader interface is problematic because it violates the semantic distinction.

Adding The Semantic Distinction

First, we stop using error return for something which isn’t an error by declaring a new interface function.

Read(b []byte) (n int, left bool, err error)

Allowing Only Obvious Behaviour

Second, to avoid confusion and prevent obvious mistakes, we recommend the following helper wrapper, which handles both of the allowed EOF behaviours. The wrapper offers only one explicit way to process the end of the data. Because the documentation says that returning zero bytes without any error (including EOF) must be allowed ("discouraged from returning a zero byte count with a nil error"), we cannot use zero bytes read as the EOF marker. Of course, the wrapper also maintains the error distinction.

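// MyReader wraps an io.Reader and reports the end of the data through
// the left flag instead of through io.EOF.
type MyReader struct {
	r   io.Reader
	eof bool // set once the wrapped reader has returned io.EOF
}

// Read returns left == true whenever b[:n] should be processed and
// left == false once all the data has been delivered. It never returns io.EOF.
func (mr *MyReader) Read(b []byte) (n int, left bool, err error) {
	if mr.eof {
		return 0, false, nil
	}
	n, err = mr.r.Read(b)
	mr.eof = err == io.EOF
	if mr.eof {
		err = nil // the end of the data is not an error
	}
	// The bytes of this call, possibly delivered together with io.EOF,
	// still have to be processed, so left stays true until the next call.
	return n, true, err
}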

We made an error distinction rule where error and success results are exclusive. We have applied the same distinction to the left return value. It is set to false once all the data has been read, which makes the function easier to use, as the following for loop shows. You need to handle incoming data only when left is set, i.e. data is available.

for n, left, err := src.Read(dst); err == nil && left; n, left, err = src.Read(dst) {
	fmt.Printf("read: %d, data left: %v, err: %v\n", n, left, err)
}
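The wrapper and the loop can be combined into a small runnable sketch. It assumes the wrapper type is called MyReader, as its receiver suggests. collect, wrapLate, wrapEarly and wrapFailing are hypothetical helpers added for the illustration, and the readers from testing/iotest only simulate the two EOF behaviours and a failing source.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// MyReader reports the end of the data through left instead of io.EOF.
type MyReader struct {
	r   io.Reader
	eof bool
}

func (mr *MyReader) Read(b []byte) (n int, left bool, err error) {
	if mr.eof {
		return 0, false, nil
	}
	n, err = mr.r.Read(b)
	if err == io.EOF {
		mr.eof = true
		err = nil // the end of the data is not an error
	}
	return n, true, err
}

// collect drains src with the loop shape from the text. err is declared
// outside the loop so that it can be returned afterwards.
func collect(src *MyReader) (string, error) {
	var out []byte
	buf := make([]byte, 4)
	n, left, err := src.Read(buf)
	for ; err == nil && left; n, left, err = src.Read(buf) {
		out = append(out, buf[:n]...)
	}
	return string(out), err
}

// wrapLate wraps a reader that reports io.EOF after the final bytes.
func wrapLate(s string) *MyReader {
	return &MyReader{r: strings.NewReader(s)}
}

// wrapEarly wraps a reader that reports io.EOF together with the final bytes.
func wrapEarly(s string) *MyReader {
	return &MyReader{r: iotest.DataErrReader(strings.NewReader(s))}
}

// wrapFailing wraps a reader whose every Read fails with a real error.
func wrapFailing() *MyReader {
	return &MyReader{r: iotest.ErrReader(errors.New("disk failure"))}
}

func main() {
	for _, src := range []*MyReader{wrapLate("hello world"), wrapEarly("hello world"), wrapFailing()} {
		data, err := collect(src)
		fmt.Printf("data: %q, err: %v\n", data, err)
	}
}
```

The same loop handles both EOF behaviours without mentioning io.EOF, and a genuine error is the only thing that reaches the caller through err.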

As the sample code shows, this allows the happy path and the error control flow to be separated, which makes reasoning about the program much easier. The solution we showed here isn't perfect because Go's multiple return values aren't distinct: the language does not prevent a function from returning data and an error at the same time.

In our case, they all should be. However, we have learned that every newcomer (including those who are new to Go) can use our new Read function without documentation or sample code. That is an excellent example of how important the semantic distinction between the happy and error paths is.

Conclusion

Can we say that io.EOF is a mistake? I'd say so. There is a perfectly good reason why errors should be distinct from expected returns. We should always build algorithms that favour the happy path and prevent errors.

Go's error handling practice is still missing language features that support the semantic distinction. Luckily, most of us already treat errors in a distinct control flow.