Billion-Dollar Mistake in Go?

Written by lainio | Published 2020/09/30
Tech Story Tags: golang | go | error-handling | interface | programming | standard-library | package | coding

TLDR: The interface should set its semantics, not the implementer as a static type. The io.Reader interface is not intuitive, complete, or idiomatic because it is missing the error distinction. Code that checks only err != nil exits too early when Read returns data and io.EOF together, which is allowed for all implementers. There are many examples in the standard library itself where callers of the io.Reader interface misuse it. Go's standard library, and especially its tests, are tied to the if err != nil idiom. Interfaces should be defined with the language itself so that you cannot implement or use them incorrectly, and you should not need to read the documentation to do so.

The following sample code is from Go's standard library documentation:
data := make([]byte, 100)
count, err := file.Read(data)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("read %d bytes: %q\n", count, data[:count])
It seems to be ok. It must be correct because it's from the official documentation of the standard library, right?
Let's spend a few seconds figuring out what's wrong with it before reading the documentation of io.Reader, which declares the Read function.
The if statement in the sample should have been written like this (at least):
if err != nil && err != io.EOF {
Did I trick you (and myself)? Why didn’t we check the documentation of the File.Read function? Isn’t it the correct one? Well, it shouldn’t be the only one.
What good are interfaces if we cannot hide the implementation details behind them? The interface should set its semantics, not the implementer, as File.Read did. What happens to the code above when the interface implementer is something other than File but is still an io.Reader? It exits too early when Read returns data and io.EOF together, which is allowed for all io.Reader implementers.
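The early exit can be reproduced with a small sketch. It assumes a reader built with iotest.DataErrReader from the standard testing/iotest package, which delivers its last bytes together with io.EOF. The helper names readNaive, readChecked and newSource are made up for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// readNaive mirrors the documentation sample: any non-nil error,
// including io.EOF, throws away the bytes returned by the same call.
func readNaive(r io.Reader) (string, error) {
	data := make([]byte, 100)
	count, err := r.Read(data)
	if err != nil {
		return "", err
	}
	return string(data[:count]), nil
}

// readChecked treats io.EOF as a normal end and keeps the bytes.
func readChecked(r io.Reader) (string, error) {
	data := make([]byte, 100)
	count, err := r.Read(data)
	if err != nil && err != io.EOF {
		return "", err
	}
	return string(data[:count]), nil
}

// newSource returns a reader that delivers "hello" together with io.EOF.
func newSource() io.Reader {
	return iotest.DataErrReader(strings.NewReader("hello"))
}

func main() {
	got, err := readNaive(newSource())
	fmt.Printf("naive:   %q, err: %v\n", got, err)
	got, err = readChecked(newSource())
	fmt.Printf("checked: %q, err: %v\n", got, err)
}
```

With this source, the naive version reports EOF and loses the data, while the checked version returns "hello".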

Interface vs Implementer

In Go, you don’t need to mark an implementer of an interface explicitly. It’s a powerful feature. But does it mean that the semantics we rely on should always follow the static type? For example, should the following Copy function use io.Reader semantics?
func Copy(dst Writer, src Reader) (written int64, err error) {
	src.Read() // now read semantics come from io.Reader?
	...
}
But should this version use only os.File semantics? (Note, these are just dummy examples.)
func Copy(dst os.File, src os.File) (written int64, err error) {
	src.Read() // and now read semantics come from os.File's Read function?
	...
}
Practice has taught us that it’s always better to rely on interface semantics than to bind yourself to the implementation: the famous loose coupling.
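As a minimal sketch of that loose coupling, the following program passes two unrelated types to one function that knows only io.Reader. The repeatReader type and the drain helper are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// repeatReader is a hypothetical implementer. It never names io.Reader,
// yet it satisfies the interface by having a matching Read method.
type repeatReader struct {
	b    byte
	left int
}

func (r *repeatReader) Read(p []byte) (int, error) {
	if r.left == 0 {
		return 0, io.EOF
	}
	n := len(p)
	if n > r.left {
		n = r.left
	}
	for i := 0; i < n; i++ {
		p[i] = r.b
	}
	r.left -= n
	return n, nil
}

// drain depends only on the io.Reader contract, not on a concrete type.
func drain(src io.Reader) (string, error) {
	var sb strings.Builder
	_, err := io.Copy(&sb, src)
	return sb.String(), err
}

func main() {
	fromString, _ := drain(strings.NewReader("abc"))
	fromCustom, _ := drain(&repeatReader{b: 'x', left: 3})
	fmt.Println(fromString, fromCustom)
}
```

The caller stays correct only as long as every implementer honours the same Read semantics, which is the point of the question above.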

Problems with io.Reader

The interface has the following problems:
  • You cannot safely use any implementation of the Read function without studying the documentation of io.Reader.
  • You cannot implement the Read function without closely studying the documentation of io.Reader.
  • The interface is not intuitive, complete, or idiomatic because it is missing the error distinction.
These problems multiply because io.Reader is an interface. That creates a cross-package dependency between every implementer of io.Reader and every caller of the Read function.
There are many other examples in the standard library itself where callers of the io.Reader interface misuse it.
According to this issue, the standard library, and especially its tests, are tied to the if err != nil idiom, which prevents optimizations in Read implementations.
For instance, you cannot return io.EOF immediately when it’s detected (i.e. together with the remaining data) without breaking some of the callers. The reason is apparent: the reader interface documentation allows two different types of implementations.
When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call.
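A caller that wants to be correct for both allowed behaviours has to process the n bytes before looking at the error. The following is a sketch under that assumption, not the standard library's own loop. The names consume, lateEOF and earlyEOF are invented here, and bufSize is assumed to be greater than zero.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// consume follows the documented contract: use the n bytes first,
// then look at the error, and treat io.EOF as a normal end.
// bufSize is assumed to be greater than zero.
func consume(r io.Reader, bufSize int) (string, error) {
	var out []byte
	buf := make([]byte, bufSize)
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return string(out), nil
		}
		if err != nil {
			return string(out), err
		}
	}
}

// lateEOF reports io.EOF on a separate call with n == 0.
func lateEOF(s string) io.Reader { return strings.NewReader(s) }

// earlyEOF reports io.EOF together with the last bytes.
func earlyEOF(s string) io.Reader {
	return iotest.DataErrReader(strings.NewReader(s))
}

func main() {
	a, _ := consume(lateEOF("abcdef"), 4)
	b, _ := consume(earlyEOF("abcdef"), 4)
	fmt.Println(a, b)
}
```

Both readers yield the same data, but only because the loop was written with the documentation in hand.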
Interfaces should be intuitive and formally defined with the programming language itself so that you cannot implement or use them incorrectly. You should not need to read the documentation to be able to do the necessary error propagation.
It’s problematic that multiple (two in this case) different explicit behaviours of an interface function are allowed. The whole idea of interfaces is that they hide the implementation details and enable loose coupling.
The most obvious problem is that the io.Reader interface is neither intuitive nor idiomatic with respect to Go’s typical error handling. It also breaks the reasoning about separate control paths: normal and error. The interface uses the error transport mechanism for something which isn’t an actual error.
EOF is the error returned by Read when no more input is available. Functions should return EOF only to signal a graceful end of input. If the EOF occurs unexpectedly in a structured data stream, the appropriate error is either ErrUnexpectedEOF or some other error giving more detail.
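The difference between the two EOF values can be seen with io.ReadFull from the standard library, which returns io.EOF only when no bytes were read and io.ErrUnexpectedEOF when the input ends partway. The classify helper and its labels are made up for this sketch.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// classify tries to read exactly want bytes from input and reports
// which outcome the error value encodes.
func classify(input string, want int) string {
	buf := make([]byte, want)
	_, err := io.ReadFull(strings.NewReader(input), buf)
	switch err {
	case nil:
		return "complete"
	case io.EOF:
		return "graceful end"
	case io.ErrUnexpectedEOF:
		return "truncated"
	default:
		return "other error"
	}
}

func main() {
	fmt.Println(classify("abcd", 4))
	fmt.Println(classify("", 4))
	fmt.Println(classify("ab", 4))
}
```

Only the last case is a failure in the usual sense, yet all non-nil outcomes travel through the same error return value.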

Errors as Discriminated Unions

The io.Reader interface and io.EOF show what is missing from Go’s current error handling: the error distinction. For example, Swift and Rust don’t allow partial failure. The function call either succeeds or it fails. That’s one of the problems with Go’s error return values: the compiler cannot offer any support for that distinction. It’s the same well-known problem as with C’s non-standard error returns, where the error return channel overlaps with the normal one.
As Herb Sutter conveniently put it in his C++ proposal, Zero-overhead deterministic exceptions: Throwing values:
“Normal” vs. “error” [control flow] is a fundamental semantic distinction, and probably the most important distinction in any programming language even though this is commonly underappreciated.
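Go has no discriminated unions, but the idea can be approximated with an interface that only a fixed set of types implements, combined with a type switch. The sketch below is purely illustrative: the result, data, end and failure types and the results and describe helpers are hypothetical, and the compiler does not check that the switch is exhaustive.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// result is a hypothetical closed set: exactly one of data, end or failure.
type result interface{ isResult() }

type data struct{ b []byte }     // bytes were read
type end struct{}                // graceful end of input
type failure struct{ err error } // a real error

func (data) isResult()    {}
func (end) isResult()     {}
func (failure) isResult() {}

// results drains r and turns every Read outcome into exclusive results.
// bufSize is assumed to be greater than zero.
func results(r io.Reader, bufSize int) []result {
	var rs []result
	buf := make([]byte, bufSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			rs = append(rs, data{append([]byte(nil), buf[:n]...)})
		}
		if err == io.EOF {
			return append(rs, end{})
		}
		if err != nil {
			return append(rs, failure{err})
		}
	}
}

// describe handles each case in its own branch of a type switch.
func describe(rs []result) string {
	var sb strings.Builder
	for _, r := range rs {
		switch v := r.(type) {
		case data:
			fmt.Fprintf(&sb, "data(%s) ", v.b)
		case end:
			sb.WriteString("end ")
		case failure:
			fmt.Fprintf(&sb, "failure(%v) ", v.err)
		}
	}
	return strings.TrimSpace(sb.String())
}

func main() {
	fmt.Println(describe(results(strings.NewReader("abcd"), 2)))
	fmt.Println(describe(results(iotest.ErrReader(errors.New("boom")), 2)))
}
```

Each value is either data, the end, or a failure, never a mix, which is the exclusivity that plain multiple return values cannot express.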

Solution

Go’s current io.Reader interface is problematic because it violates the semantic distinction.
Adding The Semantic Distinction
First, we stop using error return for something which isn’t an error by declaring a new interface function.
Read(b []byte) (n int, left bool, err error)
Allowing Only Obvious Behaviour
Second, to avoid confusion and prevent obvious mistakes, we guide callers to use the following helper wrapper to handle both of the allowed EOF behaviours. The wrapper offers only one explicit way to process the end of the data. Because the documentation says that returning zero bytes without any error (including EOF) must be allowed (“discouraged from returning a zero byte count with a nil error”), we cannot use zero bytes read as a mark of the EOF. Of course, the wrapper also maintains the error distinction.
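// MyReader wraps an io.Reader and reports the end of data through the
// left return value instead of io.EOF.
type MyReader struct {
	r   io.Reader
	eof bool // set once the wrapped reader has returned io.EOF
}

func (mr *MyReader) Read(b []byte) (n int, left bool, err error) {
	if mr.eof {
		// All data has already been delivered.
		return 0, false, nil
	}
	n, err = mr.r.Read(b)
	mr.eof = err == io.EOF
	if mr.eof {
		err = nil // the end of data is not an error
	}
	// This call may still carry data, so the end is reported by the next call.
	left = true
	return
}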
We made an error distinction rule where error and success results are exclusive. We have used the distinction for the left return value as well. We set it to false when we have already read all the data, which makes the function easier to use, as can be seen in the following for loop. You need to handle incoming data only when left is set, i.e. data is available.
for n, left, err := src.Read(dst); err == nil && left; n, left, err = src.Read(dst) {
	fmt.Printf("read: %d, data left: %v, err: %v\n", n, left, err)
}
As the sample code shows, it allows the happy path and the error control flow to be separated, which makes reasoning about the program much easier. The solution we showed here isn’t perfect because Go’s multiple return values aren’t mutually exclusive.
In our case, they all should be. However, we have learned that every newcomer (including those who are new to Go) can use our new Read function without documentation or sample code. That is an excellent example of how important the semantic distinction between happy and error paths is.
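To check the wrapper against both allowed EOF behaviours, here is a self-contained sketch that combines it with the for loop. The collect, lateEOF and earlyEOF helpers are hypothetical additions for this test, and the wrapper's type is named MyReader to match its method receiver.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// MyReader reports the end of data through left instead of io.EOF.
type MyReader struct {
	r   io.Reader
	eof bool
}

func (mr *MyReader) Read(b []byte) (n int, left bool, err error) {
	if mr.eof {
		return 0, false, nil
	}
	n, err = mr.r.Read(b)
	mr.eof = err == io.EOF
	if mr.eof {
		err = nil
	}
	return n, true, err
}

// collect reads everything from src using only the happy-path condition.
func collect(src io.Reader, size int) (string, error) {
	mr := &MyReader{r: src}
	buf := make([]byte, size)
	var out []byte
	n, left, err := mr.Read(buf)
	for ; err == nil && left; n, left, err = mr.Read(buf) {
		out = append(out, buf[:n]...)
	}
	return string(out), err
}

// lateEOF reports io.EOF on a separate call with n == 0.
func lateEOF(s string) io.Reader { return strings.NewReader(s) }

// earlyEOF reports io.EOF together with the last bytes.
func earlyEOF(s string) io.Reader {
	return iotest.DataErrReader(strings.NewReader(s))
}

func main() {
	a, errA := collect(lateEOF("hello"), 8)
	b, errB := collect(earlyEOF("hello"), 8)
	fmt.Println(a, errA, b, errB)
}
```

Both sources produce the same data with a nil error, and the caller never has to mention io.EOF.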

Conclusion

Can we say that io.EOF is a mistake? I’d say so. There is a perfectly good reason why errors should be distinct from expected returns. We should always build algorithms that favour the happy path and prevent errors.
Go’s error handling practice is still missing language features to support the semantic distinction. Luckily, most of us already treat errors in a distinct control flow.

Written by lainio | Programmer
Published by HackerNoon on 2020/09/30