Go vs Rust: A Sto-array of Arrays

Written by mslapek | Published 2022/01/24
Tech Story Tags: rustlang | golang | bug | arrays | coding | programming | data | software-development

TLDRThe case study shows data corruption caused by the naive use of append(...) in Go - which is a quite often used operation of pushing an element to the end of an array. The memory model in Rust blocks the invalid operation during compile time. This security net is at cost of the steeper learning curve of Rust.via the TL;DR App

Want to see disappearing data in Go caused by an innocent append of data to an array? Can this happen in Rust? Check this out!
I was playing with Go and Rust. When I was reading slices intro in Go, the following question popped up in my head:
For Go: When
append(...)
is safe?
After deeper thinking, I realized, that the answer is the same as for the next question!
For Rust: What's the difference between
Vec<SomeType>
and
&[SomeType]
?
The text answers the questions giving an interesting insight into design differences between the two languages. The case study below shows data corruption caused by naive use of
append(...)
.
Don't forget to look up the attached GIT repo!

Album software in Go

Let's write simple album software to manage photos. This will be a web server in Go.
Meet Bob who is passionate about photography.
Bob likes to take photos. He gives them names, to be able to find them easily at drunken parties.
// Photo was taken in particular time using camera.
// Has non-unique name given by user.
type Photo struct {
	ID       int
	Name     string
	DateTime time.Time
}
Bob groups photos into albums, such as "Nature", "Cars", or "Egypt".
// Album is collection of photos sorted by time.
// Has non-unique name given by user.
type Album struct {
	ID     int
	Name   string
	Photos []Photo
}
He wants to easily find photos from his visit to Boston (which took place from 2019-06-15 to 2019-06-17).
// GetPhotos returns photos in given time interval.
func (album *Album) GetPhotos(start, end time.Time) []Photo {
	var i, j int
	// because album is already sorted by time,
	// we just calculate i and j containing photos
	// from the requested interval.

	... // omitted

	return p[i:j]
}
Bob wants to impress friends by showing vacation pictures with pyramids.
// GetHolidayPhotos gets photos from June.
func GetHolidayPhotos(album *Album, year int) []Photo {
	photos := album.GetPhotos(
		time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
		time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
	)

	return photos
}
Seems OK.
Just one more thing...
Bob would like also to show photos from the winter holidays at his mother's home. Fortunately, we can adapt
GetHolidayPhotos
with
append(...)
:
decemberPhotos := album.GetPhotos(
	time.Date(year, 12, 1, 0, 0, 0, 0, time.UTC),
	time.Date(year+1, 1, 1, 0, 0, 0, 0, time.UTC),
)
// photos already has photos from June.
photos = append(photos, decemberPhotos...)
Of course, we wrote nice unit tests like this:
func TestGetHolidayPhotos(t *testing.T) {
	a := createAlbum() // factory creating example album

	got := GetHolidayPhotos(a, 2021)
	want := []Photo{
		{
			ID:       2,
			Name:     "Camellia",
			DateTime: time.Date(2021, 6, 12, 14, 0, 0, 0, time.UTC),
		},
		..., // omitted
		{
			ID:       7,
			Name:     "More snow",
			DateTime: time.Date(2021, 12, 15, 14, 0, 0, 0, time.UTC),
		},
	}

	if !reflect.DeepEqual(got, want) {
		t.Errorf("Got = %v, want %v", got, want)
	}
}
Tests passing, git commit, push, done!

The next day...

You had a nightmare where Bob's photos from Egypt were eaten by mummies. Maybe something is wrong with our code?
GetHolidayPhotos
is a query, in other words, it shouldn't modify the album. Let's make additional unit test:
func TestGetHolidayPhotosNotMutatingInput(t *testing.T) {
	album := createAlbum()

	// (1) we serialize album to JSON.
	serializedAlbum, err := json.Marshal(album)
	if err != nil {
		t.Fatalf("Got error %v", err)
	}

	// (2) perform the query.
	GetHolidayPhotos(album, 2021)

	// (3) serialize after the query.
	postmortemSerializedAlbum, err := json.Marshal(album)
	if err != nil {
		t.Fatalf("Got error %v", err)
	}

	// (4) serialized album shouldn't change.
	if !bytes.Equal(postmortemSerializedAlbum, serializedAlbum) {
		t.Errorf(
			"Got postmortemSerializedAlbum = %v, want serializedAlbum = %v",
			string(postmortemSerializedAlbum[:]),
			string(serializedAlbum[:]),
		)
	}
}
What do you expect - will it pass?
$ go test
--- FAIL: TestGetHolidayPhotosNotMutatingInput (0.00s)
    album_test.go:155: Got postmortemSerializedAlbum = {"ID":0, ...
FAIL
exit status 1
It failed!
GetHolidayPhotos
is corrupting input data!

What happened?

Let's look at what happens inside
GetHolidayPhotos
in the failing test.
First, we get photos from June:
photos := album.GetPhotos( /* June date range */ )
Now
photos
points to contents of
album.Photos
-
photos
is equal to
album.Photos[2..4]
.
It's how Go sees
photos
slice:
The diagram has a few parts:
  • All items (gray, orange, blue) are in
    album.Photos
    . For example,
    album.Photos[2]
    has a photo of camellia.
  • Orange items are referenced by the
    photos
    slice. For example,
    photos[1]
    has a photo of daisy. The length of the slice is
    len(photos) = 2
    .
Next December photos are appended:
photos = append(photos, decemberPhotos...)
Blue elements are treated by Go as free slots for
append
.
Let's see how
photos
slice look after the append:
Some blue elements were replaced with photos from December. Thus
photos
indeed contains photos from June and December.
However, as you can see from the diagram, this operation destroyed the contents of
album.Photos
!
Photo of "Forget Me Not" was discarded, and there is a duplicate photo of snow from 2021-12-11 in
album.Photos[4]
and
album.Photos[6]
.
This effect was already described from a technical perspective. Below I show a simple rule of thumb to avoid such issues from a semantic perspective.

What is array duality?

I propose in this article two forms in which you can meet arrays:
  • owned - you can append to the array,
  • borrowed - you cannot append to the array.
Owned are used to store data, which might shrink or expand. It could be a repository of users or a list of currently opened files.
Borrowed are pointing to another array or its subarray. It's a kind of view. Could be used to pass one sentence from a string to a function. Might be returned as a query result.
In our album software
album.Photos
is owned array, because the album contains/manages photos.
On the other hand,
album.GetPhotos(start_time, end_time)
gives borrowed array - it returns the slice
album.Photos[i:j]
. This way it does not have to allocate a new array - instead, it points to the original array from the album.
Let's look, at how the duality looks in Go and Rust.

In Go

In Go, we need a bit of self-discipline from a developer. There is no clear indicator, of whether an array is owned or borrowed.
However, sometimes it's easy to infer this from context. In case of
Album
, we know that album contains photos. We could add a method
AddPhoto(photo Photo)
to make this explicit.
When we subslice an array, usually the result is borrowed array, as in
album.GetPhotos
.
The exception is subslicing used to pop elements from the end of an array:
// myPhotos[5], myPhotos[6], myPhotos[7], ..., are discarded.
myPhotos = myPhotos[:5]

In Rust

This whole "array duality" was inspired by the ownership model in Rust. The distinction between owned and borrowed array is explicitly marked by type.
In Rust owned array is represented by
Vec<Photo>
. You can push and pop elements:
let mut photos = Vec::new();
photos.push(Photo { ... });
A borrowed array is represented by a slice
&[Photo]
. You get it by subslicing an array:
let june_photos = &photos[2..5];
However, Rust forbids to append to borrowed array:
june_photos.push(Photo { ... });
gives compile error:
error[E0599]: no method named push found for reference &[Photo] in the current scope
That's it!
Let's summarize it with a table:

Bugfix

Let's make a walk-through and:
  • ask whether a given array is owned or borrowed,
  • make sure, that we do not append to borrowed arrays.
This way the bug in the Go server should be fixed.
The original code:
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(album *Album, year int) []Photo {
	photos := album.GetPhotos(
		time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
		time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
	)
	// (1) album.GetPhotos gives *borrowed* array.

	decemberPhotos := album.GetPhotos(
		... // omitted
	)
	photos = append(photos, decemberPhotos...)
	// (2) we are appending to *borrowed* array - BAD!

	return photos
}
We must ensure, that we do not
append(...)
to a borrowed array. Fixed code:
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(album *Album, year int) []Photo {
	var photos []Photo
	// (1) we create a new empty array.
	// It's *owned* array.

	junePhotos := album.GetPhotos(
		time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
		time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
	)	
	photos = append(photos, junePhotos...)
	// (2) Appending to *owned* array - OK.

	decemberPhotos := album.GetPhotos(
		time.Date(year, 12, 1, 0, 0, 0, 0, time.UTC),
		time.Date(year+1, 1, 1, 0, 0, 0, 0, time.UTC),
	)
	photos = append(photos, decemberPhotos...)
	// (3) Appending to *owned* array - OK.

	return photos
}
Notice that appending to empty slice in Go is essentially a copy of the array in the argument.
Now test
TestGetHolidayPhotosNotMutatingInput
passes!

Rewrite It in Rust!

Out of curiosity, I've ported the software to Rust.
Album
and
Photo
structs are defined similarly as in Go. It's noteworthy that
Album
has explicitly owned an array of photos:
/// Album is a collection of photos sorted by time.
/// Has a non-unique name given by the user.
#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)]
pub struct Album {
    pub id: u64,
    pub name: String,
    pub photos: Vec<Photo>,
}

album.get_photos(..)
returns borrowed array, notice returned type:
impl Album {
    /// Returns photos in given time interval.
    pub fn get_photos(
        &self,
        start: DateTime<Utc>,
        end: DateTime<Utc>,
    ) -> &[Photo] {
        let mut i: usize;
        let mut j: usize;
        // because album is already sorted by time,
        // we just calculate i and j containing photos
        // from the requested interval.

        ... // omitted

        &self.photos[i..j]
    }
}

And
get_holiday_photos
:
/// Gets photos from June and December.
pub fn get_holiday_photos(album: &Album, year: i32) -> Vec<Photo> {
    let mut photos = Vec::new();
    // (1) we create a new empty array.
    // It's *owned* array, because it's type is `Vec<Photo>`.

    let june_photos = album.get_photos(
        DateTime::from_utc(
            NaiveDate::from_ymd(year, 6, 1).and_hms(0, 0, 0),
            Utc,
        ),
        DateTime::from_utc(
            NaiveDate::from_ymd(year, 7, 1).and_hms(0, 0, 0),
            Utc,
        ),
    );    
    photos.extend_from_slice(june_photos);
    // (2) Appending to *owned* array - OK.

    let december_photos = album.get_photos(
        ... // omitted
    );    
    photos.extend_from_slice(december_photos);
    // (3) Appending to *owned* array - OK.

    photos
}

Trade-off

This case study highlights the differences between Go and Rust.
Rust requires a developer to declare explicitly how data will be used. Thus the compiler can statically verify this promise - which is useful for a team of developers. Go requires more self-discipline (the most notable example is passing data among threads).
Sometimes a library can take advantage of such declarations - for instance, JSON deserialization can take a borrowed substring of input, instead of copying it (a.k.a. "zero-copy deserialization") - more in the documentation of serde library.
On the other hand, Go has a lower learning curve - no discussion about vectors, ownership, or async stuff. A lot of problems can be solved by defensive programming, like array copying before append or sharing memory by communicating. You do not have to know about the existence of cows in the standard library...

Cows?

Up to this point, I've discussed two disjoint scenarios:
  • always returning from function owned array,
  • always returning from function borrowed array.
However, there is a third case: function might at runtime dynamically decide, whether to return owned or borrowed array.
Bob would like to show his holiday photos, but only from summer.
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(
	album *Album,
	year int,
	june, december bool,
) []Photo
Instead of allocating always a new (owned) array, we can sometimes return borrowed array:
junePhotos := album.GetPhotos(
	... // omitted
)
// (1) `junePhotos` is borrowed slice from album.Photos.

if june && !december {	
	return junePhotos
	// (2) returning in this case *borrowed* array.
}

// (3) Original code returning *owned* array.
As was shown in the table, owned and borrowed arrays have the same type in Go.
And in Rust? Which type should be returned?
/// Gets photos from June and December.
pub fn get_holiday_photos(
    album: &Album,
    year: i32,
    june: bool,
    december: bool,
) -> Vec<Photo> // or &[Photo] ?
Let's try similar optimization as in Go:
if june && !december {
	return june_photos; // returning &[Photo]
}
gives error
error[E0308]: mismatched types
because
Vec<Photo>
is a different thing than
&[Photo]
.
This shows the steeper learning curve in Rust caused by the requirement of an explicit declaration of an array type (owned vs borrowed). To know how to solve this, you must be aware that cows exist and read the mysteriously named article "The Secret Life of Cows"!
Rust has a distinct type for such dynamic ownership:
Cow<[Photo]>
, which might contain owned or borrowed array. The first time I've met a cow in the wild was the search and replace string function in regex library.
For the sake of completeness, I give there updated table:

Conclusions

The text described "array duality" in Go and Rust, showing when it is correct to append a new value to some array.
The way the concept is handled in these post-2010 languages shows philosophical differences between them. Rust requires a developer to explicitly state the duality of each array in code. On the other hand, Go doesn't burden developers with such annotations, instead requires self-discipline from a developer.
This article was first published here

Written by mslapek | Plays with machine learning. Likes to solve problems in novel ways.
Published by HackerNoon on 2022/01/24