How to Remove Duplicates in Go Slices

Written by vgukasov | Published 2021/11/10
Tech Story Tags: go | golang | algorithms | programming | learn-to-code | duplicates | software-development | data-structures

TLDRGolang is a great language with a rich standard library, but it still has some useful functions. Let’s imagine that there is a need to write a function that makes the user IDs slice unique. This approach covers your needs if you have problems with performance and can mutate the input slice. We sort the slice and then iterate through it via 2 iterators: unique iterator and the usual. After all iterations, we return the subslice up to a unique iterator.via the TL;DR App

Besides Golang being a great language with a rich standard library, it still has lacks some useful functions. Today we’re gonna learn the ways of removing duplicates in your slices. Let’s imagine that there is a need to write a function that makes the user IDs slice unique.

Approach #1: Safe Go way

This approach is beneficial if you need to keep an input slice immutable. We make a new slice and only put the unique items there. Also, we check the uniqueness by using the helper struct — map.

func RemoveDuplicates(userIDs []int64) []int64 {
    // we need only to check the map's keys so we use the empty struct as values 
    // since it consumes 0 bytes of memory.
	processed := make(map[int64]struct{})

	uniqUserIDs := make([]int64, 0)
	for _, uid := range userIDs {
        // if the user ID has been processed already, we skip it
		if _, ok := processed[uid]; ok {
			continue
		}

        // append a unique user ID to the resulting slice.
		uniqUserIDs = append(uniqUserIDs, uid)

        // mark the user ID as existing.
		processed[uid] = struct{}{}
	}

	return uniqUserIDs
}

Approach #2: Effective in-place

This approach covers your needs if you have problems with performance and can mutate the input slice. We sort the slice and then iterate through it via 2 iterators: unique iterator and the usual. After all iterations, we return the subslice up to a unique iterator.

import (
	"sort"
)

func RemoveDuplicatesInPlace(userIDs []int64) []int64 {
    // if there are 0 or 1 items we return the slice itself.
	if len(userIDs) < 2 {
		return userIDs
	}

    // make the slice ascending sorted.
	sort.SliceStable(userIDs, func(i, j int) bool { return userIDs[i] < userIDs[j] })

	uniqPointer := 0

	for i := 1; i < len(userIDs); i++ {
        // compare a current item with the item under the unique pointer.
        // if they are not the same, write the item next to the right of the unique pointer.
		if userIDs[uniqPointer] != userIDs[i] {
			uniqPointer++
			userIDs[uniqPointer] = userIDs[i]
		}
	}

	return userIDs[:uniqPointer+1]
}

Benchmarks

Let’s measure both approaches and see how they differ in performance. We generate a random slice via the helper function:

func generateSlice() []int64 {
	s := make([]int64, 0, l)

	for i := 0; i < 1_000_000; i++ {
		s = append(s, rand.Int63())
	}

	return s
}

Benchmarks code:

func BenchmarkRemoveDuplicates(b *testing.B) {
	s := generateSlice()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		RemoveDuplicates(s)
	}
}

func BenchmarkRemoveDuplicatesInPlace(b *testing.B) {
	s := generateSlice()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		RemoveDuplicatesInPlace(s)
	}
}

Result for 1000 (1k) random items:

goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
BenchmarkRemoveDuplicates
BenchmarkRemoveDuplicates-8          	   14552	     82441 ns/op	   64115 B/op	      76 allocs/op
BenchmarkRemoveDuplicatesInPlace
BenchmarkRemoveDuplicatesInPlace-8   	  238521	      4852 ns/op	      56 B/op	       2 allocs/op

Result for 1 million (1kk) random items:

goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
BenchmarkRemoveDuplicates
BenchmarkRemoveDuplicates-8          	       6	 169626486 ns/op	95007038 B/op	   38356 allocs/op
BenchmarkRemoveDuplicatesInPlace
BenchmarkRemoveDuplicatesInPlace-8   	      82	  13069547 ns/op	      56 B/op	       2 allocs/op

We clearly see that approach #2 is approximately 15 times faster than #1 and it doesn’t allocate new memory.

Conclusion

Now we know how to remove duplicates in the Go slice in different ways. Depending on your project and goals you can use the best approach that fits your needs.


Written by vgukasov | Senior SWE at Akma Trading
Published by HackerNoon on 2021/11/10