
New Day 13 - Wave and Collapse Pt 3


Title page of Worlds in the Making: The Evolution of the Universe by Svante Arrhenius

In which we (maybe) finish playing with something almost, but not entirely, unlike wave function collapse.


But First…

The only thing better than writing code is deleting code.

GitLab diff showing the removal of the ESTM function

At this point, the ESTM function is just XSTM with a neighbor distance of 1. Some of the other functions combine easily as well, letting me remove 300+ lines from wfcish.go. It’s good to go through and consolidate occasionally, especially when looking at future refactors that might impact many call sites.
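
The shape of that kind of consolidation, as a hedged sketch (the signatures here are hypothetical stand-ins, not the real ones from wfcish.go): once the generalized function takes a neighbor distance, the specialized one either disappears in favor of its callers passing 1, or survives as a one-liner.

// Hypothetical sketch of the consolidation pattern (real signatures differ):
// the specialized function becomes a one-line delegate to the generalized
// one with a neighbor distance of 1, or its callers switch to the
// generalized version directly and the specialized one gets deleted.
func ESTM(waveMap *WaveMapContents, ruleSet RuleMap) error {
  return XSTM(waveMap, ruleSet, 1) // hypothetical XSTM signature
}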


Two Optimizations

Restructure the Rules

Previously, we’d been using a “list of rules” structure:

pkg/wfcish/wfcish.go
// Allowed neighbor rule
type NeighborRule struct {
  HomeGraphic     uint16
  NeighborGraphic uint16
  Offsets         map[Offset]bool
}

But the way we tend to look up these rules is by the tile (HomeGraphic above), then offset, to see what neighbor graphics are allowed. So we shift the structure to align to that pattern:

pkg/wfcish/wfcish.go
type Offset struct {
  X, Y int
}

type TileMap map[uint16]bool

type OffsetMap map[Offset]TileMap

// The RuleMap is keyed by the central tile, and contains a map of offsets to
// maps of neighbor tiles at that offset.
type RuleMap map[uint16]OffsetMap

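As a quick illustration of why this shape helps, a rule check now reads the way we ask the question: given this tile, at this offset, is that neighbor allowed? A minimal sketch with made-up tile values:

// Minimal usage sketch for RuleMap (the tile values are arbitrary).
// Reads against missing keys fall through nil maps to the zero value,
// so an unrecorded combination simply comes back as "not allowed".
ruleSet := make(RuleMap)
home, neighbor := uint16(3), uint16(7)
offset := Offset{X: 1, Y: 0}

// Recording a rule means creating the nested maps first.
ruleSet[home] = OffsetMap{offset: TileMap{neighbor: true}}

// Checking a rule is a single chained index: home tile, then offset,
// then candidate neighbor tile.
if ruleSet[home][offset][neighbor] {
  // the neighbor tile is allowed one cell to the right of the home tile
}
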
Then deriving the ruleset becomes pretty straightforward:

pkg/wfcish/wfcish.go
func DeriveRuleSetExtended(cells map[world.CellKey]world.Cell, distance int, ignoreNulls bool) RuleMap {
  neighborRules := make(RuleMap)

  for cellKey, cell := range cells {
    if ignoreNulls && cell.Graphic == "  " {
      continue
    }
    for relX := -distance; relX <= distance; relX++ {
      for relY := -distance; relY <= distance; relY++ {
        if relX == 0 && relY == 0 {
          continue
        }
        neighborKey := world.CellKey{X: cellKey.X + relX, Y: cellKey.Y + relY, Z: cellKey.Z}
        neighborGraphic := "  " // default to null
        if neighbor, success := cells[neighborKey]; success {
          neighborGraphic = neighbor.Graphic
        } else if !ignoreNulls {
          continue
        }
        offset := Offset{X: relX, Y: relY}
        if _, success := neighborRules[world.StringToUint16(cell.Graphic)]; !success {
          neighborRules[world.StringToUint16(cell.Graphic)] = make(OffsetMap)
        }
        if _, success := neighborRules[world.StringToUint16(cell.Graphic)][offset]; !success {
          neighborRules[world.StringToUint16(cell.Graphic)][offset] = make(TileMap)
        }
        ((neighborRules[world.StringToUint16(cell.Graphic)])[offset])[world.StringToUint16(neighborGraphic)] = true
      }
    }
  }

  return neighborRules
}

And so is applying those rules:

pkg/wfcish/wfcish.go
func ApplyRulesExtended(waveMap *WaveMapContents, ruleSet RuleMap, cellKey world.CellKey, distance int) error {
  cell := waveMap.getCell(cellKey.X, cellKey.Y, cellKey.Z)
  cellTile := cell.only()

  // Loop through the neighbors to limit their possible tiles.
  for relX := -distance; relX <= distance; relX++ {
    for relY := -distance; relY <= distance; relY++ {
      if relX == 0 && relY == 0 {
        continue // self, not neighbor
      }
      if cellKey.X+relX < 0 || cellKey.Y+relY < 0 || cellKey.X+relX > waveMap.Xdim-1 || cellKey.Y+relY > waveMap.Ydim-1 {
        continue // out of bounds
      }
      // Does such a neighbor exist?
      var neighborTile uint16
      offset := Offset{X: relX, Y: relY}
      neighbor := waveMap.getCell(cellKey.X+relX, cellKey.Y+relY, cellKey.Z)
      if neighbor != nil && !neighbor.isCollapsed() {
        // Loop through the neighbor's possible tiles.
        for _, neighborTile = range neighbor.possibleTiles() {
          if !ruleSet[cellTile][offset][neighborTile] {
            neighbor.zeroWeight(neighborTile) // Zero out its weight.
          }
        }
        if neighbor.numPossibleTiles() == 0 {
          neighborKey := world.CellKey{X: cellKey.X + relX, Y: cellKey.Y + relY, Z: cellKey.Z}
          return errors.New(fmt.Sprint("After tile", world.Uint16ToString(cellTile), "(", cellTile,
            ") denies neighbor ", world.Uint16ToString(neighborTile), "(", neighborTile, "), cell",
            neighborKey, "has no possible tiles", neighbor))
        }
      }
    }
  }
  return nil
}

How’s that help us?

$ time go run cmd/xstm/main.go --mapname Annwn --maplevel 4 --method xstm4 -neighbor-distance 4 --max-retries 0 --seed 0 --max-seed 1000 --seek-valid --cpuprofile cpu4.prof

...

FAILED to collapse map.

real	0m44.555s
user	0m54.697s
sys	0m2.177s

$ go tool pprof -http=localhost:1415 cpu4.prof
CPU profile, using better rule struct

Better, and may enable further optimizations later on.

A Few Small Bits of Concurrency

So far we’ve been tweaking and optimizing for a single thread of execution (garbage collection aside). Since concurrency is one of the primary features of the Go language, we should probably get around to using it.

Protect the globals

Since floating-point math takes time, we’ve been caching the results of math.Log2() calls used in the entropy calculations. It’s always integer weights coming in, so we have slices of float64 indexed by those integers; one for math.Log2(float64(i)) and one for float64(i)*math.Log2(float64(i)). If we want multiple calculations to happen concurrently, we’ll need to make sure these global structures are protected. How often do we need to expand that cache? After a tiny bit of instrumentation:

$ go run cmd/xstm/main.go --mapname Annwn --maplevel 4 --method xstm4 -neighbor-distance 4 --max-retries 0 --seed 0 --max-seed 2000 --seek-original
...
FAILED to collapse map.
log2cache: Cache hits: 21026036, misses: 3
nlog2ncache: Cache hits: 125213093, misses: 2

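(Those hit/miss counts came from a bit of throwaway instrumentation: nothing fancier than a pair of counters bumped inside the cache lookup. A sketch along these lines, with made-up counter names, before any locking is added:)

// Throwaway instrumentation sketch (counter names are made up): count how
// often the log2 cache already holds the requested value versus how often
// it has to grow. sync/atomic keeps the counts honest if goroutines join in.
var log2CacheHits, log2CacheMisses uint64

func cachedLog2WithCounters(n int) float64 {
  if len(log2cache) <= n {
    atomic.AddUint64(&log2CacheMisses, 1) // cache too short: a miss
    for i := len(log2cache); i <= n; i++ {
      log2cache = append(log2cache, math.Log2(float64(i)))
    }
  } else {
    atomic.AddUint64(&log2CacheHits, 1) // value already cached: a hit
  }
  return log2cache[n]
}
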
Expecting many, many more reads than writes, we’ll use a sync.RWMutex:

pkg/wfcish/wfcish.go
var (
  log2CacheLock   sync.RWMutex
  nlog2nCacheLock sync.RWMutex
  log2cache       []float64
  nlog2ncache     []float64
)

We’ll take a read lock before looking at the cache. If it turns out we need to extend the cache (i.e. we haven’t yet seen a value this high), we’ll drop the read lock and take the write lock before mutating the cache. (Note that we check the length a second time, since another goroutine could have extended the cache between us dropping the read lock and acquiring the write lock.)

pkg/wfcish/wfcish.go
func cachedLog2(n int) float64 {
  log2CacheLock.RLock()
  defer log2CacheLock.RUnlock()
  if len(log2cache) <= n {
    log2CacheLock.RUnlock()
    extendCachedLog2(n)
    log2CacheLock.RLock()
  }
  result := log2cache[n]
  return result
}

func cachedNLog2N(n int) float64 {
  nlog2nCacheLock.RLock()
  defer nlog2nCacheLock.RUnlock()
  if len(nlog2ncache) <= n {
    nlog2nCacheLock.RUnlock()
    extendCachedNLog2N(n)
    nlog2nCacheLock.RLock()
  }
  result := nlog2ncache[n]
  return result
}

func extendCachedLog2(n int) {
  log2CacheLock.Lock()
  defer log2CacheLock.Unlock()
  if len(log2cache) <= n {
    for i := len(log2cache); i < n+1; i++ {
      log2cache = append(log2cache, math.Log2(float64(i)))
    }
  }
  return
}

func extendCachedNLog2N(n int) {
  nlog2nCacheLock.Lock()
  defer nlog2nCacheLock.Unlock()
  if len(nlog2ncache) <= n {
    for i := len(nlog2ncache); i < n+1; i++ {
      nlog2ncache = append(nlog2ncache, float64(i)*math.Log2(float64(i)))
    }
  }
  return
}

Update All Entropies Concurrently

Now that we’ve protected those caches, we can launch a set of goroutines to update all of the cell entropies that have become invalid:

pkg/wfcish/wfcish.go
func (wm *WaveMapContents) updateCellEntropies() {
  // Asynchronously make sure the entropy is valid in each interesting cell
  // (not nil or collapsed)

  // Which cells can we fan out?
  fanOuts := make([]*SuperpositionCell, 0, 50)
  for _, cell := range wm.cellArray {
    if cell == nil || cell.isCollapsed() || cell.entropyValid {
      continue
    }
    fanOuts = append(fanOuts, cell)
  }

  if len(fanOuts) > 10 {
    var wg sync.WaitGroup
    wg.Add(len(fanOuts))
    for _, cell := range fanOuts {
      go wm.asyncCalculateEntropy(cell, &wg)
    }
    // fmt.Println("Waiting for entropies:", len(fanOuts))
    wg.Wait()
  } else {
    for _, cell := range fanOuts {
      wm.calculateEntropy(cell)
    }
  }
}

(Note that we only take on the overhead of the goroutines if there are more than 10 cells to update; a rule of thumb we’d want to revisit in any further optimizations.)

The only real difference in the “async” CalculateEntropy() function is a call to the WaitGroup’s Done() when it exits, and the lack of a return value:

pkg/wfcish/wfcish.go
func (wm *WaveMapContents) asyncCalculateEntropy(cell *SuperpositionCell, wg *sync.WaitGroup) {
  defer wg.Done()

  var sum_of_weights int
  var sum_of_wlogw, entropy float64
  for _, tile := range cell.possibleTiles() {
    wInt := wm.TypicalCell.weights[tile]
    sum_of_weights += wInt
...

Any Better?

Ok, so now we can split up the relatively expensive entropy calculations among our CPU cores; does that improve things, and if so, how do we examine the details of the improvement?

First, some XSTM benchmarks:

$ git checkout main

$ go test -bench ^Benchmark_ -run XXX gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
goos: linux
goarch: amd64
pkg: gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
cpu: AMD Ryzen 5 3600X 6-Core Processor
Benchmark_XSTM1-12    	     766	   2600697 ns/op
Benchmark_XSTM2-12    	     300	   5012752 ns/op
Benchmark_XSTM3-12    	     237	   5145435 ns/op
Benchmark_XSTM4-12    	     208	   5620292 ns/op
PASS
ok  	gitlab.com/301days/glitch-aura-djinn/pkg/wfcish	7.563s

$ git checkout concurrency0

$ go test -bench ^Benchmark_ -run XXX gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
goos: linux
goarch: amd64
pkg: gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
cpu: AMD Ryzen 5 3600X 6-Core Processor
Benchmark_XSTM1-12    	     648	   2706977 ns/op
Benchmark_XSTM2-12    	     225	   5795243 ns/op
Benchmark_XSTM3-12    	     288	   4470520 ns/op
Benchmark_XSTM4-12    	     237	   4994846 ns/op
PASS
ok  	gitlab.com/301days/glitch-aura-djinn/pkg/wfcish	7.174s

A mixed bag, as it were: slower when we’re ignoring the tile frequencies, a bit faster when we’re not. That makes sense, since the effort so far has gone into optimizing the entropy calculations.
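
(For reference, each of those figures comes from an ordinary Go benchmark in the wfcish package, shaped roughly like the sketch below; the setup helper and collapse call are invented names, not the repo’s actual ones. go test runs the loop body b.N times and reports the average as ns/op.)

// Rough shape of one of the Benchmark_XSTMn functions (helper names are
// invented for illustration; the real benchmarks live in the package's
// _test.go files).
func Benchmark_XSTM4(b *testing.B) {
  cells, ruleSet := loadBenchmarkFixture() // hypothetical setup helper
  b.ResetTimer()                           // exclude setup from the timing
  for i := 0; i < b.N; i++ {
    runOneCollapse(cells, ruleSet) // hypothetical: one full collapse attempt
  }
}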

Now let’s look closely at the differences when running XSTM4, using the tile weights and making sure that an equivalent number of each tile are in the result:

$ git checkout main

$ time go run cmd/xstm/main.go --mapname Annwn --maplevel 4 --method xstm4 -neighbor-distance 4 \
  --max-retries 0 --seed 0 --max-seed 2000 --seek-original --cpuprofile main.prof
...
FAILED to collapse map.

real	1m45.859s
user	2m9.076s
sys	0m4.732s

$ git checkout concurrency0

$ time go run cmd/xstm/main.go --mapname Annwn --maplevel 4 --method xstm4 -neighbor-distance 4 \
  --max-retries 0 --seed 0 --max-seed 2000 --seek-original --cpuprofile concurrency0.prof
...
FAILED to collapse map.

real	1m28.382s
user	2m20.320s
sys	0m8.192s

$ go tool pprof -http=localhost:1415 -diff_base=main.prof concurrency0.prof
Serving web UI on http://localhost:1415
Profiling comparison showing improvements on concurrency branch

Hmm. minEntropyCellsIndex() is somewhat faster now, but ApplyRulesExtended() is much faster. So far, aligning the rules structure with its usage has provided more benefit than concurrently calculating the cell entropies. Very instructive.


One More Thing…

A t-shirt which reads "There are two hard things in computer science: 0. Cache Invalidation, 1. Naming Things, 2. Off-by-one Errors".

Sometimes it’s when we refactor or optimize that we notice simple logical errors that we’ve made earlier. In adding some concurrency to the cell entropy calculations, it’s important to look at what structures those calculations depend upon.

pkg/wfcish/wfcish.go
func (wm *WaveMapContents) calculateEntropy(cell *SuperpositionCell) float64 {
  if cell.entropyValid {
    return cell.entropy
  }

  var sum_of_weights int
  var sum_of_wlogw, entropy float64
  for _, tile := range cell.possibleTiles() {
    wInt := wm.TypicalCell.weights[tile]
    sum_of_weights += wInt
...

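The elided tail of calculateEntropy() presumably folds those two running sums into a Shannon-style entropy: for weights w with total W, H = log2(W) - (sum of w*log2(w)) / W, which is exactly the pair of terms the log2 and n*log2(n) caches serve. A sketch of that last step, under that assumption:

// Sketch of the final step (an assumption, not the repo's exact code):
//   H = -sum((w/W) * log2(w/W)) = log2(W) - sum(w*log2(w)) / W
entropy = cachedLog2(sum_of_weights) - sum_of_wlogw/float64(sum_of_weights)
cell.entropy = entropy
cell.entropyValid = true
return entropy
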
In a previous optimization, we introduced this TypicalCell to keep track of the tile weights in one place (and decrease them as tiles are used in XSTM4). But when decreasing those weights, we don’t invalidate the cells’ entropy cache unless one of the tile weights has dropped to zero:

pkg/wfcish/wfcish.go
func DecreaseWeights(waveMap *WaveMapContents, collapsedCell world.CellKey) error {
  waveMap.minEntropyCacheValid = false
  tile := waveMap.getCell(collapsedCell.X, collapsedCell.Y, collapsedCell.Z).only()
  if waveMap.TypicalCell.decrementWeight(tile) {
    // Decrementing caused tile's removal, remove from all active cells
    size := waveMap.getSize()
    for i := 0; i < size; i++ {
      cell := waveMap.getCellByIndex(i)
      if cell == nil || cell.isCollapsed() {
        // the cell itself should already be collapsed, so this skips that too.
        continue
      }
      cell.zeroWeight(tile)
      if cell.numPossibleTiles() < 1 {
        return errors.New(fmt.Sprint("After decreasing weight of tile ", world.Uint16ToString(tile), "(", tile, ") in cell ",
          i, "has no possible tiles", cell))
      }
    }
  }

  return nil
}

So let’s fix that, invalidating the entropy cache of all affected cells:

pkg/wfcish/wfcish.go
  }
  // Invalidate the entropy caches of any affected cells (i.e. cells that
  // haven't had that tile zeroed out).
  size := waveMap.getSize()
  for i := 0; i < size; i++ {
    cell := waveMap.getCellByIndex(i)
    if cell == nil || cell.isCollapsed() || cell.weights[tile] == 0 {
      continue
    }
    cell.entropyValid = false
  }

  return nil
}

Let’s use some benchmarking to check the effect:

$ go test -bench ^Benchmark_ -run XXX gitlab.com/301days/glitch-aura-djinn/pkg/wfcish \
  -benchtime 60s -max-level-size 100
goos: linux
goarch: amd64
pkg: gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
cpu: AMD Ryzen 5 3600X 6-Core Processor
Benchmark_XSTM1-12             	   36015	   2026882 ns/op
Benchmark_XSTM2-12             	   18231	   3941250 ns/op
Benchmark_XSTM3-12             	   18088	   4034165 ns/op
Benchmark_XSTM4BadCache-12     	   17599	   4063649 ns/op
Benchmark_XSTM4GoodCache-12    	   17288	   4174782 ns/op
PASS
ok  	gitlab.com/301days/glitch-aura-djinn/pkg/wfcish	545.287s

Not too bad, but those are small maps (under 100 characters). How does it look if we include some larger maps?

$ go test -bench ^Benchmark_ -run XXX gitlab.com/301days/glitch-aura-djinn/pkg/wfcish \
  -benchtime 60s -max-level-size 1000
goos: linux
goarch: amd64
pkg: gitlab.com/301days/glitch-aura-djinn/pkg/wfcish
cpu: AMD Ryzen 5 3600X 6-Core Processor
Benchmark_XSTM1-12             	     237	 299239987 ns/op
Benchmark_XSTM2-12             	     225	 317676325 ns/op
Benchmark_XSTM3-12             	     219	 328860283 ns/op
Benchmark_XSTM4BadCache-12     	     222	 323149607 ns/op
Benchmark_XSTM4GoodCache-12    	      68	1027748103 ns/op
PASS
ok  	gitlab.com/301days/glitch-aura-djinn/pkg/wfcish	485.632s

Oh dear.


More to come

glitch-aura-djinn New Day 13 Code