Optimising SETS for Fusion

I can’t remember the last time I’ve been this pleased with an algorithm.

SETS has delivered shorter and easier-to-read code, a tidy 30-40% speedup that scales even better for larger graphs, and better unit test conformance, fixing some of the most evasive and hard-to-reproduce Fusion bugs I’ve ever had to deal with - all of this with a turnaround time of under 24 hours to go from original idea to complete implementation.

I outlined the basics of the algorithm yesterday. Today I’m going to quickly run through the changes and benefits of the algorithm as applied to Fusion.

From naïve to optimised

The original (naïve) implementation of SETS looked very similar to what I put forward in the original blog post. This was never intended to be permanent; instead, I was testing the viability of the algorithm and seeing if it could pass all of the unit tests.

local function updateAll(root: PubTypes.Dependency)
	local counters = {}
	local flags = {}
	local queue = {}

	-- Pass 1: counting up
	for object in root.dependentSet do
		table.insert(queue, object)
	end
	while #queue > 0 do
		local next = table.remove(queue, 1)
		counters[next] = (counters[next] or 0) + 1
		for object in next.dependentSet do
			table.insert(queue, object)
		end
	end

	-- Pass 2: counting down + processing
	for object in root.dependentSet do
		table.insert(queue, object)
		flags[object] = true
	end
	while #queue > 0 do
		local next = table.remove(queue, 1)
		counters[next] -= 1
		local setFlag = false
		if counters[next] == 0 and flags[next] then
			setFlag = next:update()
		end
		for object in next.dependentSet do
			table.insert(queue, object)
			if setFlag then
				flags[object] = true
			end
		end
	end
end

It was trivial to optimise this. Both passes iterate over the same objects in the same order by design, so instead of taking objects out of the queue for processing, we can preserve them and simply walk up the queue ourselves. That way, we can eliminate the expensive calls to table.remove(queue, 1), and the second pass no longer has to do any table processing at all.

local function updateAll(root: PubTypes.Dependency)
	local counters = {}
	local flags = {}
	local queue = {}
	local queuePos = 1

	-- Pass 1: counting up
	for object in root.dependentSet do
		table.insert(queue, object)
	end
	while #queue > 0 do
		local next = queue[queuePos]
		counters[next] = (counters[next] or 0) + 1
		for object in next.dependentSet do
			table.insert(queue, object)
		end
        queuePos += 1
	end

	-- Pass 2: counting down + processing
	for object in root.dependentSet do
		flags[object] = true
	end
    queuePos = 1
	while #queue > 0 do
		local next = queue[queuePos]
		counters[next] -= 1
		local setFlag = false
		if counters[next] == 0 and flags[next] then
			setFlag = next:update()
		end
		for object in next.dependentSet do
			if setFlag then
				flags[object] = true
			end
		end
        queuePos += 1
	end
end

From here, it’s standard optimisation territory. We can move the flag initialisation into the first for loop, and store the size of the queue ourselves so we don’t need to use table.insert or the # operator. In the second pass, we can reverse the order of the if setFlag and for loop, then merge it with the if block above to eliminate the setFlag variable altogether. We can also deduplicate table accesses using variables, and finally add some type annotations to get it to pass strict typing:

local function updateAll(root: PubTypes.Dependency)
	local counters: {[Descendant]: number} = {}
	local flags: {[Descendant]: boolean} = {}
	local queue: {Descendant} = {}
	local queueSize = 0
	local queuePos = 1

	for object in root.dependentSet do
		queueSize += 1
		queue[queueSize] = object
		flags[object] = true
	end

	-- Pass 1: counting up
	while queuePos <= queueSize do
		local next = queue[queuePos]
		local counter = counters[next]
		counters[next] = if counter == nil then 1 else counter + 1
		if (next :: any).dependentSet ~= nil then
			for object in (next :: any).dependentSet do
				queueSize += 1
				queue[queueSize] = object
			end
		end
		queuePos += 1
	end

	-- Pass 2: counting down + processing
	queuePos = 1
	while queuePos <= queueSize do
		local next = queue[queuePos]
		local counter = counters[next] - 1
		counters[next] = counter
		if counter == 0 and flags[next] and next:update() and (next :: any).dependentSet ~= nil then
			for object in (next :: any).dependentSet do
				flags[object] = true
			end
		end
		queuePos += 1
	end
end

Outcomes

Before I applied SETS, here’s how the above updateAll function performed with its previous implementation. The benchmarks are pretty basic - we’re just running sample code snippets around half a million times and averaging the run time across all of them to get the average execution time of a single run.

Benchmark results:
[+] updateAll
   [+] Update deep tree ......................... 1.99μs
   [+] Update empty tree ........................ 0.28μs
   [+] Update shallow tree ...................... 1.60μs
   [+] Update tree with complex dependencies .... 2.92μs

After merging in SETS, the numbers mostly improved, which was a welcome result:

Benchmark results:
[+] updateAll
   [+] Update deep tree ......................... 1.43μs
   [+] Update empty tree ........................ 0.13μs
   [+] Update shallow tree ...................... 1.64μs
   [+] Update tree with complex dependencies .... 2.24μs

From my limited time playing around with SETS in Fusion, it seems to scale much better for larger graphs of objects, which is awesome. It also fixed the gnarly issues me and Zack came across yesterday, which is great news.

Now that we’ve implemented SETS, we’re pretty much code-ready to launch Fusion v0.2. Unless there are any other concerning bugs, the plan is to finish up the supporting material (docs, examples, etc), fix Wally, and go live.