4

Impractical Experiments #3: Treating Controls Like Values

 3 years ago
source link: https://algassert.com/impractical-experiments/2015/05/17/Treating-Controls-like-Values.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

In this too-long post, I explain a hack I used to represent an operation modifier as an operation, then explore some of the mathematics around that hack.

Operations and Circuits as Matrices

Every operation on a circuit, whether the circuit is classical, probabilistic, or quantum, can be represented as a matrix.

Representing operations as matrices is nice. Doing so makes it easy to compile the overall effect of a circuit into a single operation: just multiply the matrices together. Also, combining independent operations that apply to different wires becomes straightforward: just use the Kronecker product.

Intuitively, the Kronecker product A⊗B works by tiling B inside of A, then scaling each tile by the coefficient it was paired with. For example, suppose you want to apply a Hadamard gate H to one wire and a NOT gate X to the other wire, like this:

──H──
──X──

The circuit's overall matrix is computed like this:

H=1√2[111−1]

X=[0110]

X⊗H=1√2[0⊗H1⊗H1⊗H0⊗H]=1√2[02HH02]=1√2[0011001−111001−100].

You also need to use the Kronecker product when the other wire doesn't have an operation, to expand the operation's matrix so it can be applied to the larger circuit's larger state vector. (To avoid affecting the state of the other wires, you Kronecker-product against the identity matrix (i.e. a no-op).)

Another way to make an operation apply to more wires is control it.

The Control Matrix

Controlled operations are operations conditioned to only occur if a designated control wire is ON. In diagrams, the control wire is indicated by covering it with a small black circle and connecting it to the other operation with a line:

──•──
  │
──X──

Sometimes, when making quick ASCII circuit diagrams, I omit the connecting line. A side effect of this is that the control ends up looking like an independent operation:

──•──
──X──

When I look at the above diagram, my first uninformed instinct is "What's the matrix for that weird • operation?".

Of course there is no matrix for •, because controls aren't operations. Controls are operation modifiers. Trying to compute the matrix for a controlled operation by evaluating X⊗ • is simply a type error. It's not even wrong.

... But ...

What if we tweaked the rules of arithmetic a bit, so that there was a matrix for this so-called "• gate"? When I wrote my toy quantum circuit simulator, that's exactly what I did.

The hack I used was to introduce a special value, which I'll call μ in this post. In the code, μ was just an instance of the Complex class. It had real part 1, and imaginary part 0, so it would act just like a normal 1 everywhere. That is, except in the code for the Kronecker product which explicitly special cased it:

def kronecker_product(m1, m2):
    w1, h1 = len(m1), len(m1[0])
    w2, h2 = len(m2), len(m2[0])
    return [[
        controlled_product(m1[i1][j1], m2[i2][j2], i1, i2, j1, j2)
        for i1 in range(w1), i2 in range(w2)]
        for j1 in range(h1), j2 in range(h2)]

def controlled_product(v1, v2, i1, i2, j1, j2):
    if v1 is SPECIAL_CONTROL_ONE:
        return SPECIAL_CONTROL_ONE if i2==j2 else 0
    if v2 is SPECIAL_CONTROL_ONE:
        return SPECIAL_CONTROL_ONE if i1==j1 else 0
    return v1*v2

Basically what the above code is saying is: when you tile one matrix inside the other, any tiled-inside matrix that gets paired with μ is replaced with a matrix with μs along its diagonal. In other words, μ⊗U has been defined (unusually) to be different from μ⋅U. Instead of μ⊗U=μ⋅U, we have μ⊗U=μ⋅I. For example, H⊗μ=[μ00μ].

Given this new μ value, it's easy to make a "• gate". When the input wire is OFF we want operations to be replaced by the identity matrix, so we'll Kronecker-scale by μ. When the input wire is ON, we want operations to apply, so we'll Kronecker-scale by 1. Thus we define the control gate's matrix to be:

C=[μ001]

Introducing μ and C is a useful hack, because the tweaks to the Kronecker product are small compared to having to add logic for noticing and generating controlled operations. Maybe μ is good for other things, too?

Tainted Numbers

I want to do more things with this μ value. I want to add it, multiply it, square it, you name it. Maybe it has more tricks up its sleeves.

A lot of interesting number systems start by introducing some new value, with special rules related to squaring. If you introduce a value i whose square is -1, you get the complex numbers. Complex numbers are super useful when dealing with rotating quantities in 2d. If you instead introduce a value ϵ whose square is 0, you get the dual numbers. Dual numbers make numeric differentiation really easy, because f(x+ϵ)−f(x)=ϵddxf(x). Introduce a value j whose square is +1 and you'll get the split-complex numbers. Split-complex numbers behave like time and space in special relativity.

So squaring seems like a good place to go when defining behavior. In the case of μ, I think the semantics we want are "sticking around". We want to use μ as a marker, something that sticks around when things are multiplied together so we can later tell what was controlled and what wasn't. With that in mind, we'll define μ2 to just be μ again.

I don't know of any standard name for the number system created by adding a μ such that μ2=μ. The lack of name probably has to do with the fact that it's a basis change away from being isomorphic to the split-complex numbers. Regardless, in this post I'll be calling μ-ified numbers tainted numbers (because μ taints values in a way that can't be removed).

Anytime you define a number system, the first order of business is to explore how typical operations behave. Is multiplication still commutative? Associative? Does division have more can't-divide-by-that corner cases? Do functions like ex do anything new?

For example, let's figure out what happens when you raise a tainted number a+bμ to the n'th power.

Start by expanding (a+bμ)n with the binomial theorem:

=nΣi=0(ni)ai(bμ)n−i

Now pull out the only term that won't get a μ factor:

=an+μn−1Σi=0(ni)aibn−i

And fill in the hole in the sum:

=an+(−an+nΣi=0(ni)aibn−i)μ

So that we can un-apply the binomial theorem, given us our answer:

(a+bμ)n=an+(a+b)nμ−anμ.

That turned out to simplify really well! (I enjoy this kind of thing far too much.)

Another good operation to test is the exponential function. The way to do that is to pick a definition of ex, usually the Taylor series definition ex=∞Σn=0xnn! works pretty well, and see what happens when you apply that definition to ea+μb. This is the same thing Euler did to show that eπi=−1. Unfortunately we won't come anywhere near that level of awe-inspiring surprise. ea+μb expands into:

=∞Σn=0(a+bμ)nn!

We can simplify the numerator of the summands by using the raised-to-nth-power equivalence we figured out a second ago:

=∞Σn=0an+(a+b)nμ−anμn!

Now that we have additions (and subtractions) inside our infinite sum, we can de-interleave it into three infinite sums. This isn't always a safe step (beware conditionally convergent series), but all of our sums converge absolutely so we'll probably only get yelled at a little:

=∞Σn=0ann!+μ∞Σn=0(a+b)nn!−μ∞Σn=0ann!

Each of the sums matches the series definition of ex. After packing the sums back into ex form, and apologizing for playing it a bit fast and loose (e.g. we used μ0 despite it having the same issues that 00 has), we get a nice solution:

ea+bμ=ea+(ea+b−ea)μ.

Notice that both exponentiating and raising to a power were affected in similar ways when generalized to work on tainted numbers. In both cases we found that f(a+bμ)=f(a)+(f(a+b)−f(a))μ. That's not a coincidence.

Consider the matrices I=[1001] and M=[0001]. Note that I⋅M=M and that M⋅M=M, just like how 1⋅μ=μ and μ⋅μ=μ. Addition, scaling, and multiplication of I and M also behave isomorphically to how they do for 1 and μ. This means we can use I and M, and linear combinations thereof, to represent tainted numbers! We can translate the number a+μb into the matrix [a00a+b], then translate facts about that kind of matrix back into facts about tainted numbers. For example, we can explain why functions are generalizing in the same way.

(You can use the same represent-the-values-as-a-matrix trick for complex numbers, for dual numbers, for combinations thereof, and for lots of other algebras. Unfortunately I don't know the term for this general technique. I thought it was called "representing X as a matrix algebra", but there's no Wikipedia article for "matrix algebra".)

The eigenvalues of the matrix [a00a+b] are just a and a+b. A good rule of thumb for applying functions to a matrix is to decompose the matrix into its eigenvalue/vector parts, transform the eigenvalues with the function in question, then put the matrix back together. (I think it works as long as the function is analytic?) That's why f(a+μb) ends up being in terms of f(a) and f(a+b): because a and a+b are the eigenvalues being transformed. Then, recovering the μ part, the new b, requires subtracting off the added-on a part. Thus the f(a+μb)=f(a)+μ(f(a+b)−f(a)) pattern.

Anyways, that ends the tangent into basic abstract algebra. Let's get back to operations on circuits.

Merging Operations Into Controls

When we have a circuit like this:

──H─X─H──
    │
──H─•─H──

We definitely can't merge the Hadamard gates on the top wire into the NOT gate. That would cause the Hadamards to also be controlled by the bottom wire, changing the behavior of the circuit. But what if we tried to multiply the bottom Hadamards into the control? Then we would get:

H⋅C⋅H

=12[111−1]⋅[μ001]⋅[111−1]

=12[1⋅μ+1⋅01⋅0+1⋅11⋅μ−1⋅01⋅0−1⋅1]⋅[111−1]

=12[μ1μ−1]⋅[111−1]

=12[μ+1μ−1μ−1μ+1]

Okay, so we get a "weird control" that has tainted numbers for all of its entries. What happens if we combine that weird control with X, using the Kronecker product with μ special-cased?

X⊗(H⋅C⋅H)

Let's start by tiling our known solution inside of the X gate's matrix:

=1√2[0⊗[μ+1μ−1μ−1μ+1]1⊗[μ+1μ−1μ−1μ+1]1⊗[μ+1μ−1μ−1μ+1]0⊗[μ+1μ−1μ−1μ+1]]

Flattening the above expression into a single matrix is a bit tricky. The μ are on the right-hand side this time, so the diagonal we should put μs on is bit harder to see. Basically, any μs in the top-left and bottom-right sections stay μs while μs in the top-right and bottom-left get replaced with 0s. It's also a bit confusing that the 1s being added/subtracted from the μs follow the normal Kronecker product rules, instead of the special-case follow-the-diagonal rule. Carefully put it all together and you'll find:

=12[μμ1−1μμ−111−1μμ−11μμ]

We don't need to keep track of what's controlling what anymore, so we can clean up the μs by applying the function clean(a+bμ)=a+b:

→12[111−111−111−111−1111]

We're almost done! Our circuit looks like this:

    ┌─┐
──H─┤?├─H──
    │?│
────┤?├────
    └─┘

To get the matrix for the whole circuit, we just need to multiply in those last two Hadamard gates:

(H⊗H)⋅(X⊗C)⋅(H⊗H)

=(H⊗I)⋅(X⊗(H⋅C⋅H))⋅(H⊗I)

=1√2[1010010110−10010−1]⋅12[111−111−111−111−1111]⋅1√2[1010010110−10010−1]

Which is equal to...

=[1000010000010010]

Which is this circuit:

──•──
  │
──X──

Which means... surrounding a controlled-not with Hadamard operations on all sides will swap which wire the control and the NOT are on! This is actually a well known trick, but the fact that we computed it correctly indicates that merging operations into controls is safe. We have not-rigorously-at-all shown that (C⊗U)⋅(V⊗I)=(C⋅V)⊗U (and this holds when you reverse all the ⊗ terms and/or all the ⋅ terms).

Intuitively, merging operations into the controls can be thought of as modifying the controls to apply in a different basis. For example, because the Hadamard gate swaps between the X basis and Z basis, merging in a Hadamard operation on each side of a control causes the control to apply to the X observable instead of the Z observable (the Z observable is the usual computational basis).

Multiple Controls

Do things continue to work when there's multiple controls? Let's consider a Toffoli gate:

──X──
  │
──•──
  │
──•──

We can start by combining the controls by themselves:

= [μ⊗[μ001]0⊗[μ001]0⊗[μ001]1⊗[μ001]]

= [μ0000μ0000μ00001]

Note that the entire diagonal is made up of μs, except for the bottom-right value (a 1). This pattern continues for all Kronecker powers C⊗n of C. (The resulting matrix can be stated succinctly in bra-ket notation: C⊗n=μI2n+(1−μ)|2n−1⟩⟨2n−1|.)

Getting back to computing the matrix of a Toffoli gate:

C⊗2⊗X

=[μ0000μ0000μ00001]⊗X

=[μ⊗X0000μ⊗X0000μ⊗X00001⊗X]

=[μ00000000μ00000000μ00000000μ00000000μ00000000μ000000000100000010]

The above matrix is in fact the correct matrix for the Toffoli gate (or, it would be after applying clean). So it appears that things continue to work when there are multiple controls. Let's try merging larger operations into larger controls, and see if that works.

Merging Multiple Operations into Multiple Controls Multiple Times

A task you're likely to run into a lot when computing is incrementing. Fortunately, circuits that increment are quite simple to make out of controlled-NOTs. Here's one that increments three bits (with wraparound on overflow):

──•─•─X──
  │ │
──•─X────
  │
──X──────

The pattern continues exactly like you're suspecting. To make an increment that works on more bits, just keep adding slightly larger controlled-NOTs in front.

The particular pattern of gates used to make an increment out of controlled-NOTS is interesting to us, because each operation has controls on all wires affected by the smaller operations. That suggests we can merge those smaller operations into the controls, and simplify (I⊗I⊗X)⋅(I⊗X⊗C)⋅(X⊗C⊗C) to use fewer large matrix multiplications.

To start with, note that the Kronecker product distributes over matrix multiplication. This lets us simplify the sub-expression (I⊗I⊗X)⋅(I⊗X⊗C) into I⊗((I⊗X)⋅(X⊗C)). Also, from what we figured out earlier about merging single operations into single controls, we can guess that (I⊗X)⋅(X⊗C) simplifies into X⊗(X⋅C).

Let's evaluate that simplified sub-expression:

X⊗(X⋅C)

=X⊗[01μ0]

=[0⊗[01μ0]1⊗[01μ0]1⊗[01μ0]0⊗[01μ0]]

=[0001μ000010000μ0]

Notice how, in the above matrix, the output (row with non-zero entry) is always one more than the input column. It's the increment matrix for 2 bits (or, it would be after cleaning)!

Knowing the 2-bit case, we can now evaluate the 3-bit case from the start:

(I⊗I⊗X)⋅(I⊗X⊗C)⋅(X⊗C⊗C)

Pull out the distributed I:

=(I⊗((I⊗X)⋅(X⊗C)))⋅(X⊗C⊗2)

Pull the X out by merging operations into said X's controls:

=X⊗(((I⊗X)⋅(X⊗C))⋅C⊗2)

Pull the inner X out by merging operations into said X's controls:

=X⊗((X⊗(X⋅C))⋅C⊗2)

Expand the 2-bit increment gate:

=X⊗([0001μ000010000μ0]⋅C⊗2)

Evaluate the matrix multiplication:

=X⊗[0001μ0000μ0000μ0]

Evaluate the Kronecker product:

=[00000001μ00000000μ00000000μ00000000100000000μ00000000μ00000000μ0]

And we're done. The same row-one-more-than-column pattern is back, meaning we do in fact have the three-bit increment matrix.

Thinking about what's happening more abstractly, the original construction for the matrix was "make a triangle of controlled-NOTs". More specifically: Inc(n)=Πni=1(I⊗i−1⊗X⊗C⊗n−i).

Our new construction is "do a smaller increment, but merged into a new controlled-not". More specifically: Inc(n)=X⊗(Inc(n−1)⋅C⊗n−1)

The reason this works all comes down to μ, of course. First, the smaller increment gate is multiplied by the controls. This causes all of the non-zero (on the off-by-one diagonal) to become μ, but leaves the 1 in the top-right corner alone. The Kronecker product then expands all those μs into [μ00μ], effectively duplicating them into the top-left and bottom-right quadrants of the new larger increment matrix. The only break in the new diagonal is the top-right corner of the bottom-left quadrant, but that's filled by the top-right 1 being expanded into [0110]. With that done, we have built a correct increment matrix from the smaller increment.

The benefit of the recursive merge-into-controls evaluation strategy is in the size of the multiplications. Initially, naively, we were doing n−1 matrix multiplications of size 2n x 2n. By doing most of the multiplications in the smaller recursive step, we're instead performing one 2x2 matrix multiplication, one 4x4 matrix multiplication, one 8x8 matrix multiplication, and so forth until 2n x 2n. This is a runtime speedup factor of n over the naive strategy. We would even get this speedup automatically (without special-casing incrementing) by instituting a "merge operations into controls when possible" optimization.

Which is not to say that it's the best approach...

Impractical

I must confess: I had fun working out all these facts about μ and C but, ultimately, I don't think they're very useful beyond the initial hack.

Consider the "speedup" we achieved by merging operations into controls. A better way to get that speedup would be to simply recognize that an operation with m controls affects at most 2n−m amplitudes. By only touching that subset of amplitudes, we get the same speedup in a much simpler way. Plus, if we were really focused about optimizing, we wouldn't be using matrix multiplications for X gates. For example, incrementing is just a rotate-array-by-1 operation. With a circular array, it's constant time! The "optimization benefits" are trivially beatable.

Another problem is that the optimizations due to μ will combine poorly with other optimizations, because μ breaks some mathematical identities. For example, it is no longer the case that (X⊗Y)⋅(Z⊗T)=(X⋅Z)⊗(Y⋅T). Any optimization that implicitly relies on that for correctness, especially ones where you blindly re-arrange gates, would need to check for μ or C beforehand.

Whether or not it's useful, I do think μ is interesting. Something worth experimenting with, impractical though it may be.

Summary

By introducing a special value μ, that the Kronecker product special-cases, the concept of "use this wire as a control for that operation" can be turned into an operation.

μ a succinct way to specify controlled operations, but ultimately amounts to technical debt due to breaking some useful mathematical identities.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK