Let $p$ be an $n$-variate and degree $d$ polynomial computed by an arithmetic formula of size $s$, then $p$ can also be computed by an arithmetic formula of size $O(\poly(s))$ and depth $O(\log s)$.
The key observation is that a formula is simply a tree and hence at any vertex of the formula, we can naturally define a sub-formula. This allows us to perform induction.
General circuits to poly-logarithmic depth and depth-4 circuits
We have seen depth reduction for formula before by induction on the depth of the sub formulas. Since a formula is a tree, the induction is natural and don’t have too many technicalities. It turns out that the depth reduction for arithmetic branching programs is not so difficult either and have the same $O(\log d)$ result. However, for general arithmetic circuits, it does not seem to have such nice structure for depth reduction since a circuit could be arbitrary directed graph. Surprisingly, the following theorem shows that any arithmetic circuits can be reduced to log-depth circuits. Concretely, it implies $\VP=\VNC{2}$.
Let $p$ be an $n$-variate and degree $d$ polynomial computed by a homogeneous arithmetic circuit $C$ of size $s$. Then, there exists a circuit $C'$ that computes $p$ and satisfies the following properties.
1. $\text{size}(C')=\poly(s,n,d)$ and $\text{depth}(C')=\log d$.
2. The product gates have bounded fan-in and the sum gates have unbounded fan-in.
3. For any product fate $u$ in $C'$ and $u'$ be its input. $\text{deg}(u')\leq\text{deg}(u)/2$.
From [VSBR83], we can further reduce general circuits to depth four circuits of polynomial-size.
Let $p$ be an $n$-variate and degree $d$ polynomial computed by a homogeneous arithmetic circuit $C$ of size $s$. Then, there exists a $\Sigma\Pi\Sigma\Pi$ circuit $C'$ of size $s^{O(\sqrt{d})}$ that computes $p$. Specifically, the fan-in of the product gates is $O(\sqrt{d})$.
Generalized gate quotient
Let $C$ be a circuit and $u,v$ be any two gates in $C$. The gate quotitient of $u$ with respect to $v$, denoted as $[u,v]$, is defined as follows.
1. If $u=v$, $[u,v]=1$.
2. If $v$ is not in the sub-circuit of $u$, then $[u,v]=0$.
3. If $u$ is a sum gate, i.e., $u=u_1+u_2$, then $[u,v]=[u_1,v]+[u_2,v]$.
4. If $u$ is a product gate, i.e., $u=u_L\cdot u_R$, then $[u,v]=[u_L]+[u_R,v]$.
Recall that $[u]$ is defined as the polynomial computed by $u$.
Let $C$ be a circuit and $m\in\N$. Define the $m$th front family as follows.
\begin{equation}
\mathcal{F}_m = \{v:\ v=u\cdot w,\ \text{deg}(v)\geq m,\ \text{deg}(u),\text{deg}(w)<m \}.
\end{equation}
Let $C$ be a circuit and $u,v$ be any two gates in $C$ such that $[u,v]\neq0$. We have $\text{deg}([u,v])=\text{deg}(u)-\text{deg}(v)$.
The proof is based on induction on the degree of $u$. Note that it suffices to consider $u$ being a product gate. For the base case, $u$ is a product gate multiplying two gates that are either variable or constant. Clearly that the equality holds. Consider $u=u_L\cdot u_R$, by definition we have $[u,v]=[u_L]\cdot[u_R,v]$ and thus
\begin{equation}
\text{deg}([u,v]) = \text{deg}([u_L]\cdot[u_R,v]) = \text{deg}(u_L) + \text{deg}([u_R,v]).
\end{equation}
By the induction hypothesis, we have $\text{deg}([u_R,v])=\text{deg}(u_R)-\text{deg}(v)$. As $\text{deg}(u_L)+\text{deg}(u_R)=\text{deg}(u)$, we have $\text{deg}([u,v])=\text{deg}(u)-\text{deg}(v)$.
Let $m\in\N$, $C$ be a circuit, and $u,v$ be any two gates in $C$ such that $\text{deg}(u)\geq m>\text{deg}(v)$. We have
\begin{equation}
[u] = \sum_{w\in\mathcal{F}_m}[w]\cdot[u,w],
\end{equation}
and
\begin{equation}
[u,v] = \sum_{w\in\mathcal{F}_m}[u,w]\cdot[w,v].
\end{equation}
The proof is based on induction on the depth of $u$. The base case is then $u\in\mathcal{F}_m$. That is, $u=u_L\cdot u_R$ where $\text{deg}(u_L),\text{deg}(u_R)<m$. Note that for any $w\in\mathcal{F}_m\backslash\{u\}$, as $\text{deg}(w)\geq m$, $w$ definitely does not lie in the sub-circuit of $u$ and thus $[u,w]=0$. Namely,
\begin{equation}
[u] = [u]\cdot[u,u] = \sum_{w\in\mathcal{F_m}}[w]\cdot[u,w].
\end{equation}
Similarly,
\begin{equation}
[u,v] = [u,u]\cdot[u,v] = \sum_{w\in\mathcal{F}_m}[u,w]\cdot[w,v].
\end{equation}
Now, consider two cases: 1. $u=u_1+u_2$ and 2. $u=u_L\cdot u_R$ where $\text{deg}(u_R)\geq m$.
1. By induction hypothesis, both $u_1$ and $u_2$ have frontier decomposition and thus
\begin{equation}
[u]=[u_1]+[u_2]=\sum_{w\in\mathcal{F}_m}[w]\cdot[u_1,w]+\sum_{w\in\mathcal{F}_m}[w]\cdot[u_2,w]=\sum_{w\in\mathcal{F}_m}[w]\cdot[u,w],
\end{equation}
and
\begin{align}
[u,v]&=[u_1,v]+[u_2,v]=\sum_{w\in\mathcal{F}_m}[u_1,w]\cdot[w,v]+\sum_{w\in\mathcal{F}_m}[u_2,w]\cdot[w,v]\\
&=\sum_{w\in\mathcal{F}_m}[u,w]\cdot[w,v].
\end{align}
2. By induction hypothesis, $u_R$ has frontier decomposition and thus
\begin{equation}
[u]=[u_L]\cdot[u_R]=[u_L]\cdot\sum_{w\in\mathcal{F}_m}[w]\cdot[u_R,w]=\sum_{w\in\mathcal{F}_m}[w]\cdot[u,w],
\end{equation}
and
\begin{equation}
[u,v]=[u_L]\cdot[u_R,v]=[u_L]\cdot\sum_{w\in\mathcal{F}_m}[u_R,w]\cdot[w,v]=\sum_{w\in\mathcal{F}_m}[u,w]\cdot[w,v].
\end{equation}
Proof of [VSBR83]
The proof of the main theorem is then based on induction on the degree of each gates. Concretely, we are going to construct a series of circuits that $C_0,C_1,\dots,C_{\log d}$ such that for any $i\in\{0,1,\dots,\log d\}$, $C_i$ computes
(i) $[u]$: $\forall u\in C$, $2^i<\text{deg}(u)\leq2^{i+1}$.
(ii) $[u,v]$: $\forall u,v\in C$, $2^i<\text{deg}([u,v])\leq2^{i+1}$.
Also, $\text{depth}(C_i)=O(i)$ and $C_i$ satisfies the conditions stated in the theorem for $C’$. In the end, output $C’=C_{\log d}$.
One can first see that $C_0$ can be done with brute-force. Next, consider $i=1,2,\dots,\log d$.
(i) For any $u\in C$, $2^i<\text{deg}(u)\leq2^{i+1}$.
The goal is to construct a sum of product of lower level circuit, i.e., degree less than $\text{deg}([u])/2$, for $[u]$. Pick $m=\text{deg}(u)/2-1$. By the frontier decomposition lemma, we have
\begin{equation}
[u] = \sum_{w\in\mathcal{F}_m}[w]\cdot[u,w].
\end{equation}
By the degree of gate quotitient lemma, we know that $\text{deg}([u,w])=\text{deg}([u])-\text{deg}([w])< m$. Thus, by induction hypothesis, we have a depth $O(i)$ circuit for $[u,w]$. However, $[w]$ the degree exceeds $m$ so we don’t have the guarantee of induction hypothesis. Nevertheless, as $w\in\mathcal{F}_m$, we know $w=w_L\cdot w_R$ where $\text{deg}(w_L),\text{deg}(w_R)<m$. As a result, $[u]$ can be constructed by a sum of three product of circuits from lower induction level. Note that as $\card{\mathcal{F}_m}\leq s$, the size of the new circuit for $[u]$ is still polynomial in $s$.
(ii) For any $u,v\in C$, $2^i<\text{deg}([u,v])\leq2^{i+1}$.
The goal is to construct a sum of product of lower level circuit, i.e., degree less than $\text{deg}([u,v])/2$, for $[u,v]$. Pick $m=\frac{\text{deg}(u)+\text{deg}(v)}{2}-1$. By the frontier decomposition lemma, we have
\begin{equation}
[u,v] = \sum_{w\in\mathcal{F}_m}[u,w]\cdot[w,v].
\end{equation}
By the degree of gate quotitient lemma, we know that $\text{deg}(u)-\text{deg}(w)\leq\frac{\text{deg}(u)-\text{deg}(v)}{2}$. However, it is unclear about the degree of $[w,v]$ so we decompose $w$ by the definition of frontier and yield $[w,v]=[w_L]\cdot[w_R,v]$ where $\text{deg}(w_L),\text{deg}(w_R)<m$. Note that $\text{deg}([w_R,v])\leq\frac{\text{deg}(u)-\text{deg}(v)}{2}$. As a result, now we only need to deal with $[w_L]$.
Pick $m’=\frac{\text{deg}(u)-\text{deg}(v)}{2}$. By the frontier decomposition lemma, we have
\begin{equation}
[w_L] = \sum_{w’\in\mathcal{F}_{m’}}[w’]\cdot[w_L,w’].
\end{equation}
By the degree of gate quotitient lemma, we know that $\text{deg}([w_L,w’])\leq\frac{\text{deg}(u)-\text{deg}(v)}{2}$ and by applying frontier decomposition lemma on $[w’]$, we finally get $[w’]=[w’_L]\cdot[w’_R]$ where $\text{deg}([w’_L]),\text{deg}([w’_R])\leq\frac{\text{deg}(u)-\text{deg}(v)}{2}$.
To sum up, we have
\begin{equation}
[u,v] = \sum_{w\in\mathcal{m},w’\in\mathcal{F}_{m’}}[u,w]\cdot[w’_L]\cdot[w’_R]\cdot[w_L,w’]\cdot[w_R,v].
\end{equation}
That is, $[u,v]$ can be constructed by a sum of five product of circuits from lower induction level.
Proof of reduction to depth four circuits
From [VSBR83], we know that if a degree $d$ polynomial $p$ can be computed by a size $s$ circuit $C$, then there exists a circuit $C’$ that computes $p$ as follows.
\begin{equation}
C’= \sum_{i\in[s]}g_{i1}\cdot g_{i2}\cdots g_{i5},
\end{equation}
where $g_{ij}$ is in the form of $\Sigma^s\Pi^{d/2}$, i.e., sum of product of polynomials of degree at most $d/2$. Now, the goal is to transform those $g_{ij}$ into $\Sigma\Pi^t$ where $t$ could be much less that $d/2$, say $O(\sqrt{d})$. A naive approach is replacing those $g_{ij}$ that consists some polynomials having degree greater than $t$. Concretely, suppose $g_{ij}=\sum_{i’}g’_{i’1}\cdot g’_{i’2}\cdots g’_{i5}$ and exists $j’\in[5]$ such that $\text{deg}(g’_{i’j’})>t$. We will replace $g_{ij}$ by expanding it with Theorem (VSBR83). Note that in each round, we will scan through all $i\in[s]$ in the first level and replace thouse which has a large degree component. For instance, say $g_{i5}$ has component of large degree for every $i\in[s]$, then we replace it by $g_{i5} = \sum_{i’}g’_{i’1}\cdots g’_{i’5}$ and get
\begin{equation}
C’= \sum_{i,i’\in[s]}g_{i1}\cdot g_{i2}\cdots g_{i4}\cdot g’_{i’1}\cdots g’_{i’5}.
\end{equation}
Now, we are claiming that this replacing procedure can be done in $O(d/t)$ steps. Namely, after $O(d/t)$ rounds, the circuit will become in $\Sigma\Pi\Sigma\Pi^{t}$ form. Specifically, the number of summands would be $s^{O(d/t)}$ since each rouns increase at most $s$ summands.
The main reason is because the circuit is homogeneous. As a result, after the decomposition the total degree at the first level remains $d$. As in each round we only decompose those component that has degree larger than $t$, there exists at least one new component having degree at least $t/5$. Suppose we perform $r$ rounds, the total degree is at least $r\cdot t/5$, which should be less than $d$. Thus, we have $r=O(d/t)$.
Last but not least, let’s estimate the size of the resulting circuit. As we can brute-forcely generate all monomials of degree at most $t$, the last two levels have size at most $n^t=s^{(t)}$. Also, as we discussed before the total number of summands in the first level is at most $s^{O(t/d)}$ and thus the size of the resulting circuit is at most $s^{O(t+d/t)}$. By picking $t=O(\sqrt{d})$, we have a $\Sigma\Pi\Sigma\Pi$ circuit of size $s^{O(\sqrt{d})}$ for $p$.