Back to Computational Complexity
Back to notes

# Long-Proof-Constant-Query PCP

80%

## Introduction

In this note, we are going to see a PCP with long proof (exponential long) but constant number of queries. Concretely, we are going to show that $\NP\in\PCP[\poly(n),O(1)]$.

### Flow

The construction is based on the satisfiability of quadratic polynomials, a.k.a., $\QPSAT$, which is a $\NP$-complete problem. By using Hadamard encoding, the verifier can efficiently query the evaluation of the prover’s quadratic function while in the meantime the verifier can also efficiently check the validity of the encoding.

## The $\NP$-complete problem: $\QPSAT$

Let’s define the $\QPSAT$ as follows.

• Input: $t=\poly(n)$ quadratic polynomials $P_1,\dots,P_t$ on $n$ variables over $\bbF_2$.
• Output: Output yes iff there’s a common root for $P_1,\dots,P_t$.

Clearly that $\QPSAT\in\NP$ since a root is a certificate for a yes instance.

### $\NP$-hardness of $\QPSAT$

Let’s show the $\NP$-hardness of $\QPSAT$ by reducing from $\CircSat$. The proof is actually quite simple and is recommended for reader to try it as an exercise.

$\CircSat\leq\QPSAT$.

Given a polynomial size circuit $C(x_1,\dots,x_n)$. Index the gates in $C$ by $g_{n+1},\dots,g_{n+m}$ where $m$ is a polynomial in $n$. For each $j\in[m]$, $g_{n+j}=(\tau,j_1,j_2,j_3)$ where $\tau:\bbF_2\times\bbF^2\rightarrow\bbF_2$ is the function of the gate and $j_1,j_2,j_3$ is the index of the two inputs and the single output. Define a $\QPSAT$ instance as follows.

• Define variables $y_1,\dots,y_{n+m}$ such that $y_i$ represents $x_i$ for $i\in[n]$ and $y_{n+j}$ represents the $g_{n+j}$ for $j\in[m]$.
• For each $j\in[m]$, define a quadratic polynomial $P_j$ on variables $y_{j_1},y_{j_2},y_{j_3}$ such that $P_j=0$ iff $\tau(y_{j_1},y_{j_2})=y_{j_3}$.
• The $\QPSAT$ instance is defined as $QP=(P_1,\dots,p_{m})$ on variables $y_1,\dots,y_{n+m}$.

Let’s check that this reduction is valid.

• (completeness) If $C$ is satisfiable, then set the variables of $QP$ according to the value of $x_1,\dots,x_n$ and the outputs of each gates in $C$. Clearly that by the definition $QP$ is satisfied by this assignment.
• (soundness) If $QP$ is satisfiable, then set $x_i=y_i$ for $i\in[n]$. Clearly that by the definition $C$ is satisfied by this assignment.

## Constant-query PCP for $\QPSAT$

In this section, our goal is to design an probabilistic checkable proof for $\QPSAT$. Intuitively, we imagine that the prover holds an assignment of the variables and encode it into a certificate. We want the following properties:

• (constant-query) The verifier only needs to query constantly many bits in the certificate.
• (perfect completeness) If the assignment is a satisfying assignment, then the verifier will accept with probability 1.
• (constant soundness) If the assignment is not a satisfying assignment, the the verifier can find out with constant probability.

Given $x\in\bbF_2^n$, the Hadamard encoding of $x$ is defined as $H_1(x):=(\langle a,x\rangle)_{a\in\bbF_2}$. Namely, $H_1(x)$ encodes the evaluation of $x$ on every linear function: $p_a(x)=\sum_{i\in[n]}a_i\cdot x_i$ on $\bbF_2^n$ for any $a\in\bbF_2^n$.

Furthermore, we can extend the Hadamard encoding to quadratic polynomials as follows. Given $x\in\bbF_2^n$, define $H_2(X):=(\sum_{i,j\in[n]}B_{ij}X_{ij})_{B\in\bbF_2^{n\times n}}$. Namely, $H_2(xx^{\intercal})$ encodes the evaluation of every quadratic polynomials over $\bbF_2^n$.

Now, suppose the prover holds an assignment $x\in\bbF_2^n$ and the verifier wants to evaluate $p(x)=x^{\intercal}Bx+a^{\intercal}x$. The verifier can simply query $(H_1(x))_a$ and $(H_2(X))_B$. That is, if the prover is honest, then the verifier can evaluate any quadratic polynomial with only two queries.

However, what if the prover is not honest? We need to come up with some methods to efficiently detect whether the prover is cheating to the verifier. Concretely, we need the following three guarantees.

• Make sure the certificates provided by the prover are valid Hadamard encodings for some $x$ and $X$.
• Make sure the verifier can evaluate an arbitrary quadratic polynomial.
• Make sure that $H_1(x)$ and $H_2(X)$ are consistent.

### Linearity testing: Efficiently detecting cheating

Here, we adopt the BLR linearity test. BLR test is a three-query test with $O(\log n)$ randomness. The following is the formal theorem statement.

Given oracle access to a boolean function $f:\bbF_2^n\rightarrow\bbF$, the BLR test works as follows:

1. Uniformly random sample $x,y\in\bbF_2^n$.
2. Query $x,y,x+y$.
3. If $f(x)+f(y)=f(x+y)$, then accept. Otherwise, reject.

The BLR test has the following theoretical guarantees:

• (perfect completeness) If $f$ is a linear function, then the test always accepts.
• (soundness) If the test accepts with probability at least $1-\delta$ for some $0\leq\delta\leq 1$, then $f$ is $\delta$-close to a linear function.

The analysis of BLR test is based on Fourier analysis. Please see my note on BLR test for details.

Note that we can view the Hadamard encoding of any quadratic terms of $x$ as a linear function from $\bbF_2^{n\times n}$ to $\bbF_2^n$. Thus, to test whether $H_1(x)$ and $H_2(X)$ is a valid encoding, we only need 6 queries and $O(\log n)$ randomness.

The verifier can use $O(\log n)$ randomness and 3 queries to test whether an oracle is close to a linear function with high probability.

### Self-correcting: Efficiently evaluating

Given oracle access to a boolean function $f:\bbF_2^n\rightarrow\bbF_2$ that is $\delta$-close to a linear boolean function $\tilde{f}:\bbF_2^n\rightarrow\bbF_2$, the goal of self-correcting is to evaluate $\tilde{f}$ with high success probability.

Given oracle access to a boolean function $f:\bbF_2^n\rightarrow\bbF_2$ that is $\delta$-close to a linear boolean function $\tilde{f}:\bbF_2^n\rightarrow\bbF_2$. For any $x\in\bbF_2^n$, the self-correcting process works as follows:

1. Uniformly random sample $y\in\bbF_2^n$.
2. Output $f(x+y)-f(y)$.

We have that the output of the self-correcting process is equal to $\tilde{f}(x)$ with probability at least $1-2\delta$.

As $f$ is $\delta$-close to $\tilde{f}$ and $x+y,y$ are uniformly distributed on $\bbF_2^n$, we have

• $\mathbb{P}[f(x+y)\neq\tilde{f}(x+y)]<\delta$.
• $\mathbb{P}[f(y)\neq\tilde{f}(y)]<\delta$.

Thus, $$\mathbb{P}[f(x+y)-f(y)\neq\tilde{f}(x)]\leq\mathbb{P}[f(x+y)\neq\tilde{f}(x+y)]+\mathbb{P}[f(y)\neq\tilde{f}(y)]<2\delta.$$

The verifier can use $O(\log n)$ randomness and 2 queries to evaluate a linear function with high probability.

### Consistency of $H_1$ and $H_2$

Given oracle access to $H_1(x)$ and $H_2(x)$, the verifier would like to make sure that the two Hadamard encodings are encoding the same assignment $x$. To do so, it suffices to check that $\langle a,x\rangle\cdot\langle a’,x\rangle=x^{\intercal}Bx$ where $B_{ij}=a_ia’_j$ for any $a,a’\in\bbF_2^n$. However, it is impossible for the verifier to check every $a,a’\in\bbF_2^n$. As a result, the verifier will randomly pick $a,a’$ and then do the check.

Given oracle access to $f_1,f_2$, the verifier performs the following consistency check:

1. Uniformly random sample $a,a’\in\bbF_2^n$. Take $B\in\bbF_2^{n\times n}$ such that $B_{ij}=a_ia’_j$ for any $i,j\in[n]$.
2. Denote $\text{self-correct}$ as the process defined in the previous subsection, the verifier checks whether $\text{self-correct}(f_1,a)\cdot\text{self-correct}(f_1,a’)=\text{self-correct}(f_2,B)$. If yes, then accepts. Otherwise, rejects.

We have the following theoretical guarantees:

• (perfect completeness) If $f_1=H_1(x)$ and $f_2=H_2(xx^{\intercal})$, then the verifier always accepts.
• (soundness) If $f_1$ is $\delta$-close to $H_1(x)$ and $f_2$ is $\delta$-close to $H_2(X)$. Suppose there exists $i,j\in[n]$ such that $X_{ij}\neq x_ix_j$, then the verifier rejects with probability at least $\frac{1}{4}-6\delta$.

• (perfect completeness) As $\text{self-correct}(f_1,a)=\langle a,x\rangle$, $\text{self-correct}(f_1,a’)=\langle a’,x\rangle$, and $\text{self-correct}(f_2,B)=x^{\intercal}Bx$, we have \begin{align} \text{self-correct}(f_1,a)\cdot\text{self-correct}(f_1,a’) &= \langle a,x\rangle\cdot\langle a’,x\rangle\\
&=\sum_{i,j\in[n]}a_ia_jx_ix_j\\
&=x^{\intercal}Bx=\text{self-correct}(f_2,B). \end{align}

• (soundness) As $f_1$ is $\delta$-close to $H_1(x)$ and $f_2$ is $\delta$-close to $H_2(X)$, we have \begin{align} \mathbb{P}[\text{self-correct}(f_1,a)\neq\langle a,x\rangle]&\leq2\delta\\
\mathbb{P}[\text{self-correct}(f_1,a’)\neq\langle a’,x\rangle]&\leq2\delta\\
\mathbb{P}[\text{self-correct}(f_2,B)\neq\sum_{i,j\in[n]}B_{ij}X_{ij}]&\leq2\delta. \end{align}

As $X\neq xx^{\intercal}$, say $X_{ij}\neq (xx^{\intercal})_{ij}$, we have $$\mathbb{P}[(xx^{\intercal}a’)_i \neq (Xa’)_i]=\frac{1}{2},$$ and thus \begin{align} \mathbb{P}[axx^{\intercal}a’ \neq aXa’]&\geq\mathbb{P}[(xx^{\intercal}a’)_i \neq (Xa’)_i]\\
&\cdot\mathbb{P}[axx^{\intercal}a’ \neq aXa’|\mathbb{P}[(xx^{\intercal}a’)_i \neq (Xa’)_i]]\\
&=\frac{1}{4}. \end{align}

Denote the event where $\text{self-correct}(f_1,a)=\langle a,x\rangle$, $\text{self-correct}(f_1,a’)=\langle a’,x\rangle$, and $\text{self-correct}(f_2,B)=\sum_{i,j\in[n]}B_{ij}X_{ij}$ as $E$, we have \begin{align} \mathbb{P}[\text{the verifier accepts}] &\leq\mathbb{P}[E]\cdot\mathbb{P}[\langle a,x\rangle\cdot\langle a’,x\rangle=\sum_{i,j\in[n]}B_{ij}X_{ij}|E] + \mathbb{P}[\bar{E}]\\
&\leq\frac{3}{4}+6\delta. \end{align} Thus, we conclude that $\mathbb{P}[\text{the verifier rejects}]\geq\frac{1}{4}-6\delta$.

• We assume that the two oracles are both $\delta$-close to some Hadamard encodings because we will perform the linearity test first. That is, if the two oracles passes the linearity test with high probability, then they are both close to some Hadamard encodings.