Is Hierarchical Quantization Essential for Optimal Reconstruction?
Can a single-level VQ-VAE, with matched representational budget and no codebook collapse, equal the reconstruction fidelity of its hierarchical counterpart? We revisit this question by comparing a two-level hierarchical VQ-VAE and a capacity-matched single-level model on high-resolution ImageNet images.
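As an illustration of what a matched representational budget could mean, the sketch below counts the bits needed to index the discrete latents of one image. It assumes that the budget is measured as latent positions times log2 of the codebook size, which is only one possible reading. The grid shapes, the codebook sizes, and the helper `latent_bits` are hypothetical and are not taken from the experiments.

```python
import math


def latent_bits(grid_hw, codebook_size):
    """Bits needed to index one code per position of a latent grid."""
    h, w = grid_hw
    return h * w * math.log2(codebook_size)


# Hypothetical two-level hierarchy: a coarse top grid plus a finer bottom grid.
top_bits = latent_bits((32, 32), 512)
bottom_bits = latent_bits((64, 64), 512)
hier_bits = top_bits + bottom_bits

# Hypothetical single-level model that keeps the finer grid and enlarges its
# codebook until its index budget covers the hierarchical one.
single_grid = (64, 64)
positions = single_grid[0] * single_grid[1]
single_codebook = math.ceil(2 ** (hier_bits / positions))
single_bits = latent_bits(single_grid, single_codebook)

print(f"hierarchical budget: {hier_bits:.0f} bits")
print(f"single-level codebook size: {single_codebook}")
print(f"single-level budget: {single_bits:.0f} bits")
```

Under this accounting the single-level model can be matched either by enlarging the codebook, as here, or by changing the latent grid; which choice is appropriate depends on how the budget is defined in a given comparison.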