QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models

Convolutional neural networks (CNNs) have made significant advances in computer vision tasks, yet their high inference times and latency often limit real-world applicability. While model compression techniques have gained popularity as a solution, they often overlook the critical balance between low latency and uncompromised accuracy. By harnessing three quantum-inspired techniques (quantum-inspired pruning, tensor decomposition, and annealing-based matrix factorization), we introduce QIANets: a novel approach that redesigns the traditional GoogLeNet, DenseNet, and ResNet-18 model architectures to process more parameters and computations while maintaining low inference times.
Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Machine Learning & Compression Workshop
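
To make the compression pipeline named in the abstract concrete, the following is a minimal, generic sketch of two of its ingredients: magnitude-based pruning standing in for the quantum-inspired pruning step, and a truncated-SVD factorization standing in for the tensor decomposition step, applied here to a single fully connected layer. The layer sizes, sparsity level, rank, and helper names are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only: generic magnitude pruning plus low-rank (SVD)
# factorization of one layer. All sizes, thresholds, and ranks are hypothetical;
# this is not the QIANets implementation described in the paper.
import torch
import torch.nn as nn


def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask


def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a Linear layer with two thinner layers via truncated SVD."""
    W = layer.weight.data                      # shape (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (out_features, rank)
    V_r = Vh[:rank, :]                         # (rank, in_features)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)


if __name__ == "__main__":
    layer = nn.Linear(512, 256)
    layer.weight.data = prune_by_magnitude(layer.weight.data, sparsity=0.5)
    compressed = factorize_linear(layer, rank=64)
    x = torch.randn(8, 512)
    print(compressed(x).shape)                 # torch.Size([8, 256])
```

The factorization trades one dense matrix multiply for two smaller ones; when the chosen rank is well below min(in_features, out_features), this reduces both parameter count and per-inference compute, which is the latency-accuracy trade-off the abstract targets.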