NSIO: Neural Search Indexing Optimization
NSIO (Neural Search Indexing Optimization) investigates how to make the Differentiable Search Index (DSI), a model that memorises a document corpus and retrieves documents by generating their IDs, more accurate, efficient, and memory-friendly.
Background: Differentiable Search Index
Traditional search systems use two separate components: an indexer (inverted index or dense vector store) and a retrieval model. DSI collapses both into a single sequence-to-sequence transformer: given a query, it directly generates the document ID of the most relevant document, with no separate index needed. This approach (Tay et al., 2022) shows strong results on MS MARCO but is expensive to fine-tune at scale.
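To make the idea concrete, here is a minimal sketch of how a corpus could be turned into sequence-to-sequence training pairs: indexing examples map document text to its ID, and retrieval examples map a query to the ID of its relevant document. The `index:` and `query:` prefixes, the `Seq2SeqExample` class, the `build_dsi_examples` helper, and the toy data are illustrative assumptions, not taken from the NSIO code.

```python
from dataclasses import dataclass


@dataclass
class Seq2SeqExample:
    source: str  # text fed to the encoder
    target: str  # document ID the decoder should generate


def build_dsi_examples(corpus, queries, max_doc_words=32):
    """Build indexing and retrieval examples for a DSI-style model.

    corpus:  dict mapping doc_id -> document text
    queries: list of (query_text, relevant_doc_id) pairs
    """
    examples = []
    # Indexing task: a truncated document is mapped to its own ID.
    for doc_id, text in corpus.items():
        snippet = " ".join(text.split()[:max_doc_words])
        examples.append(Seq2SeqExample(f"index: {snippet}", doc_id))
    # Retrieval task: a query is mapped to the ID of its relevant document.
    for query, doc_id in queries:
        examples.append(Seq2SeqExample(f"query: {query}", doc_id))
    return examples


# Toy data, purely for illustration.
corpus = {
    "doc_17": "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "doc_42": "Mount Everest is the highest mountain above sea level.",
}
queries = [("where is the eiffel tower", "doc_17")]

for example in build_dsi_examples(corpus, queries):
    print(example.source, "->", example.target)
```

Both kinds of example share one model and one output vocabulary of document IDs, which is what lets a single transformer act as index and retriever at once.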
Optimisations
Data Augmentation
- Num2Word: convert numeric tokens to their word forms, reducing out-of-vocabulary issues.
- Stopword Removal: drop high-frequency function words to reduce noise in document representations.
- POS-MLM: Part-of-Speech guided masked language modelling to generate diverse query variants.
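The first two augmentations can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: the `number_to_words`, `num2word`, and `remove_stopwords` helpers and the tiny `STOPWORDS` set are assumptions, and the converter only handles non-negative integers below one million (decimals such as `3.5` would be converted piecewise).

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def num2word(text):
    """Replace standalone digit runs with their word forms."""
    def replace(match):
        n = int(match.group())
        # Leave very large numbers untouched rather than guess.
        return number_to_words(n) if n < 1_000_000 else match.group()
    return re.sub(r"\b\d+\b", replace, text)


# A deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "to", "and", "for"}


def remove_stopwords(text):
    """Drop whitespace-separated tokens that are stopwords (case-insensitive)."""
    return " ".join(tok for tok in text.split() if tok.lower() not in STOPWORDS)


print(num2word("Route 66 opened in 1926"))
print(remove_stopwords("The history of the Roman Empire"))
```

POS-MLM is omitted here because it depends on a part-of-speech tagger and a masked language model.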
Parameter-Efficient Fine-Tuning (PEFT)
Avoid full model fine-tuning by adapting only a small number of parameters:
| Method | Description |
|---|---|
| LoRA | Low-Rank Adaptation: trainable low-rank matrices injected into attention weight matrices |
| QLoRA | LoRA on top of a 4-bit quantised base model, which drastically reduces GPU memory |
| AdaLoRA | Adaptive rank allocation: higher rank for more important layers |
| ConvoLoRA | Novel convolutional LoRA variant for capturing local patterns |
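A rough back-of-the-envelope calculation shows why these methods save memory. LoRA replaces the update to a `d x k` weight matrix with two low-rank factors of shapes `d x r` and `r x k`, so it trains `r * (d + k)` parameters instead of `d * k`. Storing base weights in 4 bits instead of 16, as QLoRA does, cuts their size by a factor of four (ignoring the small overhead of quantisation constants). The helper names and the example dimensions (768, rank 8, 220M parameters) below are illustrative assumptions, not figures from the NSIO experiments.

```python
def full_update_params(d, k):
    """Trainable parameters when a d x k weight matrix is fine-tuned directly."""
    return d * k


def lora_update_params(d, k, r):
    """Trainable parameters for a rank-r LoRA update: B (d x r) plus A (r x k)."""
    return r * (d + k)


def weight_memory_mib(num_params, bits_per_param):
    """Approximate storage for the weights alone, in MiB."""
    return num_params * bits_per_param / 8 / 2**20


# Hypothetical 768 x 768 attention projection with rank-8 adapters.
d = k = 768
r = 8
full = full_update_params(d, k)
lora = lora_update_params(d, k, r)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")

# Hypothetical 220M-parameter base model, 16-bit versus 4-bit weights.
n = 220_000_000
print(f"16-bit: {weight_memory_mib(n, 16):.0f} MiB, "
      f"4-bit: {weight_memory_mib(n, 4):.0f} MiB")
```

The figures cover weights only; activations, optimiser state, and adapter weights add to the real footprint.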
Evaluation
All experiments are evaluated on the MS MARCO document ranking benchmark using MRR@10 and Recall@100. QLoRA achieves accuracy close to full fine-tuning at a fraction of the memory footprint.
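For reference, both metrics can be computed as in the sketch below. It assumes each query has a ranked list of predicted document IDs and a set of relevant IDs; the function names and toy data are illustrative, and queries with no relevant documents are skipped when averaging recall.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k.

    rankings: dict mapping query_id -> ranked list of doc IDs (best first)
    relevant: dict mapping query_id -> set of relevant doc IDs
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def recall_at_k(rankings, relevant, k=100):
    """Average fraction of each query's relevant documents found in the top k."""
    total = 0.0
    counted = 0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        if not rel:
            continue  # recall is undefined without relevant documents
        total += len(rel & set(ranked[:k])) / len(rel)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: q1 finds its relevant document at rank 2, q2 misses entirely.
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}
print(mrr_at_k(rankings, relevant))     # 0.25
print(recall_at_k(rankings, relevant))  # 0.5
```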
Technology
Python, Hugging Face Transformers + PEFT library, PyTorch, Jupyter Notebooks.
