Probabilistic Information Retrieval & LM-based IR
Probabilistic models of IR: BM25, language models, and relevance models for document retrieval.
Retrieval-augmented generation, search systems, and document intelligence.
Courses: 28
Subcategories: 2
Total Hours: 909h+
Difficulty Range: All levels
Learning-to-Rank
Consistency and calibration in learning-to-rank: pairwise and listwise losses, and surrogate analysis.
Dense vs Sparse Retrieval
Theory of dense and sparse neural retrieval: representation, training, and fusion strategies.
Metric Learning & Approximate Nearest Neighbor
Theory of metric learning losses and ANN data structures for embedding-based retrieval.
RAG Error Decomposition & Performance Bounds
Analyze RAG system errors: retrieval failures, generation hallucinations, and end-to-end performance bounds.
Evaluation Theory in IR/NLP
Rigorous evaluation methodology: inter-annotator agreement, statistical testing, and replicability in IR/NLP.
Tokenization & Subword Models
Information-theoretic analysis of tokenization: BPE, Unigram, and their impact on downstream performance.
Fact Verification & Hallucination Testing
Methods for automated fact checking, hallucination detection, and faithfulness evaluation in LLMs.
Document Structure as Graphs
Model document structure (sections, tables, references) as graphs for improved document understanding.
Provenance & Verifiable Retrieval
Track and verify the provenance of retrieved information for trustworthy RAG systems.
Cross-Lingual Retrieval & Alignment Theory
Theory of cross-lingual information retrieval: multilingual embeddings, alignment, and zero-shot transfer.
Knowledge Editing & Consistency Constraints
Edit knowledge in language models while maintaining consistency: ROME, MEND, and constraint propagation.