DOCS = {
"transformer_architecture.md": textwrap.dedent("""
# Transformer Structure
## Overview
The Transformer is a deep studying structure launched in "Consideration Is All
You Want" (Vaswani et al., 2017). It changed recurrent networks with a
self-attention mechanism, enabling parallel coaching and higher long-range
dependency modelling.
## Key Parts
- **Multi-Head Self-Consideration**: Computes consideration in h parallel heads, every
with its personal discovered Q/Okay/V projections, then concatenates and tasks.
- **Feed-Ahead Community (FFN)**: Two linear layers with a ReLU activation,
utilized position-wise.
- **Positional Encoding**: Sinusoidal or discovered embeddings that inject
sequence-order info, since consideration is permutation-invariant.
- **Layer Normalisation**: Utilized earlier than (Pre-LN) or after (Put up-LN) every
sub-layer, stabilising gradients.
- **Residual Connections**: Added round every sub-layer to ease gradient circulate.
## Encoder vs Decoder
The encoder stack processes enter tokens bidirectionally (e.g. BERT).
The decoder stack makes use of causal (masked) consideration over earlier outputs plus
cross-attention over encoder outputs (e.g. GPT, T5).
## Scaling Legal guidelines
Kaplan et al. (2020) confirmed that mannequin loss decreases predictably as an influence
regulation with compute, information, and parameter depend. This motivated GPT-3 (175B) and
subsequent giant language fashions.
## Limitations
- Quadratic complexity in sequence size: O(n^2)
- No inherent recurrence -> long-context challenges
- Excessive reminiscence footprint throughout coaching
## References
Vaswani et al. (2017). Consideration Is All You Want. NeurIPS.
Kaplan et al. (2020). Scaling Legal guidelines for Neural Language Fashions. arXiv:2001.08361.
"""),
"rag_systems.md": textwrap.dedent("""
# Retrieval-Augmented Technology (RAG)
## Definition
RAG augments a generative LLM with a retrieval step: given a question, related
paperwork are fetched from a corpus and prepended to the immediate, giving the
mannequin grounded context past its coaching information.
## Structure
1. **Indexing Part** — Paperwork are chunked, embedded by way of a bi-encoder
(e.g. text-embedding-3-large), and saved in a vector database (e.g.
Faiss, Pinecone, Weaviate).
2. **Retrieval Part** — The person question is embedded; approximate nearest-
neighbour (ANN) search returns the top-k chunks.
3. **Technology Part** — Retrieved chunks + question are handed to the LLM
which synthesises a remaining reply.
## Variants
- **Dense Retrieval**: DPR, Contriever — queries and docs in the identical house.
- **Sparse Retrieval**: BM25 — time period frequency-based, no embeddings wanted.
- **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse.
- **Re-ranking**: A cross-encoder re-scores the top-k earlier than the LLM sees them.
## Challenges
- Context window limits: lengthy retrieved passages might not match.
- Retrieval high quality is a tough ceiling on technology high quality.
- Chunking technique considerably impacts recall.
- Multi-hop questions require iterative retrieval (IRCoT, ReAct).
## Relationship to Transformers
RAG programs depend on transformer-based encoders for embedding and decoder
fashions for technology. The standard of the embedding mannequin straight determines
retrieval precision and recall.
## References
Lewis et al. (2020). RAG for Data-Intensive NLP Duties. NeurIPS.
Gao et al. (2023). RAG for Giant Language Fashions. arXiv:2312.10997.
"""),
"knowledge_graph_integration.md": textwrap.dedent("""
# Data Graphs and LLM Integration
## What's a Data Graph?
A data graph (KG) is a directed labelled graph of entities (nodes) and
relations (edges): (topic, predicate, object) triples, e.g.
(Vaswani, authored, "Consideration Is All You Want").
## Why Mix KGs with LLMs?
LLMs hallucinate details; KGs present structured, verifiable floor reality.
KGs are onerous to question in pure language; LLMs present the interface.
Collectively they allow devoted, grounded, explainable query answering.
## Integration Methods
### KG-Augmented Technology (KGAG)
Retrieve triples or sub-graphs as an alternative of textual content chunks, serialise into textual content,
then feed to the LLM immediate.
### LLM-Assisted KG Building
LLMs extract (topic, relation, object) triples from unstructured textual content,
lowering guide curation effort considerably.
### GraphRAG (Microsoft Analysis, 2024)
GraphRAG clusters doc communities, generates group summaries, and
shops them in a KG. Queries answered by map-reduce over group summaries
outperform flat-vector RAG on sensemaking duties.
## Challenges
- KG building high quality will depend on extraction LLM accuracy.
- Graph databases add infrastructure complexity.
- Ontology design requires area experience.
- KGs go stale with out steady replace pipelines.
## Relation to RAG and Transformers
KG integration addresses two key RAG limitations: lack of structured reasoning
and incapability to observe multi-hop relations.
## References
Pan et al. (2023). Unifying LLMs and KGs. IEEE Clever Methods.
"""),
}
