CIS 6930 Spring 26


Data Engineering at the University of Florida

Assignment 2: RAG System

Due: Sunday, March 9, 2026 at 11:59 PM
Points: 60 (40 implementation + 10 evaluation + 10 peer review)
Peer Review Due: Thursday, March 12, 2026 at 11:59 PM
Submission: GitHub repository + Canvas link


Overview

In this assignment, you will build a Retrieval-Augmented Generation (RAG) system that answers questions about research papers from this course. You will implement document chunking, vector storage, retrieval, and answer generation with citations.

Starter code is provided. Your task is to complete the implementation of key functions.


Learning Objectives

By completing this assignment, you will:

  1. Implement document chunking strategies
  2. Create and query a vector database using Chroma
  3. Build a retrieval pipeline with similarity search
  4. Generate answers grounded in retrieved context
  5. Evaluate retrieval quality using standard metrics

Setup

1. Create Your Repository from the Template

  1. Go to the starter template: cegme/cis6930sp26-assignment2-starter
  2. Click “Use this template” → “Create a new repository”
  3. Name your repository cis6930sp26-assignment2
  4. Set it to Private
  5. Clone your new repository:
git clone https://github.com/YOUR_USERNAME/cis6930sp26-assignment2.git
cd cis6930sp26-assignment2

# Install dependencies
uv sync

2. Download Papers

Download at least 5 PDF documents (research papers, documentation, etc.) and place them in the papers/ directory:

papers/
├── lewis2020rag.pdf          # Required: Original RAG paper
├── wei2022cot.pdf            # Required: Chain-of-Thought paper
├── paper3.pdf
├── paper4.pdf
└── paper5.pdf

3. Set Up API Keys

cp .env.example .env
# Edit .env with your API keys

Task: Complete the Functions

The starter code provides the structure. You need to implement the functions marked with # TODO.

File 1: chunker.py (10 points)

Implement document chunking:

def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """
    Split a document into overlapping chunks.

    Args:
        text: The document text to chunk
        chunk_size: Maximum characters per chunk
        overlap: Number of characters to overlap between chunks

    Returns:
        List of text chunks

    TODO: Implement this function.
    - Split on sentence boundaries when possible
    - Ensure chunks don't exceed chunk_size
    - Include overlap characters from previous chunk
    """
    raise NotImplementedError("Implement chunk_document")

def chunk_by_paragraphs(text: str, max_chunk_size: int = 1000) -> list[str]:
    """
    Split a document by paragraphs, merging small paragraphs.

    Args:
        text: The document text to chunk
        max_chunk_size: Maximum characters per chunk

    Returns:
        List of text chunks (each containing one or more paragraphs)

    TODO: Implement this function.
    - Split on double newlines (paragraphs)
    - Merge consecutive small paragraphs until max_chunk_size
    - Don't split paragraphs mid-text
    """
    raise NotImplementedError("Implement chunk_by_paragraphs")
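To see the overlap mechanics in isolation, here is a rough sketch of fixed-window chunking (names and details are illustrative, not the required solution — note it ignores sentence boundaries entirely, which `chunk_document` must respect when possible):

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-window chunking with character overlap.

    Illustrative only: a real chunk_document should prefer sentence
    boundaries instead of cutting at arbitrary character positions.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window reached the end of the text
        start += chunk_size - overlap  # step forward, keeping `overlap` chars
    return chunks
```

The key invariant to test for: the last `overlap` characters of each chunk should reappear at the start of the next chunk.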

File 2: vectorstore.py (15 points)

Implement vector storage and retrieval:

def create_vectorstore(chunks: list[str], metadatas: list[dict]) -> Chroma:
    """
    Create a Chroma vector store from document chunks.

    Args:
        chunks: List of text chunks
        metadatas: List of metadata dicts (one per chunk)

    Returns:
        Chroma vector store instance

    TODO: Implement this function.
    - Use sentence-transformers for embeddings (all-MiniLM-L6-v2)
    - Store chunks with their metadata
    - Persist to ./chroma_db directory
    """
    raise NotImplementedError("Implement create_vectorstore")

def retrieve(vectorstore: Chroma, query: str, k: int = 3) -> list[Document]:
    """
    Retrieve the top-k most relevant chunks for a query.

    Args:
        vectorstore: The Chroma vector store
        query: The search query
        k: Number of documents to retrieve

    Returns:
        List of Document objects with page_content and metadata

    TODO: Implement this function.
    - Use similarity search
    - Return top k results
    """
    raise NotImplementedError("Implement retrieve")

def retrieve_with_scores(vectorstore: Chroma, query: str, k: int = 3) -> list[tuple[Document, float]]:
    """
    Retrieve top-k chunks with their similarity scores.

    Args:
        vectorstore: The Chroma vector store
        query: The search query
        k: Number of documents to retrieve

    Returns:
        List of (Document, score) tuples, sorted by relevance

    TODO: Implement this function.
    - Use similarity_search_with_score
    - Return documents with their scores
    """
    raise NotImplementedError("Implement retrieve_with_scores")
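Conceptually, similarity search is nearest-neighbor ranking over embedding vectors. A library-free sketch of the idea, using toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings are 384-dimensional, and Chroma performs this ranking for you):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3):
    """Rank document vectors by cosine similarity to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" -- purely illustrative values.
docs = {
    "rag":  [0.9, 0.1, 0.0],
    "cot":  [0.1, 0.9, 0.0],
    "misc": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # "rag" ranks first
```

One caveat when you use the real API: depending on the distance metric Chroma is configured with, the scores returned by `similarity_search_with_score` may be distances (lower is better) rather than similarities, so check before sorting.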

File 3: generator.py (15 points)

Implement answer generation:

def generate_answer(query: str, context_docs: list[Document], llm) -> str:
    """
    Generate an answer based on retrieved context.

    Args:
        query: The user's question
        context_docs: Retrieved documents to use as context
        llm: The language model to use

    Returns:
        Generated answer string

    TODO: Implement this function.
    - Format the context documents into the prompt
    - Include source attribution instructions
    - Call the LLM and return the response
    """
    raise NotImplementedError("Implement generate_answer")

def generate_answer_with_citations(query: str, context_docs: list[Document], llm) -> dict:
    """
    Generate an answer with explicit citations to source documents.

    Args:
        query: The user's question
        context_docs: Retrieved documents to use as context
        llm: The language model to use

    Returns:
        Dictionary with:
        - "answer": The generated answer text
        - "citations": List of cited source documents

    TODO: Implement this function.
    - Number each source in the prompt (e.g., [1], [2])
    - Instruct the LLM to cite sources by number
    - Parse citations from the response
    - Return structured output
    """
    raise NotImplementedError("Implement generate_answer_with_citations")
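For the "parse citations" step, one lightweight approach is a regex over bracketed numbers. The sketch below assumes your prompt successfully gets the model to cite as `[1]`, `[2]`, etc. (the function name is illustrative, not part of the starter code):

```python
import re

def extract_citation_numbers(answer: str) -> list[int]:
    """Pull distinct bracketed source numbers like [1], [2] from an answer,
    preserving order of first appearance."""
    seen: list[int] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        if n not in seen:
            seen.append(n)
    return seen

answer = "RAG combines retrieval with generation [1]. It reduces hallucination [2], as shown in [1]."
print(extract_citation_numbers(answer))  # [1, 2]
```

The extracted numbers can then be mapped back to the numbered `context_docs` to build the `"citations"` list. Be prepared for the model to cite nothing, or to cite a number you never provided.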

File 4: evaluate.py (10 points)

Implement evaluation metrics:

def precision_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """
    Calculate precision@k for retrieval evaluation.

    Args:
        retrieved_ids: List of retrieved document IDs (in order)
        relevant_ids: List of actually relevant document IDs
        k: Number of top results to consider

    Returns:
        Precision@k score (0.0 to 1.0)

    TODO: Implement this function.
    - Consider only the top k retrieved documents
    - Calculate: (relevant docs in top k) / k
    """
    raise NotImplementedError("Implement precision_at_k")

def mean_reciprocal_rank(queries_results: list[tuple[list[str], str]]) -> float:
    """
    Calculate Mean Reciprocal Rank (MRR) across multiple queries.

    Args:
        queries_results: List of (retrieved_ids, first_relevant_id) tuples

    Returns:
        MRR score (0.0 to 1.0)

    TODO: Implement this function.
    - For each query, find rank of first relevant document
    - Reciprocal rank = 1/rank (or 0 if not found)
    - Return mean across all queries
    """
    raise NotImplementedError("Implement mean_reciprocal_rank")
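These metric definitions are standard; a worked example of the arithmetic on hand-made document IDs (the IDs and relevance judgments are made up for illustration):

```python
# Toy retrieval run: ranked IDs returned for one query.
retrieved = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2"}

# precision@3: of the top 3 retrieved docs, how many are relevant?
k = 3
hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
print(hits / k)  # only "d1" is relevant in the top 3 -> 0.333...

# Reciprocal rank: 1 / (rank of the first relevant doc), ranks starting at 1.
rank = next(i for i, doc_id in enumerate(retrieved, start=1) if doc_id in relevant)
print(1 / rank)  # first relevant doc "d1" sits at rank 3 -> 0.333...

# MRR averages reciprocal ranks across queries; e.g. with RRs of 1/3 and 1/1:
print((1/3 + 1/1) / 2)  # 0.666...
```

Remember the edge case in `mean_reciprocal_rank`: if no relevant document appears at all for a query, that query contributes a reciprocal rank of 0.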

Running the System

After implementing the functions:

# Run tests (do this first!)
uv run pytest

# Index the papers
uv run python index.py

# Query the system
uv run python query.py "What is retrieval augmented generation?"

# Run evaluation
uv run python run_evaluation.py

Requirements

Implementation (40 points)

| Component | Points | Description |
|-----------|--------|-------------|
| Chunking (chunker.py) | 10 | Both chunking functions work correctly |
| Vector Store (vectorstore.py) | 15 | Vector store creation and retrieval work |
| Generation (generator.py) | 15 | Answer generation with citations works |

Evaluation (10 points)

| Component | Points | Description |
|-----------|--------|-------------|
| Metrics (evaluate.py) | 5 | Metrics implemented correctly |
| Evaluation Results | 5 | Run evaluation on 5+ test queries, report results in README |

Peer Review (10 points)

| Component | Points | Description |
|-----------|--------|-------------|
| Complete 2 peer reviews | 10 | Submit reviews as GitHub Issues by March 12 |

README.md

Your README must include:

1. Setup Instructions

How to install dependencies and run the system.

2. Evaluation Results

Report your retrieval evaluation results:

## Evaluation Results

| Metric | Score |
|--------|-------|
| Precision@3 | 0.XX |
| Precision@5 | 0.XX |
| MRR | 0.XX |

### Test Queries Used
1. "What is chain-of-thought prompting?"
2. "How does RAG reduce hallucination?"
3. ...

3. Example Queries

Show 2-3 example queries with the system’s answers and citations.

4. Chunking Strategy Discussion

Briefly explain your chunking approach and any experiments you tried.


COLLABORATORS.md

Document all collaboration and AI assistance (required).


Project Structure

cis6930sp26-assignment2/
├── papers/                    # Your PDF papers (not committed)
│   ├── lewis2020rag.pdf
│   └── ...
├── tests/
│   ├── test_chunker.py
│   ├── test_vectorstore.py
│   ├── test_generator.py
│   └── test_evaluate.py
├── chroma_db/                 # Generated vector store
├── chunker.py                 # TODO: Implement chunking
├── vectorstore.py             # TODO: Implement vector store
├── generator.py               # TODO: Implement generation
├── evaluate.py                # TODO: Implement evaluation
├── index.py                   # Provided: Indexing script
├── query.py                   # Provided: Query script
├── run_evaluation.py          # Provided: Evaluation script
├── .env.example
├── .gitignore
├── COLLABORATORS.md
├── README.md
└── pyproject.toml

Submission

  1. Create a private repository named cis6930sp26-assignment2
  2. Add cegme as an Admin collaborator
  3. Tag your final submission:
    git tag v1.0
    git push origin v1.0
    
  4. Submit the repository URL to Canvas

Peer Review (10 points)

You will review 2 classmates’ submissions. Peer reviews are assigned in Canvas after the submission deadline.

Due: Thursday, March 12, 2026 at 11:59 PM

What to Evaluate

For each submission you review, check:

  1. Does the code run? (Clone, install, run tests)
    • Can you run uv sync and uv run pytest?
    • Do the tests pass?
  2. Does the RAG system work?
    • Can you index papers with uv run python index.py?
    • Can you query with uv run python query.py "test question"?
    • Are answers relevant with proper citations?
  3. Code quality
    • Are the TODO functions implemented correctly?
    • Is the code readable and well-organized?
  4. Documentation
    • Does the README include evaluation results?
    • Are example queries shown with answers?
    • Is the chunking strategy explained?

Review Format

Submit your review as a GitHub Issue on the repository you’re reviewing. Use this template:

## Peer Review

**Reviewer:** [Your Name]

### Functionality
- [ ] Tests pass
- [ ] Indexing works
- [ ] Querying returns relevant answers with citations

### Code Quality
- [ ] Chunking implemented correctly
- [ ] Vector store operations work
- [ ] Answer generation includes citations

### Documentation
- [ ] README has evaluation results
- [ ] Example queries shown
- [ ] Chunking strategy explained

### Comments
[Your constructive feedback here]

### Score Suggestion
[X/50] - Brief justification

Grading

| Component | Points |
|-----------|--------|
| Implementation | 40 |
| Evaluation | 10 |
| Peer Review Completion | 10 |
| **Total** | **60** |

Tips

  1. Start with chunker.py - It has no dependencies on other modules
  2. Test incrementally - Run pytest tests/test_chunker.py before moving on
  3. Use the provided tests - They show expected behavior
  4. Check embedding dimensions - all-MiniLM-L6-v2 produces 384-dim vectors
  5. Handle PDF extraction errors - Some PDFs may have issues; skip or handle gracefully

Academic Integrity

This is an individual assignment. You may discuss concepts with classmates, but all code must be your own. Document all collaboration in COLLABORATORS.md.