---
title: "MCP Server: Design and Implementation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MCP Server: Design and Implementation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Problem

The original MCP server shipped 9 tools that mostly serialized raw data rows as
JSON and sent them to Claude. This created three compounding problems:

1. **Size**: A single `get_dataset` call on a moderately large table (e.g.,
   Grade Results with 50K rows) would hit the MCP protocol's 1MB response limit,
   forcing artificial row caps.
2. **Waste**: Claude received thousands of JSON rows it couldn't meaningfully
   compute on. Every "what's the average grade?" question required the LLM to
   mentally sum and divide across a JSON array.
3. **Rigidity**: Dedicated tools like `student_summary` and `course_summary`
   hard-coded specific analysis patterns. Any question outside those patterns
   required a new tool or awkward workarounds.

The redesign inverts the architecture: Claude writes R code, the server executes
it in a persistent workspace, and returns compact results -- text summaries and
file-based visualizations.

## Design Principles

### Compute where compute belongs

R is purpose-built for data manipulation. Claude is purpose-built for reasoning
about what manipulation to do. The `execute_r` tool bridges these: Claude writes
a dplyr pipeline, ggplot visualization, or statistical test; the server evaluates
it and returns the result. No raw data crosses the wire unless explicitly
requested.

### Visualization: interactive HTML first, static PNG fallback

The preferred visualization path uses Chart.js injected into self-contained HTML
files:

1. `execute_r` aggregates data in R (counts, means, distributions)
2. Claude builds an HTML string with Chart.js loaded from CDN
3. `write_chart(html, 'name.html')` saves it safely to the output directory
4. The user opens the HTML file in their browser

```
execute_r → aggregate data → paste0(Chart.js HTML) → write_chart()
    ↓
Interactive chart in browser (tooltips, hover, responsive)
```

As a fallback for cases where the user explicitly requests a PNG or a chart type
Chart.js cannot handle, `execute_r` supports returning ggplot objects directly.
When detected, the server renders the plot to a PNG file and generates an HTML
viewer wrapper. Both files are saved to the configured output directory.

### Outcome-oriented tools

The old server had tools organized around API operations: `get_dataset`,
`join_datasets`, `student_summary`. The redesign organizes around user goals:

| Old (9 tools) | New (7 tools) | Why |
|---|---|---|
| `get_dataset` | `execute_r` | Claude writes `bs_get_dataset("Users") %>% head(10)` |
| `student_summary` | `execute_r` | Claude writes the exact summary needed |
| `course_summary` | `execute_r` | Claude writes the exact aggregation |
| `join_datasets` | `execute_r` | Claude calls `bs_join()` directly |
| `describe_dataset` | `describe_dataset` (enhanced) | Now returns column stats, not sample rows |
| -- | `get_data_summary` | Quick filter/group stats without writing R |

### Progressive disclosure

Every response from `describe_dataset` and `get_data_summary` ends with a hint:

> "Use execute_r for custom analysis. Available: bs_get_dataset(), bs_join(),
> dplyr verbs."

This teaches the model (and the user reading the transcript) that there's a more
powerful tool available when the pre-built summaries aren't enough.

### Server instructions

The MCP `initialize` handshake includes an `instructions` field -- a block of
text that tells Claude the recommended workflow before any tools are called:

```
1. Use list_datasets or search_datasets to discover data
2. Use describe_dataset to understand columns and distributions
3. Use get_data_summary with filter_by/group_by for quick stats
4. Use execute_r for custom analysis
```

The instructions also include performance guidelines, the Chart.js HTML pattern,
available packages (dplyr, tidyr, ggplot2, lubridate, scales), and the date axis
gotcha for ggplot2.

## Architecture

### Persistent workspace

The server creates a `.mcp_workspace` environment at startup with `dplyr`,
`tidyr`, `ggplot2`, `lubridate`, and `scales` pre-loaded. Every `execute_r` call
evaluates code in this environment, so variables persist between calls:

```r
# Call 1: load and filter
users <- bs_get_dataset("Users") %>% filter(role_name == "Student")

# Call 2: uses `users` from previous call
users %>% count(org_unit_id, sort = TRUE)
```

The workspace also exposes:

- `output_dir` -- the path to the configured output directory
- `write_chart(html, filename)` -- safe writer that only accepts `.html` files
  and only writes to the output directory (bare filenames only, no path
  traversal)

### Output directory

Visualizations are written to a configurable output directory. Resolution order:

1. `BRIGHTSPACER_OUTPUT_DIR` environment variable (if set)
2. `<package_root>/brightspaceR_output` (development mode)
3. `<working_directory>/brightspaceR_output` (fallback)
4. `<temp_dir>/brightspaceR_output` (last resort if directory creation fails)

The path is normalised with `normalizePath()` and logged at startup. If creation
fails, the server falls back gracefully and logs a warning.

### Smart result formatting

`handle_execute_r` inspects the result type and formats accordingly:

| Result type | Formatting |
|---|---|
| `data.frame` with <=50 rows | Full text table via `print()` |
| `data.frame` with >50 rows | `head(20)` + "... N more rows" hint |
| `gg`/`ggplot` object | Render to 900x600 PNG + HTML viewer, return file paths |
| Character ending in `.html`/`.png`/`.pdf`/`.csv` | Return as file path |
| Anything else | `capture.output(print(x))` |
| Error | `isError: TRUE` with the error message |

### Column summary helper

`summarize_column()` is a shared function used by both `describe_dataset` and
`get_data_summary`. It produces type-appropriate stats:

- **Numeric**: min, max, mean, n_missing
- **Character/Factor**: n_unique, top 3 values with counts, n_missing
- **Logical**: n_true, n_false, n_missing
- **Date/POSIXt**: min, max, n_missing

This replaces the old approach of sending 5 sample rows, which told Claude
almost nothing about the data distribution.

### Response helpers

Composable functions build MCP-compliant responses:

- `mcp_text(text, audience)` -- Creates a `TextContent` block with optional
  `audience` annotation (`["user"]` for clean output, `["assistant"]` for
  verbose diagnostics like row counts and join warnings).
- `mcp_result(contents, is_error)` -- Wraps a list of content blocks into a
  tool result. Handles size truncation at ~800KB.

### Size guard

The `mcp_result()` function checks the serialized JSON size of every response.
If it exceeds 800KB, it finds the first text content block and truncates it with
a clear message:

```
... [TRUNCATED: response exceeded size limit.
Use head()/filter() to narrow results in execute_r.]
```

This is a safety net, not the primary size control. The real size control is that
`execute_r` returns computed results (a count, a mean, a chart file path) rather
than raw data.

### Defensive execution

`execute_r` includes several layers of protection against runaway queries and
unsafe code:

**AST code inspection**: Before any code executes, `check_code_safety()` parses
it into an abstract syntax tree and walks every node looking for blocked
constructs. Blocked categories:

| Category | Examples |
|---|---|
| **Direct package access** | `brightspaceR::`, `httr::`, `httr2::`, `curl::`, `jsonlite::`, `config::` |
| **Metaprogramming** | `eval`, `evalq`, `do.call`, `get`, `mget`, `match.fun` |
| **Environment access** | `Sys.getenv`, `Sys.setenv` |
| **Shell commands** | `system`, `system2`, `shell` |
| **File I/O** | `readLines`, `writeLines`, `readRDS`, `saveRDS`, `write.csv`, `scan`, `file` |
| **Network** | `download.file`, `url`, `socketConnection` |

If any blocked construct is detected, the code is rejected with an error listing
the specific violations. Syntax errors pass through the safety check (they'll
fail at eval anyway). Comments and string literals containing blocked names are
not flagged -- only actual function calls in the AST are checked.

The `write_chart(html, filename)` function in the workspace provides a safe
alternative to `writeLines()` for Chart.js output. It only accepts `.html`
filenames and only writes to the output directory.

**PII field policy**: Datasets are filtered through a YAML-driven column
allowlist before they enter the cache (and thus before any code can access them).
See the [PII Field Policy](#pii-field-policy) section below.

**30-second timeout**: Every eval is wrapped in `setTimeLimit(elapsed = 30)`. If
a Cartesian join or unfiltered aggregation over millions of rows hangs, the
server kills it and returns a clear error: *"Execution timed out after 30
seconds. Try filtering data earlier or breaking into smaller steps."*

**Row-count reporting**: The workspace's `bs_get_dataset()` wrapper emits a
message like `[Grade Results: 523,041 rows x 12 cols]` every time it's called.
These messages are captured and returned to Claude as assistant-facing context
(using `audience: ["assistant"]`), so Claude always knows how big the data it
just loaded is -- and can course-correct before the next operation.

**Large-dataset warnings**: If a dataset exceeds 50K rows, the message includes
an explicit warning: *"WARNING: large dataset -- filter early to avoid slow
operations"*. The same applies to `bs_join()` when either input exceeds 50K
rows.

**Server instructions**: The `initialize` response tells Claude to always call
`describe_dataset` first to check row counts, filter early on large tables, and
never return raw unfiltered data frames. It also documents the safety policy so
Claude knows which functions are blocked and uses `write_chart()` instead of
`writeLines()`.

**Audit logging**: Every tool call is logged to `mcp_audit.jsonl`. See the
[Audit Logging](#audit-logging) section below.

These defenses are layered: the AST inspector blocks dangerous code, the field
policy strips PII, instructions guide Claude to do the right thing, row-count
messages let it self-correct mid-execution, the timeout is the hard backstop,
and the audit log records everything for review.

## Tool Reference

### Discovery tools

**`list_datasets`** -- Returns all available BDS dataset names and descriptions.
No parameters.

**`search_datasets`** -- Case-insensitive keyword search across dataset names
and descriptions.

**`auth_status`** -- Reports whether the server has a valid Brightspace OAuth
token.

**`list_schemas`** -- Lists registered dataset schemas and their key columns
(used by `bs_join()`).

### Analysis tools

**`describe_dataset(name)`** -- Loads a dataset and returns per-column summary
statistics. The footer suggests using `execute_r` for custom queries.

**`get_data_summary(dataset, filter_by?, group_by?)`** -- The "quick stats"
tool. Loads a dataset, optionally filters by column-value pairs, optionally
groups. Without grouping, returns per-column stats. With grouping, returns group
counts and numeric column means per group. Always suggests `execute_r` for
anything more complex.

**`execute_r(code)`** -- The core tool. Claude writes R code as a string, the
server evaluates it in the persistent workspace, and returns the result. Handles
data frames, ggplot objects (saved as PNG + HTML), file paths, and arbitrary R
objects.

## Types of Analysis

The MCP server is designed for the kinds of questions LMS administrators and
institutional researchers typically ask. Here are the main categories with
example prompts.

### Enrollment analytics

- "How many students are enrolled this semester vs last semester?"
- "Show me enrollment trends by month for the last 2 years"
- "Which courses have the highest drop rates?"
- "Break down enrollments by role -- how many students, instructors, TAs?"

### Grade performance

- "What's the average grade in STAT101?"
- "Show me the grade distribution across all courses as a histogram"
- "Which courses have the lowest pass rates (below 50%)?"
- "Compare grade distributions between online and in-person sections"

### Student engagement

- "How many users haven't logged in for 90 days?"
- "What's the average content completion rate per course?"
- "Which discussion forums have the most activity?"
- "Show me quiz attempt patterns -- when do students take quizzes?"

### Course health

- "Which courses have no grade items set up?"
- "List courses that haven't been accessed in 6 months"
- "Show me assignment submission rates by course"
- "Which courses have the most content objects?"

### Cross-dataset joins

- "Join users and enrollments to find students enrolled in more than 5 courses"
- "Correlate discussion post frequency with final grades"
- "Find users with attendance below 80% whose grades are above 70%"

### Reporting and dashboards

- "Build me a dashboard showing KPIs for this semester"
- "Generate a grade report for all courses in the Business faculty"
- "Create a chart showing monthly active users over the past year"

Each of these can be handled through the MCP workflow: discover datasets, check
column distributions, then write R code that aggregates, joins, and visualizes.

## Typical Conversation Flow

```
User: "How are students performing in STAT101?"

Claude thinks: I need to find the course, get grade data, and summarize it.

1. search_datasets(keyword="grade")
   -> Finds "Grade Results"

2. describe_dataset(name="Grade Results")
   -> Sees columns: user_id, org_unit_id, points_numerator, etc.
   -> Sees row count: 523,041

3. execute_r(code="""
   grades <- bs_get_dataset("Grade Results")
   orgs <- bs_get_dataset("Org Units")
   stat101 <- orgs %>% filter(grepl("STAT101", name))
   grades %>%
     filter(org_unit_id %in% stat101$org_unit_id) %>%
     summarise(
       n_students = n_distinct(user_id),
       mean_grade = mean(points_numerator, na.rm = TRUE),
       median_grade = median(points_numerator, na.rm = TRUE)
     )
   """)
   -> Returns: tibble with 1 row of summary stats

4. execute_r(code="""
   # Interactive chart
   grade_dist <- grades %>%
     filter(org_unit_id %in% stat101$org_unit_id,
            !is.na(points_numerator)) %>%
     mutate(bracket = cut(points_numerator,
       breaks = seq(0, 100, 10), include.lowest = TRUE)) %>%
     count(bracket)

   html <- paste0('<!DOCTYPE html>...<script src=\"https://cdn.jsdelivr.net/npm/chart.js\"></script>...')
   write_chart(html, 'stat101_grades.html')
   browseURL(file.path(output_dir, 'stat101_grades.html'))
   """)
   -> Opens interactive bar chart in browser
```

Note how calls 3 and 4 share variables (`grades`, `orgs`, `stat101`) via the
persistent workspace. Claude computes with R, then builds Chart.js HTML for
interactive rendering.

## PII Field Policy

Raw BDS datasets contain personally identifiable information -- names, emails,
free-text comments -- that shouldn't flow to an AI model. The server applies a
YAML-driven column policy to every dataset before it enters the cache.

### Policy file

The default policy lives at `inst/mcp/field_policy.yml`. Each dataset entry
specifies a mode:

| Mode | Behaviour |
|---|---|
| `allow` | Only listed columns pass through; everything else is dropped |
| `redact` | Listed columns have their values replaced with `[REDACTED]` |
| `all` | All columns pass through unchanged |

Datasets not listed in the policy pass through with a warning to stderr.

### Default policy

Datasets with PII use `allow` mode to expose only safe columns:

| Dataset | Hidden columns |
|---|---|
| **Users** | UserName, FirstName, MiddleName, LastName, ExternalEmail, OrgDefinedId |
| **Grade Results** | Comments, PrivateComments |
| **Assignment Submissions** | Feedback, FeedbackUserId |
| **Quiz User Answers** | Comment, LastModifiedBy |

All other datasets (Org Units, Role Details, Content Objects, etc.) use `all`
mode since they contain no direct PII.

### Custom policy

Override the default by setting the `BRIGHTSPACER_FIELD_POLICY` environment
variable to the path of your custom YAML file:

```json
{
  "env": {
    "BRIGHTSPACER_FIELD_POLICY": "/path/to/custom_field_policy.yml"
  }
}
```

Resolution order: `BRIGHTSPACER_FIELD_POLICY` env var > `field_policy.yml` in
the working directory > bundled `inst/mcp/field_policy.yml`.

### Where filtering happens

The field policy is applied inside `get_cached_dataset()`, after fetching but
before caching. The server delegates to `bs_apply_field_policy()` from the
package (the same function available to regular R scripts via
`library(brightspaceR)`). This means all downstream consumers --
`describe_dataset`, `get_data_summary`, and `execute_r` via
`bs_get_dataset()` -- see only the filtered columns. There is no way for code
running in the workspace to access columns removed by the policy.

## ID Pseudonymisation

Even after the PII field policy strips names and emails, person-referencing ID
columns (UserId, SubmitterId, LastModifiedBy, etc.) still contain raw integers
that could be reversed to real people by anyone with Brightspace admin access.
The pseudonymisation layer addresses this by HMAC-hashing all person IDs with a
session-scoped key.

### How it works

At server startup, a 32-byte random key is generated via
`openssl::rand_bytes(32)` and stored in a module-level variable. Each person ID
value is hashed with HMAC-SHA256 (`openssl::sha256()`), truncated to the first
8 hex characters, and prefixed with `usr_`. For example, UserId `12345` becomes
`usr_a3f2b1c8`.

- **Deterministic within a session**: the same ID always maps to the same
  pseudonym, so joins, grouping, and cross-dataset references work normally.
- **Unrecoverable across sessions**: when the server restarts, a new key is
  generated and all pseudonyms change. There is no stored mapping to reverse.
- **NA passthrough**: `NA` values remain `NA`.

### Which columns are hashed

A built-in registry (exported as `PERSON_ID_COLUMNS` internally, used by
`bs_pseudonymise_df()`) maps dataset names to their person-referencing ID
columns. Only person IDs are hashed — structural IDs (OrgUnitId,
GradeObjectId, ContentObjectId, etc.) are left untouched so the AI can still
join and filter on them.

| Dataset | Pseudonymised columns |
|---|---|
| Users | UserId |
| User Enrollments | UserId |
| Grade Results | UserId, LastModifiedBy |
| Assignment Submissions | SubmitterId, FeedbackUserId |
| Quiz User Answers | LastModifiedBy |
| Content User Progress | UserId |
| Quiz Attempts | UserId |
| Discussion Posts | UserId |
| Discussion Topics | LastPostUserId, DeletedByUserId |
| Content Objects | CreatedBy, LastModifiedBy, DeletedBy |
| Grade Objects | DeletedByUserId |
| Enrollments and Withdrawals | UserId, ModifiedByUserId |
| Final Grades | UserId |
| Attendance Records | UserId |

Some of these columns may already be dropped by the field policy (e.g.,
FeedbackUserId in Assignment Submissions). The pseudonymisation function silently
skips columns not present in the data frame.

### Where it runs

Pseudonymisation runs inside `get_cached_dataset()`, after field policy
filtering and before caching. The server calls `bs_pseudonymise_df()` from the
package (the same function available to regular R scripts) with the
session-scoped key. This means all downstream consumers — `describe_dataset`,
`get_data_summary`, and `execute_r` via `bs_get_dataset()` — see only
pseudonymised IDs. The pseudonymisation is baked into the cache, so it is
applied once per dataset per session.

### Combined effect with field policy

Together, the PII field policy and ID pseudonymisation achieve full
pseudonymisation:

1. **Field policy** drops direct identifiers: FirstName, LastName, Email,
   UserName, OrgDefinedId, free-text comments
2. **Pseudonymisation** hashes indirect identifiers: UserId, SubmitterId,
   LastModifiedBy, etc.

The AI model cannot see who a person is (no names/emails) or reverse-engineer
their identity from an ID (hashed with a session-scoped key).

For a detailed assessment of how these protections align with ENISA, NIST,
ISO 25237, GDPR, FERPA, and HIPAA standards — and how to apply the same
techniques in your own R scripts — see `vignette("privacy-compliance")`.

## Audit Logging

Every tool call is recorded in an append-only JSONL file at
`{output_dir}/mcp_audit.jsonl`. Each line is a JSON object with:

| Field | Description |
|---|---|
| `timestamp` | ISO 8601 with milliseconds (UTC) |
| `tool` | Tool name (e.g., `execute_r`, `describe_dataset`) |
| `arguments` | Tool arguments (code truncated to 500 chars) |
| `response_bytes` | Approximate response size in bytes |
| `code_blocked` | `true` if code was rejected by the AST inspector |
| `blocked_constructs` | List of blocked functions/packages (if applicable) |
| `is_error` | `true` if the tool call resulted in an error |

Special entries:

- `session_start` -- logged when the server starts (after authentication)
- `session_end` -- logged when the server shuts down

The audit log is machine-readable and can be reviewed directly or post-processed
with standard JSON tools:

```bash
# View all blocked code attempts
cat brightspaceR_output/mcp_audit.jsonl | jq 'select(.code_blocked == true)'

# Count tool calls by type
cat brightspaceR_output/mcp_audit.jsonl | jq -r '.tool' | sort | uniq -c | sort -rn
```

## Security Considerations

The MCP server is designed for single-user local deployment. The trust boundary
is the same as opening an R console -- the server runs locally, authenticated
with the user's own Brightspace credentials.

Within that boundary, several layers protect against accidental data exposure
and code misuse:

1. **AST code inspection** blocks dangerous functions (API access, shell
   commands, file I/O, metaprogramming) before code executes
2. **PII field policy** strips sensitive columns (names, emails, comments)
   before data enters the workspace
3. **ID pseudonymisation** HMAC-hashes all person-referencing ID columns with a
   session-scoped key, so the AI sees `usr_a3f2b1c8` instead of raw integers
4. **Audit logging** records every tool call, blocked attempt, and error for
   post-hoc review
5. **Execution timeout** (30 seconds) prevents runaway queries
6. **Size guard** (~800KB) truncates oversized responses
7. **Row-count warnings** help the model self-correct on large datasets

For production deployments where multiple users share a server, `execute_r`
would need process-level isolation (e.g., callr). That's out of scope for the
current single-user design.