Trust & Safety Infrastructure

Civitas AI

Enterprise-grade content moderation with ML-powered classification, configurable policies, human-in-the-loop review, and immutable audit trails.

EU AI Act Ready • NIST AI RMF • SOC 2 Controls

Navigate this presentation using arrow keys (← →) or scroll. This platform demonstrates a complete AI governance solution from content analysis through compliance reporting — built entirely with AI assistance.

The Challenge

Content Moderation at Scale

  • Millions of user-generated content pieces daily
  • Toxic, harmful, and policy-violating content
  • Real-time decision requirements
  • Multi-platform, multi-language challenges

Regulatory Pressure

  • EU AI Act compliance requirements
  • Transparency and explainability mandates
  • Human oversight obligations
  • Immutable audit trail requirements

Operational Complexity

  • Inconsistent moderation decisions
  • No visibility into AI decision-making
  • Difficult policy enforcement
  • Missing evidence for appeals

The Solution

  • ML-powered automated classification
  • Configurable, versioned policies
  • Human-in-the-loop escalation
  • Cryptographically secured audit trail

System Architecture

Enterprise-grade microservices with edge deployment

Built with AI
Edge Layer — Global CDN

Cloudflare Pages

React SPA with TypeScript

Edge Cache • 275+ PoPs

Access Gate

Cloudflare Workers + KV

Magic Link • Session KV
API Layer — Google Cloud Run (Serverless)

Gateway

Rate Limit • CORS • Auth

Moderation

HuggingFace ML

Policy Engine

Configurable Rules

Review

Human-in-the-Loop

Data Layer — Managed Services

Supabase PostgreSQL

Pooled Connections • TLS

Upstash Redis

Serverless • TLS

Neo4j Aura

Knowledge Graph

Live Demo

Click any card below to open the live application in a new tab. The Dashboard shows real-time metrics, Moderation Demo lets you test content classification, Policy Management demonstrates configurable rules, and Audit Log displays immutable evidence records.

Dashboard Overview

Moderation Demo

Policy Management

API in Action

Try it now! Type any text in the box below and click "Moderate Text" to see real-time classification. The API analyzes content for toxicity, hate speech, harassment, violence, spam, and more. Watch the request/response panel to see the actual JSON payloads.

Try It Live

Request / Response

POST /api/v1/moderate
{
  "content": "Hello...",
  "source": "demo"
}
Response
{
  "action": "allow",
  "category_scores": {...}
}
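The round trip above can be sketched as a small client helper. This is an illustrative sketch, not the live client: the helper names (`build_request`, `may_publish`) are made up, and only the field names (`content`, `source`, `action`, `category_scores`) come from the panels above.

```python
import json

# Hypothetical helpers mirroring the demo's request/response shapes.

def build_request(content: str, source: str = "demo") -> dict:
    """Body for POST /api/v1/moderate."""
    return {"content": content, "source": source}

def may_publish(response: dict) -> bool:
    """True when the moderation verdict allows the content through."""
    return response.get("action") == "allow"

print(json.dumps(build_request("Hello...")))
print(may_publish({"action": "allow", "category_scores": {}}))  # -> True
```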

Policy Engine

Configurable Thresholds

Toxicity → Block 0.80
Hate → Block 0.70
Harassment → Warn 0.75
Profanity → Warn 0.90

Multi-Policy Support

Standard Community Guidelines
v1 • Global • Published
Active
Youth Safe Mode
v1 • Under 13 • Published
Active
Relaxed Forum Policy
v1 • US Forums • Draft
Draft
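The threshold table above can be read as a simple rule evaluation. A minimal sketch, assuming a rule encoding and an allow/warn/block severity ordering (the real engine is versioned and scoped per policy, as the cards above show):

```python
# Severity ordering is an assumption; thresholds are taken from the slide.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}

RULES = {
    "toxicity":   ("block", 0.80),
    "hate":       ("block", 0.70),
    "harassment": ("warn",  0.75),
    "profanity":  ("warn",  0.90),
}

def evaluate(scores: dict, rules: dict = RULES) -> str:
    """Return the most severe action whose threshold is met."""
    action = "allow"
    for category, (rule_action, threshold) in rules.items():
        if scores.get(category, 0.0) >= threshold:
            if SEVERITY[rule_action] > SEVERITY[action]:
                action = rule_action
    return action

print(evaluate({"toxicity": 0.92, "hate": 0.95}))  # -> block
print(evaluate({"harassment": 0.80}))              # -> warn
```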

Human-in-the-Loop

Review Queue Workflow

1
Content Escalated
ML confidence below threshold or edge case detected
2
Moderator Review
Human reviews content with ML recommendations
3
Decision with Rationale
Approve/Reject/Escalate with mandatory explanation
4
Evidence Recorded
Immutable audit trail with cryptographic hash
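Steps 1 and 3 of the workflow can be sketched in a few lines. Treating the 0.3-0.7 band (the same range the LLM second-pass uses elsewhere in this deck) as the escalation trigger is an assumption, as are the function names:

```python
def needs_human_review(top_score: float, low: float = 0.3, high: float = 0.7) -> bool:
    """Step 1: escalate when the model's strongest signal is ambiguous."""
    return low <= top_score <= high

def record_decision(decision: str, rationale: str) -> dict:
    """Step 3: moderator decisions require a non-empty explanation."""
    if decision not in {"approve", "reject", "escalate"}:
        raise ValueError(f"unknown decision: {decision}")
    if not rationale.strip():
        raise ValueError("rationale is mandatory")
    return {"decision": decision, "rationale": rationale}

print(needs_human_review(0.55))  # -> True (ambiguous, goes to the queue)
print(needs_human_review(0.95))  # -> False (model is confident)
```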

Moderator Actions

Compliance & Audit

Immutable Evidence Records

{
  "id": "e1000000-0000-...",
  "control_id": "MOD-001",
  "decision_id": "d0000000-...",
  "automated_action": "block",
  "category_scores": {
    "toxicity": 0.92,
    "hate": 0.95
  },
  "submission_hash": "sha256:a7f3b...",
  "immutable": true,
  "integrity_hash": "sha256:c9d2e..."
}

Audit Trail Features

  • Cryptographic hash chain
  • Tamper detection triggers
  • Full decision lineage
  • CSV/JSON export
  • Policy version tracking
  • Human review rationale captured per decision
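The hash-chain idea behind the evidence record above can be sketched as follows. The exact chaining scheme (what gets hashed, the genesis value) is an assumption; only the `sha256:` prefix convention is taken from the sample record:

```python
import hashlib
import json

def integrity_hash(record: dict, prev_hash: str) -> str:
    """Hash the record body together with the previous link's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return "sha256:" + hashlib.sha256(payload.encode()).hexdigest()

def verify_chain(records: list, genesis: str = "sha256:genesis") -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev = genesis
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "integrity_hash"}
        if rec["integrity_hash"] != integrity_hash(body, prev):
            return False  # tamper detected
        prev = rec["integrity_hash"]
    return True
```

Because each link covers the previous hash, editing any earlier record invalidates every record after it, which is what makes tampering detectable.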

Regulatory Compliance

Click any framework card to see the detailed controls mapped to that regulation. Each control includes its ID, description, criticality level, and the specific articles or clauses it addresses.

🇪🇺

EU AI Act

Art. 9, 13, 14, 15, 17

5 controls mapped
Click for details →
🏛️

NIST AI RMF

GOVERN, MAP, MEASURE, MANAGE

4 controls mapped
Click for details →
🌐

ISO 42001

Clause 6, 8, 9

3 controls mapped
Click for details →
🔒

GDPR

Art. 5, 22, 32

4 controls mapped
Click for details →

SOC 2

CC3, CC6, CC7

5 controls mapped
Click for details →
12 implemented controls with full traceability to regulatory requirements

Knowledge Graph

Drag nodes to explore the relationship graph. This visualization shows how services connect to controls, which map to compliance frameworks (EU AI Act, NIST, ISO, GDPR, SOC 2). The graph is powered by Neo4j Aura with 112 nodes and 138 relationships modeling the complete domain.

Integration Patterns

REST API

Direct HTTP integration with JSON payloads

POST /api/v1/moderate
Authorization: Bearer {api_key}

{"content": "...", "source": "web"}

Mobile SDK

Native iOS/Android with offline queue

CivitasSDK.moderate(text) { result ->
  when (result.action) {
    ALLOW -> publish()
    BLOCK -> reject()
  }
}

LLM Guardrails

Pre/post-processing for LLM outputs

llm_output = model.generate(prompt)
result = civitas.moderate(llm_output)
if result.action == "block":
    return SAFE_FALLBACK

Webhooks

Event-driven notifications

{
  "event": "moderation.decision",
  "action": "escalate",
  "decision_id": "..."
}

Cloud Deployment

Live URLs

Frontend: civitas.pages.dev
API: gateway-xxx.run.app
Database: Supabase (us-west-2)
Graph: Neo4j Aura
4
Microservices
<100ms
API Latency (p95)
99.9%
Uptime SLA

Security & Access Control

Enterprise authentication with zero-trust architecture

Authentication Flow

User Visits Protected Site
Access Gate Password Check
Magic Link Email Verification
Authenticated 24h Session

Cloudflare Workers

  • Edge-deployed authentication gate
  • KV storage for sessions & tokens
  • Global low-latency access

Security Controls

  • Password-gated access
  • Email verification via magic link
  • Secure HttpOnly cookies

Audit & Logging

  • Email access logging (90 days)
  • IP & country tracking
  • Admin access log endpoint

Seamless UX

  • One-click magic link auth
  • Auto-login after verification
  • Cross-domain session handling

Comprehensive Test Suite

Multi-layered testing strategy ensures reliability and compliance. E2E tests validate user workflows, CDD tests verify governance controls, and the CI/CD pipeline runs 36 automated tests on every commit.

36
Tests Passing
4
Test Suites
12
CDD Controls
<17s
Full Suite Runtime

E2E Tests (Playwright)

11 Audit Log: Evidence, Export, Search
8 Moderation Demo: Submit, ALLOW/BLOCK, Scores
7 Moderator Queue: Review, Approve, Override
10 Policy Management: Create, Edit, Thresholds

Compliance-Driven Development

MOD-001 Automated Classification
MOD-002 Real-Time Feedback
POL-001 Threshold Policy
GOV-002 Human Review
AUD-001 Immutable Evidence

GitHub Actions CI/CD Pipeline

Unit Tests • Integration Tests • CDD Tests • E2E Tests • Code Coverage • golangci-lint • Trivy Security Scan

Security Posture

Defense-in-depth across every layer

🔒

Authentication

  • ✓ Cloudflare Access Gate (magic link)
  • ✓ SHA-256 hashed API keys
  • ✓ RBAC (admin/moderator/viewer)
  • ✓ Constant-time token comparison
  • ✓ Internal service auth tokens
🛡

Infrastructure

  • ✓ Non-root Docker containers
  • ✓ Multi-stage builds (minimal surface)
  • ✓ go.sum supply chain verification
  • ✓ Secrets in env vars, never in code
  • ✓ Cloud Run scale-to-zero isolation
🧱

Application

  • ✓ HTTP server timeouts (Slowloris mitigation)
  • ✓ Bounded proxy response reads (10MB)
  • ✓ Scoped CORS (no wildcards)
  • ✓ Pagination limits on all endpoints
  • ✓ No error detail leaks in health checks
📜

Audit & Evidence

  • ✓ Immutable evidence records (AUD-001)
  • ✓ Full decision traceability with correlation IDs
  • ✓ Content hashing for integrity verification
  • ✓ Structured logging with zap (no PII)
A+
Security Audit Score
0 Critical • 0 High • 0 Medium
All findings resolved

Advanced Language Filtering

9 enhancements for multi-language, multi-provider resilience

Text Normalizer

NFKC normalization, zero-width stripping, homoglyph and leetspeak decoding to defeat Unicode evasion
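A toy version of the normalizer pipeline described above. The homoglyph and leetspeak tables here are tiny illustrative samples; production maps cover far more characters:

```python
import unicodedata

ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
HOMOGLYPHS = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})  # Cyrillic lookalikes
LEETSPEAK = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # fold width/compat variants
    text = text.translate(ZERO_WIDTH)           # strip zero-width characters
    text = text.translate(HOMOGLYPHS)           # fold lookalike letters
    return text.translate(LEETSPEAK).lower()

print(normalize("h\u200b\u0435ll\u043e w0rld"))  # -> hello world
```

Normalizing before classification matters because evasion (zero-width splits, Cyrillic substitutions, leetspeak) otherwise slips past token-based models.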

Language Detection

Automatic language identification via lingua-go with routing to language-aware classification providers

Ensemble Classification

Parallel multi-provider execution with agreement scoring and auto-escalation on disagreement
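Agreement scoring can be sketched as a spread check over the providers' scores. The 0.2 disagreement threshold below is an illustrative assumption, not the platform's actual value:

```python
from statistics import mean

def ensemble_score(provider_scores: list, max_spread: float = 0.2):
    """Return (combined score, escalate?) for one category."""
    spread = max(provider_scores) - min(provider_scores)
    return mean(provider_scores), spread > max_spread

score, escalate = ensemble_score([0.62, 0.58, 0.60])
print(round(score, 2), escalate)   # providers agree: no escalation
_, escalate = ensemble_score([0.15, 0.85])
print(escalate)                    # providers disagree: auto-escalate
```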

Score Calibration

Per-provider offset and scale normalization to a unified 0-1 scale with dynamic feedback-driven tuning
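A minimal sketch of that calibration step. The per-provider offset and scale values below are made-up examples; in the platform they would come from the feedback-driven tuning:

```python
CALIBRATION = {
    "provider_a": (0.05, 1.10),   # (offset, scale) -- illustrative values
    "provider_b": (-0.02, 0.95),
}

def calibrate(provider: str, raw_score: float) -> float:
    """Shift and rescale a raw score, clamped to the unified 0-1 range."""
    offset, scale = CALIBRATION.get(provider, (0.0, 1.0))
    return min(1.0, max(0.0, (raw_score - offset) * scale))

print(calibrate("provider_a", 0.50))  # shifted and rescaled
print(calibrate("provider_a", 1.00))  # clamped to 1.0
```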

LLM Second-Pass

Ambiguous scores (0.3-0.7) re-evaluated by Claude or GPT-4 with structured classification prompts

Context-Aware Policy

Threshold overrides based on request metadata (audience, platform) for context-sensitive moderation

User Trust Scoring

Rolling-window behavioral scoring adjusts policy thresholds per user based on moderation history

Feedback Loop

Human review outcomes feed back to provider calibration, continuously improving classification accuracy

Expanded Categories

Self-harm, spam, and PII detection added to the 6 existing moderation categories for 9 total

38 unit tests • 4 migrations • 13 new files • 10 modified files

Roadmap

Phase 1: Foundation

Core moderation, policy engine, review queue, audit trail

Complete: ML Classification • Policy Rules • Evidence Chain

Phase 2: Scale

Multi-language support, ensemble classification, LLM second-pass, feedback loop

Complete: i18n • Normalizer • Ensemble • Calibration • Trust Scoring

Phase 3: Enterprise

Multi-tenant, SSO, advanced analytics, SLA dashboard

Q4 2026: SAML/OIDC • Tenant Isolation • BI Integration

Get Started

Enterprise-grade content moderation, ready for production

Documentation

API reference, integration guides, and examples

Contact

proth1@gmail.com

License

MIT Open Source