Trust & Safety Infrastructure

Civitas AI

Enterprise-grade content moderation with ML-powered classification, configurable policies, human-in-the-loop review, and immutable audit trails.

EU AI Act Ready • NIST AI RMF • SOC 2 Controls

Navigate this presentation using arrow keys (← →) or scroll. This platform demonstrates a complete AI governance solution from content analysis through compliance reporting — built entirely with AI assistance.

The Challenge

Content Moderation at Scale

  • Millions of user-generated content pieces daily
  • Toxic, harmful, and policy-violating content
  • Real-time decision requirements
  • Multi-platform, multi-language challenges

Regulatory Pressure

  • EU AI Act compliance requirements
  • Transparency and explainability mandates
  • Human oversight obligations
  • Immutable audit trail requirements

Operational Complexity

  • Inconsistent moderation decisions
  • No visibility into AI decision-making
  • Difficult policy enforcement
  • Missing evidence for appeals

The Solution

  • ML-powered automated classification
  • Configurable, versioned policies
  • Human-in-the-loop escalation
  • Cryptographically secured audit trail

System Architecture

Enterprise-grade microservices with edge deployment

Built with AI
Edge Layer — Global CDN

Cloudflare Pages

React SPA with TypeScript

Edge Cache • 275+ PoPs

Access Gate

Cloudflare Workers + KV

Magic Link • Session KV
API Layer — Google Cloud Run (Serverless)

Gateway

Rate Limit • CORS • Auth

Moderation

HuggingFace ML

Policy Engine

Configurable Rules

Review

Human-in-the-Loop

Data Layer — Managed Services

Supabase PostgreSQL

Pooled Connections • TLS

Upstash Redis

Serverless • TLS

Neo4j Aura

Knowledge Graph

Live Demo

Click any card below to open the live application in a new tab. The Dashboard shows real-time metrics, Moderation Demo lets you test content classification, Policy Management demonstrates configurable rules, and Audit Log displays immutable evidence records.

Dashboard Overview

Moderation Demo

Policy Management

API in Action

Try it now! Type any text in the box below and click "Moderate Text" to see real-time classification. The API analyzes content for toxicity, hate speech, harassment, violence, spam, and more. Watch the request/response panel to see the actual JSON payloads.

Try It Live

Request / Response

POST /api/v1/moderate
{
  "content": "Hello...",
  "source": "demo"
}
Response
{
  "action": "allow",
  "category_scores": {...}
}
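The round trip above can be sketched as a small client helper. This is an illustrative sketch, not the live client: the helper names (`build_request`, `may_publish`) are made up, and only the field names (`content`, `source`, `action`, `category_scores`) come from the panels above.

```python
import json

# Hypothetical helpers mirroring the demo's request/response shapes.

def build_request(content: str, source: str = "demo") -> dict:
    """Body for POST /api/v1/moderate."""
    return {"content": content, "source": source}

def may_publish(response: dict) -> bool:
    """True when the moderation verdict allows the content through."""
    return response.get("action") == "allow"

print(json.dumps(build_request("Hello...")))
print(may_publish({"action": "allow", "category_scores": {}}))  # -> True
```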

Policy Engine

Configurable Thresholds

Toxicity → Block 0.80
Hate → Block 0.70
Harassment → Warn 0.75
Profanity → Warn 0.90

Multi-Policy Support

Standard Community Guidelines
v1 • Global • Published
Active
Youth Safe Mode
v1 • Under 13 • Published
Active
Relaxed Forum Policy
v1 • US Forums • Draft
Draft
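The threshold table above can be read as a simple rule evaluation. A minimal sketch, assuming a rule encoding and an allow/warn/block severity ordering (the real engine is versioned and scoped per policy, as the cards above show):

```python
# Severity ordering is an assumption; thresholds are taken from the slide.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}

RULES = {
    "toxicity":   ("block", 0.80),
    "hate":       ("block", 0.70),
    "harassment": ("warn",  0.75),
    "profanity":  ("warn",  0.90),
}

def evaluate(scores: dict, rules: dict = RULES) -> str:
    """Return the most severe action whose threshold is met."""
    action = "allow"
    for category, (rule_action, threshold) in rules.items():
        if scores.get(category, 0.0) >= threshold:
            if SEVERITY[rule_action] > SEVERITY[action]:
                action = rule_action
    return action

print(evaluate({"toxicity": 0.92, "hate": 0.95}))  # -> block
print(evaluate({"harassment": 0.80}))              # -> warn
```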

Human-in-the-Loop

Review Queue Workflow

1
Content Escalated
ML confidence below threshold or edge case detected
2
Moderator Review
Human reviews content with ML recommendations
3
Decision with Rationale
Approve/Reject/Escalate with mandatory explanation
4
Evidence Recorded
Immutable audit trail with cryptographic hash
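Steps 1 and 3 of the workflow can be sketched in a few lines. Treating the 0.3-0.7 band (the same range the LLM second-pass uses elsewhere in this deck) as the escalation trigger is an assumption, as are the function names:

```python
def needs_human_review(top_score: float, low: float = 0.3, high: float = 0.7) -> bool:
    """Step 1: escalate when the model's strongest signal is ambiguous."""
    return low <= top_score <= high

def record_decision(decision: str, rationale: str) -> dict:
    """Step 3: moderator decisions require a non-empty explanation."""
    if decision not in {"approve", "reject", "escalate"}:
        raise ValueError(f"unknown decision: {decision}")
    if not rationale.strip():
        raise ValueError("rationale is mandatory")
    return {"decision": decision, "rationale": rationale}

print(needs_human_review(0.55))  # -> True (ambiguous, goes to the queue)
print(needs_human_review(0.95))  # -> False (model is confident)
```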

Moderator Actions

Compliance & Audit

Immutable Evidence Records

{
  "id": "e1000000-0000-...",
  "control_id": "MOD-001",
  "decision_id": "d0000000-...",
  "automated_action": "block",
  "category_scores": {
    "toxicity": 0.92,
    "hate": 0.95
  },
  "submission_hash": "sha256:a7f3b...",
  "immutable": true,
  "integrity_hash": "sha256:c9d2e..."
}

Audit Trail Features

  • Cryptographic hash chain
  • Tamper detection triggers
  • Full decision lineage
  • CSV/JSON export
  • Policy version tracking
  • Human review rationale captured per decision
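The hash-chain idea behind the evidence record above can be sketched as follows. The exact chaining scheme (what gets hashed, the genesis value) is an assumption; only the `sha256:` prefix convention is taken from the sample record:

```python
import hashlib
import json

def integrity_hash(record: dict, prev_hash: str) -> str:
    """Hash the record body together with the previous link's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return "sha256:" + hashlib.sha256(payload.encode()).hexdigest()

def verify_chain(records: list, genesis: str = "sha256:genesis") -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev = genesis
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "integrity_hash"}
        if rec["integrity_hash"] != integrity_hash(body, prev):
            return False  # tamper detected
        prev = rec["integrity_hash"]
    return True
```

Because each link covers the previous hash, editing any earlier record invalidates every record after it, which is what makes tampering detectable.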

Regulatory Compliance

Click any framework card to see the detailed controls mapped to that regulation. Each control includes its ID, description, criticality level, and the specific articles or clauses it addresses.

🇪🇺

EU AI Act

Art. 9, 13, 14, 15, 17

5 controls mapped
Click for details →
🏛️

NIST AI RMF

GOVERN, MAP, MEASURE, MANAGE

4 controls mapped
Click for details →
🌐

ISO 42001

Clause 6, 8, 9

3 controls mapped
Click for details →
🔒

GDPR

Art. 5, 22, 32

4 controls mapped
Click for details →

SOC 2

CC3, CC6, CC7

5 controls mapped
Click for details →
12 implemented controls with full traceability to regulatory requirements

Knowledge Graph

Drag nodes to explore the relationship graph. This visualization shows how services connect to controls, which map to compliance frameworks (EU AI Act, NIST, ISO, GDPR, SOC 2). The graph is powered by Neo4j Aura with 112 nodes and 138 relationships modeling the complete domain.

Integration Patterns

REST API

Direct HTTP integration with JSON payloads

POST /api/v1/moderate
Authorization: Bearer {api_key}

{"content": "...", "source": "web"}

Mobile SDK

Native iOS/Android with offline queue

CivitasSDK.moderate(text) { result ->
  when (result.action) {
    ALLOW -> publish()
    BLOCK -> reject()
  }
}

LLM Guardrails

Pre/post-processing for LLM outputs

llm_output = model.generate(prompt)
result = civitas.moderate(llm_output)
if result.action == "block":
    return SAFE_FALLBACK

Webhooks

Event-driven notifications

{
  "event": "moderation.decision",
  "action": "escalate",
  "decision_id": "..."
}

Cloud Deployment

Live URLs

Frontend: civitas.pages.dev
API: gateway-xxx.run.app
Database: Supabase (us-west-2)
Graph: Neo4j Aura
4
Microservices
<100ms
API Latency (p95)
99.9%
Uptime SLA

Security & Access Control

Enterprise authentication with zero-trust architecture

Authentication Flow

User Visits Protected Site
Access Gate Password Check
Magic Link Email Verification
Authenticated 24h Session

Cloudflare Workers

  • Edge-deployed authentication gate
  • KV storage for sessions & tokens
  • Global low-latency access

Security Controls

  • Password-gated access
  • Email verification via magic link
  • Secure HttpOnly cookies

Audit & Logging

  • Email access logging (90 days)
  • IP & country tracking
  • Admin access log endpoint

Seamless UX

  • One-click magic link auth
  • Auto-login after verification
  • Cross-domain session handling

Comprehensive Test Suite

Multi-layered testing strategy ensures reliability and compliance. E2E tests validate user workflows, CDD tests verify governance controls, and the CI/CD pipeline runs 36 automated tests on every commit.

36
Tests Passing
4
Test Suites
12
CDD Controls
<17s
Full Suite Runtime

E2E Tests (Playwright)

11 Audit Log: Evidence, Export, Search
8 Moderation Demo: Submit, ALLOW/BLOCK, Scores
7 Moderator Queue: Review, Approve, Override
10 Policy Management: Create, Edit, Thresholds

Compliance-Driven Development

MOD-001 Automated Classification
MOD-002 Real-Time Feedback
POL-001 Threshold Policy
GOV-002 Human Review
AUD-001 Immutable Evidence

GitHub Actions CI/CD Pipeline

Unit Tests • Integration Tests • CDD Tests • E2E Tests • Code Coverage • golangci-lint • Trivy Security Scan

Security Posture

Defense-in-depth across every layer

🔒

Authentication

  • ✓ Cloudflare Access Gate (magic link)
  • ✓ SHA-256 hashed API keys
  • ✓ RBAC (admin/moderator/viewer)
  • ✓ Constant-time token comparison
  • ✓ Internal service auth tokens
🛡

Infrastructure

  • ✓ Non-root Docker containers
  • ✓ Multi-stage builds (minimal surface)
  • ✓ go.sum supply chain verification
  • ✓ Secrets in env vars, never in code
  • ✓ Cloud Run scale-to-zero isolation
🧱

Application

  • ✓ HTTP server timeouts (Slowloris mitigation)
  • ✓ Bounded proxy response reads (10MB)
  • ✓ Scoped CORS (no wildcards)
  • ✓ Pagination limits on all endpoints
  • ✓ No error detail leaks in health checks
📜

Audit & Evidence

  • ✓ Immutable evidence records (AUD-001)
  • ✓ Full decision traceability with correlation IDs
  • ✓ Content hashing for integrity verification
  • ✓ Structured logging with zap (no PII)
A+
Security Audit Score
0 Critical • 0 High • 0 Medium
All findings resolved

Advanced Language Filtering

9 enhancements for multi-language, multi-provider resilience

Text Normalizer

NFKC normalization, zero-width stripping, homoglyph and leetspeak decoding to defeat Unicode evasion
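A toy version of the normalizer pipeline described above. The homoglyph and leetspeak tables here are tiny illustrative samples; production maps cover far more characters:

```python
import unicodedata

ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
HOMOGLYPHS = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})  # Cyrillic lookalikes
LEETSPEAK = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # fold width/compat variants
    text = text.translate(ZERO_WIDTH)           # strip zero-width characters
    text = text.translate(HOMOGLYPHS)           # fold lookalike letters
    return text.translate(LEETSPEAK).lower()

print(normalize("h\u200b\u0435ll\u043e w0rld"))  # -> hello world
```

Normalizing before classification matters because evasion (zero-width splits, Cyrillic substitutions, leetspeak) otherwise slips past token-based models.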

Language Detection

Automatic language identification via lingua-go with routing to language-aware classification providers

Ensemble Classification

Parallel multi-provider execution with agreement scoring and auto-escalation on disagreement
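Agreement scoring can be sketched as a spread check over the providers' scores. The 0.2 disagreement threshold below is an illustrative assumption, not the platform's actual value:

```python
from statistics import mean

def ensemble_score(provider_scores: list, max_spread: float = 0.2):
    """Return (combined score, escalate?) for one category."""
    spread = max(provider_scores) - min(provider_scores)
    return mean(provider_scores), spread > max_spread

score, escalate = ensemble_score([0.62, 0.58, 0.60])
print(round(score, 2), escalate)   # providers agree: no escalation
_, escalate = ensemble_score([0.15, 0.85])
print(escalate)                    # providers disagree: auto-escalate
```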

Score Calibration

Per-provider offset and scale normalization to a unified 0-1 scale with dynamic feedback-driven tuning
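A minimal sketch of that calibration step. The per-provider offset and scale values below are made-up examples; in the platform they would come from the feedback-driven tuning:

```python
CALIBRATION = {
    "provider_a": (0.05, 1.10),   # (offset, scale) -- illustrative values
    "provider_b": (-0.02, 0.95),
}

def calibrate(provider: str, raw_score: float) -> float:
    """Shift and rescale a raw score, clamped to the unified 0-1 range."""
    offset, scale = CALIBRATION.get(provider, (0.0, 1.0))
    return min(1.0, max(0.0, (raw_score - offset) * scale))

print(calibrate("provider_a", 0.50))  # shifted and rescaled
print(calibrate("provider_a", 1.00))  # clamped to 1.0
```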

LLM Second-Pass

Ambiguous scores (0.3-0.7) re-evaluated by Claude or GPT-4 with structured classification prompts

Context-Aware Policy

Threshold overrides based on request metadata (audience, platform) for context-sensitive moderation

User Trust Scoring

Rolling-window behavioral scoring adjusts policy thresholds per user based on moderation history

Feedback Loop

Human review outcomes feed back to provider calibration, continuously improving classification accuracy

Expanded Categories

Self-harm, spam, and PII detection added to the 6 existing moderation categories for 9 total

38 unit tests • 4 migrations • 13 new files • 10 modified files

Roadmap

Phase 1: Foundation

Core moderation, policy engine, review queue, audit trail

Complete: ML Classification • Policy Rules • Evidence Chain

Phase 2: Scale

Multi-language support, ensemble classification, LLM second-pass, feedback loop

Complete: i18n • Normalizer • Ensemble • Calibration • Trust Scoring

Phase 3: Enterprise

Multi-tenant, SSO, advanced analytics, SLA dashboard

Q4 2026: SAML/OIDC • Tenant Isolation • BI Integration

Get Started

Enterprise-grade content moderation, ready for production

Documentation

API reference, integration guides, and examples

Contact

proth1@gmail.com

License

MIT Open Source