AMBIC Data Platform

AMBIC Data Platform

Full-stack web platform for biomanufacturing data: ingestion, normalization, visualization, and sharing across research teams. Handles life-critical data with provenance tracking and granular access control.

Date2025-02
Tags
Data PlatformReactSupabaseBiomanufacturing

Context

Researchers at the Advanced Mammalian Biomanufacturing Innovation Center (AMBIC) work with process data in CSV, XLSX, and JSON from a range of instruments and equipment. Before this platform, data sharing between teams meant emailing spreadsheets, and there was no standardized schema across datasets.

Because this data feeds into pharmaceutical manufacturing decisions, integrity requirements are high. A normalization bug that silently drops rows or misaligns columns could lead to incorrect conclusions in a clinical context.

Architecture

Frontend is React + TypeScript with shadcn/ui and Recharts for interactive visualization. Backend runs on Supabase: PostgreSQL for structured data, Deno Edge Functions for file operations, object storage for raw uploads.

Stack

LayerDetail
FrontendReact 18, TypeScript, Vite (SWC), Tailwind CSS, shadcn/ui, Recharts
TablesTanStack Table + TanStack Virtual, 2D row + column virtualization for 3,000+ column datasets
BackendSupabase: PostgreSQL + Deno Edge Functions + object storage
Data parsingPapaparse (CSV), ExcelJS (XLSX), native JSON
StateReact Query (server), React Context (auth), URL params (navigation)
AuthSupabase Auth (email/password) + MVP bypass token (internal testing)
SecurityRow-level security on all tables; %/content/% path pattern for storage RLS
Edge FunctionsDeno-based file CRUD, dataset processing, sharing operations

Key Technical Challenges

  • Wide datasets. Some biomanufacturing datasets have 3,000+ columns. The table component uses 2D virtualization (both row and column) so only the visible viewport renders. Without this, initial load would freeze the browser.
  • Schema normalization with provenance. Researchers map source columns to a standardized target schema. Every normalized dataset maintains a link back to its source, preserving the full transformation history.
  • Hierarchical file management. Folder nesting, dataset linking, and three-tier access control (owner, editor, viewer). The Supabase RLS policies for nested storage paths required careful attention, particularly around the %/content/% path pattern matching.

Data Visualization

Interactive time series charts, data exploration dashboards, annotations, and dataset sharing with access control. The upload and normalization flow was designed to match how researchers already think about their experiments, not how the underlying database models the data.