RootIB Provenance Scanner — Barbrick Design

🔐 GitHub Authentication

Authenticate with GitHub via Device Flow to unlock authenticated API calls (60 → 5,000 req/hr), private repo scanning, and code search. Requires the Railway backend with ROOTIBAPP_CLIENT_ID / ROOTIBAPP_CLIENT_SECRET set in Railway Variables.

Checking OAuth status…

⚡ Live Lookup

Enter a GitHub username or owner/repo to scan for RootIB provenance markers. Results are powered by the GitHub API and Groq AI analysis — no local backend required.

⚙️ Backend URL

API calls are routed to githubBackend.html which acts as a virtual backend via a Service Worker (persists in the browser, no server required). If localhost:8000 is running it is used automatically instead. Override the URL below if you are running a real backend elsewhere.
💡 Tip: set window.__BACKEND_API_BASE__ = "https://your-backend.example.com" in the browser console to override globally at runtime.

Scanner API URL

🔎 Pending Match Queue

Matches identified by the backend scanner that await human approval before being written to the ledger. Powered by the Railway-hosted Scanner API. Each match can be Approved, Rejected, or Skipped.

🌐 Global Scan Status

Live counters from the autonomous backend worker. Powered by the Railway-hosted Scanner API. Override the URL in the ⚙️ Backend / Railway URL card above to point to a local instance if needed.

—

Public Repos

—

Files Indexed

—

Explicit Markers

—

Conceptual Connections

—

Pending Matches

—

Confirmed Links

—

Incremental Runs

—

Cursor

Last crawl: — · Last incremental: —

1

Unlock Controls

2

Start githubBackend.html Worker

3

Run Operations in Sequence

📋 Activity Log

📅 Origin Timeline

Enter a RootIB origin ID to see its full timeline of external matches.

RootIB Origin ID

ℹ️ How it works

The GitHub REST API fetches all public repos for the target user via paginated requests (per_page=100), with no artificial cap — all repos are scanned.
A regex scanner searches each file for explicit RootIB: <ID> patterns — these are counted as exact matches.
A concept matcher runs token n-gram Jaccard similarity (structural) and bag-of-words cosine similarity (semantic) comparisons against every origin description in the local ledger.
Found IDs and concept matches are cross-referenced against the local RootIB ledger (rootib-ledger.json) — confirmed ledger entries are highlighted in purple.
Groq AI (llama-3.3-70b-versatile) provides an intelligent provenance assessment of the findings.
The Groq API key is pre-embedded via the RootIB Scanner Setup GitHub Actions workflow using the GROK repo secret.

To enable AI analysis: add your Groq API key as GROK in Settings → Secrets → Actions, then run the RootIB Scanner Setup workflow. Free keys at console.groq.com/keys.

🔌 APIs Used

API	Endpoint	Purpose
GitHub REST	`api.github.com/users/{user}/repos`	List user's public repos
GitHub REST	`api.github.com/repos/{owner}/{repo}/git/trees/HEAD`	Fetch file tree recursively
GitHub Raw	`raw.githubusercontent.com/{owner}/{repo}/HEAD/{path}`	Fetch file contents
Groq AI	`api.groq.com/openai/v1/chat/completions`	AI provenance analysis
Local Ledger	`/js/rootib-ledger.json`	Cross-reference known RootIB origins
Scanner API	`POST /scan`	Ingest repos into the provenance DB
Scanner API	`POST /scan/global`	Page through all public GitHub repos globally
Scanner API	`POST /scan/incremental`	Re-scan repos changed since last crawl
Scanner API	`GET /scan/status`	Global scan counters for the UI dashboard
Scanner API	`POST /canonicalize`	Canonicalize ingested file contents
Scanner API	`POST /match`	Run matching engine — results go to pending queue
Scanner API	`GET /review/pending`	List pending matches awaiting human approval
Scanner API	`POST /review/pending/{id}`	Approve / reject / skip a pending match
Scanner API	`POST /valuation/recalculate`	Trigger auto-valuation for all origins
Scanner API	`GET /valuation/{origin_id}`	Retrieve current valuation for an origin

🏗️ Full Pipeline Overview

The RootIB Global Provenance System is a multi-stage pipeline that ingests GitHub repositories, canonicalizes their contents, finds matches against the RootIB origin ledger, routes findings through a human approval gate, and then automatically updates the ledger and recalculates valuations.

Repository Ingestion — POST /scan — Discovers all public repos for a GitHub user or org and fetches the full file tree + raw content via the GitHub API. Commit timestamps (first and last) are stored per file.
Canonicalization — POST /canonicalize — Each file is normalized (whitespace, comments stripped), hashed with SHA-256, and split into function/class-level artifacts. Language-aware extractors handle Python, JavaScript, TypeScript, and more.
Matching Engine — POST /match — Compares every artifact against every RootIB origin using three match types: exact (hash), structural (token n-gram Jaccard), and semantic (bag-of-words cosine). All matches are written to pending_provenance_links with status = pending.
Evidence Bundle — Every pending match carries an evidence bundle — a JSON object containing the file path, GitHub URL, match type, similarity score, origin description snippet, and structural diff — giving the human reviewer all context needed to make a decision.
Human Review Gate — GET /review/pending + POST /review/pending/{id} — A reviewer sees each pending match and chooses Approve, Reject, or Skip. No ledger entry is ever created without explicit human approval.
Ledger Update Automation — On approval the system: (a) moves the row from pending_provenance_links → provenance_links; (b) rebuilds the RootIB ledger JSON entry for that origin; (c) commits the updated js/rootib-ledger.json to GitHub via the Contents API; (d) fires a repository_dispatch event to trigger a Pages site rebuild. Ledger entries are immutable once confirmed.
Auto-Valuation Engine — POST /valuation/recalculate — Runs automatically after every approved merge. Computes nine weighted factors (originality, reuse count, temporal precedence, cross-repo propagation, developer contribution weight, chain position, and more) and stores a current_value ∈ [0, 100] per origin in rootib_values.

📋 Evidence Bundle

Each pending provenance match includes a structured evidence bundle so reviewers can make informed decisions without looking at the raw code.

Field	Description
`match_type`	exact / structural / semantic
`file_path`	Path within the external repo
`repo`	GitHub full repo name (`owner/name`)
`github_url`	Direct link to the file on GitHub
`unit_path`	Class / function extracted (or `file-level`)
`jaccard_score` / `cosine_score`	Similarity metric (structural / semantic matches)
`hash`	SHA-256 canonical hash (exact matches)
`origin_description_snippet`	First 300 chars of origin description for context

📈 Auto-Valuation Engine

After every approved ledger merge the engine recalculates a current_value score (0–100) for every RootIB origin using nine weighted factors:

Factor	Weight	Description
Originality score	15%	Fraction of canonical hashes not seen elsewhere
Downstream reuse count	20%	Number of confirmed provenance links
Structural similarity depth	10%	Average Jaccard score across structural matches
Semantic influence	5%	Average cosine similarity across semantic matches
Temporal precedence	15%	Fraction of external artifacts that appear after origin timestamp
Cross-repo propagation	15%	Number of distinct repos containing confirmed matches
Developer contribution weight	10%	Number of distinct repo owners who adopted the concept
Chain position	5%	root / branch / leaf based on concept-tag lineage depth
Impact amplification	5%	Total confirmed links weighted by inverse lineage depth

Results are stored in rootib_values and are queryable via GET /valuation/{origin_id}. Valuation is triggered automatically on every approved provenance link; it can also be triggered manually via POST /valuation/recalculate.

🔍 RootIB Provenance Scanner