🔍 RootIB Provenance Scanner

Cross-reference any GitHub user or repo against the RootIB origin ledger. Discover where RootIB-stamped ideas appear — and when.

🔐 GitHub Authentication

Authenticate with GitHub via Device Flow to unlock authenticated API calls (60 → 5,000 req/hr), private repo scanning, and code search. Requires the Railway backend with ROOTIBAPP_CLIENT_ID / ROOTIBAPP_CLIENT_SECRET set in Railway Variables.

⚡ Live Lookup

Enter a GitHub username or owner/repo to scan for RootIB provenance markers. Results are powered by the GitHub API and Groq AI analysis — no local backend required.

⚙️ Backend URL

API calls are routed to githubBackend.html, which acts as a virtual backend via a Service Worker (it persists in the browser; no server is required). If localhost:8000 is running, it is used automatically instead. Override the URL below if you are running a real backend elsewhere.
💡 Tip: set window.__BACKEND_API_BASE__ = "https://your-backend.example.com" in the browser console to override globally at runtime.

🔎 Pending Match Queue

Matches identified by the backend scanner that await human approval before being written to the ledger. Powered by the Railway-hosted Scanner API. Each match can be Approved, Rejected, or Skipped.

🌐 Global Scan Status

Live counters from the autonomous backend worker. Powered by the Railway-hosted Scanner API. Override the URL in the ⚙️ Backend URL card above to point to a local instance if needed.

Public Repos
Files Indexed
Explicit Markers
Conceptual Connections
Pending Matches
Confirmed Links
Incremental Runs
Cursor
Last crawl · Last incremental

  1. Unlock Controls
  2. Start githubBackend.html Worker
  3. Run Operations in Sequence

📋 Activity Log

📅 Origin Timeline

Enter a RootIB origin ID to see its full timeline of external matches.

ℹ️ How it works

  1. The GitHub REST API fetches all public repos for the target user via paginated requests (per_page=100), with no artificial cap — all repos are scanned.
  2. A regex scanner searches each file for explicit RootIB: <ID> patterns — these are counted as exact matches.
  3. A concept matcher runs token n-gram Jaccard similarity (structural) and bag-of-words cosine similarity (semantic) comparisons against every origin description in the local ledger.
  4. Found IDs and concept matches are cross-referenced against the local RootIB ledger (rootib-ledger.json) — confirmed ledger entries are highlighted in purple.
  5. Groq AI (llama-3.3-70b-versatile) provides an intelligent provenance assessment of the findings.
  6. The Groq API key is pre-embedded via the RootIB Scanner Setup GitHub Actions workflow using the GROK repo secret.
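
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal Python illustration, not the scanner's actual code: the character set allowed in a marker ID, the tokenizer, the n-gram size of 3, and every function name here are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Assumed marker syntax: "RootIB:" followed by an ID made of letters,
# digits, "-" or "_". The real ID format may differ.
MARKER_RE = re.compile(r"RootIB:\s*([A-Za-z0-9_-]+)")


def find_markers(text: str) -> list[str]:
    """Return every explicit RootIB ID found in the text, in order."""
    return MARKER_RE.findall(text)


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word-like tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def ngrams(toks: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of contiguous token n-grams."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Structural similarity: |A ∩ B| / |A ∪ B| over token n-gram sets."""
    ga, gb = ngrams(tokens(a), n), ngrams(tokens(b), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def cosine(a: str, b: str) -> float:
    """Semantic similarity: cosine of the two bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0
```

In this sketch, `find_markers` yields the exact matches, while `jaccard` and `cosine` would be called once per (file, origin description) pair; any thresholds for reporting a concept match are left out because the text does not state them.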

To enable AI analysis: add your Groq API key as a repository secret named GROK in Settings → Secrets → Actions, then run the RootIB Scanner Setup workflow. Free keys are available at console.groq.com/keys.

🔌 APIs Used

API | Endpoint | Purpose
GitHub REST | api.github.com/users/{user}/repos | List user's public repos
GitHub REST | api.github.com/repos/{owner}/{repo}/git/trees/HEAD | Fetch file tree recursively
GitHub Raw | raw.githubusercontent.com/{owner}/{repo}/HEAD/{path} | Fetch file contents
Groq AI | api.groq.com/openai/v1/chat/completions | AI provenance analysis
Local Ledger | /js/rootib-ledger.json | Cross-reference known RootIB origins
Scanner API | POST /scan | Ingest repos into the provenance DB
Scanner API | POST /scan/global | Page through all public GitHub repos globally
Scanner API | POST /scan/incremental | Re-scan repos changed since last crawl
Scanner API | GET /scan/status | Global scan counters for the UI dashboard
Scanner API | POST /canonicalize | Canonicalize ingested file contents
Scanner API | POST /match | Run matching engine; results go to pending queue
Scanner API | GET /review/pending | List pending matches awaiting human approval
Scanner API | POST /review/pending/{id} | Approve / reject / skip a pending match
Scanner API | POST /valuation/recalculate | Trigger auto-valuation for all origins
Scanner API | GET /valuation/{origin_id} | Retrieve current valuation for an origin
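
As an illustration of how the first GitHub REST endpoint above might be paged with per_page=100 and no cap, here is a hedged Python sketch. The `fetch_json` callable is a hypothetical stand-in for whatever HTTP client is used (it takes a URL and returns the decoded JSON list); the function names are also assumptions. Only the `per_page` and `page` query parameters come from the GitHub REST API itself.

```python
from typing import Callable

PER_PAGE = 100  # page size named in the text


def repos_url(user: str, page: int) -> str:
    """Build the URL for one page of a user's public repos."""
    return (
        f"https://api.github.com/users/{user}/repos"
        f"?per_page={PER_PAGE}&page={page}"
    )


def list_all_repos(
    user: str, fetch_json: Callable[[str], list[dict]]
) -> list[dict]:
    """Request pages until a short (or empty) page signals the end."""
    repos: list[dict] = []
    page = 1
    while True:
        batch = fetch_json(repos_url(user, page))
        repos.extend(batch)
        if len(batch) < PER_PAGE:
            return repos
        page += 1
```

Injecting `fetch_json` keeps the loop independent of authentication and rate-limit handling, which would live in the real client.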

🏗️ Full Pipeline Overview

The RootIB Global Provenance System is a multi-stage pipeline that ingests GitHub repositories, canonicalizes their contents, finds matches against the RootIB origin ledger, routes findings through a human approval gate, and then automatically updates the ledger and recalculates valuations.

  1. Repository Ingestion (POST /scan): Discovers all public repos for a GitHub user or org and fetches the full file tree and raw content via the GitHub API. Commit timestamps (first and last) are stored per file.
  2. Canonicalization (POST /canonicalize): Each file is normalized (whitespace collapsed, comments stripped), hashed with SHA-256, and split into function/class-level artifacts. Language-aware extractors handle Python, JavaScript, TypeScript, and more.
  3. Matching Engine (POST /match): Compares every artifact against every RootIB origin using three match types: exact (hash), structural (token n-gram Jaccard), and semantic (bag-of-words cosine). All matches are written to pending_provenance_links with status = pending.
  4. Evidence Bundle: Every pending match carries an evidence bundle, a JSON object containing the file path, GitHub URL, match type, similarity score, origin description snippet, and structural diff, which gives the human reviewer all the context needed to make a decision.
  5. Human Review Gate (GET /review/pending + POST /review/pending/{id}): A reviewer sees each pending match and chooses Approve, Reject, or Skip. No ledger entry is ever created without explicit human approval.
  6. Ledger Update Automation: On approval the system (a) moves the row from pending_provenance_links to provenance_links; (b) rebuilds the RootIB ledger JSON entry for that origin; (c) commits the updated js/rootib-ledger.json to GitHub via the Contents API; (d) fires a repository_dispatch event to trigger a Pages site rebuild. Ledger entries are immutable once confirmed.
  7. Auto-Valuation Engine (POST /valuation/recalculate): Runs automatically after every approved merge. Computes nine weighted factors (originality, reuse count, temporal precedence, cross-repo propagation, developer contribution weight, chain position, and more) and stores a current_value ∈ [0, 100] per origin in rootib_values.
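
The canonicalize-then-hash idea in step 2 might look something like the sketch below. It is deliberately simplified and is not the pipeline's extractor: it only understands `#` line comments (an assumption suited to Python-like input), it collapses indentation along with other whitespace, and the function names are hypothetical.

```python
import hashlib


def canonicalize(source: str) -> str:
    """Strip '#' line comments, collapse whitespace, drop empty lines."""
    lines = []
    for line in source.splitlines():
        # Naive comment removal: this also cuts a '#' inside a string literal.
        line = line.split("#", 1)[0]
        # Collapse every run of whitespace (including indentation) to one space.
        line = " ".join(line.split())
        if line:
            lines.append(line)
    return "\n".join(lines)


def canonical_hash(source: str) -> str:
    """SHA-256 hex digest of the canonical form, usable for exact matching."""
    return hashlib.sha256(canonicalize(source).encode("utf-8")).hexdigest()
```

Two files that differ only in comments or spacing then produce the same digest, which is what allows an exact (hash) match to be found by simple equality.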

📋 Evidence Bundle

Each pending provenance match includes a structured evidence bundle so reviewers can make informed decisions without looking at the raw code.

Field | Description
match_type | exact / structural / semantic
file_path | Path within the external repo
repo | GitHub full repo name (owner/name)
github_url | Direct link to the file on GitHub
unit_path | Class / function extracted (or file-level)
jaccard_score / cosine_score | Similarity metric (structural / semantic matches)
hash | SHA-256 canonical hash (exact matches)
origin_description_snippet | First 300 chars of origin description for context
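
To make the fields above concrete, here is a hedged Python sketch of how such a bundle could be assembled and serialized. The field names come from the table; the `EvidenceBundle` class, the `make_bundle` helper, its parameters, and the `blob/HEAD` URL layout are assumptions for illustration and may not match the backend.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

MATCH_TYPES = {"exact", "structural", "semantic"}


@dataclass
class EvidenceBundle:
    match_type: str
    file_path: str
    repo: str
    github_url: str
    unit_path: str
    origin_description_snippet: str
    jaccard_score: Optional[float] = None  # structural matches only
    cosine_score: Optional[float] = None   # semantic matches only
    hash: Optional[str] = None             # exact matches only


def make_bundle(match_type: str, repo: str, file_path: str, unit_path: str,
                origin_description: str, *, score: Optional[float] = None,
                digest: Optional[str] = None) -> EvidenceBundle:
    """Fill in only the metric that applies to the given match type."""
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unknown match_type: {match_type!r}")
    return EvidenceBundle(
        match_type=match_type,
        file_path=file_path,
        repo=repo,
        # Assumed URL layout; the real link may pin a commit instead of HEAD.
        github_url=f"https://github.com/{repo}/blob/HEAD/{file_path}",
        unit_path=unit_path,
        origin_description_snippet=origin_description[:300],
        jaccard_score=score if match_type == "structural" else None,
        cosine_score=score if match_type == "semantic" else None,
        hash=digest if match_type == "exact" else None,
    )


def to_json(bundle: EvidenceBundle) -> str:
    """Serialize the bundle as a JSON object."""
    return json.dumps(asdict(bundle))
```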

📈 Auto-Valuation Engine

After every approved ledger merge the engine recalculates a current_value score (0–100) for every RootIB origin using nine weighted factors:

Factor | Weight | Description
Originality score | 15% | Fraction of canonical hashes not seen elsewhere
Downstream reuse count | 20% | Number of confirmed provenance links
Structural similarity depth | 10% | Average Jaccard score across structural matches
Semantic influence | 5% | Average cosine similarity across semantic matches
Temporal precedence | 15% | Fraction of external artifacts that appear after origin timestamp
Cross-repo propagation | 15% | Number of distinct repos containing confirmed matches
Developer contribution weight | 10% | Number of distinct repo owners who adopted the concept
Chain position | 5% | root / branch / leaf based on concept-tag lineage depth
Impact amplification | 5% | Total confirmed links weighted by inverse lineage depth

Results are stored in rootib_values and are queryable via GET /valuation/{origin_id}. Valuation is triggered automatically on every approved provenance link; it can also be triggered manually via POST /valuation/recalculate.
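
One plausible reading of the weighting above is a plain weighted sum scaled to 0–100, sketched below. The weights are taken from the table; everything else is an assumption, in particular that each factor has already been normalized to the range [0, 1] (the text does not say how raw counts such as link or repo totals are normalized), as are the dictionary keys and the function name.

```python
# Weights from the factor table; they sum to 1.0 (100%).
WEIGHTS = {
    "originality": 0.15,
    "downstream_reuse": 0.20,
    "structural_depth": 0.10,
    "semantic_influence": 0.05,
    "temporal_precedence": 0.15,
    "cross_repo_propagation": 0.15,
    "developer_contribution": 0.10,
    "chain_position": 0.05,
    "impact_amplification": 0.05,
}


def current_value(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores, scaled to the 0-100 range.

    Assumes each factor score is pre-normalized to [0, 1]; missing
    factors count as 0 and out-of-range scores are clamped.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = min(1.0, max(0.0, factors.get(name, 0.0)))
        total += weight * score
    return round(100.0 * total, 2)
```

For example, an origin whose only nonzero factor is a downstream reuse score of 0.5 would receive 0.20 × 0.5 × 100 = 10 under this reading.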