SCIBASE-AI · KoiosSG · May 28, 2026 · May 29, 2026
diff --git a/multilingual-entity-alias-guard/README.md b/multilingual-entity-alias-guard/README.md
@@ -0,0 +1,27 @@
+# Multilingual Entity Alias Guard
+
+This module adds a focused Scientific Knowledge Graph Integration slice for SCIBASE issue #17. It normalizes multilingual scientific mentions before they become graph nodes, entity-page aliases, or recommendation signals.
+
+The guard accepts a trusted translated alias only when numeric confidence evidence is present, and it preserves the original language tag on every mention. For lookup, it normalizes language-tag casing and underscore or hyphen regional separators, then falls back from a regional tag to its base language. It emits JSON-LD-style entity packets. It holds the following for curator review: homographs, false friends, same-language alias collisions, extractor-candidate/alias conflicts, malformed mention text, and mixed-script lookalikes in Latin-script languages, including lowercase Greek or Cyrillic confusables. Low-confidence or missing-confidence aliases are suppressed before recommendations are shown, and omitted or malformed localized names, mentions, or homograph policies are treated as sparse graph evidence instead of crashing corpus review.
+
+## Run
+
+```bash
+npm test
+npm run demo
+npm run video
+npm run check
+```
+
+## Outputs
+
+- `reports/alias-guard-packet.json`
+- `reports/sparse-alias-guard-packet.json`
+- `reports/candidate-alias-conflict-packet.json`
+- `reports/malformed-mention-text-packet.json`
+- `reports/malformed-alias-evidence-packet.json`
+- `reports/alias-guard-report.md`
+- `reports/summary.svg`
+- `reports/demo.mp4`
+
+All data is synthetic. The module does not call live ontologies, identity providers, external APIs, private corpora, search indexes, or recommendation systems.
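Reviewer note on the README's language-tag handling: the sketch below illustrates one way the described normalization (casing, underscore or hyphen separators) and regional-to-base fallback could work while the original tag is preserved. It is a minimal sketch, not the code in `index.js`; the helper names `normalizeLanguageTag`, `languageLookupChain`, and `lookupAlias`, and the shape of `aliasIndex`, are assumptions made for illustration.

```javascript
// Hypothetical helpers; names and data shapes are illustrative, not taken from index.js.

// Normalize a language tag for lookup only: trim, unify "_" to "-", lowercase.
function normalizeLanguageTag(tag) {
  if (typeof tag !== 'string') return null;
  const normalized = tag.trim().replace(/_/g, '-').toLowerCase();
  return normalized.length > 0 ? normalized : null;
}

// Lookup candidates from most to least specific, e.g. "es-mx" then "es".
function languageLookupChain(tag) {
  const normalized = normalizeLanguageTag(tag);
  if (normalized === null) return [];
  const base = normalized.split('-')[0];
  return base === normalized ? [normalized] : [normalized, base];
}

// Resolve an alias, reporting which language matched and keeping the original tag.
function lookupAlias(aliasIndex, text, originalTag) {
  const key = text.trim().toLowerCase();
  for (const language of languageLookupChain(originalTag)) {
    const entityId = aliasIndex[language]?.[key];
    if (entityId) {
      return { entityId, matchedLanguage: language, language: originalTag };
    }
  }
  return null;
}

// Assumed index shape: base language -> lowercased alias -> entity ID.
const aliasIndex = { es: { 'diabetes mellitus': 'entity:mesh:D003920' } };
const hit = lookupAlias(aliasIndex, 'Diabetes Mellitus', 'es_MX');
console.log(hit);
```

Keeping the normalized tag separate from the stored `language` field means the fallback never rewrites what the source document declared, which matches the README's claim that original tags are preserved.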
diff --git a/multilingual-entity-alias-guard/acceptance-notes.md b/multilingual-entity-alias-guard/acceptance-notes.md
@@ -0,0 +1,28 @@
+# Acceptance Notes
+
+This #17 slice focuses specifically on multilingual scientific alias quality before graph nodes and recommendations are produced.
+
+It is not:
+
+- a broad entity extractor or navigator
+- an ontology deprecation or synonym migration tool
+- a recommendation visibility or diversity guard
+- a geospatial, clinical trial, biological accession, software runtime, or temporal validity guard
+
+Validation coverage:
+
+- trusted CRISPR aliases in English, German, and Spanish map to one canonical MeSH entity
+- Spanish `control` is held as a homograph/false friend instead of silently creating a statistical control-group edge
+- same-language translated alias collisions are held instead of silently attaching a mention to the wrong canonical entity
+- extractor candidate IDs that disagree with multilingual alias lookup are held instead of silently overriding either signal
+- language-tag case differences do not suppress trusted translated aliases
+- regional language tags such as `es-MX` use base-language alias and homograph policy while preserving the original tag
+- underscore regional language tags such as `es_MX` use the same base-language alias and homograph policy while preserving the original tag
+- mixed-script Latin-language aliases such as Cyrillic-lookalike `CRISPR` text or lowercase Greek-alpha `CRISPR-Cαs9` text are held for curator review instead of becoming quiet unknowns
+- low-confidence French alias output is suppressed from recommendations
+- mentions with missing or non-numeric confidence evidence are suppressed before graph recommendations
+- sparse ontology/corpus exports with omitted localized names, mention lists, or homograph policies do not crash corpus review
+- malformed localized-name entries are omitted from alias lookup and JSON-LD alternate names, with alias evidence issues preserved for review
+- malformed mention text values are held for curator review instead of crashing alias normalization or reaching recommendation-safe IDs
+- localized names remain language-tagged on entity packets
+- audit output is deterministic and private-data free
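Reviewer note on the mixed-script bullet above: one possible way to detect Greek or Cyrillic confusables inside a mention tagged with a Latin-script language is to test Unicode script properties. This is a sketch under stated assumptions, not the module's implementation; the function name `isMixedScriptLookalike` and the `LATIN_SCRIPT_LANGUAGES` set are hypothetical.

```javascript
// Hypothetical sketch; the language set and function name are assumptions.
const LATIN_SCRIPT_LANGUAGES = new Set(['en', 'de', 'es', 'fr']);
const LATIN = /\p{Script=Latin}/u;
const CONFUSABLE_SCRIPTS = /[\p{Script=Greek}\p{Script=Cyrillic}]/u;

// True when text tagged with a Latin-script language mixes Latin letters
// with at least one Greek or Cyrillic character.
function isMixedScriptLookalike(text, language) {
  if (typeof text !== 'string' || typeof language !== 'string') return false;
  const base = language.replace(/_/g, '-').toLowerCase().split('-')[0];
  if (!LATIN_SCRIPT_LANGUAGES.has(base)) return false;
  return LATIN.test(text) && CONFUSABLE_SCRIPTS.test(text);
}

// \u03B1 is Greek small alpha; \u0421 is Cyrillic capital Es (looks like Latin C).
console.log(isMixedScriptLookalike('CRISPR-C\u03B1s9', 'en'));
console.log(isMixedScriptLookalike('\u0421RISPR', 'en-US'));
console.log(isMixedScriptLookalike('CRISPR-Cas9', 'en'));
```

This sketch only flags text that mixes scripts. Text written entirely in Greek or Cyrillic under a Latin-script tag is not flagged here and would need a separate rule.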
diff --git a/multilingual-entity-alias-guard/demo.js b/multilingual-entity-alias-guard/demo.js
@@ -0,0 +1,172 @@
+const fs = require('fs');
+const path = require('path');
+const { evaluateAliasGuard, buildSampleCorpus } = require('./index');
+
+const reportsDir = path.join(__dirname, 'reports');
+fs.mkdirSync(reportsDir, { recursive: true });
+
+const result = evaluateAliasGuard(buildSampleCorpus());
+const sparseResult = evaluateAliasGuard({ // sparse fixture: no localized names, mentions, or homograph policy
+  corpusId: 'kg-sparse-ontology-export-17',
+  generatedAt: '2026-05-30T12:00:00Z',
+  entities: [
+    {
+      id: 'entity:mesh:D012345',
+      canonicalName: 'Sparse Ontology Entity',
+      ontology: 'MeSH',
+      identifier: 'D012345'
+    }
+  ]
+});
+const conflictResult = evaluateAliasGuard({ // conflict fixture: extractor candidate disagrees with the alias lookup
+  ...buildSampleCorpus(),
+  corpusId: 'kg-candidate-alias-conflict-17',
+  generatedAt: '2026-05-30T12:30:00Z',
+  mentions: [
+    {
+      id: 'mention-diabetes-conflicting-candidate',
+      documentId: 'paper-17',
+      text: 'diabetes mellitus',
+      language: 'es',
+      confidence: 0.93,
+      candidateEntityId: 'entity:stat:control-group'
+    }
+  ]
+});
+const malformedMentionResult = evaluateAliasGuard({ // malformed fixture: mention text is an object, not a string
+  ...buildSampleCorpus(),
+  corpusId: 'kg-malformed-mention-text-17',
+  generatedAt: '2026-05-31T10:45:00Z',
+  mentions: [
+    {
+      id: 'mention-malformed-text',
+      documentId: 'paper-18',
+      text: { value: 'diabetes mellitus' },
+      language: 'es',
+      confidence: 0.94,
+      candidateEntityId: 'entity:mesh:D003920'
+    }
+  ]
+});
+const malformedAliasEvidenceResult = evaluateAliasGuard({ // malformed fixture: one localized-name entry is an object, not a string
+  ...buildSampleCorpus(),
+  corpusId: 'kg-malformed-localized-name-17',
+  generatedAt: '2026-05-31T10:46:00Z',
+  entities: [
+    {
+      id: 'entity:mesh:D003920',
+      canonicalName: 'Diabetes Mellitus',
+      ontology: 'MeSH',
+      identifier: 'D003920',
+      localizedNames: {
+        es: ['diabetes mellitus', { value: 'diabete mellitus' }]
+      }
+    }
+  ],
+  mentions: [
+    {
+      id: 'mention-diabetes-es',
+      documentId: 'paper-19',
+      text: 'diabetes mellitus',
+      language: 'es',
+      confidence: 0.94
+    }
+  ]
+});
+
+const packetPath = path.join(reportsDir, 'alias-guard-packet.json');
+const sparsePacketPath = path.join(reportsDir, 'sparse-alias-guard-packet.json');
+const conflictPacketPath = path.join(reportsDir, 'candidate-alias-conflict-packet.json');
+const malformedMentionPacketPath = path.join(reportsDir, 'malformed-mention-text-packet.json');
+const malformedAliasEvidencePacketPath = path.join(reportsDir, 'malformed-alias-evidence-packet.json');
+const reportPath = path.join(reportsDir, 'alias-guard-report.md');
+const svgPath = path.join(reportsDir, 'summary.svg');
+
+fs.writeFileSync(packetPath, `${JSON.stringify(result, null, 2)}\n`);
+fs.writeFileSync(sparsePacketPath, `${JSON.stringify(sparseResult, null, 2)}\n`);
+fs.writeFileSync(conflictPacketPath, `${JSON.stringify(conflictResult, null, 2)}\n`);
+fs.writeFileSync(malformedMentionPacketPath, `${JSON.stringify(malformedMentionResult, null, 2)}\n`);
+fs.writeFileSync(malformedAliasEvidencePacketPath, `${JSON.stringify(malformedAliasEvidenceResult, null, 2)}\n`);
+
+const accepted = result.mentionDecisions
+  .filter((decision) => decision.decision === 'accept-canonical-entity')
+  .map((decision) => `- ${decision.id}: ${decision.text} (${decision.language}) -> ${decision.candidateEntityId}`)
+  .join('\n');
+
+const held = result.curatorActions
+  .map((action) => `- ${action.id}: ${action.action} (${action.language}:${action.text})`)
+  .join('\n');
+
+const markdown = `# Multilingual Entity Alias Guard
+
+Corpus: ${result.corpusId}
+Generated: ${result.generatedAt}
+
+## Summary
+
+- Accepted mentions: ${result.summary.acceptedMentions}
+- Held curator-review mentions: ${result.summary.heldMentions}
+- Suppressed low-confidence mentions: ${result.summary.suppressedMentions}
+- Entity packets emitted: ${result.summary.entityPackets}
+- Audit digest: ${result.auditDigest}
+
+## Accepted Canonical Mappings
+
+${accepted}
+
+## Curator Actions
+
+${held}
+
+## Recommendation Guard
+
+Held or suppressed mentions are not allowed to drive entity-page recommendations until a curator verifies the alias mapping.
+
+## Sparse Corpus Guard
+
+Sparse ontology or corpus exports that omit localized names, mention lists, or homograph policy still produce deterministic graph review evidence. The sparse fixture emitted ${sparseResult.summary.entityPackets} entity packet and ${sparseResult.mentionDecisions.length} mention decisions.
+
+## Candidate Alias Conflict Guard
+
+Extractor candidates that disagree with trusted multilingual alias lookup are held for curator review instead of letting either signal silently override the other. The conflict fixture decision is ${conflictResult.mentionDecisions[0].decision} with reason ${conflictResult.mentionDecisions[0].reason}.
+
+## Malformed Mention Text Guard
+
+Malformed mention text values are held for curator review instead of crashing alias normalization. The malformed fixture decision is ${malformedMentionResult.mentionDecisions[0].decision} with reason ${malformedMentionResult.mentionDecisions[0].reason}, and it emits ${malformedMentionResult.curatorActions[0].action}.
+
+## Malformed Alias Evidence Guard
+
+Malformed localized-name evidence is omitted from alias lookup and JSON-LD alternate names instead of crashing ontology review. The malformed alias fixture records ${malformedAliasEvidenceResult.entityPackets[0].aliasEvidenceIssues.length} alias evidence issue with reason ${malformedAliasEvidenceResult.entityPackets[0].aliasEvidenceIssues[0].reason}.
+
+## Safety
+
+All fixtures are synthetic. The module does not call live ontologies, identity providers, external APIs, private corpora, search indexes, or recommendation systems.
+`;
+
+fs.writeFileSync(reportPath, markdown);
+
+const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="1280" height="720" viewBox="0 0 1280 720">
+  <rect width="1280" height="720" fill="#0c2130"/>
+  <rect x="54" y="58" width="1172" height="604" rx="18" fill="#142f42" stroke="#7bd88f" stroke-width="4"/>
+  <text x="96" y="136" fill="#ffffff" font-family="Arial, sans-serif" font-size="44" font-weight="700">Multilingual Entity Alias Guard</text>
+  <text x="96" y="210" fill="#d8f6df" font-family="Arial, sans-serif" font-size="28">Accepted canonical mentions: ${result.summary.acceptedMentions}</text>
+  <text x="96" y="260" fill="#d8f6df" font-family="Arial, sans-serif" font-size="28">Held curator-review mentions: ${result.summary.heldMentions}</text>
+  <text x="96" y="310" fill="#d8f6df" font-family="Arial, sans-serif" font-size="28">Suppressed low-confidence mentions: ${result.summary.suppressedMentions}</text>
+  <text x="96" y="380" fill="#ffffff" font-family="Arial, sans-serif" font-size="24">Languages preserved: en, de, es, fr</text>
+  <text x="96" y="430" fill="#ffffff" font-family="Arial, sans-serif" font-size="24">JSON-LD entity packets ready for schema.org-style pages</text>
+  <text x="96" y="510" fill="#ffd37a" font-family="Arial, sans-serif" font-size="26">Unsafe or malformed aliases are held before recommendations are shown.</text>
+  <text x="96" y="574" fill="#a6d7c3" font-family="Arial, sans-serif" font-size="18">${result.auditDigest}</text>
+</svg>
+`;
+
+fs.writeFileSync(svgPath, svg);
+
+console.log(`Wrote ${path.relative(__dirname, packetPath)}`);
+console.log(`Wrote ${path.relative(__dirname, sparsePacketPath)}`);
+console.log(`Wrote ${path.relative(__dirname, conflictPacketPath)}`);
+console.log(`Wrote ${path.relative(__dirname, malformedMentionPacketPath)}`);
+console.log(`Wrote ${path.relative(__dirname, malformedAliasEvidencePacketPath)}`);
+console.log(`Wrote ${path.relative(__dirname, reportPath)}`);
+console.log(`Wrote ${path.relative(__dirname, svgPath)}`);
+console.log(`Accepted mentions: ${result.summary.acceptedMentions}`);
+console.log(`Suppressed mentions: ${result.summary.suppressedMentions}`);
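Reviewer note on the suppressed-mention count logged above: the README says aliases with low, missing, or non-numeric confidence are suppressed before recommendations. A confidence gate along the following lines would produce that behavior. This is a minimal sketch only; the `MIN_CONFIDENCE` value, the `confidenceGate` name, and the decision and reason strings are assumptions, since the module's actual threshold and labels are not visible in this diff.

```javascript
// Hypothetical threshold; the module's real cutoff is not shown in this diff.
const MIN_CONFIDENCE = 0.8;

// Gate a mention on its confidence evidence before any alias decision is made.
// Decision and reason strings are illustrative placeholders.
function confidenceGate(mention) {
  const confidence = mention?.confidence;
  // Strings such as "0.93", NaN, Infinity, null, and undefined all count as missing evidence.
  if (typeof confidence !== 'number' || !Number.isFinite(confidence)) {
    return { decision: 'suppress', reason: 'missing-or-non-numeric-confidence' };
  }
  if (confidence < MIN_CONFIDENCE) {
    return { decision: 'suppress', reason: 'low-confidence' };
  }
  return { decision: 'continue', reason: 'confidence-ok' };
}

const samples = [
  { id: 'm1', confidence: 0.93 },
  { id: 'm2', confidence: 0.41 },
  { id: 'm3', confidence: '0.93' },
  { id: 'm4' }
];
for (const mention of samples) {
  console.log(mention.id, confidenceGate(mention));
}
```

Checking the type before comparing avoids JavaScript's implicit coercion, where a string such as `'0.93'` would otherwise pass a numeric comparison.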