This advanced ranking system combines 4 independent factors to rank search results:
- Semantic Relevance (RRF): Hybrid search combining vector similarity + keyword matching
- Connectivity: How well-connected the entity is in the knowledge graph
- Temporal Relevance: Whether the entity was active during the query time period
- Query Term Matching: How well the entity matches specific terms in the query
This multi-factor approach is especially powerful for complex queries like:
- "Who was the California Attorney General in 2020?" (temporal + specific role)
- "What position did Gavin Newsom hold in San Francisco?" (location + role matching)
The original search_episodes() function uses:
- Hybrid Search: Combines semantic (vector) and keyword search
- RRF (Reciprocal Rank Fusion): Merges results from both methods
- Returns top 5 results based on RRF scores only
The search_episodes_with_custom_ranking() function adds:
- Connection Count: Counts how many relationships each node has
- Weighted Scoring: Combines original RRF score with connection importance
- Configurable Weight: Adjustable parameter to control connection influence
The search_episodes_with_temporal_ranking() function adds:
- Automatic Year Detection: Extracts years from queries (e.g., "2020" from the query)
- Temporal Matching: Checks if entities were valid/active during the query time period
- Date Range Support: Handles term_start/term_end, valid_at/invalid_at timestamps
- Triple Weighting: Balances semantic relevance, connections, AND temporal match
The search_episodes_with_multi_factor_ranking() function provides:
- Query Term Extraction: Identifies important terms and proper nouns from the query
- Comprehensive Matching: Checks node name, summary, attributes, connected entities, and labels
- Weighted Term Scoring: Different weights for different match types (name > summary > attributes)
- Quadruple Weighting: Balances ALL four factors for optimal results
The new search_episodes_with_question_aware_ranking() function adds:
- Question Type Detection: Automatically detects WHO/WHERE/WHEN/WHAT/WHY/HOW questions
- Entity Type Intelligence: Adjusts weights based on question type (WHO → boost PER entities)
- Intent Understanding: Understands what kind of answer the user wants
- 5-Factor Ranking: Adds entity type matching as the 5th ranking factor
Connection ranking (search_episodes_with_custom_ranking):

final_score = (1 - connection_weight) × original_score + connection_weight × normalized_connections

Temporal ranking (search_episodes_with_temporal_ranking):

final_score = semantic_weight × original_score +
              connection_weight × normalized_connections +
              temporal_weight × temporal_relevance

where: semantic_weight = 1.0 - connection_weight - temporal_weight

Multi-factor ranking (search_episodes_with_multi_factor_ranking):

final_score = semantic_weight × original_score +
              connection_weight × normalized_connections +
              temporal_weight × temporal_relevance +
              query_match_weight × query_term_match

where: semantic_weight = 1.0 - connection_weight - temporal_weight - query_match_weight

Question-aware ranking (search_episodes_with_question_aware_ranking):

final_score = semantic_weight × original_score +
              connection_weight × normalized_connections +
              temporal_weight × temporal_relevance +
              query_match_weight × query_term_match +
              entity_type_weight × entity_type_match

where: semantic_weight = 1.0 - connection_weight - temporal_weight - query_match_weight - entity_type_weight
Entity Type Match:
- Automatically detects question type (WHO/WHERE/WHEN/etc.)
- Applies appropriate entity type weights
- Example: WHO questions boost PER entities 2.0x, penalize LOC entities 0.3x
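The question-type weighting described above can be sketched as a lookup table plus a small scoring helper. This is an illustrative sketch only: the label names (PER/LOC/ORG) and the exact multipliers are assumptions, not values taken from the implementation.

```python
# Sketch of question-type detection and entity-type weighting.
# Labels (PER/LOC/ORG) and multipliers are illustrative assumptions.
QUESTION_TYPE_WEIGHTS = {
    "who":   {"PER": 2.0, "ORG": 1.2, "LOC": 0.3},  # WHO -> boost people
    "where": {"LOC": 2.0, "ORG": 1.2, "PER": 0.5},  # WHERE -> boost places
}

def detect_question_type(query: str) -> str:
    """Return the leading question word, or '' if there is none."""
    words = query.strip().lower().split()
    first = words[0] if words else ""
    return first if first in ("who", "where", "when", "what", "why", "how") else ""

def entity_type_match(query: str, node_labels: list) -> float:
    """Score a node's labels against the detected question type (0 to 1)."""
    weights = QUESTION_TYPE_WEIGHTS.get(detect_question_type(query), {})
    if not weights:
        return 0.5  # neutral when the question type gives no signal
    best = max((weights.get(label, 1.0) for label in node_labels), default=1.0)
    return min(best / 2.0, 1.0)  # squash the multiplier into the 0-1 range
```

A WHO question scores a PER node at the ceiling (1.0) and a LOC node far below it, which is the intended boost/penalty effect.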
Where:
- original_score: RRF score from hybrid search (0 to 1)
- normalized_connections: Node's connections / max_connections (0 to 1)
- temporal_relevance: How well the node matches the query time period (0 to 1)
  - 1.0: Perfect temporal match (entity was active during query year)
  - 0.8: Likely match (entity started before query year, no end date)
  - 0.5: No temporal information or weak match
  - 0.3: Outside validity period
- query_term_match: How well the node matches specific query terms (0 to 1)
  - Checks: node name (weight 3.0), proper nouns (2.0), summary (1.5), connected entities (1.0), attributes (1.0), labels (0.5)
  - Normalized to 0-1 range
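Put together, the full formula is a single weighted sum where the semantic weight absorbs whatever the explicit factor weights leave over. A minimal sketch (the default weights here are arbitrary placeholders, not recommended values):

```python
def final_score(original_score, normalized_connections, temporal_relevance,
                query_term_match, entity_type_match,
                connection_weight=0.15, temporal_weight=0.25,
                query_match_weight=0.2, entity_type_weight=0.1):
    # Semantic weight is whatever the other four factors leave over,
    # so the weights always sum to 1.0.
    semantic_weight = (1.0 - connection_weight - temporal_weight
                       - query_match_weight - entity_type_weight)
    return (semantic_weight * original_score
            + connection_weight * normalized_connections
            + temporal_weight * temporal_relevance
            + query_match_weight * query_term_match
            + entity_type_weight * entity_type_match)
```

Because the weights sum to 1.0, a node that scores 1.0 on every factor gets a final score of exactly 1.0, which keeps scores comparable across weight configurations.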
Connection-Only:
- connection_weight=0.3: 70% RRF, 30% connections (recommended default)
Temporal-Aware:
- connection_weight=0.2, temporal_weight=0.3: 50% semantic, 20% connections, 30% temporal
- connection_weight=0.1, temporal_weight=0.5: 40% semantic, 10% connections, 50% temporal (strong temporal focus)
Multi-Factor (RECOMMENDED for complex queries):
- connection=0.15, temporal=0.25, query_match=0.25: 35% semantic, 15% connections, 25% temporal, 25% query match (balanced)
- connection=0.1, temporal=0.3, query_match=0.3: 30% semantic, 10% connections, 30% temporal, 30% query match (focus on matching)
- connection=0.2, temporal=0.2, query_match=0.3: 30% semantic, 20% connections, 20% temporal, 30% query match (connectivity + matching)
Queries Neo4j to get comprehensive connection information for a node:
MATCH (n {uuid: $uuid})-[r]-(connected)
RETURN
count(r) as connection_count,
collect(DISTINCT connected.name) as connected_entities,
collect(DISTINCT type(r)) as relationship_types

Returns a dictionary with:
- count: Number of connections
- entities: List of connected entity names
- relationship_types: List of relationship types
Simplified version that returns just the connection count.
Extracts temporal information from node properties and related episodes:
MATCH (n {uuid: $uuid})
OPTIONAL MATCH (n)-[:PART_OF]-(episode)
RETURN
n.created_at, n.valid_at, n.invalid_at,
collect(DISTINCT episode.valid_at) as episode_dates,
properties(n) as node_properties

Returns temporal data including:
- created_at, valid_at, invalid_at: Timestamps
- episode_dates: Related episode timestamps
- properties: All node properties (may contain date fields like term_start, term_end)
Calculates how well a node matches the query time period:
- Checks property fields: term_start, term_end, start_date, end_date, year, date
- Parses date ranges (e.g., "2011-2017")
- Checks validity timestamps (valid_at, invalid_at)
- Returns score 0.0 to 1.0 based on temporal match quality
Extracts important information from the query:
- Removes stop words (who, what, when, the, a, etc.)
- Identifies important terms (length > 2, not stop words)
- Extracts proper nouns (capitalized words)
- Auto-detects year mentions
- Returns dict with all extracted information
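The query-analysis steps above can be sketched with the standard library alone. The stop-word list here is a small illustrative subset, not the full list the implementation would use.

```python
import re

# Illustrative subset of stop words; a real list would be longer.
STOP_WORDS = {"who", "what", "when", "where", "why", "how",
              "the", "a", "an", "was", "is", "in", "of", "did"}

def analyze_query(query: str) -> dict:
    """Extract important terms, proper nouns, and a year from a query."""
    words = re.findall(r"[A-Za-z0-9]+", query)
    terms = [w.lower() for w in words
             if len(w) > 2 and w.lower() not in STOP_WORDS]
    proper_nouns = [w for w in words
                    if w[0].isupper() and w.lower() not in STOP_WORDS]
    years = [int(w) for w in words if re.fullmatch(r"(19|20)\d{2}", w)]
    return {"terms": terms, "proper_nouns": proper_nouns,
            "year": years[0] if years else None}
```

On "Who was the California Attorney General in 2020?" this yields the terms "california", "attorney", "general", the proper nouns California/Attorney/General, and the auto-detected year 2020.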
Calculates how well a node matches specific query terms:
- Node name match (weight 3.0): Exact or partial match in node name
- Proper noun match (weight 2.0): Matches capitalized terms from query
- Summary match (weight 1.5): Terms found in node summary
- Connected entities match (weight 1.0): Terms in connected entity names
- Attributes match (weight 1.0): Terms in node attributes
- Labels match (weight 0.5): Terms in node labels
- Returns normalized score 0.0 to 1.0
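The weighted term scoring can be sketched as a tally against each field, normalized by the maximum achievable score. The node field names (name, summary, attributes, connected_entities, labels) are assumptions chosen to match the descriptions above.

```python
def query_term_match(node: dict, terms: list, proper_nouns: list) -> float:
    """Score a node against query terms with per-field weights (0 to 1)."""
    name = node.get("name", "").lower()
    summary = node.get("summary", "").lower()
    attrs = " ".join(str(v) for v in node.get("attributes", {}).values()).lower()
    connected = " ".join(node.get("connected_entities", [])).lower()
    labels = " ".join(node.get("labels", [])).lower()

    score, max_score = 0.0, 0.0
    for term in terms:
        max_score += 3.0 + 1.5 + 1.0 + 1.0 + 0.5  # one shot at each field
        if term in name:      score += 3.0  # name match weighs most
        if term in summary:   score += 1.5
        if term in connected: score += 1.0
        if term in attrs:     score += 1.0
        if term in labels:    score += 0.5
    for noun in proper_nouns:
        max_score += 2.0
        if noun.lower() in name or noun.lower() in summary:
            score += 2.0  # proper nouns get their own weight
    return score / max_score if max_score else 0.0
```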
Main enhanced search function that:
- Performs initial hybrid search (gets top 20 candidates)
- Fetches connection counts for each node
- Normalizes connection scores
- Calculates weighted final scores
- Re-ranks and returns top 5 results
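The normalize-weight-rerank stage of that flow can be sketched as a pure function over already-fetched candidates (the dict field names here are illustrative, not the library's actual result shape):

```python
def rerank(candidates: list, connection_weight: float = 0.3, top_k: int = 5) -> list:
    """Re-rank candidates by blending RRF score with normalized connections.

    Each candidate: {"name": ..., "rrf_score": float, "connections": int}.
    """
    # Normalize connection counts against the best-connected candidate.
    max_conn = max((c["connections"] for c in candidates), default=1) or 1
    for c in candidates:
        normalized = c["connections"] / max_conn
        c["final_score"] = ((1 - connection_weight) * c["rrf_score"]
                            + connection_weight * normalized)
    return sorted(candidates, key=lambda c: c["final_score"], reverse=True)[:top_k]
```

With connection_weight=0.3 a well-connected node can overtake one with a higher raw RRF score, which is exactly the behavior the connection-based ranking is after.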
# Standard search (RRF only)
await search_episodes(graphiti, "Who was the California Attorney General in 2020?")
# Enhanced search with connection ranking
await search_episodes_with_custom_ranking(
graphiti,
"Who was the California Attorney General in 2020?",
connection_weight=0.3 # 30% weight on connections
)
# Temporal-aware search (for date queries)
await search_episodes_with_temporal_ranking(
graphiti,
"Who was the California Attorney General in 2020?",
connection_weight=0.2, # 20% weight on connections
temporal_weight=0.3, # 30% weight on temporal match
query_year=2020 # Optional: auto-detected if not provided
)
# Multi-factor search (RECOMMENDED for complex queries)
await search_episodes_with_multi_factor_ranking(
graphiti,
"Who was the California Attorney General in 2020?",
connection_weight=0.15, # 15% weight on connections
temporal_weight=0.25, # 25% weight on temporal match
query_match_weight=0.25, # 25% weight on query term matching
query_year=2020 # Optional: auto-detected if not provided
)
# Remaining 35% goes to semantic relevance (RRF)

Each result shows:
- Node UUID: Unique identifier
- Node Name: Entity name
- Content Summary: Brief description
- Node Labels: Entity types
- Total Connections: Number of relationships
- Connected To: Names of connected entities (up to 10 shown)
- Relationship Types: Types of relationships (e.g., "RELATED_TO", "WORKS_FOR")
- term_start / term_end: Start and end dates from node properties
- Other date fields: year, date, start_date, end_date, etc.
Shows detailed breakdown of each factor:
├─ Semantic (RRF): 0.8500 × 0.35 = 0.2975
├─ Connections: 0.7500 × 0.15 = 0.1125
├─ Temporal Match: 1.0000 × 0.25 = 0.2500
├─ Query Term Match: 0.9200 × 0.25 = 0.2300
└─ FINAL SCORE: 0.8900
Each line shows:
- Factor score (0-1) × weight = contribution to final score
- Additional metadata associated with the node
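The breakdown is just a dot product of factor scores and weights; reproducing the example above confirms the arithmetic:

```python
# Reproduce the example breakdown: factor score × weight = contribution.
factors = {"semantic": (0.85, 0.35), "connections": (0.75, 0.15),
           "temporal": (1.00, 0.25), "query_match": (0.92, 0.25)}
contributions = {name: score * weight for name, (score, weight) in factors.items()}
final = sum(contributions.values())
# 0.2975 + 0.1125 + 0.2500 + 0.2300 = 0.8900
```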
- Promotes Central Entities: Entities with many relationships (like "California", "Governor") rank higher
- Context Awareness: Well-connected nodes are often more important in the knowledge graph
- Flexible Tuning: Adjust connection_weight based on your use case
- Transparent Scoring: Shows all score components for debugging
- Explainable Results: Displays connected entities and relationship types, making it clear WHY a node ranks highly
- Relationship Insights: See what types of relationships exist (e.g., "WORKS_FOR", "LOCATED_IN")
Use standard search (RRF only) when:
- No specific time period mentioned
- Simple entity lookup
- Speed is critical (fewest database queries)
- Broad exploratory queries

Use connection-based ranking when:
- You want to find central/important entities
- Exploring entity relationships
- No temporal context needed
- Finding "hub" nodes in the graph

Use temporal-aware ranking when:
- Query mentions a specific year or date
- Historical queries (e.g., "Who was X in 2020?")
- Time-sensitive information needed
- Entities have term limits or validity periods

Use multi-factor ranking when:
- Complex queries with multiple constraints
- Queries with specific terms AND dates
- Need to balance multiple relevance signals
- Want accurate results without question-type detection

Use question-aware ranking when:
- Any WHO/WHERE/WHEN/WHAT/WHY/HOW question
- Want the system to understand user intent automatically
- Need entity-type aware results (WHO → people, WHERE → places)
- Maximum accuracy for natural language questions
- Production use with diverse query types
Tuning connection_weight:
- Factual Queries: Lower weight (0.2-0.3) - prioritize semantic match
- Entity Discovery: Higher weight (0.5-0.7) - find central entities
- Relationship Exploration: High weight (0.7-1.0) - find hub nodes

Tuning temporal_weight:
- Recent queries (last 5 years): temporal_weight=0.3-0.4
- Historical queries (10+ years ago): temporal_weight=0.4-0.5
- Precise date queries ("in 2020"): temporal_weight=0.5
- Vague temporal queries ("recently"): temporal_weight=0.2
Query: "Who was the California Attorney General in 2020?"
Standard Search (RRF only) might return:
- Kamala Harris (high semantic match, but was AG 2011-2017, NOT in 2020)
- California (keyword match)
- Attorney General (keyword match)
Enhanced Search (Connection-based) might return:
- California (moderate semantic + very high connections)
- Kamala Harris (high semantic + moderate connections, but wrong time period)
- Governor role (moderate semantic + high connections)
Temporal-Aware Search (Connection + Temporal) returns:
- Xavier Becerra (moderate semantic + moderate connections + PERFECT temporal match: AG 2017-2021)
- Kamala Harris (high semantic + moderate connections + LOW temporal: AG 2011-2017)
- California (moderate semantic + very high connections + neutral temporal)
The temporal-aware version correctly identifies Xavier Becerra as the AG in 2020: even though Kamala Harris has a stronger semantic match, the temporal relevance factor outweighs it.
[Rank 1]
Node UUID: abc-123-def
Node Name: Kamala Harris
📊 Connection Analysis:
Total Connections: 15
Connected To: California, Attorney General, San Francisco, District Attorney, Governor, Senate, ...
Relationship Types: WORKS_FOR, LOCATED_IN, HOLDS_POSITION, RELATED_TO
📈 Scoring Breakdown:
Original Score (RRF): 0.8500
Connection Score: 0.7500
Final Weighted Score: 0.8200
🏷️ Attributes:
position: Attorney General
state: California
term_start: 2011-01-03
term_end: 2017-01-03
---
This makes it immediately clear:
- Why the entity ranks highly (high RRF score + many connections)
- What it's connected to (California, Attorney General role, etc.)
- How it's connected (relationship types like WORKS_FOR, LOCATED_IN)