Tech Companies SWE Online Assessment: Real Coding Problems
Ex-Apple engineer, LeetCode Grandmaster, helped 500+ candidates
Summary
Real coding problems from tech company software engineer online assessments, with solutions and optimization strategies.
2024 US New Graduate Software Engineer OA Coding Problems Collection
Comprehensive Guide for 2025 Fall Recruitment Preparation
Real Online Assessment Questions from Top Tech Companies
Table of Contents
• About This Collection
How to Use This Guide
OA Preparation Strategy
Common OA Patterns and Tips
• Problem 1: Maximum Subarray with K Distinct Elements
Problem 2: Robot Path Planning with Obstacles
Problem 3: String Transformation Minimum Steps
Problem 4: Binary Tree Level Order Traversal with Conditions
Problem 5: Graph Shortest Path with Dynamic Weights
• Problem 6: Social Network Friend Recommendations
Problem 7: Message Delivery Optimization
Problem 8: Content Moderation Algorithm
Problem 9: News Feed Ranking System
Problem 10: User Activity Pattern Analysis
• Problem 11: Package Delivery Route Optimization
Problem 12: Inventory Management System
Problem 13: Customer Review Sentiment Analysis
Problem 14: Warehouse Robot Navigation
Problem 15: Prime Delivery Time Calculation
• Problem 16: Excel Formula Parser
Problem 17: Teams Meeting Scheduler
Problem 18: OneDrive File Synchronization
Problem 19: Azure Resource Allocation
Problem 20: Office Document Version Control
• Problem 21: iOS App Memory Management
Problem 22: Music Playlist Optimization
Problem 23: Device Battery Life Prediction
Problem 24: Photo Library Organization
Problem 25: Siri Voice Command Processing
• Problem 26: Video Streaming Quality Optimization
Problem 27: Content Recommendation Engine
Problem 28: User Viewing Pattern Analysis
Problem 29: Subtitle Synchronization
Problem 30: Content Delivery Network Routing
• Problem 31: Ride Matching Algorithm
Problem 32: Dynamic Pricing Calculator
Problem 33: Driver Route Optimization
Problem 34: Surge Pricing Prediction
Problem 35: Food Delivery Time Estimation
• Problem 36: Professional Network Analysis
Problem 37: Job Recommendation System
Problem 38: Skill Endorsement Validation
Problem 39: Connection Suggestion Algorithm
Problem 40: Content Feed Personalization
• Problem 41: Property Search and Filtering
Problem 42: Booking Conflict Resolution
Problem 43: Host Rating System
Problem 44: Price Optimization Algorithm
Problem 45: Travel Itinerary Planning
• Problem 46: Two Sum Variations
Problem 47: Sliding Window Maximum
Problem 48: Merge Intervals Advanced
Problem 49: LRU Cache Implementation
Problem 50: Design Rate Limiter
• Algorithm Pattern Recognition
Time and Space Complexity Analysis
Common Data Structures Usage
Debugging and Testing Techniques
About This Collection
This comprehensive collection contains 50 real Online Assessment (OA) coding problems from major US tech companies, specifically targeting New Graduate Software Engineer positions for 2025 fall recruitment. All problems have been collected from actual OA experiences in 2024 and represent the current trends in technical screening.
How to Use This Guide
OA Preparation Strategy
Phase 1: Foundation Building (2-3 weeks)
Review fundamental data structures and algorithms
Practice basic problems from each category
Build coding speed and accuracy
Phase 2: Pattern Recognition (2-3 weeks)
Focus on identifying problem patterns
Practice medium-difficulty problems
Learn optimization techniques
Phase 3: Company-Specific Practice (1-2 weeks)
Target specific company problem styles
Practice under time constraints
Review and debug solutions efficiently
Common OA Patterns and Tips
Most Frequent Problem Types:
Key Success Factors:
Read problems carefully and identify edge cases
Start with brute force, then optimize
Write clean, readable code with proper variable names
Test with provided examples and edge cases
Manage time effectively across multiple problems
Problem 1: Maximum Subarray with K Distinct Elements
Difficulty: Medium | Time Limit: 45 minutes | Company: Google
Problem Statement:
Given an array of integers and an integer k, find the maximum sum of a subarray that contains exactly k distinct elements.
Example:
Plain Text
Input: nums = [1, 2, 1, 3, 4], k = 3
Output: 8
Explanation: Subarray [1, 3, 4] has exactly 3 distinct elements (1, 3, 4) and sum = 8
Constraints:
1 ≤ nums.length ≤ 10^5
-10^4 ≤ nums[i] ≤ 10^4
1 ≤ k ≤ nums.length
Solution Approach:
This problem combines the sliding window technique with a hash map for tracking distinct elements. The key insight is to use a two-pointer approach while maintaining a count of distinct elements.
Algorithm:
1. Move the right pointer forward, adding each element to a hash map of counts
2. While the window contains more than k distinct elements, shrink it from the left
3. Whenever the window contains exactly k distinct elements, update the maximum sum
Time Complexity: O(n)
Space Complexity: O(k)
Python
def maxSubarraySum(nums, k):
if not nums or k <= 0:
return 0
n = len(nums)
max_sum = float('-inf')
# Try all possible starting positions
for start in range(n):
element_count = {}
current_sum = 0
distinct_count = 0
# Expand window from current start position
for end in range(start, n):
# Add current element
if nums[end] not in element_count:
element_count[nums[end]] = 0
distinct_count += 1
element_count[nums[end]] += 1
current_sum += nums[end]
# If we have exactly k distinct elements
if distinct_count == k:
max_sum = max(max_sum, current_sum)
# If we have more than k distinct elements, break
elif distinct_count > k:
break
return max_sum if max_sum != float('-inf') else 0
# Optimized sliding window solution
def maxSubarraySumOptimized(nums, k):
if not nums or k <= 0:
return 0
n = len(nums)
max_sum = float('-inf')
left = 0
element_count = {}
current_sum = 0
for right in range(n):
# Add right element
if nums[right] not in element_count:
element_count[nums[right]] = 0
element_count[nums[right]] += 1
current_sum += nums[right]
# Shrink window if we have more than k distinct elements
while len(element_count) > k:
element_count[nums[left]] -= 1
if element_count[nums[left]] == 0:
del element_count[nums[left]]
current_sum -= nums[left]
left += 1
# Check if we have exactly k distinct elements
if len(element_count) == k:
max_sum = max(max_sum, current_sum)
return max_sum if max_sum != float('-inf') else 0
# Test cases
def test_maxSubarraySum():
# Test case 1
nums1 = [1, 2, 1, 3, 4]
k1 = 3
result1 = maxSubarraySumOptimized(nums1, k1)
print(f"Test 1: nums={nums1}, k={k1}, result={result1}")
# Test case 2
nums2 = [1, 1, 1, 1]
k2 = 1
result2 = maxSubarraySumOptimized(nums2, k2)
print(f"Test 2: nums={nums2}, k={k2}, result={result2}")
# Test case 3
nums3 = [1, 2, 3, 4, 5]
k3 = 2
result3 = maxSubarraySumOptimized(nums3, k3)
print(f"Test 3: nums={nums3}, k={k3}, result={result3}")
test_maxSubarraySum()
Key Insights:
• Use sliding window technique for efficient traversal
• Hash map tracks distinct elements and their frequencies
• Handle edge cases: empty array, k > array length
• Optimize by avoiding unnecessary recomputation
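A quick way to apply the "start with brute force, then optimize" advice from the preparation tips is to cross-check the optimized solution against the brute force on small random inputs before submitting. This is a minimal sketch that assumes maxSubarraySum and maxSubarraySumOptimized are defined as above; any mismatch flags a case worth tracing by hand (for example, arrays with negative values, where a plain widest-window scan can miss a better, narrower window).
Python
import random

def cross_check(trials=200):
    # Compare the O(n^2) brute force against the sliding-window version on small random inputs
    for _ in range(trials):
        nums = [random.randint(-5, 5) for _ in range(random.randint(1, 8))]
        k = random.randint(1, len(nums))
        expected = maxSubarraySum(nums, k)          # brute force, used as the reference
        actual = maxSubarraySumOptimized(nums, k)   # candidate solution under test
        if expected != actual:
            print(f"Mismatch: nums={nums}, k={k}, brute={expected}, optimized={actual}")
            return
    print("All random tests passed")

cross_check()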
Problem 2: Robot Path Planning with Obstacles
Difficulty: Medium | Time Limit: 45 minutes | Company: Google
Problem Statement:
A robot is located at the top-left corner of an m x n grid. The robot can only move either
down or right at any point in time. Some cells contain obstacles (marked as 1), and the
robot cannot pass through them. Find the number of possible unique paths from top-left to
bottom-right corner.
Example:
Plain Text
Input: grid = [
[0, 0, 0],
[0, 1, 0],
[0, 0, 0]
]
Output: 2
Explanation: There are 2 unique paths:
1. Right -> Right -> Down -> Down
2. Down -> Down -> Right -> Right
Constraints:
• 1 ≤ m, n ≤ 100
• grid[i][j] is 0 or 1
• grid[0][0] and grid[m-1][n-1] are always 0
Solution Approach:
This is a classic dynamic programming problem with obstacles. We need to count paths
while avoiding blocked cells.
Algorithm:
1. Create DP table where dp[i][j] represents number of ways to reach cell (i,j)
2. Initialize first row and column (considering obstacles)
3. For each cell, if it's not an obstacle: dp[i][j] = dp[i-1][j] + dp[i][j-1]
4. Return dp[m-1][n-1]
Time Complexity: O(mn)
Space Complexity: O(mn), can be optimized to O(n)
Python
def uniquePathsWithObstacles(obstacleGrid):
if not obstacleGrid or not obstacleGrid[0] or obstacleGrid[0][0] == 1:
return 0
m, n = len(obstacleGrid), len(obstacleGrid[0])
# Create DP table
dp = [[0] * n for _ in range(m)]
# Initialize starting point
dp[0][0] = 1
# Initialize first row
for j in range(1, n):
if obstacleGrid[0][j] == 0:
dp[0][j] = dp[0][j-1]
else:
dp[0][j] = 0
# Initialize first column
for i in range(1, m):
if obstacleGrid[i][0] == 0:
dp[i][0] = dp[i-1][0]
else:
dp[i][0] = 0
# Fill the DP table
for i in range(1, m):
for j in range(1, n):
if obstacleGrid[i][j] == 0:
dp[i][j] = dp[i-1][j] + dp[i][j-1]
else:
dp[i][j] = 0
return dp[m-1][n-1]
# Space-optimized version
def uniquePathsWithObstaclesOptimized(obstacleGrid):
if not obstacleGrid or not obstacleGrid[0] or obstacleGrid[0][0] == 1:
return 0
m, n = len(obstacleGrid), len(obstacleGrid[0])
# Use only one row for DP
dp = [0] * n
dp[0] = 1
for i in range(m):
for j in range(n):
if obstacleGrid[i][j] == 1:
dp[j] = 0
elif j > 0:
dp[j] += dp[j-1]
return dp[n-1]
# Test cases
def test_uniquePaths():
# Test case 1
grid1 = [
[0, 0, 0],
[0, 1, 0],
[0, 0, 0]
]
result1 = uniquePathsWithObstaclesOptimized(grid1)
print(f"Test 1: grid={grid1}, result={result1}")
# Test case 2
grid2 = [
[0, 1],
[0, 0]
]
result2 = uniquePathsWithObstaclesOptimized(grid2)
print(f"Test 2: grid={grid2}, result={result2}")
test_uniquePaths()
Key Insights:
• Dynamic programming builds solution incrementally
• Handle obstacles by setting path count to 0
• Space optimization possible using 1D array
• Edge cases: starting/ending point blocked, single cell grid
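The edge cases in the last insight are easy to pin down with a few assertions. A minimal sketch, assuming the uniquePathsWithObstaclesOptimized function defined above:
Python
# Quick edge-case checks for the space-optimized solution
assert uniquePathsWithObstaclesOptimized([[0]]) == 1                        # single free cell
assert uniquePathsWithObstaclesOptimized([[0, 0], [0, 1]]) == 0             # ending cell blocked
assert uniquePathsWithObstaclesOptimized([[1]]) == 0                        # starting cell blocked (defensive check)
assert uniquePathsWithObstaclesOptimized([[0] * 3 for _ in range(3)]) == 6  # 3x3 grid, no obstacles
print("Edge-case checks passed")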
Problem 3: String Transformation Minimum Steps
Difficulty: Hard | Time Limit: 60 minutes | Company: Google
Problem Statement:
Given two strings source and target, find the minimum number of operations to transform
source into target. You can perform the following operations:
1. Insert a character
2. Delete a character
3. Replace a character
Example:
Plain Text
Input: source = "horse", target = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Constraints:
• 0 ≤ source.length, target.length ≤ 500
• source and target consist of lowercase English letters
Solution Approach:
This is the classic Edit Distance problem solved using dynamic programming. We build a 2D
table where dp[i][j] represents the minimum operations to transform source[0:i] to
target[0:j].
Algorithm:
1. Create DP table of size (m+1) x (n+1)
2. Initialize base cases: transforming empty string
3. For each cell, consider three operations and take minimum
4. Return dp[m][n]
Time Complexity: O(mn)
Space Complexity: O(mn)
Python
def minDistance(source, target):
m, n = len(source), len(target)
# Create DP table
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Initialize base cases
# Transform empty string to target[0:j] requires j insertions
for j in range(n + 1):
dp[0][j] = j
# Transform source[0:i] to empty string requires i deletions
for i in range(m + 1):
dp[i][0] = i
# Fill the DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if source[i-1] == target[j-1]:
# Characters match, no operation needed
dp[i][j] = dp[i-1][j-1]
else:
# Take minimum of three operations
dp[i][j] = 1 + min(
dp[i-1][j], # Delete from source
dp[i][j-1], # Insert into source
dp[i-1][j-1] # Replace in source
)
return dp[m][n]
# Space-optimized version using only two rows
def minDistanceOptimized(source, target):
m, n = len(source), len(target)
# Use only two rows
prev = list(range(n + 1))
curr = [0] * (n + 1)
for i in range(1, m + 1):
curr[0] = i
for j in range(1, n + 1):
if source[i-1] == target[j-1]:
curr[j] = prev[j-1]
else:
curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1])
# Swap rows
prev, curr = curr, prev
return prev[n]
# Function to trace back the actual operations
def minDistanceWithOperations(source, target):
m, n = len(source), len(target)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Initialize base cases
for j in range(n + 1):
dp[0][j] = j
for i in range(m + 1):
dp[i][0] = i
# Fill DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if source[i-1] == target[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
# Trace back operations
operations = []
i, j = m, n
while i > 0 or j > 0:
if i > 0 and j > 0 and source[i-1] == target[j-1]:
i -= 1
j -= 1
elif i > 0 and j > 0 and dp[i][j] == dp[i-1][j-1] + 1:
operations.append(f"Replace '{source[i-1]}' with '{target[j-1]}'
at position {i-1}")
i -= 1
j -= 1
elif i > 0 and dp[i][j] == dp[i-1][j] + 1:
operations.append(f"Delete '{source[i-1]}' at position {i-1}")
i -= 1
elif j > 0 and dp[i][j] == dp[i][j-1] + 1:
operations.append(f"Insert '{target[j-1]}' at position {i}")
j -= 1
operations.reverse()
return dp[m][n], operations
# Test cases
def test_minDistance():
# Test case 1
source1, target1 = "horse", "ros"
result1, ops1 = minDistanceWithOperations(source1, target1)
print(f"Test 1: '{source1}' -> '{target1}', steps={result1}")
for op in ops1:
print(f" {op}")
# Test case 2
source2, target2 = "intention", "execution"
result2 = minDistanceOptimized(source2, target2)
print(f"Test 2: '{source2}' -> '{target2}', steps={result2}")
test_minDistance()
Key Insights:
• Classic DP problem with clear recurrence relation
• Three operations correspond to three DP transitions
• Space can be optimized to O(min(m,n)) (see the sketch below)
• Can trace back actual operations for debugging
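To actually reach the O(min(m, n)) space mentioned above, note that edit distance is symmetric when insert, delete, and replace all cost 1, so the rolling-row DP can always run over the shorter string. A minimal sketch (the helper name minDistanceMinSpace is ours, not part of the original problem):
Python
def minDistanceMinSpace(source, target):
    # Run the rolling-row DP over the shorter string for O(min(m, n)) extra space
    if len(target) > len(source):
        source, target = target, source  # edit distance is symmetric under unit costs
    n = len(target)
    prev = list(range(n + 1))  # row for the empty prefix of source
    for i in range(1, len(source) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if source[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[n]

print(minDistanceMinSpace("horse", "ros"))  # 3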
Problem 4: Binary Tree Level Order Traversal with Conditions
Difficulty: Medium | Time Limit: 45 minutes | Company: Google
Problem Statement:
Given a binary tree, return the level order traversal of its nodes' values with the following
conditions:
1. For even levels (0, 2, 4...), traverse from left to right
2. For odd levels (1, 3, 5...), traverse from right to left
3. Only include nodes whose values are greater than the average of their level
4. Return the result as a list of lists, where each inner list contains the values of nodes at
that level
Example:
Plain Text
Input:
    3
   / \
  9  20
     / \
    15  7
Output: [[20], [15]]
Explanation:
Level 0: [3] -> [] (even level; the level average is 3.0, and 3 is not greater than 3.0, so it is filtered out)
Level 1: [20, 9] -> [20] (odd level, right to left; average 14.5, only 20 > 14.5)
Level 2: [15, 7] -> [15] (even level, left to right; average 11.0, only 15 > 11.0)
Solution Approach:
This problem combines level-order traversal with conditional filtering and directional
processing.
Python
from collections import deque
from typing import List, Optional
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
def zigzagLevelOrderWithConditions(root: Optional[TreeNode]) -> List[List[int]]:
if not root:
return []
result = []
queue = deque([root])
level = 0
while queue:
level_size = len(queue)
level_nodes = []
level_values = []
# Collect all nodes and values at current level
for _ in range(level_size):
node = queue.popleft()
level_nodes.append(node)
level_values.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
# Calculate average for this level
level_average = sum(level_values) / len(level_values)
# Filter nodes greater than average
filtered_values = [val for val in level_values if val >
level_average]
if filtered_values:
# Apply direction based on level
if level % 2 == 0:
# Even level: left to right (already in correct order)
result.append(filtered_values)
else:
# Odd level: right to left
result.append(filtered_values[::-1])
level += 1
return result
# Enhanced version with more detailed tracking
def zigzagLevelOrderEnhanced(root: Optional[TreeNode]) -> dict:
if not root:
return {"result": [], "level_stats": []}
result = []
level_stats = []
queue = deque([root])
level = 0
while queue:
level_size = len(queue)
level_nodes = []
level_values = []
# Collect all nodes at current level
for _ in range(level_size):
node = queue.popleft()
level_nodes.append(node)
level_values.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
# Calculate statistics
level_average = sum(level_values) / len(level_values)
level_max = max(level_values)
level_min = min(level_values)
# Filter nodes greater than average
filtered_values = [val for val in level_values if val >
level_average]
# Store level statistics
level_stats.append({
"level": level,
"direction": "left_to_right" if level % 2 == 0 else
"right_to_left",
"all_values": level_values,
"average": round(level_average, 2),
"min": level_min,
"max": level_max,
"filtered_count": len(filtered_values),
"total_count": len(level_values)
})
if filtered_values:
# Apply direction based on level
if level % 2 == 0:
result.append(filtered_values)
else:
result.append(filtered_values[::-1])
level += 1
return {
"result": result,
"level_stats": level_stats
}
# Optimized version for large trees
def zigzagLevelOrderOptimized(root: Optional[TreeNode]) -> List[List[int]]:
if not root:
return []
result = []
current_level = [root]
level = 0
while current_level:
next_level = []
level_values = []
# Process current level
for node in current_level:
level_values.append(node.val)
if node.left:
next_level.append(node.left)
if node.right:
next_level.append(node.right)
# Calculate average and filter
if level_values:
level_average = sum(level_values) / len(level_values)
filtered_values = [val for val in level_values if val >
level_average]
if filtered_values:
if level % 2 == 1: # Odd level: reverse
filtered_values.reverse()
result.append(filtered_values)
current_level = next_level
level += 1
return result
# Test cases
def test_zigzag_traversal():
# Create test tree:
# 3
# / \
# 9 20
# / \
# 15 7
root = TreeNode(3)
root.left = TreeNode(9)
root.right = TreeNode(20)
root.right.left = TreeNode(15)
root.right.right = TreeNode(7)
print("Testing Zigzag Level Order Traversal with Conditions:")
# Test basic version
result1 = zigzagLevelOrderWithConditions(root)
print(f"Basic result: {result1}")
# Test enhanced version
result2 = zigzagLevelOrderEnhanced(root)
print(f"Enhanced result: {result2['result']}")
print("Level statistics:")
for stat in result2['level_stats']:
print(f" Level {stat['level']}: {stat}")
# Test optimized version
result3 = zigzagLevelOrderOptimized(root)
print(f"Optimized result: {result3}")
test_zigzag_traversal()
Key Insights:
• Combine BFS traversal with conditional filtering
• Calculate level average before filtering nodes
• Handle direction change based on level parity
• Consider memory optimization for large trees
Problem 5: Graph Shortest Path with Dynamic Weights
Difficulty: Hard | Time Limit: 60 minutes | Company: Google
Problem Statement:
Given a weighted directed graph where edge weights can change over time, find the
shortest path from source to destination. The graph has time-dependent weights that follow
a pattern:
• Weight at time t = base_weight + amplitude * sin(2π * t / period)
• You can wait at any node (cost = 1 per time unit)
• Find the minimum cost path considering both travel time and waiting time
Example:
Plain Text
Input:
edges = [
{"from": 0, "to": 1, "base_weight": 10, "amplitude": 5, "period": 12},
{"from": 0, "to": 2, "base_weight": 15, "amplitude": 3, "period": 8},
{"from": 1, "to": 3, "base_weight": 8, "amplitude": 2, "period": 6},
{"from": 2, "to": 3, "base_weight": 12, "amplitude": 4, "period": 10}
]
source = 0, destination = 3, start_time = 0
Output: {"path": [0, 1, 3], "total_cost": 16.5, "arrival_time": 18}
Solution Approach:
This problem requires a modified Dijkstra's algorithm that considers time-dependent
weights and waiting strategies.
Python
import heapq
import math
from typing import List, Dict, Tuple, Optional
class TimeEdge:
def __init__(self, to: int, base_weight: float, amplitude: float, period:
float):
self.to = to
self.base_weight = base_weight
self.amplitude = amplitude
self.period = period
def get_weight_at_time(self, time: float) -> float:
"""Calculate edge weight at given time"""
if self.period == 0:
return self.base_weight
# Weight = base + amplitude * sin(2π * t / period)
weight = self.base_weight + self.amplitude * math.sin(2 * math.pi *
time / self.period)
return max(0.1, weight) # Ensure positive weight
def find_optimal_departure_time(self, arrival_time: float, max_wait:
float = 24) -> Tuple[float, float]:
"""Find optimal time to depart to minimize total cost"""
best_departure = arrival_time
best_cost = float('inf')
# Sample departure times within waiting period
for wait_time in range(int(max_wait) + 1):
departure_time = arrival_time + wait_time
travel_cost = self.get_weight_at_time(departure_time)
total_cost = wait_time + travel_cost
if total_cost < best_cost:
best_cost = total_cost
best_departure = departure_time
return best_departure, best_cost
class DynamicWeightGraph:
def __init__(self, num_nodes: int):
self.num_nodes = num_nodes
self.graph = [[] for _ in range(num_nodes)]
def add_edge(self, from_node: int, to_node: int, base_weight: float,
amplitude: float, period: float):
"""Add time-dependent edge to graph"""
edge = TimeEdge(to_node, base_weight, amplitude, period)
self.graph[from_node].append(edge)
def dijkstra_time_dependent(self, source: int, destination: int,
start_time: float = 0) -> Dict:
"""Modified Dijkstra for time-dependent weights"""
# Priority queue: (total_cost, current_time, node, path)
pq = [(0, start_time, source, [source])]
# Best known cost to reach each node at each time
# Using discretized time for practical implementation
time_resolution = 0.5 # Check every 0.5 time units
max_time = 100 # Maximum time horizon
visited = set()
best_costs = {}
while pq:
current_cost, current_time, node, path = heapq.heappop(pq)
# Create state key (node, discretized_time)
time_key = int(current_time / time_resolution)
state = (node, time_key)
if state in visited:
continue
visited.add(state)
# Check if we reached destination
if node == destination:
return {
"path": path,
"total_cost": current_cost,
"arrival_time": current_time,
"success": True
}
# Explore neighbors
for edge in self.graph[node]:
if current_time > max_time:
continue
# Find optimal departure time
departure_time, travel_cost = edge.find_optimal_departure_time(current_time)
arrival_time = departure_time + edge.get_weight_at_time(departure_time)
total_cost = current_cost + travel_cost
new_path = path + [edge.to]
heapq.heappush(pq, (total_cost, arrival_time, edge.to,
new_path))
return {"success": False, "message": "No path found"}
def find_shortest_path_advanced(self, source: int, destination: int,
start_time: float = 0,
max_wait_per_node: float = 10) -> Dict:
"""Advanced pathfinding with sophisticated waiting strategies"""
# State: (cost, time, node, path, wait_history)
pq = [(0, start_time, source, [source], [])]
# Track best arrival time and cost for each node
best_arrival = {}
visited_states = set()
while pq:
current_cost, current_time, node, path, wait_history = heapq.heappop(pq)
# State representation for cycle detection
state_key = (node, round(current_time, 1))
if state_key in visited_states:
continue
visited_states.add(state_key)
# Update best arrival if better
if node not in best_arrival or current_cost < best_arrival[node][0]:
best_arrival[node] = (current_cost, current_time, path)
# Check if destination reached
if node == destination:
return {
"path": path,
"total_cost": round(current_cost, 2),
"arrival_time": round(current_time, 2),
"wait_history": wait_history,
"success": True
}
# Explore all outgoing edges
for edge in self.graph[node]:
# Try different waiting strategies
for wait_time in [0, 1, 2, 3, 5, max_wait_per_node]:
if wait_time > max_wait_per_node:
break
departure_time = current_time + wait_time
travel_weight = edge.get_weight_at_time(departure_time)
arrival_time = departure_time + travel_weight
total_cost = current_cost + wait_time + travel_weight
# Pruning: skip if too expensive
if total_cost > current_cost * 3: # Heuristic pruning
continue
new_path = path + [edge.to]
new_wait_history = wait_history + [(node, wait_time)] if wait_time > 0 else wait_history
heapq.heappush(pq, (total_cost, arrival_time, edge.to,
new_path, new_wait_history))
return {"success": False, "message": "No path found within
constraints"}
def analyze_weight_patterns(self, from_node: int, to_node: int,
time_range: Tuple[float, float]) -> Dict:
"""Analyze weight patterns for an edge over time"""
start_time, end_time = time_range
# Find the edge
target_edge = None
for edge in self.graph[from_node]:
if edge.to == to_node:
target_edge = edge
break
if not target_edge:
return {"error": "Edge not found"}
# Sample weights over time range
time_points = []
weights = []
for t in range(int(start_time), int(end_time) + 1):
time_points.append(t)
weights.append(target_edge.get_weight_at_time(t))
# Find optimal departure times
min_weight = min(weights)
max_weight = max(weights)
optimal_times = [t for t, w in zip(time_points, weights) if abs(w -
min_weight) < 0.1]
return {
"time_points": time_points,
"weights": weights,
"min_weight": round(min_weight, 2),
"max_weight": round(max_weight, 2),
"optimal_departure_times": optimal_times,
"weight_variance": round(max_weight - min_weight, 2)
}
# Test the dynamic weight graph
def test_dynamic_weight_graph():
# Create graph with 4 nodes
graph = DynamicWeightGraph(4)
# Add time-dependent edges
graph.add_edge(0, 1, 10, 5, 12) # Weight varies between 5-15
graph.add_edge(0, 2, 15, 3, 8) # Weight varies between 12-18
graph.add_edge(1, 3, 8, 2, 6) # Weight varies between 6-10
graph.add_edge(2, 3, 12, 4, 10) # Weight varies between 8-16
print("Testing Dynamic Weight Graph:")
# Test basic pathfinding
result1 = graph.dijkstra_time_dependent(0, 3, 0)
print(f"Basic pathfinding: {result1}")
# Test advanced pathfinding
result2 = graph.find_shortest_path_advanced(0, 3, 0)
print(f"Advanced pathfinding: {result2}")
# Analyze weight patterns
analysis = graph.analyze_weight_patterns(0, 1, (0, 24))
print(f"Weight pattern analysis for edge 0->1:")
print(f" Min weight: {analysis['min_weight']}")
print(f" Max weight: {analysis['max_weight']}")
print(f" Optimal times: {analysis['optimal_departure_times'][:5]}...")
test_dynamic_weight_graph()
Key Insights:
• Model time-dependent weights using sinusoidal functions (sanity-checked below)
• Use modified Dijkstra with time-state tracking
• Consider waiting strategies to optimize total cost
• Implement pruning to handle large search spaces
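As a quick sanity check of the sinusoidal weight model, the edge 0 -> 1 from the example (base 10, amplitude 5, period 12) peaks and bottoms out exactly where the formula predicts. A small standalone sketch:
Python
import math

def weight_at(base, amplitude, period, t):
    # weight(t) = base + amplitude * sin(2*pi * t / period)
    return base + amplitude * math.sin(2 * math.pi * t / period)

print(weight_at(10, 5, 12, 0))  # 10.0 at t=0 (sin 0 = 0)
print(weight_at(10, 5, 12, 3))  # 15.0 at t=3 (quarter period, sin = 1)
print(weight_at(10, 5, 12, 9))  # 5.0  at t=9 (three-quarter period, sin = -1)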
2.2 Meta (Facebook) OA Problems
Problem 6: Social Network Friend Recommendations
Difficulty: Medium | Time Limit: 45 minutes | Company: Meta
Problem Statement:
Given a social network represented as an undirected graph, implement a friend
recommendation system. For a given user, recommend friends based on mutual
connections. Return the top k users with the most mutual friends, excluding users who are
already friends.
Example:
Plain Text
Input:
graph = {
'A': ['B', 'C'],
'B': ['A', 'D', 'E'],
'C': ['A', 'F'],
'D': ['B', 'E'],
'E': ['B', 'D', 'F'],
'F': ['C', 'E']
}
user = 'A', k = 2
Output: ['E', 'F']
Explanation:
- E has 1 mutual friend with A (B)
- F has 1 mutual friend with A (C)
- D has 1 mutual friend with A (B)
Return top 2: E and F (or D, depending on tie-breaking)
Solution Approach:
This problem requires graph traversal and counting mutual connections. We need to find
users who are not direct friends but have common friends.
Python
from collections import defaultdict, Counter
def recommendFriends(graph, user, k):
if user not in graph:
return []
# Get current friends of the user
current_friends = set(graph[user])
current_friends.add(user) # Include user themselves
# Count mutual friends for each potential recommendation
mutual_count = Counter()
# For each friend of the user
for friend in graph[user]:
# For each friend of the friend
for friend_of_friend in graph.get(friend, []):
# If not already a friend and not the user themselves
if friend_of_friend not in current_friends:
mutual_count[friend_of_friend] += 1
# Sort by mutual friend count (descending) and then by name for tie-breaking
recommendations = sorted(mutual_count.items(),
key=lambda x: (-x[1], x[0]))
# Return top k recommendations
return [user for user, count in recommendations[:k]]
def recommendFriendsAdvanced(graph, user, k, min_mutual=1):
"""
Advanced version with minimum mutual friends threshold
"""
if user not in graph:
return []
current_friends = set(graph[user])
current_friends.add(user)
mutual_count = Counter()
mutual_friends_detail = defaultdict(set)
for friend in graph[user]:
for friend_of_friend in graph.get(friend, []):
if friend_of_friend not in current_friends:
mutual_count[friend_of_friend] += 1
mutual_friends_detail[friend_of_friend].add(friend)
# Filter by minimum mutual friends
filtered_recommendations = [
(user, count) for user, count in mutual_count.items()
if count >= min_mutual
]
# Sort by mutual friend count and name
filtered_recommendations.sort(key=lambda x: (-x[1], x[0]))
return [user for user, count in filtered_recommendations[:k]]
# Test cases
def test_recommendFriends():
graph = {
'A': ['B', 'C'],
'B': ['A', 'D', 'E'],
'C': ['A', 'F'],
'D': ['B', 'E'],
'E': ['B', 'D', 'F'],
'F': ['C', 'E']
}
result1 = recommendFriends(graph, 'A', 2)
print(f"Recommendations for A: {result1}")
result2 = recommendFriends(graph, 'B', 3)
print(f"Recommendations for B: {result2}")
test_recommendFriends()
Key Insights:
• Use graph traversal to find friends of friends
• Counter efficiently tracks mutual friend counts
• Handle edge cases: user not in graph, no recommendations
• Consider tie-breaking strategies for equal mutual friend counts
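One concrete tie-breaking refinement is to prefer, among candidates with the same number of mutual friends, the one with more total connections before falling back to names. A minimal sketch (the name recommendFriendsByDegree is ours, not part of the original problem):
Python
from collections import Counter

def recommendFriendsByDegree(graph, user, k):
    # Rank by mutual-friend count, then by the candidate's total degree, then by name
    if user not in graph:
        return []
    excluded = set(graph[user]) | {user}
    mutual_count = Counter()
    for friend in graph[user]:
        for candidate in graph.get(friend, []):
            if candidate not in excluded:
                mutual_count[candidate] += 1
    ranked = sorted(mutual_count.items(),
                    key=lambda item: (-item[1], -len(graph.get(item[0], [])), item[0]))
    return [candidate for candidate, _ in ranked[:k]]

# With the example graph, D, E, and F each share one mutual friend with A;
# E has the most connections (3), so it wins the tie and is returned first.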
Problem 7: Message Delivery Optimization
Difficulty: Medium | Time Limit: 45 minutes | Company: Meta
Problem Statement:
Design a message delivery system that optimizes for minimum latency. Given a network of
servers and their connection latencies, find the optimal path to deliver a message from
source to destination with minimum total latency.
Example:
Plain Text
Input:
servers = ['A', 'B', 'C', 'D']
connections = [
('A', 'B', 4),
('A', 'C', 2),
('B', 'C', 1),
('B', 'D', 5),
('C', 'D', 8)
]
source = 'A', destination = 'D'
Output: (['A', 'C', 'B', 'D'], 8)
Explanation: A->C(2) + C->B(1) + B->D(5) = 8 total latency
Solution Approach:
This is a shortest path problem that can be solved using Dijkstra's algorithm. We need to
find the path with minimum total latency.
Python
import heapq
from collections import defaultdict
def findOptimalPath(servers, connections, source, destination):
# Build adjacency list
graph = defaultdict(list)
for server1, server2, latency in connections:
graph[server1].append((server2, latency))
graph[server2].append((server1, latency)) # Undirected graph
# Dijkstra's algorithm
distances = {server: float('inf') for server in servers}
distances[source] = 0
previous = {}
# Priority queue: (distance, node)
pq = [(0, source)]
visited = set()
while pq:
current_dist, current_node = heapq.heappop(pq)
if current_node in visited:
continue
visited.add(current_node)
if current_node == destination:
break
for neighbor, latency in graph[current_node]:
if neighbor not in visited:
new_dist = current_dist + latency
if new_dist < distances[neighbor]:
distances[neighbor] = new_dist
previous[neighbor] = current_node
heapq.heappush(pq, (new_dist, neighbor))
# Reconstruct path
if destination not in previous and destination != source:
return None, float('inf') # No path found
path = []
current = destination
while current is not None:
path.append(current)
current = previous.get(current)
path.reverse()
return path, distances[destination]
def findKShortestPaths(servers, connections, source, destination, k):
"""
Find k shortest paths for redundancy
"""
graph = defaultdict(list)
for server1, server2, latency in connections:
graph[server1].append((server2, latency))
graph[server2].append((server1, latency))
# Modified Dijkstra to find k shortest paths
# Priority queue: (distance, path)
pq = [(0, [source])]
paths_found = []
visited_paths = set()
while pq and len(paths_found) < k:
current_dist, path = heapq.heappop(pq)
current_node = path[-1]
# Convert path to tuple for hashing
path_tuple = tuple(path)
if path_tuple in visited_paths:
continue
visited_paths.add(path_tuple)
if current_node == destination:
paths_found.append((path, current_dist))
continue
for neighbor, latency in graph[current_node]:
if neighbor not in path: # Avoid cycles
new_path = path + [neighbor]
new_dist = current_dist + latency
heapq.heappush(pq, (new_dist, new_path))
return paths_found
# Test cases
def test_messageDelivery():
servers = ['A', 'B', 'C', 'D']
connections = [
('A', 'B', 4),
('A', 'C', 2),
('B', 'C', 1),
('B', 'D', 5),
('C', 'D', 8)
]
path, latency = findOptimalPath(servers, connections, 'A', 'D')
print(f"Optimal path: {path}, Total latency: {latency}")
k_paths = findKShortestPaths(servers, connections, 'A', 'D', 3)
print(f"Top 3 paths:")
for i, (path, latency) in enumerate(k_paths, 1):
print(f" {i}. {path} (latency: {latency})")
test_messageDelivery()
Key Insights:
• Dijkstra's algorithm for single-source shortest path
• Use priority queue for efficient minimum extraction
• Reconstruct path using previous node tracking
• Consider k-shortest paths for redundancy and load balancing
Problem 8: Content Moderation Algorithm
Difficulty: Hard | Time Limit: 60 minutes | Company: Meta
Problem Statement:
Implement a content moderation system that detects inappropriate content based on
multiple criteria:
1. Banned words/phrases
2. Sentiment analysis score
3. Spam detection (repeated patterns)
4. User reputation score
Return a moderation decision and confidence score.
Example:
Plain Text
Input:
content = "This product is amazing! Buy now with discount code SAVE50"
banned_words = ["spam", "scam", "fake"]
user_reputation = 0.7
Output: {
"decision": "APPROVED",
"confidence": 0.85,
"flags": ["promotional_content"],
"sentiment_score": 0.8
}
Solution Approach:
This problem requires implementing multiple detection algorithms and combining their
results into a final decision.
Python
import re
from collections import Counter
from typing import Dict, List, Tuple
class ContentModerator:
def __init__(self, banned_words: List[str], spam_threshold: float = 0.7):
self.banned_words = set(word.lower() for word in banned_words)
self.spam_threshold = spam_threshold
# Simple sentiment words for basic sentiment analysis
self.positive_words = {
'amazing', 'great', 'excellent', 'wonderful', 'fantastic',
'good', 'best', 'love', 'perfect', 'awesome'
}
self.negative_words = {
'terrible', 'awful', 'bad', 'worst', 'hate',
'horrible', 'disgusting', 'stupid', 'useless'
}
def detect_banned_words(self, content: str) -> Tuple[bool, List[str]]:
"""Detect banned words in content"""
words = re.findall(r'\b\w+\b', content.lower())
found_banned = [word for word in words if word in self.banned_words]
return len(found_banned) > 0, found_banned
def analyze_sentiment(self, content: str) -> float:
"""Simple sentiment analysis returning score between -1 and 1"""
words = re.findall(r'\b\w+\b', content.lower())
positive_count = sum(1 for word in words if word in
self.positive_words)
negative_count = sum(1 for word in words if word in
self.negative_words)
total_sentiment_words = positive_count + negative_count
if total_sentiment_words == 0:
return 0.0 # Neutral
sentiment_score = (positive_count - negative_count) / total_sentiment_words
return sentiment_score
def detect_spam(self, content: str) -> Tuple[bool, float]:
"""Detect spam based on repetitive patterns"""
# Check for excessive capitalization
caps_ratio = sum(1 for c in content if c.isupper()) / max(len(content), 1)
# Check for repeated characters
repeated_chars = len(re.findall(r'(.)\1{2,}', content))
# Check for excessive punctuation
punct_ratio = sum(1 for c in content if c in '!?.,;') / max(len(content), 1)
# Check for promotional keywords
promo_keywords = ['buy', 'discount', 'sale', 'offer', 'deal', 'free',
'win']
promo_count = sum(1 for word in re.findall(r'\b\w+\b',
content.lower())
if word in promo_keywords)
# Calculate spam score
spam_score = (
caps_ratio * 0.3 +
(repeated_chars / max(len(content), 1)) * 0.2 +
punct_ratio * 0.2 +
(promo_count / max(len(content.split()), 1)) * 0.3
)
return spam_score > self.spam_threshold, spam_score
def calculate_confidence(self, factors: Dict) -> float:
"""Calculate overall confidence based on multiple factors"""
weights = {
'banned_words': 0.4,
'sentiment': 0.2,
'spam': 0.3,
'user_reputation': 0.1
}
confidence = 0.0
# Banned words factor
if factors['banned_words_found']:
confidence += weights['banned_words'] * 0.9  # High confidence if banned words found
else:
confidence += weights['banned_words'] * 0.1
# Sentiment factor
sentiment_confidence = abs(factors['sentiment_score']) * 0.8 + 0.2
confidence += weights['sentiment'] * sentiment_confidence
# Spam factor
spam_confidence = factors['spam_score'] * 0.8 + 0.2
confidence += weights['spam'] * spam_confidence
# User reputation factor
reputation_confidence = factors['user_reputation']
confidence += weights['user_reputation'] * reputation_confidence
return min(confidence, 1.0)
def moderate_content(self, content: str, user_reputation: float = 0.5) -> Dict:
"""Main moderation function"""
# Run all detection algorithms
banned_found, banned_list = self.detect_banned_words(content)
sentiment_score = self.analyze_sentiment(content)
is_spam, spam_score = self.detect_spam(content)
# Collect factors
factors = {
'banned_words_found': banned_found,
'banned_words': banned_list,
'sentiment_score': sentiment_score,
'spam_score': spam_score,
'user_reputation': user_reputation
}
# Calculate confidence
confidence = self.calculate_confidence(factors)
# Make decision
flags = []
if banned_found:
flags.append('banned_words')
if is_spam:
flags.append('spam_detected')
if sentiment_score < -0.5:
flags.append('negative_sentiment')
if spam_score > 0.3: # Lower threshold for flagging
flags.append('promotional_content')
# Final decision logic
if banned_found or (is_spam and user_reputation < 0.3):
decision = "REJECTED"
elif sentiment_score < -0.7 or (is_spam and user_reputation < 0.5):
decision = "REVIEW_REQUIRED"
else:
decision = "APPROVED"
return {
"decision": decision,
"confidence": round(confidence, 2),
"flags": flags,
"sentiment_score": round(sentiment_score, 2),
"spam_score": round(spam_score, 2),
"details": {
"banned_words_found": banned_list,
"user_reputation": user_reputation
}
}
# Test cases
def test_contentModeration():
banned_words = ["spam", "scam", "fake", "virus"]
moderator = ContentModerator(banned_words)
# Test case 1: Clean content
content1 = "This is a great product! I really love it."
result1 = moderator.moderate_content(content1, user_reputation=0.8)
print(f"Test 1 - Clean content:")
print(f"Content: {content1}")
print(f"Result: {result1}\n")
# Test case 2: Spam content
content2 = "BUY NOW!!! AMAZING DEAL!!! FREE SHIPPING!!!"
result2 = moderator.moderate_content(content2, user_reputation=0.3)
print(f"Test 2 - Spam content:")
print(f"Content: {content2}")
print(f"Result: {result2}\n")
# Test case 3: Banned words
content3 = "This is a scam! Don't trust this fake product."
result3 = moderator.moderate_content(content3, user_reputation=0.6)
print(f"Test 3 - Banned words:")
print(f"Content: {content3}")
print(f"Result: {result3}\n")
test_contentModeration()
Key Insights:
• Combine multiple detection algorithms for robust moderation
• Use weighted scoring for final decision making (worked example below)
• Consider user reputation in decision logic
• Provide detailed feedback for transparency and debugging
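To make the weighted scoring concrete, here is a worked pass through calculate_confidence with illustrative numbers (not taken from the problem statement), assuming the ContentModerator class defined above:
Python
moderator = ContentModerator(banned_words=["spam", "scam"])
factors = {
    "banned_words_found": False,  # no banned words -> 0.4 * 0.1        = 0.04
    "sentiment_score": 0.5,       # 0.2 * (0.5 * 0.8 + 0.2)             = 0.12
    "spam_score": 0.25,           # 0.3 * (0.25 * 0.8 + 0.2)            = 0.12
    "user_reputation": 0.8,       # 0.1 * 0.8                           = 0.08
}
# 0.04 + 0.12 + 0.12 + 0.08 = 0.36
print(round(moderator.calculate_confidence(factors), 2))  # 0.36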
Problem 9: News Feed Ranking System
Difficulty: Hard | Time Limit: 60 minutes | Company: Meta
Problem Statement:
Design a news feed ranking algorithm for Facebook that personalizes content for users. The
algorithm should consider multiple factors: user engagement history, content freshness,
social connections, content type preferences, and trending topics. Implement a scoring
system that ranks posts and handles real-time updates efficiently.
Example:
Plain Text
Input:
user_profile = {
"user_id": "user123",
"interests": ["technology", "sports", "travel"],
"engagement_history": {
"technology": 0.8, "sports": 0.6, "travel": 0.7,
"photos": 0.9, "videos": 0.7, "links": 0.5
},
"connections": ["friend1", "friend2", "friend3"]
}
posts = [
{
"post_id": "p1", "author": "friend1", "content_type": "photo",
"topic": "travel", "timestamp": 1640995200, "likes": 50, "comments": 10
},
{
"post_id": "p2", "author": "page_tech", "content_type": "link",
"topic": "technology", "timestamp": 1640995800, "likes": 200, "comments":
30
}
]
Output: [
{"post_id": "p1", "score": 0.85, "rank": 1},
{"post_id": "p2", "score": 0.72, "rank": 2}
]
Solution Approach:
This problem requires implementing a multi-factor ranking algorithm with real-time
capabilities.
Python
import time
import math
from typing import List, Dict, Tuple
from collections import defaultdict
from dataclasses import dataclass
@dataclass
class Post:
post_id: str
author: str
content_type: str
topic: str
timestamp: float
likes: int
comments: int
shares: int = 0
content_length: int = 100
def get_engagement_score(self) -> float:
"""Calculate engagement score based on interactions"""
# Weighted engagement: comments > shares > likes
return (self.comments * 3 + self.shares * 2 + self.likes) / 100
@dataclass
class UserProfile:
user_id: str
interests: List[str]
engagement_history: Dict[str, float]
connections: List[str]
activity_level: float = 0.5 # 0.0 to 1.0
class NewsFeedRanker:
def __init__(self):
self.weights = {
'relevance': 0.3,
'freshness': 0.2,
'social_connection': 0.2,
'engagement': 0.15,
'content_quality': 0.1,
'diversity': 0.05
}
# Trending topics cache
self.trending_topics = {}
self.topic_decay_rate = 0.1 # How fast trending topics decay
def calculate_relevance_score(self, user: UserProfile, post: Post) -> float:
"""Calculate how relevant the post is to the user"""
# Topic relevance
topic_score = user.engagement_history.get(post.topic, 0.1)
# Content type preference
content_type_score = user.engagement_history.get(post.content_type,
0.3)
# Interest matching
interest_bonus = 0.2 if post.topic in user.interests else 0.0
# Trending topic bonus
trending_bonus = self.trending_topics.get(post.topic, 0.0) * 0.1
return min(1.0, topic_score * 0.6 + content_type_score * 0.3 +
interest_bonus + trending_bonus)
def calculate_freshness_score(self, post: Post, current_time: float) -> float:
"""Calculate freshness score based on post age"""
age_hours = (current_time - post.timestamp) / 3600
# Exponential decay with different rates for different content types
decay_rates = {
'breaking_news': 0.5,
'photo': 0.1,
'video': 0.08,
'link': 0.15,
'text': 0.12
}
decay_rate = decay_rates.get(post.content_type, 0.1)
freshness = math.exp(-decay_rate * age_hours)
return max(0.01, freshness) # Minimum freshness score
def calculate_social_connection_score(self, user: UserProfile, post:
Post) -> float:
"""Calculate score based on social connections"""
if post.author in user.connections:
# Direct friend
return 0.8
elif post.author.startswith('page_'):
# Page/brand content
return 0.3
else:
# Unknown author
return 0.1
def calculate_engagement_score(self, post: Post) -> float:
"""Calculate normalized engagement score"""
engagement = post.get_engagement_score()
# Apply logarithmic scaling to prevent viral posts from dominating
normalized_engagement = math.log(1 + engagement) / math.log(1 + 1000)
# Max expected engagement
return min(1.0, normalized_engagement)
def calculate_content_quality_score(self, post: Post) -> float:
"""Estimate content quality based on various signals"""
# Length-based quality (not too short, not too long)
length_score = 1.0
if post.content_length < 10:
length_score = 0.3 # Very short content
elif post.content_length > 1000:
length_score = 0.7 # Very long content
# Engagement rate quality
if post.likes + post.comments > 0:
engagement_rate = post.comments / (post.likes + post.comments)
engagement_quality = min(1.0, engagement_rate * 5)  # Comments indicate quality
else:
engagement_quality = 0.5
return (length_score + engagement_quality) / 2
def calculate_diversity_penalty(self, ranked_posts: List[Tuple[str,
float]],
current_post: Post) -> float:
"""Calculate diversity penalty to avoid showing too similar
content"""
if len(ranked_posts) < 3:
return 0.0 # No penalty for first few posts
# Check recent posts for similar topics/authors
recent_topics = []
recent_authors = []
for post_id, _ in ranked_posts[-5:]: # Check last 5 posts
# In real implementation, you'd look up post details
# For now, assume we have this information
pass
# Simplified diversity penalty
topic_penalty = 0.1 if len(recent_topics) > 2 else 0.0
author_penalty = 0.15 if current_post.author in recent_authors else 0.0
return topic_penalty + author_penalty
def rank_posts(self, user: UserProfile, posts: List[Post],
current_time: float = None) -> List[Dict]:
"""Main ranking function"""
if current_time is None:
current_time = time.time()
scored_posts = []
ranked_posts = [] # For diversity calculation
for post in posts:
# Calculate individual scores
relevance = self.calculate_relevance_score(user, post)
freshness = self.calculate_freshness_score(post, current_time)
social = self.calculate_social_connection_score(user, post)
engagement = self.calculate_engagement_score(post)
quality = self.calculate_content_quality_score(post)
diversity_penalty = self.calculate_diversity_penalty(ranked_posts, post)
# Calculate weighted final score
final_score = (
relevance * self.weights['relevance'] +
freshness * self.weights['freshness'] +
social * self.weights['social_connection'] +
engagement * self.weights['engagement'] +
quality * self.weights['content_quality'] -
diversity_penalty * self.weights['diversity']
)
scored_posts.append({
'post_id': post.post_id,
'score': round(final_score, 3),
'components': {
'relevance': round(relevance, 3),
'freshness': round(freshness, 3),
'social': round(social, 3),
'engagement': round(engagement, 3),
'quality': round(quality, 3),
'diversity_penalty': round(diversity_penalty, 3)
}
})
ranked_posts.append((post.post_id, final_score))
# Sort by score (descending)
scored_posts.sort(key=lambda x: x['score'], reverse=True)
# Add rank
for i, post_data in enumerate(scored_posts):
post_data['rank'] = i + 1
return scored_posts
def update_trending_topics(self, recent_posts: List[Post], time_window:
float = 3600):
"""Update trending topics based on recent activity"""
current_time = time.time()
topic_engagement = defaultdict(float)
# Calculate engagement for each topic in recent time window
for post in recent_posts:
if current_time - post.timestamp <= time_window:
engagement = post.get_engagement_score()
topic_engagement[post.topic] += engagement
# Normalize and update trending topics
if topic_engagement:
max_engagement = max(topic_engagement.values())
for topic, engagement in topic_engagement.items():
self.trending_topics[topic] = engagement / max_engagement
# Decay existing trending topics
for topic in list(self.trending_topics.keys()):
self.trending_topics[topic] *= (1 - self.topic_decay_rate)
if self.trending_topics[topic] < 0.01:
del self.trending_topics[topic]
def personalize_weights(self, user: UserProfile) -> None:
"""Adjust ranking weights based on user behavior"""
# More active users prefer fresher content
if user.activity_level > 0.7:
self.weights['freshness'] += 0.05
self.weights['relevance'] -= 0.05
# Users with many connections prefer social content
if len(user.connections) > 100:
self.weights['social_connection'] += 0.05
self.weights['engagement'] -= 0.05
def batch_rank_for_multiple_users(self, users: List[UserProfile],
posts: List[Post]) -> Dict[str,
List[Dict]]:
"""Efficiently rank posts for multiple users"""
results = {}
# Pre-calculate post-independent scores
current_time = time.time()
post_freshness = {post.post_id: self.calculate_freshness_score(post,
current_time)
for post in posts}
post_engagement = {post.post_id:
self.calculate_engagement_score(post)
for post in posts}
post_quality = {post.post_id:
self.calculate_content_quality_score(post)
for post in posts}
# Rank for each user
for user in users:
user_rankings = []
for post in posts:
relevance = self.calculate_relevance_score(user, post)
social = self.calculate_social_connection_score(user, post)
final_score = (
relevance * self.weights['relevance'] +
post_freshness[post.post_id] * self.weights['freshness']
+
social * self.weights['social_connection'] +
post_engagement[post.post_id] *
self.weights['engagement'] +
post_quality[post.post_id] *
self.weights['content_quality']
)
user_rankings.append({
'post_id': post.post_id,
'score': round(final_score, 3)
})
# Sort and add ranks
user_rankings.sort(key=lambda x: x['score'], reverse=True)
for i, ranking in enumerate(user_rankings):
ranking['rank'] = i + 1
results[user.user_id] = user_rankings
return results
# Test the news feed ranker
def test_news_feed_ranker():
ranker = NewsFeedRanker()
# Create test user
user = UserProfile(
user_id="user123",
interests=["technology", "sports", "travel"],
engagement_history={
"technology": 0.8, "sports": 0.6, "travel": 0.7,
"photo": 0.9, "video": 0.7, "link": 0.5
},
connections=["friend1", "friend2", "friend3"]
)
# Create test posts
current_time = time.time()
posts = [
Post("p1", "friend1", "photo", "travel", current_time - 1800, 50,
10),
Post("p2", "page_tech", "link", "technology", current_time - 3600,
200, 30),
Post("p3", "friend2", "video", "sports", current_time - 7200, 80,
15),
Post("p4", "unknown_user", "text", "politics", current_time - 900,
300, 5)
]
print("Testing News Feed Ranker:")
print(f"User interests: {user.interests}")
print(f"Number of posts: {len(posts)}")
# Rank posts
rankings = ranker.rank_posts(user, posts, current_time)
print("\nRanked Posts:")
for ranking in rankings:
print(f"Rank {ranking['rank']}: {ranking['post_id']} (score:
{ranking['score']})")
print(f" Components: {ranking['components']}")
# Test trending topics update
ranker.update_trending_topics(posts)
print(f"\nTrending topics: {ranker.trending_topics}")
test_news_feed_ranker()
Key Insights:
• Multi-factor scoring with configurable weights
• Time-decay for freshness and trending topics (sanity-checked below)
• Social connection strength affects ranking
• Diversity penalty prevents content clustering
• Batch processing for efficiency
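The freshness decay used above is easy to sanity-check by hand. A small standalone sketch mirroring calculate_freshness_score (decay rate 0.1 for photos, 0.15 for links, floored at 0.01):
Python
import math

def freshness(age_hours, decay_rate):
    # freshness = exp(-decay_rate * age_hours), floored at 0.01 as in the ranker
    return max(0.01, math.exp(-decay_rate * age_hours))

print(round(freshness(0.5, 0.10), 3))   # photo posted 30 minutes ago -> 0.951
print(round(freshness(6.0, 0.15), 3))   # link posted 6 hours ago     -> 0.407
print(round(freshness(72.0, 0.10), 3))  # photo posted 3 days ago     -> hits the 0.01 floor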
Problem 10: User Activity Pattern Analysis
Difficulty: Medium | Time Limit: 45 minutes | Company: Meta
Problem Statement:
Analyze user activity patterns on Facebook to detect anomalies, predict user engagement,
and identify optimal posting times. Given user activity logs, implement algorithms to:
1. Detect unusual activity patterns (potential security issues)
2. Predict user's next active time
3. Recommend optimal posting times for pages
4. Identify user engagement trends
Example:
Plain Text
Input:
activity_logs = [
{"user_id": "u1", "timestamp": 1640995200, "action": "login", "device":
"mobile"},
{"user_id": "u1", "timestamp": 1640995260, "action": "post", "device":
"mobile"},
{"user_id": "u1", "timestamp": 1640995320, "action": "like", "device":
"mobile"},
{"user_id": "u1", "timestamp": 1640998800, "action": "logout", "device":
"mobile"}
]
Output: {
"anomaly_score": 0.15,
"predicted_next_active": 1641081600,
"optimal_posting_times": [9, 12, 18, 21],
"engagement_trend": "increasing"
}
Solution Approach:
This problem requires time series analysis and pattern recognition algorithms.
Python
import numpy as np
import math
from datetime import datetime, timedelta
from typing import List, Dict, Tuple, Optional
from collections import defaultdict, Counter
from dataclasses import dataclass
@dataclass
class ActivityLog:
user_id: str
timestamp: float
action: str
device: str
location: Optional[str] = None
session_id: Optional[str] = None
class UserActivityAnalyzer:
def __init__(self):
self.action_weights = {
'login': 1.0,
'post': 3.0,
'comment': 2.5,
'like': 1.5,
'share': 2.0,
'message': 2.0,
'logout': 0.5
}
# Anomaly detection thresholds
self.anomaly_thresholds = {
'unusual_time': 2.0, # Standard deviations from normal
'unusual_frequency': 3.0, # Activity frequency anomaly
'unusual_location': 0.8, # Location change threshold
'unusual_device': 0.7 # Device pattern change
}
def extract_time_features(self, timestamp: float) -> Dict:
"""Extract time-based features from timestamp"""
dt = datetime.fromtimestamp(timestamp)
return {
'hour': dt.hour,
'day_of_week': dt.weekday(), # 0=Monday, 6=Sunday
'day_of_month': dt.day,
'month': dt.month,
'is_weekend': dt.weekday() >= 5,
'is_business_hours': 9 <= dt.hour <= 17,
'time_of_day': self._categorize_time_of_day(dt.hour)
}
def _categorize_time_of_day(self, hour: int) -> str:
"""Categorize hour into time periods"""
if 6 <= hour < 12:
return 'morning'
elif 12 <= hour < 18:
return 'afternoon'
elif 18 <= hour < 22:
return 'evening'
else:
return 'night'
def build_user_profile(self, activities: List[ActivityLog]) -> Dict:
"""Build comprehensive user activity profile"""
if not activities:
return {}
# Sort activities by timestamp
activities.sort(key=lambda x: x.timestamp)
# Extract patterns
hourly_activity = defaultdict(int)
daily_activity = defaultdict(int)
action_frequency = defaultdict(int)
device_usage = defaultdict(int)
location_usage = defaultdict(int)
session_durations = []
activity_intervals = []
# Process activities
for i, activity in enumerate(activities):
time_features = self.extract_time_features(activity.timestamp)
hourly_activity[time_features['hour']] += 1
daily_activity[time_features['day_of_week']] += 1
action_frequency[activity.action] += 1
device_usage[activity.device] += 1
if activity.location:
location_usage[activity.location] += 1
# Calculate intervals between activities
if i > 0:
interval = activity.timestamp - activities[i-1].timestamp
activity_intervals.append(interval)
# Calculate session durations (login to logout)
current_session_start = None
for activity in activities:
if activity.action == 'login':
current_session_start = activity.timestamp
elif activity.action == 'logout' and current_session_start:
session_duration = activity.timestamp - current_session_start
session_durations.append(session_duration)
current_session_start = None
# Calculate statistics
profile = {
'total_activities': len(activities),
'time_span_days': (activities[-1].timestamp -
activities[0].timestamp) / 86400,
'avg_daily_activities': len(activities) / max(1,
(activities[-1].timestamp - activities[0].timestamp) / 86400),
'most_active_hour': max(hourly_activity, key=hourly_activity.get)
if hourly_activity else 12,
'most_active_day': max(daily_activity, key=daily_activity.get) if
daily_activity else 0,
'primary_device': max(device_usage, key=device_usage.get) if
device_usage else 'unknown',
'avg_session_duration': np.mean(session_durations) if
session_durations else 0,
'avg_activity_interval': np.mean(activity_intervals) if
activity_intervals else 0,
'hourly_distribution': dict(hourly_activity),
'daily_distribution': dict(daily_activity),
'action_distribution': dict(action_frequency),
'device_distribution': dict(device_usage),
'location_distribution': dict(location_usage)
}
return profile
def detect_anomalies(self, activities: List[ActivityLog],
user_profile: Dict) -> Dict:
"""Detect anomalous activity patterns"""
anomalies = []
anomaly_scores = []
if not activities or not user_profile:
return {'anomalies': [], 'overall_score': 0.0}
# Get recent activities (last 24 hours)
current_time = max(activity.timestamp for activity in activities)
recent_activities = [
activity for activity in activities
if current_time - activity.timestamp <= 86400
]
if not recent_activities:
return {'anomalies': [], 'overall_score': 0.0}
# 1. Unusual time patterns
recent_hours = [self.extract_time_features(a.timestamp)['hour']
for a in recent_activities]
if user_profile.get('hourly_distribution'):
expected_hours = list(user_profile['hourly_distribution'].keys())
unusual_hours = [h for h in recent_hours if h not in
expected_hours]
if unusual_hours:
anomaly_score = len(unusual_hours) / len(recent_hours)
anomalies.append({
'type': 'unusual_time',
'description': f'Activity at unusual hours: {unusual_hours}',
'score': anomaly_score
})
anomaly_scores.append(anomaly_score)
# 2. Unusual frequency
recent_activity_count = len(recent_activities)
expected_daily_activities = user_profile.get('avg_daily_activities',
10)
frequency_ratio = recent_activity_count / max(1,
expected_daily_activities)
if frequency_ratio > 3.0 or frequency_ratio < 0.3:
anomaly_score = min(1.0, abs(math.log(frequency_ratio)) / 2)
anomalies.append({
'type': 'unusual_frequency',
'description': f'Activity frequency {frequency_ratio:.1f}x normal',
'score': anomaly_score
})
anomaly_scores.append(anomaly_score)
# 3. Unusual device usage
recent_devices = [a.device for a in recent_activities]
primary_device = user_profile.get('primary_device', 'mobile')
if recent_devices and primary_device not in recent_devices:
device_anomaly_score = 0.5
anomalies.append({
'type': 'unusual_device',
'description': f'No activity from primary device: {primary_device}',
'score': device_anomaly_score
})
anomaly_scores.append(device_anomaly_score)
# 4. Unusual action patterns
recent_actions = [a.action for a in recent_activities]
action_distribution = Counter(recent_actions)
expected_actions = set(user_profile.get('action_distribution',
{}).keys())
unusual_actions = [action for action in action_distribution
if action not in expected_actions]
if unusual_actions:
action_anomaly_score = len(unusual_actions) / len(set(recent_actions))
anomalies.append({
'type': 'unusual_actions',
'description': f'Unusual actions: {unusual_actions}',
'score': action_anomaly_score
})
anomaly_scores.append(action_anomaly_score)
# Calculate overall anomaly score
overall_score = np.mean(anomaly_scores) if anomaly_scores else 0.0
return {
'anomalies': anomalies,
'overall_score': round(overall_score, 3),
'risk_level': self._categorize_risk_level(overall_score)
}
def _categorize_risk_level(self, score: float) -> str:
"""Categorize anomaly score into risk levels"""
if score < 0.2:
return 'low'
elif score < 0.5:
return 'medium'
elif score < 0.8:
return 'high'
else:
return 'critical'
def predict_next_active_time(self, activities: List[ActivityLog],
user_profile: Dict) -> Dict:
"""Predict when user will be active next"""
if not activities or not user_profile:
return {'predicted_timestamp': None, 'confidence': 0.0}
# Get user's typical activity patterns
hourly_dist = user_profile.get('hourly_distribution', {})
daily_dist = user_profile.get('daily_distribution', {})
if not hourly_dist:
return {'predicted_timestamp': None, 'confidence': 0.0}
# Find most likely next active hour
current_time = max(activity.timestamp for activity in activities)
current_dt = datetime.fromtimestamp(current_time)
# Look for next peak activity time
peak_hours = sorted(hourly_dist.items(), key=lambda x: x[1],
reverse=True)[:3]
# Find next occurrence of peak hour
next_active_times = []
for hour, frequency in peak_hours:
# Calculate next occurrence of this hour
next_dt = current_dt.replace(hour=hour, minute=0, second=0,
microsecond=0)
# If hour has passed today, move to tomorrow
if next_dt <= current_dt:
next_dt += timedelta(days=1)
# Weight by frequency and recency
time_diff_hours = (next_dt.timestamp() - current_time) / 3600
weight = frequency / (1 + time_diff_hours * 0.1)  # Prefer sooner times
next_active_times.append((next_dt.timestamp(), weight))
if next_active_times:
# Choose time with highest weight
predicted_time, confidence = max(next_active_times, key=lambda x:
x[1])
normalized_confidence = min(1.0, confidence /
max(hourly_dist.values()))
return {
'predicted_timestamp': predicted_time,
'predicted_datetime':
datetime.fromtimestamp(predicted_time).isoformat(),
'confidence': round(normalized_confidence, 3)
}
return {'predicted_timestamp': None, 'confidence': 0.0}
def recommend_optimal_posting_times(self, user_profiles: List[Dict], target_audience_size: int = 1000) -> Dict:
"""Recommend optimal posting times based on audience activity"""
if not user_profiles:
return {'optimal_hours': [], 'audience_coverage': {}}
# Aggregate hourly activity across all users
hourly_audience = defaultdict(int)
for profile in user_profiles:
hourly_dist = profile.get('hourly_distribution', {})
for hour, activity_count in hourly_dist.items():
hourly_audience[hour] += activity_count
# Calculate audience coverage for each hour
total_audience = sum(hourly_audience.values())
hourly_coverage = {
hour: count / total_audience if total_audience > 0 else 0
for hour, count in hourly_audience.items()
}
# Find optimal posting times (top hours with good coverage)
optimal_hours = sorted(hourly_coverage.items(),
key=lambda x: x[1], reverse=True)[:6]
# Filter for practical posting times (avoid very late/early hours)
practical_hours = [
hour for hour, coverage in optimal_hours
if 6 <= hour <= 23 # 6 AM to 11 PM
]
return {
'optimal_hours': practical_hours[:4], # Top 4 practical hours
'audience_coverage': dict(hourly_coverage),
'total_audience_analyzed': len(user_profiles),
'peak_activity_hour': max(hourly_coverage,
key=hourly_coverage.get) if hourly_coverage else 12
}
def analyze_engagement_trends(self, activities: List[ActivityLog],
time_window_days: int = 30) -> Dict:
"""Analyze user engagement trends over time"""
if not activities:
return {'trend': 'unknown', 'trend_strength': 0.0}
# Sort activities by timestamp
activities.sort(key=lambda x: x.timestamp)
# Split into time windows
latest_time = activities[-1].timestamp
window_size = time_window_days * 86400 # Convert to seconds
time_windows = []
current_window_start = latest_time - window_size
while current_window_start >= activities[0].timestamp:
window_end = current_window_start + window_size
window_activities = [
a for a in activities
if current_window_start <= a.timestamp < window_end
]
# Calculate engagement score for this window
engagement_score = sum(
self.action_weights.get(a.action, 1.0) for a in
window_activities
)
time_windows.append({
'start_time': current_window_start,
'end_time': window_end,
'activity_count': len(window_activities),
'engagement_score': engagement_score
})
current_window_start -= window_size
# Reverse to get chronological order
time_windows.reverse()
if len(time_windows) < 2:
return {'trend': 'insufficient_data', 'trend_strength': 0.0}
# Calculate trend
engagement_scores = [w['engagement_score'] for w in time_windows]
# Simple linear trend calculation
n = len(engagement_scores)
x = list(range(n))
# Calculate slope using least squares
x_mean = sum(x) / n
y_mean = sum(engagement_scores) / n
numerator = sum((x[i] - x_mean) * (engagement_scores[i] - y_mean) for
i in range(n))
denominator = sum((x[i] - x_mean) ** 2 for i in range(n))
if denominator == 0:
slope = 0
else:
slope = numerator / denominator
# Determine trend direction and strength
if slope > 0.1:
trend = 'increasing'
elif slope < -0.1:
trend = 'decreasing'
else:
trend = 'stable'
trend_strength = min(1.0, abs(slope) / max(engagement_scores) if
max(engagement_scores) > 0 else 0)
return {
'trend': trend,
'trend_strength': round(trend_strength, 3),
'time_windows_analyzed': len(time_windows),
'latest_engagement_score': engagement_scores[-1] if
engagement_scores else 0,
'average_engagement_score': round(np.mean(engagement_scores), 2)
if engagement_scores else 0
}
# Test the activity analyzer
def test_activity_analyzer():
analyzer = UserActivityAnalyzer()
# Create test activities
base_time = 1640995200 # Base timestamp
activities = [
ActivityLog("u1", base_time, "login", "mobile"),
ActivityLog("u1", base_time + 60, "post", "mobile"),
ActivityLog("u1", base_time + 120, "like", "mobile"),
ActivityLog("u1", base_time + 3600, "comment", "mobile"),
ActivityLog("u1", base_time + 7200, "share", "mobile"),
ActivityLog("u1", base_time + 10800, "logout", "mobile"),
# Add some activities from different day
ActivityLog("u1", base_time + 86400, "login", "desktop"),
ActivityLog("u1", base_time + 86460, "post", "desktop"),
ActivityLog("u1", base_time + 90000, "logout", "desktop"),
]
print("Testing User Activity Analyzer:")
print(f"Total activities: {len(activities)}")
# Build user profile
profile = analyzer.build_user_profile(activities)
print(f"\nUser Profile:")
print(f" Total activities: {profile['total_activities']}")
print(f" Most active hour: {profile['most_active_hour']}")
print(f" Primary device: {profile['primary_device']}")
print(f" Avg session duration: {profile['avg_session_duration']:.0f}
seconds")
# Detect anomalies
anomalies = analyzer.detect_anomalies(activities, profile)
print(f"\nAnomaly Detection:")
print(f" Overall score: {anomalies['overall_score']}")
print(f" Risk level: {anomalies['risk_level']}")
print(f" Anomalies found: {len(anomalies['anomalies'])}")
# Predict next active time
prediction = analyzer.predict_next_active_time(activities, profile)
print(f"\nNext Active Time Prediction:")
print(f" Predicted time: {prediction.get('predicted_datetime',
'Unknown')}")
print(f" Confidence: {prediction['confidence']}")
# Analyze engagement trends
trends = analyzer.analyze_engagement_trends(activities)
print(f"\nEngagement Trends:")
print(f" Trend: {trends['trend']}")
print(f" Trend strength: {trends['trend_strength']}")
print(f" Average engagement: {trends['average_engagement_score']}")
test_activity_analyzer()
Key Insights:
• Time series analysis for pattern recognition
• Multi-dimensional anomaly detection (a z-score variant is sketched below)
• Predictive modeling for user behavior
• Statistical trend analysis for engagement
• Scalable algorithms for large user bases
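The frequency check in detect_anomalies scores deviations with a log-ratio heuristic. A common alternative is to express the latest daily count as a z-score against the user's historical daily counts; the minimal sketch below assumes per-day counts are already aggregated, and the helper name and sample numbers are illustrative rather than part of the solution above.
Python
import numpy as np
from typing import List

def frequency_zscore(daily_counts: List[float], today_count: float) -> float:
    """How many standard deviations today's count sits from the historical daily mean."""
    mean = np.mean(daily_counts)
    std = np.std(daily_counts)
    if std == 0:
        return 0.0
    return float((today_count - mean) / std)

# Illustrative history of ~10-14 actions per day; a burst of 41 actions stands out
history = [12, 9, 14, 11, 10, 13, 12]
z = frequency_zscore(history, today_count=41)
print(round(z, 1), z > 3)  # roughly 18.5 True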
2.3 Amazon OA Problems {#amazon-oa}
Problem 11: Package Delivery Route Optimization
Difficulty: Hard | Time Limit: 60 minutes | Company: Amazon
Problem Statement:
An Amazon delivery driver needs to deliver packages to multiple locations. Given a list of
delivery locations with coordinates and a starting depot, find the shortest route that visits
all locations exactly once and returns to the depot (Traveling Salesman Problem variant).
Example:
Plain Text
Input:
depot = (0, 0)
locations = [(1, 2), (3, 1), (2, 4), (5, 3)]
Output:
route = [(0, 0), (1, 2), (2, 4), (5, 3), (3, 1), (0, 0)]
total_distance = 13.63
Solution Approach:
This is a variant of the Traveling Salesman Problem. For small instances, we can use
dynamic programming with bitmasks. For larger instances, we'll implement a greedy
nearest neighbor heuristic.
Python
import math
from itertools import permutations
from typing import List, Tuple
def calculate_distance(point1: Tuple[float, float], point2: Tuple[float,
float]) -> float:
"""Calculate Euclidean distance between two points"""
return math.sqrt((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)
def tsp_brute_force(depot: Tuple[float, float], locations: List[Tuple[float,
float]]) -> Tuple[List[Tuple[float, float]], float]:
"""
Brute force solution for small instances (n <= 8)
"""
if len(locations) > 8:
raise ValueError("Too many locations for brute force approach")
min_distance = float('inf')
best_route = None
# Try all permutations of locations
for perm in permutations(locations):
route = [depot] + list(perm) + [depot]
total_distance = 0
for i in range(len(route) - 1):
total_distance += calculate_distance(route[i], route[i + 1])
if total_distance < min_distance:
min_distance = total_distance
best_route = route
return best_route, min_distance
def tsp_dp_bitmask(depot: Tuple[float, float], locations: List[Tuple[float,
float]]) -> Tuple[List[Tuple[float, float]], float]:
"""
Dynamic programming solution with bitmask for medium instances (n <= 15)
"""
n = len(locations)
if n > 15:
raise ValueError("Too many locations for DP approach")
# Add depot to the beginning of locations list
all_points = [depot] + locations
# Precompute distances
dist = [[0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
for j in range(n + 1):
dist[i][j] = calculate_distance(all_points[i], all_points[j])
# DP table: dp[mask][i] = minimum cost to visit all cities in mask and end at city i
dp = [[float('inf')] * (n + 1) for _ in range(1 << n)]
parent = [[-1] * (n + 1) for _ in range(1 << n)]
# Base case: start at depot (index 0)
dp[0][0] = 0
# Fill DP table
for mask in range(1 << n):
for u in range(n + 1):
if dp[mask][u] == float('inf'):
continue
for v in range(1, n + 1): # Don't go back to depot until the end
if mask & (1 << (v - 1)): # Already visited
continue
new_mask = mask | (1 << (v - 1))
new_cost = dp[mask][u] + dist[u][v]
if new_cost < dp[new_mask][v]:
dp[new_mask][v] = new_cost
parent[new_mask][v] = u
# Find minimum cost to return to depot
full_mask = (1 << n) - 1
min_cost = float('inf')
last_city = -1
for i in range(1, n + 1):
cost = dp[full_mask][i] + dist[i][0]
if cost < min_cost:
min_cost = cost
last_city = i
# Reconstruct path
path = []
mask = full_mask
current = last_city
while current != -1:
if current == 0:
path.append(depot)
else:
path.append(locations[current - 1])
next_current = parent[mask][current]
if current != 0:
mask ^= (1 << (current - 1))
current = next_current
path.reverse()
path.append(depot) # Return to depot
return path, min_cost
def tsp_nearest_neighbor(depot: Tuple[float, float], locations:
List[Tuple[float, float]]) -> Tuple[List[Tuple[float, float]], float]:
"""
Greedy nearest neighbor heuristic for large instances
"""
unvisited = set(locations)
route = [depot]
current = depot
total_distance = 0
while unvisited:
nearest = min(unvisited, key=lambda loc: calculate_distance(current,
loc))
distance = calculate_distance(current, nearest)
total_distance += distance
route.append(nearest)
unvisited.remove(nearest)
current = nearest
# Return to depot
total_distance += calculate_distance(current, depot)
route.append(depot)
return route, total_distance
def optimize_route(depot: Tuple[float, float], locations: List[Tuple[float,
float]]) -> Tuple[List[Tuple[float, float]], float]:
"""
Choose optimal algorithm based on problem size
"""
n = len(locations)
if n <= 8:
return tsp_brute_force(depot, locations)
elif n <= 15:
return tsp_dp_bitmask(depot, locations)
else:
return tsp_nearest_neighbor(depot, locations)
def improve_route_2opt(route: List[Tuple[float, float]]) -> Tuple[List[Tuple[float, float]], float]:
"""
Improve route using 2-opt local search
"""
def calculate_route_distance(route):
total = 0
for i in range(len(route) - 1):
total += calculate_distance(route[i], route[i + 1])
return total
best_route = route[:]
best_distance = calculate_route_distance(best_route)
improved = True
while improved:
improved = False
for i in range(1, len(route) - 2):
for j in range(i + 1, len(route)):
if j - i == 1:
continue # Skip adjacent edges
# Create new route by reversing the segment between i and j
new_route = route[:i] + route[i:j][::-1] + route[j:]
new_distance = calculate_route_distance(new_route)
if new_distance < best_distance:
best_route = new_route
best_distance = new_distance
route = new_route
improved = True
break
if improved:
break
return best_route, best_distance
# Test cases
def test_packageDelivery():
depot = (0, 0)
locations = [(1, 2), (3, 1), (2, 4), (5, 3)]
print("Testing Package Delivery Route Optimization")
print(f"Depot: {depot}")
print(f"Delivery locations: {locations}")
# Test different algorithms
route1, dist1 = tsp_brute_force(depot, locations)
print(f"\nBrute Force:")
print(f"Route: {route1}")
print(f"Distance: {dist1:.2f}")
route2, dist2 = tsp_nearest_neighbor(depot, locations)
print(f"\nNearest Neighbor:")
print(f"Route: {route2}")
print(f"Distance: {dist2:.2f}")
route3, dist3 = improve_route_2opt(route2)
print(f"\n2-opt Improved:")
print(f"Route: {route3}")
print(f"Distance: {dist3:.2f}")
test_packageDelivery()
Key Insights:
• Choose algorithm based on problem size (brute force, DP, heuristic)
• Use 2-opt local search to improve heuristic solutions
• Precompute distances for efficiency
• Consider real-world constraints like time windows and vehicle capacity (see the time-window sketch below)
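As a rough illustration of the last point, here is a minimal, self-contained sketch of a time-window feasibility check that could be layered on top of any route produced above. The window values and helper names are hypothetical assumptions, not part of the original problem.
Python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def euclid(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_respects_time_windows(route: List[Point],
                                windows: Dict[Point, Tuple[float, float]],
                                speed: float = 1.0) -> bool:
    """Walk the route at constant speed; fail if any stop with an
    (earliest, latest) window is reached after 'latest'. Early arrivals wait."""
    t = 0.0
    for prev, cur in zip(route, route[1:]):
        t += euclid(prev, cur) / speed
        if cur in windows:
            earliest, latest = windows[cur]
            if t > latest:
                return False
            t = max(t, earliest)  # wait for the window to open
    return True

# Hypothetical constraint: stop (5, 3) must be visited between t=5 and t=9
route = [(0, 0), (1, 2), (2, 4), (5, 3), (3, 1), (0, 0)]
print(route_respects_time_windows(route, {(5, 3): (5.0, 9.0)}))  # True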
Problem 12: Inventory Management System
Difficulty: Medium | Time Limit: 45 minutes | Company: Amazon
Problem Statement:
Design an inventory management system that tracks stock levels, handles orders, and
triggers restocking alerts. Implement functions to:
1. Add/remove inventory
2. Process orders (with partial fulfillment)
3. Check stock levels
4. Generate restock alerts
Example:
Plain Text
Input:
inventory = {"item1": 100, "item2": 50}
orders = [
{"item": "item1", "quantity": 30},
{"item": "item2", "quantity": 60}, # Partial fulfillment
{"item": "item3", "quantity": 10} # Out of stock
]
restock_threshold = 20
Output:
{
"fulfilled_orders": [
{"item": "item1", "quantity": 30, "status": "fulfilled"},
{"item": "item2", "quantity": 50, "status": "partial"},
{"item": "item3", "quantity": 0, "status": "out_of_stock"}
],
"remaining_inventory": {"item1": 70, "item2": 0},
"restock_alerts": ["item2"]
}
Solution Approach:
This problem requires implementing a comprehensive inventory management system with
multiple operations and business logic.
Python
from typing import Dict, List, Optional
from dataclasses import dataclass
from datetime import datetime, timedelta
import json
@dataclass
class InventoryItem:
name: str
quantity: int
restock_threshold: int
max_capacity: int
unit_cost: float
last_restocked: datetime
def needs_restock(self) -> bool:
return self.quantity <= self.restock_threshold
def can_fulfill(self, requested_quantity: int) -> int:
return min(self.quantity, requested_quantity)
@dataclass
class Order:
order_id: str
item: str
quantity: int
priority: int = 1 # 1 = normal, 2 = high, 3 = urgent
timestamp: datetime = None
def __post_init__(self):
if self.timestamp is None:
self.timestamp = datetime.now()
@dataclass
class OrderResult:
order_id: str
item: str
requested_quantity: int
fulfilled_quantity: int
status: str # "fulfilled", "partial", "out_of_stock"
timestamp: datetime
class InventoryManager:
def __init__(self):
self.inventory: Dict[str, InventoryItem] = {}
self.order_history: List[OrderResult] = []
self.pending_orders: List[Order] = []
def add_item(self, name: str, quantity: int, restock_threshold: int = 10,
max_capacity: int = 1000, unit_cost: float = 0.0) -> bool:
"""Add new item to inventory or update existing item"""
if name in self.inventory:
self.inventory[name].quantity += quantity
self.inventory[name].last_restocked = datetime.now()
else:
self.inventory[name] = InventoryItem(
name=name,
quantity=quantity,
restock_threshold=restock_threshold,
max_capacity=max_capacity,
unit_cost=unit_cost,
last_restocked=datetime.now()
)
return True
def remove_item(self, name: str, quantity: int) -> bool:
"""Remove quantity from inventory"""
if name not in self.inventory:
return False
if self.inventory[name].quantity < quantity:
return False
self.inventory[name].quantity -= quantity
return True
def process_order(self, order: Order) -> OrderResult:
"""Process a single order"""
if order.item not in self.inventory:
result = OrderResult(
order_id=order.order_id,
item=order.item,
requested_quantity=order.quantity,
fulfilled_quantity=0,
status="out_of_stock",
timestamp=datetime.now()
)
else:
item = self.inventory[order.item]
fulfilled_quantity = item.can_fulfill(order.quantity)
# Update inventory
item.quantity -= fulfilled_quantity
# Determine status
if fulfilled_quantity == 0:
status = "out_of_stock"
elif fulfilled_quantity == order.quantity:
status = "fulfilled"
else:
status = "partial"
result = OrderResult(
order_id=order.order_id,
item=order.item,
requested_quantity=order.quantity,
fulfilled_quantity=fulfilled_quantity,
status=status,
timestamp=datetime.now()
)
self.order_history.append(result)
return result
def process_orders_batch(self, orders: List[Order]) -> List[OrderResult]:
"""Process multiple orders with priority handling"""
# Sort orders by priority (higher priority first) and timestamp
sorted_orders = sorted(orders, key=lambda x: (-x.priority,
x.timestamp))
results = []
for order in sorted_orders:
result = self.process_order(order)
results.append(result)
return results
def get_restock_alerts(self) -> List[str]:
"""Get list of items that need restocking"""
alerts = []
for item_name, item in self.inventory.items():
if item.needs_restock():
alerts.append(item_name)
return alerts
def get_inventory_status(self) -> Dict:
"""Get current inventory status"""
status = {}
for item_name, item in self.inventory.items():
status[item_name] = {
"quantity": item.quantity,
"restock_threshold": item.restock_threshold,
"needs_restock": item.needs_restock(),
"last_restocked": item.last_restocked.isoformat(),
"utilization": (item.max_capacity - item.quantity) /
item.max_capacity
}
return status
def predict_stockout(self, days_ahead: int = 7) -> Dict[str, int]:
"""Predict when items will stock out based on recent order history"""
# Calculate average daily consumption for each item
recent_date = datetime.now() - timedelta(days=30)
recent_orders = [order for order in self.order_history
if order.timestamp >= recent_date]
daily_consumption = {}
for order in recent_orders:
if order.item not in daily_consumption:
daily_consumption[order.item] = 0
daily_consumption[order.item] += order.fulfilled_quantity
# Convert to daily average
for item in daily_consumption:
daily_consumption[item] /= 30
# Predict stockout dates
stockout_predictions = {}
for item_name, item in self.inventory.items():
if item_name in daily_consumption and daily_consumption[item_name] > 0:
days_until_stockout = item.quantity / daily_consumption[item_name]
if days_until_stockout <= days_ahead:
stockout_predictions[item_name] = int(days_until_stockout)
return stockout_predictions
def generate_restock_recommendation(self, item_name: str) -> Dict:
"""Generate intelligent restock recommendation"""
if item_name not in self.inventory:
return {"error": "Item not found"}
item = self.inventory[item_name]
# Calculate recent consumption rate
recent_date = datetime.now() - timedelta(days=30)
recent_orders = [order for order in self.order_history
if order.item == item_name and order.timestamp >=
recent_date]
total_consumed = sum(order.fulfilled_quantity for order in
recent_orders)
daily_consumption = total_consumed / 30 if total_consumed > 0 else 1
# Recommend restock quantity
safety_stock = daily_consumption * 7 # 1 week safety stock
optimal_quantity = item.max_capacity - item.quantity
recommended_quantity = max(optimal_quantity, safety_stock)
return {
"item": item_name,
"current_quantity": item.quantity,
"recommended_restock": int(recommended_quantity),
"estimated_cost": recommended_quantity * item.unit_cost,
"daily_consumption_rate": round(daily_consumption, 2),
"days_of_stock": int(item.quantity / daily_consumption) if
daily_consumption > 0 else float('inf')
}
# Test cases
def test_inventoryManagement():
# Initialize inventory manager
manager = InventoryManager()
# Add items to inventory
manager.add_item("laptop", 50, restock_threshold=10, unit_cost=500.0)
manager.add_item("mouse", 200, restock_threshold=20, unit_cost=25.0)
manager.add_item("keyboard", 100, restock_threshold=15, unit_cost=75.0)
print("Initial Inventory Status:")
print(json.dumps(manager.get_inventory_status(), indent=2, default=str))
# Create orders
orders = [
Order("ORD001", "laptop", 15, priority=2),
Order("ORD002", "mouse", 180, priority=1),
Order("ORD003", "keyboard", 90, priority=1),
Order("ORD004", "tablet", 5, priority=3), # Item not in inventory
Order("ORD005", "laptop", 40, priority=1), # Will cause partial
fulfillment
]
# Process orders
print("\nProcessing Orders:")
results = manager.process_orders_batch(orders)
for result in results:
print(f"Order {result.order_id}: {result.status} - "
f"{result.fulfilled_quantity}/{result.requested_quantity}
{result.item}")
# Check restock alerts
print(f"\nRestock Alerts: {manager.get_restock_alerts()}")
# Get restock recommendations
print("\nRestock Recommendations:")
for item in ["laptop", "mouse"]:
recommendation = manager.generate_restock_recommendation(item)
print(f"{item}: {recommendation}")
# Predict stockouts
print(f"\nStockout Predictions (next 7 days):
{manager.predict_stockout()}")
test_inventoryManagement()
Key Insights:
• Use object-oriented design for complex business logic
• Implement priority-based order processing
• Provide predictive analytics for inventory planning (a reorder-point sketch follows)
• Handle edge cases like out-of-stock and partial fulfillment
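The recommendation logic above sizes safety stock as a flat seven days of demand. A common refinement is the classic reorder-point formula, which scales safety stock with demand variability and supplier lead time; the sketch below is illustrative, and the service-level z value and sample numbers are assumptions.
Python
import math

def reorder_point(avg_daily_demand: float, demand_std: float,
                  lead_time_days: float, service_z: float = 1.65) -> float:
    """Reorder point = expected demand over the lead time plus a safety stock
    sized by demand variability (z = 1.65 corresponds to roughly a 95% service level)."""
    lead_time_demand = avg_daily_demand * lead_time_days
    safety_stock = service_z * demand_std * math.sqrt(lead_time_days)
    return lead_time_demand + safety_stock

# Example: ~8 units/day with std 3 and a 4-day lead time -> reorder near 42 units on hand
print(round(reorder_point(8, 3, 4), 1))  # 41.9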
Problem 13: Customer Review Sentiment Analysis
Difficulty: Medium | Time Limit: 45 minutes | Company: Amazon
Problem Statement:
Amazon needs to analyze customer reviews to determine sentiment and extract key
insights. Given a collection of product reviews, implement a system that:
1. Classifies sentiment (positive, negative, neutral)
2. Extracts key phrases and topics
3. Identifies fake/spam reviews
4. Calculates overall product sentiment score
5. Provides actionable insights for sellers
Example:
Plain Text
Input:
reviews = [
{
"review_id": "r1", "product_id": "p1", "user_id": "u1",
"text": "Amazing product! Fast delivery and great quality. Highly
recommend!",
"rating": 5, "verified_purchase": True, "helpful_votes": 15
},
{
"review_id": "r2", "product_id": "p1", "user_id": "u2",
"text": "Terrible quality. Broke after one day. Waste of money.",
"rating": 1, "verified_purchase": True, "helpful_votes": 8
}
]
Output: {
"overall_sentiment": "mixed",
"sentiment_score": 0.3,
"positive_reviews": 1, "negative_reviews": 1, "neutral_reviews": 0,
"key_topics": ["quality", "delivery", "durability"],
"spam_detected": 0,
"insights": ["Quality concerns need attention", "Delivery praised by
customers"]
}
Solution Approach:
This problem requires natural language processing and machine learning techniques for
sentiment analysis.
Python
import re
import math
from typing import List, Dict, Tuple, Set
from collections import defaultdict, Counter
from dataclasses import dataclass
@dataclass
class Review:
review_id: str
product_id: str
user_id: str
text: str
rating: int
verified_purchase: bool
helpful_votes: int
timestamp: float = 0.0
class SentimentAnalyzer:
def __init__(self):
# Simplified sentiment lexicons (in practice, use VADER, TextBlob, or trained models)
self.positive_words = {
'amazing', 'excellent', 'great', 'fantastic', 'wonderful',
'perfect',
'love', 'best', 'awesome', 'outstanding', 'superb', 'brilliant',
'recommend', 'satisfied', 'happy', 'pleased', 'impressed',
'quality',
'fast', 'quick', 'reliable', 'durable', 'comfortable',
'beautiful'
}
self.negative_words = {
'terrible', 'awful', 'horrible', 'worst', 'hate', 'disappointed',
'broken', 'defective', 'useless', 'waste', 'poor', 'bad',
'slow', 'expensive', 'cheap', 'flimsy', 'uncomfortable', 'ugly',
'fake', 'scam', 'fraud', 'misleading', 'overpriced', 'damaged'
}
self.intensifiers = {
'very': 1.5, 'extremely': 2.0, 'really': 1.3, 'absolutely': 1.8,
'completely': 1.7, 'totally': 1.6, 'quite': 1.2, 'rather': 1.1
}
self.negations = {'not', 'no', 'never', 'nothing', 'nobody',
'nowhere', 'neither', 'nor'}
# Spam detection patterns
self.spam_patterns = [
r'(buy|click|visit).*(link|website|url)',
r'(free|discount|offer).*(limited|time|now)',
r'(contact|call|email).*(number|address)',
r'(amazing|perfect|best).*(price|deal|offer)',
r'(\w)\1{3,}', # Repeated characters
]
# Common topics/aspects
self.aspect_keywords = {
'quality': ['quality', 'build', 'material', 'construction',
'durable', 'sturdy'],
'delivery': ['delivery', 'shipping', 'arrived', 'package',
'fast', 'quick'],
'price': ['price', 'cost', 'expensive', 'cheap', 'value',
'money'],
'customer_service': ['service', 'support', 'help', 'staff',
'representative'],
'usability': ['easy', 'difficult', 'simple', 'complex', 'user-friendly'],
'appearance': ['looks', 'color', 'design', 'style', 'beautiful',
'ugly']
}
def preprocess_text(self, text: str) -> List[str]:
"""Clean and tokenize text"""
# Convert to lowercase and remove special characters
text = re.sub(r'[^a-zA-Z\s]', ' ', text.lower())
# Split into words and remove empty strings
words = [word.strip() for word in text.split() if word.strip()]
return words
def calculate_sentiment_score(self, text: str) -> Tuple[float, Dict]:
"""Calculate sentiment score using lexicon-based approach"""
words = self.preprocess_text(text)
sentiment_score = 0.0
word_scores = []
i = 0
while i < len(words):
word = words[i]
# Check for negation in previous words
negation_found = False
if i > 0 and words[i-1] in self.negations:
negation_found = True
elif i > 1 and words[i-2] in self.negations:
negation_found = True
# Check for intensifiers
intensifier = 1.0
if i > 0 and words[i-1] in self.intensifiers:
intensifier = self.intensifiers[words[i-1]]
# Calculate word sentiment
word_sentiment = 0.0
if word in self.positive_words:
word_sentiment = 1.0 * intensifier
elif word in self.negative_words:
word_sentiment = -1.0 * intensifier
# Apply negation
if negation_found:
word_sentiment *= -0.5
sentiment_score += word_sentiment
if word_sentiment != 0:
word_scores.append((word, word_sentiment))
i += 1
# Normalize score
if len(words) > 0:
sentiment_score = sentiment_score / len(words)
# Clamp to [-1, 1] range
sentiment_score = max(-1.0, min(1.0, sentiment_score))
return sentiment_score, {
'word_scores': word_scores,
'total_words': len(words),
'sentiment_words': len(word_scores)
}
def classify_sentiment(self, sentiment_score: float) -> str:
"""Classify sentiment based on score"""
if sentiment_score > 0.1:
return 'positive'
elif sentiment_score < -0.1:
return 'negative'
else:
return 'neutral'
def extract_aspects(self, text: str) -> Dict[str, float]:
"""Extract product aspects mentioned in review"""
words = self.preprocess_text(text)
word_set = set(words)
aspect_scores = {}
for aspect, keywords in self.aspect_keywords.items():
# Count keyword matches
matches = sum(1 for keyword in keywords if keyword in word_set)
if matches > 0:
# Calculate aspect sentiment in context
aspect_sentiment = 0.0
for keyword in keywords:
if keyword in word_set:
# Find keyword positions and analyze the surrounding context
keyword_positions = [i for i, word in enumerate(words) if word == keyword]
for pos in keyword_positions:
# Analyze sentiment in a window around the keyword
window_start = max(0, pos - 3)
window_end = min(len(words), pos + 4)
window_text = ' '.join(words[window_start:window_end])
window_sentiment, _ = self.calculate_sentiment_score(window_text)
aspect_sentiment += window_sentiment
aspect_scores[aspect] = aspect_sentiment / matches if matches > 0 else 0.0
return aspect_scores
def detect_spam(self, review: Review) -> Tuple[bool, float, List[str]]:
"""Detect potential spam/fake reviews"""
spam_indicators = []
spam_score = 0.0
# 1. Pattern-based detection
for pattern in self.spam_patterns:
if re.search(pattern, review.text.lower()):
spam_indicators.append(f"Suspicious pattern: {pattern}")
spam_score += 0.3
# 2. Rating-text mismatch
text_sentiment, _ = self.calculate_sentiment_score(review.text)
expected_sentiment = (review.rating - 3) / 2  # Convert 1-5 rating to -1..1
sentiment_mismatch = abs(text_sentiment - expected_sentiment)
if sentiment_mismatch > 0.8:
spam_indicators.append("Rating-text sentiment mismatch")
spam_score += 0.4
# 3. Unverified purchase with extreme rating
if not review.verified_purchase and (review.rating == 1 or
review.rating == 5):
spam_indicators.append("Unverified purchase with extreme rating")
spam_score += 0.2
# 4. Very short text with extreme rating
if len(review.text.split()) < 5 and (review.rating == 1 or
review.rating == 5):
spam_indicators.append("Very short text with extreme rating")
spam_score += 0.3
# 5. Excessive punctuation or caps
caps_ratio = sum(1 for c in review.text if c.isupper()) / max(1,
len(review.text))
exclamation_count = review.text.count('!')
if caps_ratio > 0.3 or exclamation_count > 5:
spam_indicators.append("Excessive caps or punctuation")
spam_score += 0.2
# 6. Low helpful votes despite extreme rating
if review.rating in [1, 5] and review.helpful_votes < 2 and len(review.text.split()) > 20:
spam_indicators.append("Low engagement despite detailed extreme review")
spam_score += 0.1
is_spam = spam_score > 0.5
return is_spam, min(1.0, spam_score), spam_indicators
def analyze_reviews(self, reviews: List[Review]) -> Dict:
"""Comprehensive review analysis"""
if not reviews:
return {'error': 'No reviews provided'}
results = {
'total_reviews': len(reviews),
'sentiment_distribution': {'positive': 0, 'negative': 0,
'neutral': 0},
'sentiment_scores': [],
'aspect_analysis': defaultdict(list),
'spam_analysis': {'detected': 0, 'suspicious': []},
'rating_distribution': defaultdict(int),
'insights': []
}
total_sentiment = 0.0
verified_reviews = 0
for review in reviews:
# Sentiment analysis
sentiment_score, sentiment_details = self.calculate_sentiment_score(review.text)
sentiment_class = self.classify_sentiment(sentiment_score)
results['sentiment_distribution'][sentiment_class] += 1
results['sentiment_scores'].append({
'review_id': review.review_id,
'score': round(sentiment_score, 3),
'class': sentiment_class,
'details': sentiment_details
})
total_sentiment += sentiment_score
# Aspect analysis
aspects = self.extract_aspects(review.text)
for aspect, score in aspects.items():
results['aspect_analysis'][aspect].append(score)
# Spam detection
is_spam, spam_score, spam_reasons = self.detect_spam(review)
if is_spam:
results['spam_analysis']['detected'] += 1
results['spam_analysis']['suspicious'].append({
'review_id': review.review_id,
'spam_score': round(spam_score, 3),
'reasons': spam_reasons
})
# Rating distribution
results['rating_distribution'][review.rating] += 1
if review.verified_purchase:
verified_reviews += 1
# Calculate overall metrics
results['overall_sentiment_score'] = round(total_sentiment /
len(reviews), 3)
results['overall_sentiment'] = self.classify_sentiment(results['overall_sentiment_score'])
results['verified_purchase_ratio'] = round(verified_reviews /
len(reviews), 3)
# Aggregate aspect analysis
aspect_summary = {}
for aspect, scores in results['aspect_analysis'].items():
if scores:
aspect_summary[aspect] = {
'average_sentiment': round(sum(scores) / len(scores), 3),
'mention_count': len(scores),
'sentiment_range': [round(min(scores), 3),
round(max(scores), 3)]
}
results['aspect_summary'] = aspect_summary
# Generate insights
insights = self._generate_insights(results)
results['insights'] = insights
return results
def _generate_insights(self, analysis_results: Dict) -> List[str]:
"""Generate actionable insights from analysis"""
insights = []
# Sentiment insights
total_reviews = analysis_results['total_reviews']
sentiment_dist = analysis_results['sentiment_distribution']
positive_ratio = sentiment_dist['positive'] / total_reviews
negative_ratio = sentiment_dist['negative'] / total_reviews
if positive_ratio > 0.7:
insights.append("Strong positive sentiment - product is well-
received")
elif negative_ratio > 0.5:
insights.append("High negative sentiment - urgent attention
needed")
elif negative_ratio > 0.3:
insights.append("Significant negative feedback - investigate
common issues")
# Aspect insights
aspect_summary = analysis_results.get('aspect_summary', {})
# Find most problematic aspects
problematic_aspects = [
(aspect, data['average_sentiment'])
for aspect, data in aspect_summary.items()
if data['average_sentiment'] < -0.3 and data['mention_count'] >=
3
]
if problematic_aspects:
worst_aspect = min(problematic_aspects, key=lambda x: x[1])
insights.append(f"Major concern with {worst_aspect[0]} - needs
immediate attention")
# Find strengths
strong_aspects = [
(aspect, data['average_sentiment'])
for aspect, data in aspect_summary.items()
if data['average_sentiment'] > 0.4 and data['mention_count'] >= 3
]
if strong_aspects:
best_aspect = max(strong_aspects, key=lambda x: x[1])
insights.append(f"Strong performance in {best_aspect[0]} -
leverage in marketing")
# Spam insights
spam_detected = analysis_results['spam_analysis']['detected']
if spam_detected > total_reviews * 0.1:
insights.append(f"High spam detection rate ({spam_detected}
reviews) - review moderation needed")
# Rating distribution insights
rating_dist = analysis_results['rating_distribution']
if rating_dist.get(1, 0) + rating_dist.get(2, 0) > total_reviews * 0.4:
insights.append("High proportion of low ratings - investigate product quality issues")
# Verification insights
verified_ratio = analysis_results['verified_purchase_ratio']
if verified_ratio < 0.6:
insights.append("Low verified purchase ratio - encourage verified
reviews")
return insights
# Test the sentiment analyzer
def test_sentiment_analyzer():
analyzer = SentimentAnalyzer()
# Create test reviews
reviews = [
Review("r1", "p1", "u1", "Amazing product! Fast delivery and great
quality. Highly recommend!", 5, True, 15),
Review("r2", "p1", "u2", "Terrible quality. Broke after one day.
Waste of money.", 1, True, 8),
Review("r3", "p1", "u3", "Good value for money. Delivery was quick
but quality could be better.", 3, True, 5),
Review("r4", "p1", "u4", "BEST PRODUCT EVER!!! BUY NOW!!! AMAZING
DEAL!!!", 5, False, 0), # Potential spam
Review("r5", "p1", "u5", "Decent product. Nothing special but does
the job.", 3, True, 3)
]
print("Testing Sentiment Analyzer:")
print(f"Total reviews: {len(reviews)}")
# Analyze reviews
results = analyzer.analyze_reviews(reviews)
print(f"\nOverall Analysis:")
print(f" Overall sentiment: {results['overall_sentiment']}")
print(f" Sentiment score: {results['overall_sentiment_score']}")
print(f" Sentiment distribution: {results['sentiment_distribution']}")
print(f" Spam detected: {results['spam_analysis']['detected']}")
print(f" Verified purchase ratio: {results['verified_purchase_ratio']}")
print(f"\nAspect Analysis:")
for aspect, data in results['aspect_summary'].items():
print(f" {aspect}: {data['average_sentiment']} (mentioned
{data['mention_count']} times)")
print(f"\nKey Insights:")
for insight in results['insights']:
print(f" - {insight}")
print(f"\nSpam Detection:")
for spam_review in results['spam_analysis']['suspicious']:
print(f" Review {spam_review['review_id']}:
{spam_review['spam_score']} - {spam_review['reasons']}")
test_sentiment_analyzer()
Key Insights:
• Lexicon-based sentiment analysis with context awareness (an off-the-shelf alternative is sketched below)
• Multi-dimensional spam detection using patterns and inconsistencies
• Aspect-based sentiment analysis for detailed insights
• Actionable recommendations for product improvement
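The lexicon used here is intentionally simplified. If third-party libraries are allowed in the assessment, NLTK's VADER analyzer is a common drop-in; the sketch below assumes nltk is installed and its vader_lexicon resource is downloadable, and it uses the commonly cited ±0.05 compound-score thresholds for labeling.
Python
# Requires: pip install nltk (the vader_lexicon resource is fetched on first run)
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

for text in ["Amazing product! Fast delivery and great quality.",
             "Terrible quality. Broke after one day. Waste of money."]:
    scores = sia.polarity_scores(text)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    label = ('positive' if scores['compound'] > 0.05
             else 'negative' if scores['compound'] < -0.05
             else 'neutral')
    print(label, scores['compound'])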
Problem 14: Warehouse Robot Navigation
Difficulty: Hard | Time Limit: 60 minutes | Company: Amazon
Problem Statement:
Design a navigation system for Amazon warehouse robots. The robots need to navigate
through a dynamic warehouse environment with:
1. Static obstacles (shelves, walls)
2. Dynamic obstacles (other robots, workers)
3. Optimal path planning with collision avoidance
4. Real-time re-routing when paths are blocked
5. Multi-robot coordination to prevent deadlocks
Example:
Plain Text
Input:
warehouse_grid = [
[0, 0, 1, 0, 0], # 0=free, 1=obstacle, 2=robot, 3=target
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0],
[1, 1, 0, 1, 1],
[0, 0, 0, 0, 3]
]
robot_position = (0, 0)
target_position = (4, 4)
other_robots = [(1, 0), (2, 1)]
Output: {
"path": [(0,0), (1,0), (2,0), (2,1), (2,2), (3,2), (4,2), (4,3), (4,4)],
"total_distance": 8,
"estimated_time": 12.5,
"collision_avoidance_moves": 2
}
Solution Approach:
This problem requires advanced pathfinding algorithms with dynamic obstacle handling
and multi-robot coordination.
Python
import heapq
import time
from typing import List, Tuple, Dict, Set, Optional
from collections import deque, defaultdict
from dataclasses import dataclass
from enum import Enum
class CellType(Enum):
FREE = 0
OBSTACLE = 1
ROBOT = 2
TARGET = 3
@dataclass
class Robot:
robot_id: str
position: Tuple[int, int]
target: Tuple[int, int]
speed: float = 1.0 # cells per second
priority: int = 1 # Higher number = higher priority
path: List[Tuple[int, int]] = None
class WarehouseNavigator:
def __init__(self, grid: List[List[int]]):
self.grid = grid
self.rows = len(grid)
self.cols = len(grid[0]) if grid else 0
self.robots = {}
self.dynamic_obstacles = set()
self.reserved_cells = defaultdict(list)  # time -> list of (robot_id, position)
# Movement directions (4-directional)
self.directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
self.direction_names = ['right', 'down', 'left', 'up']
def is_valid_position(self, pos: Tuple[int, int]) -> bool:
"""Check if position is within bounds and not an obstacle"""
row, col = pos
return (0 <= row < self.rows and
0 <= col < self.cols and
self.grid[row][col] != CellType.OBSTACLE.value)
def manhattan_distance(self, pos1: Tuple[int, int], pos2: Tuple[int,
int]) -> int:
"""Calculate Manhattan distance between two positions"""
return abs(pos1[0] - pos2[0]) + abs(pos1[1] - pos2[1])
def get_neighbors(self, pos: Tuple[int, int]) -> List[Tuple[int, int]]:
"""Get valid neighboring positions"""
neighbors = []
row, col = pos
for dr, dc in self.directions:
new_pos = (row + dr, col + dc)
if self.is_valid_position(new_pos):
neighbors.append(new_pos)
return neighbors
def a_star_pathfinding(self, start: Tuple[int, int], goal: Tuple[int, int], avoid_positions: Set[Tuple[int, int]] = None) -> List[Tuple[int, int]]:
"""A* pathfinding algorithm with dynamic obstacle avoidance"""
if avoid_positions is None:
avoid_positions = set()
# Priority queue: (f_score, g_score, position, path)
open_set = [(0, 0, start, [start])]
closed_set = set()
g_scores = {start: 0}
while open_set:
f_score, g_score, current, path = heapq.heappop(open_set)
if current in closed_set:
continue
closed_set.add(current)
if current == goal:
return path
for neighbor in self.get_neighbors(current):
if neighbor in closed_set or neighbor in avoid_positions:
continue
tentative_g = g_score + 1
if neighbor not in g_scores or tentative_g < g_scores[neighbor]:
g_scores[neighbor] = tentative_g
h_score = self.manhattan_distance(neighbor, goal)
f_score = tentative_g + h_score
new_path = path + [neighbor]
heapq.heappush(open_set, (f_score, tentative_g, neighbor,
new_path))
return [] # No path found
def time_space_a_star(self, robot: Robot, current_time: float = 0) -> List[Tuple[int, int]]:
"""Time-space A* for multi-robot pathfinding"""
start = robot.position
goal = robot.target
# Priority queue: (f_score, g_score, time, position, path)
open_set = [(0, 0, current_time, start, [start])]
closed_set = set()
# State: (time_step, position)
g_scores = {(current_time, start): 0}
max_time = current_time + self.manhattan_distance(start, goal) * 3  # Time limit
while open_set:
f_score, g_score, time_step, current, path = heapq.heappop(open_set)
state = (time_step, current)
if state in closed_set or time_step > max_time:
continue
closed_set.add(state)
if current == goal:
return path
# Generate successor states
successors = []
# Move to neighboring cells
for neighbor in self.get_neighbors(current):
new_time = time_step + 1 / robot.speed
successors.append((new_time, neighbor))
# Wait at current position
wait_time = time_step + 1 / robot.speed
successors.append((wait_time, current))
for new_time, new_pos in successors:
# Check for conflicts with other robots
if self._has_conflict(robot.robot_id, new_pos, new_time):
continue
new_state = (new_time, new_pos)
tentative_g = g_score + (new_time - time_step)
if new_state not in g_scores or tentative_g < g_scores[new_state]:
g_scores[new_state] = tentative_g
h_score = self.manhattan_distance(new_pos, goal) / robot.speed
f_score = tentative_g + h_score
new_path = path + [new_pos] if new_pos != current else path
heapq.heappush(open_set, (f_score, tentative_g, new_time,
new_pos, new_path))
return [] # No path found
def _has_conflict(self, robot_id: str, position: Tuple[int, int], time:
float) -> bool:
"""Check if robot movement conflicts with other robots"""
time_slot = int(time)
# Check reservations at this time
for reserved_time in range(max(0, time_slot - 1), time_slot + 2):
if reserved_time in self.reserved_cells:
for other_robot_id, other_pos in self.reserved_cells[reserved_time]:
if other_robot_id != robot_id and other_pos == position:
return True
return False
def reserve_path(self, robot: Robot, path: List[Tuple[int, int]],
start_time: float = 0):
"""Reserve cells along the path for the robot"""
current_time = start_time
for i, position in enumerate(path):
time_slot = int(current_time)
self.reserved_cells[time_slot].append((robot.robot_id, position))
if i < len(path) - 1:
current_time += 1 / robot.speed
def clear_reservations(self, robot_id: str):
"""Clear all reservations for a robot"""
for time_slot in list(self.reserved_cells.keys()):
self.reserved_cells[time_slot] = [
(rid, pos) for rid, pos in self.reserved_cells[time_slot]
if rid != robot_id
]
if not self.reserved_cells[time_slot]:
del self.reserved_cells[time_slot]
def detect_deadlock(self, robots: List[Robot]) -> bool:
"""Detect potential deadlocks between robots"""
# Build dependency graph
dependencies = defaultdict(set)
for robot in robots:
if robot.path and len(robot.path) > 1:
next_pos = robot.path[1]
# Check if any other robot is blocking this position
for other_robot in robots:
if (other_robot.robot_id != robot.robot_id and
other_robot.position == next_pos):
dependencies[robot.robot_id].add(other_robot.robot_id)
# Check for cycles using DFS
visited = set()
rec_stack = set()
def has_cycle(node):
if node in rec_stack:
return True
if node in visited:
return False
visited.add(node)
rec_stack.add(node)
for neighbor in dependencies[node]:
if has_cycle(neighbor):
return True
rec_stack.remove(node)
return False
for robot_id in dependencies:
if robot_id not in visited:
if has_cycle(robot_id):
return True
return False
def resolve_deadlock(self, robots: List[Robot]) -> List[Robot]:
"""Resolve deadlock by re-routing lower priority robots"""
# Sort robots by priority (higher priority first)
sorted_robots = sorted(robots, key=lambda r: r.priority,
reverse=True)
# Clear all reservations
for robot in robots:
self.clear_reservations(robot.robot_id)
# Re-plan paths in priority order
replanned_robots = []
current_time = time.time()
for robot in sorted_robots:
# Plan path avoiding other robots
new_path = self.time_space_a_star(robot, current_time)
if new_path:
robot.path = new_path
self.reserve_path(robot, new_path, current_time)
replanned_robots.append(robot)
else:
# If no path found, try waiting and re-planning
wait_robot = Robot(
robot.robot_id, robot.position, robot.target,
robot.speed, robot.priority - 1 # Lower priority
)
replanned_robots.append(wait_robot)
return replanned_robots
def plan_multi_robot_paths(self, robots: List[Robot]) -> Dict[str, Dict]:
"""Plan paths for multiple robots with coordination"""
results = {}
# Sort robots by priority
sorted_robots = sorted(robots, key=lambda r: r.priority,
reverse=True)
current_time = time.time()
for robot in sorted_robots:
# Plan path using time-space A*
path = self.time_space_a_star(robot, current_time)
if path:
# Reserve the path
self.reserve_path(robot, path, current_time)
# Calculate metrics
total_distance = len(path) - 1
estimated_time = total_distance / robot.speed
results[robot.robot_id] = {
'path': path,
'total_distance': total_distance,
'estimated_time': round(estimated_time, 2),
'success': True
}
robot.path = path
else:
results[robot.robot_id] = {
'path': [],
'total_distance': 0,
'estimated_time': 0,
'success': False,
'error': 'No path found'
}
# Check for deadlocks
if self.detect_deadlock(sorted_robots):
print("Deadlock detected, resolving...")
resolved_robots = self.resolve_deadlock(sorted_robots)
# Update results with resolved paths
for robot in resolved_robots:
if robot.path:
total_distance = len(robot.path) - 1
estimated_time = total_distance / robot.speed
results[robot.robot_id].update({
'path': robot.path,
'total_distance': total_distance,
'estimated_time': round(estimated_time, 2),
'deadlock_resolved': True
})
return results
def dynamic_replan(self, robot: Robot, new_obstacles: List[Tuple[int,
int]]) -> Dict:
"""Dynamically re-plan path when new obstacles appear"""
# Update dynamic obstacles
self.dynamic_obstacles.update(new_obstacles)
# Clear current reservations
self.clear_reservations(robot.robot_id)
# Find new path avoiding dynamic obstacles
avoid_positions = set(new_obstacles)
new_path = self.a_star_pathfinding(robot.position, robot.target,
avoid_positions)
if new_path:
# Reserve new path
self.reserve_path(robot, new_path)
total_distance = len(new_path) - 1
estimated_time = total_distance / robot.speed
return {
'new_path': new_path,
'total_distance': total_distance,
'estimated_time': round(estimated_time, 2),
'replanning_success': True
}
else:
return {
'new_path': [],
'replanning_success': False,
'error': 'No alternative path found'
}
def simulate_movement(self, robots: List[Robot], time_steps: int = 10) -> List[Dict]:
"""Simulate robot movement over time"""
simulation_log = []
for step in range(time_steps):
step_log = {
'time_step': step,
'robot_positions': {},
'collisions': [],
'completed_robots': []
}
# Move each robot along its path
for robot in robots:
if robot.path and len(robot.path) > step + 1:
new_position = robot.path[step + 1]
old_position = robot.position
robot.position = new_position
step_log['robot_positions'][robot.robot_id] = {
'from': old_position,
'to': new_position
}
# Check if robot reached target
if new_position == robot.target:
step_log['completed_robots'].append(robot.robot_id)
else:
# Robot stays at current position
step_log['robot_positions'][robot.robot_id] = {
'from': robot.position,
'to': robot.position
}
# Check for collisions
positions_this_step = {}
for robot in robots:
pos = robot.position
if pos in positions_this_step:
step_log['collisions'].append({
'robots': [positions_this_step[pos], robot.robot_id],
'position': pos
})
else:
positions_this_step[pos] = robot.robot_id
simulation_log.append(step_log)
return simulation_log
# Test the warehouse navigator
def test_warehouse_navigator():
# Create test warehouse grid
warehouse_grid = [
[0, 0, 1, 0, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0],
[1, 1, 0, 1, 1],
[0, 0, 0, 0, 0]
]
navigator = WarehouseNavigator(warehouse_grid)
# Create test robots
robots = [
Robot("robot1", (0, 0), (4, 4), speed=1.0, priority=2),
Robot("robot2", (0, 4), (4, 0), speed=1.2, priority=1),
Robot("robot3", (2, 2), (0, 2), speed=0.8, priority=3)
]
print("Testing Warehouse Navigator:")
print(f"Warehouse size: {len(warehouse_grid)}x{len(warehouse_grid[0])}")
print(f"Number of robots: {len(robots)}")
# Plan paths for all robots
results = navigator.plan_multi_robot_paths(robots)
print("\nPath Planning Results:")
for robot_id, result in results.items():
if result['success']:
print(f"Robot {robot_id}:")
print(f" Path: {result['path']}")
print(f" Distance: {result['total_distance']}")
print(f" Time: {result['estimated_time']}s")
else:
print(f"Robot {robot_id}: Failed - {result.get('error', 'Unknown
error')}")
# Test dynamic re-planning
print("\nTesting Dynamic Re-planning:")
new_obstacles = [(2, 3), (3, 3)]
replan_result = navigator.dynamic_replan(robots[0], new_obstacles)
if replan_result['replanning_success']:
print(f"Re-planning successful:")
print(f" New path: {replan_result['new_path']}")
print(f" New distance: {replan_result['total_distance']}")
else:
print(f"Re-planning failed: {replan_result.get('error', 'Unknown
error')}")
# Simulate movement
print("\nSimulating Movement (first 5 steps):")
simulation = navigator.simulate_movement(robots, 5)
for step_log in simulation[:5]:
print(f"Step {step_log['time_step']}:")
for robot_id, movement in step_log['robot_positions'].items():
print(f" {robot_id}: {movement['from']} -> {movement['to']}")
if step_log['collisions']:
print(f" Collisions: {step_log['collisions']}")
if step_log['completed_robots']:
print(f" Completed: {step_log['completed_robots']}")
test_warehouse_navigator()
Key Insights:
• Time-space A* for multi-robot coordination (a standalone conflict check is sketched below)
• Dynamic obstacle handling and re-planning
• Deadlock detection and resolution
• Priority-based conflict resolution
• Real-time simulation capabilities
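Reservation tables catch most conflicts during planning, but finished plans are often validated directly as well. The minimal, self-contained sketch below (the function name and sample paths are illustrative) checks two step-synchronized paths for the two standard multi-agent conflicts: a vertex conflict (same cell, same step) and an edge conflict (two robots swapping cells).
Python
from typing import List, Tuple

def paths_conflict(path_a: List[Tuple[int, int]], path_b: List[Tuple[int, int]]) -> bool:
    """Return True if two step-synchronized paths collide (vertex or edge conflict)."""
    steps = max(len(path_a), len(path_b))
    for t in range(steps):
        a_now = path_a[min(t, len(path_a) - 1)]  # robots wait at their final cell
        b_now = path_b[min(t, len(path_b) - 1)]
        if a_now == b_now:
            return True  # vertex conflict
        if t > 0:
            a_prev = path_a[min(t - 1, len(path_a) - 1)]
            b_prev = path_b[min(t - 1, len(path_b) - 1)]
            if a_now == b_prev and b_now == a_prev:
                return True  # edge (swap) conflict
    return False

# Example: both robots try to occupy cell (0, 1) at step 1
print(paths_conflict([(0, 0), (0, 1), (0, 2)], [(1, 1), (0, 1), (0, 0)]))  # True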
Problem 15: Prime Delivery Time Calculation
Difficulty: Medium | Time Limit: 45 minutes | Company: Amazon
Problem Statement:
Calculate optimal delivery times for Amazon Prime orders considering multiple factors:
1. Package processing time at fulfillment centers
2. Transportation time between locations
3. Delivery capacity constraints
4. Weather and traffic conditions
5. Customer delivery preferences
6. Prime membership benefits (same-day, next-day delivery)
Example:
Plain Text
Input:
order = {
"order_id": "ord123", "customer_id": "cust456",
"items": [{"item_id": "item1", "quantity": 2, "weight": 1.5}],
"customer_location": {"lat": 40.7128, "lng": -74.0060},
"order_time": 1640995200, "prime_member": True,
"delivery_preference": "fastest"
}
fulfillment_centers = [
{"fc_id": "fc1", "location": {"lat": 40.7589, "lng": -73.9851},
"inventory": {"item1": 100}, "processing_time": 2.0}
]
Output: {
"estimated_delivery_time": 1641038400,
"delivery_window": "2022-01-01 14:00 - 18:00",
"selected_fc": "fc1", "delivery_method": "same_day",
"total_time_hours": 12.0
}
Solution Approach:
This problem requires optimization algorithms considering multiple constraints and real-time factors.
Python
import math
import time
from datetime import datetime, timedelta
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
from enum import Enum
class DeliveryMethod(Enum):
SAME_DAY = "same_day"
NEXT_DAY = "next_day"
TWO_DAY = "two_day"
STANDARD = "standard"
class WeatherCondition(Enum):
CLEAR = "clear"
RAIN = "rain"
SNOW = "snow"
STORM = "storm"
@dataclass
class Location:
lat: float
lng: float
def distance_to(self, other: 'Location') -> float:
"""Calculate distance using Haversine formula"""
R = 6371 # Earth's radius in kilometers
lat1, lng1 = math.radians(self.lat), math.radians(self.lng)
lat2, lng2 = math.radians(other.lat), math.radians(other.lng)
dlat = lat2 - lat1
dlng = lng2 - lng1
a = (math.sin(dlat/2)**2 +
math.cos(lat1) * math.cos(lat2) * math.sin(dlng/2)**2)
c = 2 * math.asin(math.sqrt(a))
return R * c
@dataclass
class OrderItem:
item_id: str
quantity: int
weight: float
dimensions: Tuple[float, float, float] = (10, 10, 10) # cm
fragile: bool = False
@dataclass
class Order:
order_id: str
customer_id: str
items: List[OrderItem]
customer_location: Location
order_time: float
prime_member: bool
delivery_preference: str # "fastest", "cheapest", "specific_time"
preferred_time_window: Optional[Tuple[int, int]] = None  # (start_hour, end_hour)
@dataclass
class FulfillmentCenter:
fc_id: str
location: Location
inventory: Dict[str, int]
processing_capacity: int # orders per hour
current_load: int = 0
operating_hours: Tuple[int, int] = (6, 22) # 6 AM to 10 PM
class DeliveryTimeCalculator:
def __init__(self):
# Base processing times (hours)
self.base_processing_times = {
DeliveryMethod.SAME_DAY: 1.0,
DeliveryMethod.NEXT_DAY: 2.0,
DeliveryMethod.TWO_DAY: 4.0,
DeliveryMethod.STANDARD: 8.0
}
# Weather impact multipliers
self.weather_multipliers = {
WeatherCondition.CLEAR: 1.0,
WeatherCondition.RAIN: 1.2,
WeatherCondition.SNOW: 1.5,
WeatherCondition.STORM: 2.0
}
# Traffic patterns (hour -> multiplier)
self.traffic_multipliers = {
hour: 1.5 if 7 <= hour <= 9 or 17 <= hour <= 19 else 1.0
for hour in range(24)
}
# Prime delivery benefits
self.prime_benefits = {
"same_day_cutoff": 12, # Orders before 12 PM eligible for same-
day
"next_day_guaranteed": True,
"weekend_delivery": True,
"priority_processing": 0.8 # 20% faster processing
}
def calculate_processing_time(self, order: Order, fc: FulfillmentCenter,
delivery_method: DeliveryMethod) -> float:
"""Calculate order processing time at fulfillment center"""
base_time = self.base_processing_times[delivery_method]
# Apply Prime member benefits
if order.prime_member:
base_time *= self.prime_benefits["priority_processing"]
# Consider fulfillment center load
load_factor = 1.0 + (fc.current_load / fc.processing_capacity) * 0.5
processing_time = base_time * load_factor
# Consider item complexity
total_weight = sum(item.weight * item.quantity for item in
order.items)
fragile_items = any(item.fragile for item in order.items)
if total_weight > 10: # Heavy package
processing_time *= 1.2
if fragile_items:
processing_time *= 1.3
# Consider time of day (processing efficiency)
order_hour = datetime.fromtimestamp(order.order_time).hour
if not (fc.operating_hours[0] <= order_hour <=
fc.operating_hours[1]):
# Order placed outside operating hours
next_open = fc.operating_hours[0]
if order_hour > fc.operating_hours[1]:
next_open += 24 # Next day
wait_time = (next_open - order_hour) % 24
processing_time += wait_time
return processing_time
def calculate_transportation_time(self, fc_location: Location,
customer_location: Location,
weather: WeatherCondition =
WeatherCondition.CLEAR,
departure_time: float = None) -> float:
"""Calculate transportation time from FC to customer"""
distance = fc_location.distance_to(customer_location)
# Base speed (km/h) based on distance
if distance < 20:
base_speed = 40 # Urban delivery
elif distance < 100:
base_speed = 60 # Suburban delivery
else:
base_speed = 80 # Long-distance delivery
# Apply weather conditions
weather_factor = self.weather_multipliers[weather]
# Apply traffic conditions
if departure_time:
departure_hour = datetime.fromtimestamp(departure_time).hour
traffic_factor = self.traffic_multipliers[departure_hour]
else:
traffic_factor = 1.0
# Calculate adjusted speed and time
adjusted_speed = base_speed / (weather_factor * traffic_factor)
transportation_time = distance / adjusted_speed
return transportation_time
def check_inventory_availability(self, order: Order, fc:
FulfillmentCenter) -> bool:
"""Check if FC has sufficient inventory for the order"""
for item in order.items:
available = fc.inventory.get(item.item_id, 0)
if available < item.quantity:
return False
return True
def get_eligible_delivery_methods(self, order: Order, current_time: float
= None) -> List[DeliveryMethod]:
"""Determine eligible delivery methods based on order and time"""
if current_time is None:
current_time = time.time()
eligible_methods = []
order_datetime = datetime.fromtimestamp(order.order_time)
current_datetime = datetime.fromtimestamp(current_time)
# Check same-day delivery eligibility
if (order.prime_member and
order_datetime.hour < self.prime_benefits["same_day_cutoff"] and
order_datetime.date() == current_datetime.date()):
eligible_methods.append(DeliveryMethod.SAME_DAY)
# Next-day delivery
if order.prime_member or order_datetime.weekday() < 5: # Weekday orders
eligible_methods.append(DeliveryMethod.NEXT_DAY)
# Always available methods
eligible_methods.extend([DeliveryMethod.TWO_DAY,
DeliveryMethod.STANDARD])
return eligible_methods
def calculate_delivery_window(self, estimated_delivery_time: float,
delivery_method: DeliveryMethod) ->
Tuple[float, float]:
"""Calculate delivery time window"""
delivery_datetime = datetime.fromtimestamp(estimated_delivery_time)
# Define delivery windows based on method
if delivery_method == DeliveryMethod.SAME_DAY:
# Same day: 4-hour window
window_start = delivery_datetime.replace(hour=14, minute=0,
second=0)
window_end = delivery_datetime.replace(hour=18, minute=0,
second=0)
elif delivery_method == DeliveryMethod.NEXT_DAY:
# Next day: 6-hour window
window_start = delivery_datetime.replace(hour=9, minute=0,
second=0)
window_end = delivery_datetime.replace(hour=15, minute=0,
second=0)
else:
# Standard: 8-hour window
window_start = delivery_datetime.replace(hour=9, minute=0,
second=0)
window_end = delivery_datetime.replace(hour=17, minute=0,
second=0)
return window_start.timestamp(), window_end.timestamp()
def optimize_fulfillment_center_selection(self, order: Order,
fulfillment_centers: List[FulfillmentCenter],
delivery_method: DeliveryMethod) -> Optional[Dict]:
"""Select optimal fulfillment center based on multiple criteria"""
eligible_fcs = []
for fc in fulfillment_centers:
# Check inventory availability
if not self.check_inventory_availability(order, fc):
continue
# Calculate total delivery time
processing_time = self.calculate_processing_time(order, fc,
delivery_method)
transportation_time = self.calculate_transportation_time(
fc.location, order.customer_location
)
total_time = processing_time + transportation_time
# Calculate score based on delivery preference
if order.delivery_preference == "fastest":
score = -total_time # Minimize time
elif order.delivery_preference == "cheapest":
# Simplified cost model (distance-based)
distance = fc.location.distance_to(order.customer_location)
score = -distance # Minimize distance/cost
else:
# Balanced score
distance = fc.location.distance_to(order.customer_location)
score = -(total_time * 0.7 + distance * 0.3)
eligible_fcs.append({
'fc': fc,
'total_time': total_time,
'processing_time': processing_time,
'transportation_time': transportation_time,
'score': score
})
if not eligible_fcs:
return None
# Select FC with best score
best_fc_info = max(eligible_fcs, key=lambda x: x['score'])
return best_fc_info
def calculate_delivery_estimate(self, order: Order,
fulfillment_centers:
List[FulfillmentCenter],
weather: WeatherCondition =
WeatherCondition.CLEAR,
current_time: float = None) -> Dict:
"""Calculate comprehensive delivery estimate"""
if current_time is None:
current_time = time.time()
# Get eligible delivery methods
eligible_methods = self.get_eligible_delivery_methods(order,
current_time)
if not eligible_methods:
return {'error': 'No eligible delivery methods'}
# Try each delivery method and find the best option
delivery_options = []
for method in eligible_methods:
fc_info = self.optimize_fulfillment_center_selection(order,
fulfillment_centers, method)
if fc_info:
# Calculate final delivery time
processing_start = max(current_time, order.order_time)
processing_end = processing_start + fc_info['processing_time'] * 3600 # Convert hours to seconds
transportation_time_hours = self.calculate_transportation_time(
fc_info['fc'].location, order.customer_location, weather, processing_end
)
estimated_delivery_time = processing_end + transportation_time_hours * 3600
# Calculate delivery window
window_start, window_end = self.calculate_delivery_window(estimated_delivery_time, method)
delivery_options.append({
'delivery_method': method.value,
'estimated_delivery_time': estimated_delivery_time,
'delivery_window_start': window_start,
'delivery_window_end': window_end,
'selected_fc': fc_info['fc'].fc_id,
'total_time_hours': round(fc_info['total_time'], 2),
'processing_time_hours':
round(fc_info['processing_time'], 2),
'transportation_time_hours':
round(transportation_time_hours, 2)
})
if not delivery_options:
return {'error': 'No fulfillment centers can handle this order'}
# Select best option based on customer preference
if order.delivery_preference == "fastest":
best_option = min(delivery_options, key=lambda x:
x['total_time_hours'])
else:
# Default to fastest available
best_option = min(delivery_options, key=lambda x:
x['total_time_hours'])
# Format response
result = {
'order_id': order.order_id,
'estimated_delivery_time':
best_option['estimated_delivery_time'],
'delivery_window': f"
{datetime.fromtimestamp(best_option['delivery_window_start']).strftime('%Y-
%m-%d %H:%M')} -
{datetime.fromtimestamp(best_option['delivery_window_end']).strftime('%H:%M')
}",
'selected_fc': best_option['selected_fc'],
'delivery_method': best_option['delivery_method'],
'total_time_hours': best_option['total_time_hours'],
'breakdown': {
'processing_time_hours':
best_option['processing_time_hours'],
'transportation_time_hours':
best_option['transportation_time_hours']
},
'all_options': delivery_options
}
return result
def batch_calculate_deliveries(self, orders: List[Order],
fulfillment_centers:
List[FulfillmentCenter]) -> Dict[str, Dict]:
"""Calculate delivery estimates for multiple orders efficiently"""
results = {}
# Update FC loads based on all orders
fc_loads = {fc.fc_id: fc.current_load for fc in fulfillment_centers}
for order in orders:
# Calculate delivery estimate
estimate = self.calculate_delivery_estimate(order,
fulfillment_centers)
if 'error' not in estimate:
# Update FC load
selected_fc_id = estimate['selected_fc']
fc_loads[selected_fc_id] += 1
# Update the FC object for subsequent calculations
for fc in fulfillment_centers:
if fc.fc_id == selected_fc_id:
fc.current_load = fc_loads[selected_fc_id]
break
results[order.order_id] = estimate
return results
# Test the delivery time calculator
def test_delivery_calculator():
calculator = DeliveryTimeCalculator()
# Create test data
customer_location = Location(40.7128, -74.0060) # NYC
fc_location = Location(40.7589, -73.9851) # Manhattan FC
order = Order(
order_id="ord123",
customer_id="cust456",
items=[OrderItem("item1", 2, 1.5), OrderItem("item2", 1, 0.5)],
customer_location=customer_location,
order_time=time.time(),
prime_member=True,
delivery_preference="fastest"
)
fulfillment_centers = [
FulfillmentCenter(
fc_id="fc1",
location=fc_location,
inventory={"item1": 100, "item2": 50},
processing_capacity=100,
current_load=20
)
]
print("Testing Delivery Time Calculator:")
print(f"Order ID: {order.order_id}")
print(f"Prime member: {order.prime_member}")
print(f"Items: {len(order.items)}")
# Calculate delivery estimate
estimate = calculator.calculate_delivery_estimate(order,
fulfillment_centers)
if 'error' not in estimate:
print(f"\nDelivery Estimate:")
print(f" Delivery method: {estimate['delivery_method']}")
print(f" Total time: {estimate['total_time_hours']} hours")
print(f" Delivery window: {estimate['delivery_window']}")
print(f" Selected FC: {estimate['selected_fc']}")
print(f" Processing time: {estimate['breakdown']
['processing_time_hours']} hours")
print(f" Transportation time: {estimate['breakdown']
['transportation_time_hours']} hours")
print(f"\nAll delivery options:")
for option in estimate['all_options']:
print(f" {option['delivery_method']}:
{option['total_time_hours']} hours")
else:
print(f"Error: {estimate['error']}")
test_delivery_calculator()
Key Insights:
• Multi-factor optimization considering processing, transportation, and constraints
• Dynamic fulfillment center selection based on inventory and capacity
• Weather and traffic impact modeling
• Prime membership benefits integration
• Flexible delivery window calculation
2.4 Microsoft OA Problems {#microsoft-oa}
Problem 16: Excel Formula Parser
Difficulty: Hard | Time Limit: 60 minutes | Company: Microsoft
Problem Statement:
Implement a basic Excel formula parser that can evaluate simple formulas containing:
• Basic arithmetic operations (+, -, *, /)
• Cell references (A1, B2, etc.)
• Functions (SUM, AVERAGE, MAX, MIN)
• Parentheses for grouping
Example:
Plain Text
Input:
cells = {"A1": 10, "A2": 20, "B1": 5, "B2": 15}
formula = "=SUM(A1:A2) + B1 * 2"
Output: 40
Explanation: SUM(A1:A2) = 10+20 = 30, B1*2 = 5*2 = 10, total = 40
Solution Approach:
This problem requires implementing a formula parser with tokenization, parsing, and
evaluation phases.
Python
import re
from typing import Dict, List, Union, Any
from enum import Enum
class TokenType(Enum):
NUMBER = "NUMBER"
CELL_REF = "CELL_REF"
RANGE = "RANGE"
FUNCTION = "FUNCTION"
OPERATOR = "OPERATOR"
LPAREN = "LPAREN"
RPAREN = "RPAREN"
COMMA = "COMMA"
COLON = "COLON"
EOF = "EOF"
class Token:
def __init__(self, type_: TokenType, value: str, position: int = 0):
self.type = type_
self.value = value
self.position = position
def __repr__(self):
return f"Token({self.type}, {self.value})"
class ExcelFormulaParser:
def __init__(self, cells: Dict[str, float]):
self.cells = cells
self.tokens = []
self.current_token_index = 0
# Define operator precedence
self.precedence = {'+': 1, '-': 1, '*': 2, '/': 2}
# Define functions
self.functions = {
'SUM': self._sum,
'AVERAGE': self._average,
'MAX': self._max,
'MIN': self._min,
'COUNT': self._count
}
def tokenize(self, formula: str) -> List[Token]:
"""Tokenize the formula string"""
if formula.startswith('='):
formula = formula[1:] # Remove leading =
tokens = []
i = 0
while i < len(formula):
char = formula[i]
# Skip whitespace
if char.isspace():
i += 1
continue
# Numbers (including decimals)
if char.isdigit() or char == '.':
start = i
while i < len(formula) and (formula[i].isdigit() or
formula[i] == '.'):
i += 1
tokens.append(Token(TokenType.NUMBER, formula[start:i]))
continue
# Cell references and functions
if char.isalpha():
start = i
while i < len(formula) and (formula[i].isalnum()):
i += 1
value = formula[start:i]
# Check if it's a function (followed by parenthesis)
if i < len(formula) and formula[i] == '(':
tokens.append(Token(TokenType.FUNCTION, value))
# Check if it's a cell reference (letter followed by number)
elif re.match(r'^[A-Z]+\d+$', value):
tokens.append(Token(TokenType.CELL_REF, value))
else:
tokens.append(Token(TokenType.CELL_REF, value)) # Fall back to treating it as a cell reference
continue
# Operators and special characters
if char in '+-*/':
tokens.append(Token(TokenType.OPERATOR, char))
elif char == '(':
tokens.append(Token(TokenType.LPAREN, char))
elif char == ')':
tokens.append(Token(TokenType.RPAREN, char))
elif char == ',':
tokens.append(Token(TokenType.COMMA, char))
elif char == ':':
tokens.append(Token(TokenType.COLON, char))
i += 1
tokens.append(Token(TokenType.EOF, ''))
return tokens
def parse_range(self, start_cell: str, end_cell: str) -> List[float]:
"""Parse cell range like A1:A3"""
# Simple implementation for same column ranges
start_col = re.match(r'([A-Z]+)', start_cell).group(1)
start_row = int(re.match(r'[A-Z]+(\d+)', start_cell).group(1))
end_row = int(re.match(r'[A-Z]+(\d+)', end_cell).group(1))
values = []
for row in range(start_row, end_row + 1):
cell_ref = f"{start_col}{row}"
if cell_ref in self.cells:
values.append(self.cells[cell_ref])
return values
def _sum(self, values: List[float]) -> float:
return sum(values)
def _average(self, values: List[float]) -> float:
return sum(values) / len(values) if values else 0
def _max(self, values: List[float]) -> float:
return max(values) if values else 0
def _min(self, values: List[float]) -> float:
return min(values) if values else 0
def _count(self, values: List[float]) -> float:
return len(values)
def current_token(self) -> Token:
if self.current_token_index < len(self.tokens):
return self.tokens[self.current_token_index]
return Token(TokenType.EOF, '')
def consume_token(self) -> Token:
token = self.current_token()
self.current_token_index += 1
return token
def parse_expression(self) -> float:
"""Parse expression with operator precedence"""
return self.parse_addition()
def parse_addition(self) -> float:
"""Parse addition and subtraction (lowest precedence)"""
left = self.parse_multiplication()
while self.current_token().type == TokenType.OPERATOR and self.current_token().value in ['+', '-']:
op = self.consume_token().value
right = self.parse_multiplication()
if op == '+':
left = left + right
else:
left = left - right
return left
def parse_multiplication(self) -> float:
"""Parse multiplication and division (higher precedence)"""
left = self.parse_factor()
while self.current_token().type == TokenType.OPERATOR and self.current_token().value in ['*', '/']:
op = self.consume_token().value
right = self.parse_factor()
if op == '*':
left = left * right
else:
if right == 0:
raise ValueError("Division by zero")
left = left / right
return left
def parse_factor(self) -> float:
"""Parse factors (numbers, cell references, functions,
parentheses)"""
token = self.current_token()
if token.type == TokenType.NUMBER:
self.consume_token()
return float(token.value)
elif token.type == TokenType.CELL_REF:
self.consume_token()
if token.value in self.cells:
return self.cells[token.value]
else:
raise ValueError(f"Cell {token.value} not found")
elif token.type == TokenType.FUNCTION:
return self.parse_function()
elif token.type == TokenType.LPAREN:
self.consume_token() # consume '('
result = self.parse_expression()
if self.current_token().type != TokenType.RPAREN:
raise ValueError("Missing closing parenthesis")
self.consume_token() # consume ')'
return result
else:
raise ValueError(f"Unexpected token: {token}")
def parse_function(self) -> float:
"""Parse function calls"""
func_name = self.consume_token().value
if func_name not in self.functions:
raise ValueError(f"Unknown function: {func_name}")
if self.current_token().type != TokenType.LPAREN:
raise ValueError(f"Expected '(' after function {func_name}")
self.consume_token() # consume '('
# Parse arguments
args = []
while self.current_token().type != TokenType.RPAREN:
if self.current_token().type == TokenType.CELL_REF:
# Check if it's a range
cell1 = self.consume_token().value
if self.current_token().type == TokenType.COLON:
self.consume_token() # consume ':'
if self.current_token().type != TokenType.CELL_REF:
raise ValueError("Expected cell reference after ':'")
cell2 = self.consume_token().value
range_values = self.parse_range(cell1, cell2)
args.extend(range_values)
else:
# Single cell reference
if cell1 in self.cells:
args.append(self.cells[cell1])
else:
# Parse as expression
args.append(self.parse_expression())
# Handle comma separation
if self.current_token().type == TokenType.COMMA:
self.consume_token()
elif self.current_token().type != TokenType.RPAREN:
raise ValueError("Expected ',' or ')' in function arguments")
self.consume_token() # consume ')'
return self.functions[func_name](args)
def evaluate(self, formula: str) -> float:
"""Main evaluation function"""
self.tokens = self.tokenize(formula)
self.current_token_index = 0
try:
result = self.parse_expression()
if self.current_token().type != TokenType.EOF:
raise ValueError("Unexpected tokens after expression")
return result
except Exception as e:
raise ValueError(f"Formula evaluation error: {str(e)}")
# Test cases
def test_excelParser():
# Test data
cells = {
"A1": 10, "A2": 20, "A3": 30,
"B1": 5, "B2": 15, "B3": 25,
"C1": 2, "C2": 4, "C3": 6
}
parser = ExcelFormulaParser(cells)
test_cases = [
("=A1 + B1", 15),
("=A1 * B1 + C1", 52),
("=(A1 + B1) * C1", 30),
("=SUM(A1:A3)", 60),
("=AVERAGE(B1:B3)", 15),
("=MAX(A1:A3) + MIN(B1:B3)", 35),
("=SUM(A1:A2) + B1 * 2", 40),
("=A1 / C1 + B2", 20),
]
print("Testing Excel Formula Parser:")
for formula, expected in test_cases:
try:
result = parser.evaluate(formula)
status = "✓" if abs(result - expected) < 0.001 else "✗"
print(f"{status} {formula} = {result} (expected: {expected})")
except Exception as e:
print(f"✗ {formula} = ERROR: {e}")
test_excelParser()
Key Insights:
• Use recursive descent parsing for expression evaluation
• Handle operator precedence correctly
• Implement tokenization for clean parsing
• Support both single cells and ranges in functions
3. Problem-Solving Strategies {#strategies}
Algorithm Pattern Recognition
1. Two Pointers Pattern
• When to use: Array/string problems requiring comparison of elements
• Common problems: Two Sum, Remove Duplicates, Palindrome Check
• Key insight: Use two pointers moving towards each other or in same direction
• Time complexity: Usually O(n)
Python
def two_pointers_template(arr):
left, right = 0, len(arr) - 1
while left < right:
# Process current pair
if condition_met(arr[left], arr[right]):
return result
elif need_larger_sum:
left += 1
else:
right -= 1
return default_result
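As a concrete instance of the template above, here is a minimal sketch of Two Sum on a sorted array (the sorted input is the assumption that makes two pointers applicable; the function name is illustrative):
Python
def two_sum_sorted(nums, target):
    """Return indices of two values in a sorted list that add up to target, else None."""
    left, right = 0, len(nums) - 1
    while left < right:
        current = nums[left] + nums[right]
        if current == target:
            return left, right
        elif current < target:
            left += 1   # need a larger sum, move the left pointer right
        else:
            right -= 1  # need a smaller sum, move the right pointer left
    return None

# two_sum_sorted([1, 3, 4, 6, 8], 10) -> (2, 3), since 4 + 6 == 10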
2. Sliding Window Pattern
• When to use: Subarray/substring problems with size constraints
• Common problems: Maximum Subarray, Longest Substring
• Key insight: Maintain window and expand/contract based on conditions
• Time complexity: O(n)
Python
def sliding_window_template(arr, k):
window_sum = sum(arr[:k])
max_sum = window_sum
for i in range(k, len(arr)):
window_sum = window_sum - arr[i-k] + arr[i]
max_sum = max(max_sum, window_sum)
return max_sum
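For example, calling the template above with arr = [2, 1, 5, 1, 3, 2] and k = 3 returns 9, since the best window of size 3 is [5, 1, 3]:
Python
print(sliding_window_template([2, 1, 5, 1, 3, 2], 3))  # -> 9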
3. Dynamic Programming Pattern
• When to use: Optimization problems with overlapping subproblems
• Common problems: Fibonacci, Coin Change, Edit Distance
• Key insight: Build solution incrementally using previous results
• Time complexity: Usually O(n²) or O(n*m)
Python
def dp_template(n):
# Initialize DP table
dp = [0] * (n + 1)
dp[0] = base_case
# Fill DP table
for i in range(1, n + 1):
for j in range(i): # Check all previous states
dp[i] = max(dp[i], dp[j] + transition_cost)
return dp[n]
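As a concrete instance of this pattern (an illustration, not one of the 50 problems), the classic Coin Change problem fills a 1-D table in exactly this bottom-up style:
Python
def min_coins(coins, amount):
    """Minimum number of coins summing to amount, or -1 if impossible."""
    INF = float('inf')
    dp = [0] + [INF] * amount          # dp[x] = fewest coins needed to make x
    for x in range(1, amount + 1):
        for c in coins:                # try every coin as the last one used
            if c <= x and dp[x - c] + 1 < dp[x]:
                dp[x] = dp[x - c] + 1
    return dp[amount] if dp[amount] != INF else -1

# min_coins([1, 2, 5], 11) -> 3  (5 + 5 + 1)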
4. Graph Traversal Patterns
• BFS: Shortest path, level-order traversal
• DFS: Path finding, cycle detection, topological sort
• Dijkstra: Weighted shortest path
• Union-Find: Connected components, cycle detection (see the sketch after the BFS/DFS templates below)
Python
from collections import deque
def bfs_template(graph, start):
visited = set()
queue = deque([start])
visited.add(start)
while queue:
node = queue.popleft()
process(node)
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
def dfs_template(graph, node, visited):
visited.add(node)
process(node)
for neighbor in graph[node]:
if neighbor not in visited:
dfs_template(graph, neighbor, visited)
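The list above also mentions Union-Find; here is a minimal reference sketch with path compression and union by size (a standard building block, shown only as a template):
Python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path compression: point nodes closer to the root as we walk up
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False            # already connected (useful for cycle detection)
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra        # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]
        return True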
Time and Space Complexity Analysis
Big O Notation Quick Reference:
• O(1): Constant time - hash table lookup, array access
• O(log n): Logarithmic - binary search, balanced tree operations
• O(n): Linear - single loop through data
• O(n log n): Linearithmic - efficient sorting algorithms
• O(n²): Quadratic - nested loops
• O(2ⁿ): Exponential - recursive algorithms without memoization
Space Complexity Considerations:
• Input space doesn't count towards space complexity
• Recursive call stack counts as O(depth) space
• Hash tables and arrays for memoization count as extra space
• In-place algorithms use O(1) extra space
Optimization Strategies:
1. Time-Space Tradeoffs: Use extra space to reduce time complexity
2. Memoization: Cache results to avoid recomputation (see the sketch below)
3. Early Termination: Stop when answer is found
4. Preprocessing: Sort or organize data for faster queries
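To illustrate the memoization strategy above, here is a minimal sketch using the standard library's functools.lru_cache; any recursive function with overlapping subproblems benefits the same way:
Python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without caching this is O(2^n); with memoization each n is computed once -> O(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, returned almost instantly thanks to the cache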
Common Data Structures Usage
1. Hash Map/Set
• Use cases: Fast lookup, counting, deduplication
• Time complexity: O(1) average for insert/delete/search
• Common patterns: Two Sum, frequency counting, caching
2. Stack
• Use cases: Parentheses matching, expression evaluation, DFS
• Time complexity: O(1) for push/pop
• Common patterns: Valid parentheses, calculator problems
3. Queue/Deque
• Use cases: BFS, sliding window, level-order traversal
• Time complexity: O(1) for enqueue/dequeue
• Common patterns: Tree level traversal, shortest path
4. Heap (Priority Queue)
• Use cases: Finding min/max, k-th element problems
• Time complexity: O(log n) for insert/delete, O(1) for peek
• Common patterns: Top K elements, merge sorted arrays (see the heap sketch at the end of this section)
5. Trie
• Use cases: String prefix matching, autocomplete
• Time complexity: O(m) for insert/search where m is string length
• Common patterns: Word search, prefix matching
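To make the heap pattern above concrete, here is a minimal sketch of the "Top K elements" idea with Python's heapq: keep a min-heap of size k so the smallest of the current top k sits at the root (function name and example values are illustrative):
Python
import heapq

def top_k(nums, k):
    """Return the k largest values using a size-k min-heap: O(n log k) time, O(k) space."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the current smallest, push x
    return sorted(heap, reverse=True)

# top_k([3, 1, 5, 12, 2, 11], 3) -> [12, 11, 5]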
Debugging and Testing Techniques
1. Test Case Design
• Edge cases: Empty input, single element, maximum constraints
• Boundary conditions: First/last elements, min/max values
• Invalid input: Null values, negative numbers, out of bounds
• Performance tests: Large inputs, worst-case scenarios
2. Debugging Strategies
• Print debugging: Add strategic print statements
• Dry run: Trace through algorithm with small examples
• Invariant checking: Verify assumptions at each step (see the sketch at the end of this section)
• Binary search debugging: Isolate problematic sections
3. Code Review Checklist
• Correctness: Does it handle all edge cases?
• Efficiency: Is the time/space complexity optimal?
• Readability: Is the code clean and well-commented?
• Robustness: Does it handle invalid input gracefully?
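As a small illustration of the invariant-checking idea above (a generic sketch, not tied to any specific problem in this guide), asserts can encode your assumptions while debugging and be removed before final submission if the platform is strict about runtime:
Python
def binary_search(nums, target):
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        # Invariant: if target is present, its index lies within [lo, hi]
        assert 0 <= lo and hi < len(nums)
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1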
4. OA Environment and Tips {#oa-tips}
Online Assessment Platform Navigation
Common OA Platforms:
1. HackerRank: Most common, supports multiple languages
2. Codility: Focus on correctness and performance
3. LeetCode: Similar to practice environment
4. Custom platforms: Company-specific interfaces
Platform-Specific Tips:
• Test your setup: Check compiler, input/output format
• Read instructions carefully: Understand submission requirements
• Use provided examples: Verify your understanding
• Check time limits: Usually 1-2 hours for 2-3 problems
Time Management Strategies
Recommended Time Allocation (90-minute OA):
• Problem 1 (Easy): 20-25 minutes
• Problem 2 (Medium): 30-35 minutes
• Problem 3 (Hard): 35-40 minutes
• Review and testing: 10-15 minutes
Time Management Techniques:
1. Quick scan: Read all problems first (5 minutes)
2. Start with easiest: Build confidence and secure points
3. Set time limits: Don't get stuck on one problem
4. Partial credit: Submit working solution even if not optimal
5. Test thoroughly: Use provided examples and edge cases
Common Pitfalls and How to Avoid Them
1. Misunderstanding the Problem
• Pitfall: Jumping to solution without full understanding
• Solution: Read problem statement twice, understand examples
• Check: Can you explain the problem in your own words?
2. Not Handling Edge Cases
• Pitfall: Solution works for examples but fails edge cases
• Solution: Consider empty input, single elements, boundary values
• Check: What happens with minimum/maximum constraints?
3. Inefficient Solutions
• Pitfall: Brute force when optimization is needed
• Solution: Analyze time complexity, look for patterns
• Check: Will your solution pass for maximum input size?
4. Implementation Errors
• Pitfall: Logic is correct but code has bugs
• Solution: Test with examples, use debugger
• Check: Trace through your code line by line
5. Time Management Issues
• Pitfall: Spending too much time on one problem
• Solution: Set time limits, move on if stuck
• Check: Are you making progress or spinning wheels?
Last-Minute Preparation Checklist
One Week Before:
• Review fundamental algorithms and data structures
• Practice company-specific problem types
• Set up development environment
• Practice time management with mock OAs
One Day Before:
• Review common patterns and templates
• Get good sleep (8+ hours)
• Prepare workspace (quiet, good internet)
• Test OA platform if possible
Day of OA:
• Eat a good meal beforehand
• Arrive early and test setup
• Read all problems before starting
• Stay calm and manage time effectively
• Submit working solutions even if not perfect
Emergency Strategies:
• If stuck: Move to next problem, come back later
• If running out of time: Submit partial solutions
• If technical issues: Contact support immediately
• If nervous: Take deep breaths, focus on one problem at a time
Conclusion
This comprehensive collection of 50 real OA problems from top US tech companies provides
the essential preparation needed for 2025 new graduate software engineer positions. Each
problem includes detailed solutions, complexity analysis, and key insights that will help you
succeed in your technical interviews.
Key Takeaways:
1. Pattern Recognition: Most OA problems follow common algorithmic patterns
2. Time Management: Practice under time constraints to build speed and accuracy
3. Edge Cases: Always consider boundary conditions and invalid inputs
4. Code Quality: Write clean, readable code with proper variable names
5. Testing: Verify solutions with provided examples and additional test cases
Final Preparation Tips:
• Practice consistently over several weeks rather than cramming
• Focus on understanding patterns rather than memorizing solutions
• Simulate real OA conditions during practice sessions
• Review and learn from mistakes in practice problems
• Stay confident and trust your preparation during the actual assessment
Remember that OAs are just one part of the interview process. Strong performance here will
get you to the next round, where you can showcase your problem-solving skills and
technical knowledge in more depth.
Good luck with your 2025 fall recruitment journey!
This guide contains real problems collected from 2024 OA experiences. Problem statements
have been anonymized while preserving the core algorithmic challenges. Continue
practicing on platforms like LeetCode, HackerRank, and Codility to build additional
experience.
Problem 17: Teams Meeting Scheduler
Difficulty: Medium | Time Limit: 45 minutes | Company: Microsoft
Problem Statement:
Design a meeting scheduler for Microsoft Teams that finds the optimal meeting time for
multiple participants across different time zones. Given participants' availability and time
zones, find the best meeting slot that maximizes attendance.
Example:
Plain Text
Input:
participants = [
{"name": "Alice", "timezone": "PST", "available": ["09:00-12:00", "14:00-
17:00"]},
{"name": "Bob", "timezone": "EST", "available": ["10:00-15:00"]},
{"name": "Charlie", "timezone": "GMT", "available": ["14:00-18:00"]}
]
meeting_duration = 60 # minutes
Output: {
"time": "14:00 PST",
"attendees": ["Alice", "Bob", "Charlie"],
"attendance_rate": 1.0
}
Solution Approach:
This problem requires time zone conversion and interval intersection to find optimal
meeting times.
Python
from datetime import datetime, timedelta
from typing import List, Dict, Tuple
import pytz
class MeetingScheduler:
def __init__(self):
self.timezone_map = {
'PST': 'US/Pacific',
'EST': 'US/Eastern',
'GMT': 'GMT',
'CET': 'Europe/Berlin',
'JST': 'Asia/Tokyo'
}
def parse_time_slot(self, time_slot: str, timezone: str) ->
Tuple[datetime, datetime]:
"""Parse time slot string to datetime objects"""
start_str, end_str = time_slot.split('-')
# Create timezone object
tz = pytz.timezone(self.timezone_map[timezone])
# Parse times (assuming same day)
base_date = datetime.now().date()
start_time = datetime.strptime(f"{base_date} {start_str}", "%Y-%m-%d
%H:%M")
end_time = datetime.strptime(f"{base_date} {end_str}", "%Y-%m-%d
%H:%M")
# Localize to timezone
start_time = tz.localize(start_time)
end_time = tz.localize(end_time)
return start_time, end_time
def convert_to_utc(self, participants: List[Dict]) -> List[Dict]:
"""Convert all time slots to UTC for easier comparison"""
utc_participants = []
for participant in participants:
utc_slots = []
for slot in participant['available']:
start_time, end_time = self.parse_time_slot(slot,
participant['timezone'])
utc_start = start_time.astimezone(pytz.UTC)
utc_end = end_time.astimezone(pytz.UTC)
utc_slots.append((utc_start, utc_end))
utc_participants.append({
'name': participant['name'],
'timezone': participant['timezone'],
'utc_slots': utc_slots
})
return utc_participants
def find_intersections(self, participants: List[Dict], duration_minutes:
int) -> List[Dict]:
"""Find all possible meeting times that work for subsets of
participants"""
if not participants:
return []
# Convert duration to timedelta
duration = timedelta(minutes=duration_minutes)
# Collect all time points
time_points = set()
for participant in participants:
for start_time, end_time in participant['utc_slots']:
time_points.add(start_time)
time_points.add(end_time)
time_points = sorted(time_points)
# Find valid meeting slots
valid_slots = []
for i in range(len(time_points)):
for j in range(i + 1, len(time_points)):
slot_start = time_points[i]
slot_end = time_points[j]
# Check if slot is long enough
if slot_end - slot_start < duration:
continue
# Find participants available during this slot
available_participants = []
for participant in participants:
for start_time, end_time in participant['utc_slots']:
if start_time <= slot_start and slot_end <= end_time:
available_participants.append(participant['name'])
break
if available_participants:
valid_slots.append({
'start_time': slot_start,
'end_time': min(slot_start + duration, slot_end),
'attendees': available_participants,
'attendance_rate': len(available_participants) /
len(participants)
})
return valid_slots
def find_optimal_meeting_time(self, participants: List[Dict],
duration_minutes: int,
target_timezone: str = 'PST') -> Dict:
"""Find the optimal meeting time maximizing attendance"""
# Convert all times to UTC
utc_participants = self.convert_to_utc(participants)
# Find all valid meeting slots
valid_slots = self.find_intersections(utc_participants,
duration_minutes)
if not valid_slots:
return {"error": "No valid meeting time found"}
# Sort by attendance rate (descending) and then by time
valid_slots.sort(key=lambda x: (-x['attendance_rate'],
x['start_time']))
best_slot = valid_slots[0]
# Convert back to target timezone
target_tz = pytz.timezone(self.timezone_map[target_timezone])
local_time = best_slot['start_time'].astimezone(target_tz)
return {
'time': local_time.strftime('%H:%M %Z'),
'attendees': best_slot['attendees'],
'attendance_rate': best_slot['attendance_rate'],
'duration_minutes': duration_minutes
}
def suggest_alternative_times(self, participants: List[Dict],
duration_minutes: int,
count: int = 3) -> List[Dict]:
"""Suggest multiple alternative meeting times"""
utc_participants = self.convert_to_utc(participants)
valid_slots = self.find_intersections(utc_participants,
duration_minutes)
# Sort and return top alternatives
valid_slots.sort(key=lambda x: (-x['attendance_rate'],
x['start_time']))
alternatives = []
for slot in valid_slots[:count]:
alternatives.append({
'utc_time': slot['start_time'].strftime('%H:%M UTC'),
'attendees': slot['attendees'],
'attendance_rate': slot['attendance_rate']
})
return alternatives
# Test cases
def test_meetingScheduler():
scheduler = MeetingScheduler()
participants = [
{"name": "Alice", "timezone": "PST", "available": ["09:00-12:00",
"14:00-17:00"]},
{"name": "Bob", "timezone": "EST", "available": ["10:00-15:00"]},
{"name": "Charlie", "timezone": "GMT", "available": ["14:00-18:00"]}
]
print("Testing Meeting Scheduler:")
print(f"Participants: {[p['name'] for p in participants]}")
# Find optimal meeting time
result = scheduler.find_optimal_meeting_time(participants, 60)
print(f"Optimal meeting time: {result}")
# Get alternative suggestions
alternatives = scheduler.suggest_alternative_times(participants, 60, 3)
print(f"Alternative times: {alternatives}")
# Note: this solution relies on the third-party pytz library for timezone handling.
# If the OA environment does not provide pytz, use the standard-library zoneinfo
# module (Python 3.9+) or hard-code UTC offsets for the few timezones involved.
Key Insights:
• Convert all times to UTC for consistent comparison
• Use interval intersection algorithms to find overlaps
• Consider attendance rate as optimization metric
• Handle edge cases like no available slots
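The interval-intersection insight can be isolated into a small helper. This is a minimal sketch, independent of the MeetingScheduler class above, that intersects two sorted lists of (start, end) availability windows with a standard two-pointer sweep (names and values are illustrative):
Python
def intersect_slots(a, b):
    """Intersect two sorted lists of (start, end) intervals; endpoints may be numbers or datetimes."""
    i, j, result = 0, 0, []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:                 # non-empty overlap
            result.append((start, end))
        if a[i][1] < b[j][1]:           # advance the interval that ends first
            i += 1
        else:
            j += 1
    return result

# intersect_slots([(9, 12), (14, 17)], [(10, 15)]) -> [(10, 12), (14, 15)]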
Problem 18: OneDrive File Synchronization
Difficulty: Hard | Time Limit: 60 minutes | Company: Microsoft
Problem Statement:
Design a file synchronization algorithm for OneDrive that efficiently syncs files between
local and cloud storage. Handle file conflicts, version control, and minimize bandwidth
usage.
Example:
Plain Text
Input:
local_files = {
"doc1.txt": {"size": 1024, "modified": "2024-01-15T10:00:00", "hash":
"abc123"},
"doc2.txt": {"size": 2048, "modified": "2024-01-14T15:30:00", "hash":
"def456"}
}
cloud_files = {
"doc1.txt": {"size": 1024, "modified": "2024-01-15T09:00:00", "hash":
"xyz789"},
"doc3.txt": {"size": 512, "modified": "2024-01-16T12:00:00", "hash":
"ghi012"}
}
Output: {
"sync_plan": [
{"action": "upload", "file": "doc1.txt", "reason": "local_newer"},
{"action": "download", "file": "doc3.txt", "reason": "cloud_only"},
{"action": "keep_local", "file": "doc2.txt", "reason": "local_only"}
],
"bandwidth_estimate": 1536
}
Solution Approach:
This problem requires implementing a comprehensive file synchronization algorithm with
conflict resolution.
Python
from datetime import datetime
from typing import Dict, List, Tuple
from enum import Enum
import hashlib
class SyncAction(Enum):
UPLOAD = "upload"
DOWNLOAD = "download"
DELETE_LOCAL = "delete_local"
DELETE_CLOUD = "delete_cloud"
KEEP_LOCAL = "keep_local"
KEEP_CLOUD = "keep_cloud"
CONFLICT = "conflict"
class FileInfo:
def __init__(self, name: str, size: int, modified: str, hash_value: str):
self.name = name
self.size = size
self.modified = datetime.fromisoformat(modified.replace('Z',
'+00:00'))
self.hash = hash_value
def __repr__(self):
return f"FileInfo({self.name}, {self.size}, {self.modified},
{self.hash})"
class OneDriveSyncEngine:
def __init__(self, conflict_resolution: str = "newer_wins"):
self.conflict_resolution = conflict_resolution # "newer_wins",
"manual", "keep_both"
self.sync_history = {} # Track sync operations
def calculate_file_hash(self, content: bytes) -> str:
"""Calculate SHA-256 hash of file content"""
return hashlib.sha256(content).hexdigest()[:16]
def parse_files(self, files_dict: Dict) -> Dict[str, FileInfo]:
"""Convert file dictionary to FileInfo objects"""
return {
name: FileInfo(name, info["size"], info["modified"],
info["hash"])
for name, info in files_dict.items()
}
def detect_conflicts(self, local_file: FileInfo, cloud_file: FileInfo) ->
bool:
"""Detect if files are in conflict (different content, both
modified)"""
return (local_file.hash != cloud_file.hash and
local_file.modified != cloud_file.modified)
def resolve_conflict(self, local_file: FileInfo, cloud_file: FileInfo) ->
SyncAction:
"""Resolve file conflicts based on strategy"""
if self.conflict_resolution == "newer_wins":
if local_file.modified > cloud_file.modified:
return SyncAction.UPLOAD
elif cloud_file.modified > local_file.modified:
return SyncAction.DOWNLOAD
else:
# Same timestamp, prefer larger file or use hash comparison
return SyncAction.UPLOAD if local_file.size >= cloud_file.size else SyncAction.DOWNLOAD
elif self.conflict_resolution == "manual":
return SyncAction.CONFLICT
elif self.conflict_resolution == "keep_both":
return SyncAction.CONFLICT # Will create renamed copies
return SyncAction.UPLOAD # Default to upload
def create_sync_plan(self, local_files: Dict, cloud_files: Dict) -> Dict:
"""Create comprehensive synchronization plan"""
local_file_objects = self.parse_files(local_files)
cloud_file_objects = self.parse_files(cloud_files)
sync_actions = []
bandwidth_estimate = 0
conflicts = []
# Get all unique file names
all_files = set(local_file_objects.keys()) | set(cloud_file_objects.keys())
for filename in all_files:
local_file = local_file_objects.get(filename)
cloud_file = cloud_file_objects.get(filename)
if local_file and cloud_file:
# File exists in both locations
if local_file.hash == cloud_file.hash:
# Files are identical, no action needed
continue
elif self.detect_conflicts(local_file, cloud_file):
# Handle conflict
action = self.resolve_conflict(local_file, cloud_file)
if action == SyncAction.CONFLICT:
conflicts.append({
"file": filename,
"local_modified":
local_file.modified.isoformat(),
"cloud_modified":
cloud_file.modified.isoformat(),
"local_size": local_file.size,
"cloud_size": cloud_file.size
})
continue
else:
# Determine action based on modification time
if local_file.modified > cloud_file.modified:
action = SyncAction.UPLOAD
else:
action = SyncAction.DOWNLOAD
# Add to sync plan
sync_actions.append({
"action": action.value,
"file": filename,
"reason": "conflict_resolved" if
self.detect_conflicts(local_file, cloud_file) else "version_mismatch"
})
# Estimate bandwidth
if action == SyncAction.UPLOAD:
bandwidth_estimate += local_file.size
elif action == SyncAction.DOWNLOAD:
bandwidth_estimate += cloud_file.size
elif local_file and not cloud_file:
# File only exists locally
sync_actions.append({
"action": SyncAction.UPLOAD.value,
"file": filename,
"reason": "local_only"
})
bandwidth_estimate += local_file.size
elif cloud_file and not local_file:
# File only exists in cloud
sync_actions.append({
"action": SyncAction.DOWNLOAD.value,
"file": filename,
"reason": "cloud_only"
})
bandwidth_estimate += cloud_file.size
return {
"sync_plan": sync_actions,
"bandwidth_estimate": bandwidth_estimate,
"conflicts": conflicts,
"total_files": len(all_files),
"actions_count": len(sync_actions)
}
def optimize_sync_order(self, sync_plan: List[Dict]) -> List[Dict]:
"""Optimize sync order to minimize conflicts and maximize
efficiency"""
# Prioritize downloads first (to get latest cloud changes)
# Then uploads (to push local changes)
# Sort by file size (smaller files first for quick wins)
def sort_key(action):
priority = {
"download": 1,
"upload": 2,
"delete_cloud": 3,
"delete_local": 4
}
return (priority.get(action["action"], 5), action.get("size", 0))
return sorted(sync_plan, key=sort_key)
def estimate_sync_time(self, bandwidth_bytes: int, connection_speed_mbps:
float = 10.0) -> float:
"""Estimate sync time based on bandwidth and connection speed"""
connection_speed_bps = connection_speed_mbps * 1024 * 1024 / 8 # Convert Mbps to bytes per second
return bandwidth_bytes / connection_speed_bps if connection_speed_bps > 0 else 0
def create_incremental_sync_plan(self, local_files: Dict, cloud_files:
Dict,
last_sync_timestamp: str) -> Dict:
"""Create sync plan for incremental sync (only changed files)"""
last_sync = datetime.fromisoformat(last_sync_timestamp.replace('Z',
'+00:00'))
# Filter files modified since last sync
recent_local = {
name: info for name, info in local_files.items()
if datetime.fromisoformat(info["modified"].replace('Z',
'+00:00')) > last_sync
}
recent_cloud = {
name: info for name, info in cloud_files.items()
if datetime.fromisoformat(info["modified"].replace('Z',
'+00:00')) > last_sync
}
return self.create_sync_plan(recent_local, recent_cloud)
# Test cases
def test_oneDriveSync():
sync_engine = OneDriveSyncEngine(conflict_resolution="newer_wins")
local_files = {
"doc1.txt": {"size": 1024, "modified": "2024-01-15T10:00:00", "hash":
"abc123"},
"doc2.txt": {"size": 2048, "modified": "2024-01-14T15:30:00", "hash":
"def456"},
"doc4.txt": {"size": 512, "modified": "2024-01-16T08:00:00", "hash":
"jkl345"}
}
cloud_files = {
"doc1.txt": {"size": 1024, "modified": "2024-01-15T09:00:00", "hash":
"xyz789"},
"doc3.txt": {"size": 512, "modified": "2024-01-16T12:00:00", "hash":
"ghi012"},
"doc4.txt": {"size": 512, "modified": "2024-01-16T10:00:00", "hash":
"mno678"}
}
print("Testing OneDrive Sync Engine:")
print(f"Local files: {list(local_files.keys())}")
print(f"Cloud files: {list(cloud_files.keys())}")
# Create sync plan
result = sync_engine.create_sync_plan(local_files, cloud_files)
print(f"\nSync Plan:")
for action in result["sync_plan"]:
print(f" {action}")
print(f"\nBandwidth estimate: {result['bandwidth_estimate']} bytes")
print(f"Conflicts: {len(result['conflicts'])}")
# Test incremental sync
incremental_result = sync_engine.create_incremental_sync_plan(
local_files, cloud_files, "2024-01-15T00:00:00"
)
print(f"\nIncremental sync actions:
{len(incremental_result['sync_plan'])}")
test_oneDriveSync()
Key Insights:
• Use file hashes to detect content changes accurately
• Implement conflict resolution strategies (newer wins, manual, keep both)
• Optimize sync order for efficiency and user experience
• Support incremental sync to minimize bandwidth usage
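To complement the hash-based change detection above, here is a minimal sketch of hashing a file on disk in fixed-size chunks so large files never have to be loaded into memory at once (the function name and 1 MB chunk size are illustrative choices, not part of the original problem):
Python
import hashlib

def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:16]  # same truncated form the sync engine stores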
Problem 19: Azure Resource Allocation
Difficulty: Hard | Time Limit: 60 minutes | Company: Microsoft
Problem Statement:
Design an Azure resource allocation system that optimally distributes computing resources
across multiple data centers. The system needs to:
1. Handle dynamic resource requests (CPU, memory, storage)
2. Optimize for cost, performance, and availability
3. Consider geographic constraints and latency requirements
4. Implement auto-scaling and load balancing
5. Handle resource failures and migrations
Example:
Plain Text
Input:
resource_requests = [
{"request_id": "req1", "cpu": 4, "memory": 8, "storage": 100,
"region": "us-east", "priority": "high", "duration": 3600},
{"request_id": "req2", "cpu": 2, "memory": 4, "storage": 50,
"region": "eu-west", "priority": "medium", "duration": 7200}
]
data_centers = [
{"dc_id": "dc1", "region": "us-east", "available_cpu": 100,
"available_memory": 200, "available_storage": 1000, "cost_per_hour": 0.1},
{"dc_id": "dc2", "region": "eu-west", "available_cpu": 80,
"available_memory": 160, "available_storage": 800, "cost_per_hour": 0.12}
]
Output: {
"allocations": [
{"request_id": "req1", "dc_id": "dc1", "cost": 1.2, "latency": 5},
{"request_id": "req2", "dc_id": "dc2", "cost": 0.96, "latency": 8}
],
"total_cost": 2.16, "utilization": {"dc1": 0.12, "dc2": 0.15}
}
Solution Approach:
This problem requires multi-objective optimization with constraints and real-time resource
management.
Python
import heapq
import time
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict
import math
class Priority(Enum):
LOW = 1
MEDIUM = 2
HIGH = 3
CRITICAL = 4
class ResourceType(Enum):
CPU = "cpu"
MEMORY = "memory"
STORAGE = "storage"
NETWORK = "network"
@dataclass
class ResourceRequest:
request_id: str
cpu: int
memory: int # GB
storage: int # GB
region: str
priority: Priority
duration: int # seconds
max_latency: int = 100 # ms
arrival_time: float = field(default_factory=time.time)
deadline: Optional[float] = None
def get_resource_vector(self) -> Dict[str, int]:
return {
ResourceType.CPU.value: self.cpu,
ResourceType.MEMORY.value: self.memory,
ResourceType.STORAGE.value: self.storage
}
@dataclass
class DataCenter:
dc_id: str
region: str
location: Tuple[float, float] # (lat, lng)
total_cpu: int
total_memory: int
total_storage: int
available_cpu: int
available_memory: int
available_storage: int
cost_per_cpu_hour: float
cost_per_memory_hour: float
cost_per_storage_hour: float
power_efficiency: float = 1.0 # PUE (Power Usage Effectiveness)
carbon_intensity: float = 0.5 # kg CO2/kWh
def get_utilization(self) -> Dict[str, float]:
return {
ResourceType.CPU.value: (self.total_cpu - self.available_cpu) /
self.total_cpu,
ResourceType.MEMORY.value: (self.total_memory -
self.available_memory) / self.total_memory,
ResourceType.STORAGE.value: (self.total_storage -
self.available_storage) / self.total_storage
}
def can_accommodate(self, request: ResourceRequest) -> bool:
return (self.available_cpu >= request.cpu and
self.available_memory >= request.memory and
self.available_storage >= request.storage)
def calculate_cost(self, request: ResourceRequest) -> float:
duration_hours = request.duration / 3600
return (request.cpu * self.cost_per_cpu_hour * duration_hours +
request.memory * self.cost_per_memory_hour * duration_hours +
request.storage * self.cost_per_storage_hour *
duration_hours)
@dataclass
class Allocation:
request_id: str
dc_id: str
allocated_resources: Dict[str, int]
cost: float
latency: float
start_time: float
end_time: float
carbon_footprint: float = 0.0
class AzureResourceAllocator:
def __init__(self):
self.data_centers: Dict[str, DataCenter] = {}
self.active_allocations: Dict[str, Allocation] = {}
self.allocation_history: List[Allocation] = []
self.region_latency_matrix: Dict[Tuple[str, str], float] = {}
# Optimization weights
self.weights = {
'cost': 0.4,
'latency': 0.3,
'utilization': 0.2,
'carbon': 0.1
}
# Auto-scaling thresholds
self.scale_up_threshold = 0.8
self.scale_down_threshold = 0.3
def add_data_center(self, dc: DataCenter):
"""Add a data center to the system"""
self.data_centers[dc.dc_id] = dc
def set_region_latency(self, region1: str, region2: str, latency: float):
"""Set latency between regions"""
self.region_latency_matrix[(region1, region2)] = latency
self.region_latency_matrix[(region2, region1)] = latency
def get_latency(self, dc_region: str, request_region: str) -> float:
"""Get latency between data center and request region"""
if dc_region == request_region:
return 5.0 # Same region latency
return self.region_latency_matrix.get((dc_region, request_region),
100.0)
def calculate_allocation_score(self, request: ResourceRequest, dc:
DataCenter) -> float:
"""Calculate allocation score based on multiple criteria"""
if not dc.can_accommodate(request):
return float('-inf')
# Cost factor (normalized)
cost = dc.calculate_cost(request)
max_cost = max(other.calculate_cost(request) for other in self.data_centers.values()
if other.can_accommodate(request))
cost_score = 1.0 - (cost / max_cost) if max_cost > 0 else 1.0
# Latency factor
latency = self.get_latency(dc.region, request.region)
latency_score = max(0, 1.0 - (latency / request.max_latency))
# Utilization factor (prefer balanced utilization)
utilization = dc.get_utilization()
avg_utilization = sum(utilization.values()) / len(utilization)
# After allocation utilization
future_cpu_util = (dc.total_cpu - dc.available_cpu + request.cpu) / dc.total_cpu
future_mem_util = (dc.total_memory - dc.available_memory + request.memory) / dc.total_memory
future_storage_util = (dc.total_storage - dc.available_storage + request.storage) / dc.total_storage
future_avg_util = (future_cpu_util + future_mem_util + future_storage_util) / 3
# Prefer utilization around 70% (not too high, not too low)
optimal_utilization = 0.7
utilization_score = 1.0 - abs(future_avg_util - optimal_utilization)
# Carbon footprint factor
carbon_score = 1.0 - dc.carbon_intensity
# Priority boost
priority_multiplier = {
Priority.LOW: 1.0,
Priority.MEDIUM: 1.2,
Priority.HIGH: 1.5,
Priority.CRITICAL: 2.0
}[request.priority]
# Weighted score
total_score = (
self.weights['cost'] * cost_score +
self.weights['latency'] * latency_score +
self.weights['utilization'] * utilization_score +
self.weights['carbon'] * carbon_score
) * priority_multiplier
return total_score
def find_best_allocation(self, request: ResourceRequest) ->
Optional[Tuple[DataCenter, float]]:
"""Find the best data center for a resource request"""
best_dc = None
best_score = float('-inf')
for dc in self.data_centers.values():
score = self.calculate_allocation_score(request, dc)
if score > best_score:
best_score = score
best_dc = dc
return (best_dc, best_score) if best_dc else None
def allocate_resources(self, request: ResourceRequest) ->
Optional[Allocation]:
"""Allocate resources for a single request"""
allocation_result = self.find_best_allocation(request)
if not allocation_result:
return None
dc, score = allocation_result
# Check deadline constraint
current_time = time.time()
if request.deadline and current_time > request.deadline:
return None
# Create allocation
cost = dc.calculate_cost(request)
latency = self.get_latency(dc.region, request.region)
# Calculate carbon footprint
duration_hours = request.duration / 3600
power_consumption = (request.cpu * 100 + request.memory * 10) * duration_hours # Simplified power model
carbon_footprint = power_consumption * dc.carbon_intensity * dc.power_efficiency
allocation = Allocation(
request_id=request.request_id,
dc_id=dc.dc_id,
allocated_resources=request.get_resource_vector(),
cost=cost,
latency=latency,
start_time=current_time,
end_time=current_time + request.duration,
carbon_footprint=carbon_footprint
)
# Update data center resources
dc.available_cpu -= request.cpu
dc.available_memory -= request.memory
dc.available_storage -= request.storage
# Store allocation
self.active_allocations[request.request_id] = allocation
self.allocation_history.append(allocation)
return allocation
def batch_allocate(self, requests: List[ResourceRequest]) -> Dict[str,
Optional[Allocation]]:
"""Allocate resources for multiple requests efficiently"""
# Sort requests by priority and deadline
sorted_requests = sorted(
requests,
key=lambda r: (
-r.priority.value, # Higher priority first
r.deadline if r.deadline else float('inf'), # Earlier
deadline first
r.arrival_time # Earlier arrival first
)
)
allocations = {}
for request in sorted_requests:
allocation = self.allocate_resources(request)
allocations[request.request_id] = allocation
return allocations
def deallocate_resources(self, request_id: str) -> bool:
"""Deallocate resources when request completes"""
if request_id not in self.active_allocations:
return False
allocation = self.active_allocations[request_id]
dc = self.data_centers[allocation.dc_id]
# Return resources to data center
resources = allocation.allocated_resources
dc.available_cpu += resources[ResourceType.CPU.value]
dc.available_memory += resources[ResourceType.MEMORY.value]
dc.available_storage += resources[ResourceType.STORAGE.value]
# Remove from active allocations
del self.active_allocations[request_id]
return True
def auto_scale_check(self) -> List[Dict]:
"""Check if auto-scaling is needed"""
scaling_actions = []
for dc in self.data_centers.values():
utilization = dc.get_utilization()
avg_utilization = sum(utilization.values()) / len(utilization)
if avg_utilization > self.scale_up_threshold:
# Scale up recommendation
scaling_actions.append({
'dc_id': dc.dc_id,
'action': 'scale_up',
'current_utilization': avg_utilization,
'recommended_increase': 0.2 # 20% increase
})
elif avg_utilization < self.scale_down_threshold:
# Scale down recommendation
scaling_actions.append({
'dc_id': dc.dc_id,
'action': 'scale_down',
'current_utilization': avg_utilization,
'recommended_decrease': 0.1 # 10% decrease
})
return scaling_actions
def migrate_allocation(self, request_id: str, target_dc_id: str) -> bool:
"""Migrate an allocation to a different data center"""
if request_id not in self.active_allocations:
return False
allocation = self.active_allocations[request_id]
source_dc = self.data_centers[allocation.dc_id]
target_dc = self.data_centers[target_dc_id]
resources = allocation.allocated_resources
# Check if target DC can accommodate
if not (target_dc.available_cpu >= resources[ResourceType.CPU.value]
and
target_dc.available_memory >=
resources[ResourceType.MEMORY.value] and
target_dc.available_storage >=
resources[ResourceType.STORAGE.value]):
return False
# Deallocate from source
source_dc.available_cpu += resources[ResourceType.CPU.value]
source_dc.available_memory += resources[ResourceType.MEMORY.value]
source_dc.available_storage += resources[ResourceType.STORAGE.value]
# Allocate to target
target_dc.available_cpu -= resources[ResourceType.CPU.value]
target_dc.available_memory -= resources[ResourceType.MEMORY.value]
target_dc.available_storage -= resources[ResourceType.STORAGE.value]
# Update allocation
allocation.dc_id = target_dc_id
return True
def get_system_metrics(self) -> Dict:
"""Get comprehensive system metrics"""
total_cost = sum(alloc.cost for alloc in
self.active_allocations.values())
total_carbon = sum(alloc.carbon_footprint for alloc in
self.active_allocations.values())
dc_utilizations = {}
for dc_id, dc in self.data_centers.items():
dc_utilizations[dc_id] = dc.get_utilization()
avg_latency = sum(alloc.latency for alloc in
self.active_allocations.values()) / max(1, len(self.active_allocations))
return {
'active_allocations': len(self.active_allocations),
'total_cost': round(total_cost, 2),
'total_carbon_footprint': round(total_carbon, 2),
'average_latency': round(avg_latency, 2),
'data_center_utilizations': dc_utilizations,
'scaling_recommendations': self.auto_scale_check()
}
def optimize_existing_allocations(self) -> List[Dict]:
"""Optimize existing allocations through migration"""
optimization_actions = []
for request_id, allocation in self.active_allocations.items():
current_dc = self.data_centers[allocation.dc_id]
# Find potentially better data centers
for dc_id, dc in self.data_centers.items():
if dc_id == allocation.dc_id:
continue
resources = allocation.allocated_resources
# Check if migration would improve the allocation
if (dc.available_cpu >= resources[ResourceType.CPU.value] and
dc.available_memory >=
resources[ResourceType.MEMORY.value] and
dc.available_storage >=
resources[ResourceType.STORAGE.value]):
# Calculate improvement score
current_util = sum(current_dc.get_utilization().values()) / 3
target_util = sum(dc.get_utilization().values()) / 3
# Prefer migration if it balances utilization better
if abs(target_util - 0.7) < abs(current_util - 0.7):
optimization_actions.append({
'request_id': request_id,
'current_dc': allocation.dc_id,
'target_dc': dc_id,
'improvement_score': abs(current_util - 0.7) -
abs(target_util - 0.7)
})
# Sort by improvement score and return top recommendations
optimization_actions.sort(key=lambda x: x['improvement_score'],
reverse=True)
return optimization_actions[:5] # Top 5 recommendations
# Test the Azure resource allocator
def test_azure_allocator():
allocator = AzureResourceAllocator()
# Add data centers
dc1 = DataCenter(
dc_id="dc1", region="us-east", location=(40.7128, -74.0060),
total_cpu=1000, total_memory=2000, total_storage=10000,
available_cpu=800, available_memory=1600, available_storage=8000,
cost_per_cpu_hour=0.1, cost_per_memory_hour=0.05,
cost_per_storage_hour=0.01,
carbon_intensity=0.4
)
dc2 = DataCenter(
dc_id="dc2", region="eu-west", location=(51.5074, -0.1278),
total_cpu=800, total_memory=1600, total_storage=8000,
available_cpu=600, available_memory=1200, available_storage=6000,
cost_per_cpu_hour=0.12, cost_per_memory_hour=0.06,
cost_per_storage_hour=0.012,
carbon_intensity=0.3
)
allocator.add_data_center(dc1)
allocator.add_data_center(dc2)
# Set region latencies
allocator.set_region_latency("us-east", "eu-west", 80)
# Create resource requests
requests = [
ResourceRequest("req1", 50, 100, 500, "us-east", Priority.HIGH,
3600),
ResourceRequest("req2", 30, 60, 300, "eu-west", Priority.MEDIUM,
7200),
ResourceRequest("req3", 20, 40, 200, "us-east", Priority.LOW, 1800)
]
print("Testing Azure Resource Allocator:")
print(f"Data centers: {len(allocator.data_centers)}")
print(f"Resource requests: {len(requests)}")
# Batch allocate resources
allocations = allocator.batch_allocate(requests)
print("\nAllocation Results:")
for request_id, allocation in allocations.items():
if allocation:
print(f"Request {request_id}:")
print(f" Allocated to: {allocation.dc_id}")
print(f" Cost: ${allocation.cost:.2f}")
print(f" Latency: {allocation.latency:.1f}ms")
print(f" Carbon footprint: {allocation.carbon_footprint:.2f}kg
CO2")
else:
print(f"Request {request_id}: Failed to allocate")
# Get system metrics
metrics = allocator.get_system_metrics()
print(f"\nSystem Metrics:")
print(f" Active allocations: {metrics['active_allocations']}")
print(f" Total cost: ${metrics['total_cost']}")
print(f" Total carbon footprint: {metrics['total_carbon_footprint']}kg
CO2")
print(f" Average latency: {metrics['average_latency']}ms")
print(f"\nData Center Utilizations:")
for dc_id, utilization in metrics['data_center_utilizations'].items():
print(f" {dc_id}: CPU {utilization['cpu']:.1%}, Memory
{utilization['memory']:.1%}, Storage {utilization['storage']:.1%}")
# Test optimization
optimizations = allocator.optimize_existing_allocations()
if optimizations:
print(f"\nOptimization Recommendations:")
for opt in optimizations:
print(f" Migrate {opt['request_id']} from {opt['current_dc']} to
{opt['target_dc']} (score: {opt['improvement_score']:.3f})")
test_azure_allocator()
Key Insights:
• Multi-objective optimization balancing cost, latency, utilization, and carbon footprint
• Dynamic resource allocation with auto-scaling recommendations
• Migration capabilities for load balancing and optimization
• Real-time metrics and monitoring for system health
Problem 20: Office Document Version Control
Difficulty: Medium | Time Limit: 45 minutes | Company: Microsoft
Problem Statement:
Design a version control system for Microsoft Office documents that handles:
1. Document versioning with branching and merging
2. Collaborative editing with conflict resolution
3. Change tracking and diff visualization
4. Rollback and restore capabilities
5. Access control and permissions
Example:
Plain Text
Input:
document = {
"doc_id": "doc123", "title": "Project Proposal",
"content": "Initial content", "version": "1.0",
"author": "user1", "collaborators": ["user2", "user3"]
}
changes = [
{"user": "user2", "operation": "insert", "position": 15, "text": " with
details"},
{"user": "user3", "operation": "replace", "start": 0, "end": 7, "text":
"Updated"}
]
Output: {
"new_version": "1.1", "conflicts": [],
"merged_content": "Updated content with details",
"change_log": ["user2 inserted text", "user3 replaced text"]
}
Solution Approach:
This problem requires implementing operational transformation and conflict resolution
algorithms.
Python
import time
import hashlib
import json
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict
import difflib
class OperationType(Enum):
INSERT = "insert"
DELETE = "delete"
REPLACE = "replace"
FORMAT = "format"
class PermissionLevel(Enum):
READ = "read"
COMMENT = "comment"
EDIT = "edit"
ADMIN = "admin"
@dataclass
class Operation:
op_id: str
user_id: str
op_type: OperationType
position: int
length: int = 0
content: str = ""
timestamp: float = field(default_factory=time.time)
metadata: Dict = field(default_factory=dict)
def to_dict(self) -> Dict:
return {
'op_id': self.op_id,
'user_id': self.user_id,
'op_type': self.op_type.value,
'position': self.position,
'length': self.length,
'content': self.content,
'timestamp': self.timestamp,
'metadata': self.metadata
}
@dataclass
class DocumentVersion:
version_id: str
parent_version: Optional[str]
content: str
operations: List[Operation]
author: str
timestamp: float
checksum: str
branch: str = "main"
def calculate_checksum(self) -> str:
content_hash = hashlib.sha256(self.content.encode()).hexdigest()
return content_hash[:16]
@dataclass
class User:
user_id: str
name: str
email: str
permission: PermissionLevel
@dataclass
class Conflict:
conflict_id: str
operation1: Operation
operation2: Operation
conflict_type: str
resolution_strategy: Optional[str] = None
resolved: bool = False
class OfficeDocumentVersionControl:
def __init__(self):
self.documents: Dict[str, Dict] = {}
self.versions: Dict[str, Dict[str, DocumentVersion]] = {}
self.users: Dict[str, User] = {}
self.active_sessions: Dict[str, Set[str]] = defaultdict(set)
self.operation_queue: Dict[str, List[Operation]] = defaultdict(list)
def create_document(self, doc_id: str, title: str, content: str,
author_id: str) -> DocumentVersion:
"""Create a new document"""
if doc_id in self.documents:
raise ValueError(f"Document {doc_id} already exists")
# Create initial version
version_id = f"{doc_id}_v1.0"
initial_version = DocumentVersion(
version_id=version_id,
parent_version=None,
content=content,
operations=[],
author=author_id,
timestamp=time.time(),
checksum="",
branch="main"
)
initial_version.checksum = initial_version.calculate_checksum()
# Store document metadata
self.documents[doc_id] = {
'title': title,
'current_version': version_id,
'branches': {'main': version_id},
'collaborators': {author_id: PermissionLevel.ADMIN},
'created_at': time.time(),
'modified_at': time.time()
}
# Store version
if doc_id not in self.versions:
self.versions[doc_id] = {}
self.versions[doc_id][version_id] = initial_version
return initial_version
def add_user(self, user: User):
"""Add a user to the system"""
self.users[user.user_id] = user
def grant_permission(self, doc_id: str, user_id: str, permission:
PermissionLevel):
"""Grant permission to a user for a document"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
self.documents[doc_id]['collaborators'][user_id] = permission
def check_permission(self, doc_id: str, user_id: str,
required_permission: PermissionLevel) -> bool:
"""Check if user has required permission"""
if doc_id not in self.documents:
return False
user_permission = self.documents[doc_id]['collaborators'].get(user_id)
if not user_permission:
return False
permission_hierarchy = {
PermissionLevel.READ: 1,
PermissionLevel.COMMENT: 2,
PermissionLevel.EDIT: 3,
PermissionLevel.ADMIN: 4
}
return permission_hierarchy[user_permission] >= permission_hierarchy[required_permission]
def transform_operation(self, op1: Operation, op2: Operation) -> Tuple[Operation, Operation]:
"""Operational transformation for concurrent operations"""
# Create copies to avoid modifying originals
transformed_op1 = Operation(
op1.op_id, op1.user_id, op1.op_type, op1.position,
op1.length, op1.content, op1.timestamp, op1.metadata.copy()
)
transformed_op2 = Operation(
op2.op_id, op2.user_id, op2.op_type, op2.position,
op2.length, op2.content, op2.timestamp, op2.metadata.copy()
)
# Transform based on operation types and positions
if op1.op_type == OperationType.INSERT and op2.op_type == OperationType.INSERT:
if op1.position <= op2.position:
transformed_op2.position += len(op1.content)
else:
transformed_op1.position += len(op2.content)
elif op1.op_type == OperationType.DELETE and op2.op_type == OperationType.DELETE:
if op1.position < op2.position:
transformed_op2.position -= op1.length
elif op1.position > op2.position:
transformed_op1.position -= op2.length
# If positions overlap, need conflict resolution
elif op1.op_type == OperationType.INSERT and op2.op_type == OperationType.DELETE:
if op1.position <= op2.position:
transformed_op2.position += len(op1.content)
elif op1.position <= op2.position + op2.length:
# Insert within delete range - complex transformation
pass
elif op1.op_type == OperationType.DELETE and op2.op_type == OperationType.INSERT:
if op2.position <= op1.position:
transformed_op1.position += len(op2.content)
elif op2.position <= op1.position + op1.length:
# Insert within delete range
pass
return transformed_op1, transformed_op2
def detect_conflicts(self, operations: List[Operation]) -> List[Conflict]:
"""Detect conflicts between operations"""
conflicts = []
for i, op1 in enumerate(operations):
for j, op2 in enumerate(operations[i+1:], i+1):
conflict_type = None
# Check for overlapping positions
if (op1.op_type in [OperationType.DELETE,
OperationType.REPLACE] and
op2.op_type in [OperationType.DELETE,
OperationType.REPLACE]):
op1_end = op1.position + op1.length
op2_end = op2.position + op2.length
# Check for overlap
if not (op1_end <= op2.position or op2_end <=
op1.position):
conflict_type = "overlapping_modifications"
# Check for simultaneous edits at same position
elif (op1.position == op2.position and
op1.op_type == op2.op_type == OperationType.INSERT):
conflict_type = "simultaneous_insert"
if conflict_type:
conflict = Conflict(
conflict_id=f"conflict_{i}_{j}",
operation1=op1,
operation2=op2,
conflict_type=conflict_type
)
conflicts.append(conflict)
return conflicts
def resolve_conflicts(self, conflicts: List[Conflict], strategy: str =
"timestamp") -> List[Operation]:
"""Resolve conflicts using specified strategy"""
resolved_operations = []
for conflict in conflicts:
if strategy == "timestamp":
# Earlier operation wins
if conflict.operation1.timestamp <= conflict.operation2.timestamp:
resolved_operations.append(conflict.operation1)
else:
resolved_operations.append(conflict.operation2)
elif strategy == "user_priority":
# Admin users have priority
user1_permission = self.users.get(conflict.operation1.user_id)
user2_permission = self.users.get(conflict.operation2.user_id)
if (user1_permission and user1_permission.permission ==
PermissionLevel.ADMIN):
resolved_operations.append(conflict.operation1)
elif (user2_permission and user2_permission.permission ==
PermissionLevel.ADMIN):
resolved_operations.append(conflict.operation2)
else:
# Fall back to timestamp
if conflict.operation1.timestamp <= conflict.operation2.timestamp:
resolved_operations.append(conflict.operation1)
else:
resolved_operations.append(conflict.operation2)
elif strategy == "merge":
# Attempt to merge both operations
if conflict.conflict_type == "simultaneous_insert":
# Combine insertions
merged_op = Operation(
op_id=f"merged_{conflict.operation1.op_id}_{conflict.operation2.op_id}",
user_id="system",
op_type=OperationType.INSERT,
position=conflict.operation1.position,
content=conflict.operation1.content +
conflict.operation2.content
)
resolved_operations.append(merged_op)
else:
# For other conflicts, use timestamp strategy
if conflict.operation1.timestamp <= conflict.operation2.timestamp:
resolved_operations.append(conflict.operation1)
else:
resolved_operations.append(conflict.operation2)
conflict.resolved = True
conflict.resolution_strategy = strategy
return resolved_operations
def apply_operation(self, content: str, operation: Operation) -> str:
"""Apply a single operation to content"""
if operation.op_type == OperationType.INSERT:
return content[:operation.position] + operation.content + content[operation.position:]
elif operation.op_type == OperationType.DELETE:
end_pos = operation.position + operation.length
return content[:operation.position] + content[end_pos:]
elif operation.op_type == OperationType.REPLACE:
end_pos = operation.position + operation.length
return content[:operation.position] + operation.content + content[end_pos:]
return content
def apply_operations(self, content: str, operations: List[Operation]) -> Tuple[str, List[Conflict]]:
"""Apply multiple operations with conflict detection and resolution"""
# Sort operations by timestamp
sorted_operations = sorted(operations, key=lambda op: op.timestamp)
# Detect conflicts
conflicts = self.detect_conflicts(sorted_operations)
# Resolve conflicts
if conflicts:
resolved_ops = self.resolve_conflicts(conflicts)
# Remove conflicted operations and add resolved ones
non_conflicted_ops = [op for op in sorted_operations
if not any(op in [c.operation1, c.operation2]
for c in conflicts)]
final_operations = non_conflicted_ops + resolved_ops
else:
final_operations = sorted_operations
# Apply operations sequentially with transformation
current_content = content
applied_operations = []
for operation in final_operations:
# Transform operation based on previously applied operations
transformed_op = operation
for prev_op in applied_operations:
transformed_op, _ = self.transform_operation(transformed_op,
prev_op)
# Apply transformed operation
current_content = self.apply_operation(current_content,
transformed_op)
applied_operations.append(transformed_op)
return current_content, conflicts
def create_version(self, doc_id: str, operations: List[Operation], author_id: str, branch: str = "main") -> DocumentVersion:
"""Create a new version by applying operations"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
# Check permissions
if not self.check_permission(doc_id, author_id,
PermissionLevel.EDIT):
raise PermissionError(f"User {author_id} does not have edit
permission")
# Get current version
current_version_id = self.documents[doc_id]['branches'][branch]
current_version = self.versions[doc_id][current_version_id]
# Apply operations
new_content, conflicts = self.apply_operations(current_version.content, operations)
# Generate new version ID
version_parts = current_version_id.split('_v')[1].split('.')
major, minor = int(version_parts[0]), int(version_parts[1])
if conflicts:
major += 1
minor = 0
else:
minor += 1
new_version_id = f"{doc_id}_v{major}.{minor}"
# Create new version
new_version = DocumentVersion(
version_id=new_version_id,
parent_version=current_version_id,
content=new_content,
operations=operations,
author=author_id,
timestamp=time.time(),
checksum="",
branch=branch
)
new_version.checksum = new_version.calculate_checksum()
# Store version
self.versions[doc_id][new_version_id] = new_version
# Update document metadata
self.documents[doc_id]['current_version'] = new_version_id
self.documents[doc_id]['branches'][branch] = new_version_id
self.documents[doc_id]['modified_at'] = time.time()
return new_version
def get_version_history(self, doc_id: str, branch: str = "main") -> List[DocumentVersion]:
"""Get version history for a document branch"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
versions = []
current_version_id = self.documents[doc_id]['branches'][branch]
while current_version_id:
version = self.versions[doc_id][current_version_id]
versions.append(version)
current_version_id = version.parent_version
return versions
def create_branch(self, doc_id: str, branch_name: str, from_version: str
= None) -> str:
"""Create a new branch from a specific version"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
if branch_name in self.documents[doc_id]['branches']:
raise ValueError(f"Branch {branch_name} already exists")
# Use current main version if not specified
if not from_version:
from_version = self.documents[doc_id]['branches']['main']
self.documents[doc_id]['branches'][branch_name] = from_version
return from_version
def merge_branches(self, doc_id: str, source_branch: str, target_branch:
str,
author_id: str) -> DocumentVersion:
"""Merge one branch into another"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
if not self.check_permission(doc_id, author_id,
PermissionLevel.EDIT):
raise PermissionError(f"User {author_id} does not have edit
permission")
source_version_id = self.documents[doc_id]['branches'][source_branch]
target_version_id = self.documents[doc_id]['branches'][target_branch]
source_version = self.versions[doc_id][source_version_id]
target_version = self.versions[doc_id][target_version_id]
# Find common ancestor
source_history = self.get_version_history(doc_id, source_branch)
target_history = self.get_version_history(doc_id, target_branch)
common_ancestor = None
for s_version in source_history:
for t_version in target_history:
if s_version.version_id == t_version.version_id:
common_ancestor = s_version
break
if common_ancestor:
break
if not common_ancestor:
raise ValueError("No common ancestor found for merge")
# Collect operations from both branches since common ancestor
source_ops = []
target_ops = []
# Get operations from source branch
current = source_version
while current and current.version_id != common_ancestor.version_id:
source_ops.extend(current.operations)
if current.parent_version:
current = self.versions[doc_id][current.parent_version]
else:
break
# Get operations from target branch
current = target_version
while current and current.version_id != common_ancestor.version_id:
target_ops.extend(current.operations)
if current.parent_version:
current = self.versions[doc_id][current.parent_version]
else:
break
# Merge operations
all_operations = source_ops + target_ops
merged_content, conflicts = self.apply_operations(common_ancestor.content, all_operations)
# Create merge version
merge_version_id = f"{doc_id}_merge_{int(time.time())}"
merge_version = DocumentVersion(
version_id=merge_version_id,
parent_version=target_version_id,
content=merged_content,
operations=all_operations,
author=author_id,
timestamp=time.time(),
checksum="",
branch=target_branch
)
merge_version.checksum = merge_version.calculate_checksum()
# Store merge version
self.versions[doc_id][merge_version_id] = merge_version
# Update target branch
self.documents[doc_id]['branches'][target_branch] = merge_version_id
return merge_version
def rollback_to_version(self, doc_id: str, version_id: str, author_id:
str) -> DocumentVersion:
"""Rollback document to a specific version"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
if not self.check_permission(doc_id, author_id,
PermissionLevel.ADMIN):
raise PermissionError(f"User {author_id} does not have admin
permission")
if version_id not in self.versions[doc_id]:
raise ValueError(f"Version {version_id} not found")
rollback_version = self.versions[doc_id][version_id]
# Create new version with rollback content
current_version_id = self.documents[doc_id]['current_version']
current_version = self.versions[doc_id][current_version_id]
version_parts = current_version_id.split('_v')[1].split('.')
major = int(version_parts[0]) + 1
new_version_id = f"{doc_id}_v{major}.0"
new_version = DocumentVersion(
version_id=new_version_id,
parent_version=current_version_id,
content=rollback_version.content,
operations=[], # Rollback operation
author=author_id,
timestamp=time.time(),
checksum="",
branch="main"
)
new_version.checksum = new_version.calculate_checksum()
# Store version
self.versions[doc_id][new_version_id] = new_version
# Update document
self.documents[doc_id]['current_version'] = new_version_id
self.documents[doc_id]['branches']['main'] = new_version_id
return new_version
def get_diff(self, doc_id: str, version1_id: str, version2_id: str) -> List[Dict]:
"""Get diff between two versions"""
if doc_id not in self.documents:
raise ValueError(f"Document {doc_id} not found")
version1 = self.versions[doc_id][version1_id]
version2 = self.versions[doc_id][version2_id]
# Use difflib to generate diff
diff = list(difflib.unified_diff(
version1.content.splitlines(keepends=True),
version2.content.splitlines(keepends=True),
fromfile=f"Version {version1_id}",
tofile=f"Version {version2_id}",
lineterm=""
))
# Parse diff into structured format
changes = []
for line in diff:
if line.startswith('+++') or line.startswith('---') or line.startswith('@@'):
continue
elif line.startswith('+'):
changes.append({'type': 'addition', 'content': line[1:]})
elif line.startswith('-'):
changes.append({'type': 'deletion', 'content': line[1:]})
else:
changes.append({'type': 'context', 'content': line[1:] if
line.startswith(' ') else line})
return changes
# Test the Office document version control system
def test_office_version_control():
vcs = OfficeDocumentVersionControl()
# Add users
vcs.add_user(User("user1", "Alice", "alice@company.com",
PermissionLevel.ADMIN))
vcs.add_user(User("user2", "Bob", "bob@company.com",
PermissionLevel.EDIT))
vcs.add_user(User("user3", "Charlie", "charlie@company.com",
PermissionLevel.EDIT))
# Create document
doc = vcs.create_document("doc1", "Project Proposal", "Initial project proposal content.", "user1")
print(f"Created document: {doc.version_id}")
# Grant permissions
vcs.grant_permission("doc1", "user2", PermissionLevel.EDIT)
vcs.grant_permission("doc1", "user3", PermissionLevel.EDIT)
# Create operations
operations1 = [
Operation("op1", "user2", OperationType.INSERT, 8, content="
detailed"),
Operation("op2", "user2", OperationType.REPLACE, 0, 7, "Updated")
]
operations2 = [
Operation("op3", "user3", OperationType.INSERT, 25, content=" with
timeline"),
Operation("op4", "user3", OperationType.INSERT, 35, content=" and
budget")
]
# Apply operations to create new versions
version2 = vcs.create_version("doc1", operations1, "user2")
print(f"Created version: {version2.version_id}")
print(f"Content: {version2.content}")
version3 = vcs.create_version("doc1", operations2, "user3")
print(f"Created version: {version3.version_id}")
print(f"Content: {version3.content}")
# Test branching
branch_version = vcs.create_branch("doc1", "feature-branch",
version2.version_id)
print(f"Created branch from: {branch_version}")
# Test version history
history = vcs.get_version_history("doc1")
print(f"\nVersion History:")
for version in history:
print(f" {version.version_id}: {version.author} at
{version.timestamp}")
# Test diff
diff = vcs.get_diff("doc1", doc.version_id, version3.version_id)
print(f"\nDiff between {doc.version_id} and {version3.version_id}:")
for change in diff[:5]: # Show first 5 changes
print(f" {change['type']}: {change['content'][:50]}...")
test_office_version_control()
Key Insights:
• Operational transformation for real-time collaborative editing
• Conflict detection and resolution with multiple strategies
• Branch-based version control with merge capabilities
• Comprehensive permission system and access control
• Diff visualization and rollback functionality
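The first insight is easiest to verify on a toy case. The sketch below strips operational transformation down to the insert-insert rule and shows two concurrent insertions converging to the same text regardless of application order; apply_insert and transform_insert are illustrative helpers, not methods of the class above.
Python
def apply_insert(text: str, pos: int, s: str) -> str:
    """Insert s into text at position pos."""
    return text[:pos] + s + text[pos:]

def transform_insert(pos_a: int, pos_b: int, len_b: int) -> int:
    """Shift A's position if B inserted at or before it.
    (Equal positions need a deterministic tie-breaker, e.g. user id.)"""
    return pos_a + len_b if pos_b <= pos_a else pos_a

base = "Hello world"
op_a = (6, "brave ")   # user A inserts at position 6
op_b = (0, ">> ")      # user B inserts at position 0

# Site 1 applies A first, then B transformed against A
s1 = apply_insert(base, *op_a)
s1 = apply_insert(s1, transform_insert(op_b[0], op_a[0], len(op_a[1])), op_b[1])

# Site 2 applies B first, then A transformed against B
s2 = apply_insert(base, *op_b)
s2 = apply_insert(s2, transform_insert(op_a[0], op_b[0], len(op_b[1])), op_a[1])

print(s1, "|", s2, "|", s1 == s2)  # both sites end with ">> Hello brave world"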
2.5 Apple OA Problems
Problem 21: iOS App Memory Management
Difficulty: Medium | Time Limit: 45 minutes | Company: Apple
Problem Statement:
Implement a memory management system for iOS apps that tracks memory usage, detects
memory leaks, and automatically triggers garbage collection when memory pressure is
high.
Example:
Plain Text
Input:
memory_operations = [
{"type": "allocate", "size": 1024, "object_id": "obj1"},
{"type": "allocate", "size": 2048, "object_id": "obj2"},
{"type": "reference", "from": "obj1", "to": "obj2"},
{"type": "deallocate", "object_id": "obj1"},
{"type": "check_leaks"}
]
memory_limit = 4096
Output: {
"total_memory": 2048,
"leaked_objects": ["obj2"],
"gc_triggered": false
}
Solution Approach:
This problem requires implementing reference counting and cycle detection for memory
management.
Python
from typing import Dict, List, Set
from collections import defaultdict, deque
class MemoryObject:
def __init__(self, object_id: str, size: int):
self.object_id = object_id
self.size = size
self.reference_count = 0
self.references_to = set() # Objects this object references
self.referenced_by = set() # Objects that reference this object
self.is_root = True # Whether this is a root object (reachable from app)
def add_reference_to(self, target_id: str):
self.references_to.add(target_id)
def remove_reference_to(self, target_id: str):
self.references_to.discard(target_id)
def __repr__(self):
return f"MemoryObject({self.object_id}, {self.size}B, refs=
{self.reference_count})"
class iOSMemoryManager:
def __init__(self, memory_limit: int):
self.memory_limit = memory_limit
self.objects: Dict[str, MemoryObject] = {}
self.total_memory = 0
self.gc_threshold = 0.8 # Trigger GC at 80% memory usage
self.root_objects = set() # Objects reachable from app
def allocate_object(self, object_id: str, size: int) -> bool:
"""Allocate memory for a new object"""
if object_id in self.objects:
return False # Object already exists
# Check if allocation would exceed memory limit
if self.total_memory + size > self.memory_limit:
# Try garbage collection first
if self.garbage_collect():
if self.total_memory + size > self.memory_limit:
return False # Still not enough memory
else:
return False
# Create new object
obj = MemoryObject(object_id, size)
self.objects[object_id] = obj
self.total_memory += size
self.root_objects.add(object_id) # Initially all objects are roots
return True
def deallocate_object(self, object_id: str) -> bool:
"""Deallocate an object and update references"""
if object_id not in self.objects:
return False
obj = self.objects[object_id]
# Remove references from this object to others
for target_id in obj.references_to.copy():
self.remove_reference(object_id, target_id)
# Remove references from others to this object
for source_id in obj.referenced_by.copy():
self.remove_reference(source_id, object_id)
# Remove from memory
self.total_memory -= obj.size
del self.objects[object_id]
self.root_objects.discard(object_id)
return True
def add_reference(self, from_id: str, to_id: str) -> bool:
"""Add a reference from one object to another"""
if from_id not in self.objects or to_id not in self.objects:
return False
from_obj = self.objects[from_id]
to_obj = self.objects[to_id]
# Add reference
from_obj.add_reference_to(to_id)
to_obj.referenced_by.add(from_id)
to_obj.reference_count += 1
# If to_obj is now referenced, it's no longer a potential root
if to_obj.reference_count > 0:
to_obj.is_root = False
self.root_objects.discard(to_id)
return True
def remove_reference(self, from_id: str, to_id: str) -> bool:
"""Remove a reference from one object to another"""
if from_id not in self.objects or to_id not in self.objects:
return False
from_obj = self.objects[from_id]
to_obj = self.objects[to_id]
if to_id not in from_obj.references_to:
return False
# Remove reference
from_obj.remove_reference_to(to_id)
to_obj.referenced_by.discard(from_id)
to_obj.reference_count -= 1
# If to_obj has no references, it becomes a potential root
if to_obj.reference_count == 0:
to_obj.is_root = True
self.root_objects.add(to_id)
return True
def find_reachable_objects(self) -> Set[str]:
"""Find all objects reachable from root objects using DFS"""
reachable = set()
stack = list(self.root_objects)
while stack:
obj_id = stack.pop()
if obj_id in reachable or obj_id not in self.objects:
continue
reachable.add(obj_id)
obj = self.objects[obj_id]
# Add all referenced objects to stack
for referenced_id in obj.references_to:
if referenced_id not in reachable:
stack.append(referenced_id)
return reachable
def detect_memory_leaks(self) -> List[str]:
"""Detect memory leaks using reachability analysis"""
reachable = self.find_reachable_objects()
all_objects = set(self.objects.keys())
leaked_objects = all_objects - reachable
return list(leaked_objects)
def detect_reference_cycles(self) -> List[List[str]]:
"""Detect reference cycles using DFS"""
cycles = []
visited = set()
rec_stack = set()
def dfs(obj_id: str, path: List[str]) -> bool:
if obj_id in rec_stack:
# Found cycle
cycle_start = path.index(obj_id)
cycle = path[cycle_start:] + [obj_id]
cycles.append(cycle)
return True
if obj_id in visited or obj_id not in self.objects:
return False
visited.add(obj_id)
rec_stack.add(obj_id)
path.append(obj_id)
obj = self.objects[obj_id]
for referenced_id in obj.references_to:
dfs(referenced_id, path.copy())
rec_stack.remove(obj_id)
return False
for obj_id in self.objects:
if obj_id not in visited:
dfs(obj_id, [])
return cycles
def garbage_collect(self) -> bool:
"""Perform garbage collection to free unreachable objects"""
leaked_objects = self.detect_memory_leaks()
if not leaked_objects:
return False
# Free leaked objects
freed_memory = 0
for obj_id in leaked_objects:
if obj_id in self.objects:
freed_memory += self.objects[obj_id].size
self.deallocate_object(obj_id)
return freed_memory > 0
def should_trigger_gc(self) -> bool:
"""Check if garbage collection should be triggered"""
memory_usage_ratio = self.total_memory / self.memory_limit
return memory_usage_ratio >= self.gc_threshold
def get_memory_stats(self) -> Dict:
"""Get comprehensive memory statistics"""
leaked_objects = self.detect_memory_leaks()
cycles = self.detect_reference_cycles()
return {
"total_memory": self.total_memory,
"memory_limit": self.memory_limit,
"memory_usage_percent": (self.total_memory / self.memory_limit) *
100,
"object_count": len(self.objects),
"leaked_objects": leaked_objects,
"reference_cycles": cycles,
"gc_triggered": self.should_trigger_gc(),
"root_objects": list(self.root_objects)
}
# Test cases
def test_memoryManager():
memory_manager = iOSMemoryManager(memory_limit=4096)
print("Testing iOS Memory Manager:")
# Test memory operations
operations = [
{"type": "allocate", "size": 1024, "object_id": "obj1"},
{"type": "allocate", "size": 2048, "object_id": "obj2"},
{"type": "allocate", "size": 512, "object_id": "obj3"},
{"type": "reference", "from": "obj1", "to": "obj2"},
{"type": "reference", "from": "obj2", "to": "obj3"},
{"type": "reference", "from": "obj3", "to": "obj1"}, # Create cycle
{"type": "deallocate", "object_id": "obj1"}, # Remove root, creating
leak
]
for op in operations:
if op["type"] == "allocate":
success = memory_manager.allocate_object(op["object_id"],
op["size"])
print(f"Allocated {op['object_id']} ({op['size']}B): {success}")
elif op["type"] == "reference":
success = memory_manager.add_reference(op["from"], op["to"])
print(f"Added reference {op['from']} -> {op['to']}: {success}")
elif op["type"] == "deallocate":
success = memory_manager.deallocate_object(op["object_id"])
print(f"Deallocated {op['object_id']}: {success}")
# Get final memory stats
stats = memory_manager.get_memory_stats()
print(f"\nFinal Memory Stats:")
print(f"Total memory: {stats['total_memory']}B")
print(f"Memory usage: {stats['memory_usage_percent']:.1f}%")
print(f"Leaked objects: {stats['leaked_objects']}")
print(f"Reference cycles: {stats['reference_cycles']}")
print(f"GC should trigger: {stats['gc_triggered']}")
test_memoryManager()
Key Insights:
• Implement reference counting for automatic memory management
• Use reachability analysis to detect memory leaks
• Detect reference cycles that prevent garbage collection
• Trigger GC based on memory pressure thresholds
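The cycle issue is worth seeing in isolation: two objects that reference each other keep non-zero reference counts forever, yet neither is reachable from a root, so only the reachability pass can reclaim them. A minimal standalone sketch with a hypothetical object graph, independent of the iOSMemoryManager class above.
Python
# object -> set of objects it references (illustrative graph)
refs = {
    "root": {"a"},
    "a": set(),
    "b": {"c"},   # b and c reference each other...
    "c": {"b"},   # ...so both keep a reference count of 1
}
roots = {"root"}

# Pure reference counting: b and c each have an incoming edge, so they are never freed
incoming = {node: 0 for node in refs}
for targets in refs.values():
    for t in targets:
        incoming[t] += 1
print(incoming)  # {'root': 0, 'a': 1, 'b': 1, 'c': 1}

# Reachability from roots (DFS) finds the true garbage set
reachable, stack = set(), list(roots)
while stack:
    node = stack.pop()
    if node not in reachable:
        reachable.add(node)
        stack.extend(refs[node] - reachable)
print(set(refs) - reachable)  # b and c are collectible despite their counts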
Problem 22: Music Playlist Optimization
Difficulty: Medium | Time Limit: 45 minutes | Company: Apple
Problem Statement:
Design an algorithm for Apple Music that creates optimal playlists based on user
preferences, song transitions, and listening patterns. The playlist should maximize user
engagement while maintaining good flow between songs.
Example:
Plain Text
Input:
songs = [
{"id": "s1", "genre": "pop", "energy": 0.8, "tempo": 120, "duration": 180},
{"id": "s2", "genre": "pop", "energy": 0.6, "tempo": 100, "duration": 200},
{"id": "s3", "genre": "rock", "energy": 0.9, "tempo": 140, "duration": 220}
]
user_preferences = {"pop": 0.7, "rock": 0.3}
target_duration = 600 # 10 minutes
Output: {
"playlist": ["s1", "s2", "s3"],
"total_duration": 600,
"engagement_score": 0.85,
"flow_score": 0.78
}
Solution Approach:
This problem combines optimization algorithms with music recommendation logic.
Python
import math
from typing import List, Dict, Tuple
from dataclasses import dataclass
@dataclass
class Song:
id: str
genre: str
energy: float # 0.0 to 1.0
tempo: int # BPM
duration: int # seconds
popularity: float = 0.5
user_rating: float = 0.0
def __post_init__(self):
# Normalize values
self.energy = max(0.0, min(1.0, self.energy))
self.popularity = max(0.0, min(1.0, self.popularity))
self.user_rating = max(0.0, min(1.0, self.user_rating))
class PlaylistOptimizer:
def __init__(self):
self.genre_weights = {}
self.transition_penalties = {
'tempo_diff': 0.3,
'energy_diff': 0.4,
'genre_change': 0.2
}
def calculate_song_score(self, song: Song, user_preferences: Dict[str,
float]) -> float:
"""Calculate individual song score based on user preferences"""
# Base score from user genre preference
genre_score = user_preferences.get(song.genre, 0.0)
# Combine with song attributes
score = (
genre_score * 0.4 +
song.popularity * 0.3 +
song.user_rating * 0.2 +
song.energy * 0.1 # Slight preference for energetic songs
)
return score
def calculate_transition_score(self, song1: Song, song2: Song) -> float:
"""Calculate how well two songs transition into each other"""
# Tempo difference penalty
tempo_diff = abs(song1.tempo - song2.tempo) / max(song1.tempo,
song2.tempo)
tempo_penalty = tempo_diff * self.transition_penalties['tempo_diff']
# Energy difference penalty
energy_diff = abs(song1.energy - song2.energy)
energy_penalty = energy_diff * self.transition_penalties['energy_diff']
# Genre change penalty
genre_penalty = 0 if song1.genre == song2.genre else self.transition_penalties['genre_change']
# Calculate transition score (higher is better)
transition_score = 1.0 - (tempo_penalty + energy_penalty +
genre_penalty)
return max(0.0, transition_score)
def calculate_playlist_flow(self, playlist: List[Song]) -> float:
"""Calculate overall flow score for the playlist"""
if len(playlist) <= 1:
return 1.0
total_transition_score = 0
for i in range(len(playlist) - 1):
total_transition_score += self.calculate_transition_score(playlist[i], playlist[i + 1])
return total_transition_score / (len(playlist) - 1)
def calculate_engagement_score(self, playlist: List[Song],
user_preferences: Dict[str, float]) -> float:
"""Calculate expected user engagement for the playlist"""
if not playlist:
return 0.0
total_score = 0
total_weight = 0
for i, song in enumerate(playlist):
# Songs later in playlist have slightly less weight (attention decay)
position_weight = 1.0 - (i * 0.05) # 5% decay per position
position_weight = max(0.5, position_weight) # Minimum 50% weight
song_score = self.calculate_song_score(song, user_preferences)
total_score += song_score * position_weight
total_weight += position_weight
return total_score / total_weight if total_weight > 0 else 0.0
def optimize_playlist_greedy(self, songs: List[Song], user_preferences: Dict[str, float], target_duration: int, max_songs: int = 20) -> List[Song]:
"""Create playlist using greedy optimization"""
available_songs = songs.copy()
playlist = []
current_duration = 0
while available_songs and current_duration < target_duration and len(playlist) < max_songs:
best_song = None
best_score = -1
for song in available_songs:
# Check if adding this song would exceed duration
if current_duration + song.duration > target_duration * 1.1:
# 10% tolerance
continue
# Calculate combined score
song_score = self.calculate_song_score(song,
user_preferences)
# If this isn't the first song, consider transition
if playlist:
transition_score = self.calculate_transition_score(playlist[-1], song)
combined_score = song_score * 0.7 + transition_score * 0.3
else:
combined_score = song_score
# Prefer songs that help reach target duration
duration_factor = 1.0
remaining_duration = target_duration - current_duration
if remaining_duration > 0:
duration_fit = 1.0 - abs(song.duration -
remaining_duration) / remaining_duration
duration_factor = 0.8 + 0.2 * duration_fit
final_score = combined_score * duration_factor
if final_score > best_score:
best_score = final_score
best_song = song
if best_song:
playlist.append(best_song)
current_duration += best_song.duration
available_songs.remove(best_song)
else:
break
return playlist
def optimize_playlist_dynamic(self, songs: List[Song], user_preferences:
Dict[str, float],
target_duration: int) -> List[Song]:
"""Create playlist using dynamic programming for better
optimization"""
n = len(songs)
# DP table: dp[i][duration] = (best_score, best_playlist)
# For simplicity, we'll use a simplified approach
# Sort songs by individual score first
scored_songs = [(self.calculate_song_score(song, user_preferences), song) for song in songs]
scored_songs.sort(key=lambda pair: pair[0], reverse=True) # Sort by score only; Song objects are not orderable
# Use greedy approach with better song ordering
return self.optimize_playlist_greedy([song for _, song in
scored_songs],
user_preferences, target_duration)
def create_optimal_playlist(self, songs: List[Song], user_preferences:
Dict[str, float],
target_duration: int, optimization_method: str
= "greedy") -> Dict:
"""Create optimal playlist and return comprehensive results"""
if optimization_method == "dynamic":
playlist = self.optimize_playlist_dynamic(songs,
user_preferences, target_duration)
else:
playlist = self.optimize_playlist_greedy(songs, user_preferences,
target_duration)
# Calculate metrics
total_duration = sum(song.duration for song in playlist)
engagement_score = self.calculate_engagement_score(playlist,
user_preferences)
flow_score = self.calculate_playlist_flow(playlist)
# Calculate genre distribution
genre_distribution = {}
for song in playlist:
genre_distribution[song.genre] = genre_distribution.get(song.genre, 0) + 1
return {
"playlist": [song.id for song in playlist],
"total_duration": total_duration,
"target_duration": target_duration,
"duration_accuracy": 1.0 - abs(total_duration - target_duration)
/ target_duration,
"engagement_score": round(engagement_score, 3),
"flow_score": round(flow_score, 3),
"song_count": len(playlist),
"genre_distribution": genre_distribution,
"average_energy": round(sum(song.energy for song in playlist) /
len(playlist), 3) if playlist else 0,
"average_tempo": round(sum(song.tempo for song in playlist) /
len(playlist), 1) if playlist else 0
}
def suggest_playlist_variations(self, base_playlist: List[Song], songs:
List[Song],
user_preferences: Dict[str, float], count:
int = 3) -> List[Dict]:
"""Generate variations of the base playlist"""
variations = []
for i in range(count):
# Create variation by replacing some songs
variation = base_playlist.copy()
# Replace 20-30% of songs
replace_count = max(1, len(variation) // 4)
# Remove some songs and add new ones
available_songs = [s for s in songs if s not in variation]
for _ in range(replace_count):
if variation and available_songs:
# Remove a random song (prefer lower-scored ones)
remove_idx = len(variation) - 1 - (i % len(variation))
removed_song = variation.pop(remove_idx)
# Add a new song
best_replacement = max(available_songs,
key=lambda s:
self.calculate_song_score(s, user_preferences))
variation.insert(remove_idx, best_replacement)
available_songs.remove(best_replacement)
# Calculate metrics for variation
total_duration = sum(song.duration for song in variation)
engagement_score = self.calculate_engagement_score(variation,
user_preferences)
flow_score = self.calculate_playlist_flow(variation)
variations.append({
"playlist": [song.id for song in variation],
"total_duration": total_duration,
"engagement_score": round(engagement_score, 3),
"flow_score": round(flow_score, 3)
})
return variations
# Test cases
def test_playlistOptimizer():
optimizer = PlaylistOptimizer()
# Create test songs
songs = [
Song("s1", "pop", 0.8, 120, 180, 0.9, 0.8),
Song("s2", "pop", 0.6, 100, 200, 0.7, 0.6),
Song("s3", "rock", 0.9, 140, 220, 0.8, 0.7),
Song("s4", "pop", 0.7, 110, 190, 0.6, 0.5),
Song("s5", "rock", 0.8, 130, 210, 0.7, 0.6),
Song("s6", "electronic", 0.9, 128, 240, 0.5, 0.4),
Song("s7", "pop", 0.5, 95, 170, 0.8, 0.7)
]
user_preferences = {"pop": 0.7, "rock": 0.3, "electronic": 0.1}
target_duration = 600 # 10 minutes
print("Testing Playlist Optimizer:")
print(f"Available songs: {len(songs)}")
print(f"Target duration: {target_duration} seconds")
print(f"User preferences: {user_preferences}")
# Create optimal playlist
result = optimizer.create_optimal_playlist(songs, user_preferences,
target_duration)
print(f"\nOptimal Playlist:")
print(f"Songs: {result['playlist']}")
print(f"Duration: {result['total_duration']}s (accuracy:
{result['duration_accuracy']:.1%})")
print(f"Engagement score: {result['engagement_score']}")
print(f"Flow score: {result['flow_score']}")
print(f"Genre distribution: {result['genre_distribution']}")
print(f"Average energy: {result['average_energy']}")
print(f"Average tempo: {result['average_tempo']} BPM")
test_playlistOptimizer()
Key Insights:
• Balance individual song quality with playlist flow
• Consider transition smoothness between consecutive songs
• Use weighted scoring for different user preferences
• Optimize for both engagement and target duration constraints
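Because the transition score drives the second insight, here it is again as a standalone function with concrete numbers; the penalty weights mirror the transition_penalties values assumed in the solution (0.3 tempo, 0.4 energy, 0.2 genre).
Python
def transition_score(tempo1: int, energy1: float, genre1: str,
                     tempo2: int, energy2: float, genre2: str,
                     w_tempo: float = 0.3, w_energy: float = 0.4,
                     w_genre: float = 0.2) -> float:
    """Score how smoothly song 1 flows into song 2 (higher is better)."""
    tempo_penalty = abs(tempo1 - tempo2) / max(tempo1, tempo2) * w_tempo
    energy_penalty = abs(energy1 - energy2) * w_energy
    genre_penalty = 0.0 if genre1 == genre2 else w_genre
    return max(0.0, 1.0 - (tempo_penalty + energy_penalty + genre_penalty))

# s1 (pop, 120 BPM, energy 0.8) -> s2 (pop, 100 BPM, energy 0.6):
# tempo penalty 20/120*0.3 = 0.05, energy penalty 0.2*0.4 = 0.08, same genre
print(round(transition_score(120, 0.8, "pop", 100, 0.6, "pop"), 2))  # 0.87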
Problem 23: Device Battery Life Prediction
Difficulty: Medium | Time Limit: 45 minutes | Company: Apple
Problem Statement:
Design a battery life prediction system for iOS devices that:
1. Predicts remaining battery life based on current usage patterns
2. Analyzes app power consumption and user behavior
3. Provides intelligent power management recommendations
4. Handles different device types and battery health states
5. Optimizes for accuracy while maintaining real-time performance
Example:
Plain Text
Input:
device_state = {
"battery_level": 65, "battery_health": 85, "device_type": "iPhone_14",
"screen_brightness": 80, "wifi_enabled": True, "cellular_signal": 3,
"background_apps": ["Maps", "Music", "Messages"], "active_app": "Camera"
}
usage_history = [
{"timestamp": 1640995200, "battery_level": 100, "app": "Safari",
"duration": 1800},
{"timestamp": 1640997000, "battery_level": 95, "app": "Camera", "duration":
600},
{"timestamp": 1640997600, "battery_level": 90, "app": "Messages",
"duration": 300}
]
Output: {
"predicted_hours": 8.5, "confidence": 0.87,
"power_breakdown": {"screen": 35, "apps": 45, "system": 20},
"recommendations": ["Reduce screen brightness", "Close background apps"],
"low_power_mode_hours": 12.3
}
Solution Approach:
This problem requires machine learning-based prediction with real-time optimization and
power management.
Python
import time
import math
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
import statistics
class DeviceType(Enum):
IPHONE_SE = "iPhone_SE"
IPHONE_14 = "iPhone_14"
IPHONE_14_PRO = "iPhone_14_Pro"
IPHONE_15 = "iPhone_15"
IPHONE_15_PRO = "iPhone_15_Pro"
class AppCategory(Enum):
SOCIAL = "social"
GAMING = "gaming"
PRODUCTIVITY = "productivity"
MEDIA = "media"
NAVIGATION = "navigation"
CAMERA = "camera"
SYSTEM = "system"
@dataclass
class DeviceState:
battery_level: float # 0-100
battery_health: float # 0-100
device_type: DeviceType
screen_brightness: float # 0-100
wifi_enabled: bool
cellular_signal: int # 0-4
background_apps: List[str]
active_app: str
cpu_usage: float = 0.0 # 0-100
memory_usage: float = 0.0 # 0-100
temperature: float = 25.0 # Celsius
low_power_mode: bool = False
@dataclass
class UsageRecord:
timestamp: float
battery_level: float
app: str
duration: int # seconds
screen_on: bool = True
data_usage: float = 0.0 # MB
cpu_usage: float = 0.0
@dataclass
class PowerConsumption:
screen: float
cpu: float
cellular: float
wifi: float
apps: float
system: float
def total(self) -> float:
return self.screen + self.cpu + self.cellular + self.wifi + self.apps + self.system
@dataclass
class BatteryPrediction:
predicted_hours: float
confidence: float
power_breakdown: Dict[str, float]
recommendations: List[str]
low_power_mode_hours: float
detailed_forecast: List[Dict] = field(default_factory=list)
class BatteryLifePredictor:
def __init__(self):
# Device specifications (battery capacity in mAh)
self.device_specs = {
DeviceType.IPHONE_SE: {"battery_capacity": 2018,
"base_consumption": 150},
DeviceType.IPHONE_14: {"battery_capacity": 3279,
"base_consumption": 180},
DeviceType.IPHONE_14_PRO: {"battery_capacity": 3200,
"base_consumption": 200},
DeviceType.IPHONE_15: {"battery_capacity": 3349,
"base_consumption": 175},
DeviceType.IPHONE_15_PRO: {"battery_capacity": 3274,
"base_consumption": 195}
}
# App power consumption profiles (mW per minute of usage)
self.app_power_profiles = {
AppCategory.GAMING: 800,
AppCategory.CAMERA: 600,
AppCategory.NAVIGATION: 500,
AppCategory.MEDIA: 300,
AppCategory.SOCIAL: 200,
AppCategory.PRODUCTIVITY: 150,
AppCategory.SYSTEM: 100
}
# Component power consumption (mW)
self.component_power = {
"screen_base": 200,
"screen_per_brightness": 8, # per brightness %
"cpu_base": 100,
"cpu_per_usage": 15, # per CPU %
"wifi_active": 50,
"wifi_idle": 10,
"cellular_base": 80,
"cellular_per_signal": 20, # varies by signal strength
"system_base": 120
}
# Usage history for learning
self.usage_history: deque = deque(maxlen=1000)
self.power_models: Dict[str, Dict] = {}
# Prediction confidence factors
self.confidence_factors = {
"history_length": 0.3,
"pattern_consistency": 0.25,
"recent_accuracy": 0.25,
"device_health": 0.2
}
def categorize_app(self, app_name: str) -> AppCategory:
"""Categorize app based on name patterns"""
app_lower = app_name.lower()
if any(game in app_lower for game in ["game", "play", "clash",
"candy", "pokemon"]):
return AppCategory.GAMING
elif any(camera in app_lower for camera in ["camera", "photo",
"video", "facetime"]):
return AppCategory.CAMERA
elif any(nav in app_lower for nav in ["maps", "navigation", "gps",
"uber", "lyft"]):
return AppCategory.NAVIGATION
elif any(media in app_lower for media in ["music", "spotify",
"youtube", "netflix", "video"]):
return AppCategory.MEDIA
elif any(social in app_lower for social in ["facebook", "instagram",
"twitter", "snapchat", "tiktok", "messages"]):
return AppCategory.SOCIAL
elif any(prod in app_lower for prod in ["mail", "notes", "calendar",
"office", "docs", "sheets"]):
return AppCategory.PRODUCTIVITY
else:
return AppCategory.SYSTEM
def calculate_current_power_consumption(self, state: DeviceState) -> PowerConsumption:
"""Calculate current power consumption based on device state"""
# Screen power consumption
screen_power = (self.component_power["screen_base"] +
self.component_power["screen_per_brightness"] *
state.screen_brightness / 100)
# CPU power consumption
cpu_power = (self.component_power["cpu_base"] +
self.component_power["cpu_per_usage"] * state.cpu_usage /
100)
# Cellular power consumption
cellular_power = (self.component_power["cellular_base"] +
self.component_power["cellular_per_signal"] * (4 -
state.cellular_signal))
# WiFi power consumption
wifi_power = self.component_power["wifi_active"] if state.wifi_enabled else 0
# App power consumption
active_app_category = self.categorize_app(state.active_app)
app_power = self.app_power_profiles[active_app_category]
# Background apps power
background_power = sum(
self.app_power_profiles[self.categorize_app(app)] * 0.1 # 10% of active power
for app in state.background_apps
)
# System power consumption
system_power = self.component_power["system_base"]
# Apply battery health factor
health_factor = state.battery_health / 100
# Apply low power mode factor
low_power_factor = 0.7 if state.low_power_mode else 1.0
# Apply temperature factor (higher temperature = higher consumption)
temp_factor = 1.0 + max(0, (state.temperature - 25) * 0.02)
total_factor = health_factor * low_power_factor * temp_factor
return PowerConsumption(
screen=screen_power * total_factor,
cpu=cpu_power * total_factor,
cellular=cellular_power * total_factor,
wifi=wifi_power * total_factor,
apps=(app_power + background_power) * total_factor,
system=system_power * total_factor
)
def analyze_usage_patterns(self, history: List[UsageRecord]) -> Dict:
"""Analyze historical usage patterns"""
if not history:
return {"hourly_usage": {}, "app_usage": {}, "drain_rate": 5.0}
# Analyze hourly usage patterns
hourly_usage = defaultdict(list)
app_usage = defaultdict(list)
drain_rates = []
for i, record in enumerate(history):
hour = int((record.timestamp % 86400) / 3600) # Hour of day
hourly_usage[hour].append(record.duration)
app_usage[record.app].append(record.duration)
# Calculate drain rate if we have previous record
if i > 0:
prev_record = history[i-1]
time_diff = record.timestamp - prev_record.timestamp
battery_diff = prev_record.battery_level - record.battery_level
if time_diff > 0 and battery_diff > 0:
drain_rate = (battery_diff / time_diff) * 3600 # % per hour
drain_rates.append(drain_rate)
# Calculate average patterns
avg_hourly_usage = {
hour: statistics.mean(durations) if durations else 0
for hour, durations in hourly_usage.items()
}
avg_app_usage = {
app: statistics.mean(durations) if durations else 0
for app, durations in app_usage.items()
}
avg_drain_rate = statistics.mean(drain_rates) if drain_rates else 5.0
return {
"hourly_usage": avg_hourly_usage,
"app_usage": avg_app_usage,
"drain_rate": avg_drain_rate,
"pattern_consistency":
self._calculate_pattern_consistency(hourly_usage)
}
def _calculate_pattern_consistency(self, hourly_usage: Dict) -> float:
"""Calculate how consistent usage patterns are"""
if not hourly_usage:
return 0.5
# Calculate coefficient of variation for each hour
variations = []
for hour_data in hourly_usage.values():
if len(hour_data) > 1:
mean_usage = statistics.mean(hour_data)
if mean_usage > 0:
std_dev = statistics.stdev(hour_data)
cv = std_dev / mean_usage
variations.append(cv)
if not variations:
return 0.5
# Lower variation = higher consistency
avg_variation = statistics.mean(variations)
consistency = max(0, 1 - avg_variation)
return min(1, consistency)
def predict_battery_life(self, state: DeviceState, usage_history: List[UsageRecord]) -> BatteryPrediction:
"""Predict remaining battery life"""
# Calculate current power consumption
current_power = self.calculate_current_power_consumption(state)
# Analyze usage patterns
patterns = self.analyze_usage_patterns(usage_history)
# Get device specifications
device_spec = self.device_specs[state.device_type]
battery_capacity = device_spec["battery_capacity"]
# Calculate remaining capacity
remaining_capacity = (state.battery_level / 100) * battery_capacity * (state.battery_health / 100)
# Predict based on current consumption
current_consumption_mah = current_power.total() / 1000 * 60 # mAh per hour
basic_prediction = remaining_capacity / current_consumption_mah if current_consumption_mah > 0 else 24
# Adjust based on usage patterns
pattern_factor = self._calculate_pattern_adjustment(patterns, state)
adjusted_prediction = basic_prediction * pattern_factor
# Calculate confidence
confidence = self._calculate_confidence(patterns, usage_history,
state)
# Generate recommendations
recommendations = self._generate_recommendations(state,
current_power)
# Calculate low power mode prediction
low_power_consumption = current_consumption_mah * 0.7 # 30% reduction
low_power_prediction = remaining_capacity / low_power_consumption if low_power_consumption > 0 else 36
# Generate detailed forecast
detailed_forecast = self._generate_detailed_forecast(state,
current_power, patterns, 12)
# Power breakdown as percentages
total_power = current_power.total()
power_breakdown = {
"screen": round((current_power.screen / total_power) * 100, 1),
"apps": round((current_power.apps / total_power) * 100, 1),
"cellular": round((current_power.cellular / total_power) * 100,
1),
"wifi": round((current_power.wifi / total_power) * 100, 1),
"cpu": round((current_power.cpu / total_power) * 100, 1),
"system": round((current_power.system / total_power) * 100, 1)
}
return BatteryPrediction(
predicted_hours=round(adjusted_prediction, 1),
confidence=round(confidence, 2),
power_breakdown=power_breakdown,
recommendations=recommendations,
low_power_mode_hours=round(low_power_prediction, 1),
detailed_forecast=detailed_forecast
)
def _calculate_pattern_adjustment(self, patterns: Dict, state:
DeviceState) -> float:
"""Calculate adjustment factor based on usage patterns"""
current_hour = int((time.time() % 86400) / 3600)
# Get expected usage for current hour
hourly_usage = patterns.get("hourly_usage", {})
expected_usage = hourly_usage.get(current_hour, 3600) # Default 1 hour
# Adjust based on expected vs typical usage
if expected_usage > 7200: # Heavy usage period (>2 hours)
return 0.8
elif expected_usage < 1800: # Light usage period (<30 minutes)
return 1.2
else:
return 1.0
def _calculate_confidence(self, patterns: Dict, history:
List[UsageRecord],
state: DeviceState) -> float:
"""Calculate prediction confidence"""
factors = {}
# History length factor
history_score = min(1.0, len(history) / 100) # Full confidence with 100+ records
factors["history_length"] = history_score
# Pattern consistency factor
factors["pattern_consistency"] = patterns.get("pattern_consistency",
0.5)
# Recent accuracy factor (simplified)
factors["recent_accuracy"] = 0.8 # Would be calculated from recent
predictions
# Device health factor
health_score = state.battery_health / 100
factors["device_health"] = health_score
# Calculate weighted confidence
confidence = sum(
factors[factor] * weight
for factor, weight in self.confidence_factors.items()
)
return max(0.1, min(1.0, confidence))
def _generate_recommendations(self, state: DeviceState,
power: PowerConsumption) -> List[str]:
"""Generate power saving recommendations"""
recommendations = []
# Screen brightness recommendation
if state.screen_brightness > 70:
recommendations.append("Reduce screen brightness to save
battery")
# Background apps recommendation
if len(state.background_apps) > 5:
recommendations.append("Close unnecessary background apps")
# Low power mode recommendation
if not state.low_power_mode and state.battery_level < 30:
recommendations.append("Enable Low Power Mode")
# WiFi recommendation
if not state.wifi_enabled and state.cellular_signal < 2:
recommendations.append("Connect to WiFi to reduce cellular power
usage")
# App-specific recommendations
active_category = self.categorize_app(state.active_app)
if active_category in [AppCategory.GAMING, AppCategory.CAMERA]:
recommendations.append(f"Consider limiting {state.active_app}
usage for better battery life")
# Temperature recommendation
if state.temperature > 35:
recommendations.append("Device is warm - consider cooling down
for better battery performance")
return recommendations[:4] # Limit to top 4 recommendations
def _generate_detailed_forecast(self, state: DeviceState, power:
PowerConsumption,
patterns: Dict, hours: int) -> List[Dict]:
"""Generate detailed hourly forecast"""
forecast = []
current_battery = state.battery_level
current_hour = int((time.time() % 86400) / 3600)
hourly_usage = patterns.get("hourly_usage", {})
base_consumption = power.total() / 1000 * 60 # mAh per hour
for hour_offset in range(hours):
forecast_hour = (current_hour + hour_offset) % 24
# Adjust consumption based on expected usage
expected_usage = hourly_usage.get(forecast_hour, 3600)
usage_factor = min(2.0, expected_usage / 3600) # Cap at 2x normal
hour_consumption = base_consumption * usage_factor
battery_capacity = self.device_specs[state.device_type]["battery_capacity"]
# Calculate battery drain percentage
drain_percent = (hour_consumption / battery_capacity) * 100 * (100 / state.battery_health)
current_battery = max(0, current_battery - drain_percent)
forecast.append({
"hour": forecast_hour,
"battery_level": round(current_battery, 1),
"consumption_mah": round(hour_consumption, 1),
"usage_factor": round(usage_factor, 2)
})
if current_battery <= 0:
break
return forecast
def update_usage_history(self, record: UsageRecord):
"""Update usage history with new record"""
self.usage_history.append(record)
def get_battery_health_analysis(self, state: DeviceState,
history: List[UsageRecord]) -> Dict:
"""Analyze battery health trends"""
if len(history) < 10:
return {"health_trend": "insufficient_data", "degradation_rate":
0}
# Analyze charging cycles and degradation
charging_cycles = 0
last_battery = 100
for record in history:
if record.battery_level > last_battery + 10: # Likely charged
charging_cycles += 1
last_battery = record.battery_level
# Estimate degradation rate (simplified)
expected_cycles = 500 # Typical iPhone battery cycle life
degradation_rate = max(0, (charging_cycles / expected_cycles) * 20)
# % degradation
health_trend = "good"
if state.battery_health < 80:
health_trend = "poor"
elif state.battery_health < 90:
health_trend = "fair"
return {
"health_trend": health_trend,
"degradation_rate": round(degradation_rate, 1),
"estimated_cycles": charging_cycles,
"replacement_recommended": state.battery_health < 80
}
# Test the battery life predictor
def test_battery_predictor():
predictor = BatteryLifePredictor()
# Create device state
device_state = DeviceState(
battery_level=65,
battery_health=85,
device_type=DeviceType.IPHONE_14,
screen_brightness=80,
wifi_enabled=True,
cellular_signal=3,
background_apps=["Maps", "Music", "Messages"],
active_app="Camera",
cpu_usage=45,
memory_usage=60,
temperature=28
)
# Create usage history
usage_history = [
UsageRecord(time.time() - 7200, 100, "Safari", 1800, True, 50, 30),
UsageRecord(time.time() - 5400, 95, "Camera", 600, True, 10, 60),
UsageRecord(time.time() - 4800, 90, "Messages", 300, True, 5, 20),
UsageRecord(time.time() - 3600, 85, "Music", 1200, False, 20, 25),
UsageRecord(time.time() - 1800, 75, "Maps", 900, True, 30, 40)
]
print("Testing Battery Life Predictor:")
print(f"Device: {device_state.device_type.value}")
print(f"Current battery: {device_state.battery_level}%")
print(f"Battery health: {device_state.battery_health}%")
# Predict battery life
prediction = predictor.predict_battery_life(device_state, usage_history)
print(f"\nPrediction Results:")
print(f" Predicted hours: {prediction.predicted_hours}")
print(f" Confidence: {prediction.confidence:.1%}")
print(f" Low power mode hours: {prediction.low_power_mode_hours}")
print(f"\nPower Breakdown:")
for component, percentage in prediction.power_breakdown.items():
print(f" {component}: {percentage}%")
print(f"\nRecommendations:")
for rec in prediction.recommendations:
print(f" - {rec}")
print(f"\nDetailed Forecast (next 6 hours):")
for hour_data in prediction.detailed_forecast[:6]:
print(f" Hour {hour_data['hour']:02d}: {hour_data['battery_level']}%
"
f"(consumption: {hour_data['consumption_mah']}mAh)")
# Test battery health analysis
health_analysis = predictor.get_battery_health_analysis(device_state,
usage_history)
print(f"\nBattery Health Analysis:")
print(f" Health trend: {health_analysis['health_trend']}")
print(f" Estimated cycles: {health_analysis['estimated_cycles']}")
print(f" Replacement recommended:
{health_analysis['replacement_recommended']}")
test_battery_predictor()
Key Insights:
• Machine learning-based prediction using historical usage patterns
• Real-time power consumption modeling for all device components
• Intelligent recommendations based on current usage and device state
• Confidence scoring based on data quality and pattern consistency
• Detailed forecasting with hourly battery level predictions
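Underneath the modeling, the prediction reduces to one ratio: remaining usable capacity divided by the current draw. A minimal sketch with assumed numbers; the 3.8 V nominal voltage used to convert mW to mA is an illustrative assumption, and the full predictor above models the draw per component instead of taking it as a single input.
Python
def remaining_hours(battery_level_pct: float, battery_health_pct: float,
                    capacity_mah: float, draw_mw: float,
                    nominal_voltage: float = 3.8) -> float:
    """Hours of battery left at the current draw (rough single-ratio estimate)."""
    usable_mah = capacity_mah * (battery_level_pct / 100) * (battery_health_pct / 100)
    draw_ma = draw_mw / nominal_voltage
    return usable_mah / draw_ma if draw_ma > 0 else float("inf")

# 65% charge, 85% health, 3279 mAh pack (the iPhone_14 spec above), ~900 mW draw
print(round(remaining_hours(65, 85, 3279, 900), 1))  # roughly 7.6 hours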
Problem 24: Photo Library Organization
Difficulty: Medium | Time Limit: 45 minutes | Company: Apple
Problem Statement:
Design an intelligent photo library organization system for iOS Photos app that:
1. Automatically categorizes photos by content, location, and people
2. Creates smart albums and memories
3. Detects and removes duplicate photos
4. Optimizes storage with intelligent compression
5. Provides fast search and retrieval capabilities
Example:
Plain Text
Input:
photos = [
{"id": "p1", "timestamp": 1640995200, "location": "Paris", "faces":
["Alice", "Bob"],
"objects": ["tower", "sky"], "size_mb": 4.2, "resolution": "4032x3024"},
{"id": "p2", "timestamp": 1640995260, "location": "Paris", "faces":
["Alice"],
"objects": ["tower", "sky"], "size_mb": 4.1, "resolution": "4032x3024"}
]
Output: {
"albums": [{"name": "Paris Trip", "photos": ["p1", "p2"], "type":
"location"}],
"duplicates": [{"original": "p1", "duplicate": "p2", "similarity": 0.95}],
"storage_saved": 2.1, "search_index": {"tower": ["p1", "p2"], "Alice":
["p1", "p2"]}
}
Solution Approach:
This problem requires computer vision, machine learning, and efficient data structures for
photo management.
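Before the full organizer below, the duplicate-detection idea can be sketched from metadata alone: photos taken seconds apart with the same people, objects, and resolution are very likely duplicates. The weights and the 0.85 threshold here are illustrative assumptions, not the exact similarity method used in the solution.
Python
def duplicate_score(p1: dict, p2: dict) -> float:
    """Hypothetical near-duplicate score built only from photo metadata."""
    time_close = 1.0 if abs(p1["timestamp"] - p2["timestamp"]) <= 120 else 0.0
    faces1, faces2 = set(p1["faces"]), set(p2["faces"])
    objs1, objs2 = set(p1["objects"]), set(p2["objects"])
    face_overlap = len(faces1 & faces2) / max(1, len(faces1 | faces2))
    obj_overlap = len(objs1 & objs2) / max(1, len(objs1 | objs2))
    same_res = 1.0 if p1["resolution"] == p2["resolution"] else 0.0
    return 0.3 * time_close + 0.25 * face_overlap + 0.25 * obj_overlap + 0.2 * same_res

p1 = {"timestamp": 1640995200, "faces": ["Alice", "Bob"],
      "objects": ["tower", "sky"], "resolution": "4032x3024"}
p2 = {"timestamp": 1640995260, "faces": ["Alice"],
      "objects": ["tower", "sky"], "resolution": "4032x3024"}

score = duplicate_score(p1, p2)
print(round(score, 2), score >= 0.85)  # high overlap -> flagged for review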
Python
import time
import hashlib
import math
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, Counter
import heapq
class PhotoCategory(Enum):
PEOPLE = "people"
PLACES = "places"
OBJECTS = "objects"
EVENTS = "events"
FAVORITES = "favorites"
SCREENSHOTS = "screenshots"
SELFIES = "selfies"
class AlbumType(Enum):
SMART = "smart"
MANUAL = "manual"
MEMORY = "memory"
SHARED = "shared"
@dataclass
class Photo:
photo_id: str
timestamp: float
file_path: str
size_mb: float
resolution: str
location: Optional[str] = None
faces: List[str] = field(default_factory=list)
objects: List[str] = field(default_factory=list)
scene: Optional[str] = None
is_favorite: bool = False
is_screenshot: bool = False
metadata: Dict = field(default_factory=dict)
def get_width_height(self) -> Tuple[int, int]:
if 'x' in self.resolution:
w, h = self.resolution.split('x')
return int(w), int(h)
return 0, 0
@dataclass
class Album:
album_id: str
name: str
album_type: AlbumType
photos: List[str] = field(default_factory=list)
criteria: Dict = field(default_factory=dict)
created_at: float = field(default_factory=time.time)
cover_photo: Optional[str] = None
@dataclass
class DuplicateGroup:
original_photo: str
duplicates: List[str]
similarity_scores: Dict[str, float]
storage_savings: float
@dataclass
class Memory:
memory_id: str
title: str
photos: List[str]
date_range: Tuple[float, float]
location: Optional[str] = None
key_people: List[str] = field(default_factory=list)
memory_type: str = "trip"
class PhotoLibraryOrganizer:
def __init__(self):
self.photos: Dict[str, Photo] = {}
self.albums: Dict[str, Album] = {}
self.search_index: Dict[str, Set[str]] = defaultdict(set)
self.face_index: Dict[str, Set[str]] = defaultdict(set)
self.location_index: Dict[str, Set[str]] = defaultdict(set)
self.date_index: Dict[str, Set[str]] = defaultdict(set)  # keys in YYYY-MM format
# Duplicate detection
self.photo_hashes: Dict[str, str] = {}
self.similarity_threshold = 0.85
# Smart album criteria
self.smart_album_rules = {
"Recent": {"days": 30},
"Favorites": {"is_favorite": True},
"Screenshots": {"is_screenshot": True},
"Selfies": {"min_faces": 1, "has_front_camera": True},
"People": {"min_faces": 1},
"Landscapes": {"objects": ["mountain", "sky", "water",
"sunset"]},
"Food": {"objects": ["food", "restaurant", "meal"]},
"Pets": {"objects": ["dog", "cat", "pet"]},
"Travel": {"has_location": True, "location_variety": True}
}
def add_photo(self, photo: Photo) -> bool:
"""Add a photo to the library"""
if photo.photo_id in self.photos:
return False
self.photos[photo.photo_id] = photo
self._update_search_index(photo)
self._update_face_index(photo)
self._update_location_index(photo)
self._update_date_index(photo)
self._calculate_photo_hash(photo)
return True
def _update_search_index(self, photo: Photo):
"""Update search index with photo metadata"""
# Index objects
for obj in photo.objects:
self.search_index[obj.lower()].add(photo.photo_id)
# Index location
if photo.location:
self.search_index[photo.location.lower()].add(photo.photo_id)
# Index scene
if photo.scene:
self.search_index[photo.scene.lower()].add(photo.photo_id)
# Index special categories
if photo.is_favorite:
self.search_index["favorite"].add(photo.photo_id)
if photo.is_screenshot:
self.search_index["screenshot"].add(photo.photo_id)
def _update_face_index(self, photo: Photo):
"""Update face recognition index"""
for face in photo.faces:
self.face_index[face.lower()].add(photo.photo_id)
self.search_index[face.lower()].add(photo.photo_id)
def _update_location_index(self, photo: Photo):
"""Update location-based index"""
if photo.location:
self.location_index[photo.location.lower()].add(photo.photo_id)
def _update_date_index(self, photo: Photo):
"""Update date-based index"""
date_key = time.strftime("%Y-%m", time.localtime(photo.timestamp))
self.date_index[date_key].add(photo.photo_id)
def _calculate_photo_hash(self, photo: Photo):
"""Calculate perceptual hash for duplicate detection"""
# Simplified hash based on metadata (a real implementation would use image content)
content = f"{photo.resolution}_{photo.size_mb}_{len(photo.objects)}_{len(photo.faces)}"
photo_hash = hashlib.md5(content.encode()).hexdigest()[:16]
self.photo_hashes[photo.photo_id] = photo_hash
def detect_duplicates(self) -> List[DuplicateGroup]:
"""Detect duplicate and similar photos"""
duplicate_groups = []
processed_photos = set()
# Group photos by hash for exact duplicates
hash_groups = defaultdict(list)
for photo_id, photo_hash in self.photo_hashes.items():
hash_groups[photo_hash].append(photo_id)
# Process each hash group
for photo_hash, photo_ids in hash_groups.items():
if len(photo_ids) > 1 and photo_ids[0] not in processed_photos:
# Sort by timestamp to keep the earliest as original
sorted_photos = sorted(photo_ids, key=lambda pid:
self.photos[pid].timestamp)
original = sorted_photos[0]
duplicates = sorted_photos[1:]
# Calculate similarity scores and storage savings
similarity_scores = {}
total_savings = 0
for dup_id in duplicates:
similarity = self._calculate_similarity(original, dup_id)
if similarity >= self.similarity_threshold:
similarity_scores[dup_id] = similarity
total_savings += self.photos[dup_id].size_mb
processed_photos.add(dup_id)
if similarity_scores:
duplicate_groups.append(DuplicateGroup(
original_photo=original,
duplicates=list(similarity_scores.keys()),
similarity_scores=similarity_scores,
storage_savings=total_savings
))
processed_photos.add(original)
return duplicate_groups
def _calculate_similarity(self, photo1_id: str, photo2_id: str) -> float:
"""Calculate similarity between two photos"""
photo1 = self.photos[photo1_id]
photo2 = self.photos[photo2_id]
similarity_factors = []
# Resolution similarity
w1, h1 = photo1.get_width_height()
w2, h2 = photo2.get_width_height()
if w1 > 0 and w2 > 0:
res_similarity = 1 - abs((w1 * h1) - (w2 * h2)) / max(w1 * h1, w2
* h2)
similarity_factors.append(res_similarity * 0.3)
# Size similarity
size_diff = abs(photo1.size_mb - photo2.size_mb) / max(photo1.size_mb, photo2.size_mb)
size_similarity = 1 - size_diff
similarity_factors.append(size_similarity * 0.2)
# Time proximity (photos taken within 5 minutes are likely similar)
time_diff = abs(photo1.timestamp - photo2.timestamp)
time_similarity = max(0, 1 - time_diff / 300)  # 5 minutes = 300 seconds
# Location similarity
if photo1.location and photo2.location:
location_similarity = 1.0 if photo1.location == photo2.location else 0.5
similarity_factors.append(location_similarity * 0.15)
# Face similarity
common_faces = set(photo1.faces) & set(photo2.faces)
total_faces = set(photo1.faces) | set(photo2.faces)
face_similarity = len(common_faces) / len(total_faces) if total_faces else 0
similarity_factors.append(face_similarity * 0.15)
return sum(similarity_factors)
def create_smart_albums(self) -> List[Album]:
"""Create smart albums based on predefined rules"""
smart_albums = []
for album_name, criteria in self.smart_album_rules.items():
matching_photos = self._find_photos_by_criteria(criteria)
if matching_photos:
album = Album(
album_id=f"smart_{album_name.lower()}",
name=album_name,
album_type=AlbumType.SMART,
photos=matching_photos,
criteria=criteria,
cover_photo=matching_photos[0] if matching_photos else
None
)
smart_albums.append(album)
self.albums[album.album_id] = album
return smart_albums
def _find_photos_by_criteria(self, criteria: Dict) -> List[str]:
"""Find photos matching specific criteria"""
matching_photos = set(self.photos.keys())
# Filter by days (recent photos)
if "days" in criteria:
cutoff_time = time.time() - (criteria["days"] * 24 * 3600)
matching_photos = {
pid for pid in matching_photos
if self.photos[pid].timestamp >= cutoff_time
}
# Filter by favorite status
if "is_favorite" in criteria:
matching_photos = {
pid for pid in matching_photos
if self.photos[pid].is_favorite == criteria["is_favorite"]
}
# Filter by screenshot status
if "is_screenshot" in criteria:
matching_photos = {
pid for pid in matching_photos
if self.photos[pid].is_screenshot ==
criteria["is_screenshot"]
}
# Filter by minimum faces
if "min_faces" in criteria:
matching_photos = {
pid for pid in matching_photos
if len(self.photos[pid].faces) >= criteria["min_faces"]
}
# Filter by objects
if "objects" in criteria:
object_matches = set()
for obj in criteria["objects"]:
object_matches.update(self.search_index.get(obj.lower(),
set()))
matching_photos = matching_photos & object_matches
# Filter by location presence
if "has_location" in criteria and criteria["has_location"]:
matching_photos = {
pid for pid in matching_photos
if self.photos[pid].location is not None
}
# Sort by timestamp (newest first)
sorted_photos = sorted(
matching_photos,
key=lambda pid: self.photos[pid].timestamp,
reverse=True
)
return sorted_photos
def create_memories(self) -> List[Memory]:
"""Create memories based on photo clusters"""
memories = []
# Group photos by location and time
location_groups = self._group_photos_by_location_and_time()
for (location, date_range), photo_ids in location_groups.items():
if len(photo_ids) >= 5: # Minimum photos for a memory
# Find key people in this memory
people_counter = Counter()
for photo_id in photo_ids:
people_counter.update(self.photos[photo_id].faces)
key_people = [person for person, count in
people_counter.most_common(3)]
# Generate memory title
title = self._generate_memory_title(location, date_range,
key_people)
memory = Memory(
memory_id=f"memory_{int(time.time())}_{len(memories)}",
title=title,
photos=photo_ids,
date_range=date_range,
location=location,
key_people=key_people
)
memories.append(memory)
return memories
def _group_photos_by_location_and_time(self) -> Dict[Tuple[str,
Tuple[float, float]], List[str]]:
"""Group photos by location and time periods"""
location_time_groups = defaultdict(list)
# Sort photos by timestamp
sorted_photos = sorted(self.photos.items(), key=lambda x:
x[1].timestamp)
current_group = []
current_location = None
group_start_time = None
for photo_id, photo in sorted_photos:
# Check if this photo belongs to current group
if (current_location == photo.location and group_start_time and
        photo.timestamp - group_start_time <= 7 * 24 * 3600):  # within 7 days
    current_group.append(photo_id)
else:
# Save current group if it has enough photos
if len(current_group) >= 5 and current_location:
    group_end_time = self.photos[current_group[-1]].timestamp
    date_range = (group_start_time, group_end_time)
    location_time_groups[(current_location, date_range)] = current_group.copy()
# Start new group
current_group = [photo_id]
current_location = photo.location
group_start_time = photo.timestamp
# Don't forget the last group
if len(current_group) >= 5 and current_location:
    group_end_time = sorted_photos[-1][1].timestamp
    date_range = (group_start_time, group_end_time)
    location_time_groups[(current_location, date_range)] = current_group
return dict(location_time_groups)
def _generate_memory_title(self, location: str, date_range: Tuple[float,
float],
key_people: List[str]) -> str:
"""Generate a descriptive title for a memory"""
start_date = time.strftime("%B %Y", time.localtime(date_range[0]))
if location:
base_title = f"{location} Trip"
else:
base_title = f"Memory from {start_date}"
if key_people:
if len(key_people) == 1:
base_title += f" with {key_people[0]}"
elif len(key_people) == 2:
base_title += f" with {key_people[0]} and {key_people[1]}"
else:
base_title += f" with {key_people[0]} and others"
return base_title
def search_photos(self, query: str, limit: int = 50) -> List[str]:
"""Search photos by query"""
query_lower = query.lower()
matching_photos = set()
# Search in index
if query_lower in self.search_index:
matching_photos.update(self.search_index[query_lower])
# Fuzzy search for partial matches
for term in self.search_index:
if query_lower in term or term in query_lower:
matching_photos.update(self.search_index[term])
# Sort by relevance (timestamp for now, could be more sophisticated)
sorted_results = sorted(
matching_photos,
key=lambda pid: self.photos[pid].timestamp,
reverse=True
)
return sorted_results[:limit]
def optimize_storage(self, target_compression: float = 0.8) -> Dict:
"""Optimize storage by compressing photos intelligently"""
optimization_results = {
"original_size": 0,
"compressed_size": 0,
"savings_mb": 0,
"photos_processed": 0
}
# Calculate original total size
total_size = sum(photo.size_mb for photo in self.photos.values())
optimization_results["original_size"] = total_size
# Identify photos for compression (larger, older photos first)
compression_candidates = []
for photo_id, photo in self.photos.items():
# Skip favorites and recent photos from aggressive compression
if photo.is_favorite or (time.time() - photo.timestamp) < 30 * 24 * 3600:
    continue
# Calculate compression priority (larger, older photos get higher priority)
age_factor = (time.time() - photo.timestamp) / (365 * 24 * 3600)  # years
size_factor = photo.size_mb / 10 # Normalize to ~10MB
priority = age_factor * size_factor
compression_candidates.append((priority, photo_id, photo))
# Sort by priority (highest first); sort on the priority value only to avoid comparing Photo objects on ties
compression_candidates.sort(key=lambda c: c[0], reverse=True)
# Compress photos until target is reached
compressed_size = total_size
for priority, photo_id, photo in compression_candidates:
if compressed_size / total_size <= target_compression:
break
# Simulate compression (reduce size by 20-60% depending on original size)
if photo.size_mb > 5:
compression_ratio = 0.4 # 60% reduction for large photos
elif photo.size_mb > 2:
compression_ratio = 0.6 # 40% reduction for medium photos
else:
compression_ratio = 0.8 # 20% reduction for small photos
new_size = photo.size_mb * compression_ratio
savings = photo.size_mb - new_size
compressed_size -= savings
optimization_results["photos_processed"] += 1
optimization_results["compressed_size"] = compressed_size
optimization_results["savings_mb"] = total_size - compressed_size
return optimization_results
def get_library_statistics(self) -> Dict:
"""Get comprehensive library statistics"""
total_photos = len(self.photos)
total_size = sum(photo.size_mb for photo in self.photos.values())
# Count by category
favorites = sum(1 for photo in self.photos.values() if
photo.is_favorite)
screenshots = sum(1 for photo in self.photos.values() if
photo.is_screenshot)
with_faces = sum(1 for photo in self.photos.values() if photo.faces)
with_location = sum(1 for photo in self.photos.values() if
photo.location)
# Most common objects and people
object_counter = Counter()
people_counter = Counter()
for photo in self.photos.values():
object_counter.update(photo.objects)
people_counter.update(photo.faces)
return {
"total_photos": total_photos,
"total_size_mb": round(total_size, 1),
"favorites": favorites,
"screenshots": screenshots,
"photos_with_faces": with_faces,
"photos_with_location": with_location,
"smart_albums": len([a for a in self.albums.values() if
a.album_type == AlbumType.SMART]),
"top_objects": object_counter.most_common(5),
"top_people": people_counter.most_common(5),
"date_range": self._get_date_range()
}
def _get_date_range(self) -> Tuple[str, str]:
"""Get the date range of photos in library"""
if not self.photos:
return ("", "")
timestamps = [photo.timestamp for photo in self.photos.values()]
earliest = time.strftime("%Y-%m-%d", time.localtime(min(timestamps)))
latest = time.strftime("%Y-%m-%d", time.localtime(max(timestamps)))
return (earliest, latest)
# Test the photo library organizer
def test_photo_organizer():
organizer = PhotoLibraryOrganizer()
# Add sample photos
photos = [
Photo("p1", time.time() - 86400, "/photos/p1.jpg", 4.2, "4032x3024",
"Paris", ["Alice", "Bob"], ["tower", "sky", "architecture"]),
Photo("p2", time.time() - 86340, "/photos/p2.jpg", 4.1, "4032x3024",
"Paris", ["Alice"], ["tower", "sky"], is_favorite=True),
Photo("p3", time.time() - 3600, "/photos/p3.jpg", 2.1, "1920x1080",
None, ["Charlie"], ["food", "restaurant"], is_screenshot=True),
Photo("p4", time.time() - 7200, "/photos/p4.jpg", 3.8, "3024x4032",
"New York", ["Alice", "Charlie"], ["building", "city"]),
Photo("p5", time.time() - 1800, "/photos/p5.jpg", 1.5, "1080x1920",
None, ["Alice"], ["selfie", "smile"], scene="selfie")
]
for photo in photos:
organizer.add_photo(photo)
print("Testing Photo Library Organizer:")
print(f"Added {len(photos)} photos")
# Test duplicate detection
duplicates = organizer.detect_duplicates()
print(f"\nDuplicate Detection:")
for dup_group in duplicates:
print(f" Original: {dup_group.original_photo}")
print(f" Duplicates: {dup_group.duplicates}")
print(f" Storage savings: {dup_group.storage_savings:.1f}MB")
# Test smart albums
smart_albums = organizer.create_smart_albums()
print(f"\nSmart Albums Created: {len(smart_albums)}")
for album in smart_albums:
print(f" {album.name}: {len(album.photos)} photos")
# Test memories
memories = organizer.create_memories()
print(f"\nMemories Created: {len(memories)}")
for memory in memories:
print(f" {memory.title}: {len(memory.photos)} photos")
# Test search
search_results = organizer.search_photos("Alice")
print(f"\nSearch Results for 'Alice': {len(search_results)} photos")
# Test storage optimization
optimization = organizer.optimize_storage(0.7)
print(f"\nStorage Optimization:")
print(f" Original size: {optimization['original_size']:.1f}MB")
print(f" Compressed size: {optimization['compressed_size']:.1f}MB")
print(f" Savings: {optimization['savings_mb']:.1f}MB")
print(f" Photos processed: {optimization['photos_processed']}")
# Test library statistics
stats = organizer.get_library_statistics()
print(f"\nLibrary Statistics:")
print(f" Total photos: {stats['total_photos']}")
print(f" Total size: {stats['total_size_mb']}MB")
print(f" Favorites: {stats['favorites']}")
print(f" With faces: {stats['photos_with_faces']}")
print(f" Date range: {stats['date_range'][0]} to {stats['date_range']
[1]}")
test_photo_organizer()
Key Insights:
• Intelligent categorization using computer vision and metadata analysis
• Efficient duplicate detection with perceptual hashing and similarity scoring
• Smart album creation based on content, people, and usage patterns
• Memory generation through clustering of related photos
• Storage optimization with intelligent compression strategies
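As a quick sanity check of the weighting scheme in _calculate_similarity, here is the arithmetic for the p1/p2 pair from the example, reproduced as a standalone snippet. The weights 0.3/0.2/0.2/0.15/0.15 and the 0.85 threshold come from the solution above; the exact score shown in the example output depends on whatever weights a production system would use.
Python
# Worked similarity for the example pair p1/p2 using the weights from the solution above.
res_sim  = 1 - abs(4032 * 3024 - 4032 * 3024) / (4032 * 3024)   # identical resolution -> 1.0
size_sim = 1 - abs(4.2 - 4.1) / max(4.2, 4.1)                   # ~0.976
time_sim = max(0, 1 - 60 / 300)                                  # taken 60 s apart -> 0.8
loc_sim  = 1.0                                                   # both "Paris"
face_sim = len({"Alice"}) / len({"Alice", "Bob"})                # shared faces -> 0.5

score = 0.3 * res_sim + 0.2 * size_sim + 0.2 * time_sim + 0.15 * loc_sim + 0.15 * face_sim
print(round(score, 2))  # ~0.88, comfortably above the 0.85 duplicate threshold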
Problem 25: Siri Voice Command Processing
Difficulty: Hard | Time Limit: 60 minutes | Company: Apple
Problem Statement:
Design a voice command processing system for Siri that:
1. Processes natural language voice commands in real-time
2. Handles multiple languages and accents
3. Maintains conversation context and follow-up queries
4. Integrates with various iOS apps and services
5. Provides intelligent responses with confidence scoring
Example:
Plain Text
Input:
voice_command = {
  "audio_data": "base64_encoded_audio", "language": "en-US",
  "user_id": "user123", "device_type": "iPhone", "context": "home_screen"
}
conversation_history = [
  {"user": "Set a timer for 10 minutes", "siri": "Timer set for 10 minutes", "timestamp": 1640995200},
  {"user": "What's the weather like?", "siri": "It's 72°F and sunny", "timestamp": 1640995260}
]
Output: {
  "transcription": "Remind me to call mom at 3 PM",
  "intent": "create_reminder", "confidence": 0.92,
  "entities": {"contact": "mom", "time": "3 PM", "action": "call"},
  "response": "I'll remind you to call mom at 3 PM today",
  "actions": [{"type": "create_reminder", "params": {"title": "Call mom", "time": "15:00"}}]
}
Solution Approach:
This problem requires natural language processing, speech recognition, and intelligent
response generation.
Python
import time
import re
import json
from typing import List, Dict, Tuple, Optional, Any
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque, Counter
import math
class Language(Enum):
EN_US = "en-US"
EN_UK = "en-UK"
ES_ES = "es-ES"
FR_FR = "fr-FR"
DE_DE = "de-DE"
ZH_CN = "zh-CN"
JA_JP = "ja-JP"
class IntentType(Enum):
TIMER = "timer"
REMINDER = "reminder"
WEATHER = "weather"
MUSIC = "music"
CALL = "call"
MESSAGE = "message"
CALENDAR = "calendar"
SEARCH = "search"
NAVIGATION = "navigation"
SMART_HOME = "smart_home"
APP_LAUNCH = "app_launch"
QUESTION = "question"
UNKNOWN = "unknown"
class EntityType(Enum):
TIME = "time"
DATE = "date"
DURATION = "duration"
CONTACT = "contact"
LOCATION = "location"
APP_NAME = "app_name"
MUSIC_ARTIST = "music_artist"
MUSIC_SONG = "music_song"
DEVICE_NAME = "device_name"
NUMBER = "number"
@dataclass
class VoiceCommand:
audio_data: str
language: Language
user_id: str
device_type: str
context: str
timestamp: float = field(default_factory=time.time)
noise_level: float = 0.0
confidence_threshold: float = 0.7
@dataclass
class Entity:
entity_type: EntityType
value: str
confidence: float
start_pos: int
end_pos: int
normalized_value: Any = None
@dataclass
class Intent:
intent_type: IntentType
confidence: float
entities: List[Entity]
raw_text: str
@dataclass
class ConversationTurn:
user_input: str
siri_response: str
intent: Intent
timestamp: float
context: Dict = field(default_factory=dict)
@dataclass
class SiriResponse:
transcription: str
intent: IntentType
confidence: float
entities: Dict[str, Any]
response_text: str
actions: List[Dict]
follow_up_suggestions: List[str] = field(default_factory=list)
requires_confirmation: bool = False
class SiriVoiceProcessor:
def __init__(self):
# Intent patterns for different languages
self.intent_patterns = {
Language.EN_US: {
IntentType.TIMER: [
    r"set (?:a )?timer (?:for )?(\d+) ?(minutes?|hours?|seconds?)",
    r"timer (?:for )?(\d+) ?(minutes?|hours?|seconds?)",
    r"start (?:a )?(\d+) ?(minute|hour|second) timer"
],
IntentType.REMINDER: [
r"remind me to (.+?) (?:at|in) (.+)",
r"set (?:a )?reminder (?:to )?(.+?) (?:at|for) (.+)",
r"create (?:a )?reminder (.+)"
],
IntentType.WEATHER: [
r"what'?s the weather (?:like )?(?:in (.+?))?",
r"weather (?:in (.+?))?",
r"how'?s the weather (?:in (.+?))?"
],
IntentType.MUSIC: [
r"play (.+?) by (.+)",
r"play (.+)",
r"start playing (.+)",
r"put on (.+)"
],
IntentType.CALL: [
r"call (.+)",
r"phone (.+)",
r"dial (.+)"
],
IntentType.MESSAGE: [
r"send (?:a )?(?:text|message) to (.+?) saying (.+)",
r"text (.+?) (.+)",
r"message (.+?) (.+)"
]
}
}
# Entity extraction patterns
self.entity_patterns = {
EntityType.TIME: [
r"(\d{1,2}):(\d{2})\s*(?:AM|PM)?",
r"(\d{1,2})\s*(AM|PM)",
r"(noon|midnight)",
r"(\d{1,2})\s*o'?clock"
],
EntityType.DURATION: [
r"(\d+)\s*(minutes?|hours?|seconds?|days?)",
r"(half|quarter)\s*(hour|minute)",
r"(\d+)\s*and\s*(\d+)\s*(minutes?|hours?)"
],
EntityType.DATE: [
    r"(today|tomorrow|yesterday)",
    r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)",
    r"(\d{1,2})/(\d{1,2})/(\d{4})",
    r"(january|february|march|april|may|june|july|august|september|october|november|december)\s*(\d{1,2})"
],
EntityType.CONTACT: [
    r"(?:call|text|message|phone)\s+(.+?)(?:\s+(?:saying|that|about)|\s*$)",
    r"(?:to|for)\s+(.+?)(?:\s+(?:saying|that|about)|\s*$)"
]
}
# Conversation context
self.conversation_history: Dict[str, deque] = defaultdict(lambda:
deque(maxlen=10))
self.user_preferences: Dict[str, Dict] = {}
# Response templates
self.response_templates = {
IntentType.TIMER: [
"Timer set for {duration}",
"I'll let you know in {duration}",
"{duration} timer started"
],
IntentType.REMINDER: [
"I'll remind you to {task} {time}",
"Reminder set for {task} {time}",
"Got it, I'll remind you to {task} {time}"
],
IntentType.WEATHER: [
    "It's {temperature} and {condition} in {location}",
    "The weather in {location} is {condition} with a temperature of {temperature}",
    "Currently {temperature} and {condition}"
],
IntentType.MUSIC: [
"Playing {song} by {artist}",
"Now playing {song}",
"Starting {song}"
],
IntentType.CALL: [
"Calling {contact}",
"Dialing {contact}",
"Placing call to {contact}"
]
}
# App integration mappings
self.app_integrations = {
"timer": "Clock",
"reminder": "Reminders",
"weather": "Weather",
"music": "Music",
"call": "Phone",
"message": "Messages",
"calendar": "Calendar",
"navigation": "Maps"
}
def process_voice_command(self, command: VoiceCommand,
conversation_history: List[ConversationTurn] =
None) -> SiriResponse:
"""Process a voice command and generate response"""
# Step 1: Speech-to-text transcription
transcription = self._transcribe_audio(command)
if not transcription or len(transcription.strip()) < 2:
    return self._create_error_response("I didn't catch that. Could you repeat?")
# Step 2: Intent recognition
intent = self._recognize_intent(transcription, command.language)
# Step 3: Entity extraction
entities = self._extract_entities(transcription, intent.intent_type)
# Step 4: Context resolution
resolved_intent = self._resolve_context(intent, entities,
conversation_history, command.user_id)
# Step 5: Generate response
response = self._generate_response(resolved_intent, entities,
command)
# Step 6: Update conversation history
self._update_conversation_history(command.user_id, transcription,
response, resolved_intent)
return response
def _transcribe_audio(self, command: VoiceCommand) -> str:
"""Transcribe audio to text (simplified simulation)"""
# In real implementation, this would use speech recognition APIs
# For simulation, we'll extract from a predefined mapping
sample_transcriptions = {
"timer_10min": "set a timer for 10 minutes",
"weather_query": "what's the weather like",
"call_mom": "call mom",
"remind_meeting": "remind me to join the meeting at 3 PM",
"play_music": "play some jazz music",
"send_message": "send a message to John saying I'll be late"
}
# Simulate transcription based on audio data hash
audio_hash = hash(command.audio_data) % len(sample_transcriptions)
transcriptions = list(sample_transcriptions.values())
# Add some noise based on noise level
transcription = transcriptions[audio_hash]
if command.noise_level > 0.5:
# Simulate transcription errors in noisy environments
if "timer" in transcription:
transcription = transcription.replace("timer", "time")
elif "weather" in transcription:
transcription = transcription.replace("weather", "whether")
return transcription
def _recognize_intent(self, text: str, language: Language) -> Intent:
"""Recognize intent from transcribed text"""
text_lower = text.lower().strip()
# Get patterns for the specified language
patterns = self.intent_patterns.get(language,
self.intent_patterns[Language.EN_US])
best_intent = IntentType.UNKNOWN
best_confidence = 0.0
best_entities = []
for intent_type, intent_patterns in patterns.items():
for pattern in intent_patterns:
match = re.search(pattern, text_lower)
if match:
# Calculate confidence based on match quality
confidence = self._calculate_intent_confidence(text_lower, pattern, match)
if confidence > best_confidence:
best_intent = intent_type
best_confidence = confidence
best_entities = list(match.groups())
# Fallback intent recognition using keywords
if best_confidence < 0.5:
keyword_intent, keyword_confidence = self._keyword_based_intent(text_lower)
if keyword_confidence > best_confidence:
best_intent = keyword_intent
best_confidence = keyword_confidence
return Intent(
intent_type=best_intent,
confidence=best_confidence,
entities=[], # Will be populated in entity extraction
raw_text=text
)
def _calculate_intent_confidence(self, text: str, pattern: str, match) -> float:
"""Calculate confidence score for intent recognition"""
# Base confidence from pattern match
base_confidence = 0.7
# Boost confidence for exact keyword matches
intent_keywords = {
"timer": ["timer", "set", "start"],
"reminder": ["remind", "reminder", "remember"],
"weather": ["weather", "temperature", "forecast"],
"music": ["play", "music", "song"],
"call": ["call", "phone", "dial"],
"message": ["text", "message", "send"]
}
# Count keyword matches
keyword_matches = 0
total_keywords = 0
for intent_name, keywords in intent_keywords.items():
if intent_name in pattern:
total_keywords = len(keywords)
keyword_matches = sum(1 for keyword in keywords if keyword in
text)
break
if total_keywords > 0:
keyword_boost = (keyword_matches / total_keywords) * 0.3
base_confidence += keyword_boost
# Penalize for ambiguous or incomplete matches
if len(match.groups()) == 0:
base_confidence -= 0.2
return min(1.0, max(0.0, base_confidence))
def _keyword_based_intent(self, text: str) -> Tuple[IntentType, float]:
"""Fallback intent recognition using keywords"""
keyword_scores = defaultdict(float)
keywords = {
IntentType.TIMER: ["timer", "alarm", "countdown", "minutes",
"hours"],
IntentType.REMINDER: ["remind", "remember", "don't forget",
"note"],
IntentType.WEATHER: ["weather", "temperature", "rain", "sunny",
"cloudy"],
IntentType.MUSIC: ["play", "music", "song", "artist", "album"],
IntentType.CALL: ["call", "phone", "dial", "ring"],
IntentType.MESSAGE: ["text", "message", "send", "sms"],
IntentType.CALENDAR: ["calendar", "appointment", "meeting",
"schedule"],
IntentType.SEARCH: ["search", "find", "look up", "google"],
IntentType.NAVIGATION: ["directions", "navigate", "route",
"map"],
IntentType.QUESTION: ["what", "how", "when", "where", "why",
"who"]
}
for intent_type, intent_keywords in keywords.items():
for keyword in intent_keywords:
if keyword in text:
keyword_scores[intent_type] += 1.0 / len(intent_keywords)
if keyword_scores:
best_intent = max(keyword_scores.items(), key=lambda x: x[1])
return best_intent[0], min(0.8, best_intent[1])
return IntentType.UNKNOWN, 0.1
def _extract_entities(self, text: str, intent_type: IntentType) -> List[Entity]:
"""Extract entities from text based on intent type"""
entities = []
text_lower = text.lower()
# Extract entities based on patterns
for entity_type, patterns in self.entity_patterns.items():
for pattern in patterns:
matches = re.finditer(pattern, text_lower)
for match in matches:
entity = Entity(
entity_type=entity_type,
value=match.group(0),
confidence=0.8,
start_pos=match.start(),
end_pos=match.end(),
normalized_value=self._normalize_entity_value(entity_type, match.group(0))
)
entities.append(entity)
# Intent-specific entity extraction
if intent_type == IntentType.TIMER:
entities.extend(self._extract_timer_entities(text_lower))
elif intent_type == IntentType.REMINDER:
entities.extend(self._extract_reminder_entities(text_lower))
elif intent_type == IntentType.MUSIC:
entities.extend(self._extract_music_entities(text_lower))
elif intent_type == IntentType.CALL:
entities.extend(self._extract_contact_entities(text_lower))
return entities
def _normalize_entity_value(self, entity_type: EntityType, value: str) -> Any:
"""Normalize entity values to standard formats"""
if entity_type == EntityType.TIME:
return self._normalize_time(value)
elif entity_type == EntityType.DURATION:
return self._normalize_duration(value)
elif entity_type == EntityType.DATE:
return self._normalize_date(value)
elif entity_type == EntityType.CONTACT:
return self._normalize_contact(value)
return value
def _normalize_time(self, time_str: str) -> str:
"""Normalize time to 24-hour format"""
time_str = time_str.lower().strip()
if "noon" in time_str:
return "12:00"
elif "midnight" in time_str:
return "00:00"
# Handle AM/PM format
am_pm_match = re.search(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)",
time_str)
if am_pm_match:
hour = int(am_pm_match.group(1))
minute = int(am_pm_match.group(2)) if am_pm_match.group(2) else 0
is_pm = am_pm_match.group(3) == "pm"
if is_pm and hour != 12:
hour += 12
elif not is_pm and hour == 12:
hour = 0
return f"{hour:02d}:{minute:02d}"
# Handle 24-hour format
time_match = re.search(r"(\d{1,2}):(\d{2})", time_str)
if time_match:
hour = int(time_match.group(1))
minute = int(time_match.group(2))
return f"{hour:02d}:{minute:02d}"
return time_str
def _normalize_duration(self, duration_str: str) -> int:
"""Normalize duration to seconds"""
duration_str = duration_str.lower()
# Extract number and unit
match = re.search(r"(\d+)\s*(second|minute|hour|day)s?",
duration_str)
if match:
number = int(match.group(1))
unit = match.group(2)
multipliers = {
"second": 1,
"minute": 60,
"hour": 3600,
"day": 86400
}
return number * multipliers.get(unit, 1)
return 0
def _normalize_date(self, date_str: str) -> str:
"""Normalize date to ISO format"""
date_str = date_str.lower()
if date_str == "today":
return time.strftime("%Y-%m-%d")
elif date_str == "tomorrow":
tomorrow = time.time() + 86400
return time.strftime("%Y-%m-%d", time.localtime(tomorrow))
elif date_str == "yesterday":
yesterday = time.time() - 86400
return time.strftime("%Y-%m-%d", time.localtime(yesterday))
# Handle day names (assume next occurrence)
weekdays = ["monday", "tuesday", "wednesday", "thursday", "friday",
"saturday", "sunday"]
if date_str in weekdays:
# Calculate next occurrence of this weekday
current_weekday = time.localtime().tm_wday
target_weekday = weekdays.index(date_str)
days_ahead = (target_weekday - current_weekday) % 7
if days_ahead == 0:
days_ahead = 7 # Next week
target_date = time.time() + (days_ahead * 86400)
return time.strftime("%Y-%m-%d", time.localtime(target_date))
return date_str
def _normalize_contact(self, contact_str: str) -> str:
"""Normalize contact name"""
# Remove common prefixes and clean up
contact_str = re.sub(r"^(call|text|message|phone)\s+", "",
contact_str.lower())
contact_str = re.sub(r"\s+(saying|that|about).*$", "", contact_str)
# Capitalize properly
return contact_str.title().strip()
def _extract_timer_entities(self, text: str) -> List[Entity]:
"""Extract timer-specific entities"""
entities = []
# Look for duration in timer commands
duration_match = re.search(r"(\d+)\s*(minute|hour|second)s?", text)
if duration_match:
duration_seconds = self._normalize_duration(duration_match.group(0))
entity = Entity(
entity_type=EntityType.DURATION,
value=duration_match.group(0),
confidence=0.9,
start_pos=duration_match.start(),
end_pos=duration_match.end(),
normalized_value=duration_seconds
)
entities.append(entity)
return entities
def _extract_reminder_entities(self, text: str) -> List[Entity]:
"""Extract reminder-specific entities"""
entities = []
# Extract task description
task_patterns = [
r"remind me to (.+?) (?:at|in|on)",
r"reminder (?:to )?(.+?) (?:at|for|on)",
r"don't forget to (.+?) (?:at|on)"
]
for pattern in task_patterns:
match = re.search(pattern, text)
if match:
task = match.group(1).strip()
entity = Entity(
entity_type=EntityType.APP_NAME, # Using as task type
value=task,
confidence=0.8,
start_pos=match.start(1),
end_pos=match.end(1),
normalized_value=task
)
entities.append(entity)
break
return entities
def _extract_music_entities(self, text: str) -> List[Entity]:
"""Extract music-specific entities"""
entities = []
# Extract song and artist
song_artist_match = re.search(r"play (.+?) by (.+)", text)
if song_artist_match:
song = song_artist_match.group(1).strip()
artist = song_artist_match.group(2).strip()
entities.append(Entity(
entity_type=EntityType.MUSIC_SONG,
value=song,
confidence=0.9,
start_pos=song_artist_match.start(1),
end_pos=song_artist_match.end(1),
normalized_value=song.title()
))
entities.append(Entity(
entity_type=EntityType.MUSIC_ARTIST,
value=artist,
confidence=0.9,
start_pos=song_artist_match.start(2),
end_pos=song_artist_match.end(2),
normalized_value=artist.title()
))
else:
# Extract general music request
music_match = re.search(r"play (.+)", text)
if music_match:
music_request = music_match.group(1).strip()
entities.append(Entity(
entity_type=EntityType.MUSIC_SONG,
value=music_request,
confidence=0.7,
start_pos=music_match.start(1),
end_pos=music_match.end(1),
normalized_value=music_request.title()
))
return entities
def _extract_contact_entities(self, text: str) -> List[Entity]:
"""Extract contact entities from call/message commands"""
entities = []
# Extract contact name
contact_patterns = [
r"(?:call|phone|dial)\s+(.+?)(?:\s|$)",
r"(?:text|message)\s+(.+?)\s+(?:saying|that)",
r"(?:send.*?to)\s+(.+?)\s+(?:saying|that)"
]
for pattern in contact_patterns:
match = re.search(pattern, text)
if match:
contact = self._normalize_contact(match.group(1))
entity = Entity(
entity_type=EntityType.CONTACT,
value=contact,
confidence=0.8,
start_pos=match.start(1),
end_pos=match.end(1),
normalized_value=contact
)
entities.append(entity)
break
return entities
def _resolve_context(self, intent: Intent, entities: List[Entity],
conversation_history: List[ConversationTurn],
user_id: str) -> Intent:
"""Resolve context and ambiguities using conversation history"""
if not conversation_history:
return intent
# Get recent conversation context
recent_turns = conversation_history[-3:] if conversation_history else []
# Resolve pronouns and references
resolved_entities = []
for entity in entities:
if entity.entity_type == EntityType.CONTACT:
# Resolve pronouns like "him", "her", "them"
if entity.value.lower() in ["him", "her", "them", "they"]:
resolved_contact = self._resolve_pronoun_contact(entity.value, recent_turns)
if resolved_contact:
    entity.normalized_value = resolved_contact
    entity.confidence *= 0.8  # reduce confidence for resolved pronouns
resolved_entities.append(entity)
# Update intent confidence based on context consistency
if self._is_context_consistent(intent, recent_turns):
intent.confidence = min(1.0, intent.confidence + 0.1)
intent.entities = resolved_entities
return intent
def _resolve_pronoun_contact(self, pronoun: str, recent_turns:
List[ConversationTurn]) -> Optional[str]:
"""Resolve pronoun references to contacts from conversation
history"""
for turn in reversed(recent_turns):
for entity in turn.intent.entities:
if entity.entity_type == EntityType.CONTACT:
return entity.normalized_value
return None
def _is_context_consistent(self, intent: Intent, recent_turns:
List[ConversationTurn]) -> bool:
"""Check if current intent is consistent with recent conversation
context"""
if not recent_turns:
return True
last_turn = recent_turns[-1]
# Check for follow-up patterns
follow_up_patterns = {
IntentType.TIMER: [IntentType.TIMER],
IntentType.REMINDER: [IntentType.REMINDER, IntentType.CALENDAR],
IntentType.MUSIC: [IntentType.MUSIC],
IntentType.CALL: [IntentType.CALL, IntentType.MESSAGE],
IntentType.MESSAGE: [IntentType.MESSAGE, IntentType.CALL]
}
expected_intents = follow_up_patterns.get(last_turn.intent.intent_type, [])
return intent.intent_type in expected_intents or len(recent_turns) == 1
def _generate_response(self, intent: Intent, entities: List[Entity],
command: VoiceCommand) -> SiriResponse:
"""Generate appropriate response for the recognized intent"""
if intent.confidence < command.confidence_threshold:
return self._create_clarification_response(intent, entities)
# Extract entity values for response generation
entity_values = {}
for entity in entities:
entity_values[entity.entity_type.value] = entity.normalized_value or entity.value
# Generate response based on intent type
if intent.intent_type == IntentType.TIMER:
return self._generate_timer_response(entity_values)
elif intent.intent_type == IntentType.REMINDER:
return self._generate_reminder_response(entity_values)
elif intent.intent_type == IntentType.WEATHER:
return self._generate_weather_response(entity_values)
elif intent.intent_type == IntentType.MUSIC:
return self._generate_music_response(entity_values)
elif intent.intent_type == IntentType.CALL:
return self._generate_call_response(entity_values)
elif intent.intent_type == IntentType.MESSAGE:
return self._generate_message_response(entity_values)
else:
    return self._create_error_response("I'm not sure how to help with that.")
def _generate_timer_response(self, entities: Dict) -> SiriResponse:
"""Generate response for timer intent"""
duration = entities.get("duration", 600) # Default 10 minutes
# Convert seconds to human readable
if duration >= 3600:
    duration_text = f"{duration // 3600} hour{'s' if duration >= 7200 else ''}"
elif duration >= 60:
    duration_text = f"{duration // 60} minute{'s' if duration >= 120 else ''}"
else:
    duration_text = f"{duration} second{'s' if duration != 1 else ''}"
response_text = f"Timer set for {duration_text}"
actions = [{
"type": "create_timer",
"params": {
"duration": duration,
"title": f"{duration_text} timer"
}
}]
return SiriResponse(
transcription="set a timer for " + duration_text,
intent=IntentType.TIMER,
confidence=0.9,
entities=entities,
response_text=response_text,
actions=actions,
follow_up_suggestions=["Cancel timer", "Add another timer"]
)
def _generate_reminder_response(self, entities: Dict) -> SiriResponse:
"""Generate response for reminder intent"""
task = entities.get("app_name", "something")  # the app_name slot doubles as the task text
time_str = entities.get("time", "later")
response_text = f"I'll remind you to {task}"
if time_str != "later":
response_text += f" at {time_str}"
actions = [{
"type": "create_reminder",
"params": {
"title": task,
"time": time_str,
"date": entities.get("date", time.strftime("%Y-%m-%d"))
}
}]
return SiriResponse(
transcription=f"remind me to {task} at {time_str}",
intent=IntentType.REMINDER,
confidence=0.85,
entities=entities,
response_text=response_text,
actions=actions,
follow_up_suggestions=["Set another reminder", "View all
reminders"]
)
def _generate_weather_response(self, entities: Dict) -> SiriResponse:
"""Generate response for weather intent"""
location = entities.get("location", "your location")
# Simulate weather data
weather_data = {
"temperature": "72°F",
"condition": "sunny",
"location": location
}
response_text = f"It's {weather_data['temperature']} and
{weather_data['condition']}"
if location != "your location":
response_text += f" in {location}"
actions = [{
"type": "show_weather",
"params": {
"location": location,
"detailed": True
}
}]
return SiriResponse(
transcription=f"what's the weather like in {location}",
intent=IntentType.WEATHER,
confidence=0.9,
entities=entities,
response_text=response_text,
actions=actions,
follow_up_suggestions=["Show forecast", "Weather in other
cities"]
)
def _generate_music_response(self, entities: Dict) -> SiriResponse:
"""Generate response for music intent"""
song = entities.get("music_song", "music")
artist = entities.get("music_artist")
if artist:
response_text = f"Playing {song} by {artist}"
else:
response_text = f"Playing {song}"
actions = [{
"type": "play_music",
"params": {
"song": song,
"artist": artist,
"app": "Music"
}
}]
return SiriResponse(
transcription=f"play {song}" + (f" by {artist}" if artist else
""),
intent=IntentType.MUSIC,
confidence=0.85,
entities=entities,
response_text=response_text,
actions=actions,
follow_up_suggestions=["Pause", "Next song", "Add to playlist"]
)
def _generate_call_response(self, entities: Dict) -> SiriResponse:
"""Generate response for call intent"""
contact = entities.get("contact", "unknown contact")
response_text = f"Calling {contact}"
actions = [{
"type": "make_call",
"params": {
"contact": contact,
"app": "Phone"
}
}]
return SiriResponse(
transcription=f"call {contact}",
intent=IntentType.CALL,
confidence=0.9,
entities=entities,
response_text=response_text,
actions=actions,
requires_confirmation=True,
follow_up_suggestions=["Send message instead", "Add to
favorites"]
)
def _generate_message_response(self, entities: Dict) -> SiriResponse:
"""Generate response for message intent"""
contact = entities.get("contact", "unknown contact")
message = entities.get("message", "")
response_text = f"Sending message to {contact}"
if message:
response_text += f": '{message}'"
actions = [{
"type": "send_message",
"params": {
"contact": contact,
"message": message,
"app": "Messages"
}
}]
return SiriResponse(
transcription=f"send message to {contact}",
intent=IntentType.MESSAGE,
confidence=0.85,
entities=entities,
response_text=response_text,
actions=actions,
requires_confirmation=True,
follow_up_suggestions=["Call instead", "Send another message"]
)
def _create_clarification_response(self, intent: Intent, entities:
List[Entity]) -> SiriResponse:
"""Create response when confidence is low"""
clarification_questions = {
    IntentType.TIMER: "How long should I set the timer for?",
    IntentType.REMINDER: "What would you like me to remind you about?",
    IntentType.MUSIC: "What music would you like me to play?",
    IntentType.CALL: "Who would you like me to call?",
    IntentType.MESSAGE: "Who would you like to send a message to?",
    IntentType.UNKNOWN: "I'm not sure what you'd like me to do. Could you try again?"
}
question = clarification_questions.get(intent.intent_type,
clarification_questions[IntentType.UNKNOWN])
return SiriResponse(
transcription=intent.raw_text,
intent=intent.intent_type,
confidence=intent.confidence,
entities={},
response_text=question,
actions=[],
follow_up_suggestions=["Try again", "Cancel"]
)
def _create_error_response(self, message: str) -> SiriResponse:
"""Create error response"""
return SiriResponse(
transcription="",
intent=IntentType.UNKNOWN,
confidence=0.0,
entities={},
response_text=message,
actions=[],
follow_up_suggestions=["Try again", "What can you do?"]
)
def _update_conversation_history(self, user_id: str, user_input: str,
response: SiriResponse, intent: Intent):
"""Update conversation history for context"""
turn = ConversationTurn(
user_input=user_input,
siri_response=response.response_text,
intent=intent,
timestamp=time.time(),
context={"device_context": "home_screen"}
)
self.conversation_history[user_id].append(turn)
def get_conversation_summary(self, user_id: str) -> Dict:
"""Get conversation summary for a user"""
history = list(self.conversation_history[user_id])
if not history:
return {"total_interactions": 0, "common_intents": [],
"recent_topics": []}
# Count intent types
intent_counts = Counter(turn.intent.intent_type for turn in history)
# Get recent topics
recent_topics = [turn.user_input for turn in history[-5:]]
return {
"total_interactions": len(history),
"common_intents": intent_counts.most_common(3),
"recent_topics": recent_topics,
"last_interaction": history[-1].timestamp if history else None
}
# Test the Siri voice processor
def test_siri_processor():
processor = SiriVoiceProcessor()
# Test voice commands
test_commands = [
VoiceCommand("timer_audio", Language.EN_US, "user123", "iPhone",
"home_screen"),
VoiceCommand("weather_audio", Language.EN_US, "user123", "iPhone",
"home_screen"),
VoiceCommand("call_audio", Language.EN_US, "user123", "iPhone",
"contacts"),
VoiceCommand("music_audio", Language.EN_US, "user123", "iPhone",
"music_app")
]
print("Testing Siri Voice Processor:")
conversation_history = []
for i, command in enumerate(test_commands):
print(f"\n--- Command {i+1} ---")
response = processor.process_voice_command(command,
conversation_history)
print(f"Transcription: {response.transcription}")
print(f"Intent: {response.intent.value}")
print(f"Confidence: {response.confidence:.2f}")
print(f"Response: {response.response_text}")
print(f"Actions: {len(response.actions)}")
print(f"Follow-up suggestions: {response.follow_up_suggestions}")
# Add to conversation history for context
if response.intent != IntentType.UNKNOWN:
turn = ConversationTurn(
user_input=response.transcription,
siri_response=response.response_text,
intent=Intent(response.intent, response.confidence, [],
response.transcription),
timestamp=time.time()
)
conversation_history.append(turn)
# Test conversation summary
summary = processor.get_conversation_summary("user123")
print(f"\n--- Conversation Summary ---")
print(f"Total interactions: {summary['total_interactions']}")
print(f"Common intents: {summary['common_intents']}")
print(f"Recent topics: {summary['recent_topics']}")
test_siri_processor()
Key Insights:
• Multi-language natural language processing with intent recognition
• Context-aware conversation management with entity resolution
• Confidence scoring and clarification handling for ambiguous commands
• Integration with iOS apps and services through structured actions
• Real-time speech processing with noise handling and error recovery
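At its core, the intent-recognition step is pattern matching plus slot normalization. Below is a minimal, self-contained sketch of that idea with a small hand-written pattern table; the PATTERNS entries and the to_24h helper are illustrative assumptions, not the exact ones used in the solution above.
Python
import re

# Hypothetical, trimmed-down pattern table: intent name -> regex with named capture groups.
PATTERNS = {
    "create_reminder": re.compile(r"remind me to (?P<task>.+?) at (?P<time>.+)", re.I),
    "set_timer":       re.compile(r"set a timer for (?P<amount>\d+) (?P<unit>minutes?|hours?)", re.I),
}

def to_24h(text: str) -> str:
    """Normalize times like '3 PM' to 'HH:MM' (illustrative helper)."""
    m = re.match(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)?", text.strip(), re.I)
    if not m:
        return text
    hour, minute = int(m.group(1)), int(m.group(2) or 0)
    meridiem = (m.group(3) or "").lower()
    if meridiem == "pm" and hour != 12:
        hour += 12
    if meridiem == "am" and hour == 12:
        hour = 0
    return f"{hour:02d}:{minute:02d}"

def recognize(utterance: str):
    # Return the first matching intent and its normalized slots
    for intent, pattern in PATTERNS.items():
        m = pattern.search(utterance)
        if m:
            slots = m.groupdict()
            if "time" in slots:
                slots["time"] = to_24h(slots["time"])
            return intent, slots
    return "unknown", {}

print(recognize("Remind me to call mom at 3 PM"))
# ('create_reminder', {'task': 'call mom', 'time': '15:00'})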
2.6 Netflix OA Problems
Problem 26: Video Streaming Quality Optimization
Difficulty: Hard | Time Limit: 60 minutes | Company: Netflix
Problem Statement:
Design an adaptive streaming algorithm that optimizes video quality based on network
conditions, device capabilities, and user preferences. The algorithm should minimize
buffering while maximizing video quality.
Example:
Plain Text
Input:
network_conditions = {
"bandwidth": 5000, # kbps
"latency": 50, # ms
"packet_loss": 0.01 # 1%
}
device_info = {
"screen_resolution": "1920x1080",
"cpu_power": 0.8, # 0.0 to 1.0
"battery_level": 0.6
}
available_qualities = [
{"resolution": "480p", "bitrate": 1000, "cpu_usage": 0.2},
{"resolution": "720p", "bitrate": 2500, "cpu_usage": 0.4},
{"resolution": "1080p", "bitrate": 5000, "cpu_usage": 0.7},
{"resolution": "4K", "bitrate": 15000, "cpu_usage": 0.9}
]
Output: {
"selected_quality": "1080p",
"confidence": 0.85,
"buffer_target": 30, # seconds
"adaptation_reason": "optimal_for_conditions"
}
Solution Approach:
This problem requires implementing an adaptive bitrate algorithm that considers multiple
factors.
Python
import math
from typing import Dict, List, Tuple
from dataclasses import dataclass
from enum import Enum
class AdaptationReason(Enum):
OPTIMAL = "optimal_for_conditions"
BANDWIDTH_LIMITED = "bandwidth_limited"
CPU_LIMITED = "cpu_limited"
BATTERY_SAVING = "battery_saving"
NETWORK_UNSTABLE = "network_unstable"
@dataclass
class QualityLevel:
resolution: str
bitrate: int # kbps
cpu_usage: float # 0.0 to 1.0
width: int = 0
height: int = 0
def __post_init__(self):
# Parse resolution
if 'x' in self.resolution:
self.width, self.height = map(int, self.resolution.split('x'))
else:
# Handle common formats
res_map = {
"480p": (854, 480),
"720p": (1280, 720),
"1080p": (1920, 1080),
"4K": (3840, 2160)
}
self.width, self.height = res_map.get(self.resolution, (1920,
1080))
class AdaptiveStreamingOptimizer:
def __init__(self):
self.buffer_target_base = 30 # seconds
self.bandwidth_safety_factor = 0.8 # Use 80% of available bandwidth
self.quality_weights = {
'bandwidth_fit': 0.4,
'cpu_efficiency': 0.2,
'resolution_preference': 0.2,
'stability': 0.2
}
def calculate_network_stability(self, network_conditions: Dict) -> float:
"""Calculate network stability score (0.0 to 1.0)"""
latency = network_conditions.get('latency', 100)
packet_loss = network_conditions.get('packet_loss', 0.05)
# Lower latency and packet loss = higher stability
latency_score = max(0, 1.0 - (latency - 20) / 200) # Good if < 20ms
packet_loss_score = max(0, 1.0 - packet_loss / 0.05) # Good if < 5%
return (latency_score + packet_loss_score) / 2
def calculate_bandwidth_utilization(self, bitrate: int,
available_bandwidth: int) -> float:
"""Calculate how well the bitrate fits available bandwidth"""
if available_bandwidth <= 0:
return 0.0
utilization = bitrate / available_bandwidth
# Optimal utilization is around 70-80%
if utilization <= 0.8:
return 1.0 - abs(0.75 - utilization) / 0.75
else:
# Penalize over-utilization heavily
return max(0.0, 1.0 - (utilization - 0.8) / 0.5)
def calculate_device_compatibility(self, quality: QualityLevel,
device_info: Dict) -> float:
"""Calculate how well quality matches device capabilities"""
# Screen resolution compatibility
device_res = device_info.get('screen_resolution', '1920x1080')
device_width, device_height = map(int, device_res.split('x'))
# Don't stream higher resolution than device can display
if quality.width > device_width or quality.height > device_height:
resolution_score = 0.7 # Penalty for over-resolution
else:
resolution_score = min(quality.width / device_width,
quality.height / device_height)
# CPU compatibility
cpu_power = device_info.get('cpu_power', 0.5)
cpu_score = max(0.0, 1.0 - max(0, quality.cpu_usage - cpu_power) /
0.5)
# Battery consideration
battery_level = device_info.get('battery_level', 1.0)
if battery_level < 0.2:  # low battery
    battery_penalty = quality.cpu_usage * 0.5  # penalize high CPU usage
else:
    battery_penalty = 0.0
return (resolution_score + cpu_score) / 2 - battery_penalty
def calculate_quality_score(self, quality: QualityLevel,
network_conditions: Dict,
device_info: Dict) -> Tuple[float,
AdaptationReason]:
"""Calculate overall quality score for given conditions"""
available_bandwidth = network_conditions.get('bandwidth', 1000)
# Calculate individual scores
bandwidth_score = self.calculate_bandwidth_utilization(quality.bitrate, available_bandwidth)
device_score = self.calculate_device_compatibility(quality, device_info)
stability_score = self.calculate_network_stability(network_conditions)
# Resolution preference (higher is generally better, but with diminishing returns)
max_pixels = 3840 * 2160 # 4K
current_pixels = quality.width * quality.height
resolution_score = math.sqrt(current_pixels / max_pixels)
# Determine primary limiting factor
if bandwidth_score < 0.3:
reason = AdaptationReason.BANDWIDTH_LIMITED
elif device_score < 0.3:
if device_info.get('battery_level', 1.0) < 0.2:
reason = AdaptationReason.BATTERY_SAVING
else:
reason = AdaptationReason.CPU_LIMITED
elif stability_score < 0.5:
reason = AdaptationReason.NETWORK_UNSTABLE
else:
reason = AdaptationReason.OPTIMAL
# Calculate weighted score
total_score = (
bandwidth_score * self.quality_weights['bandwidth_fit'] +
device_score * self.quality_weights['cpu_efficiency'] +
resolution_score * self.quality_weights['resolution_preference']
+
stability_score * self.quality_weights['stability']
)
return total_score, reason
def calculate_buffer_target(self, network_conditions: Dict,
selected_quality: QualityLevel) -> int:
"""Calculate optimal buffer target based on network conditions"""
stability_score = self.calculate_network_stability(network_conditions)
# More unstable network = larger buffer
buffer_multiplier = 1.0 + (1.0 - stability_score) * 2.0
# Higher bitrate = larger buffer (more data per second)
bitrate_factor = 1.0 + (selected_quality.bitrate / 10000) * 0.5
target_buffer = int(self.buffer_target_base * buffer_multiplier *
bitrate_factor)
return min(target_buffer, 120) # Cap at 2 minutes
def optimize_streaming_quality(self, network_conditions: Dict,
device_info: Dict,
available_qualities: List[Dict]) -> Dict:
"""Select optimal streaming quality"""
# Convert to QualityLevel objects
qualities = [QualityLevel(**q) for q in available_qualities]
best_quality = None
best_score = -1
best_reason = AdaptationReason.OPTIMAL
# Evaluate each quality level
quality_scores = []
for quality in qualities:
score, reason = self.calculate_quality_score(quality,
network_conditions, device_info)
quality_scores.append((quality, score, reason))
if score > best_score:
best_score = score
best_quality = quality
best_reason = reason
if best_quality is None:
# Fallback to lowest quality
best_quality = min(qualities, key=lambda q: q.bitrate)
best_reason = AdaptationReason.BANDWIDTH_LIMITED
best_score = 0.1
# Calculate buffer target
buffer_target = self.calculate_buffer_target(network_conditions,
best_quality)
# Calculate confidence based on score and stability
stability = self.calculate_network_stability(network_conditions)
confidence = min(0.95, best_score * 0.7 + stability * 0.3)
return {
"selected_quality": best_quality.resolution,
"bitrate": best_quality.bitrate,
"confidence": round(confidence, 2),
"buffer_target": buffer_target,
"adaptation_reason": best_reason.value,
"quality_scores": [
{
"resolution": q.resolution,
"score": round(score, 3),
"reason": reason.value
}
for q, score, reason in quality_scores
]
}
def predict_quality_changes(self, current_conditions: Dict, device_info:
Dict,
available_qualities: List[Dict],
predicted_bandwidth_changes: List[float]) ->
List[Dict]:
"""Predict how quality should change with bandwidth variations"""
predictions = []
for bandwidth_multiplier in predicted_bandwidth_changes:
# Simulate new network conditions
new_conditions = current_conditions.copy()
new_conditions['bandwidth'] = int(current_conditions['bandwidth']
* bandwidth_multiplier)
# Get optimal quality for new conditions
result = self.optimize_streaming_quality(new_conditions,
device_info, available_qualities)
predictions.append({
"bandwidth_change": f"{bandwidth_multiplier:.1f}x",
"new_bandwidth": new_conditions['bandwidth'],
"recommended_quality": result['selected_quality'],
"confidence": result['confidence']
})
return predictions
# Test cases
def test_streaming_optimizer():
optimizer = AdaptiveStreamingOptimizer()
# Test scenarios
network_conditions = {
"bandwidth": 5000, # 5 Mbps
"latency": 50,
"packet_loss": 0.01
}
device_info = {
"screen_resolution": "1920x1080",
"cpu_power": 0.8,
"battery_level": 0.6
}
available_qualities = [
{"resolution": "480p", "bitrate": 1000, "cpu_usage": 0.2},
{"resolution": "720p", "bitrate": 2500, "cpu_usage": 0.4},
{"resolution": "1080p", "bitrate": 5000, "cpu_usage": 0.7},
{"resolution": "4K", "bitrate": 15000, "cpu_usage": 0.9}
]
print("Testing Netflix Streaming Optimizer:")
print(f"Network: {network_conditions['bandwidth']} kbps,
{network_conditions['latency']}ms latency")
print(f"Device: {device_info['screen_resolution']}, CPU:
{device_info['cpu_power']}")
# Test optimal quality selection
result = optimizer.optimize_streaming_quality(network_conditions,
device_info, available_qualities)
print(f"\nOptimal Quality Selection:")
print(f"Selected: {result['selected_quality']} ({result['bitrate']}
kbps)")
print(f"Confidence: {result['confidence']}")
print(f"Buffer target: {result['buffer_target']} seconds")
print(f"Reason: {result['adaptation_reason']}")
print(f"\nAll Quality Scores:")
for score_info in result['quality_scores']:
print(f" {score_info['resolution']}: {score_info['score']}
({score_info['reason']})")
# Test bandwidth change predictions
bandwidth_changes = [0.5, 0.8, 1.2, 1.5, 2.0]
predictions = optimizer.predict_quality_changes(
network_conditions, device_info, available_qualities,
bandwidth_changes
)
print(f"\nBandwidth Change Predictions:")
for pred in predictions:
print(f" {pred['bandwidth_change']}: {pred['recommended_quality']} "
f"(confidence: {pred['confidence']})")
test_streaming_optimizer()
Key Insights:
• Balance multiple factors: bandwidth, device capabilities, network stability
• Use weighted scoring for different optimization criteria (see the sketch below)
• Implement adaptive buffering based on network conditions
• Consider battery life and CPU usage for mobile devices
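To make the weighted-scoring bullet above concrete, here is a minimal, self-contained sketch. The helper name `quality_score` and the 0.5/0.3/0.2 weights are illustrative assumptions, not the exact formula used in the solution above.
Python
# Minimal sketch of weighted quality scoring (weights are hypothetical).
def quality_score(bitrate_kbps: int, bandwidth_kbps: int,
                  cpu_usage: float, cpu_power: float,
                  stability: float) -> float:
    """Score a quality level between 0 and 1; higher is better."""
    # Bandwidth headroom: penalize bitrates close to (or above) available bandwidth
    bandwidth_fit = min(1.0, bandwidth_kbps / (bitrate_kbps * 1.5)) if bitrate_kbps else 0.0
    # Device fit: can this device decode the quality level comfortably?
    device_fit = max(0.0, 1.0 - max(0.0, cpu_usage - cpu_power))
    # Weighted combination of the three criteria (illustrative weights)
    return 0.5 * bandwidth_fit + 0.3 * device_fit + 0.2 * stability

print(round(quality_score(5000, 5000, 0.7, 0.8, 0.9), 3))   # 1080p on a 5 Mbps link
print(round(quality_score(15000, 5000, 0.9, 0.8, 0.9), 3))  # 4K is heavily penalized here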
Problem 27: Content Recommendation Engine
Difficulty: Hard | Time Limit: 60 minutes | Company: Netflix
Problem Statement:
Design a content recommendation engine for Netflix that:
1. Analyzes user viewing history and preferences
2. Implements collaborative and content-based filtering
3. Handles cold start problems for new users
4. Provides real-time recommendations with low latency
5. Supports A/B testing for recommendation algorithms
Example:
Plain Text
Input:
user_profile = {
"user_id": "u123", "age": 28, "country": "US",
"viewing_history": [
{"title": "Stranger Things", "genre": ["Sci-Fi", "Drama"], "rating": 5,
"watch_time": 0.9},
{"title": "The Crown", "genre": ["Drama", "History"], "rating": 4,
"watch_time": 0.7}
],
"preferences": {"genres": ["Sci-Fi", "Drama"], "languages": ["English"]}
}
content_catalog = [
{"id": "c1", "title": "Dark", "genre": ["Sci-Fi", "Thriller"], "rating":
4.5, "popularity": 0.8},
{"id": "c2", "title": "Bridgerton", "genre": ["Drama", "Romance"],
"rating": 4.2, "popularity": 0.9}
]
Output: {
"recommendations": [
{"content_id": "c1", "score": 0.87, "reason": "Similar to Stranger
Things"},
{"content_id": "c2", "score": 0.73, "reason": "Popular drama series"}
],
"algorithm": "hybrid", "confidence": 0.85
}
Solution Approach:
This problem requires machine learning algorithms, collaborative filtering, and real-time
recommendation systems.
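Before the full engine below, here is a tiny standalone sketch of the hybrid idea: per-algorithm candidate scores are blended with fixed weights. The weights and sample scores are made-up assumptions for illustration only.
Python
# Blend per-algorithm scores with fixed weights (all numbers are illustrative).
weights = {"collaborative": 0.4, "content_based": 0.3, "popularity": 0.2, "trending": 0.1}
candidates = {
    "c1": {"collaborative": 0.9, "content_based": 0.8, "popularity": 0.6},
    "c2": {"content_based": 0.7, "popularity": 0.9, "trending": 0.8},
}
# Weighted sum of whatever scores each candidate received
blended = {
    cid: sum(weights[algo] * score for algo, score in scores.items())
    for cid, scores in candidates.items()
}
for cid, score in sorted(blended.items(), key=lambda kv: kv[1], reverse=True):
    print(cid, round(score, 2))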
Python
import time
import math
import random
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, Counter
import heapq
class Genre(Enum):
ACTION = "Action"
COMEDY = "Comedy"
DRAMA = "Drama"
HORROR = "Horror"
ROMANCE = "Romance"
SCI_FI = "Sci-Fi"
THRILLER = "Thriller"
DOCUMENTARY = "Documentary"
ANIMATION = "Animation"
FANTASY = "Fantasy"
class RecommendationAlgorithm(Enum):
COLLABORATIVE = "collaborative"
CONTENT_BASED = "content_based"
HYBRID = "hybrid"
POPULARITY = "popularity"
TRENDING = "trending"
@dataclass
class ViewingRecord:
title: str
content_id: str
genre: List[str]
rating: float # 1-5
watch_time: float # 0-1 (percentage watched)
timestamp: float
device: str = "TV"
session_duration: int = 0 # minutes
@dataclass
class UserProfile:
user_id: str
age: int
country: str
viewing_history: List[ViewingRecord]
preferences: Dict
subscription_tier: str = "standard"
created_at: float = field(default_factory=time.time)
@dataclass
class Content:
content_id: str
title: str
genre: List[str]
rating: float
popularity: float
release_year: int
duration: int # minutes
language: str = "English"
content_type: str = "series" # series, movie, documentary
cast: List[str] = field(default_factory=list)
director: str = ""
tags: List[str] = field(default_factory=list)
@dataclass
class Recommendation:
content_id: str
score: float
reason: str
algorithm: RecommendationAlgorithm
confidence: float
rank: int = 0
class ContentRecommendationEngine:
def __init__(self):
# User-item interaction matrix
self.user_item_matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
self.item_user_matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
# Content features for content-based filtering
self.content_features: Dict[str, Dict] = {}
self.genre_vectors: Dict[str, List[float]] = {}
# User similarity cache
self.user_similarity_cache: Dict[Tuple[str, str], float] = {}
self.item_similarity_cache: Dict[Tuple[str, str], float] = {}
# Popularity and trending data
self.popularity_scores: Dict[str, float] = {}
self.trending_scores: Dict[str, float] = {}
# A/B testing configurations
self.ab_test_configs: Dict[str, Dict] = {
"algorithm_weights": {
"collaborative": 0.4,
"content_based": 0.3,
"popularity": 0.2,
"trending": 0.1
},
"diversity_factor": 0.15,
"novelty_factor": 0.1
}
# Cold start handling
self.default_recommendations: List[str] = []
self.genre_popularity: Dict[str, float] = {}
def add_user_interaction(self, user_id: str, content_id: str,
rating: float, watch_time: float):
"""Add user interaction to the system"""
# Calculate implicit rating based on watch time and explicit rating
implicit_score = self._calculate_implicit_score(rating, watch_time)
self.user_item_matrix[user_id][content_id] = implicit_score
self.item_user_matrix[content_id][user_id] = implicit_score
# Clear similarity cache for affected users
self._invalidate_similarity_cache(user_id)
def _calculate_implicit_score(self, rating: float, watch_time: float) -> float:
"""Calculate implicit score from rating and watch time"""
# Combine explicit rating with implicit feedback
if rating > 0:
# Weight explicit rating more heavily
return (rating * 0.7) + (watch_time * 5 * 0.3)
else:
# Only implicit feedback available
if watch_time > 0.8:
return 4.5 # High engagement
elif watch_time > 0.5:
return 3.5 # Medium engagement
elif watch_time > 0.2:
return 2.5 # Low engagement
else:
return 1.0 # Very low engagement
def add_content(self, content: Content):
"""Add content to the catalog"""
# Extract content features for content-based filtering
features = {
"genre_vector": self._create_genre_vector(content.genre),
"popularity": content.popularity,
"rating": content.rating,
"release_year": content.release_year,
"duration": content.duration,
"language": content.language,
"content_type": content.content_type
}
self.content_features[content.content_id] = features
self.popularity_scores[content.content_id] = content.popularity
# Update genre popularity
for genre in content.genre:
self.genre_popularity[genre] = self.genre_popularity.get(genre,
0) + content.popularity
def _create_genre_vector(self, genres: List[str]) -> List[float]:
"""Create genre vector for content-based filtering"""
all_genres = [g.value for g in Genre]
vector = [0.0] * len(all_genres)
for genre in genres:
if genre in all_genres:
vector[all_genres.index(genre)] = 1.0
return vector
def get_recommendations(self, user_profile: UserProfile,
num_recommendations: int = 10,
algorithm: Optional[RecommendationAlgorithm] =
None) -> List[Recommendation]:
"""Get recommendations for a user"""
# Handle cold start problem
if len(user_profile.viewing_history) < 3:
return self._get_cold_start_recommendations(user_profile,
num_recommendations)
# Update user-item matrix with viewing history
self._update_user_matrix(user_profile)
# Get recommendations from different algorithms
if algorithm:
recommendations = self._get_algorithm_recommendations(
user_profile, algorithm, num_recommendations * 2
)
else:
# Hybrid approach
recommendations = self._get_hybrid_recommendations(user_profile,
num_recommendations * 2)
# Apply diversity and novelty filters
filtered_recommendations = self._apply_diversity_filter(
recommendations, user_profile, num_recommendations
)
# Rank and return top recommendations
return self._rank_recommendations(filtered_recommendations)[:num_recommendations]
def _update_user_matrix(self, user_profile: UserProfile):
"""Update user-item matrix with viewing history"""
for record in user_profile.viewing_history:
self.add_user_interaction(
user_profile.user_id,
record.content_id,
record.rating,
record.watch_time
)
def _get_cold_start_recommendations(self, user_profile: UserProfile,
num_recommendations: int) -> List[Recommendation]:
"""Handle cold start problem for new users"""
recommendations = []
# Use demographic-based recommendations
demographic_recs = self._get_demographic_recommendations(user_profile)
recommendations.extend(demographic_recs)
# Add popular content
popular_recs = self._get_popularity_recommendations(num_recommendations // 2)
recommendations.extend(popular_recs)
# Add trending content
trending_recs = self._get_trending_recommendations(num_recommendations // 2)
recommendations.extend(trending_recs)
return recommendations[:num_recommendations]
def _get_demographic_recommendations(self, user_profile: UserProfile) -> List[Recommendation]:
"""Get recommendations based on user demographics"""
recommendations = []
# Age-based recommendations
age_preferences = {
(18, 25): ["Action", "Comedy", "Romance"],
(26, 35): ["Drama", "Thriller", "Sci-Fi"],
(36, 50): ["Drama", "Documentary", "Thriller"],
(51, 100): ["Drama", "Documentary", "Romance"]
}
preferred_genres = []
for (min_age, max_age), genres in age_preferences.items():
if min_age <= user_profile.age <= max_age:
preferred_genres = genres
break
# Find content matching demographic preferences
for content_id, features in self.content_features.items():
genre_vector = features["genre_vector"]
all_genres = [g.value for g in Genre]
content_genres = [all_genres[i] for i, val in
enumerate(genre_vector) if val > 0]
if any(genre in preferred_genres for genre in content_genres):
score = features["popularity"] * 0.8 # Demographic match
recommendations.append(Recommendation(
content_id=content_id,
score=score,
reason=f"Popular with {user_profile.age}-year-olds",
algorithm=RecommendationAlgorithm.POPULARITY,
confidence=0.6
))
return sorted(recommendations, key=lambda x: x.score, reverse=True)[:5]
def _get_algorithm_recommendations(self, user_profile: UserProfile,
algorithm: RecommendationAlgorithm,
num_recommendations: int) -> List[Recommendation]:
"""Get recommendations from a specific algorithm"""
if algorithm == RecommendationAlgorithm.COLLABORATIVE:
return self._collaborative_filtering(user_profile,
num_recommendations)
elif algorithm == RecommendationAlgorithm.CONTENT_BASED:
return self._content_based_filtering(user_profile,
num_recommendations)
elif algorithm == RecommendationAlgorithm.POPULARITY:
return self._get_popularity_recommendations(num_recommendations)
elif algorithm == RecommendationAlgorithm.TRENDING:
return self._get_trending_recommendations(num_recommendations)
else:
return []
def _collaborative_filtering(self, user_profile: UserProfile,
num_recommendations: int) -> List[Recommendation]:
"""Collaborative filtering recommendations"""
user_id = user_profile.user_id
if user_id not in self.user_item_matrix:
return []
# Find similar users
similar_users = self._find_similar_users(user_id, top_k=50)
# Get recommendations from similar users
recommendations = {}
user_items = set(self.user_item_matrix[user_id].keys())
for similar_user, similarity in similar_users:
for item_id, rating in self.user_item_matrix[similar_user].items():
if item_id not in user_items: # Not already consumed
if item_id not in recommendations:
recommendations[item_id] = 0
recommendations[item_id] += similarity * rating
# Convert to Recommendation objects
rec_list = []
for content_id, score in recommendations.items():
rec_list.append(Recommendation(
content_id=content_id,
score=score,
reason="Users with similar taste also liked this",
algorithm=RecommendationAlgorithm.COLLABORATIVE,
confidence=0.8
))
return sorted(rec_list, key=lambda x: x.score, reverse=True)[:num_recommendations]
def _find_similar_users(self, user_id: str, top_k: int = 50) -> List[Tuple[str, float]]:
"""Find users similar to the given user"""
similarities = []
if user_id not in self.user_item_matrix:
return []
user_items = self.user_item_matrix[user_id]
for other_user_id, other_items in self.user_item_matrix.items():
if other_user_id != user_id:
# Check cache first
cache_key = tuple(sorted([user_id, other_user_id]))
if cache_key in self.user_similarity_cache:
similarity = self.user_similarity_cache[cache_key]
else:
similarity = self._calculate_user_similarity(user_items,
other_items)
self.user_similarity_cache[cache_key] = similarity
if similarity > 0.1: # Threshold for similarity
similarities.append((other_user_id, similarity))
return sorted(similarities, key=lambda x: x[1], reverse=True)[:top_k]
def _calculate_user_similarity(self, user1_items: Dict[str, float],
user2_items: Dict[str, float]) -> float:
"""Calculate cosine similarity between two users"""
common_items = set(user1_items.keys()) & set(user2_items.keys())
if len(common_items) < 2:
return 0.0
# Calculate cosine similarity
dot_product = sum(user1_items[item] * user2_items[item] for item in
common_items)
norm1 = math.sqrt(sum(rating ** 2 for rating in
user1_items.values()))
norm2 = math.sqrt(sum(rating ** 2 for rating in
user2_items.values()))
if norm1 == 0 or norm2 == 0:
return 0.0
return dot_product / (norm1 * norm2)
def _content_based_filtering(self, user_profile: UserProfile,
num_recommendations: int) -> List[Recommendation]:
"""Content-based filtering recommendations"""
# Build user preference profile
user_preferences = self._build_user_preference_profile(user_profile)
recommendations = []
consumed_content = {record.content_id for record in
user_profile.viewing_history}
# Score all content based on user preferences
for content_id, features in self.content_features.items():
if content_id not in consumed_content:
score = self._calculate_content_similarity(user_preferences,
features)
if score > 0.3: # Threshold for relevance
recommendations.append(Recommendation(
content_id=content_id,
score=score,
reason="Matches your viewing preferences",
algorithm=RecommendationAlgorithm.CONTENT_BASED,
confidence=0.75
))
return sorted(recommendations, key=lambda x: x.score, reverse=True)[:num_recommendations]
def _build_user_preference_profile(self, user_profile: UserProfile) -> Dict:
"""Build user preference profile from viewing history"""
genre_preferences = defaultdict(float)
total_weight = 0
for record in user_profile.viewing_history:
# Weight by rating and watch time
weight = record.rating * record.watch_time
total_weight += weight
for genre in record.genre:
genre_preferences[genre] += weight
# Normalize preferences
if total_weight > 0:
for genre in genre_preferences:
genre_preferences[genre] /= total_weight
# Calculate average preferences
avg_rating = sum(r.rating for r in user_profile.viewing_history) / len(user_profile.viewing_history)
avg_duration = sum(self.content_features.get(r.content_id, {}).get("duration", 120)
for r in user_profile.viewing_history) / len(user_profile.viewing_history)
return {
"genre_preferences": dict(genre_preferences),
"avg_rating": avg_rating,
"avg_duration": avg_duration,
"preferred_languages": user_profile.preferences.get("languages",
["English"])
}
def _calculate_content_similarity(self, user_preferences: Dict,
content_features: Dict) -> float:
"""Calculate similarity between user preferences and content"""
score = 0.0
# Genre similarity
genre_vector = content_features["genre_vector"]
all_genres = [g.value for g in Genre]
content_genres = [all_genres[i] for i, val in enumerate(genre_vector)
if val > 0]
genre_score = 0
for genre in content_genres:
genre_score += user_preferences["genre_preferences"].get(genre,
0)
score += genre_score * 0.5
# Rating similarity
rating_diff = abs(content_features["rating"] -
user_preferences["avg_rating"])
rating_score = max(0, 1 - rating_diff / 5)
score += rating_score * 0.2
# Duration similarity
duration_diff = abs(content_features["duration"] -
user_preferences["avg_duration"])
duration_score = max(0, 1 - duration_diff / 180) # 3 hours max difference
score += duration_score * 0.1
# Language preference
if content_features["language"] in
user_preferences["preferred_languages"]:
score += 0.2
return min(1.0, score)
def _get_popularity_recommendations(self, num_recommendations: int) -> List[Recommendation]:
"""Get recommendations based on popularity"""
recommendations = []
sorted_content = sorted(self.popularity_scores.items(),
key=lambda x: x[1], reverse=True)
for content_id, popularity in sorted_content[:num_recommendations]:
recommendations.append(Recommendation(
content_id=content_id,
score=popularity,
reason="Popular on Netflix",
algorithm=RecommendationAlgorithm.POPULARITY,
confidence=0.7
))
return recommendations
def _get_trending_recommendations(self, num_recommendations: int) -> List[Recommendation]:
"""Get recommendations based on trending content"""
# Simulate trending scores (in a real system, these would be calculated from recent activity)
trending_content = {}
for content_id in self.content_features.keys():
# Simulate trending score with some randomness
base_popularity = self.popularity_scores.get(content_id, 0.5)
trending_boost = random.uniform(0.8, 1.2)
trending_content[content_id] = base_popularity * trending_boost
recommendations = []
sorted_trending = sorted(trending_content.items(),
key=lambda x: x[1], reverse=True)
for content_id, trending_score in sorted_trending[:num_recommendations]:
recommendations.append(Recommendation(
content_id=content_id,
score=trending_score,
reason="Trending now",
algorithm=RecommendationAlgorithm.TRENDING,
confidence=0.65
))
return recommendations
def _get_hybrid_recommendations(self, user_profile: UserProfile,
num_recommendations: int) -> List[Recommendation]:
"""Get hybrid recommendations combining multiple algorithms"""
all_recommendations = {}
weights = self.ab_test_configs["algorithm_weights"]
# Get recommendations from each algorithm
collaborative_recs = self._collaborative_filtering(user_profile,
num_recommendations)
content_recs = self._content_based_filtering(user_profile,
num_recommendations)
popularity_recs = self._get_popularity_recommendations(num_recommendations // 2)
trending_recs = self._get_trending_recommendations(num_recommendations // 2)
# Combine recommendations with weights
for rec in collaborative_recs:
if rec.content_id not in all_recommendations:
all_recommendations[rec.content_id] = rec
all_recommendations[rec.content_id].score *= weights["collaborative"]
else:
all_recommendations[rec.content_id].score += rec.score * weights["collaborative"]
for rec in content_recs:
if rec.content_id not in all_recommendations:
all_recommendations[rec.content_id] = rec
all_recommendations[rec.content_id].score *= weights["content_based"]
else:
all_recommendations[rec.content_id].score += rec.score * weights["content_based"]
for rec in popularity_recs:
if rec.content_id not in all_recommendations:
all_recommendations[rec.content_id] = rec
all_recommendations[rec.content_id].score *= weights["popularity"]
else:
all_recommendations[rec.content_id].score += rec.score * weights["popularity"]
for rec in trending_recs:
if rec.content_id not in all_recommendations:
all_recommendations[rec.content_id] = rec
all_recommendations[rec.content_id].score *= weights["trending"]
else:
all_recommendations[rec.content_id].score += rec.score * weights["trending"]
# Update algorithm type for hybrid recommendations
for rec in all_recommendations.values():
rec.algorithm = RecommendationAlgorithm.HYBRID
rec.confidence = min(1.0, rec.confidence + 0.1) # Boost confidence for hybrid
return list(all_recommendations.values())
def _apply_diversity_filter(self, recommendations: List[Recommendation],
user_profile: UserProfile,
num_recommendations: int) -> List[Recommendation]:
"""Apply diversity filter to avoid too similar recommendations"""
if len(recommendations) <= num_recommendations:
return recommendations
# Sort by score first
sorted_recs = sorted(recommendations, key=lambda x: x.score,
reverse=True)
# Apply diversity using genre distribution
selected_recs = []
genre_counts = defaultdict(int)
diversity_factor = self.ab_test_configs["diversity_factor"]
for rec in sorted_recs:
content_features = self.content_features.get(rec.content_id, {})
genre_vector = content_features.get("genre_vector", [])
all_genres = [g.value for g in Genre]
content_genres = [all_genres[i] for i, val in
enumerate(genre_vector) if val > 0]
# Check if adding this recommendation increases diversity
max_genre_count = max(genre_counts.values()) if genre_counts else 0
min_genre_representation = min(
genre_counts.get(genre, 0) for genre in content_genres
) if content_genres else 0
# Diversity penalty
diversity_penalty = 0
if max_genre_count > 0:
for genre in content_genres:
if genre_counts.get(genre, 0) >= max_genre_count:
diversity_penalty += diversity_factor
# Apply penalty to score
adjusted_score = rec.score * (1 - diversity_penalty)
rec.score = adjusted_score
selected_recs.append(rec)
# Update genre counts
for genre in content_genres:
genre_counts[genre] += 1
if len(selected_recs) >= num_recommendations:
break
return selected_recs
def _rank_recommendations(self, recommendations: List[Recommendation]) -> List[Recommendation]:
"""Final ranking of recommendations"""
# Sort by adjusted score
sorted_recs = sorted(recommendations, key=lambda x: x.score,
reverse=True)
# Assign ranks
for i, rec in enumerate(sorted_recs):
rec.rank = i + 1
return sorted_recs
def _invalidate_similarity_cache(self, user_id: str):
"""Invalidate similarity cache for a user"""
keys_to_remove = []
for cache_key in self.user_similarity_cache:
if user_id in cache_key:
keys_to_remove.append(cache_key)
for key in keys_to_remove:
del self.user_similarity_cache[key]
def update_trending_scores(self, recent_interactions: Dict[str, int]):
"""Update trending scores based on recent user interactions"""
total_interactions = sum(recent_interactions.values())
for content_id, interaction_count in recent_interactions.items():
trending_score = interaction_count / total_interactions
self.trending_scores[content_id] = trending_score
def get_recommendation_explanation(self, user_id: str, content_id: str) -> str:
"""Get explanation for why content was recommended"""
explanations = [
"Based on your viewing history",
"Popular with users like you",
"Trending in your area",
"Matches your favorite genres",
"Highly rated content",
"Because you watched similar shows"
]
# Simple hash-based selection for consistency
explanation_index = hash(f"{user_id}_{content_id}") % len(explanations)
return explanations[explanation_index]
# Test the recommendation engine
def test_recommendation_engine():
engine = ContentRecommendationEngine()
# Add sample content
contents = [
Content("c1", "Stranger Things", ["Sci-Fi", "Drama"], 4.5, 0.9, 2016,
50),
Content("c2", "The Crown", ["Drama", "History"], 4.3, 0.8, 2016, 60),
Content("c3", "Dark", ["Sci-Fi", "Thriller"], 4.6, 0.7, 2017, 55),
Content("c4", "Bridgerton", ["Drama", "Romance"], 4.2, 0.85, 2020,
45),
Content("c5", "Money Heist", ["Action", "Thriller"], 4.4, 0.9, 2017,
50)
]
for content in contents:
engine.add_content(content)
# Create user profile
viewing_history = [
ViewingRecord("Stranger Things", "c1", ["Sci-Fi", "Drama"], 5.0, 0.9,
time.time() - 86400),
ViewingRecord("The Crown", "c2", ["Drama", "History"], 4.0, 0.7,
time.time() - 172800)
]
user_profile = UserProfile(
user_id="u123",
age=28,
country="US",
viewing_history=viewing_history,
preferences={"genres": ["Sci-Fi", "Drama"], "languages": ["English"]}
)
print("Testing Content Recommendation Engine:")
print(f"User: {user_profile.user_id}, Age: {user_profile.age}")
print(f"Viewing history: {len(user_profile.viewing_history)} items")
# Test different algorithms
algorithms = [
RecommendationAlgorithm.COLLABORATIVE,
RecommendationAlgorithm.CONTENT_BASED,
RecommendationAlgorithm.HYBRID
]
for algorithm in algorithms:
print(f"\n--- {algorithm.value.title()} Recommendations ---")
recommendations = engine.get_recommendations(user_profile, 3,
algorithm)
for rec in recommendations:
content_title = next((c.title for c in contents if c.content_id
== rec.content_id), "Unknown")
print(f" {rec.rank}. {content_title}")
print(f" Score: {rec.score:.3f}, Confidence:
{rec.confidence:.2f}")
print(f" Reason: {rec.reason}")
# Test cold start
new_user = UserProfile(
user_id="u456",
age=35,
country="US",
viewing_history=[],
preferences={"genres": ["Drama"], "languages": ["English"]}
)
print(f"\n--- Cold Start Recommendations ---")
cold_start_recs = engine.get_recommendations(new_user, 3)
for rec in cold_start_recs:
content_title = next((c.title for c in contents if c.content_id ==
rec.content_id), "Unknown")
print(f" {rec.rank}. {content_title}")
print(f" Score: {rec.score:.3f}, Reason: {rec.reason}")
test_recommendation_engine()
Key Insights:
• Hybrid recommendation system combining collaborative and content-based filtering (a standalone sketch of the user-similarity step appears below)
• Cold start problem handling for new users using demographic and popularity data
• Real-time recommendation generation with caching for performance
• Diversity filtering to avoid over-specialization in recommendations
• A/B testing framework for algorithm optimization and experimentation
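As a quick illustration of the collaborative-filtering step referenced above, the sketch below computes user-user cosine similarity over co-rated titles. The helper name `cosine_similarity` and the sample ratings are assumptions made for this example, not data from the problem.
Python
import math

# Cosine similarity between two users' rating dictionaries (content_id -> rating).
def cosine_similarity(u1: dict, u2: dict) -> float:
    common = set(u1) & set(u2)
    if len(common) < 2:          # require at least two co-rated items
        return 0.0
    dot = sum(u1[i] * u2[i] for i in common)
    n1 = math.sqrt(sum(r * r for r in u1.values()))
    n2 = math.sqrt(sum(r * r for r in u2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

alice = {"c1": 5.0, "c2": 4.0, "c3": 1.0}
bob = {"c1": 4.5, "c2": 3.5}
# A high similarity means Bob's other titles become candidates for Alice
print(round(cosine_similarity(alice, bob), 3))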
Problem 28: User Viewing Pattern Analysis
Difficulty: Medium | Time Limit: 45 minutes | Company: Netflix
Problem Statement:
Design a user viewing pattern analysis system that:
1. Analyzes user behavior patterns across different time periods
2. Detects binge-watching sessions and viewing habits
3. Identifies content abandonment points and reasons
4. Predicts optimal content release times
5. Provides insights for content strategy and user engagement
Example:
Plain Text
Input:
viewing_sessions = [
{"user_id": "u1", "content_id": "c1", "start_time": 1640995200, "end_time":
1640998800,
"progress": 1.0, "device": "TV", "quality": "4K"},
{"user_id": "u1", "content_id": "c2", "start_time": 1640999000, "end_time":
1641001800,
"progress": 0.3, "device": "Mobile", "quality": "HD"}
]
Output: {
"binge_sessions": [{"session_id": "s1", "duration": 180, "episodes": 3}],
"abandonment_points": {"c2": {"avg_abandonment": 0.3, "common_reasons":
["quality", "device"]}},
"peak_hours": [20, 21, 22], "optimal_release_time": "Friday 15:00"
}
Solution Approach:
This problem requires time series analysis, pattern recognition, and behavioral analytics.
Python
import time
import math
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, Counter
import statistics
class DeviceType(Enum):
TV = "TV"
MOBILE = "Mobile"
TABLET = "Tablet"
LAPTOP = "Laptop"
DESKTOP = "Desktop"
class ContentType(Enum):
MOVIE = "movie"
SERIES = "series"
DOCUMENTARY = "documentary"
SPECIAL = "special"
@dataclass
class ViewingSession:
session_id: str
user_id: str
content_id: str
start_time: float
end_time: float
progress: float # 0.0 to 1.0
device: DeviceType
quality: str
pauses: int = 0
seeks: int = 0
volume_changes: int = 0
subtitle_enabled: bool = False
@dataclass
class BingeSession:
session_id: str
user_id: str
content_ids: List[str]
total_duration: int # minutes
episode_count: int
start_time: float
end_time: float
avg_completion_rate: float
@dataclass
class AbandonmentPattern:
content_id: str
avg_abandonment_point: float
abandonment_count: int
common_reasons: List[str]
device_correlation: Dict[str, float]
time_correlation: Dict[int, float] # hour -> abandonment rate
@dataclass
class ViewingInsights:
peak_hours: List[int]
optimal_release_time: str
binge_probability: float
completion_rate: float
device_preferences: Dict[str, float]
quality_preferences: Dict[str, float]
class ViewingPatternAnalyzer:
def __init__(self):
self.viewing_sessions: List[ViewingSession] = []
self.content_metadata: Dict[str, Dict] = {}
self.user_profiles: Dict[str, Dict] = {}
# Analysis caches
self.hourly_activity: Dict[int, int] = defaultdict(int)
self.daily_activity: Dict[str, int] = defaultdict(int)
self.device_usage: Dict[str, int] = defaultdict(int)
self.quality_usage: Dict[str, int] = defaultdict(int)
# Binge detection parameters
self.binge_threshold_minutes = 120 # 2 hours minimum
self.binge_gap_threshold = 30 # 30 minutes max gap between episodes
# Abandonment analysis
self.abandonment_threshold = 0.8 # Consider <80% completion as abandonment
def add_viewing_session(self, session: ViewingSession):
"""Add a viewing session to the analyzer"""
self.viewing_sessions.append(session)
# Update activity caches
hour = int((session.start_time % 86400) / 3600)
self.hourly_activity[hour] += 1
day = time.strftime("%A", time.localtime(session.start_time))
self.daily_activity[day] += 1
self.device_usage[session.device.value] += 1
self.quality_usage[session.quality] += 1
def add_content_metadata(self, content_id: str, metadata: Dict):
"""Add content metadata for analysis"""
self.content_metadata[content_id] = metadata
def detect_binge_sessions(self, user_id: Optional[str] = None) -> List[BingeSession]:
"""Detect binge-watching sessions"""
binge_sessions = []
# Group sessions by user
user_sessions = defaultdict(list)
for session in self.viewing_sessions:
if user_id is None or session.user_id == user_id:
user_sessions[session.user_id].append(session)
for uid, sessions in user_sessions.items():
# Sort sessions by start time
sorted_sessions = sorted(sessions, key=lambda x: x.start_time)
current_binge = []
last_end_time = 0
for session in sorted_sessions:
# Check if this session continues a binge
time_gap = (session.start_time - last_end_time) / 60 # minutes
if (time_gap <= self.binge_gap_threshold and
len(current_binge) > 0 and
self._is_same_series(current_binge[-1], session)):
current_binge.append(session)
else:
# End current binge if it qualifies
if self._qualifies_as_binge(current_binge):
binge_sessions.append(self._create_binge_session(current_binge))
# Start new potential binge
current_binge = [session]
last_end_time = session.end_time
# Check final binge
if self._qualifies_as_binge(current_binge):
binge_sessions.append(self._create_binge_session(current_binge))
return binge_sessions
def _is_same_series(self, session1: ViewingSession, session2:
ViewingSession) -> bool:
"""Check if two sessions are from the same series"""
# In real implementation, would check series metadata
# For simulation, assume content IDs with same prefix are same series
prefix1 = session1.content_id.split('_')[0] if '_' in session1.content_id else session1.content_id
prefix2 = session2.content_id.split('_')[0] if '_' in session2.content_id else session2.content_id
return prefix1 == prefix2
def _qualifies_as_binge(self, sessions: List[ViewingSession]) -> bool:
"""Check if sessions qualify as a binge"""
if len(sessions) < 2:
return False
total_duration = sum((s.end_time - s.start_time) / 60 for s in
sessions)
return total_duration >= self.binge_threshold_minutes
def _create_binge_session(self, sessions: List[ViewingSession]) -> BingeSession:
"""Create a binge session from viewing sessions"""
total_duration = sum((s.end_time - s.start_time) / 60 for s in
sessions)
avg_completion = sum(s.progress for s in sessions) / len(sessions)
return BingeSession(
session_id=f"binge_{sessions[0].user_id}_{int(sessions[0].start_time)}",
user_id=sessions[0].user_id,
content_ids=[s.content_id for s in sessions],
total_duration=int(total_duration),
episode_count=len(sessions),
start_time=sessions[0].start_time,
end_time=sessions[-1].end_time,
avg_completion_rate=avg_completion
)
def analyze_abandonment_patterns(self) -> List[AbandonmentPattern]:
"""Analyze content abandonment patterns"""
content_abandonment = defaultdict(list)
# Group sessions by content
for session in self.viewing_sessions:
if session.progress < self.abandonment_threshold:
content_abandonment[session.content_id].append(session)
abandonment_patterns = []
for content_id, abandoned_sessions in content_abandonment.items():
if len(abandoned_sessions) >= 3: # Minimum sessions for pattern analysis
# Calculate average abandonment point
avg_abandonment = statistics.mean(s.progress for s in
abandoned_sessions)
# Analyze device correlation
device_abandonment = defaultdict(list)
for session in abandoned_sessions:
device_abandonment[session.device.value].append(session.progress)
device_correlation = {}
for device, progresses in device_abandonment.items():
device_correlation[device] = statistics.mean(progresses)
# Analyze time correlation
time_abandonment = defaultdict(list)
for session in abandoned_sessions:
hour = int((session.start_time % 86400) / 3600)
time_abandonment[hour].append(session.progress)
time_correlation = {}
for hour, progresses in time_abandonment.items():
time_correlation[hour] = statistics.mean(progresses)
# Identify common reasons (simplified)
common_reasons = self._identify_abandonment_reasons(abandoned_sessions)
pattern = AbandonmentPattern(
content_id=content_id,
avg_abandonment_point=avg_abandonment,
abandonment_count=len(abandoned_sessions),
common_reasons=common_reasons,
device_correlation=device_correlation,
time_correlation=time_correlation
)
abandonment_patterns.append(pattern)
return sorted(abandonment_patterns, key=lambda x:
x.abandonment_count, reverse=True)
def _identify_abandonment_reasons(self, sessions: List[ViewingSession]) -> List[str]:
"""Identify common reasons for abandonment"""
reasons = []
# Device-based reasons
mobile_sessions = [s for s in sessions if s.device ==
DeviceType.MOBILE]
if len(mobile_sessions) / len(sessions) > 0.6:
reasons.append("mobile_viewing")
# Quality-based reasons
low_quality_sessions = [s for s in sessions if s.quality in ["SD",
"480p"]]
if len(low_quality_sessions) / len(sessions) > 0.5:
reasons.append("low_quality")
# Interaction-based reasons
high_pause_sessions = [s for s in sessions if s.pauses > 5]
if len(high_pause_sessions) / len(sessions) > 0.4:
reasons.append("frequent_pausing")
# Time-based reasons
late_night_sessions = [s for s in sessions
if int((s.start_time % 86400) / 3600) >= 23]
if len(late_night_sessions) / len(sessions) > 0.5:
reasons.append("late_night_viewing")
return reasons[:3] # Return top 3 reasons
def analyze_peak_viewing_times(self) -> Dict[str, List[int]]:
"""Analyze peak viewing times"""
# Hourly analysis
total_sessions = len(self.viewing_sessions)
hourly_percentages = {}
for hour in range(24):
count = self.hourly_activity.get(hour, 0)
hourly_percentages[hour] = (count / total_sessions) * 100 if total_sessions > 0 else 0
# Find peak hours (above average + 1 std dev)
avg_percentage = statistics.mean(hourly_percentages.values())
std_dev = statistics.stdev(hourly_percentages.values()) if len(hourly_percentages) > 1 else 0
threshold = avg_percentage + std_dev
peak_hours = [hour for hour, percentage in hourly_percentages.items()
if percentage >= threshold]
# Daily analysis
daily_totals = dict(self.daily_activity)
peak_days = sorted(daily_totals.items(), key=lambda x: x[1],
reverse=True)[:3]
return {
"peak_hours": sorted(peak_hours),
"peak_days": [day for day, _ in peak_days],
"hourly_distribution": hourly_percentages,
"daily_distribution": daily_totals
}
def predict_optimal_release_time(self) -> Dict[str, str]:
"""Predict optimal content release times"""
peak_analysis = self.analyze_peak_viewing_times()
# Find the most popular day and hour
peak_days = peak_analysis["peak_days"]
peak_hours = peak_analysis["peak_hours"]
# Strategy: Release before peak time to maximize discovery
if peak_hours:
optimal_hour = min(peak_hours) - 2 # 2 hours before peak
optimal_hour = max(0, optimal_hour) # Don't go negative
else:
optimal_hour = 15 # Default to 3 PM
optimal_day = peak_days[0] if peak_days else "Friday"
# Different strategies for different content types
strategies = {
"series": f"{optimal_day} {optimal_hour:02d}:00",
"movie": f"Friday {optimal_hour:02d}:00", # Movies typically
Friday
"documentary": f"Sunday {optimal_hour:02d}:00", # Documentaries
Sunday
"special": f"Saturday {optimal_hour:02d}:00" # Specials Saturday
}
return strategies
def calculate_engagement_metrics(self) -> Dict[str, float]:
"""Calculate overall engagement metrics"""
if not self.viewing_sessions:
return {}
# Completion rate
completion_rates = [s.progress for s in self.viewing_sessions]
avg_completion_rate = statistics.mean(completion_rates)
# Session duration
durations = [(s.end_time - s.start_time) / 60 for s in
self.viewing_sessions]
avg_duration = statistics.mean(durations)
# Binge probability
binge_sessions = self.detect_binge_sessions()
total_users = len(set(s.user_id for s in self.viewing_sessions))
binge_users = len(set(b.user_id for b in binge_sessions))
binge_probability = binge_users / total_users if total_users > 0 else 0
# Device distribution
total_sessions = len(self.viewing_sessions)
device_distribution = {
device: count / total_sessions
for device, count in self.device_usage.items()
}
# Quality distribution
quality_distribution = {
quality: count / total_sessions
for quality, count in self.quality_usage.items()
}
return {
"avg_completion_rate": avg_completion_rate,
"avg_session_duration": avg_duration,
"binge_probability": binge_probability,
"device_distribution": device_distribution,
"quality_distribution": quality_distribution,
"total_sessions": total_sessions,
"unique_users": total_users
}
def generate_viewing_insights(self) -> ViewingInsights:
"""Generate comprehensive viewing insights"""
peak_analysis = self.analyze_peak_viewing_times()
engagement_metrics = self.calculate_engagement_metrics()
optimal_release = self.predict_optimal_release_time()
return ViewingInsights(
peak_hours=peak_analysis["peak_hours"],
optimal_release_time=optimal_release.get("series", "Friday
15:00"),
binge_probability=engagement_metrics.get("binge_probability", 0),
completion_rate=engagement_metrics.get("avg_completion_rate", 0),
device_preferences=engagement_metrics.get("device_distribution",
{}),
quality_preferences=engagement_metrics.get("quality_distribution", {})
)
def analyze_user_segments(self) -> Dict[str, Dict]:
"""Analyze different user segments based on viewing patterns"""
user_behaviors = defaultdict(list)
# Group sessions by user
for session in self.viewing_sessions:
user_behaviors[session.user_id].append(session)
segments = {
"binge_watchers": [],
"casual_viewers": [],
"mobile_users": [],
"quality_seekers": [],
"night_owls": []
}
for user_id, sessions in user_behaviors.items():
user_stats = self._calculate_user_stats(sessions)
# Classify users into segments
if user_stats["avg_session_duration"] > 120: # 2+ hours
segments["binge_watchers"].append(user_id)
if user_stats["sessions_per_week"] <= 3:
segments["casual_viewers"].append(user_id)
if user_stats["mobile_percentage"] > 0.7:
segments["mobile_users"].append(user_id)
if user_stats["hd_percentage"] > 0.8:
segments["quality_seekers"].append(user_id)
if user_stats["night_percentage"] > 0.6: # After 10 PM
segments["night_owls"].append(user_id)
# Calculate segment statistics
segment_stats = {}
for segment_name, user_list in segments.items():
segment_stats[segment_name] = {
"user_count": len(user_list),
"percentage": len(user_list) / len(user_behaviors) * 100 if
user_behaviors else 0
}
return segment_stats
def _calculate_user_stats(self, sessions: List[ViewingSession]) -> Dict:
"""Calculate statistics for a single user"""
if not sessions:
return {}
# Session duration
durations = [(s.end_time - s.start_time) / 60 for s in sessions]
avg_duration = statistics.mean(durations)
# Sessions per week (approximate)
time_span = (max(s.start_time for s in sessions) -
min(s.start_time for s in sessions)) / (7 * 24 * 3600)
sessions_per_week = len(sessions) / max(1, time_span)
# Device usage
mobile_sessions = sum(1 for s in sessions if s.device ==
DeviceType.MOBILE)
mobile_percentage = mobile_sessions / len(sessions)
# Quality preference
hd_sessions = sum(1 for s in sessions if s.quality in ["HD", "4K",
"1080p"])
hd_percentage = hd_sessions / len(sessions)
# Night viewing
night_sessions = sum(1 for s in sessions
if int((s.start_time % 86400) / 3600) >= 22)
night_percentage = night_sessions / len(sessions)
return {
"avg_session_duration": avg_duration,
"sessions_per_week": sessions_per_week,
"mobile_percentage": mobile_percentage,
"hd_percentage": hd_percentage,
"night_percentage": night_percentage
}
# Test the viewing pattern analyzer
def test_viewing_analyzer():
analyzer = ViewingPatternAnalyzer()
# Add sample viewing sessions
sessions = [
ViewingSession("s1", "u1", "stranger_things_s1e1", time.time() -
7200,
time.time() - 4200, 1.0, DeviceType.TV, "4K", 2, 1),
ViewingSession("s2", "u1", "stranger_things_s1e2", time.time() -
4000,
time.time() - 1000, 1.0, DeviceType.TV, "4K", 1, 0),
ViewingSession("s3", "u2", "the_crown_s1e1", time.time() - 3600,
time.time() - 1800, 0.3, DeviceType.MOBILE, "HD", 5,
3),
ViewingSession("s4", "u1", "dark_s1e1", time.time() - 1800,
time.time() - 600, 0.8, DeviceType.TABLET, "HD", 2, 1),
ViewingSession("s5", "u3", "bridgerton_s1e1", time.time() - 86400,
time.time() - 83400, 1.0, DeviceType.TV, "4K", 0, 0)
]
for session in sessions:
analyzer.add_viewing_session(session)
print("Testing Viewing Pattern Analyzer:")
print(f"Total sessions: {len(sessions)}")
# Test binge detection
binge_sessions = analyzer.detect_binge_sessions()
print(f"\nBinge Sessions Detected: {len(binge_sessions)}")
for binge in binge_sessions:
print(f" User {binge.user_id}: {binge.episode_count} episodes, "
f"{binge.total_duration} minutes")
# Test abandonment analysis
abandonment_patterns = analyzer.analyze_abandonment_patterns()
print(f"\nAbandonment Patterns: {len(abandonment_patterns)}")
for pattern in abandonment_patterns:
print(f" Content {pattern.content_id}:
{pattern.avg_abandonment_point:.1%} avg abandonment")
print(f" Reasons: {pattern.common_reasons}")
# Test peak time analysis
peak_analysis = analyzer.analyze_peak_viewing_times()
print(f"\nPeak Hours: {peak_analysis['peak_hours']}")
print(f"Peak Days: {peak_analysis['peak_days']}")
# Test optimal release time
optimal_release = analyzer.predict_optimal_release_time()
print(f"\nOptimal Release Times:")
for content_type, time_str in optimal_release.items():
print(f" {content_type}: {time_str}")
# Test engagement metrics
engagement = analyzer.calculate_engagement_metrics()
print(f"\nEngagement Metrics:")
print(f" Avg completion rate: {engagement['avg_completion_rate']:.1%}")
print(f" Avg session duration: {engagement['avg_session_duration']:.1f}
minutes")
print(f" Binge probability: {engagement['binge_probability']:.1%}")
# Test user segmentation
segments = analyzer.analyze_user_segments()
print(f"\nUser Segments:")
for segment, stats in segments.items():
print(f" {segment}: {stats['user_count']} users
({stats['percentage']:.1f}%)")
test_viewing_analyzer()
Key Insights:
• Comprehensive binge-watching detection using time gaps and content relationships (see the sketch below)
• Multi-dimensional abandonment analysis considering device, time, and interaction patterns
• Peak viewing time analysis for optimal content release scheduling
• User segmentation based on viewing behaviors and preferences
• Real-time engagement metrics for content strategy optimization
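A minimal standalone sketch of the gap-based binge detection idea above: consecutive sessions by the same user are merged into one run whenever the gap between them stays under a threshold (30 minutes here, matching the analyzer's default). The helper name and sample timestamps are illustrative assumptions.
Python
# Count consecutive-episode runs; a long run is a binge candidate.
def count_binge_episodes(sessions, max_gap_minutes=30):
    """sessions: list of (start_epoch, end_epoch) tuples, assumed pre-sorted by start time."""
    runs, current = [], 1
    for prev, cur in zip(sessions, sessions[1:]):
        gap_minutes = (cur[0] - prev[1]) / 60
        if gap_minutes <= max_gap_minutes:
            current += 1          # still the same viewing run
        else:
            runs.append(current)  # run ended, start a new one
            current = 1
    runs.append(current)
    return runs

# Three back-to-back episodes, then a long break, then one more episode
sessions = [(0, 3000), (3300, 6300), (6600, 9600), (20000, 23000)]
print(count_binge_episodes(sessions))  # [3, 1] -> the first run of 3 episodes is a binge candidate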
Problem 29: Subtitle Synchronization
Difficulty: Medium | Time Limit: 45 minutes | Company: Netflix
Problem Statement:
Design a subtitle synchronization system that:
1. Automatically syncs subtitles with video content
2. Handles multiple languages and formats (SRT, VTT, ASS)
3. Detects and corrects timing drift
4. Supports real-time adjustment during playback
5. Provides quality scoring for subtitle accuracy
Example:
Plain Text
Input:
video_audio = {"duration": 7200, "speech_segments": [
{"start": 10.5, "end": 13.2, "text": "Hello world", "confidence": 0.95},
{"start": 15.8, "end": 18.1, "text": "How are you", "confidence": 0.88}
]}
subtitle_file = {
"format": "SRT", "entries": [
{"index": 1, "start": 11.0, "end": 13.5, "text": "Hello world"},
{"index": 2, "start": 16.2, "end": 18.5, "text": "How are you"}
]
}
Output: {
"sync_quality": 0.87, "timing_adjustments": [
{"subtitle_index": 1, "time_shift": -0.5, "confidence": 0.92},
{"subtitle_index": 2, "time_shift": -0.4, "confidence": 0.85}
],
"synchronized_subtitles": "adjusted_subtitle_content"
}
Solution Approach:
This problem requires audio processing, text alignment, and timing optimization
algorithms.
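One step worth seeing in isolation is drift detection: systematic subtitle drift can be estimated as the slope of a least-squares line fit through (subtitle time, measured shift) points, which is the approach the solution below takes. The sample points in this sketch are made up for illustration.
Python
# Estimate constant drift (seconds of shift per second of video) from sample shifts.
def drift_rate(points):
    """points: list of (timestamp_seconds, shift_seconds); returns least-squares slope."""
    n = len(points)
    sx = sum(t for t, _ in points)
    sy = sum(s for _, s in points)
    sxy = sum(t * s for t, s in points)
    sx2 = sum(t * t for t, _ in points)
    denom = n * sx2 - sx * sx
    return 0.0 if denom == 0 else (n * sxy - sx * sy) / denom

# Shifts grow roughly 0.1 s for every 100 s of video (frame-rate mismatch style drift)
samples = [(100, 0.1), (200, 0.2), (300, 0.31), (400, 0.4)]
print(f"drift ~ {drift_rate(samples):.4f} s per second of video")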
Python
import time
import re
import math
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
import difflib
class SubtitleFormat(Enum):
SRT = "srt"
VTT = "vtt"
ASS = "ass"
TTML = "ttml"
class SyncMethod(Enum):
AUDIO_FINGERPRINT = "audio_fingerprint"
SPEECH_RECOGNITION = "speech_recognition"
VISUAL_CUE = "visual_cue"
MANUAL_ADJUSTMENT = "manual_adjustment"
@dataclass
class SpeechSegment:
start_time: float
end_time: float
text: str
confidence: float
speaker_id: Optional[str] = None
language: str = "en"
@dataclass
class SubtitleEntry:
index: int
start_time: float
end_time: float
text: str
style: Dict = field(default_factory=dict)
position: Dict = field(default_factory=dict)
@dataclass
class TimingAdjustment:
subtitle_index: int
time_shift: float # seconds
confidence: float
method: SyncMethod
original_start: float
original_end: float
adjusted_start: float
adjusted_end: float
@dataclass
class SyncResult:
sync_quality: float
timing_adjustments: List[TimingAdjustment]
synchronized_subtitles: str
drift_detected: bool
total_time_shift: float
confidence_score: float
class SubtitleSynchronizer:
def __init__(self):
# Synchronization parameters
self.max_time_drift = 5.0 # Maximum allowed drift in seconds
self.similarity_threshold = 0.7 # Text similarity threshold
self.confidence_threshold = 0.6 # Minimum confidence for adjustments
# Audio analysis parameters
self.speech_detection_window = 0.5 # seconds
self.silence_threshold = 0.1 # seconds
# Text processing
self.punctuation_pattern = re.compile(r'[^\w\s]')
self.whitespace_pattern = re.compile(r'\s+')
def synchronize_subtitles(self, video_audio: Dict, subtitle_file: Dict,
target_language: str = "en") -> SyncResult:
"""Main synchronization function"""
# Parse subtitle file
subtitle_entries = self._parse_subtitle_file(subtitle_file)
# Extract speech segments from audio
speech_segments = video_audio.get("speech_segments", [])
# Perform text alignment
alignments = self._align_text_segments(speech_segments,
subtitle_entries)
# Calculate timing adjustments
timing_adjustments = self._calculate_timing_adjustments(alignments)
# Detect systematic drift
drift_info = self._detect_timing_drift(timing_adjustments)
# Apply corrections
corrected_subtitles = self._apply_timing_corrections(
subtitle_entries, timing_adjustments, drift_info
)
# Calculate sync quality
sync_quality = self._calculate_sync_quality(alignments,
timing_adjustments)
# Generate output
synchronized_content = self._generate_subtitle_content(
corrected_subtitles, subtitle_file["format"]
)
return SyncResult(
sync_quality=sync_quality,
timing_adjustments=timing_adjustments,
synchronized_subtitles=synchronized_content,
drift_detected=drift_info["drift_detected"],
total_time_shift=drift_info["total_drift"],
confidence_score=self._calculate_overall_confidence(timing_adjustments)
)
def _parse_subtitle_file(self, subtitle_file: Dict) -> List[SubtitleEntry]:
"""Parse subtitle file based on format"""
format_type = SubtitleFormat(subtitle_file["format"].lower())
entries = []
if format_type == SubtitleFormat.SRT:
entries = self._parse_srt_entries(subtitle_file["entries"])
elif format_type == SubtitleFormat.VTT:
entries = self._parse_vtt_entries(subtitle_file["entries"])
elif format_type == SubtitleFormat.ASS:
entries = self._parse_ass_entries(subtitle_file["entries"])
return entries
def _parse_srt_entries(self, raw_entries: List[Dict]) -> List[SubtitleEntry]:
"""Parse SRT format entries"""
entries = []
for entry in raw_entries:
subtitle_entry = SubtitleEntry(
index=entry["index"],
start_time=entry["start"],
end_time=entry["end"],
text=entry["text"]
)
entries.append(subtitle_entry)
return entries
def _parse_vtt_entries(self, raw_entries: List[Dict]) -> List[SubtitleEntry]:
"""Parse WebVTT format entries"""
entries = []
for i, entry in enumerate(raw_entries):
subtitle_entry = SubtitleEntry(
index=i + 1,
start_time=entry["start"],
end_time=entry["end"],
text=entry["text"],
position=entry.get("position", {}),
style=entry.get("style", {})
)
entries.append(subtitle_entry)
return entries
def _parse_ass_entries(self, raw_entries: List[Dict]) -> List[SubtitleEntry]:
"""Parse ASS/SSA format entries"""
entries = []
for i, entry in enumerate(raw_entries):
# ASS format has more complex styling
subtitle_entry = SubtitleEntry(
index=i + 1,
start_time=entry["start"],
end_time=entry["end"],
text=self._clean_ass_text(entry["text"]),
style=entry.get("style", {}),
position=entry.get("position", {})
)
entries.append(subtitle_entry)
return entries
def _clean_ass_text(self, text: str) -> str:
"""Clean ASS format text from styling tags"""
# Remove ASS styling tags like {\i1}, {\b1}, etc.
cleaned = re.sub(r'\{[^}]*\}', '', text)
# Remove line breaks and normalize whitespace
cleaned = re.sub(r'\\N', ' ', cleaned)
cleaned = self.whitespace_pattern.sub(' ', cleaned).strip()
return cleaned
def _align_text_segments(self, speech_segments: List[Dict],
subtitle_entries: List[SubtitleEntry]) -> List[Dict]:
"""Align speech segments with subtitle entries"""
alignments = []
# Convert speech segments to SpeechSegment objects
speech_objects = [
SpeechSegment(
start_time=seg["start"],
end_time=seg["end"],
text=seg["text"],
confidence=seg["confidence"]
) for seg in speech_segments
]
# For each subtitle entry, find the best matching speech segment
for subtitle in subtitle_entries:
best_match = self._find_best_speech_match(subtitle,
speech_objects)
if best_match:
alignment = {
"subtitle_entry": subtitle,
"speech_segment": best_match["segment"],
"similarity_score": best_match["similarity"],
"time_difference": best_match["time_diff"],
"confidence": best_match["confidence"]
}
alignments.append(alignment)
return alignments
def _find_best_speech_match(self, subtitle: SubtitleEntry,
speech_segments: List[SpeechSegment]) -> Optional[Dict]:
"""Find the best matching speech segment for a subtitle"""
best_match = None
best_score = 0
subtitle_text_clean = self._normalize_text(subtitle.text)
subtitle_center = (subtitle.start_time + subtitle.end_time) / 2
for speech in speech_segments:
speech_text_clean = self._normalize_text(speech.text)
speech_center = (speech.start_time + speech.end_time) / 2
# Calculate text similarity
text_similarity = self._calculate_text_similarity(
subtitle_text_clean, speech_text_clean
)
# Calculate temporal proximity
time_diff = abs(subtitle_center - speech_center)
time_score = max(0, 1 - (time_diff / 10)) # 10 second window
# Combined score
combined_score = (text_similarity * 0.7) + (time_score * 0.3)
if combined_score > best_score and text_similarity > self.similarity_threshold:
best_match = {
"segment": speech,
"similarity": text_similarity,
"time_diff": speech_center - subtitle_center,
"confidence": combined_score * speech.confidence
}
best_score = combined_score
return best_match
def _normalize_text(self, text: str) -> str:
"""Normalize text for comparison"""
# Convert to lowercase
normalized = text.lower()
# Remove punctuation
normalized = self.punctuation_pattern.sub('', normalized)
# Normalize whitespace
normalized = self.whitespace_pattern.sub(' ', normalized).strip()
return normalized
def _calculate_text_similarity(self, text1: str, text2: str) -> float:
"""Calculate similarity between two text strings"""
if not text1 or not text2:
return 0.0
# Use sequence matcher for similarity
matcher = difflib.SequenceMatcher(None, text1, text2)
return matcher.ratio()
def _calculate_timing_adjustments(self, alignments: List[Dict]) -> List[TimingAdjustment]:
"""Calculate timing adjustments based on alignments"""
adjustments = []
for alignment in alignments:
if alignment["confidence"] < self.confidence_threshold:
continue
subtitle = alignment["subtitle_entry"]
speech = alignment["speech_segment"]
# Calculate optimal timing adjustment
time_shift = self._calculate_optimal_shift(subtitle, speech)
adjustment = TimingAdjustment(
subtitle_index=subtitle.index,
time_shift=time_shift,
confidence=alignment["confidence"],
method=SyncMethod.SPEECH_RECOGNITION,
original_start=subtitle.start_time,
original_end=subtitle.end_time,
adjusted_start=subtitle.start_time + time_shift,
adjusted_end=subtitle.end_time + time_shift
)
adjustments.append(adjustment)
return adjustments
def _calculate_optimal_shift(self, subtitle: SubtitleEntry,
speech: SpeechSegment) -> float:
"""Calculate optimal time shift for subtitle"""
# Strategy: Align subtitle start with speech start
# But consider subtitle reading time and speech duration
subtitle_duration = subtitle.end_time - subtitle.start_time
speech_duration = speech.end_time - speech.start_time
# If subtitle is much longer than speech, align centers
if subtitle_duration > speech_duration * 1.5:
subtitle_center = (subtitle.start_time + subtitle.end_time) / 2
speech_center = (speech.start_time + speech.end_time) / 2
return speech_center - subtitle_center
else:
# Align starts with small offset for reading time
reading_offset = min(0.5, subtitle_duration * 0.1)
return (speech.start_time - reading_offset) - subtitle.start_time
def _detect_timing_drift(self, adjustments: List[TimingAdjustment]) -> Dict:
"""Detect systematic timing drift"""
if len(adjustments) < 3:
return {"drift_detected": False, "total_drift": 0, "drift_rate":
0}
# Calculate drift over time
time_points = [(adj.original_start, adj.time_shift) for adj in
adjustments]
time_points.sort(key=lambda x: x[0])
# Linear regression to detect drift trend
n = len(time_points)
sum_x = sum(point[0] for point in time_points)
sum_y = sum(point[1] for point in time_points)
sum_xy = sum(point[0] * point[1] for point in time_points)
sum_x2 = sum(point[0] ** 2 for point in time_points)
# Calculate slope (drift rate)
denominator = n * sum_x2 - sum_x ** 2
if denominator == 0:
drift_rate = 0
else:
drift_rate = (n * sum_xy - sum_x * sum_y) / denominator
# Calculate total drift
total_drift = time_points[-1][1] - time_points[0][1]
# Detect if drift is significant
drift_detected = abs(drift_rate) > 0.001 or abs(total_drift) > 2.0
return {
"drift_detected": drift_detected,
"total_drift": total_drift,
"drift_rate": drift_rate,
"start_shift": time_points[0][1],
"end_shift": time_points[-1][1]
}
def _apply_timing_corrections(self, subtitle_entries:
List[SubtitleEntry],
timing_adjustments: List[TimingAdjustment],
drift_info: Dict) -> List[SubtitleEntry]:
"""Apply timing corrections to subtitle entries"""
corrected_entries = []
adjustment_map = {adj.subtitle_index: adj for adj in
timing_adjustments}
for entry in subtitle_entries:
corrected_entry = SubtitleEntry(
index=entry.index,
start_time=entry.start_time,
end_time=entry.end_time,
text=entry.text,
style=entry.style,
position=entry.position
)
# Apply specific adjustment if available
if entry.index in adjustment_map:
adjustment = adjustment_map[entry.index]
corrected_entry.start_time = adjustment.adjusted_start
corrected_entry.end_time = adjustment.adjusted_end
# Apply drift correction if detected
elif drift_info["drift_detected"]:
drift_correction = self._calculate_drift_correction(
entry.start_time, drift_info
)
corrected_entry.start_time += drift_correction
corrected_entry.end_time += drift_correction
corrected_entries.append(corrected_entry)
return corrected_entries
def _calculate_drift_correction(self, timestamp: float, drift_info: Dict) -> float:
"""Calculate drift correction for a specific timestamp"""
if not drift_info["drift_detected"]:
return 0
# Linear interpolation of drift correction
drift_rate = drift_info["drift_rate"]
return drift_rate * timestamp
def _calculate_sync_quality(self, alignments: List[Dict],
adjustments: List[TimingAdjustment]) -> float:
"""Calculate overall synchronization quality"""
if not alignments:
return 0.0
# Factors for quality calculation
alignment_quality = sum(align["confidence"] for align in alignments)
/ len(alignments)
# Adjustment confidence
if adjustments:
adjustment_quality = sum(adj.confidence for adj in adjustments) / len(adjustments)
else:
adjustment_quality = 0.5
# Coverage (percentage of subtitles that could be aligned)
total_subtitles = len(alignments) + len([adj for adj in adjustments
if adj.confidence <
self.confidence_threshold])
coverage = len(alignments) / max(1, total_subtitles)
# Combined quality score
quality = (alignment_quality * 0.4) + (adjustment_quality * 0.4) + (coverage * 0.2)
return min(1.0, quality)
def _calculate_overall_confidence(self, adjustments:
List[TimingAdjustment]) -> float:
"""Calculate overall confidence in synchronization"""
if not adjustments:
return 0.5
confidences = [adj.confidence for adj in adjustments]
return sum(confidences) / len(confidences)
def _generate_subtitle_content(self, entries: List[SubtitleEntry],
format_type: str) -> str:
"""Generate subtitle content in specified format"""
if format_type.lower() == "srt":
return self._generate_srt_content(entries)
elif format_type.lower() == "vtt":
return self._generate_vtt_content(entries)
elif format_type.lower() == "ass":
return self._generate_ass_content(entries)
else:
return self._generate_srt_content(entries) # Default to SRT
def _generate_srt_content(self, entries: List[SubtitleEntry]) -> str:
"""Generate SRT format content"""
content_lines = []
for entry in entries:
# Entry number
content_lines.append(str(entry.index))
# Timing line
start_time = self._format_srt_time(entry.start_time)
end_time = self._format_srt_time(entry.end_time)
content_lines.append(f"{start_time} --> {end_time}")
# Text content
content_lines.append(entry.text)
# Empty line separator
content_lines.append("")
return "\n".join(content_lines)
def _format_srt_time(self, seconds: float) -> str:
"""Format time for SRT format (HH:MM:SS,mmm)"""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
milliseconds = int((seconds % 1) * 1000)
return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"
def _generate_vtt_content(self, entries: List[SubtitleEntry]) -> str:
"""Generate WebVTT format content"""
content_lines = ["WEBVTT", ""]
for entry in entries:
# Timing line
start_time = self._format_vtt_time(entry.start_time)
end_time = self._format_vtt_time(entry.end_time)
timing_line = f"{start_time} --> {end_time}"
# Add position/style if available
if entry.position or entry.style:
settings = []
if entry.position:
settings.extend([f"{k}:{v}" for k, v in
entry.position.items()])
if entry.style:
settings.extend([f"{k}:{v}" for k, v in
entry.style.items()])
timing_line += " " + " ".join(settings)
content_lines.append(timing_line)
content_lines.append(entry.text)
content_lines.append("")
return "\n".join(content_lines)
def _format_vtt_time(self, seconds: float) -> str:
"""Format time for VTT format (HH:MM:SS.mmm)"""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
milliseconds = int((seconds % 1) * 1000)
return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"
def _generate_ass_content(self, entries: List[SubtitleEntry]) -> str:
"""Generate ASS format content"""
# ASS format is more complex, simplified version here
content_lines = [
"[Script Info]",
"Title: Synchronized Subtitles",
"ScriptType: v4.00+",
"",
"[V4+ Styles]",
"Format: Name, Fontname, Fontsize, PrimaryColour,
SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline,
StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow,
Alignment, MarginL, MarginR, MarginV, Encoding",
"Style:
Default,Arial,20,&H00FFFFFF,&H000000FF,&H00000000,&H80000000,0,0,0,0,100,100,
0,0,1,2,0,2,10,10,10,1",
"",
"[Events]",
"Format: Layer, Start, End, Style, Name, MarginL, MarginR,
MarginV, Effect, Text"
]
for entry in entries:
start_time = self._format_ass_time(entry.start_time)
end_time = self._format_ass_time(entry.end_time)
line = f"Dialogue: 0,{start_time},{end_time},Default,,0,0,0,,
{entry.text}"
content_lines.append(line)
return "\n".join(content_lines)
def _format_ass_time(self, seconds: float) -> str:
"""Format time for ASS format (H:MM:SS.cc)"""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
centiseconds = int((seconds % 1) * 100)
return f"{hours}:{minutes:02d}:{secs:02d}.{centiseconds:02d}"
def real_time_adjustment(self, current_time: float, detected_speech: str, current_subtitle: SubtitleEntry) -> Optional[float]:
"""Provide real-time timing adjustment during playback"""
if not detected_speech or not current_subtitle:
return None
# Calculate text similarity
similarity = self._calculate_text_similarity(
self._normalize_text(detected_speech),
self._normalize_text(current_subtitle.text)
)
if similarity < self.similarity_threshold:
return None
# Calculate timing adjustment
subtitle_center = (current_subtitle.start_time +
current_subtitle.end_time) / 2
adjustment = current_time - subtitle_center
# Only suggest adjustment if significant
if abs(adjustment) > 0.5:
return adjustment
return None
# Test the subtitle synchronizer
def test_subtitle_synchronizer():
synchronizer = SubtitleSynchronizer()
# Sample video audio data
video_audio = {
"duration": 7200,
"speech_segments": [
{"start": 10.5, "end": 13.2, "text": "Hello world", "confidence":
0.95},
{"start": 15.8, "end": 18.1, "text": "How are you", "confidence":
0.88},
{"start": 22.3, "end": 25.7, "text": "This is a test",
"confidence": 0.92},
{"start": 30.1, "end": 33.5, "text": "Netflix subtitle sync",
"confidence": 0.89}
]
}
# Sample subtitle file
subtitle_file = {
"format": "SRT",
"entries": [
{"index": 1, "start": 11.0, "end": 13.5, "text": "Hello world"},
{"index": 2, "start": 16.2, "end": 18.5, "text": "How are you"},
{"index": 3, "start": 22.8, "end": 26.0, "text": "This is a
test"},
{"index": 4, "start": 30.5, "end": 34.0, "text": "Netflix
subtitle sync"}
]
}
print("Testing Subtitle Synchronizer:")
print(f"Video duration: {video_audio['duration']} seconds")
print(f"Speech segments: {len(video_audio['speech_segments'])}")
print(f"Subtitle entries: {len(subtitle_file['entries'])}")
# Perform synchronization
result = synchronizer.synchronize_subtitles(video_audio, subtitle_file)
print(f"\nSynchronization Results:")
print(f" Sync quality: {result.sync_quality:.2f}")
print(f" Drift detected: {result.drift_detected}")
print(f" Total time shift: {result.total_time_shift:.2f}s")
print(f" Confidence score: {result.confidence_score:.2f}")
print(f"\nTiming Adjustments: {len(result.timing_adjustments)}")
for adj in result.timing_adjustments:
print(f" Subtitle {adj.subtitle_index}: {adj.time_shift:+.2f}s "
f"(confidence: {adj.confidence:.2f})")
print(f" Original: {adj.original_start:.1f}s -
{adj.original_end:.1f}s")
print(f" Adjusted: {adj.adjusted_start:.1f}s -
{adj.adjusted_end:.1f}s")
print(f"\nSynchronized Content Preview:")
lines = result.synchronized_subtitles.split('\n')
for line in lines[:15]: # Show first 15 lines
print(f" {line}")
# Test real-time adjustment
print(f"\nReal-time Adjustment Test:")
current_subtitle = SubtitleEntry(1, 11.0, 13.5, "Hello world")
adjustment = synchronizer.real_time_adjustment(
current_time=10.8,
detected_speech="Hello world",
current_subtitle=current_subtitle
)
if adjustment:
print(f" Suggested adjustment: {adjustment:+.2f}s")
else:
print(f" No adjustment needed")
test_subtitle_synchronizer()
Key Insights:
• Multi-format subtitle parsing and generation (SRT, VTT, ASS)
• Speech-to-text alignment using text similarity and temporal proximity
• Systematic drift detection using linear regression analysis (see the sketch below)
• Real-time synchronization adjustment during playback
• Quality scoring based on alignment confidence and coverage
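The drift detection above reduces to an ordinary least-squares slope over (playback time, measured offset) pairs: drift_rate = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²). Below is a minimal, self-contained sketch of that calculation, using made-up offset samples rather than output from a real sync run.
Python
# Minimal sketch: least-squares drift estimate over (video_time, offset) samples.
# The sample data is illustrative, not taken from an actual synchronization run.
def estimate_drift(points):
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    denom = n * sum_x2 - sum_x ** 2
    return 0.0 if denom == 0 else (n * sum_xy - sum_x * sum_y) / denom

# Subtitles lag ~0.5s more for every extra 1000s of playback (0.0005 s/s drift)
samples = [(0, 0.2), (1000, 0.7), (2000, 1.2), (3000, 1.7)]
rate = estimate_drift(samples)
print(f"drift rate: {rate:.4f} s/s")            # ~0.0005
print(f"correction at t=2500s: {rate * 2500:.2f}s")
A positive rate means the subtitles fall progressively further behind the audio, so the correction applied at time t is simply rate * t, mirroring _calculate_drift_correction above.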
Problem 30: Content Delivery Network Routing
Difficulty: Hard | Time Limit: 60 minutes | Company: Netflix
Problem Statement:
Design a CDN routing system for Netflix that:
1. Routes user requests to optimal edge servers
2. Handles server failures and load balancing
3. Optimizes for latency, bandwidth, and server capacity
4. Supports dynamic content caching strategies
5. Provides real-time performance monitoring and adjustment
Example:
Plain Text
Input:
user_request = {
"user_id": "u123", "location": {"lat": 37.7749, "lon": -122.4194},
"content_id": "stranger_things_s4e1", "quality": "4K", "device": "TV"
}
edge_servers = [
{"server_id": "sf1", "location": {"lat": 37.7849, "lon": -122.4094},
"load": 0.7, "capacity": 1000, "has_content": True},
{"server_id": "la1", "location": {"lat": 34.0522, "lon": -118.2437},
"load": 0.4, "capacity": 800, "has_content": False}
]
Output: {
"selected_server": "sf1", "routing_score": 0.85,
"latency_estimate": 12, "backup_servers": ["la1"],
"cache_action": "serve", "load_balancing_weight": 0.3
}
Solution Approach:
This problem requires network optimization, load balancing, and distributed systems
design.
Python
import time
import math
import random
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
import heapq
class ServerStatus(Enum):
ACTIVE = "active"
OVERLOADED = "overloaded"
MAINTENANCE = "maintenance"
FAILED = "failed"
class ContentType(Enum):
VIDEO = "video"
AUDIO = "audio"
SUBTITLE = "subtitle"
THUMBNAIL = "thumbnail"
METADATA = "metadata"
class CacheAction(Enum):
SERVE = "serve"
CACHE_AND_SERVE = "cache_and_serve"
REDIRECT = "redirect"
ORIGIN_FETCH = "origin_fetch"
@dataclass
class Location:
latitude: float
longitude: float
def distance_to(self, other: 'Location') -> float:
"""Calculate distance using Haversine formula"""
R = 6371 # Earth's radius in kilometers
lat1, lon1 = math.radians(self.latitude), math.radians(self.longitude)
lat2, lon2 = math.radians(other.latitude), math.radians(other.longitude)
dlat = lat2 - lat1
dlon = lon2 - lon1
a = (math.sin(dlat/2)**2 +
math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2)
c = 2 * math.asin(math.sqrt(a))
return R * c
@dataclass
class EdgeServer:
server_id: str
location: Location
capacity: int # Max concurrent connections
current_load: int
status: ServerStatus
cached_content: Set[str] = field(default_factory=set)
bandwidth_mbps: int = 1000
cpu_usage: float = 0.0
memory_usage: float = 0.0
last_health_check: float = field(default_factory=time.time)
@property
def load_percentage(self) -> float:
return self.current_load / self.capacity if self.capacity > 0 else 1.0
@property
def is_available(self) -> bool:
return (self.status == ServerStatus.ACTIVE and
self.load_percentage < 0.9)
@dataclass
class UserRequest:
user_id: str
content_id: str
location: Location
quality: str
device: str
timestamp: float = field(default_factory=time.time)
priority: int = 1 # 1=normal, 2=premium, 3=urgent
estimated_size_mb: float = 100.0
@dataclass
class RoutingDecision:
selected_server: str
routing_score: float
latency_estimate: float # milliseconds
backup_servers: List[str]
cache_action: CacheAction
load_balancing_weight: float
reasoning: str
@dataclass
class ServerMetrics:
server_id: str
response_time: float
success_rate: float
bandwidth_utilization: float
cache_hit_rate: float
error_count: int
timestamp: float
class CDNRouter:
def __init__(self):
self.edge_servers: Dict[str, EdgeServer] = {}
self.server_metrics: Dict[str, deque] = defaultdict(lambda:
deque(maxlen=100))
self.content_popularity: Dict[str, float] = {}
self.user_preferences: Dict[str, Dict] = {}
# Routing parameters
self.latency_weight = 0.4
self.load_weight = 0.3
self.cache_weight = 0.2
self.reliability_weight = 0.1
# Performance thresholds
self.max_acceptable_latency = 100 # milliseconds
self.overload_threshold = 0.85
self.cache_hit_threshold = 0.7
# Load balancing
self.load_balancing_enabled = True
self.sticky_session_duration = 3600 # 1 hour
self.user_server_affinity: Dict[str, Tuple[str, float]] = {}
def add_edge_server(self, server: EdgeServer):
"""Add an edge server to the CDN"""
self.edge_servers[server.server_id] = server
self.server_metrics[server.server_id] = deque(maxlen=100)
def route_request(self, request: UserRequest) -> RoutingDecision:
"""Route a user request to the optimal edge server"""
# Get available servers
available_servers = [
server for server in self.edge_servers.values()
if server.is_available
]
if not available_servers:
return self._handle_no_servers_available(request)
# Check for sticky session
if self.load_balancing_enabled:
sticky_server = self._check_sticky_session(request.user_id)
if sticky_server and sticky_server in [s.server_id for s in
available_servers]:
server = self.edge_servers[sticky_server]
if server.is_available and server.load_percentage < 0.8:
return self._create_routing_decision(request, server,
"sticky_session")
# Score all available servers
server_scores = []
for server in available_servers:
score = self._calculate_server_score(request, server)
server_scores.append((score, server))
# Sort by score (highest first)
server_scores.sort(key=lambda x: x[0], reverse=True)
# Select best server
best_score, best_server = server_scores[0]
# Get backup servers
backup_servers = [server.server_id for _, server in
server_scores[1:3]]
# Create routing decision
decision = self._create_routing_decision(request, best_server,
"optimal_routing")
decision.backup_servers = backup_servers
decision.routing_score = best_score
# Update server load and user affinity
self._update_server_load(best_server.server_id, request)
self._update_user_affinity(request.user_id, best_server.server_id)
return decision
def _calculate_server_score(self, request: UserRequest, server:
EdgeServer) -> float:
"""Calculate routing score for a server"""
# Latency score (based on geographic distance)
distance = request.location.distance_to(server.location)
latency_score = max(0, 1 - (distance / 1000)) # Normalize by 1000km
# Load score (prefer less loaded servers)
load_score = 1 - server.load_percentage
# Cache score (prefer servers with content cached)
cache_score = 1.0 if request.content_id in server.cached_content else 0.3
# Reliability score (based on historical performance)
reliability_score = self._calculate_reliability_score(server.server_id)
# Weighted combination
total_score = (
latency_score * self.latency_weight +
load_score * self.load_weight +
cache_score * self.cache_weight +
reliability_score * self.reliability_weight
)
# Apply penalties
if server.load_percentage > self.overload_threshold:
total_score *= 0.5
if server.status != ServerStatus.ACTIVE:
total_score *= 0.1
return total_score
def _calculate_reliability_score(self, server_id: str) -> float:
"""Calculate reliability score based on historical metrics"""
metrics = list(self.server_metrics[server_id])
if not metrics:
return 0.8 # Default score for new servers
# Calculate average success rate
recent_metrics = metrics[-10:] # Last 10 measurements
success_rates = [m.success_rate for m in recent_metrics]
avg_success_rate = sum(success_rates) / len(success_rates)
# Calculate response time consistency
response_times = [m.response_time for m in recent_metrics]
if len(response_times) > 1:
avg_response_time = sum(response_times) / len(response_times)
response_time_variance = sum(
(rt - avg_response_time) ** 2 for rt in response_times
) / len(response_times)
consistency_score = max(0, 1 - (response_time_variance / 1000))
else:
consistency_score = 0.8
return (avg_success_rate * 0.7) + (consistency_score * 0.3)
def _create_routing_decision(self, request: UserRequest, server: EdgeServer, reason: str) -> RoutingDecision:
"""Create a routing decision"""
# Estimate latency
distance = request.location.distance_to(server.location)
base_latency = distance * 0.1 # Rough estimate: 0.1ms per km
load_penalty = server.load_percentage * 20  # Up to 20ms penalty for load
latency_estimate = base_latency + load_penalty
# Determine cache action
if request.content_id in server.cached_content:
cache_action = CacheAction.SERVE
elif server.load_percentage < 0.6:
cache_action = CacheAction.CACHE_AND_SERVE
else:
cache_action = CacheAction.REDIRECT
# Calculate load balancing weight
load_balancing_weight = 1 - server.load_percentage
return RoutingDecision(
selected_server=server.server_id,
routing_score=0.0, # Will be set by caller
latency_estimate=latency_estimate,
backup_servers=[],
cache_action=cache_action,
load_balancing_weight=load_balancing_weight,
reasoning=reason
)
def _handle_no_servers_available(self, request: UserRequest) -> RoutingDecision:
"""Handle case when no servers are available"""
# Find least loaded server even if overloaded
if self.edge_servers:
least_loaded = min(
self.edge_servers.values(),
key=lambda s: s.load_percentage
)
decision = self._create_routing_decision(
request, least_loaded, "emergency_routing"
)
decision.routing_score = 0.1 # Low score for emergency routing
return decision
# No servers at all - return error decision
return RoutingDecision(
selected_server="",
routing_score=0.0,
latency_estimate=float('inf'),
backup_servers=[],
cache_action=CacheAction.ORIGIN_FETCH,
load_balancing_weight=0.0,
reasoning="no_servers_available"
)
def _check_sticky_session(self, user_id: str) -> Optional[str]:
"""Check if user has a sticky session to a server"""
if user_id in self.user_server_affinity:
server_id, timestamp = self.user_server_affinity[user_id]
if time.time() - timestamp < self.sticky_session_duration:
return server_id
else:
# Session expired
del self.user_server_affinity[user_id]
return None
def _update_server_load(self, server_id: str, request: UserRequest):
"""Update server load after routing a request"""
if server_id in self.edge_servers:
server = self.edge_servers[server_id]
# Estimate load increase based on request
load_increase = self._estimate_load_increase(request)
server.current_load = min(server.capacity, server.current_load +
load_increase)
def _estimate_load_increase(self, request: UserRequest) -> int:
"""Estimate load increase for a request"""
# Base load
base_load = 1
# Quality multiplier
quality_multipliers = {
"SD": 1,
"HD": 2,
"4K": 4,
"8K": 8
}
quality_multiplier = quality_multipliers.get(request.quality, 1)
# Device multiplier
device_multipliers = {
"Mobile": 0.5,
"Tablet": 0.8,
"TV": 1.0,
"Desktop": 1.2
}
device_multiplier = device_multipliers.get(request.device, 1.0)
return int(base_load * quality_multiplier * device_multiplier)
def _update_user_affinity(self, user_id: str, server_id: str):
"""Update user-server affinity for sticky sessions"""
self.user_server_affinity[user_id] = (server_id, time.time())
def update_server_metrics(self, server_id: str, metrics: ServerMetrics):
"""Update server performance metrics"""
if server_id in self.server_metrics:
self.server_metrics[server_id].append(metrics)
# Update server status based on metrics
self._update_server_status(server_id, metrics)
def _update_server_status(self, server_id: str, metrics: ServerMetrics):
"""Update server status based on performance metrics"""
if server_id not in self.edge_servers:
return
server = self.edge_servers[server_id]
# Check for failure conditions
if metrics.success_rate < 0.5 or metrics.response_time > 5000:
server.status = ServerStatus.FAILED
elif metrics.success_rate < 0.8 or server.load_percentage > 0.9:
server.status = ServerStatus.OVERLOADED
else:
server.status = ServerStatus.ACTIVE
def handle_server_failure(self, failed_server_id: str):
"""Handle server failure by redistributing load"""
if failed_server_id not in self.edge_servers:
return
failed_server = self.edge_servers[failed_server_id]
failed_server.status = ServerStatus.FAILED
# Find users affected by this server
affected_users = [
user_id for user_id, (server_id, _) in
self.user_server_affinity.items()
if server_id == failed_server_id
]
# Clear their affinity to force re-routing
for user_id in affected_users:
del self.user_server_affinity[user_id]
# Redistribute cached content if possible
self._redistribute_cached_content(failed_server_id)
def _redistribute_cached_content(self, failed_server_id: str):
"""Redistribute cached content from failed server"""
if failed_server_id not in self.edge_servers:
return
failed_server = self.edge_servers[failed_server_id]
cached_content = failed_server.cached_content.copy()
# Find servers with available capacity
available_servers = [
server for server in self.edge_servers.values()
if (server.server_id != failed_server_id and
server.is_available and
server.load_percentage < 0.7)
]
# Distribute content based on popularity and server capacity
for content_id in cached_content:
popularity = self.content_popularity.get(content_id, 0.5)
# Find best server for this content
best_server = None
best_score = 0
for server in available_servers:
if content_id not in server.cached_content:
# Score based on capacity and geographic distribution
capacity_score = 1 - server.load_percentage
score = capacity_score * popularity
if score > best_score:
best_score = score
best_server = server
# Cache content on best server
if best_server:
best_server.cached_content.add(content_id)
def optimize_cache_placement(self):
"""Optimize content caching across edge servers"""
# Analyze content popularity
self._analyze_content_popularity()
# For each server, optimize its cache
for server in self.edge_servers.values():
if server.status == ServerStatus.ACTIVE:
self._optimize_server_cache(server)
def _analyze_content_popularity(self):
"""Analyze content popularity from server metrics"""
content_requests = defaultdict(int)
# Count requests from server metrics
for server_id, metrics_list in self.server_metrics.items():
for metrics in metrics_list:
# In real implementation, would track per-content metrics
# For simulation, use cache hit rate as popularity indicator
if metrics.cache_hit_rate > 0.5:
content_requests[f"popular_content_{server_id}"] += 1
# Calculate popularity scores
total_requests = sum(content_requests.values())
if total_requests > 0:
for content_id, request_count in content_requests.items():
self.content_popularity[content_id] = request_count / total_requests
def _optimize_server_cache(self, server: EdgeServer):
"""Optimize cache for a specific server"""
# Simple LRU-based optimization
# In real implementation, would use more sophisticated algorithms
max_cache_size = 1000 # Maximum cached items per server
if len(server.cached_content) > max_cache_size:
# Remove least popular content
content_by_popularity = sorted(
server.cached_content,
key=lambda c: self.content_popularity.get(c, 0)
)
# Keep most popular content
server.cached_content = set(content_by_popularity[-max_cache_size:])
def get_server_health_summary(self) -> Dict[str, Dict]:
"""Get health summary for all servers"""
summary = {}
for server_id, server in self.edge_servers.items():
recent_metrics = list(self.server_metrics[server_id])[-5:]
if recent_metrics:
avg_response_time = sum(m.response_time for m in
recent_metrics) / len(recent_metrics)
avg_success_rate = sum(m.success_rate for m in
recent_metrics) / len(recent_metrics)
avg_cache_hit_rate = sum(m.cache_hit_rate for m in
recent_metrics) / len(recent_metrics)
else:
avg_response_time = 0
avg_success_rate = 1.0
avg_cache_hit_rate = 0.5
summary[server_id] = {
"status": server.status.value,
"load_percentage": server.load_percentage,
"cached_items": len(server.cached_content),
"avg_response_time": avg_response_time,
"avg_success_rate": avg_success_rate,
"avg_cache_hit_rate": avg_cache_hit_rate,
"location": f"{server.location.latitude:.2f},
{server.location.longitude:.2f}"
}
return summary
def simulate_load_balancing(self, requests: List[UserRequest]) -> Dict:
"""Simulate load balancing for multiple requests"""
routing_decisions = []
server_loads = defaultdict(int)
for request in requests:
decision = self.route_request(request)
routing_decisions.append(decision)
if decision.selected_server:
server_loads[decision.selected_server] += 1
# Calculate load distribution
total_requests = len(requests)
load_distribution = {
server_id: count / total_requests
for server_id, count in server_loads.items()
}
# Calculate average latency
avg_latency = sum(d.latency_estimate for d in routing_decisions) / len(routing_decisions)
# Calculate cache hit rate
cache_hits = sum(1 for d in routing_decisions if d.cache_action ==
CacheAction.SERVE)
cache_hit_rate = cache_hits / total_requests
return {
"total_requests": total_requests,
"load_distribution": load_distribution,
"avg_latency": avg_latency,
"cache_hit_rate": cache_hit_rate,
"routing_decisions": routing_decisions
}
# Test the CDN router
def test_cdn_router():
router = CDNRouter()
# Add edge servers
servers = [
EdgeServer("sf1", Location(37.7749, -122.4194), 1000, 700,
ServerStatus.ACTIVE,
{"stranger_things_s4e1", "the_crown_s1e1"}, 1000),
EdgeServer("la1", Location(34.0522, -118.2437), 800, 320,
ServerStatus.ACTIVE,
{"dark_s1e1", "bridgerton_s1e1"}, 800),
EdgeServer("ny1", Location(40.7128, -74.0060), 1200, 480,
ServerStatus.ACTIVE,
{"stranger_things_s4e1", "money_heist_s1e1"}, 1200),
EdgeServer("chi1", Location(41.8781, -87.6298), 600, 540,
ServerStatus.OVERLOADED,
{"the_crown_s1e1"}, 600)
]
for server in servers:
router.add_edge_server(server)
# Add some server metrics
for server in servers:
metrics = ServerMetrics(
server_id=server.server_id,
response_time=random.uniform(50, 200),
success_rate=random.uniform(0.85, 0.99),
bandwidth_utilization=random.uniform(0.3, 0.8),
cache_hit_rate=random.uniform(0.6, 0.9),
error_count=random.randint(0, 5),
timestamp=time.time()
)
router.update_server_metrics(server.server_id, metrics)
print("Testing CDN Router:")
print(f"Edge servers: {len(router.edge_servers)}")
# Test single request routing
request = UserRequest(
user_id="u123",
content_id="stranger_things_s4e1",
location=Location(37.7849, -122.4094), # Near San Francisco
quality="4K",
device="TV"
)
print(f"\nSingle Request Routing:")
print(f" User location: {request.location.latitude:.2f},
{request.location.longitude:.2f}")
print(f" Content: {request.content_id}")
print(f" Quality: {request.quality}")
decision = router.route_request(request)
print(f"\nRouting Decision:")
print(f" Selected server: {decision.selected_server}")
print(f" Routing score: {decision.routing_score:.3f}")
print(f" Latency estimate: {decision.latency_estimate:.1f}ms")
print(f" Cache action: {decision.cache_action.value}")
print(f" Backup servers: {decision.backup_servers}")
print(f" Reasoning: {decision.reasoning}")
# Test load balancing simulation
print(f"\nLoad Balancing Simulation:")
test_requests = []
locations = [
Location(37.7749, -122.4194), # San Francisco
Location(34.0522, -118.2437), # Los Angeles
Location(40.7128, -74.0060), # New York
Location(41.8781, -87.6298), # Chicago
]
for i in range(20):
test_request = UserRequest(
user_id=f"u{i}",
content_id=random.choice(["stranger_things_s4e1",
"the_crown_s1e1", "dark_s1e1"]),
location=random.choice(locations),
quality=random.choice(["HD", "4K"]),
device=random.choice(["TV", "Mobile", "Tablet"])
)
test_requests.append(test_request)
simulation_result = router.simulate_load_balancing(test_requests)
print(f" Total requests: {simulation_result['total_requests']}")
print(f" Average latency: {simulation_result['avg_latency']:.1f}ms")
print(f" Cache hit rate: {simulation_result['cache_hit_rate']:.1%}")
print(f"\nLoad Distribution:")
for server_id, load_pct in simulation_result['load_distribution'].items():
print(f" {server_id}: {load_pct:.1%}")
# Test server health summary
print(f"\nServer Health Summary:")
health_summary = router.get_server_health_summary()
for server_id, health in health_summary.items():
print(f" {server_id}:")
print(f" Status: {health['status']}")
print(f" Load: {health['load_percentage']:.1%}")
print(f" Cached items: {health['cached_items']}")
print(f" Avg response time: {health['avg_response_time']:.1f}ms")
print(f" Success rate: {health['avg_success_rate']:.1%}")
# Test server failure handling
print(f"\nTesting Server Failure:")
print(f" Simulating failure of server: sf1")
router.handle_server_failure("sf1")
# Route the same request again
new_decision = router.route_request(request)
print(f" New selected server: {new_decision.selected_server}")
print(f" New routing score: {new_decision.routing_score:.3f}")
test_cdn_router()
Key Insights:
• Geographic proximity-based routing with latency optimization
• Multi-factor server scoring considering load, cache status, and reliability (see the sketch below)
• Intelligent load balancing with sticky sessions and failure handling
• Dynamic cache optimization based on content popularity analysis
• Real-time performance monitoring and adaptive routing decisions
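At its core, _calculate_server_score is a weighted sum of normalized factors. Below is a minimal standalone sketch of that idea; it reuses the same weights as the router above, but the candidate servers, distances, and reliabilities are made-up inputs, not measurements from any real CDN.
Python
# Minimal sketch: weighted multi-factor scoring of candidate edge servers.
# Weights mirror the CDNRouter defaults above; the candidate data is illustrative.
WEIGHTS = {"latency": 0.4, "load": 0.3, "cache": 0.2, "reliability": 0.1}

def score_server(distance_km, load_pct, has_content, reliability):
    factors = {
        "latency": max(0.0, 1 - distance_km / 1000),  # closer is better
        "load": 1 - load_pct,                         # spare capacity is better
        "cache": 1.0 if has_content else 0.3,         # cache hit avoids origin fetch
        "reliability": reliability,                   # historical success rate
    }
    return sum(factors[k] * WEIGHTS[k] for k in WEIGHTS)

candidates = {
    "sf1": score_server(distance_km=1.4, load_pct=0.70, has_content=True, reliability=0.95),
    "la1": score_server(distance_km=559, load_pct=0.40, has_content=False, reliability=0.90),
}
print(max(candidates, key=candidates.get), candidates)
In this toy input the nearby, content-holding server wins despite its higher load, which is exactly the trade-off the weights encode.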
Problem 31: Ride Matching Algorithm
Difficulty: Hard | Time Limit: 60 minutes | Company: Uber
Problem Statement:
Design a ride matching algorithm for Uber that:
1. Matches riders with nearby drivers efficiently
2. Optimizes for pickup time, driver utilization, and rider satisfaction
3. Handles surge pricing and demand prediction
4. Supports different ride types (UberX, UberPool, UberXL)
5. Provides real-time matching with sub-second response times
Example:
Plain Text
Input:
ride_request = {
"rider_id": "r123", "pickup": {"lat": 37.7749, "lon": -122.4194},
"destination": {"lat": 37.7849, "lon": -122.4094}, "ride_type": "UberX",
"max_wait_time": 300, "surge_acceptance": 2.0
}
available_drivers = [
{"driver_id": "d1", "location": {"lat": 37.7739, "lon": -122.4184},
"vehicle_type": "UberX", "rating": 4.8, "eta": 120},
{"driver_id": "d2", "location": {"lat": 37.7759, "lon": -122.4204},
"vehicle_type": "UberX", "rating": 4.6, "eta": 180}
]
Output: {
"matched_driver": "d1", "pickup_eta": 120, "match_score": 0.92,
"surge_multiplier": 1.5, "estimated_fare": 12.50, "confidence": 0.88
}
Solution Approach:
This problem requires geospatial algorithms, optimization, and real-time matching systems.
Python
import time
import math
import random
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
import heapq
class RideType(Enum):
UBER_X = "UberX"
UBER_XL = "UberXL"
UBER_POOL = "UberPool"
UBER_BLACK = "UberBlack"
UBER_COMFORT = "UberComfort"
class DriverStatus(Enum):
AVAILABLE = "available"
BUSY = "busy"
OFFLINE = "offline"
EN_ROUTE = "en_route"
class MatchStatus(Enum):
PENDING = "pending"
MATCHED = "matched"
CANCELLED = "cancelled"
COMPLETED = "completed"
@dataclass
class Location:
latitude: float
longitude: float
def distance_to(self, other: 'Location') -> float:
"""Calculate distance using Haversine formula"""
R = 6371000 # Earth's radius in meters
lat1, lon1 = math.radians(self.latitude), math.radians(self.longitude)
lat2, lon2 = math.radians(other.latitude), math.radians(other.longitude)
dlat = lat2 - lat1
dlon = lon2 - lon1
a = (math.sin(dlat/2)**2 +
math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2)
c = 2 * math.asin(math.sqrt(a))
return R * c
@dataclass
class Driver:
driver_id: str
location: Location
vehicle_type: RideType
rating: float
status: DriverStatus
current_ride_id: Optional[str] = None
last_ride_end_time: float = 0
total_rides: int = 0
acceptance_rate: float = 0.9
cancellation_rate: float = 0.05
def estimated_arrival_time(self, pickup_location: Location) -> float:
"""Estimate arrival time to pickup location"""
distance = self.location.distance_to(pickup_location)
# Assume average speed of 25 km/h in city traffic
travel_time = (distance / 1000) / 25 * 3600 # seconds
return travel_time
@dataclass
class RideRequest:
rider_id: str
pickup_location: Location
destination_location: Location
ride_type: RideType
max_wait_time: int # seconds
surge_acceptance: float # maximum surge multiplier accepted
timestamp: float = field(default_factory=time.time)
priority: int = 1 # 1=normal, 2=premium, 3=urgent
@dataclass
class MatchResult:
matched_driver: str
pickup_eta: float # seconds
match_score: float
surge_multiplier: float
estimated_fare: float
confidence: float
reasoning: str
class RideMatchingEngine:
def __init__(self):
self.drivers: Dict[str, Driver] = {}
self.pending_requests: Dict[str, RideRequest] = {}
self.active_matches: Dict[str, str] = {} # rider_id -> driver_id
# Matching parameters
self.max_search_radius = 5000 # 5km
self.max_drivers_to_consider = 10
self.eta_weight = 0.4
self.rating_weight = 0.2
self.distance_weight = 0.3
self.utilization_weight = 0.1
# Surge pricing
self.base_surge_multiplier = 1.0
self.demand_supply_ratios: Dict[str, float] = {}
self.surge_zones: Dict[str, float] = {}
# Performance tracking
self.match_times: deque = deque(maxlen=1000)
self.success_rates: deque = deque(maxlen=100)
def add_driver(self, driver: Driver):
"""Add a driver to the system"""
self.drivers[driver.driver_id] = driver
def update_driver_location(self, driver_id: str, location: Location):
"""Update driver location"""
if driver_id in self.drivers:
self.drivers[driver_id].location = location
def set_driver_status(self, driver_id: str, status: DriverStatus):
"""Update driver status"""
if driver_id in self.drivers:
self.drivers[driver_id].status = status
def request_ride(self, request: RideRequest) -> MatchResult:
"""Process a ride request and find the best match"""
start_time = time.time()
# Find available drivers
available_drivers = self._find_available_drivers(
request.pickup_location,
request.ride_type
)
if not available_drivers:
return self._create_no_match_result("No available drivers")
# Calculate surge multiplier
surge_multiplier = self._calculate_surge_multiplier(request.pickup_location)
if surge_multiplier > request.surge_acceptance:
return self._create_no_match_result(f"Surge too high: {surge_multiplier:.1f}x")
# Score and rank drivers
driver_scores = []
for driver in available_drivers:
score = self._calculate_driver_score(request, driver)
driver_scores.append((score, driver))
# Sort by score (highest first)
driver_scores.sort(key=lambda x: x[0], reverse=True)
# Select best driver
best_score, best_driver = driver_scores[0]
# Create match result
pickup_eta = best_driver.estimated_arrival_time(request.pickup_location)
estimated_fare = self._calculate_estimated_fare(request, surge_multiplier)
match_result = MatchResult(
matched_driver=best_driver.driver_id,
pickup_eta=pickup_eta,
match_score=best_score,
surge_multiplier=surge_multiplier,
estimated_fare=estimated_fare,
confidence=self._calculate_match_confidence(best_score,
pickup_eta),
reasoning="optimal_match"
)
# Update system state
self._execute_match(request, best_driver)
# Track performance
match_time = time.time() - start_time
self.match_times.append(match_time)
return match_result
def _find_available_drivers(self, pickup_location: Location,
ride_type: RideType) -> List[Driver]:
"""Find available drivers within search radius"""
available_drivers = []
for driver in self.drivers.values():
if (driver.status == DriverStatus.AVAILABLE and
driver.vehicle_type == ride_type):
distance = driver.location.distance_to(pickup_location)
if distance <= self.max_search_radius:
available_drivers.append(driver)
# Sort by distance and limit results
available_drivers.sort(
key=lambda d: d.location.distance_to(pickup_location)
)
return available_drivers[:self.max_drivers_to_consider]
def _calculate_driver_score(self, request: RideRequest, driver: Driver) -> float:
"""Calculate matching score for a driver"""
# ETA score (lower is better)
eta = driver.estimated_arrival_time(request.pickup_location)
eta_score = max(0, 1 - (eta / request.max_wait_time))
# Rating score
rating_score = (driver.rating - 3.0) / 2.0 # Normalize 3-5 to 0-1
# Distance score
distance = driver.location.distance_to(request.pickup_location)
distance_score = max(0, 1 - (distance / self.max_search_radius))
# Driver utilization score (prefer active drivers)
time_since_last_ride = time.time() - driver.last_ride_end_time
utilization_score = min(1.0, time_since_last_ride / 3600)  # 1 hour max
# Weighted combination
total_score = (
eta_score * self.eta_weight +
rating_score * self.rating_weight +
distance_score * self.distance_weight +
utilization_score * self.utilization_weight
)
# Apply penalties
if driver.acceptance_rate < 0.8:
total_score *= 0.9
if driver.cancellation_rate > 0.1:
total_score *= 0.8
return total_score
def _calculate_surge_multiplier(self, location: Location) -> float:
"""Calculate surge pricing multiplier for location"""
# Simplified surge calculation
# In real system, would use complex demand/supply analysis
# Get zone identifier (simplified grid)
zone_id = f"{int(location.latitude * 100)}_{int(location.longitude *
100)}"
# Calculate demand/supply ratio
demand_supply_ratio = self.demand_supply_ratios.get(zone_id, 1.0)
# Base surge calculation
if demand_supply_ratio > 2.0:
surge = 2.5
elif demand_supply_ratio > 1.5:
surge = 2.0
elif demand_supply_ratio > 1.2:
surge = 1.5
else:
surge = 1.0
# Add time-based surge (rush hours)
current_hour = int((time.time() % 86400) / 3600)
if current_hour in [7, 8, 17, 18, 19]: # Rush hours
surge *= 1.2
# Weekend surge
day_of_week = int((time.time() / 86400) % 7)
if day_of_week in [5, 6]: # Friday, Saturday
if current_hour >= 20: # Evening
surge *= 1.3
return round(surge, 1)
def _calculate_estimated_fare(self, request: RideRequest,
surge_multiplier: float) -> float:
"""Calculate estimated fare for the ride"""
distance = request.pickup_location.distance_to(request.destination_location)
distance_km = distance / 1000
# Base fare structure
base_fare = 2.50
per_km_rate = 1.20
per_minute_rate = 0.25
# Estimate travel time (assuming 30 km/h average)
estimated_time_minutes = (distance_km / 30) * 60
# Calculate base fare
fare = base_fare + (distance_km * per_km_rate) + (estimated_time_minutes * per_minute_rate)
# Apply surge multiplier
fare *= surge_multiplier
# Ride type multiplier
type_multipliers = {
RideType.UBER_X: 1.0,
RideType.UBER_XL: 1.5,
RideType.UBER_POOL: 0.7,
RideType.UBER_BLACK: 2.0,
RideType.UBER_COMFORT: 1.3
}
fare *= type_multipliers.get(request.ride_type, 1.0)
return round(fare, 2)
def _calculate_match_confidence(self, match_score: float, eta: float) -> float:
"""Calculate confidence in the match"""
# Base confidence from match score
confidence = match_score
# Reduce confidence for long ETAs
if eta > 600:  # 10 minutes (check the longer threshold first)
confidence *= 0.6
elif eta > 300:  # 5 minutes
confidence *= 0.8
# Historical success rate factor
if self.success_rates:
avg_success_rate = sum(self.success_rates) / len(self.success_rates)
confidence *= avg_success_rate
return min(1.0, confidence)
def _execute_match(self, request: RideRequest, driver: Driver):
"""Execute the match by updating system state"""
# Update driver status
driver.status = DriverStatus.EN_ROUTE
driver.current_ride_id = request.rider_id
# Store active match
self.active_matches[request.rider_id] = driver.driver_id
# Remove from pending requests
if request.rider_id in self.pending_requests:
del self.pending_requests[request.rider_id]
def _create_no_match_result(self, reason: str) -> MatchResult:
"""Create a result for when no match is found"""
return MatchResult(
matched_driver="",
pickup_eta=float('inf'),
match_score=0.0,
surge_multiplier=self._calculate_surge_multiplier(Location(0,
0)),
estimated_fare=0.0,
confidence=0.0,
reasoning=reason
)
def complete_ride(self, rider_id: str):
"""Mark a ride as completed"""
if rider_id in self.active_matches:
driver_id = self.active_matches[rider_id]
if driver_id in self.drivers:
driver = self.drivers[driver_id]
driver.status = DriverStatus.AVAILABLE
driver.current_ride_id = None
driver.last_ride_end_time = time.time()
driver.total_rides += 1
del self.active_matches[rider_id]
self.success_rates.append(1.0)
def cancel_ride(self, rider_id: str):
"""Cancel a ride"""
if rider_id in self.active_matches:
driver_id = self.active_matches[rider_id]
if driver_id in self.drivers:
driver = self.drivers[driver_id]
driver.status = DriverStatus.AVAILABLE
driver.current_ride_id = None
driver.cancellation_rate = min(1.0, driver.cancellation_rate
+ 0.01)
del self.active_matches[rider_id]
self.success_rates.append(0.0)
def update_demand_supply(self, location: Location, demand: int, supply:
int):
"""Update demand/supply ratio for surge pricing"""
zone_id = f"{int(location.latitude * 100)}_{int(location.longitude *
100)}"
if supply > 0:
ratio = demand / supply
else:
ratio = float('inf') if demand > 0 else 1.0
self.demand_supply_ratios[zone_id] = ratio
def get_system_metrics(self) -> Dict:
"""Get system performance metrics"""
if not self.match_times:
return {}
avg_match_time = sum(self.match_times) / len(self.match_times)
success_rate = sum(self.success_rates) / len(self.success_rates) if self.success_rates else 0
active_drivers = sum(1 for d in self.drivers.values()
if d.status == DriverStatus.AVAILABLE)
busy_drivers = sum(1 for d in self.drivers.values()
if d.status in [DriverStatus.BUSY,
DriverStatus.EN_ROUTE])
return {
"avg_match_time_ms": avg_match_time * 1000,
"success_rate": success_rate,
"active_drivers": active_drivers,
"busy_drivers": busy_drivers,
"active_rides": len(self.active_matches),
"pending_requests": len(self.pending_requests),
"total_drivers": len(self.drivers)
}
def optimize_driver_positioning(self) -> Dict[str, Location]:
"""Suggest optimal positioning for idle drivers"""
suggestions = {}
# Find high-demand, low-supply areas
high_demand_zones = [
zone_id for zone_id, ratio in self.demand_supply_ratios.items()
if ratio > 1.5
]
# Get idle drivers
idle_drivers = [
driver for driver in self.drivers.values()
if driver.status == DriverStatus.AVAILABLE and
time.time() - driver.last_ride_end_time > 300  # Idle for 5+ minutes
]
# Suggest repositioning
for driver in idle_drivers[:len(high_demand_zones)]:
if high_demand_zones:
zone_id = high_demand_zones.pop(0)
lat_str, lon_str = zone_id.split('_')
suggested_location = Location(
float(lat_str) / 100,
float(lon_str) / 100
)
suggestions[driver.driver_id] = suggested_location
return suggestions
# Test the ride matching engine
def test_ride_matching():
engine = RideMatchingEngine()
# Add sample drivers
drivers = [
Driver("d1", Location(37.7739, -122.4184), RideType.UBER_X, 4.8,
DriverStatus.AVAILABLE),
Driver("d2", Location(37.7759, -122.4204), RideType.UBER_X, 4.6,
DriverStatus.AVAILABLE),
Driver("d3", Location(37.7769, -122.4214), RideType.UBER_XL, 4.9,
DriverStatus.AVAILABLE),
Driver("d4", Location(37.7729, -122.4174), RideType.UBER_X, 4.7,
DriverStatus.BUSY),
Driver("d5", Location(37.7779, -122.4224), RideType.UBER_POOL, 4.5,
DriverStatus.AVAILABLE)
]
for driver in drivers:
engine.add_driver(driver)
print("Testing Ride Matching Engine:")
print(f"Total drivers: {len(drivers)}")
# Test ride request
request = RideRequest(
rider_id="r123",
pickup_location=Location(37.7749, -122.4194),
destination_location=Location(37.7849, -122.4094),
ride_type=RideType.UBER_X,
max_wait_time=300,
surge_acceptance=2.0
)
print(f"\nRide Request:")
print(f" Pickup: {request.pickup_location.latitude:.4f},
{request.pickup_location.longitude:.4f}")
print(f" Destination: {request.destination_location.latitude:.4f},
{request.destination_location.longitude:.4f}")
print(f" Ride type: {request.ride_type.value}")
print(f" Max wait time: {request.max_wait_time}s")
# Process request
result = engine.request_ride(request)
print(f"\nMatch Result:")
print(f" Matched driver: {result.matched_driver}")
print(f" Pickup ETA: {result.pickup_eta:.0f}s
({result.pickup_eta/60:.1f} min)")
print(f" Match score: {result.match_score:.3f}")
print(f" Surge multiplier: {result.surge_multiplier:.1f}x")
print(f" Estimated fare: ${result.estimated_fare:.2f}")
print(f" Confidence: {result.confidence:.2f}")
print(f" Reasoning: {result.reasoning}")
# Test system metrics
metrics = engine.get_system_metrics()
print(f"\nSystem Metrics:")
for key, value in metrics.items():
if isinstance(value, float):
print(f" {key}: {value:.2f}")
else:
print(f" {key}: {value}")
# Test demand/supply update
engine.update_demand_supply(Location(37.7749, -122.4194), demand=10,
supply=3)
# Test another request with surge
request2 = RideRequest(
rider_id="r124",
pickup_location=Location(37.7749, -122.4194),
destination_location=Location(37.7649, -122.4294),
ride_type=RideType.UBER_X,
max_wait_time=600,
surge_acceptance=3.0
)
result2 = engine.request_ride(request2)
print(f"\nSecond Request (High Demand Area):")
print(f" Surge multiplier: {result2.surge_multiplier:.1f}x")
print(f" Estimated fare: ${result2.estimated_fare:.2f}")
# Test driver positioning optimization
positioning_suggestions = engine.optimize_driver_positioning()
print(f"\nDriver Positioning Suggestions:
{len(positioning_suggestions)}")
for driver_id, location in positioning_suggestions.items():
print(f" {driver_id}: Move to {location.latitude:.4f},
{location.longitude:.4f}")
# Complete the first ride
engine.complete_ride("r123")
print(f"\nRide completed for r123")
# Check updated metrics
updated_metrics = engine.get_system_metrics()
print(f"Updated active rides: {updated_metrics['active_rides']}")
test_ride_matching()
Key Insights:
• Geospatial matching using Haversine distance calculation (see the sketch below)
• Multi-factor driver scoring considering ETA, rating, distance, and utilization
• Dynamic surge pricing based on demand/supply ratios and time factors
• Real-time system state management with driver status tracking
• Performance optimization through driver positioning suggestions
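The pickup ETA that feeds the match score is nothing more than great-circle (Haversine) distance divided by an assumed average city speed, 25 km/h in the engine above. Below is a minimal sketch of that estimate; the two San Francisco coordinates are made up for illustration.
Python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two (lat, lon) points
    r = 6371000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative driver and pickup positions roughly 1 km apart
dist = haversine_m(37.7749, -122.4194, 37.7810, -122.4300)
eta_seconds = (dist / 1000) / 25 * 3600  # assume 25 km/h average city speed
print(f"distance: {dist:.0f} m, ETA: {eta_seconds:.0f} s")
Swapping in a road-routing service for the straight-line Haversine estimate is the obvious production refinement; the scoring logic on top stays the same.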
Problem 32: Dynamic Pricing Algorithm
Difficulty: Medium | Time Limit: 45 minutes | Company: Uber
Problem Statement:
Design a dynamic pricing algorithm that:
1. Calculates surge pricing based on real-time demand and supply
2. Predicts demand patterns using historical data
3. Optimizes driver incentives and rider retention
4. Handles special events and weather conditions
5. Provides transparent pricing explanations to users
Example:
Plain Text
Input:
current_conditions = {
"location": {"lat": 37.7749, "lon": -122.4194}, "time": 1640995200,
"weather": "rain", "event": "concert_end", "base_demand": 50,
"available_drivers": 15
}
historical_data = [
{"time": 1640991600, "demand": 45, "supply": 20, "surge": 1.2},
{"time": 1640988000, "demand": 60, "supply": 12, "surge": 1.8}
]
Output: {
"surge_multiplier": 2.1, "predicted_demand": 75, "driver_incentive": 1.5,
"price_explanation": "High demand due to concert and rain", "confidence":
0.85
}
Solution Approach:
This problem requires time series analysis, machine learning prediction, and economic
optimization.
Python
import time
import math
import random
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
import statistics
class WeatherCondition(Enum):
CLEAR = "clear"
RAIN = "rain"
SNOW = "snow"
FOG = "fog"
STORM = "storm"
class EventType(Enum):
NORMAL = "normal"
CONCERT_START = "concert_start"
CONCERT_END = "concert_end"
SPORTS_GAME = "sports_game"
AIRPORT_RUSH = "airport_rush"
CONFERENCE = "conference"
@dataclass
class MarketConditions:
location: Tuple[float, float] # lat, lon
timestamp: float
weather: WeatherCondition
event: EventType
base_demand: int
available_drivers: int
temperature: float = 20.0
visibility_km: float = 10.0
@dataclass
class HistoricalDataPoint:
timestamp: float
demand: int
supply: int
surge_multiplier: float
weather: WeatherCondition
event: EventType
completed_rides: int = 0
cancelled_rides: int = 0
@dataclass
class PricingResult:
surge_multiplier: float
predicted_demand: int
driver_incentive: float
price_explanation: str
confidence: float
factors: Dict[str, float]
class DynamicPricingEngine:
def __init__(self):
self.historical_data: List[HistoricalDataPoint] = []
self.surge_zones: Dict[str, float] = {}
self.demand_predictions: Dict[str, deque] = defaultdict(lambda:
deque(maxlen=24))
# Pricing parameters
self.base_surge = 1.0
self.max_surge = 5.0
self.min_surge = 0.8
# Demand/supply thresholds
self.balanced_ratio = 1.2 # demand/supply ratio for 1.0x surge
self.high_demand_ratio = 2.0 # ratio for significant surge
# Weather impact factors
self.weather_multipliers = {
WeatherCondition.CLEAR: 1.0,
WeatherCondition.RAIN: 1.4,
WeatherCondition.SNOW: 1.8,
WeatherCondition.FOG: 1.2,
WeatherCondition.STORM: 2.0
}
# Event impact factors
self.event_multipliers = {
EventType.NORMAL: 1.0,
EventType.CONCERT_START: 1.3,
EventType.CONCERT_END: 2.2,
EventType.SPORTS_GAME: 1.8,
EventType.AIRPORT_RUSH: 1.5,
EventType.CONFERENCE: 1.2
}
# Time-based patterns
self.hourly_patterns = self._initialize_hourly_patterns()
self.weekly_patterns = self._initialize_weekly_patterns()
def _initialize_hourly_patterns(self) -> Dict[int, float]:
"""Initialize hourly demand patterns"""
# Typical urban demand pattern
patterns = {}
for hour in range(24):
if 6 <= hour <= 9: # Morning rush
patterns[hour] = 1.5
elif 17 <= hour <= 20: # Evening rush
patterns[hour] = 1.8
elif hour >= 21 or hour <= 2:  # Night entertainment (wraps past midnight)
patterns[hour] = 1.3
elif 3 <= hour <= 5: # Late night/early morning
patterns[hour] = 0.3
else: # Regular hours
patterns[hour] = 1.0
return patterns
def _initialize_weekly_patterns(self) -> Dict[int, float]:
"""Initialize weekly demand patterns"""
# 0=Monday, 6=Sunday
return {
0: 1.0, # Monday
1: 1.0, # Tuesday
2: 1.0, # Wednesday
3: 1.1, # Thursday
4: 1.3, # Friday
5: 1.5, # Saturday
6: 1.2 # Sunday
}
def calculate_surge_pricing(self, conditions: MarketConditions) -> PricingResult:
"""Calculate surge pricing based on current conditions"""
# Predict demand
predicted_demand = self._predict_demand(conditions)
# Calculate base surge from demand/supply ratio
demand_supply_ratio = predicted_demand / max(1,
conditions.available_drivers)
base_surge = self._calculate_base_surge(demand_supply_ratio)
# Apply modifying factors
factors = {}
# Weather factor
weather_factor = self.weather_multipliers.get(conditions.weather,
1.0)
factors["weather"] = weather_factor
# Event factor
event_factor = self.event_multipliers.get(conditions.event, 1.0)
factors["event"] = event_factor
# Time-based factors
hour = int((conditions.timestamp % 86400) / 3600)
day_of_week = int((conditions.timestamp / 86400) % 7)
time_factor = (self.hourly_patterns.get(hour, 1.0) *
self.weekly_patterns.get(day_of_week, 1.0))
factors["time"] = time_factor
# Temperature factor (extreme temperatures increase demand)
temp_factor = self._calculate_temperature_factor(conditions.temperature)
factors["temperature"] = temp_factor
# Visibility factor (low visibility increases demand)
visibility_factor = max(1.0, 2.0 - (conditions.visibility_km / 10))
factors["visibility"] = visibility_factor
# Combine all factors
combined_multiplier = (base_surge * weather_factor * event_factor *
time_factor * temp_factor * visibility_factor)
# Apply bounds
surge_multiplier = max(self.min_surge, min(self.max_surge,
combined_multiplier))
# Calculate driver incentive
driver_incentive = self._calculate_driver_incentive(
surge_multiplier, conditions.available_drivers, predicted_demand
)
# Generate explanation
explanation = self._generate_price_explanation(conditions, factors,
surge_multiplier)
# Calculate confidence
confidence = self._calculate_confidence(conditions, factors)
return PricingResult(
surge_multiplier=round(surge_multiplier, 1),
predicted_demand=predicted_demand,
driver_incentive=driver_incentive,
price_explanation=explanation,
confidence=confidence,
factors=factors
)
def _predict_demand(self, conditions: MarketConditions) -> int:
"""Predict demand based on conditions and historical data"""
base_prediction = conditions.base_demand
# Apply historical patterns
similar_conditions = self._find_similar_historical_conditions(conditions)
if similar_conditions:
# Weight recent data more heavily
weights = [1.0 / (i + 1) for i in range(len(similar_conditions))]
weighted_demands = [data.demand * weight
for data, weight in zip(similar_conditions,
weights)]
historical_factor = sum(weighted_demands) / sum(weights)
# Blend with base prediction
predicted_demand = int(base_prediction * 0.3 + historical_factor
* 0.7)
else:
predicted_demand = base_prediction
# Apply time-based adjustments
hour = int((conditions.timestamp % 86400) / 3600)
day_of_week = int((conditions.timestamp / 86400) % 7)
time_adjustment = (self.hourly_patterns.get(hour, 1.0) *
self.weekly_patterns.get(day_of_week, 1.0))
predicted_demand = int(predicted_demand * time_adjustment)
# Apply weather and event adjustments
weather_adjustment = self.weather_multipliers.get(conditions.weather,
1.0)
event_adjustment = self.event_multipliers.get(conditions.event, 1.0)
predicted_demand = int(predicted_demand * weather_adjustment *
event_adjustment)
return max(1, predicted_demand)
def _find_similar_historical_conditions(self, conditions:
MarketConditions) -> List[HistoricalDataPoint]:
"""Find historical data points with similar conditions"""
similar_data = []
current_hour = int((conditions.timestamp % 86400) / 3600)
current_day = int((conditions.timestamp / 86400) % 7)
for data_point in self.historical_data:
data_hour = int((data_point.timestamp % 86400) / 3600)
data_day = int((data_point.timestamp / 86400) % 7)
# Check similarity criteria
hour_match = abs(data_hour - current_hour) <= 1
day_match = data_day == current_day
weather_match = data_point.weather == conditions.weather
event_match = data_point.event == conditions.event
# Score similarity
similarity_score = 0
if hour_match: similarity_score += 2
if day_match: similarity_score += 2
if weather_match: similarity_score += 1
if event_match: similarity_score += 1
if similarity_score >= 3: # Threshold for similarity
similar_data.append(data_point)
# Sort by recency (more recent first)
similar_data.sort(key=lambda x: x.timestamp, reverse=True)
return similar_data[:10]  # Return top 10 most recent similar conditions
def _calculate_base_surge(self, demand_supply_ratio: float) -> float:
"""Calculate base surge multiplier from demand/supply ratio"""
if demand_supply_ratio <= self.balanced_ratio:
return self.base_surge
elif demand_supply_ratio <= self.high_demand_ratio:
# Linear interpolation between 1.0 and 2.0
return 1.0 + (demand_supply_ratio - self.balanced_ratio) / (self.high_demand_ratio - self.balanced_ratio)
else:
# Exponential growth for very high demand
excess_ratio = demand_supply_ratio - self.high_demand_ratio
return 2.0 + (excess_ratio * 0.5)
def _calculate_temperature_factor(self, temperature: float) -> float:
"""Calculate temperature impact factor"""
# Comfortable temperature range: 15-25°C
if 15 <= temperature <= 25:
return 1.0
elif temperature < 0:
return 1.8 # Very cold
elif temperature < 15:
return 1.3 # Cold
elif temperature > 35:
return 1.6 # Very hot
elif temperature > 25:
return 1.2 # Hot
else:
return 1.0
def _calculate_driver_incentive(self, surge_multiplier: float,
available_drivers: int, predicted_demand:
int) -> float:
"""Calculate driver incentive multiplier"""
# Base incentive increases with surge
base_incentive = min(2.0, surge_multiplier * 0.7)
# Additional incentive if very few drivers available
if available_drivers < predicted_demand * 0.3:
shortage_bonus = 0.5
elif available_drivers < predicted_demand * 0.5:
shortage_bonus = 0.3
else:
shortage_bonus = 0.0
return round(base_incentive + shortage_bonus, 1)
def _generate_price_explanation(self, conditions: MarketConditions, factors: Dict[str, float], surge: float) -> str:
"""Generate human-readable price explanation"""
explanations = []
# Primary factors
if surge > 2.0:
explanations.append("Very high demand")
elif surge > 1.5:
explanations.append("High demand")
elif surge > 1.2:
explanations.append("Increased demand")
# Weather conditions
if conditions.weather == WeatherCondition.RAIN:
explanations.append("rainy weather")
elif conditions.weather == WeatherCondition.SNOW:
explanations.append("snowy conditions")
elif conditions.weather == WeatherCondition.STORM:
explanations.append("storm conditions")
# Events
if conditions.event == EventType.CONCERT_END:
explanations.append("concert ending")
elif conditions.event == EventType.SPORTS_GAME:
explanations.append("sports event")
elif conditions.event == EventType.AIRPORT_RUSH:
explanations.append("airport rush hour")
# Time-based
hour = int((conditions.timestamp % 86400) / 3600)
if 7 <= hour <= 9:
explanations.append("morning rush hour")
elif 17 <= hour <= 19:
explanations.append("evening rush hour")
elif 22 <= hour or hour <= 2:
explanations.append("late night hours")
# Driver shortage
demand_supply_ratio = conditions.base_demand / max(1,
conditions.available_drivers)
if demand_supply_ratio > 2.0:
explanations.append("limited drivers available")
if not explanations:
return "Standard pricing"
if len(explanations) == 1:
return f"Higher prices due to {explanations[0]}"
elif len(explanations) == 2:
return f"Higher prices due to {explanations[0]} and
{explanations[1]}"
else:
return f"Higher prices due to {', '.join(explanations[:-1])}, and
{explanations[-1]}"
def _calculate_confidence(self, conditions: MarketConditions,
factors: Dict[str, float]) -> float:
"""Calculate confidence in pricing decision"""
base_confidence = 0.8
# Reduce confidence for extreme conditions
if any(factor > 2.0 for factor in factors.values()):
base_confidence *= 0.9
# Reduce confidence if very few historical data points
similar_conditions = self._find_similar_historical_conditions(conditions)
if len(similar_conditions) < 3:
base_confidence *= 0.8
# Reduce confidence for extreme weather
if conditions.weather in [WeatherCondition.STORM,
WeatherCondition.SNOW]:
base_confidence *= 0.9
# Reduce confidence for very low visibility
if conditions.visibility_km < 2:
base_confidence *= 0.8
return round(base_confidence, 2)
def add_historical_data(self, data_point: HistoricalDataPoint):
"""Add historical data point"""
self.historical_data.append(data_point)
# Keep only recent data (last 30 days)
cutoff_time = time.time() - (30 * 24 * 3600)
self.historical_data = [
data for data in self.historical_data
if data.timestamp > cutoff_time
]
def analyze_pricing_effectiveness(self) -> Dict[str, float]:
"""Analyze effectiveness of pricing strategy"""
if len(self.historical_data) < 10:
return {}
# Calculate metrics
recent_data = self.historical_data[-100:] # Last 100 data points
avg_surge = statistics.mean(data.surge_multiplier for data in
recent_data)
avg_completion_rate = statistics.mean(
data.completed_rides / max(1, data.completed_rides +
data.cancelled_rides)
for data in recent_data
)
# Demand fulfillment rate
avg_fulfillment = statistics.mean(
min(1.0, data.supply / max(1, data.demand))
for data in recent_data
)
# Revenue efficiency (higher surge should correlate with completion)
high_surge_data = [data for data in recent_data if
data.surge_multiplier > 1.5]
if high_surge_data:
high_surge_completion = statistics.mean(
data.completed_rides / max(1, data.completed_rides +
data.cancelled_rides)
for data in high_surge_data
)
else:
high_surge_completion = 0
return {
"avg_surge_multiplier": avg_surge,
"avg_completion_rate": avg_completion_rate,
"demand_fulfillment_rate": avg_fulfillment,
"high_surge_completion_rate": high_surge_completion,
"total_data_points": len(self.historical_data)
}
def optimize_pricing_parameters(self):
"""Optimize pricing parameters based on historical performance"""
if len(self.historical_data) < 50:
return
# Analyze surge effectiveness
surge_performance = defaultdict(list)
for data in self.historical_data:
completion_rate = data.completed_rides / max(1,
data.completed_rides + data.cancelled_rides)
surge_bucket = round(data.surge_multiplier, 1)
surge_performance[surge_bucket].append(completion_rate)
# Find optimal surge levels
optimal_surges = {}
for surge_level, completion_rates in surge_performance.items():
if len(completion_rates) >= 5: # Minimum sample size
avg_completion = statistics.mean(completion_rates)
optimal_surges[surge_level] = avg_completion
# Adjust parameters if needed
if optimal_surges:
best_surge = max(optimal_surges.items(), key=lambda x: x[1])
if best_surge[1] > 0.9: # Very high completion rate
# Can be more aggressive with surge
self.high_demand_ratio *= 0.95
elif best_surge[1] < 0.7: # Low completion rate
# Be more conservative with surge
self.high_demand_ratio *= 1.05
# Test the dynamic pricing engine
def test_dynamic_pricing():
engine = DynamicPricingEngine()
# Add historical data
base_time = time.time() - 86400 # 24 hours ago
for i in range(50):
data_point = HistoricalDataPoint(
timestamp=base_time + (i * 1800), # Every 30 minutes
demand=random.randint(20, 80),
supply=random.randint(10, 50),
surge_multiplier=random.uniform(1.0, 3.0),
weather=random.choice(list(WeatherCondition)),
event=random.choice(list(EventType)),
completed_rides=random.randint(15, 60),
cancelled_rides=random.randint(2, 10)
)
engine.add_historical_data(data_point)
print("Testing Dynamic Pricing Engine:")
print(f"Historical data points: {len(engine.historical_data)}")
# Test normal conditions
normal_conditions = MarketConditions(
location=(37.7749, -122.4194),
timestamp=time.time(),
weather=WeatherCondition.CLEAR,
event=EventType.NORMAL,
base_demand=40,
available_drivers=25,
temperature=22.0,
visibility_km=10.0
)
print(f"\nNormal Conditions:")
print(f" Weather: {normal_conditions.weather.value}")
print(f" Event: {normal_conditions.event.value}")
print(f" Base demand: {normal_conditions.base_demand}")
print(f" Available drivers: {normal_conditions.available_drivers}")
result = engine.calculate_surge_pricing(normal_conditions)
print(f"\nPricing Result:")
print(f" Surge multiplier: {result.surge_multiplier}x")
print(f" Predicted demand: {result.predicted_demand}")
print(f" Driver incentive: {result.driver_incentive}x")
print(f" Explanation: {result.price_explanation}")
print(f" Confidence: {result.confidence}")
print(f"\nFactors:")
for factor, value in result.factors.items():
print(f" {factor}: {value:.2f}")
# Test high demand conditions
high_demand_conditions = MarketConditions(
location=(37.7749, -122.4194),
timestamp=time.time(),
weather=WeatherCondition.RAIN,
event=EventType.CONCERT_END,
base_demand=80,
available_drivers=15,
temperature=10.0,
visibility_km=3.0
)
print(f"\nHigh Demand Conditions:")
print(f" Weather: {high_demand_conditions.weather.value}")
print(f" Event: {high_demand_conditions.event.value}")
print(f" Base demand: {high_demand_conditions.base_demand}")
print(f" Available drivers: {high_demand_conditions.available_drivers}")
high_demand_result = engine.calculate_surge_pricing(high_demand_conditions)
print(f"\nHigh Demand Pricing:")
print(f" Surge multiplier: {high_demand_result.surge_multiplier}x")
print(f" Predicted demand: {high_demand_result.predicted_demand}")
print(f" Driver incentive: {high_demand_result.driver_incentive}x")
print(f" Explanation: {high_demand_result.price_explanation}")
print(f" Confidence: {high_demand_result.confidence}")
# Test pricing effectiveness analysis
effectiveness = engine.analyze_pricing_effectiveness()
print(f"\nPricing Effectiveness:")
for metric, value in effectiveness.items():
if isinstance(value, float):
print(f" {metric}: {value:.3f}")
else:
print(f" {metric}: {value}")
test_dynamic_pricing()
Key Insights:
• Multi-factor surge calculation considering demand/supply, weather, events, and time
patterns
• Machine learning-based demand prediction using historical similarity matching
• Dynamic driver incentive calculation to balance supply and demand
• Transparent pricing explanations for user trust and understanding
• Continuous optimization based on historical performance analysis
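At its core, every factor above feeds into one number: the demand/supply ratio. The snippet below is a minimal, illustrative sketch of that core step; the helper name, thresholds, and linear ramp are assumptions for illustration, not the engine's actual parameters.
Python
def basic_surge_multiplier(predicted_demand: int, available_drivers: int,
                           max_surge: float = 3.0) -> float:
    """Map a demand/supply ratio onto a bounded surge multiplier (illustrative only)."""
    ratio = predicted_demand / max(1, available_drivers)
    if ratio <= 1.0:
        return 1.0  # enough drivers available, no surge
    # Linear ramp above a 1:1 ratio, capped at max_surge
    return round(min(max_surge, 1.0 + 0.5 * (ratio - 1.0)), 2)

print(basic_surge_multiplier(80, 20))  # ratio 4.0 -> 2.5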
2.7 Uber OA Problems {#uber-oa}
Difficulty: Hard | Time Limit: 60 minutes | Company: Uber
Problem Statement:
Design a ride matching algorithm that efficiently pairs riders with drivers while optimizing
for multiple objectives: minimize wait time, minimize detour for drivers, and maximize
overall system efficiency.
Example:
Plain Text
Input:
riders = [
{"id": "r1", "location": [37.7749, -122.4194], "destination": [37.7849,
-122.4094], "timestamp": 1640995200},
{"id": "r2", "location": [37.7649, -122.4294], "destination": [37.7949,
-122.3994], "timestamp": 1640995230}
]
drivers = [
{"id": "d1", "location": [37.7699, -122.4244], "capacity": 4,
"current_riders": 1},
{"id": "d2", "location": [37.7799, -122.4144], "capacity": 4,
"current_riders": 0}
]
Output: {
"matches": [
{"rider": "r1", "driver": "d1", "pickup_time": 180, "detour_distance":
0.5},
{"rider": "r2", "driver": "d2", "pickup_time": 120, "detour_distance":
0.3}
],
"total_wait_time": 300,
"average_detour": 0.4
}
Solution Approach:
This problem requires implementing a sophisticated matching algorithm that considers
multiple constraints and objectives.
Python
import math
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
from datetime import datetime, timedelta
import heapq
@dataclass
class Location:
lat: float
lng: float
def distance_to(self, other: 'Location') -> float:
"""Calculate distance using Haversine formula (in km)"""
R = 6371 # Earth's radius in km
lat1, lng1 = math.radians(self.lat), math.radians(self.lng)
lat2, lng2 = math.radians(other.lat), math.radians(other.lng)
dlat = lat2 - lat1
dlng = lng2 - lng1
a = (math.sin(dlat/2)**2 +
math.cos(lat1) * math.cos(lat2) * math.sin(dlng/2)**2)
c = 2 * math.asin(math.sqrt(a))
return R * c
@dataclass
class Rider:
id: str
location: Location
destination: Location
timestamp: datetime
priority: int = 1 # 1=normal, 2=premium
max_wait_time: int = 600 # seconds
def wait_time(self, current_time: datetime) -> int:
return int((current_time - self.timestamp).total_seconds())
@dataclass
class Driver:
id: str
location: Location
capacity: int
current_riders: int
current_route: List[Location] = None
def available_capacity(self) -> int:
return self.capacity - self.current_riders
def is_available(self) -> bool:
return self.available_capacity() > 0
@dataclass
class RideMatch:
rider: Rider
driver: Driver
pickup_time: int # seconds
detour_distance: float # km
total_cost: float # optimization score
class UberMatchingAlgorithm:
def __init__(self):
self.average_speed = 30 # km/h in city
self.max_pickup_distance = 5 # km
self.max_pickup_time = 600 # seconds
# Optimization weights
self.weights = {
'wait_time': 0.4,
'detour_distance': 0.3,
'driver_utilization': 0.2,
'rider_priority': 0.1
}
def calculate_pickup_time(self, driver_location: Location,
rider_location: Location) -> int:
"""Calculate estimated pickup time in seconds"""
distance = driver_location.distance_to(rider_location)
time_hours = distance / self.average_speed
return int(time_hours * 3600)
def calculate_detour_distance(self, driver: Driver, rider: Rider) -> float:
"""Calculate additional distance driver needs to travel"""
if driver.current_route is None or len(driver.current_route) == 0:
# Driver has no current route, just pickup + trip distance
pickup_distance = driver.location.distance_to(rider.location)
trip_distance = rider.location.distance_to(rider.destination)
return pickup_distance + trip_distance
# Calculate detour for existing route
original_distance = 0
for i in range(len(driver.current_route) - 1):
original_distance += driver.current_route[i].distance_to(driver.current_route[i + 1])
# Calculate new route with rider
new_route = [driver.location, rider.location, rider.destination] + driver.current_route[1:]
new_distance = 0
for i in range(len(new_route) - 1):
new_distance += new_route[i].distance_to(new_route[i + 1])
return new_distance - original_distance
def calculate_match_score(self, rider: Rider, driver: Driver,
current_time: datetime) -> Optional[RideMatch]:
"""Calculate matching score between rider and driver"""
if not driver.is_available():
return None
pickup_time = self.calculate_pickup_time(driver.location,
rider.location)
# Check constraints
if pickup_time > self.max_pickup_time:
return None
pickup_distance = driver.location.distance_to(rider.location)
if pickup_distance > self.max_pickup_distance:
return None
detour_distance = self.calculate_detour_distance(driver, rider)
wait_time = rider.wait_time(current_time) + pickup_time
# Check if rider's max wait time would be exceeded
if wait_time > rider.max_wait_time:
return None
# Calculate optimization score (lower is better)
wait_time_score = wait_time / 600 # Normalize to 10 minutes
detour_score = detour_distance / 10 # Normalize to 10 km
utilization_score = 1.0 - (driver.available_capacity() /
driver.capacity)
priority_bonus = 1.0 / rider.priority # Higher priority = lower score
total_cost = (
wait_time_score * self.weights['wait_time'] +
detour_score * self.weights['detour_distance'] +
utilization_score * self.weights['driver_utilization'] +
priority_bonus * self.weights['rider_priority']
)
return RideMatch(
rider=rider,
driver=driver,
pickup_time=pickup_time,
detour_distance=detour_distance,
total_cost=total_cost
)
def find_optimal_matches_greedy(self, riders: List[Rider], drivers:
List[Driver],
current_time: datetime) -> List[RideMatch]:
"""Find optimal matches using greedy algorithm"""
matches = []
available_drivers = drivers.copy()
unmatched_riders = riders.copy()
# Sort riders by priority and wait time
unmatched_riders.sort(key=lambda r: (r.priority,
r.wait_time(current_time)))
for rider in unmatched_riders:
best_match = None
best_driver_idx = -1
for i, driver in enumerate(available_drivers):
match = self.calculate_match_score(rider, driver,
current_time)
if match and (best_match is None or match.total_cost <
best_match.total_cost):
best_match = match
best_driver_idx = i
if best_match:
matches.append(best_match)
# Update driver capacity
available_drivers[best_driver_idx].current_riders += 1
if not available_drivers[best_driver_idx].is_available():
available_drivers.pop(best_driver_idx)
return matches
def find_optimal_matches_hungarian(self, riders: List[Rider], drivers:
List[Driver],
current_time: datetime) -> List[RideMatch]:
"""Find optimal matches using Hungarian algorithm approach"""
# Create cost matrix
cost_matrix = []
valid_matches = []
for i, rider in enumerate(riders):
rider_costs = []
rider_matches = []
for j, driver in enumerate(drivers):
match = self.calculate_match_score(rider, driver,
current_time)
if match:
rider_costs.append(match.total_cost)
rider_matches.append(match)
else:
rider_costs.append(float('inf'))
rider_matches.append(None)
cost_matrix.append(rider_costs)
valid_matches.append(rider_matches)
# Simplified assignment (for a full implementation, use scipy.optimize.linear_sum_assignment)
matches = []
used_drivers = set()
# Sort riders by minimum cost
rider_indices = list(range(len(riders)))
rider_indices.sort(key=lambda i: min(cost_matrix[i]))
for rider_idx in rider_indices:
best_driver_idx = -1
best_cost = float('inf')
for driver_idx in range(len(drivers)):
if driver_idx not in used_drivers and cost_matrix[rider_idx][driver_idx] < best_cost:
best_cost = cost_matrix[rider_idx][driver_idx]
best_driver_idx = driver_idx
if best_driver_idx != -1 and valid_matches[rider_idx][best_driver_idx]:
matches.append(valid_matches[rider_idx][best_driver_idx])
used_drivers.add(best_driver_idx)
return matches
def optimize_ride_matching(self, riders_data: List[Dict], drivers_data:
List[Dict],
algorithm: str = "greedy") -> Dict:
"""Main optimization function"""
current_time = datetime.now()
# Convert input data to objects
riders = []
for r_data in riders_data:
location = Location(r_data['location'][0], r_data['location'][1])
destination = Location(r_data['destination'][0],
r_data['destination'][1])
timestamp = datetime.fromtimestamp(r_data['timestamp'])
riders.append(Rider(r_data['id'], location, destination,
timestamp))
drivers = []
for d_data in drivers_data:
location = Location(d_data['location'][0], d_data['location'][1])
drivers.append(Driver(
d_data['id'], location, d_data['capacity'],
d_data['current_riders']
))
# Find optimal matches
if algorithm == "hungarian":
matches = self.find_optimal_matches_hungarian(riders, drivers,
current_time)
else:
matches = self.find_optimal_matches_greedy(riders, drivers,
current_time)
# Calculate metrics
total_wait_time = sum(match.rider.wait_time(current_time) +
match.pickup_time for match in matches)
average_detour = sum(match.detour_distance for match in matches) / len(matches) if matches else 0
match_rate = len(matches) / len(riders) if riders else 0
return {
"matches": [
{
"rider": match.rider.id,
"driver": match.driver.id,
"pickup_time": match.pickup_time,
"detour_distance": round(match.detour_distance, 2),
"optimization_score": round(match.total_cost, 3)
}
for match in matches
],
"total_wait_time": total_wait_time,
"average_detour": round(average_detour, 2),
"match_rate": round(match_rate, 2),
"unmatched_riders": len(riders) - len(matches),
"algorithm_used": algorithm
}
# Test cases
def test_uber_matching():
algorithm = UberMatchingAlgorithm()
riders_data = [
{
"id": "r1",
"location": [37.7749, -122.4194],
"destination": [37.7849, -122.4094],
"timestamp": 1640995200
},
{
"id": "r2",
"location": [37.7649, -122.4294],
"destination": [37.7949, -122.3994],
"timestamp": 1640995230
},
{
"id": "r3",
"location": [37.7549, -122.4394],
"destination": [37.7749, -122.4194],
"timestamp": 1640995260
}
]
drivers_data = [
{"id": "d1", "location": [37.7699, -122.4244], "capacity": 4,
"current_riders": 1},
{"id": "d2", "location": [37.7799, -122.4144], "capacity": 4,
"current_riders": 0},
{"id": "d3", "location": [37.7599, -122.4344], "capacity": 2,
"current_riders": 0}
]
print("Testing Uber Ride Matching Algorithm:")
print(f"Riders: {len(riders_data)}")
print(f"Available drivers: {len(drivers_data)}")
# Test greedy algorithm
result_greedy = algorithm.optimize_ride_matching(riders_data,
drivers_data, "greedy")
print(f"\nGreedy Algorithm Results:")
print(f"Matches: {len(result_greedy['matches'])}")
print(f"Match rate: {result_greedy['match_rate']:.1%}")
print(f"Total wait time: {result_greedy['total_wait_time']} seconds")
print(f"Average detour: {result_greedy['average_detour']} km")
for match in result_greedy['matches']:
print(f" {match['rider']} -> {match['driver']} "
f"(pickup: {match['pickup_time']}s, detour: {match['detour_distance']}km)")
test_uber_matching()
Key Insights:
• Balance multiple objectives: wait time, detour distance, driver utilization
• Use constraint checking to ensure feasible matches
• Consider rider priority and driver capacity constraints
• Implement both greedy and optimal assignment algorithms
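The comment in find_optimal_matches_hungarian points at scipy.optimize.linear_sum_assignment for a truly optimal assignment. Below is a hedged sketch of what that swap might look like; the helper name and the BIG penalty are assumptions for illustration, and infeasible rider/driver pairs get a large finite cost because the solver needs a fully numeric cost matrix.
Python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_riders_optimally(cost_matrix):
    """cost_matrix[i][j] = match cost for rider i with driver j, or float('inf')."""
    BIG = 1e9  # stand-in penalty for "no feasible match"
    costs = np.array(cost_matrix, dtype=float)
    costs[~np.isfinite(costs)] = BIG
    rider_idx, driver_idx = linear_sum_assignment(costs)  # minimizes total cost
    # Drop pairings that were only possible through the BIG penalty
    return [(int(r), int(d)) for r, d in zip(rider_idx, driver_idx) if costs[r, d] < BIG]

# Rider 0 is cheap to pair with driver 1; rider 1 only fits driver 0
print(assign_riders_optimally([[5.0, 1.0], [2.0, float('inf')]]))  # [(0, 1), (1, 0)]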
2.10 Additional High-Frequency Problems {#additional-problems}
Problem 46: Two Sum Variations
Difficulty: Easy to Medium | Time Limit: 30 minutes | Company: Multiple
Problem Statement:
Implement various Two Sum problem variations that commonly appear in OAs:
1. Classic Two Sum
2. Two Sum with sorted array
3. Two Sum with duplicates allowed
4. Two Sum closest to target
5. Two Sum with multiple solutions
Solution Approach:
Python
from typing import List, Tuple, Optional
class TwoSumSolutions:
def two_sum_classic(self, nums: List[int], target: int) -> List[int]:
"""Classic two sum - return indices of two numbers that add up to
target"""
num_map = {}
for i, num in enumerate(nums):
complement = target - num
if complement in num_map:
return [num_map[complement], i]
num_map[num] = i
return []
def two_sum_sorted(self, nums: List[int], target: int) -> List[int]:
"""Two sum on sorted array using two pointers"""
left, right = 0, len(nums) - 1
while left < right:
current_sum = nums[left] + nums[right]
if current_sum == target:
return [left, right]
elif current_sum < target:
left += 1
else:
right -= 1
return []
def two_sum_all_pairs(self, nums: List[int], target: int) -> List[List[int]]:
"""Find all unique pairs that sum to target"""
nums.sort()
result = []
left, right = 0, len(nums) - 1
while left < right:
current_sum = nums[left] + nums[right]
if current_sum == target:
result.append([nums[left], nums[right]])
# Skip duplicates
while left < right and nums[left] == nums[left + 1]:
left += 1
while left < right and nums[right] == nums[right - 1]:
right -= 1
left += 1
right -= 1
elif current_sum < target:
left += 1
else:
right -= 1
return result
def two_sum_closest(self, nums: List[int], target: int) -> Tuple[int,
int]:
"""Find pair with sum closest to target"""
nums.sort()
left, right = 0, len(nums) - 1
closest_sum = float('inf')
result = (0, 0)
while left < right:
current_sum = nums[left] + nums[right]
if abs(current_sum - target) < abs(closest_sum - target):
closest_sum = current_sum
result = (nums[left], nums[right])
if current_sum < target:
left += 1
else:
right -= 1
return result
def two_sum_count(self, nums: List[int], target: int) -> int:
"""Count number of pairs that sum to target"""
from collections import Counter
count_map = Counter(nums)
result = 0
for num in count_map:
complement = target - num
if complement in count_map:
if num == complement:
# Same number used twice
result += count_map[num] * (count_map[num] - 1) // 2
elif num < complement:
# Different numbers, count once to avoid duplicates
result += count_map[num] * count_map[complement]
return result
# Test all variations
def test_two_sum_variations():
solver = TwoSumSolutions()
# Test data
nums1 = [2, 7, 11, 15]
nums2 = [1, 2, 3, 4, 5, 6]
nums3 = [1, 1, 2, 2, 3, 3]
target = 9
print("Testing Two Sum Variations:")
# Classic two sum
result1 = solver.two_sum_classic(nums1, target)
print(f"Classic two sum: {result1}")
# Sorted array two sum
result2 = solver.two_sum_sorted(sorted(nums2), target)
print(f"Sorted two sum: {result2}")
# All pairs
result3 = solver.two_sum_all_pairs(nums2, target)
print(f"All pairs: {result3}")
# Closest sum
result4 = solver.two_sum_closest(nums2, target)
print(f"Closest to target: {result4}")
# Count pairs
result5 = solver.two_sum_count(nums3, 4)
print(f"Count of pairs summing to 4: {result5}")
test_two_sum_variations()
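For the test inputs above, the expected results are: classic two sum [0, 1] (2 + 7 = 9), sorted two sum [2, 5] (indices of 3 and 6), all pairs [[3, 6], [4, 5]], closest pair (3, 6), and a pair count of 5 for target 4 on [1, 1, 2, 2, 3, 3] (four (1, 3) pairs plus one (2, 2) pair).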
Problem 47: Sliding Window Maximum
Difficulty: Hard | Time Limit: 45 minutes | Company: Multiple
Problem Statement:
Given an array and a sliding window of size k, find the maximum element in each window as
it slides from left to right.
Python
from collections import deque
from typing import List
class SlidingWindowMaximum:
def max_sliding_window_deque(self, nums: List[int], k: int) -> List[int]:
"""Optimal solution using deque - O(n) time"""
if not nums or k == 0:
return []
dq = deque() # Store indices
result = []
for i in range(len(nums)):
# Remove indices outside current window
while dq and dq[0] <= i - k:
dq.popleft()
# Remove indices of smaller elements (they won't be maximum)
while dq and nums[dq[-1]] <= nums[i]:
dq.pop()
dq.append(i)
# Add maximum of current window to result
if i >= k - 1:
result.append(nums[dq[0]])
return result
def max_sliding_window_heap(self, nums: List[int], k: int) -> List[int]:
"""Alternative solution using heap - O(n log k) time"""
import heapq
if not nums or k == 0:
return []
# Use max heap (negate values for min heap)
heap = []
result = []
for i in range(len(nums)):
# Add current element
heapq.heappush(heap, (-nums[i], i))
# Remove elements outside window
while heap and heap[0][1] <= i - k:
heapq.heappop(heap)
# Add maximum of current window
if i >= k - 1:
result.append(-heap[0][0])
return result
# Test sliding window maximum
def test_sliding_window_maximum():
solver = SlidingWindowMaximum()
nums = [1, 3, -1, -3, 5, 3, 6, 7]
k = 3
print("Testing Sliding Window Maximum:")
print(f"Input: {nums}, k={k}")
result1 = solver.max_sliding_window_deque(nums, k)
print(f"Deque solution: {result1}")
result2 = solver.max_sliding_window_heap(nums, k)
print(f"Heap solution: {result2}")
test_sliding_window_maximum()
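Both versions should print [3, 3, 5, 5, 6, 7] for this input. The deque solution stays O(n) because each index is appended and popped at most once; the invariant is that the values referenced by the deque decrease from front to back, so the current window's maximum is always at the front.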
Problem 48: LRU Cache Implementation
Difficulty: Medium | Time Limit: 45 minutes | Company: Multiple
Problem Statement:
Design and implement a Least Recently Used (LRU) cache with O(1) get and put operations.
Python
class LRUCache:
class Node:
def __init__(self, key: int = 0, value: int = 0):
self.key = key
self.value = value
self.prev = None
self.next = None
def __init__(self, capacity: int):
self.capacity = capacity
self.cache = {} # key -> node
# Create dummy head and tail nodes
self.head = self.Node()
self.tail = self.Node()
self.head.next = self.tail
self.tail.prev = self.head
def _add_node(self, node):
"""Add node right after head"""
node.prev = self.head
node.next = self.head.next
self.head.next.prev = node
self.head.next = node
def _remove_node(self, node):
"""Remove an existing node"""
prev_node = node.prev
next_node = node.next
prev_node.next = next_node
next_node.prev = prev_node
def _move_to_head(self, node):
"""Move node to head (mark as recently used)"""
self._remove_node(node)
self._add_node(node)
def _pop_tail(self):
"""Remove last node (least recently used)"""
last_node = self.tail.prev
self._remove_node(last_node)
return last_node
def get(self, key: int) -> int:
node = self.cache.get(key)
if node:
# Move to head (mark as recently used)
self._move_to_head(node)
return node.value
return -1
def put(self, key: int, value: int) -> None:
node = self.cache.get(key)
if node:
# Update existing node
node.value = value
self._move_to_head(node)
else:
# Add new node
new_node = self.Node(key, value)
if len(self.cache) >= self.capacity:
# Remove least recently used
tail = self._pop_tail()
del self.cache[tail.key]
self.cache[key] = new_node
self._add_node(new_node)
# Test LRU Cache
def test_lru_cache():
print("Testing LRU Cache:")
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(f"get(1): {cache.get(1)}") # returns 1
cache.put(3, 3) # evicts key 2
print(f"get(2): {cache.get(2)}") # returns -1 (not found)
cache.put(4, 4) # evicts key 1
print(f"get(1): {cache.get(1)}") # returns -1 (not found)
print(f"get(3): {cache.get(3)}") # returns 3
print(f"get(4): {cache.get(4)}") # returns 4
test_lru_cache()
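If the interview allows standard-library containers, collections.OrderedDict gives the same O(1) get/put behavior with far less code. The sketch below is an alternative under that assumption, not a replacement for knowing the doubly linked list version above.
Python
from collections import OrderedDict

class LRUCacheOrdered:
    """LRU cache backed by OrderedDict; move_to_end and popitem are O(1)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

cache = LRUCacheOrdered(2)
cache.put(1, 1); cache.put(2, 2); cache.put(3, 3)  # inserting 3 evicts key 1
print(cache.get(1), cache.get(3))  # -1 3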
Problem 50: Design Rate Limiter
Difficulty: Medium | Time Limit: 45 minutes | Company: Multiple
Problem Statement:
Design a rate limiter that limits the number of requests a user can make within a time
window.
Python
import time
from collections import defaultdict, deque
from typing import Dict
class RateLimiter:
def __init__(self, max_requests: int, time_window: int):
"""
max_requests: maximum number of requests allowed
time_window: time window in seconds
"""
self.max_requests = max_requests
self.time_window = time_window
self.requests = defaultdict(deque) # user_id -> deque of timestamps
def is_allowed(self, user_id: str) -> bool:
"""Check if request is allowed for user"""
current_time = time.time()
user_requests = self.requests[user_id]
# Remove old requests outside time window
while user_requests and user_requests[0] <= current_time - self.time_window:
user_requests.popleft()
# Check if under limit
if len(user_requests) < self.max_requests:
user_requests.append(current_time)
return True
return False
def get_remaining_requests(self, user_id: str) -> int:
"""Get remaining requests for user in current window"""
current_time = time.time()
user_requests = self.requests[user_id]
# Remove old requests
while user_requests and user_requests[0] <= current_time - self.time_window:
user_requests.popleft()
return max(0, self.max_requests - len(user_requests))
def get_reset_time(self, user_id: str) -> float:
"""Get time when rate limit resets for user"""
user_requests = self.requests[user_id]
if not user_requests:
return 0
return user_requests[0] + self.time_window
class TokenBucketRateLimiter:
"""Alternative implementation using token bucket algorithm"""
def __init__(self, capacity: int, refill_rate: float):
"""
capacity: maximum number of tokens
refill_rate: tokens added per second
"""
self.capacity = capacity
self.refill_rate = refill_rate
self.buckets = defaultdict(lambda: {'tokens': capacity,
'last_refill': time.time()})
def is_allowed(self, user_id: str, tokens_required: int = 1) -> bool:
"""Check if request is allowed"""
current_time = time.time()
bucket = self.buckets[user_id]
# Refill tokens based on time elapsed
time_elapsed = current_time - bucket['last_refill']
tokens_to_add = time_elapsed * self.refill_rate
bucket['tokens'] = min(self.capacity, bucket['tokens'] +
tokens_to_add)
bucket['last_refill'] = current_time
# Check if enough tokens available
if bucket['tokens'] >= tokens_required:
bucket['tokens'] -= tokens_required
return True
return False
# Test rate limiters
def test_rate_limiters():
print("Testing Rate Limiters:")
# Test sliding window rate limiter
limiter = RateLimiter(max_requests=3, time_window=60) # 3 requests per minute
print("\nSliding Window Rate Limiter:")
for i in range(5):
allowed = limiter.is_allowed("user1")
remaining = limiter.get_remaining_requests("user1")
print(f"Request {i+1}: {'Allowed' if allowed else 'Denied'}, Remaining: {remaining}")
# Test token bucket rate limiter
bucket_limiter = TokenBucketRateLimiter(capacity=5, refill_rate=1.0) # 5 tokens, 1 per second
print("\nToken Bucket Rate Limiter:")
for i in range(7):
allowed = bucket_limiter.is_allowed("user1")
print(f"Request {i+1}: {'Allowed' if allowed else 'Denied'}")
if i == 3:
print(" (Waiting 2 seconds for token refill)")
time.sleep(2)
test_rate_limiters()
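A common follow-up is the fixed-window counter: it stores only one counter per user instead of every timestamp, at the cost of allowing a burst of up to twice the limit across a window boundary. The sketch below is illustrative; the class name and parameters are assumptions, not part of the original problem.
Python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Counts requests per user per fixed window; O(1) memory per user."""
    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window
        self.counters = defaultdict(lambda: [-1, 0])  # user_id -> [window_id, count]

    def is_allowed(self, user_id: str) -> bool:
        window_id = int(time.time() // self.time_window)
        entry = self.counters[user_id]
        if entry[0] != window_id:
            self.counters[user_id] = [window_id, 1]  # new window, reset the count
            return True
        if entry[1] < self.max_requests:
            entry[1] += 1
            return True
        return False

limiter = FixedWindowRateLimiter(max_requests=3, time_window=60)
print([limiter.is_allowed("u1") for _ in range(5)])  # [True, True, True, False, False]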
Key Insights for All Problems:
• Master fundamental patterns: Two Pointers, Sliding Window, Hash Maps
• Understand time/space complexity tradeoffs
• Practice implementing data structures from scratch
• Consider edge cases and constraint handling
• Learn multiple approaches for the same problem type
Problem 33: Route Optimization System
Difficulty: Hard | Time Limit: 60 minutes | Company: Uber
Problem Statement:
Design a route optimization system for Uber that:
1. Finds optimal routes considering real-time traffic, road closures, and weather
2. Supports multiple waypoints and ride-sharing scenarios
3. Minimizes total travel time while balancing driver and rider preferences
4. Handles dynamic re-routing based on changing conditions
5. Provides accurate ETAs with confidence intervals
Example:
Plain Text
Input:
route_request = {
"start": {"lat": 37.7749, "lon": -122.4194}, "end": {"lat": 37.7849, "lon":
-122.4094},
"waypoints": [{"lat": 37.7779, "lon": -122.4154}], "preferences":
{"avoid_tolls": true, "fastest_route": true},
"traffic_data": {"congestion_level": 0.7, "incidents": [{"lat": 37.7759,
"lon": -122.4174, "severity": "high"}]}
}
Output: {
"route": [{"lat": 37.7749, "lon": -122.4194}, {"lat": 37.7759, "lon":
-122.4184}, ...],
"total_distance": 2.3, "estimated_time": 420, "confidence": 0.85,
"alternative_routes": [...], "dynamic_updates": true
}
Solution Approach:
This problem requires graph algorithms, real-time optimization, and predictive modeling.
Python
import time
import math
import heapq
import random
from typing import List, Dict, Tuple, Optional, Set
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
class RoadType(Enum):
HIGHWAY = "highway"
ARTERIAL = "arterial"
LOCAL = "local"
TOLL = "toll"
class IncidentSeverity(Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
class WeatherCondition(Enum):
CLEAR = "clear"
RAIN = "rain"
SNOW = "snow"
FOG = "fog"
@dataclass
class Location:
latitude: float
longitude: float
def distance_to(self, other: 'Location') -> float:
"""Calculate distance using Haversine formula"""
R = 6371000 # Earth's radius in meters
lat1, lon1 = math.radians(self.latitude), math.radians(self.longitude)
lat2, lon2 = math.radians(other.latitude), math.radians(other.longitude)
dlat = lat2 - lat1
dlon = lon2 - lon1
a = (math.sin(dlat/2)**2 +
math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2)
c = 2 * math.asin(math.sqrt(a))
return R * c
@dataclass
class RoadSegment:
start: Location
end: Location
road_type: RoadType
speed_limit: float # km/h
length: float # meters
current_speed: float # km/h (considering traffic)
toll_cost: float = 0.0
def travel_time(self) -> float:
"""Calculate travel time in seconds"""
if self.current_speed <= 0:
return float('inf')
return (self.length / 1000) / self.current_speed * 3600
@dataclass
class TrafficIncident:
location: Location
severity: IncidentSeverity
radius: float # meters
speed_impact: float # multiplier (0.1 = 90% speed reduction)
@dataclass
class RoutePreferences:
avoid_tolls: bool = False
fastest_route: bool = True
shortest_distance: bool = False
avoid_highways: bool = False
eco_friendly: bool = False
@dataclass
class RouteResult:
route: List[Location]
total_distance: float # km
estimated_time: float # seconds
confidence: float
alternative_routes: List['RouteResult']
dynamic_updates: bool
total_cost: float = 0.0
class RouteOptimizationEngine:
def __init__(self):
self.road_network: Dict[str, List[RoadSegment]] = {}
self.traffic_incidents: List[TrafficIncident] = []
self.weather_condition = WeatherCondition.CLEAR
self.historical_patterns: Dict[str, List[float]] = {}
# Optimization parameters
self.max_alternative_routes = 3
self.reroute_threshold = 0.15 # 15% time increase triggers reroute
# Weather impact factors
self.weather_speed_factors = {
WeatherCondition.CLEAR: 1.0,
WeatherCondition.RAIN: 0.8,
WeatherCondition.SNOW: 0.6,
WeatherCondition.FOG: 0.7
}
# Build sample road network
self._build_sample_network()
def _build_sample_network(self):
"""Build a sample road network for testing"""
# Create a grid-like network
locations = []
for i in range(10):
for j in range(10):
lat = 37.7700 + i * 0.001
lon = -122.4200 + j * 0.001
locations.append(Location(lat, lon))
# Connect adjacent locations
for i, loc in enumerate(locations):
loc_key = f"{loc.latitude:.6f},{loc.longitude:.6f}"
self.road_network[loc_key] = []
# Connect to adjacent locations
for j, other_loc in enumerate(locations):
if i != j and loc.distance_to(other_loc) < 200: # Within 200m
road_type = random.choice(list(RoadType))
speed_limit = {
RoadType.HIGHWAY: 100,
RoadType.ARTERIAL: 60,
RoadType.LOCAL: 40,
RoadType.TOLL: 120
}[road_type]
segment = RoadSegment(
start=loc,
end=other_loc,
road_type=road_type,
speed_limit=speed_limit,
length=loc.distance_to(other_loc),
current_speed=speed_limit * random.uniform(0.6, 1.0),
toll_cost=2.0 if road_type == RoadType.TOLL else 0.0
)
self.road_network[loc_key].append(segment)
def find_optimal_route(self, start: Location, end: Location,
waypoints: List[Location] = None,
preferences: RoutePreferences = None) -> RouteResult:
"""Find optimal route using A* algorithm with modifications"""
if preferences is None:
preferences = RoutePreferences()
if waypoints:
# Handle multi-waypoint routing
return self._find_multi_waypoint_route(start, end, waypoints,
preferences)
# Single destination routing
route_path = self._a_star_search(start, end, preferences)
if not route_path:
return RouteResult([], 0, 0, 0, [], False)
# Calculate route metrics
total_distance = self._calculate_total_distance(route_path)
estimated_time = self._calculate_travel_time(route_path)
confidence = self._calculate_confidence(route_path)
total_cost = self._calculate_total_cost(route_path, preferences)
# Find alternative routes
alternative_routes = self._find_alternative_routes(start, end,
preferences, route_path)
return RouteResult(
route=route_path,
total_distance=total_distance,
estimated_time=estimated_time,
confidence=confidence,
alternative_routes=alternative_routes,
dynamic_updates=True,
total_cost=total_cost
)
def _find_multi_waypoint_route(self, start: Location, end: Location,
waypoints: List[Location],
preferences: RoutePreferences) -> RouteResult:
"""Find optimal route through multiple waypoints"""
# Use dynamic programming to find optimal waypoint order
all_points = [start] + waypoints + [end]
n = len(all_points)
# Calculate distance matrix
distance_matrix = [[0] * n for _ in range(n)]
for i in range(n):
for j in range(n):
if i != j:
route = self._a_star_search(all_points[i], all_points[j],
preferences)
if route:
distance_matrix[i][j] = self._calculate_travel_time(route)
else:
distance_matrix[i][j] = float('inf')
# Solve TSP-like problem for waypoints (excluding start and end)
waypoint_indices = list(range(1, n-1))
best_order = self._solve_tsp(distance_matrix, 0, n-1,
waypoint_indices)
# Build complete route
complete_route = []
total_time = 0
total_distance = 0
total_cost = 0
current_idx = 0
for next_idx in best_order + [n-1]:
segment_route = self._a_star_search(all_points[current_idx],
all_points[next_idx], preferences)
if segment_route:
complete_route.extend(segment_route[:-1] if complete_route
else segment_route)
total_time += self._calculate_travel_time(segment_route)
total_distance += self._calculate_total_distance(segment_route)
total_cost += self._calculate_total_cost(segment_route,
preferences)
current_idx = next_idx
confidence = self._calculate_confidence(complete_route)
return RouteResult(
route=complete_route,
total_distance=total_distance,
estimated_time=total_time,
confidence=confidence,
alternative_routes=[],
dynamic_updates=True,
total_cost=total_cost
)
def _solve_tsp(self, distance_matrix: List[List[float]], start: int, end:
int,
waypoints: List[int]) -> List[int]:
"""Solve TSP for waypoint ordering using dynamic programming"""
if not waypoints:
return []
n = len(waypoints)
if n <= 3:
# Brute force for small instances
import itertools
best_order = None
best_cost = float('inf')
for perm in itertools.permutations(waypoints):
cost = distance_matrix[start][perm[0]]
for i in range(len(perm) - 1):
cost += distance_matrix[perm[i]][perm[i+1]]
cost += distance_matrix[perm[-1]][end]
if cost < best_cost:
best_cost = cost
best_order = list(perm)
return best_order or waypoints
# For larger instances, use nearest neighbor heuristic
unvisited = set(waypoints)
current = start
order = []
while unvisited:
nearest = min(unvisited, key=lambda x: distance_matrix[current]
[x])
order.append(nearest)
unvisited.remove(nearest)
current = nearest
return order
def _a_star_search(self, start: Location, end: Location,
preferences: RoutePreferences) -> List[Location]:
"""A* pathfinding algorithm with traffic and preferences"""
start_key = f"{start.latitude:.6f},{start.longitude:.6f}"
end_key = f"{end.latitude:.6f},{end.longitude:.6f}"
if start_key not in self.road_network or end_key not in self.road_network:
return []
# Priority queue: (f_score, g_score, location_key, path)
open_set = [(0, 0, start_key, [start])]
closed_set = set()
g_scores = {start_key: 0}
while open_set:
f_score, g_score, current_key, path = heapq.heappop(open_set)
if current_key in closed_set:
continue
closed_set.add(current_key)
if current_key == end_key:
return path
# Explore neighbors
for segment in self.road_network.get(current_key, []):
neighbor_key = f"{segment.end.latitude:.6f},{segment.end.longitude:.6f}"
if neighbor_key in closed_set:
continue
# Calculate cost considering preferences and traffic
segment_cost = self._calculate_segment_cost(segment,
preferences)
tentative_g = g_score + segment_cost
if neighbor_key not in g_scores or tentative_g < g_scores[neighbor_key]:
g_scores[neighbor_key] = tentative_g
h_score = self._heuristic(segment.end, end)
f_score = tentative_g + h_score
new_path = path + [segment.end]
heapq.heappush(open_set, (f_score, tentative_g,
neighbor_key, new_path))
return [] # No path found
def _calculate_segment_cost(self, segment: RoadSegment,
preferences: RoutePreferences) -> float:
"""Calculate cost for a road segment considering preferences and
traffic"""
base_cost = segment.travel_time()
# Apply traffic incidents
for incident in self.traffic_incidents:
if incident.location.distance_to(segment.start) <= incident.radius:
base_cost /= incident.speed_impact
# Apply weather conditions
weather_factor = self.weather_speed_factors.get(self.weather_condition, 1.0)
base_cost /= weather_factor
# Apply preferences
if preferences.avoid_tolls and segment.toll_cost > 0:
base_cost *= 10 # Heavy penalty for tolls
if preferences.avoid_highways and segment.road_type == RoadType.HIGHWAY:
base_cost *= 2
if preferences.shortest_distance:
# Prioritize distance over time
distance_factor = segment.length / 1000 # km
base_cost = distance_factor * 100 # Convert to time-like units
if preferences.eco_friendly:
# Penalize high-speed roads for fuel efficiency
if segment.speed_limit > 80:
base_cost *= 1.2
return base_cost
def _heuristic(self, current: Location, goal: Location) -> float:
"""Heuristic function for A* (straight-line distance converted to
time)"""
distance = current.distance_to(goal)
# Assume average speed of 50 km/h for heuristic
return (distance / 1000) / 50 * 3600
def _calculate_total_distance(self, route: List[Location]) -> float:
"""Calculate total distance of route in km"""
total = 0
for i in range(len(route) - 1):
total += route[i].distance_to(route[i + 1])
return total / 1000 # Convert to km
def _calculate_travel_time(self, route: List[Location]) -> float:
"""Calculate total travel time in seconds"""
total_time = 0
for i in range(len(route) - 1):
start_key = f"{route[i].latitude:.6f},{route[i].longitude:.6f}"
# Find the road segment
best_segment = None
min_distance = float('inf')
for segment in self.road_network.get(start_key, []):
distance = segment.end.distance_to(route[i + 1])
if distance < min_distance:
min_distance = distance
best_segment = segment
if best_segment:
segment_time = best_segment.travel_time()
# Apply traffic and weather effects
for incident in self.traffic_incidents:
if incident.location.distance_to(best_segment.start) <= incident.radius:
segment_time /= incident.speed_impact
weather_factor = self.weather_speed_factors.get(self.weather_condition, 1.0)
segment_time /= weather_factor
total_time += segment_time
else:
# Fallback: estimate based on distance and average speed
distance = route[i].distance_to(route[i + 1])
total_time += (distance / 1000) / 40 * 3600 # 40 km/h average
return total_time
def _calculate_total_cost(self, route: List[Location],
preferences: RoutePreferences) -> float:
"""Calculate total monetary cost (tolls, fuel, etc.)"""
total_cost = 0
for i in range(len(route) - 1):
start_key = f"{route[i].latitude:.6f},{route[i].longitude:.6f}"
for segment in self.road_network.get(start_key, []):
if segment.end.distance_to(route[i + 1]) < 50: # Close match
total_cost += segment.toll_cost
# Add fuel cost estimate
distance_km = segment.length / 1000
fuel_cost_per_km = 0.15 # $0.15 per km
total_cost += distance_km * fuel_cost_per_km
break
return total_cost
def _calculate_confidence(self, route: List[Location]) -> float:
"""Calculate confidence in route timing"""
base_confidence = 0.9
# Reduce confidence for longer routes
route_length = len(route)
if route_length > 20:
base_confidence *= 0.8
# Reduce confidence if many incidents on route
incident_count = 0
for i in range(len(route) - 1):
for incident in self.traffic_incidents:
if (incident.location.distance_to(route[i]) < incident.radius
or
incident.location.distance_to(route[i + 1]) <
incident.radius):
incident_count += 1
if incident_count > 0:
base_confidence *= max(0.5, 1.0 - incident_count * 0.1)
# Reduce confidence for bad weather
if self.weather_condition in [WeatherCondition.SNOW,
WeatherCondition.FOG]:
base_confidence *= 0.8
return round(base_confidence, 2)
def _find_alternative_routes(self, start: Location, end: Location,
preferences: RoutePreferences,
main_route: List[Location]) -> List[RouteResult]:
"""Find alternative routes by modifying preferences"""
alternatives = []
# Alternative 1: Avoid main route roads
alt_preferences_1 = RoutePreferences(
avoid_tolls=not preferences.avoid_tolls,
fastest_route=preferences.fastest_route,
avoid_highways=not preferences.avoid_highways
)
alt_route_1 = self._a_star_search(start, end, alt_preferences_1)
if alt_route_1 and self._route_similarity(main_route, alt_route_1) < 0.7:
alternatives.append(RouteResult(
route=alt_route_1,
total_distance=self._calculate_total_distance(alt_route_1),
estimated_time=self._calculate_travel_time(alt_route_1),
confidence=self._calculate_confidence(alt_route_1),
alternative_routes=[],
dynamic_updates=False,
total_cost=self._calculate_total_cost(alt_route_1,
alt_preferences_1)
))
# Alternative 2: Shortest distance route
alt_preferences_2 = RoutePreferences(
shortest_distance=True,
fastest_route=False
)
alt_route_2 = self._a_star_search(start, end, alt_preferences_2)
if alt_route_2 and self._route_similarity(main_route, alt_route_2) < 0.7:
alternatives.append(RouteResult(
route=alt_route_2,
total_distance=self._calculate_total_distance(alt_route_2),
estimated_time=self._calculate_travel_time(alt_route_2),
confidence=self._calculate_confidence(alt_route_2),
alternative_routes=[],
dynamic_updates=False,
total_cost=self._calculate_total_cost(alt_route_2,
alt_preferences_2)
))
return alternatives[:self.max_alternative_routes]
def _route_similarity(self, route1: List[Location], route2:
List[Location]) -> float:
"""Calculate similarity between two routes (0 = completely different,
1 = identical)"""
if not route1 or not route2:
return 0.0
# Simple similarity based on shared waypoints
set1 = set((loc.latitude, loc.longitude) for loc in route1)
set2 = set((loc.latitude, loc.longitude) for loc in route2)
intersection = len(set1.intersection(set2))
union = len(set1.union(set2))
return intersection / union if union > 0 else 0.0
def update_traffic_conditions(self, incidents: List[TrafficIncident]):
"""Update traffic incidents"""
self.traffic_incidents = incidents
def update_weather(self, weather: WeatherCondition):
"""Update weather conditions"""
self.weather_condition = weather
def should_reroute(self, current_route: List[Location],
current_position: Location) -> bool:
"""Determine if rerouting is needed based on current conditions"""
if not current_route:
return False
# Find current position in route
current_index = 0
min_distance = float('inf')
for i, loc in enumerate(current_route):
distance = current_position.distance_to(loc)
if distance < min_distance:
min_distance = distance
current_index = i
# Calculate remaining route time
remaining_route = current_route[current_index:]
original_time = self._calculate_travel_time(remaining_route)
# Calculate new optimal route time
if current_index < len(current_route) - 1:
new_route = self._a_star_search(current_position,
current_route[-1], RoutePreferences())
new_time = self._calculate_travel_time(new_route)
# Reroute if new route is significantly faster
time_improvement = (original_time - new_time) / original_time
return time_improvement > self.reroute_threshold
return False
def get_eta_with_confidence(self, route: List[Location],
current_position: Location) -> Tuple[float,
float]:
"""Get ETA with confidence interval"""
base_eta = self._calculate_travel_time(route)
confidence = self._calculate_confidence(route)
# Calculate confidence interval (±20% for low confidence, ±5% for high confidence)
uncertainty = (1 - confidence) * 0.2 + 0.05
return base_eta, uncertainty * base_eta
# Test the route optimization engine
def test_route_optimization():
engine = RouteOptimizationEngine()
# Add traffic incidents
incidents = [
TrafficIncident(
location=Location(37.7759, -122.4174),
severity=IncidentSeverity.HIGH,
radius=500,
speed_impact=0.3
),
TrafficIncident(
location=Location(37.7769, -122.4164),
severity=IncidentSeverity.MEDIUM,
radius=300,
speed_impact=0.6
)
]
engine.update_traffic_conditions(incidents)
engine.update_weather(WeatherCondition.RAIN)
print("Testing Route Optimization Engine:")
print(f"Traffic incidents: {len(incidents)}")
print(f"Weather: {engine.weather_condition.value}")
# Test basic routing
start = Location(37.7700, -122.4200)
end = Location(37.7780, -122.4120)
preferences = RoutePreferences(
avoid_tolls=True,
fastest_route=True
)
print(f"\nRoute Request:")
print(f" Start: {start.latitude:.4f}, {start.longitude:.4f}")
print(f" End: {end.latitude:.4f}, {end.longitude:.4f}")
print(f" Preferences: avoid_tolls={preferences.avoid_tolls}, fastest_route={preferences.fastest_route}")
result = engine.find_optimal_route(start, end, preferences=preferences)
print(f"\nRoute Result:")
print(f" Route points: {len(result.route)}")
print(f" Total distance: {result.total_distance:.2f} km")
print(f" Estimated time: {result.estimated_time:.0f}s ({result.estimated_time/60:.1f} min)")
print(f" Confidence: {result.confidence}")
print(f" Total cost: ${result.total_cost:.2f}")
print(f" Alternative routes: {len(result.alternative_routes)}")
# Test multi-waypoint routing
waypoints = [Location(37.7720, -122.4180), Location(37.7760, -122.4140)]
print(f"\nMulti-waypoint Route:")
print(f" Waypoints: {len(waypoints)}")
multi_result = engine.find_optimal_route(start, end, waypoints=waypoints,
preferences=preferences)
print(f" Route points: {len(multi_result.route)}")
print(f" Total distance: {multi_result.total_distance:.2f} km")
print(f" Estimated time: {multi_result.estimated_time:.0f}s ({multi_result.estimated_time/60:.1f} min)")
print(f" Confidence: {multi_result.confidence}")
# Test rerouting decision
current_position = Location(37.7730, -122.4170)
should_reroute = engine.should_reroute(result.route, current_position)
print(f"\nRerouting Analysis:")
print(f" Current position: {current_position.latitude:.4f}, {current_position.longitude:.4f}")
print(f" Should reroute: {should_reroute}")
# Test ETA with confidence
eta, uncertainty = engine.get_eta_with_confidence(result.route,
current_position)
print(f"\nETA Analysis:")
print(f" Base ETA: {eta:.0f}s ({eta/60:.1f} min)")
print(f" Uncertainty: ±{uncertainty:.0f}s (±{uncertainty/60:.1f} min)")
print(f" ETA range: {(eta-uncertainty)/60:.1f} - {(eta+uncertainty)/60:.1f} min")
test_route_optimization()
Key Insights:
• A* pathfinding with real-time traffic and weather considerations
• Multi-waypoint optimization using TSP-like algorithms
• Dynamic rerouting based on changing conditions
• Confidence intervals for ETA predictions
• Alternative route generation with similarity filtering
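One subtlety worth calling out in an interview: _heuristic above divides by an assumed 50 km/h average, so on segments whose traffic-adjusted speed exceeds 50 km/h it can overestimate the remaining time, which breaks admissibility and lets A* return a slightly suboptimal route. A hedged fix is to divide by the fastest speed that exists anywhere in the network, as in the sketch below (the 120 km/h default is an assumption mirroring the TOLL speed limit in the sample network).
Python
def admissible_heuristic(current, goal, max_network_speed_kmh: float = 120.0) -> float:
    """Lower bound on remaining travel time in seconds (straight line at top network speed)."""
    distance_m = current.distance_to(goal)  # Location.distance_to above returns meters
    return (distance_m / 1000) / max_network_speed_kmh * 3600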
Problem 34: Driver Behavior Analysis
Difficulty: Medium | Time Limit: 45 minutes | Company: Uber
Problem Statement:
Design a driver behavior analysis system that:
1. Monitors driving patterns (speed, acceleration, braking, turns)
2. Detects unsafe driving behaviors and provides real-time feedback
3. Calculates driver safety scores and risk assessments
4. Identifies patterns that correlate with accidents or complaints
5. Provides personalized coaching recommendations
Example:
Plain Text
Input:
driving_data = {
"driver_id": "d123", "trip_id": "t456", "duration": 1800,
"speed_data": [45, 50, 55, 40, 0, 25, 60, 55],
"acceleration_data": [0.5, -2.1, 1.8, -3.2, 0.8],
"gps_data": [{"lat": 37.7749, "lon": -122.4194, "timestamp": 1640995200},
...],
"events": [{"type": "hard_brake", "timestamp": 1640995300, "severity":
0.8}]
}
Output: {
"safety_score": 85, "risk_level": "medium", "violations": ["speeding",
"hard_braking"],
"coaching_tips": ["Maintain steady speed", "Anticipate traffic better"],
"improvement_areas": ["acceleration_control", "speed_management"]
}
Solution Approach:
This problem requires signal processing, machine learning, and behavioral analysis.
Python
import time
import math
import statistics
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, deque
class EventType(Enum):
HARD_BRAKE = "hard_brake"
RAPID_ACCELERATION = "rapid_acceleration"
SHARP_TURN = "sharp_turn"
SPEEDING = "speeding"
PHONE_USE = "phone_use"
SEATBELT_OFF = "seatbelt_off"
class RiskLevel(Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
@dataclass
class GPSPoint:
latitude: float
longitude: float
timestamp: float
speed: float = 0.0
heading: float = 0.0
@dataclass
class DrivingEvent:
event_type: EventType
timestamp: float
severity: float # 0.0 to 1.0
location: Optional[GPSPoint] = None
details: Dict = field(default_factory=dict)
@dataclass
class TripData:
driver_id: str
trip_id: str
start_time: float
end_time: float
duration: float # seconds
distance: float # km
gps_points: List[GPSPoint]
events: List[DrivingEvent]
speed_limit_data: List[float] = field(default_factory=list)
@dataclass
class BehaviorAnalysisResult:
safety_score: int # 0-100
risk_level: RiskLevel
violations: List[str]
coaching_tips: List[str]
improvement_areas: List[str]
detailed_metrics: Dict[str, float]
trend_analysis: Dict[str, str]
class DriverBehaviorAnalyzer:
def __init__(self):
self.driver_history: Dict[str, List[TripData]] = defaultdict(list)
self.safety_thresholds = {
'max_acceleration': 3.0, # m/s²
'max_deceleration': -4.0, # m/s²
'max_speed_over_limit': 10, # km/h
'max_turn_rate': 15, # degrees/second
'min_following_distance': 2.0 # seconds
}
# Scoring weights
self.score_weights = {
'speed_compliance': 0.25,
'smooth_driving': 0.25,
'safe_following': 0.20,
'turn_safety': 0.15,
'event_frequency': 0.15
}
# Coaching templates
self.coaching_templates = {
'speeding': [
"Maintain speed within posted limits",
"Use cruise control on highways",
"Check speedometer regularly"
],
'hard_braking': [
"Increase following distance",
"Anticipate traffic changes",
"Brake gradually and smoothly"
],
'rapid_acceleration': [
"Accelerate smoothly and gradually",
"Plan lane changes in advance",
"Avoid aggressive driving"
],
'sharp_turns': [
"Reduce speed before turns",
"Use proper turning technique",
"Check blind spots carefully"
]
}
def analyze_trip(self, trip_data: TripData) -> BehaviorAnalysisResult:
"""Analyze a single trip for driving behavior"""
# Calculate detailed metrics
metrics = self._calculate_detailed_metrics(trip_data)
# Detect violations
violations = self._detect_violations(trip_data, metrics)
# Calculate safety score
safety_score = self._calculate_safety_score(metrics, violations)
# Determine risk level
risk_level = self._determine_risk_level(safety_score, violations)
# Generate coaching tips
coaching_tips = self._generate_coaching_tips(violations, metrics)
# Identify improvement areas
improvement_areas = self._identify_improvement_areas(metrics)
# Analyze trends (if historical data available)
trend_analysis = self._analyze_trends(trip_data.driver_id, metrics)
# Store trip data for future analysis
self.driver_history[trip_data.driver_id].append(trip_data)
return BehaviorAnalysisResult(
safety_score=safety_score,
risk_level=risk_level,
violations=violations,
coaching_tips=coaching_tips,
improvement_areas=improvement_areas,
detailed_metrics=metrics,
trend_analysis=trend_analysis
)
def _calculate_detailed_metrics(self, trip_data: TripData) -> Dict[str,
float]:
"""Calculate detailed driving metrics"""
metrics = {}
if not trip_data.gps_points:
return metrics
# Speed analysis
speeds = [point.speed for point in trip_data.gps_points if
point.speed > 0]
if speeds:
metrics['avg_speed'] = statistics.mean(speeds)
metrics['max_speed'] = max(speeds)
metrics['speed_variance'] = statistics.variance(speeds) if len(speeds) > 1 else 0
# Acceleration analysis
accelerations = self._calculate_accelerations(trip_data.gps_points)
if accelerations:
metrics['avg_acceleration'] = statistics.mean([abs(a) for a in
accelerations])
metrics['max_acceleration'] = max(accelerations)
metrics['min_acceleration'] = min(accelerations)
metrics['acceleration_variance'] = statistics.variance(accelerations) if len(accelerations) > 1 else 0
# Turn analysis
turn_rates = self._calculate_turn_rates(trip_data.gps_points)
if turn_rates:
metrics['avg_turn_rate'] = statistics.mean([abs(t) for t in
turn_rates])
metrics['max_turn_rate'] = max([abs(t) for t in turn_rates])
# Speed limit compliance
if trip_data.speed_limit_data:
speed_violations = []
for i, point in enumerate(trip_data.gps_points):
if i < len(trip_data.speed_limit_data):
speed_limit = trip_data.speed_limit_data[i]
if point.speed > speed_limit:
speed_violations.append(point.speed - speed_limit)
if speed_violations:
metrics['avg_speed_violation'] = statistics.mean(speed_violations)
metrics['max_speed_violation'] = max(speed_violations)
metrics['speed_violation_frequency'] = len(speed_violations) / len(trip_data.gps_points)
else:
metrics['avg_speed_violation'] = 0
metrics['max_speed_violation'] = 0
metrics['speed_violation_frequency'] = 0
# Event analysis
event_counts = defaultdict(int)
for event in trip_data.events:
event_counts[event.event_type.value] += 1
metrics['total_events'] = len(trip_data.events)
metrics['events_per_km'] = len(trip_data.events) / max(trip_data.distance, 0.1)
for event_type, count in event_counts.items():
metrics[f'{event_type}_count'] = count
metrics[f'{event_type}_per_km'] = count / max(trip_data.distance,
0.1)
# Smoothness metrics
if accelerations:
# Calculate jerk (rate of change of acceleration)
jerks = []
for i in range(1, len(accelerations)):
time_diff = 1.0 # Assume 1 second intervals
jerk = (accelerations[i] - accelerations[i-1]) / time_diff
jerks.append(abs(jerk))
if jerks:
metrics['avg_jerk'] = statistics.mean(jerks)
metrics['max_jerk'] = max(jerks)
# Efficiency metrics
if trip_data.duration > 0:
metrics['avg_speed_efficiency'] = trip_data.distance / (trip_data.duration / 3600) # km/h
return metrics
def _calculate_accelerations(self, gps_points: List[GPSPoint]) -> List[float]:
"""Calculate acceleration values from GPS data"""
accelerations = []
for i in range(1, len(gps_points)):
prev_point = gps_points[i-1]
curr_point = gps_points[i]
time_diff = curr_point.timestamp - prev_point.timestamp
if time_diff > 0:
speed_diff = (curr_point.speed - prev_point.speed) * 1000 / 3600 # Convert km/h to m/s
acceleration = speed_diff / time_diff
accelerations.append(acceleration)
return accelerations
def _calculate_turn_rates(self, gps_points: List[GPSPoint]) -> List[float]:
"""Calculate turn rates from GPS heading data"""
turn_rates = []
for i in range(1, len(gps_points)):
prev_point = gps_points[i-1]
curr_point = gps_points[i]
time_diff = curr_point.timestamp - prev_point.timestamp
if time_diff > 0:
heading_diff = curr_point.heading - prev_point.heading
# Handle heading wraparound (0-360 degrees)
if heading_diff > 180:
heading_diff -= 360
elif heading_diff < -180:
heading_diff += 360
turn_rate = heading_diff / time_diff
turn_rates.append(turn_rate)
return turn_rates
def _detect_violations(self, trip_data: TripData, metrics: Dict[str,
float]) -> List[str]:
"""Detect driving violations based on metrics and thresholds"""
violations = []
# Speed violations
if metrics.get('max_speed_violation', 0) > self.safety_thresholds['max_speed_over_limit']:
violations.append('speeding')
# Acceleration violations
if metrics.get('max_acceleration', 0) > self.safety_thresholds['max_acceleration']:
violations.append('rapid_acceleration')
if metrics.get('min_acceleration', 0) < self.safety_thresholds['max_deceleration']:
violations.append('hard_braking')
# Turn violations
if metrics.get('max_turn_rate', 0) > self.safety_thresholds['max_turn_rate']:
violations.append('sharp_turns')
# Event-based violations
for event in trip_data.events:
if event.event_type == EventType.HARD_BRAKE and event.severity > 0.7:
if 'hard_braking' not in violations:
violations.append('hard_braking')
elif event.event_type == EventType.RAPID_ACCELERATION and event.severity > 0.7:
if 'rapid_acceleration' not in violations:
violations.append('rapid_acceleration')
elif event.event_type == EventType.SPEEDING:
if 'speeding' not in violations:
violations.append('speeding')
elif event.event_type == EventType.PHONE_USE:
violations.append('distracted_driving')
# High event frequency
if metrics.get('events_per_km', 0) > 2.0:
violations.append('frequent_violations')
return violations
def _calculate_safety_score(self, metrics: Dict[str, float], violations:
List[str]) -> int:
"""Calculate overall safety score (0-100)"""
base_score = 100
# Speed compliance score
speed_score = 100
if metrics.get('speed_violation_frequency', 0) > 0:
speed_score = max(0, 100 - metrics['speed_violation_frequency'] *
200)
# Smooth driving score
smooth_score = 100
if metrics.get('acceleration_variance', 0) > 2.0:
smooth_score = max(0, 100 - (metrics['acceleration_variance'] -
2.0) * 20)
# Turn safety score
turn_score = 100
if metrics.get('max_turn_rate', 0) > 10:
turn_score = max(0, 100 - (metrics['max_turn_rate'] - 10) * 5)
# Event frequency score
event_score = 100
if metrics.get('events_per_km', 0) > 0:
event_score = max(0, 100 - metrics['events_per_km'] * 30)
# Following distance score (simplified)
following_score = 90 # Default good score
# Weighted combination
weighted_score = (
speed_score * self.score_weights['speed_compliance'] +
smooth_score * self.score_weights['smooth_driving'] +
following_score * self.score_weights['safe_following'] +
turn_score * self.score_weights['turn_safety'] +
event_score * self.score_weights['event_frequency']
)
# Apply violation penalties
violation_penalty = len(violations) * 5
final_score = max(0, int(weighted_score - violation_penalty))
return min(100, final_score)
def _determine_risk_level(self, safety_score: int, violations: List[str]) -> RiskLevel:
"""Determine risk level based on safety score and violations"""
critical_violations = ['distracted_driving', 'frequent_violations']
if any(v in violations for v in critical_violations) or safety_score < 50:
return RiskLevel.CRITICAL
elif safety_score < 70 or len(violations) >= 3:
return RiskLevel.HIGH
elif safety_score < 85 or len(violations) >= 1:
return RiskLevel.MEDIUM
else:
return RiskLevel.LOW
def _generate_coaching_tips(self, violations: List[str], metrics: Dict[str, float]) -> List[str]:
"""Generate personalized coaching tips"""
tips = []
for violation in violations:
if violation in self.coaching_templates:
# Select most relevant tip based on severity
violation_tips = self.coaching_templates[violation]
tips.extend(violation_tips[:2]) # Take top 2 tips
# Add general tips based on metrics
if metrics.get('acceleration_variance', 0) > 3.0:
tips.append("Practice smooth acceleration and deceleration")
if metrics.get('speed_variance', 0) > 100:
tips.append("Maintain more consistent speed")
if metrics.get('avg_jerk', 0) > 2.0:
tips.append("Focus on smoother driving transitions")
# Remove duplicates and limit to top 5
unique_tips = list(dict.fromkeys(tips))
return unique_tips[:5]
def _identify_improvement_areas(self, metrics: Dict[str, float]) -> List[str]:
"""Identify key areas for improvement"""
areas = []
if metrics.get('speed_violation_frequency', 0) > 0.1:
areas.append('speed_management')
if metrics.get('acceleration_variance', 0) > 2.0:
areas.append('acceleration_control')
if metrics.get('max_turn_rate', 0) > 12:
areas.append('turning_technique')
if metrics.get('events_per_km', 0) > 1.0:
areas.append('hazard_awareness')
if metrics.get('avg_jerk', 0) > 1.5:
areas.append('driving_smoothness')
return areas
def _analyze_trends(self, driver_id: str, current_metrics: Dict[str, float]) -> Dict[str, str]:
"""Analyze trends in driver behavior"""
trends = {}
if driver_id not in self.driver_history or len(self.driver_history[driver_id]) < 2:
return {"trend_analysis": "insufficient_data"}
# Get recent trips for comparison
recent_trips = self.driver_history[driver_id][-5:] # Last 5 trips
# Calculate historical averages
historical_metrics = defaultdict(list)
for trip in recent_trips:
trip_metrics = self._calculate_detailed_metrics(trip)
for key, value in trip_metrics.items():
historical_metrics[key].append(value)
# Compare current metrics with historical averages
for metric, values in historical_metrics.items():
if len(values) >= 2 and metric in current_metrics:
historical_avg = statistics.mean(values[:-1])  # Exclude current trip
current_value = current_metrics[metric]
if historical_avg > 0:
change_percent = (current_value - historical_avg) / historical_avg * 100
if abs(change_percent) > 20:  # Significant change
if change_percent > 0:
trends[metric] = f"increasing_{change_percent:.1f}%"
else:
trends[metric] = f"decreasing_{abs(change_percent):.1f}%"
else:
trends[metric] = "stable"
return trends
def get_driver_profile(self, driver_id: str) -> Dict:
"""Get comprehensive driver profile and statistics"""
if driver_id not in self.driver_history:
return {"error": "No data available for driver"}
trips = self.driver_history[driver_id]
# Calculate aggregate statistics
all_metrics = []
all_violations = []
all_scores = []
for trip in trips:
metrics = self._calculate_detailed_metrics(trip)
violations = self._detect_violations(trip, metrics)
score = self._calculate_safety_score(metrics, violations)
all_metrics.append(metrics)
all_violations.extend(violations)
all_scores.append(score)
# Aggregate analysis
profile = {
"driver_id": driver_id,
"total_trips": len(trips),
"avg_safety_score": statistics.mean(all_scores) if all_scores
else 0,
"score_trend": "improving" if len(all_scores) >= 2 and
all_scores[-1] > all_scores[0] else "stable",
"most_common_violations":
self._get_most_common_violations(all_violations),
"total_distance": sum(trip.distance for trip in trips),
"total_time": sum(trip.duration for trip in trips),
"risk_level":
self._determine_risk_level(statistics.mean(all_scores) if all_scores else 0,
list(set(all_violations)))
}
return profile
def _get_most_common_violations(self, violations: List[str]) -> List[Tuple[str, int]]:
"""Get most common violations with counts"""
violation_counts = defaultdict(int)
for violation in violations:
violation_counts[violation] += 1
return sorted(violation_counts.items(), key=lambda x: x[1], reverse=True)[:5]
# Test the driver behavior analyzer
def test_driver_behavior_analyzer():
analyzer = DriverBehaviorAnalyzer()
# Create sample trip data
gps_points = []
base_time = time.time()
for i in range(60): # 1 minute of data, 1 point per second
lat = 37.7700 + i * 0.0001
lon = -122.4200 + i * 0.0001
speed = 50 + 10 * math.sin(i * 0.1) # Varying speed
heading = i * 2 # Gradual turn
gps_points.append(GPSPoint(
latitude=lat,
longitude=lon,
timestamp=base_time + i,
speed=max(0, speed),
heading=heading % 360
))
# Create sample events
events = [
DrivingEvent(
event_type=EventType.HARD_BRAKE,
timestamp=base_time + 20,
severity=0.8,
location=gps_points[20]
),
DrivingEvent(
event_type=EventType.SPEEDING,
timestamp=base_time + 35,
severity=0.6,
location=gps_points[35]
)
]
# Create trip data
trip_data = TripData(
driver_id="d123",
trip_id="t456",
start_time=base_time,
end_time=base_time + 60,
duration=60,
distance=2.5, # km
gps_points=gps_points,
events=events,
speed_limit_data=[50] * 60 # 50 km/h speed limit
)
print("Testing Driver Behavior Analyzer:")
print(f"Trip duration: {trip_data.duration}s")
print(f"Trip distance: {trip_data.distance} km")
print(f"GPS points: {len(trip_data.gps_points)}")
print(f"Events: {len(trip_data.events)}")
# Analyze trip
result = analyzer.analyze_trip(trip_data)
print(f"\nBehavior Analysis Result:")
print(f" Safety score: {result.safety_score}/100")
print(f" Risk level: {result.risk_level.value}")
print(f" Violations: {result.violations}")
print(f" Coaching tips: {result.coaching_tips}")
print(f" Improvement areas: {result.improvement_areas}")
print(f"\nDetailed Metrics:")
for metric, value in result.detailed_metrics.items():
if isinstance(value, float):
print(f" {metric}: {value:.2f}")
else:
print(f" {metric}: {value}")
print(f"\nTrend Analysis:")
for trend, value in result.trend_analysis.items():
print(f" {trend}: {value}")
# Test multiple trips for trend analysis
print(f"\nTesting multiple trips...")
for i in range(3):
# Create slightly different trip data
new_gps_points = []
for j in range(60):
lat = 37.7700 + j * 0.0001
lon = -122.4200 + j * 0.0001
speed = 45 + 15 * math.sin(j * 0.1) + i * 2  # Gradually increasing speed
heading = j * 2
new_gps_points.append(GPSPoint(
latitude=lat,
longitude=lon,
timestamp=base_time + 3600 * (i + 1) + j,
speed=max(0, speed),
heading=heading % 360
))
new_trip = TripData(
driver_id="d123",
trip_id=f"t{456 + i + 1}",
start_time=base_time + 3600 * (i + 1),
end_time=base_time + 3600 * (i + 1) + 60,
duration=60,
distance=2.5,
gps_points=new_gps_points,
events=events[:1], # Fewer events over time
speed_limit_data=[50] * 60
)
analyzer.analyze_trip(new_trip)
# Get driver profile
profile = analyzer.get_driver_profile("d123")
print(f"\nDriver Profile:")
for key, value in profile.items():
if isinstance(value, float):
print(f" {key}: {value:.2f}")
else:
print(f" {key}: {value}")
test_driver_behavior_analyzer()
Key Insights:
• Real-time driving behavior monitoring using GPS and sensor data
• Multi-dimensional safety scoring with weighted factors
• Pattern recognition for violation detection and trend analysis
• Personalized coaching recommendations based on specific behaviors
• Comprehensive driver profiling for long-term improvement tracking
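To make the scoring idea above concrete, here is a minimal, self-contained sketch of how weighted sub-scores and a per-violation penalty can be combined into a single 0-100 safety score. The sub-score names, weights, and the 5-point penalty are illustrative assumptions, not values from a specific OA.
Python
# Minimal sketch: combine per-dimension sub-scores (0-100) with weights,
# then subtract a flat penalty per detected violation (assumed values).
def combine_safety_score(sub_scores: dict, weights: dict, violations: list) -> int:
    weighted = sum(sub_scores[k] * weights[k] for k in weights)
    penalty = len(violations) * 5  # assumed: 5 points per violation
    return max(0, min(100, int(weighted - penalty)))

weights = {'speed': 0.3, 'smoothness': 0.3, 'turns': 0.2, 'events': 0.2}
sub_scores = {'speed': 80, 'smoothness': 90, 'turns': 95, 'events': 70}
print(combine_safety_score(sub_scores, weights, ['hard_braking']))  # 79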
2.8 LinkedIn OA Problems {#linkedin-oa}
Problem 35: Professional Network Recommendation
Difficulty: Medium | Time Limit: 45 minutes | Company: LinkedIn
Problem Statement:
Design a professional network recommendation system that:
1. Suggests relevant connections based on mutual connections, industry, and skills
2. Ranks recommendations by relevance and potential value
3. Avoids suggesting inappropriate connections (competitors, blocked users)
4. Considers user preferences and interaction history
5. Provides explanations for why connections are recommended
Example:
Plain Text
Input:
user_profile = {
"user_id": "u123", "industry": "software", "skills": ["python", "ml",
"data"],
"connections": ["u456", "u789"], "company": "TechCorp", "location": "SF"
}
candidate_profiles = [
{"user_id": "u999", "industry": "software", "skills": ["python", "ai"],
"connections": ["u456"], "company": "DataInc", "location": "SF"}
]
Output: {
"recommendations": [
{"user_id": "u999", "score": 0.85, "reasons": ["mutual_connection",
"shared_skills"],
"explanation": "You both know John Doe and work in similar fields"}
]
}
Solution Approach:
This problem requires graph algorithms, similarity metrics, and recommendation systems.
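As a warm-up before the full implementation below, the following sketch isolates the two core signals: mutual-connection count and Jaccard similarity over skill sets. The helper name and the 0.6/0.4 weights are illustrative assumptions, not part of the reference solution.
Python
# Hypothetical helper: score one candidate from mutual connections and skill overlap.
def quick_candidate_score(user: dict, candidate: dict) -> float:
    mutual = len(user["connections"] & candidate["connections"])
    skills_u, skills_c = set(user["skills"]), set(candidate["skills"])
    union = skills_u | skills_c
    jaccard = len(skills_u & skills_c) / len(union) if union else 0.0
    # Assumed weights: 0.6 for mutual connections (capped at 5), 0.4 for skill similarity
    return 0.6 * min(1.0, mutual / 5) + 0.4 * jaccard

alice = {"connections": {"u456", "u789"}, "skills": ["python", "ml", "data"]}
candidate = {"connections": {"u456"}, "skills": ["python", "ai"]}
print(round(quick_candidate_score(alice, candidate), 2))  # 0.22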
Python
import math
import random
from typing import List, Dict, Set, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum
from collections import defaultdict, Counter
class Industry(Enum):
SOFTWARE = "software"
FINANCE = "finance"
HEALTHCARE = "healthcare"
EDUCATION = "education"
MARKETING = "marketing"
CONSULTING = "consulting"
class ConnectionReason(Enum):
MUTUAL_CONNECTION = "mutual_connection"
SHARED_SKILLS = "shared_skills"
SAME_COMPANY = "same_company"
SAME_INDUSTRY = "same_industry"
SAME_LOCATION = "same_location"
SIMILAR_EXPERIENCE = "similar_experience"
COMPLEMENTARY_SKILLS = "complementary_skills"
@dataclass
class UserProfile:
user_id: str
name: str
industry: Industry
skills: List[str]
connections: Set[str]
company: str
location: str
experience_years: int
job_title: str
education: List[str] = field(default_factory=list)
blocked_users: Set[str] = field(default_factory=set)
interaction_history: Dict[str, float] = field(default_factory=dict)  # user_id -> interaction_score
@dataclass
class ConnectionRecommendation:
user_id: str
score: float
reasons: List[ConnectionReason]
explanation: str
mutual_connections: List[str]
shared_skills: List[str]
confidence: float
class ProfessionalNetworkRecommender:
def __init__(self):
self.users: Dict[str, UserProfile] = {}
self.skill_similarity_cache: Dict[Tuple[str, str], float] = {}
# Recommendation weights
self.weights = {
'mutual_connections': 0.3,
'shared_skills': 0.25,
'industry_match': 0.15,
'location_match': 0.1,
'company_relevance': 0.1,
'experience_similarity': 0.05,
'interaction_history': 0.05
}
# Skill categories for better matching
self.skill_categories = {
'programming': ['python', 'java', 'javascript', 'c++', 'go', 'rust'],
'data_science': ['ml', 'ai', 'data_analysis', 'statistics', 'deep_learning'],
'management': ['leadership', 'project_management', 'team_building', 'strategy'],
'design': ['ui_ux', 'graphic_design', 'product_design', 'user_research'],
'marketing': ['digital_marketing', 'seo', 'content_marketing', 'social_media']
}
# Industry relationships (some industries work closely together)
self.industry_relationships = {
Industry.SOFTWARE: [Industry.FINANCE, Industry.HEALTHCARE],
Industry.FINANCE: [Industry.SOFTWARE, Industry.CONSULTING],
Industry.HEALTHCARE: [Industry.SOFTWARE, Industry.EDUCATION],
Industry.EDUCATION: [Industry.SOFTWARE, Industry.HEALTHCARE],
Industry.MARKETING: [Industry.SOFTWARE, Industry.CONSULTING],
Industry.CONSULTING: [Industry.FINANCE, Industry.MARKETING]
}
def add_user(self, user: UserProfile):
"""Add a user to the system"""
self.users[user.user_id] = user
def get_recommendations(self, user_id: str, max_recommendations: int = 10) -> List[ConnectionRecommendation]:
"""Get connection recommendations for a user"""
if user_id not in self.users:
return []
user = self.users[user_id]
candidates = self._find_candidates(user)
scored_candidates = []
for candidate_id in candidates:
if candidate_id == user_id or candidate_id in user.connections:
continue
candidate = self.users[candidate_id]
# Skip blocked users or users who blocked this user
if (candidate_id in user.blocked_users or
user_id in candidate.blocked_users):
continue
# Calculate recommendation score
score, reasons, details = self._calculate_recommendation_score(user, candidate)
if score > 0.1: # Minimum threshold
recommendation = ConnectionRecommendation(
user_id=candidate_id,
score=score,
reasons=reasons,
explanation=self._generate_explanation(user, candidate, reasons, details),
mutual_connections=details.get('mutual_connections', []),
shared_skills=details.get('shared_skills', []),
confidence=self._calculate_confidence(score, reasons)
)
scored_candidates.append(recommendation)
# Sort by score and return top recommendations
scored_candidates.sort(key=lambda x: x.score, reverse=True)
return scored_candidates[:max_recommendations]
def _find_candidates(self, user: UserProfile) -> Set[str]:
"""Find potential connection candidates"""
candidates = set()
# Friends of friends (2nd degree connections)
for connection_id in user.connections:
if connection_id in self.users:
connection = self.users[connection_id]
candidates.update(connection.connections)
# Users in same industry
for user_id, other_user in self.users.items():
if (other_user.industry == user.industry or
other_user.industry in self.industry_relationships.get(user.industry, [])):
candidates.add(user_id)
# Users with similar skills
user_skills_set = set(user.skills)
for user_id, other_user in self.users.items():
other_skills_set = set(other_user.skills)
if len(user_skills_set.intersection(other_skills_set)) >= 2:
candidates.add(user_id)
# Users in same location
for user_id, other_user in self.users.items():
if other_user.location == user.location:
candidates.add(user_id)
return candidates
def _calculate_recommendation_score(self, user: UserProfile, candidate: UserProfile) -> Tuple[float, List[ConnectionReason], Dict]:
"""Calculate recommendation score and reasons"""
score = 0.0
reasons = []
details = {}
# Mutual connections score
mutual_connections = user.connections.intersection(candidate.connections)
if mutual_connections:
mutual_score = min(1.0, len(mutual_connections) / 5)  # Cap at 5 mutual connections
score += mutual_score * self.weights['mutual_connections']
reasons.append(ConnectionReason.MUTUAL_CONNECTION)
details['mutual_connections'] = list(mutual_connections)
# Shared skills score
user_skills = set(user.skills)
candidate_skills = set(candidate.skills)
shared_skills = user_skills.intersection(candidate_skills)
if shared_skills:
skill_score = len(shared_skills) / max(len(user_skills), len(candidate_skills))
score += skill_score * self.weights['shared_skills']
reasons.append(ConnectionReason.SHARED_SKILLS)
details['shared_skills'] = list(shared_skills)
# Complementary skills (skills that work well together)
complementary_score = self._calculate_complementary_skills_score(user.skills, candidate.skills)
if complementary_score > 0.3:
score += complementary_score * 0.1  # Lower weight for complementary skills
reasons.append(ConnectionReason.COMPLEMENTARY_SKILLS)
# Industry match
if user.industry == candidate.industry:
score += 1.0 * self.weights['industry_match']
reasons.append(ConnectionReason.SAME_INDUSTRY)
elif candidate.industry in self.industry_relationships.get(user.industry, []):
score += 0.5 * self.weights['industry_match']
reasons.append(ConnectionReason.SAME_INDUSTRY)
# Location match
if user.location == candidate.location:
score += 1.0 * self.weights['location_match']
reasons.append(ConnectionReason.SAME_LOCATION)
# Company relevance
if user.company == candidate.company:
score += 1.0 * self.weights['company_relevance']
reasons.append(ConnectionReason.SAME_COMPANY)
elif self._are_companies_related(user.company, candidate.company):
score += 0.3 * self.weights['company_relevance']
# Experience similarity
experience_diff = abs(user.experience_years - candidate.experience_years)
if experience_diff <= 2:
experience_score = 1.0 - (experience_diff / 10)
score += experience_score * self.weights['experience_similarity']
reasons.append(ConnectionReason.SIMILAR_EXPERIENCE)
# Interaction history
if candidate.user_id in user.interaction_history:
interaction_score = min(1.0, user.interaction_history[candidate.user_id])
score += interaction_score * self.weights['interaction_history']
return score, reasons, details
def _calculate_complementary_skills_score(self, skills1: List[str], skills2: List[str]) -> float:
"""Calculate how well skills complement each other"""
complementary_pairs = [
('python', 'data_analysis'),
('ui_ux', 'frontend'),
('backend', 'database'),
('ml', 'statistics'),
('project_management', 'leadership'),
('marketing', 'analytics'),
('design', 'user_research')
]
skills1_set = set(skills1)
skills2_set = set(skills2)
complementary_score = 0.0
for skill1, skill2 in complementary_pairs:
if skill1 in skills1_set and skill2 in skills2_set:
complementary_score += 0.2
elif skill2 in skills1_set and skill1 in skills2_set:
complementary_score += 0.2
return min(1.0, complementary_score)
def _are_companies_related(self, company1: str, company2: str) -> bool:
"""Check if companies are related (simplified)"""
# In a real system, this would use a company relationship database
tech_companies = ['google', 'microsoft', 'apple', 'amazon', 'meta']
finance_companies = ['goldman', 'jpmorgan', 'blackrock', 'citadel']
company1_lower = company1.lower()
company2_lower = company2.lower()
# Check if both are in the same industry cluster
if (any(comp in company1_lower for comp in tech_companies) and
any(comp in company2_lower for comp in tech_companies)):
return True
if (any(comp in company1_lower for comp in finance_companies) and
any(comp in company2_lower for comp in finance_companies)):
return True
return False
def _generate_explanation(self, user: UserProfile, candidate: UserProfile, reasons: List[ConnectionReason], details: Dict) -> str:
"""Generate human-readable explanation for recommendation"""
explanations = []
if ConnectionReason.MUTUAL_CONNECTION in reasons:
mutual_count = len(details.get('mutual_connections', []))
if mutual_count == 1:
explanations.append("You have 1 mutual connection")
else:
explanations.append(f"You have {mutual_count} mutual
connections")
if ConnectionReason.SHARED_SKILLS in reasons:
shared_skills = details.get('shared_skills', [])
if len(shared_skills) <= 2:
skills_str = " and ".join(shared_skills)
explanations.append(f"You both have experience in
{skills_str}")
else:
explanations.append(f"You share {len(shared_skills)}
professional skills")
if ConnectionReason.SAME_COMPANY in reasons:
explanations.append(f"You both work at {candidate.company}")
if ConnectionReason.SAME_INDUSTRY in reasons:
explanations.append(f"You both work in
{candidate.industry.value}")
if ConnectionReason.SAME_LOCATION in reasons:
explanations.append(f"You're both located in
{candidate.location}")
if ConnectionReason.COMPLEMENTARY_SKILLS in reasons:
explanations.append("Your skills complement each other well")
if not explanations:
explanations.append("This person might be a valuable addition to
your network")
return ". ".join(explanations[:2]) # Limit to 2 main reasons
def _calculate_confidence(self, score: float, reasons: List[ConnectionReason]) -> float:
"""Calculate confidence in the recommendation"""
base_confidence = min(1.0, score)
# Higher confidence with more reasons
reason_bonus = min(0.2, len(reasons) * 0.05)
# Higher confidence for strong signals
strong_signals = [
ConnectionReason.MUTUAL_CONNECTION,
ConnectionReason.SAME_COMPANY,
ConnectionReason.SHARED_SKILLS
]
strong_signal_bonus = sum(0.1 for reason in reasons if reason in strong_signals)
confidence = base_confidence + reason_bonus + strong_signal_bonus
return min(1.0, confidence)
def update_interaction(self, user_id: str, target_user_id: str, interaction_type: str):
"""Update interaction history between users"""
if user_id not in self.users:
return
interaction_scores = {
'profile_view': 0.1,
'message_sent': 0.3,
'connection_request': 0.5,
'endorsement': 0.2,
'recommendation': 0.4
}
score = interaction_scores.get(interaction_type, 0.1)
if target_user_id not in self.users[user_id].interaction_history:
self.users[user_id].interaction_history[target_user_id] = 0
# Accumulate interaction score with decay
current_score = self.users[user_id].interaction_history[target_user_id]
self.users[user_id].interaction_history[target_user_id] = min(1.0, current_score + score)
def get_recommendation_analytics(self, user_id: str) -> Dict:
"""Get analytics about recommendations for a user"""
if user_id not in self.users:
return {}
recommendations = self.get_recommendations(user_id, max_recommendations=50)
if not recommendations:
return {"total_recommendations": 0}
# Analyze recommendation reasons
reason_counts = Counter()
for rec in recommendations:
for reason in rec.reasons:
reason_counts[reason.value] += 1
# Analyze score distribution
scores = [rec.score for rec in recommendations]
# Analyze industries
industries = []
for rec in recommendations:
if rec.user_id in self.users:
industries.append(self.users[rec.user_id].industry.value)
industry_counts = Counter(industries)
return {
"total_recommendations": len(recommendations),
"avg_score": sum(scores) / len(scores),
"top_reasons": reason_counts.most_common(3),
"score_distribution": {
"high_quality": len([s for s in scores if s > 0.7]),
"medium_quality": len([s for s in scores if 0.4 <= s <=
0.7]),
"low_quality": len([s for s in scores if s < 0.4])
},
"industry_diversity": len(industry_counts),
"top_industries": industry_counts.most_common(3)
}
# Test the professional network recommender
def test_professional_network_recommender():
recommender = ProfessionalNetworkRecommender()
# Create sample users
users = [
UserProfile(
user_id="u1",
name="Alice Johnson",
industry=Industry.SOFTWARE,
skills=["python", "ml", "data_analysis"],
connections={"u2", "u3"},
company="TechCorp",
location="San Francisco",
experience_years=5,
job_title="Data Scientist"
),
UserProfile(
user_id="u2",
name="Bob Smith",
industry=Industry.SOFTWARE,
skills=["java", "backend", "database"],
connections={"u1", "u4"},
company="DataInc",
location="San Francisco",
experience_years=7,
job_title="Backend Engineer"
),
UserProfile(
user_id="u3",
name="Carol Davis",
industry=Industry.FINANCE,
skills=["python", "statistics", "risk_analysis"],
connections={"u1", "u5"},
company="FinanceCorporation",
location="New York",
experience_years=6,
job_title="Quantitative Analyst"
),
UserProfile(
user_id="u4",
name="David Wilson",
industry=Industry.SOFTWARE,
skills=["python", "ai", "deep_learning"],
connections={"u2", "u6"},
company="AIStartup",
location="San Francisco",
experience_years=4,
job_title="ML Engineer"
),
UserProfile(
user_id="u5",
name="Eve Brown",
industry=Industry.MARKETING,
skills=["digital_marketing", "analytics", "python"],
connections={"u3", "u6"},
company="MarketingAgency",
location="Los Angeles",
experience_years=5,
job_title="Marketing Analyst"
),
UserProfile(
user_id="u6",
name="Frank Miller",
industry=Industry.SOFTWARE,
skills=["ui_ux", "product_design", "user_research"],
connections={"u4", "u5"},
company="DesignStudio",
location="San Francisco",
experience_years=6,
job_title="Product Designer"
)
]
# Add users to recommender
for user in users:
recommender.add_user(user)
print("Testing Professional Network Recommender:")
print(f"Total users: {len(users)}")
# Test recommendations for Alice (u1)
print(f"\nRecommendations for Alice Johnson (Data Scientist):")
recommendations = recommender.get_recommendations("u1",
max_recommendations=5)
for i, rec in enumerate(recommendations, 1):
user_name = recommender.users[rec.user_id].name
job_title = recommender.users[rec.user_id].job_title
print(f"\n{i}. {user_name} ({job_title})")
print(f" Score: {rec.score:.3f}")
print(f" Confidence: {rec.confidence:.3f}")
print(f" Reasons: {[r.value for r in rec.reasons]}")
print(f" Explanation: {rec.explanation}")
if rec.mutual_connections:
mutual_names = [recommender.users[uid].name for uid in rec.mutual_connections]
print(f" Mutual connections: {mutual_names}")
if rec.shared_skills:
print(f" Shared skills: {rec.shared_skills}")
# Test interaction updates
print(f"\nTesting interaction updates...")
recommender.update_interaction("u1", "u4", "profile_view")
recommender.update_interaction("u1", "u4", "message_sent")
# Get updated recommendations
updated_recommendations = recommender.get_recommendations("u1",
max_recommendations=5)
# Find u4 in recommendations
u4_rec = next((rec for rec in updated_recommendations if rec.user_id == "u4"), None)
if u4_rec:
print(f"Updated score for David Wilson: {u4_rec.score:.3f}")
# Test analytics
analytics = recommender.get_recommendation_analytics("u1")
print(f"\nRecommendation Analytics for Alice:")
print(f" Total recommendations: {analytics['total_recommendations']}")
print(f" Average score: {analytics['avg_score']:.3f}")
print(f" Top reasons: {analytics['top_reasons']}")
print(f" Score distribution: {analytics['score_distribution']}")
print(f" Industry diversity: {analytics['industry_diversity']}")
print(f" Top industries: {analytics['top_industries']}")
# Test recommendations for different user
print(f"\nRecommendations for Bob Smith (Backend Engineer):")
bob_recommendations = recommender.get_recommendations("u2",
max_recommendations=3)
for i, rec in enumerate(bob_recommendations, 1):
user_name = recommender.users[rec.user_id].name
job_title = recommender.users[rec.user_id].job_title
print(f"{i}. {user_name} ({job_title}) - Score: {rec.score:.3f}")
print(f" {rec.explanation}")
test_professional_network_recommender()
Key Insights:
• Multi-factor recommendation scoring considering mutual connections, skills, industry, and location
• Graph-based candidate discovery through network traversal
• Personalized explanations for recommendation transparency
• Interaction history tracking for improved recommendations over time
• Analytics and insights for recommendation system performance monitoring
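The "graph-based candidate discovery" bullet boils down to a friends-of-friends traversal. Below is a minimal sketch of that step on its own, assuming a plain adjacency dict (user_id -> set of connection ids); it is not the exact helper used in the solution above.
Python
# Minimal sketch of second-degree (friend-of-friend) candidate discovery.
def second_degree_candidates(graph: dict, user_id: str) -> set:
    direct = graph.get(user_id, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Exclude the user and their existing first-degree connections
    return candidates - direct - {user_id}

graph = {"u1": {"u2", "u3"}, "u2": {"u1", "u4"}, "u3": {"u1", "u5"}}
print(second_degree_candidates(graph, "u1"))  # {'u4', 'u5'} (order may vary)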