Implement a Python function that efficiently calculates the PageRank scores for a given directed graph represented as an adjacency list, considering a damping factor and a specified number of iterations.
technical screen · 10-15 minutes
How to structure your answer
MECE Framework:
1. Input validation: check the graph structure (adjacency list), the damping factor (between 0 and 1), and that the iteration count is positive.
2. Initialization: assign equal PageRank to every node.
3. Iterative calculation: on each iteration, update each node's PageRank by summing contributions from its incoming links, weighted by the source node's PageRank and the damping factor.
4. Dangling nodes: handle nodes with no outgoing links by distributing their PageRank equally among all nodes.
5. Normalization: ensure the PageRank scores sum to 1.
6. Output: return the final PageRank scores as a dictionary or list.
This covers the edge cases and keeps the algorithm converging efficiently.
Sample answer
import numpy as np

def calculate_pagerank(graph: dict, damping_factor: float = 0.85, iterations: int = 100):
    if not (0 <= damping_factor <= 1):
        raise ValueError("Damping factor must be between 0 and 1.")
    if iterations <= 0:
        raise ValueError("Iterations must be positive.")
    num_nodes = len(graph)
    if num_nodes == 0:
        return {}
    # Initialize PageRank scores equally
    pr = np.ones(num_nodes) / num_nodes
    # Build the column-stochastic link matrix; dangling nodes keep all-zero columns
    adj_matrix = np.zeros((num_nodes, num_nodes))
    out_degrees = np.zeros(num_nodes)
    node_to_idx = {node: i for i, node in enumerate(graph)}
    for node, neighbors in graph.items():
        u_idx = node_to_idx[node]
        out_degrees[u_idx] = len(neighbors)
        for neighbor in neighbors:
            adj_matrix[node_to_idx[neighbor], u_idx] = 1 / len(neighbors)
    for _ in range(iterations):
        # Redistribute the total mass held by dangling nodes uniformly
        dangling_sum = np.sum(pr[out_degrees == 0])
        dangling_contribution = damping_factor * dangling_sum / num_nodes
        new_pr = np.zeros(num_nodes)
        for i in range(num_nodes):
            # Contribution from non-dangling nodes linking to node i
            sum_incoming = np.sum(adj_matrix[i, :] * pr)
            new_pr[i] = damping_factor * sum_incoming + (1 - damping_factor) / num_nodes + dangling_contribution
        pr = new_pr / np.sum(new_pr)  # Normalize so scores sum to 1
    return {node: float(pr[node_to_idx[node]]) for node in graph}
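The per-node inner loop above can be replaced with a single matrix-vector product per iteration, which matters for larger graphs. A vectorized sketch (the function name `pagerank_vectorized` is illustrative, not part of the expected answer; same dangling-node handling as above):

```python
import numpy as np

def pagerank_vectorized(graph, d=0.85, iterations=100):
    # Illustrative vectorized variant: one matrix-vector product per iteration.
    nodes = list(graph)
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    if n == 0:
        return {}
    M = np.zeros((n, n))  # column-stochastic link matrix
    for u, nbrs in graph.items():
        for v in nbrs:
            M[idx[v], idx[u]] = 1 / len(nbrs)
    dangling = np.array([len(graph[u]) == 0 for u in nodes])
    pr = np.full(n, 1 / n)
    for _ in range(iterations):
        # Dangling mass is spread uniformly; each update preserves sum(pr) == 1
        pr = (1 - d) / n + d * (M @ pr + pr[dangling].sum() / n)
    return {u: float(pr[idx[u]]) for u in nodes}
```

Because each update preserves the total mass exactly, no per-iteration renormalization is needed in this form.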
Key points to mention
- Initialization of PageRank scores (uniform distribution).
- Iterative update formula for PageRank, incorporating the damping factor.
- Handling of dangling nodes (nodes with no outgoing links) to prevent PageRank 'sink' issues.
- Normalization of PageRank scores (summing to 1).
- The concept of convergence and why a fixed number of iterations is often used in practice, or alternatively, a convergence threshold.
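The convergence-threshold alternative is worth being able to sketch: instead of a fixed iteration count, stop when the L1 change between successive score vectors drops below a tolerance. A pure-Python sketch (the function name and `tol`/`max_iter` parameters are illustrative):

```python
def pagerank_until_converged(graph, d=0.85, tol=1e-8, max_iter=1000):
    # Illustrative sketch: iterate until the L1 change in scores
    # falls below `tol`, with `max_iter` as a safety cap.
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    pr = {u: 1 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(pr[u] for u in nodes if not graph[u])
        new_pr = {}
        for v in nodes:
            incoming = sum(pr[u] / len(graph[u])
                           for u in nodes if v in graph[u])
            new_pr[v] = (1 - d) / n + d * (incoming + dangling / n)
        delta = sum(abs(new_pr[u] - pr[u]) for u in nodes)
        pr = new_pr
        if delta < tol:  # converged
            break
    return pr
```

In an interview, mentioning both stopping strategies (fixed iterations for predictable cost, a tolerance for accuracy) is a strong signal.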
Common mistakes to avoid
- ✗ Incorrectly applying the damping factor or the sum over incoming links.
- ✗ Failure to handle dangling nodes, leading to PageRank 'leakage' or incorrect distributions.
- ✗ Off-by-one errors in out-degree calculations or when iterating through graph structures.
- ✗ Inefficient graph traversal or data structure choices for large graphs.
- ✗ Not normalizing the initial or final PageRank scores.
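The dangling-node 'leakage' mistake is easy to demonstrate concretely: if the mass held by dangling nodes is simply dropped from the update, the scores stop summing to 1. A minimal sketch (deliberately buggy update, hypothetical three-node graph):

```python
def pagerank_step_no_dangling_fix(pr, graph, d=0.85):
    # Buggy update: dangling nodes contribute nothing, so their mass leaks away.
    n = len(graph)
    return {v: (1 - d) / n + d * sum(pr[u] / len(graph[u])
                                     for u in graph if v in graph[u])
            for v in graph}

g = {'a': ['b'], 'b': ['c'], 'c': []}  # 'c' is dangling
pr = {u: 1 / 3 for u in g}
for _ in range(20):
    pr = pagerank_step_no_dangling_fix(pr, g)
total = sum(pr.values())  # drifts well below 1 -- the 'leak'
```

Checking that the returned scores sum to 1 is a cheap way to catch this bug in testing.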