Implement a Python function that efficiently calculates the PageRank scores for a given directed graph represented as an adjacency list, considering a damping factor and a specified number of iterations.
technical screen · 10-15 minutes
How to structure your answer
MECE Framework:
1. Input validation: check the graph structure (adjacency list), the damping factor (between 0 and 1), and that the iteration count is positive.
2. Initialization: assign equal PageRank to every node.
3. Iterative calculation: on each iteration, update each node's PageRank by summing contributions from its incoming links, weighted by the source node's PageRank and the damping factor.
4. Dangling nodes: handle nodes with no outgoing links by distributing their PageRank equally among all nodes.
5. Normalization: ensure the PageRank scores sum to 1.
6. Output: return the final PageRank scores as a dictionary or list.
This covers the edge cases and keeps the algorithm converging efficiently.
Sample answer
import numpy as np

def calculate_pagerank(graph: dict, damping_factor: float = 0.85, iterations: int = 100):
    if not (0 <= damping_factor <= 1):
        raise ValueError("Damping factor must be between 0 and 1.")
    if iterations <= 0:
        raise ValueError("Iterations must be positive.")
    num_nodes = len(graph)
    if num_nodes == 0:
        return {}
    # Initialize PageRank scores equally
    pr = np.ones(num_nodes) / num_nodes
    # Build the column-stochastic link matrix; dangling nodes keep all-zero columns
    adj_matrix = np.zeros((num_nodes, num_nodes))
    out_degrees = np.zeros(num_nodes)
    node_to_idx = {node: i for i, node in enumerate(graph)}
    for node, neighbors in graph.items():
        u_idx = node_to_idx[node]
        out_degrees[u_idx] = len(neighbors)
        for neighbor in neighbors:
            adj_matrix[node_to_idx[neighbor], u_idx] = 1 / len(neighbors)
    for _ in range(iterations):
        # Redistribute the total mass held by dangling nodes uniformly
        dangling_sum = np.sum(pr[out_degrees == 0])
        dangling_contribution = damping_factor * dangling_sum / num_nodes
        new_pr = np.zeros(num_nodes)
        for i in range(num_nodes):
            # Contribution from non-dangling nodes linking to node i
            sum_incoming = np.sum(adj_matrix[i, :] * pr)
            new_pr[i] = damping_factor * sum_incoming + (1 - damping_factor) / num_nodes + dangling_contribution
        pr = new_pr / np.sum(new_pr)  # Normalize so scores sum to 1
    return {node: float(pr[node_to_idx[node]]) for node in graph}
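The per-node inner loop above can be replaced with a single matrix-vector product per iteration, which matters for larger graphs. A vectorized sketch (the function name `pagerank_vectorized` is illustrative, not part of the expected answer; same dangling-node handling as above):

```python
import numpy as np

def pagerank_vectorized(graph, d=0.85, iterations=100):
    # Illustrative vectorized variant: one matrix-vector product per iteration.
    nodes = list(graph)
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    if n == 0:
        return {}
    M = np.zeros((n, n))  # column-stochastic link matrix
    for u, nbrs in graph.items():
        for v in nbrs:
            M[idx[v], idx[u]] = 1 / len(nbrs)
    dangling = np.array([len(graph[u]) == 0 for u in nodes])
    pr = np.full(n, 1 / n)
    for _ in range(iterations):
        # Dangling mass is spread uniformly; each update preserves sum(pr) == 1
        pr = (1 - d) / n + d * (M @ pr + pr[dangling].sum() / n)
    return {u: float(pr[idx[u]]) for u in nodes}
```

Because each update preserves the total mass exactly, no per-iteration renormalization is needed in this form.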
Key points to mention
- Initialization of PageRank scores (uniform distribution).
- Iterative update formula for PageRank, incorporating the damping factor.
- Handling of dangling nodes (nodes with no outgoing links) to prevent PageRank 'sink' issues.
- Normalization of PageRank scores (summing to 1).
- The concept of convergence and why a fixed number of iterations is often used in practice, or alternatively, a convergence threshold.
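The convergence-threshold alternative is worth being able to sketch: instead of a fixed iteration count, stop when the L1 change between successive score vectors drops below a tolerance. A pure-Python sketch (the function name and `tol`/`max_iter` parameters are illustrative):

```python
def pagerank_until_converged(graph, d=0.85, tol=1e-8, max_iter=1000):
    # Illustrative sketch: iterate until the L1 change in scores
    # falls below `tol`, with `max_iter` as a safety cap.
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    pr = {u: 1 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(pr[u] for u in nodes if not graph[u])
        new_pr = {}
        for v in nodes:
            incoming = sum(pr[u] / len(graph[u])
                           for u in nodes if v in graph[u])
            new_pr[v] = (1 - d) / n + d * (incoming + dangling / n)
        delta = sum(abs(new_pr[u] - pr[u]) for u in nodes)
        pr = new_pr
        if delta < tol:  # converged
            break
    return pr
```

In an interview, mentioning both stopping strategies (fixed iterations for predictable cost, a tolerance for accuracy) is a strong signal.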
Common mistakes to avoid
- ✗ Incorrectly applying the damping factor or the sum over incoming links.
- ✗ Failure to handle dangling nodes, leading to PageRank 'leakage' or incorrect distributions.
- ✗ Off-by-one errors in out-degree calculations or when iterating through graph structures.
- ✗ Inefficient graph traversal or data structure choices for large graphs.
- ✗ Not normalizing the initial or final PageRank scores.
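The dangling-node 'leakage' mistake is easy to demonstrate concretely: if the mass held by dangling nodes is simply dropped from the update, the scores stop summing to 1. A minimal sketch (deliberately buggy update, hypothetical three-node graph):

```python
def pagerank_step_no_dangling_fix(pr, graph, d=0.85):
    # Buggy update: dangling nodes contribute nothing, so their mass leaks away.
    n = len(graph)
    return {v: (1 - d) / n + d * sum(pr[u] / len(graph[u])
                                     for u in graph if v in graph[u])
            for v in graph}

g = {'a': ['b'], 'b': ['c'], 'c': []}  # 'c' is dangling
pr = {u: 1 / 3 for u in g}
for _ in range(20):
    pr = pagerank_step_no_dangling_fix(pr, g)
total = sum(pr.values())  # drifts well below 1 -- the 'leak'
```

Checking that the returned scores sum to 1 is a cheap way to catch this bug in testing.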