Soffio

Summary

Graph databases treat relationships as first-class citizens, making connected data queries natural and performant.

Core concepts:

  • Property graph model: nodes (entities) + relationships (connections)
  • Cypher query language for pattern matching
  • Built-in graph algorithms

Use cases: Social networks, fraud detection, recommendation engines, knowledge graphs

vs. Relational: Graphs excel at deep traversals and relationship queries; relational wins at aggregations and transactions.

When to use: Multi-hop queries, flexible schema, relationship-centric data model.

Graph Databases: When Relations Matter Most

Graph database visualization

Introduction: The World is a Graph

The world is inherently connected. People know people. Products relate to categories. Transactions flow between accounts. Yet for decades, we forced this interconnected reality into the rigid rows and columns of relational databases.

Graph databases flip the script: relationships are first-class citizens. Instead of foreign keys and JOIN operations, graphs represent connections directly, making relationship queries that would require complex multi-table JOINs trivial.

Consider finding friends-of-friends in a social network:

-- Relational (MySQL): Complex self-join
SELECT DISTINCT u.name
FROM users u
JOIN friendships f1 ON u.id = f1.user2_id
JOIN friendships f2 ON f1.user1_id = f2.user2_id
WHERE f2.user1_id = 123
  AND u.id != 123;
-- Performance degrades exponentially with depth
-- Graph (Neo4j): Natural traversal
MATCH (me:User {id: 123})-[:FRIEND]->()-[:FRIEND]->(friend)
RETURN DISTINCT friend.name;
-- Performance is predictable, scales linearly

This article explores when and how to use graph databases.

Graph Fundamentals

The Property Graph Model

Most graph databases use the property graph model:

interface PropertyGraph {
  nodes: Node[];
  relationships: Relationship[];
}

interface Node {
  id: string;
  labels: string[];  // e.g., ["Person", "Employee"]
  properties: Record<string, any>;
}

interface Relationship {
  id: string;
  type: string;  // e.g., "KNOWS", "WORKS_FOR"
  startNode: string;  // node ID
  endNode: string;    // node ID
  properties: Record<string, any>;
}

// Example: Social network
const alice: Node = {
  id: "1",
  labels: ["Person"],
  properties: { name: "Alice", age: 30, city: "NYC" }
};

const bob: Node = {
  id: "2",
  labels: ["Person"],
  properties: { name: "Bob", age: 25, city: "SF" }
};

const knows: Relationship = {
  id: "r1",
  type: "KNOWS",
  startNode: "1",  // Alice
  endNode: "2",    // Bob
  properties: { since: 2015, strength: 0.8 }
};

Property graph model

When to Choose Graph Databases

Use Case 1: Social Networks

Graph databases excel at social network queries:

-- Friend recommendations: friends of friends who are not my friends
MATCH (me:Person {name: "Alice"})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE NOT (me)-[:FRIEND]->(fof) AND fof <> me
RETURN fof.name, COUNT(*) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10;

-- Shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: "Alice"})-[:FRIEND*]-(bob:Person {name: "Bob"})
)
RETURN [node IN nodes(path) | node.name] AS connectionPath;

-- Influential users (PageRank)
CALL algo.pageRank.stream("Person", "FRIEND")
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10;

Why graphs win: Multi-hop friend queries in relational databases require recursive CTEs or multiple self-joins, performance degrading exponentially.

Use Case 2: Fraud Detection

Detect fraud rings by finding suspicious connection patterns:

-- Find accounts sharing suspicious attributes
MATCH (a1:Account)-[:HAS_IP]->(ip:IP)<-[:HAS_IP]-(a2:Account)
MATCH (a1)-[:HAS_DEVICE]->(device:Device)<-[:HAS_DEVICE]-(a2)
WHERE a1 <> a2
WITH a1, a2, COUNT(DISTINCT ip) + COUNT(DISTINCT device) AS sharedAttributes
WHERE sharedAttributes >= 2
RETURN a1.id, a2.id, sharedAttributes
ORDER BY sharedAttributes DESC;

-- Detect circular money flow (money laundering)
MATCH path = (start:Account)-[:TRANSFERRED*3..6]->(start)
WHERE ALL(r IN relationships(path) WHERE r.amount > 10000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts;

Why graphs win: Pattern matching across relationships is graph databases natural strength.

Use Case 3: Recommendation Engines

-- Collaborative filtering: users who liked X also liked Y
MATCH (user:User {id: 123})-[:LIKED]->(item:Product)<-[:LIKED]-(other:User)
MATCH (other)-[:LIKED]->(recommendation:Product)
WHERE NOT (user)-[:LIKED]->(recommendation)
RETURN recommendation.name, COUNT(*) AS score
ORDER BY score DESC
LIMIT 10;

-- Content-based: similar items
MATCH (item:Product {id: 456})-[:IN_CATEGORY]->(cat:Category)<-[:IN_CATEGORY]-(similar:Product)
WHERE item <> similar
RETURN similar.name, COUNT(cat) AS sharedCategories
ORDER BY sharedCategories DESC;

Graph Algorithms

Graph databases provide built-in algorithms:

1. Shortest Path

-- Dijkstra for weighted paths
MATCH (start:Station {name: "A"}), (end:Station {name: "Z"})
CALL algo.shortestPath.stream(start, end, "distance")
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).name AS station, cost;

2. Community Detection

-- Louvain algorithm for community detection
CALL algo.louvain.stream("Person", "FRIEND")
YIELD nodeId, community
RETURN community, COLLECT(algo.getNodeById(nodeId).name) AS members
ORDER BY SIZE(members) DESC;

3. Centrality Measures

-- Betweenness centrality: identify bridges
CALL algo.betweenness.stream("Person", "FRIEND")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).name, centrality
ORDER BY centrality DESC;

Graph algorithms

Implementation: Building a Graph Database

// Simplified in-memory graph database
class GraphDB {
  private nodes = new Map<string, Node>();
  private relationships = new Map<string, Relationship>();
  private adjacencyList = new Map<string, Set<string>>();  // node -> outgoing edges

  addNode(node: Node): void {
    this.nodes.set(node.id, node);
    if (!this.adjacencyList.has(node.id)) {
      this.adjacencyList.set(node.id, new Set());
    }
  }

  addRelationship(rel: Relationship): void {
    this.relationships.set(rel.id, rel);
    
    const outgoing = this.adjacencyList.get(rel.startNode);
    if (outgoing) {
      outgoing.add(rel.id);
    }
  }

  // Breadth-first search
  bfs(startId: string, predicate: (node: Node) => boolean): Node[] {
    const visited = new Set<string>();
    const queue: string[] = [startId];
    const results: Node[] = [];

    while (queue.length > 0) {
      const nodeId = queue.shift()!;
      if (visited.has(nodeId)) continue;
      visited.add(nodeId);

      const node = this.nodes.get(nodeId)!;
      if (predicate(node)) {
        results.push(node);
      }

      const outgoing = this.adjacencyList.get(nodeId) || new Set();
      for (const relId of outgoing) {
        const rel = this.relationships.get(relId)!;
        queue.push(rel.endNode);
      }
    }

    return results;
  }

  // Shortest path (unweighted)
  shortestPath(startId: string, endId: string): string[] | null {
    const visited = new Set<string>();
    const queue: Array<{ nodeId: string; path: string[] }> = [
      { nodeId: startId, path: [startId] }
    ];

    while (queue.length > 0) {
      const { nodeId, path } = queue.shift()!;
      if (nodeId === endId) return path;
      if (visited.has(nodeId)) continue;
      visited.add(nodeId);

      const outgoing = this.adjacencyList.get(nodeId) || new Set();
      for (const relId of outgoing) {
        const rel = this.relationships.get(relId)!;
        queue.push({
          nodeId: rel.endNode,
          path: [...path, rel.endNode]
        });
      }
    }

    return null;
  }
}

Indexing and Performance

// Index structures for graph databases
class GraphIndexes {
  // 1. Node index by label and property
  private nodeLabelIndex = new Map<string, Set<string>>();  // label -> node IDs
  private nodePropertyIndex = new Map<string, Map<any, Set<string>>>();  // property -> value -> node IDs

  // 2. Relationship type index
  private relTypeIndex = new Map<string, Set<string>>();  // type -> relationship IDs

  indexNode(node: Node): void {
    for (const label of node.labels) {
      if (!this.nodeLabelIndex.has(label)) {
        this.nodeLabelIndex.set(label, new Set());
      }
      this.nodeLabelIndex.get(label)!.add(node.id);
    }

    for (const [key, value] of Object.entries(node.properties)) {
      if (!this.nodePropertyIndex.has(key)) {
        this.nodePropertyIndex.set(key, new Map());
      }
      const propIndex = this.nodePropertyIndex.get(key)!;
      if (!propIndex.has(value)) {
        propIndex.set(value, new Set());
      }
      propIndex.get(value)!.add(node.id);
    }
  }

  findNodesByLabel(label: string): Set<string> {
    return this.nodeLabelIndex.get(label) || new Set();
  }

  findNodesByProperty(key: string, value: any): Set<string> {
    const propIndex = this.nodePropertyIndex.get(key);
    return propIndex?.get(value) || new Set();
  }
}

Distributed Graph Databases

Scaling graphs across machines is challenging:

// Graph partitioning strategies
enum PartitionStrategy {
  // 1. Edge-cut: minimize edges crossing partitions
  EdgeCut,
  
  // 2. Vertex-cut: replicate vertices, keep edges local
  VertexCut
}

class DistributedGraph {
  // Hash-based partitioning
  partition(nodeId: string, numPartitions: number): number {
    const hash = this.hashString(nodeId);
    return hash % numPartitions;
  }

  // For traversal queries, need distributed BFS
  async distributedBFS(
    startId: string,
    partitions: GraphPartition[]
  ): Promise<Node[]> {
    const visited = new Set<string>();
    const frontier = new Set<string>([startId]);
    const results: Node[] = [];

    while (frontier.size > 0) {
      // Fetch nodes from all partitions in parallel
      const nodePromises = Array.from(frontier).map(async nodeId => {
        const partition = this.partition(nodeId, partitions.length);
        return partitions[partition].getNode(nodeId);
      });

      const nodes = await Promise.all(nodePromises);
      
      for (const node of nodes) {
        if (!visited.has(node.id)) {
          visited.add(node.id);
          results.push(node);
          
          // Get neighbors
          const partition = this.partition(node.id, partitions.length);
          const neighbors = await partitions[partition].getNeighbors(node.id);
          for (const neighbor of neighbors) {
            frontier.add(neighbor);
          }
        }
        frontier.delete(node.id);
      }
    }

    return results;
  }
}

Distributed graphs

Graph vs. Relational: When to Choose

// Decision matrix
interface DatabaseChoice {
  useGraph: boolean;
  reason: string;
}

function chooseDatabase(requirements: Requirements): DatabaseChoice {
  // Graph database wins if:
  if (requirements.queryDepth > 3) {
    return {
      useGraph: true,
      reason: "Deep traversals (>3 hops) are expensive in relational"
    };
  }

  if (requirements.relationshipComplexity === "high") {
    return {
      useGraph: true,
      reason: "Many-to-many relationships with properties on edges"
    };
  }

  if (requirements.schemaFlexibility === "required") {
    return {
      useGraph: true,
      reason: "Graph schemas are flexible, easy to evolve"
    };
  }

  // Relational wins if:
  if (requirements.transactionComplexity === "high") {
    return {
      useGraph: false,
      reason: "Relational databases have mature ACID support"
    };
  }

  if (requirements.aggregations === "heavy") {
    return {
      useGraph: false,
      reason: "Relational databases excel at GROUP BY, SUM, etc."
    };
  }

  return {
    useGraph: false,
    reason: "Default to relational for general-purpose workloads"
  };
}

Conclusion

Graph databases shine when relationships are as important as the data itself. From social networks to fraud detection to knowledge graphs, they provide a natural way to model and query connected data.

Key takeaways:

  • Graphs make relationship queries simple and fast
  • Built-in graph algorithms (shortest path, PageRank, community detection)
  • Schema flexibility enables rapid iteration
  • But: Less mature than relational, harder to scale

Choose graphs when:

  • Query depth > 3 hops
  • Relationships have properties
  • Schema evolves frequently
  • Pattern matching is core to the application

The world is a graph—sometimes your database should be too.