Designed and implemented scalable GNN-based community detection pipelines using GCN and GraphSAGE architectures with full-batch training, neighbor sampling, and graph partitioning strategies.
Conducted extensive experiments on SBM (1K, 10K), CORA, and Reddit datasets, demonstrating that neighbor sampling and graph partitioning enable training on large graphs where full-batch methods fail due to memory constraints.
Achieved up to 90% accuracy on SBM (10K nodes) with graph partitioning while reducing memory footprint, and enabled scalable training on Reddit where full-batch methods resulted in out-of-memory failures.
Analyzed trade-offs between accuracy, training time, and memory usage, providing practical guidelines for scalable GNN deployment in real-world large-scale social networks.