Hi PyTorch Community!
## The Problem
I’ve noticed a significant gap in the PyTorch ecosystem: **there’s no comprehensive Graph Neural Network (GNN) library for LibTorch/C++**.
While Python has excellent libraries like PyTorch Geometric and DGL, C++ developers are left to implement GNN operations from scratch. This creates barriers for:
- Production deployments requiring low-latency inference
- Integration with existing C++ ML pipelines
- Mobile and embedded applications
- High-performance training on large graphs
## My Proposal
I’m proposing to develop **LibTorch-Geometric**, a comprehensive C++ GNN library that would provide:
### Core Features
- **Graph data structures** with efficient batching for variable-sized graphs
- **Message passing framework** similar to PyG’s MessagePassing class
- **Standard GNN layers**: GCN, GraphSAGE, GAT, GIN
- **Graph operations**: Optimized sparse operations, pooling, sampling
- **CUDA acceleration** for performance-critical operations
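Batching variable-sized graphs is usually handled the way PyG does it. The graphs are concatenated into one large disconnected graph, and each graph's edge indices are shifted by the number of nodes that came before it. A `batch` vector then records which graph each node belongs to, and pooling ops reduce over that vector. The sketch below assumes that design. `Graph`, `Batch`, `collate`, and `global_mean_pool` are hypothetical names, not part of any existing library. It uses plain `std::vector` instead of `torch::Tensor` so the index arithmetic stays visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single graph in COO form: edge e goes src[e] -> dst[e]
// (PyG-style edge_index, with its two rows stored as separate vectors).
struct Graph {
    int64_t num_nodes;
    std::vector<int64_t> src;
    std::vector<int64_t> dst;
};

// Hypothetical mini-batch: one large disconnected graph plus bookkeeping.
struct Batch {
    int64_t num_nodes = 0;
    std::vector<int64_t> src;
    std::vector<int64_t> dst;
    std::vector<int64_t> batch;  // batch[v] = index of the graph node v came from
    std::vector<int64_t> ptr;    // nodes of graph g occupy [ptr[g], ptr[g + 1])
};

// Merge variable-sized graphs into one Batch without any padding.
Batch collate(const std::vector<Graph>& graphs) {
    Batch out;
    out.ptr.push_back(0);
    for (std::size_t g = 0; g < graphs.size(); ++g) {
        const Graph& graph = graphs[g];
        // Shift local node ids past every node of the earlier graphs.
        const int64_t offset = out.num_nodes;
        for (std::size_t e = 0; e < graph.src.size(); ++e) {
            out.src.push_back(graph.src[e] + offset);
            out.dst.push_back(graph.dst[e] + offset);
        }
        out.batch.insert(out.batch.end(), static_cast<std::size_t>(graph.num_nodes),
                         static_cast<int64_t>(g));
        out.num_nodes += graph.num_nodes;
        out.ptr.push_back(out.num_nodes);
    }
    return out;
}

// Global mean pooling: average a per-node scalar feature within each graph.
std::vector<double> global_mean_pool(const std::vector<double>& x, const Batch& b) {
    const std::size_t num_graphs = b.ptr.size() - 1;
    std::vector<double> sum(num_graphs, 0.0);
    std::vector<double> count(num_graphs, 0.0);
    for (std::size_t v = 0; v < x.size(); ++v) {
        const auto g = static_cast<std::size_t>(b.batch[v]);
        sum[g] += x[v];
        count[g] += 1.0;
    }
    for (std::size_t g = 0; g < num_graphs; ++g) {
        if (count[g] > 0.0) sum[g] /= count[g];  // empty graphs stay at 0
    }
    return sum;
}
```

As an example, collating a 2-node graph with a 3-node graph yields `batch = {0, 0, 1, 1, 1}`, and the second graph's edge `0 -> 1` becomes `2 -> 3`. Because nothing is padded, message passing can run over the whole batch in a single pass.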
### Example API (Draft)
```cpp
#include <torch/torch.h>
#include <libtorch_geometric/libtorch_geometric.h>  // proposed header (draft)

// Simple two-layer GCN for node classification
class GCN : public torch::nn::Module {
 public:
  GCN(int64_t num_features, int64_t hidden_dim, int64_t num_classes) {
    conv1 = register_module("conv1",
        ltg::GCNConv(ltg::GCNConvOptions(num_features, hidden_dim)));
    conv2 = register_module("conv2",
        ltg::GCNConv(ltg::GCNConvOptions(hidden_dim, num_classes)));
  }

  // x: [num_nodes, num_features]; edge_index: PyG-style [2, num_edges]
  torch::Tensor forward(torch::Tensor x, torch::Tensor edge_index) {
    x = conv1->forward(x, edge_index);
    x = torch::relu(x);
    x = conv2->forward(x, edge_index);
    return torch::log_softmax(x, /*dim=*/1);
  }

 private:
  // Empty module holders, assigned in the constructor via register_module
  ltg::GCNConv conv1{nullptr}, conv2{nullptr};
};
```
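The draft leaves `ltg::GCNConv` itself undefined. To show roughly what its forward pass would compute, here is a dependency-free sketch of the standard GCN propagation rule written as message passing. First add self-loops. Then transform features with `W`, scale each message by `1 / sqrt(deg(src) * deg(dst))`, and sum the messages at each target node. `gcn_conv`, `matmul`, and `Matrix` are hypothetical names. A real LibTorch version would operate on `torch::Tensor` and use a scatter-style op such as `index_add_` for the sum; plain `std::vector` is used here only to keep the indexing explicit. The sketch also assumes undirected graphs stored with both edge directions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major [rows][cols]

// Dense matmul: (n x k) * (k x m) -> (n x m).
Matrix matmul(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size(), k = b.size(), m = b.empty() ? 0 : b[0].size();
    Matrix out(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < m; ++j)
                out[i][j] += a[i][p] * b[p][j];
    return out;
}

// Hypothetical GCN layer as message passing.
// x: [num_nodes][in_dim], weight: [in_dim][out_dim], edges src[e] -> dst[e].
// Assumes each undirected edge is listed in both directions.
Matrix gcn_conv(const Matrix& x, const Matrix& weight,
                std::vector<int64_t> src, std::vector<int64_t> dst) {
    const std::size_t n = x.size();
    // 1. Add self-loops so every node also receives its own features.
    for (std::size_t v = 0; v < n; ++v) {
        src.push_back(static_cast<int64_t>(v));
        dst.push_back(static_cast<int64_t>(v));
    }
    // 2. Degree of each node as a target, counting its self-loop.
    std::vector<double> deg(n, 0.0);
    for (int64_t d : dst) deg[static_cast<std::size_t>(d)] += 1.0;
    // 3. Linear transform first: h = x W.
    const Matrix h = matmul(x, weight);
    // 4. Message = h[src] * 1/sqrt(deg[src] * deg[dst]); sum messages at dst.
    const std::size_t out_dim = weight.empty() ? 0 : weight[0].size();
    Matrix out(n, std::vector<double>(out_dim, 0.0));
    for (std::size_t e = 0; e < src.size(); ++e) {
        const auto s = static_cast<std::size_t>(src[e]);
        const auto t = static_cast<std::size_t>(dst[e]);
        const double norm = 1.0 / std::sqrt(deg[s] * deg[t]);
        for (std::size_t j = 0; j < out_dim; ++j) out[t][j] += norm * h[s][j];
    }
    return out;
}
```

For two nodes joined by one undirected edge, with features `{1}` and `{3}` and `W = {{2}}`, both nodes have degree 2 after self-loops. Each output is therefore `0.5 * 2 + 0.5 * 6 = 4`. Separating the transform, the normalized message, and the sum aggregation mirrors the message/aggregate split that PyG's `MessagePassing` exposes.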
## Why This Matters
- **Performance**: Native C++ speed without Python overhead
- **Production Ready**: Deploy GNNs without Python dependencies
- **Ecosystem Growth**: Brings graph deep learning to more use cases
- **Research Impact**: Enables high-performance GNN research
## My Background
I’m an MTech student with experience in C++, CUDA, and deep learning. I’m committed to seeing this through to completion and to maintaining it long-term.
## Questions for the Community
1. **Interest Level**: Would this be valuable to the PyTorch ecosystem?
2. **API Design**: Does the proposed C++ API feel natural? Any suggestions for improvement?
3. **Priority Features**: Which GNN layers and operations should I prioritize first?
- Basic layers: GCN, GraphSAGE, GAT?
- Graph pooling operations?
- Large graph sampling algorithms?
4. **Integration**: How should this integrate with existing PyTorch tooling?
- Should it follow the same conventions as other LibTorch extensions?
- Any specific build system preferences?
5. **Performance Requirements**: What are the key bottlenecks you’ve experienced with Python GNN libraries?
6. **Contribution Path**: Would this be better as:
- Independent library in the PyTorch ecosystem (like PyG for Python)?
- Eventually proposed for inclusion in PyTorch core?
- Hybrid approach: start independent, then propose inclusion if successful?
## Next Steps
Based on community feedback, I plan to:
1. Start with a prototype implementing a basic GCN layer and the message-passing framework
2. Create a benchmarking framework that compares against the Python implementations
3. Iterate based on real-world usage and community input
4. Open-source everything and build a contributor community
## Timeline
- **Months 1-2**: Core infrastructure and basic layers
- **Months 3-4**: Standard GNN implementations
- **Months 5-6**: Performance optimization and CUDA kernels
- **Months 7-8**: Documentation, examples, and community feedback
---
**TL;DR**: I want to build a comprehensive GNN library for LibTorch to fill the C++ ecosystem gap. Looking for community input on design, priorities, and contribution approach.
**Your thoughts?** Would love to hear from both potential users and PyTorch maintainers!
---
*Cross-posting this to PyTorch Forums as well to reach broader audience*