- [Description of random situation](#description-of-random-situation)
- [Others](#others)
<!--TOC -->
# Graph Attention Networks Description
Graph Attention Networks (GAT) were proposed in 2017 by Petar Veličković et al. By leveraging masked self-attentional layers to address the shortcomings of prior graph-based methods, GAT achieved or matched state-of-the-art performance on both transductive datasets such as Cora and inductive datasets such as PPI. This is an example of training GAT on the Cora dataset with MindSpore.
[Paper](https://arxiv.org/abs/1710.10903): Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
# Model architecture
An illustration of multi-head attention (with K = 3 heads) performed by node 1 on its neighborhood can be found in the paper (Figure 1, right).
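As a rough, framework-agnostic sketch (plain NumPy, not the MindSpore code in this repository), a single attention head computes masked attention coefficients over each node's neighborhood, and the outputs of K heads are concatenated per node:

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)

def gat_head(h, adj, W, a):
    """One GAT attention head (dense sketch).
    h:   (N, F)   input node features
    adj: (N, N)   adjacency with self-loops, used as the attention mask
    W:   (F, Fp)  shared linear transform
    a:   (2*Fp,)  attention vector, split into "source" and "target" halves
    """
    Fp = W.shape[1]
    Wh = h @ W                                            # (N, Fp)
    # e[i, j] = LeakyReLU(a^T [Wh_i || Wh_j])
    e = leaky_relu((Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :])
    e = np.where(adj > 0, e, -1e9)                        # mask out non-neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))      # softmax over each
    alpha /= alpha.sum(axis=1, keepdims=True)             # node's neighborhood
    return alpha @ Wh                                     # (N, Fp)

# Multi-head attention with K = 3 heads: per-head outputs are concatenated.
rng = np.random.default_rng(0)
N, F, Fp, K = 5, 8, 4, 3
h = rng.normal(size=(N, F))
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # toy chain graph
out = np.concatenate(
    [gat_head(h, adj, rng.normal(size=(F, Fp)), rng.normal(size=(2 * Fp,))) for _ in range(K)],
    axis=1)                                               # (N, K * Fp)
```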
To utilize the strong computation power of the Ascend chip and accelerate training, mixed-precision training is used. MindSpore can handle FP32 inputs together with FP16 operators. In the GAT example, the model runs in FP16 mode except for the loss calculation part.
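The snippet below is a minimal sketch of that split, not the actual code in src/: it uses a stand-in `nn.Dense` in place of the GAT network and assumes the standard `Cell.to_float` API. The backbone is cast to FP16 while the loss is kept in FP32.

```python
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, dtype as mstype

# Stand-in backbone; the real example uses the GAT cells from src/.
backbone = nn.Dense(16, 7)
backbone.to_float(mstype.float16)          # run the network in FP16
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
loss_fn.to_float(mstype.float32)           # keep the loss calculation in FP32

x = Tensor(np.random.randn(4, 16).astype(np.float32))   # FP32 inputs
labels = Tensor(np.array([0, 1, 2, 3]).astype(np.int32))
logits = backbone(x)                        # inputs are cast to FP16 inside the cell
loss = loss_fn(ops.Cast()(logits, mstype.float32), labels)
```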
|                                        | MindSpore + Ascend 910 | TensorFlow + Tesla V100 |
| -------------------------------------- | ---------------------- | ----------------------- |
| Training Cost (200 epochs)             | 27.62298311s           | 36.711862s              |
| End-to-End Training Cost (200 epochs)  | 39.074s                | 50.894s                 |
# Description of random situation
The GAT model contains many dropout operations. If you want to disable dropout, set `attn_dropout` and `feature_dropout` to 0 in src/config.py. Note that disabling dropout causes the accuracy to drop to approximately 80%.
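For reference, a hedged sketch of what the relevant entries could look like; the dictionary name and layout are illustrative, and only the two field names come from this README:

```python
# src/config.py (illustrative excerpt, other fields omitted)
config = {
    "attn_dropout": 0.0,     # 0 disables dropout on attention coefficients
    "feature_dropout": 0.0,  # 0 disables dropout on input features
}
```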
# Others
The GAT model has been verified on the Ascend environment only, not on CPU or GPU.