Accurately locating the fovea is a prerequisite for developing computer aided diagnosis (CAD) of retinal diseases. In colour fundus images of the retina, the fovea is a fuzzy region lacking prominent visual features and this makes it difficult to directly locate the fovea. While traditional methods rely on explicitly extracting image features from the surrounding structures such as the optic disc and various vessels to infer the position of the fovea, deep learning based regression technique can implicitly model the relation between the fovea and other nearby anatomical structures to determine the location of the fovea in an end-to-end fashion. Although promising, using deep learning for fovea localisation also has many unsolved challenges. In this paper, we present a new end-to-end fovea localisation method based on a hierarchical coarse-to-fine deep regression neural network. The innovative features of the new method include a multi-scale feature fusion technique and a self-attention technique to exploit location, semantic, and contextual information in an integrated framework, a multi-field-of-view (multi-FOV) feature fusion technique for context-aware feature learning and a Gaussian-shift-cropping method for augmenting effective training data. We present extensive experimental results on two public databases and show that our new method achieved state-of-the-art performances. We also present a comprehensive ablation study and analysis to demonstrate the technical soundness and effectiveness of the overall framework and its various constituent components.