Image localization is an important supplement to GPS-based methods, especially in indoor scenes. Traditional methods that depend on image retrieval or structure from motion (SfM) either suffer from low accuracy or fail outright on texture-less or repetitive indoor surfaces. With the development of range sensors, colourless 3D maps are now easy to construct for indoor scenes. How to use such a colourless 3D map to improve single-image localization is a timely but unsolved research problem. In this paper, we present a new approach that addresses this problem by inferring 3D geometry from a single image, starting from an initial 6DOF pose estimated by a neural-network-based method. In contrast to previous methods that rely on multiple overlapping images or videos to generate sparse point clouds, our approach produces a dense point cloud from only a single image. We achieve this by estimating the depth map of the input image and performing geometry matching in 3D space. We have developed a novel depth estimation method that uses both the 3D map and the RGB image: the RGB image is used to estimate a dense depth map, and the 3D map guides the estimation. We show that our method significantly outperforms current RGB-image-based depth estimation methods on both indoor and outdoor datasets. We also show that using the depth map predicted by our method for single indoor image localization improves both position and orientation accuracy over state-of-the-art methods.
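
One step implied above, lifting an estimated depth map into a dense point cloud using the initial 6DOF pose, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera with known intrinsics and a camera-to-world pose convention, and the function name `backproject` and all parameter names are hypothetical. The geometry matching against the 3D map is not shown, since the abstract does not specify it.

```python
def backproject(depth, fx, fy, cx, cy, R, t):
    """Lift a depth map to world-frame 3D points (illustrative sketch).

    depth: list of rows of metric depths; 0 or None marks an invalid pixel
    fx, fy, cx, cy: pinhole intrinsics (assumed known)
    R: 3x3 camera-to-world rotation (assumed convention)
    t: camera position in the world frame (assumed convention)
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if not d or d <= 0:
                continue  # skip pixels without a valid depth
            # Pixel -> camera coordinates under the pinhole model.
            xc = (u - cx) * d / fx
            yc = (v - cy) * d / fy
            zc = d
            # Camera -> world coordinates using the initial pose.
            p = tuple(
                R[i][0] * xc + R[i][1] * yc + R[i][2] * zc + t[i]
                for i in range(3)
            )
            points.append(p)
    return points


# Toy example: 2x2 depth map, unit focal length, identity rotation.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cloud = backproject([[2.0, 2.0], [0.0, 4.0]], 1.0, 1.0, 0.0, 0.0,
                    identity, (1, 0, 0))
```

Every pixel with a valid depth yields one 3D point, which is why a single image can give a dense cloud, in contrast to the sparse clouds produced from multi-view feature matches.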