Vertex normals are absoloutely essential for both Gouraud and Phong shading.
In Gouraud shading the lighting is calculated per vertex and then interpolated across the triangle.
In Phong shading the normal is interpolated across the triangle and then the calculation is done per-pixel/fragment.
Bump-mapping refers to a range of different technologies. When doing normal mapping (probably the most common variety these days) the normals, bi-tangent (often erroneously called bi-normal) and tangent are calculated per-vertex to build a basis matrix. This basis matrix is then interpolated across the triangle. The normal retrieved from the normal map is then transformed by this basis matrix and then the lighting is performed per pixel.
There are extensions to the normal mapping technique above that allow bumps to hide other bumps behind them. This is, usually, performed by storing a height map along with the normal map and then ray marching through the height map to find parts that are being obscured. This technique is called Relief Mapping.
There are other older forms such as DUDV bump mapping (Which was implemented in DirectX 6 as Environment Mapped, bump mapping or EMBM).
You also have emboss bump mapping which was a really early way of doing bump mapping
Edit: In answer to your comment, emboss bump mapping CAN be performed on gouraud shaded triangles. Other forms of bump-mapping are, necessarily, per-pixel (due to the fact they work by modifying the surface normals on a per-pixel (or, at least, per-texel) basis). I wouldn't be surprised if there were other methods that can be performed with per-vertex lighting but I can't think of any off the top of my head. The results will look pretty rubbish compared to doing it on a per-pixel basis, though.
Re: Tangents and Bi-Tangents are actually quite simple once you get your head round them (took me years though, tbh ;)). Any 3D coordinate frame can be defined by a set of vectors that form an orthogonal basis matrix. By setting up the normal, tangent and bi-tangent per vertex you are merely setting up the coordinate frame at each vertex. From this you have the ability to transform a world or object space vector into the triangle's own coordinate frame. From here you can simply translate a light vector (or position) into the coordinate frame of a given pixel on the surface of the triangle. This then means that the normals in the normal map don't need to be stored in the object's space and hence as those triangles move around (when being animated, for example) the normals are already being handled in their own local space.