My explanation may not be accurate, but I am comfortable in reading YUV on Wikipidea only after I know the following.
First, let's consider how a monitor displays an image. Every color can be represented by three colors (the so-called RGB model), that is, red, green and blue. Every "red", "green" and "blue" can be recorded in a computer in 1 byte (8 bit). So a pixel in a monitor has 3 bytes (each byte for "red", "green" and "blue") information.
Human beings' eye is very sensible on brightness, but not so much on color. Can we make use of that? So it comes the YUV representation of color. "Y" means luminance (brightness), "U" and "V" means chrominance (color). YUV is like RGB which is used to represent a color. But why do we need it while we can represent color in RGB model?
Becuase the file size matters. In RGB model we have to use 3 bytes (24 bits) to record a color, but in YUV model, we can half the size if we use yuv420p format. You do not have to know what yuv420p is, the point here is that use YUV we can dramatically decrease the file size (very easily) than using RGB model.
When you play an MP4 file encoded in x264, you can see the file is decoded in yuv420 planar (check that in your VLC player by opening "codec information"). VLC needs metadata to be able to display the video properly, which is why things like MP4 exist. YUV is raw data, which does not contain the height and width information needed.
Check here for how to play it :Playing YUV on VLC Player
Check here for an explaination about why you need the metadata: How to find out resolution and count of frames in YUV 4:2:0 file?