This question needs more focus to be a good fit for Stack Overflow.
That said, there are two things you need to take into consideration:
- There is no such thing as exactly 100 ms of audio in an MP4 file. A typical AAC frame in MP4 holds 1024 samples, which at 48 kHz is 21⅓ milliseconds long, so you can only take a few whole frames to make roughly 100 ms of audio data.
- Getting Vorbis out of AAC means you need to fully transcode the data: decode the AAC into PCM, then encode the PCM into Vorbis.
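For example, assuming the common case of 1024 samples per AAC frame at a 48 kHz sample rate, the frame duration and the number of whole frames needed to cover 100 ms work out as:

$$
t_{\text{frame}} = \frac{1024}{48\,000\ \text{Hz}} = 21\tfrac{1}{3}\ \text{ms}, \qquad
n = \left\lceil \frac{100\ \text{ms}}{21\tfrac{1}{3}\ \text{ms}} \right\rceil = \lceil 4.6875 \rceil = 5
$$

So five frames (106⅔ ms) are the smallest whole-frame chunk that covers 100 ms; other sample rates give different frame durations.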
Since you tagged the question c#, you need to use one of the Windows media APIs, or a library built on top of them, to read the MP4 file, select the audio track of interest, read its data, decode it with an AAC decoder, and obtain the raw decoded audio.
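Once the audio is decoded to PCM, the whole-frame granularity no longer applies, so you can cut exactly 100 ms before handing the samples to the encoder. A minimal sketch in Python (the same arithmetic carries over to C#), assuming interleaved PCM samples held in a plain list; `trim_pcm` is a hypothetical helper, not part of any media API:

```python
def trim_pcm(samples, sample_rate, channels, start_ms, duration_ms):
    """Return the interleaved PCM values covering [start_ms, start_ms + duration_ms).

    Hypothetical helper: assumes `samples` is interleaved (L, R, L, R, ... for stereo).
    """
    # Convert milliseconds to per-channel sample offsets; integer math avoids float drift.
    first = start_ms * sample_rate // 1000
    count = duration_ms * sample_rate // 1000
    # Interleaved layout: each time step holds `channels` consecutive values.
    return samples[first * channels:(first + count) * channels]
```

For example, five decoded 1024-sample stereo frames at 48 kHz give 10,240 interleaved values, and `trim_pcm(pcm, 48000, 2, 0, 100)` keeps the first 4800 samples per channel (9600 values).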
Then, as a separate step, use a Vorbis audio encoder to encode the audio per your stated requirements.
That is how to do it programmatically. Alternatively, you can probably use one of the readily available, well-known tools to have it done for you "the easiest way".