Is there any necessary mechanism in implementation of operating systems that impacts flow of instructions my program sends to CPU?
Not strictly necessary mechanisms (depending on how you define "OS"); but typically there's IRQs, exceptions and task switches.
IRQs are used by devices to ask the OS (their device driver) for attention; and interrupting the flow of instructions your program sends to CPU. The alternative is polling, which wastes a huge amount of CPU time checking if the device needs attention when it probably doesn't. Because applications need to use devices (file IO, keyboard, video, etc) and wasting CPU time is bad; IRQs significantly improve the performance of applications.
Exceptions (like IRQs) also interrupt the normal flow of instructions. They occur when the normal flow of instructions can't continue, either because your program crashed, or because your program needs something. The most common cause of exceptions is virtual memory (e.g. using swap space to let the application have more memory than actually exists so that the application can actually work properly; where the exception tells the OS that your program tried to access memory that has to be fetched from disk first). In general; this also improves performance for multiple reasons (because "can't execute because there's not enough RAM" can be considered "zero performance"; and because various tricks reduce RAM consumption and increase the amount of RAM that can be used for things like caching files which improve file IO speed).
Task switches is the basis of multi-tasking (e.g. being able to run more than one application at a time). If there are more tasks that want CPU time than there are CPUs, then the OS (scheduler) may (depending on task priorities and scheduler design) switch between them so that all the tasks get some CPU time. However; most applications spend most of their time waiting for something to do (e.g. waiting for user to press a key) and don't need CPU time while waiting; and if the OS is only running one task then the scheduler does nothing (no task switches because there's no other task to switch to). In other words, if the OS supports multi-tasking but you're only running one task, then it makes no difference.
Note that in some cases, IRQs and/or tasks are also used to "opportunistically" do work in the background (when hardware has nothing better to do) to improve performance (e.g. pre-fetch, pre-process and/or pre-calculate data before it's needed so that the resulting data is available instantly when it is needed).
For example if my program was set for maximum priority in OS, would it perform exactly the same when run without OS?
It's best to think of it as many layers - hardware & devices (CPU, etc), with kernel and device drivers on top, with applications on top of that. If you remove any of the layers nothing works (e.g. how can an application read and write files when there's no file system and no disk device drivers?).
If you shift all of the functionality that an OS provides into the application (e.g. a statically linked library that can make an application boot on bare metal); then if the functionality is the same the performance will be the same.
You can only improve performance by reducing functionality. For example, if you get rid of security you'll improve performance (temporarily, until your application becomes part of an attacker's botnet and performance becomes significantly worse due to all the bitcoin mining it's doing). In a similar way, you can get rid of flexibility (reboot the computer when you plug in a different USB flash stick), or fault tolerance (trash all of your data without any warning when the storage devices start failing because software assumed hardware is permanently perfect).