In the Linux kernel, I wrote code that resembles copy_page_range (mm/memory.c) to copy memory from one process to another with a COW optimization. The destination and source addresses can be offset by PAGE_SIZE and COW still works. I noticed, however, that when a user program copies from the same source address to different destination addresses, the TLB does not seem to be flushed properly. At a high level, my user-level code does the following (I copy exactly one page, 0x1000 bytes on my machine, at a time):
SRC=0x20000000
1. Write to SRC (call the associated page page1).
2. Syscall to copy SRC into 0x30000000 in the destination process. Now, source process address 0x20000000 and destination process address 0x30000000 point to the same page (page1).
3. Write something different to SRC (this should trigger a page fault to handle the COW). Assume the source address now points to page2.
4. Syscall to copy SRC into 0x30001000 in the destination process.
At this point, two separate pages should exist:
SRC 0x20000000 page2
DST 0x30000000 page1
DST 0x30001000 page2
I find that at step 3, when I write something different into src 0x20000000, no page fault is generated. Upon inspection, the actual page mappings are:
SRC 0x20000000 page1
DST 0x30000000 page1
DST 0x30001000 page1
In my code, if I call flush_tlb_page and pass the source address, the user code works as expected with the proper page mappings. So I am convinced I am not maintaining the TLB correctly. In copy_page_range, the kernel calls mmu_notifier_invalidate_range_start/end before and after it alters page tables. I am doing the exact same thing and have double-checked that I am passing the correct mm_struct and addresses to mmu_notifier_invalidate_range_start/end. Do these functions not handle flushing the TLB?
Ok, so literally as I finished typing this, I realized that the primary caller of copy_page_range, dup_mmap (kernel/fork.c), calls flush_tlb_mm. I am guessing I should call flush_cache_range and flush_tlb_range before and after my kernel code. Is this correct? What exactly do mmu_notifier_invalidate_range_start/end do?