1
votes

I have some questions about the .xdf file format:

  1. What exactly is it?
  2. How does this type of file work?
  3. How does Microsoft R work with this type of file?
  4. What are its advantages over data.frames?

I'm really looking forward to your answers.

Greetings R123456789

1
I have read this already, but the manual doesn't answer my questions, and these questions are very important for me to understand programming with Microsoft R. - user43348044
That link answers all of your questions: 1. It is a compressed data file. 2, 3. You work with it through the package's import/export functions. 4. Unlike data frames, it doesn't sit in memory. - zx8754
Thanks for your answer. So XDF files are stored like normal files in a folder, either on the SQL Server or locally, while data.frames are "stored" in memory? Does that mean that with XDF files I have no memory limitation? - user43348044

1 Answer

6
votes
  1. An XDF file is a compressed binary file format with user-selectable levels of compression. Some quick facts can be found here: https://support.microsoft.com/en-us/help/3104260/qa-what-is-the-.xdf-file-format XDF files come in two forms, standalone and composite. A standalone XDF file is a single file stored on disk with the .xdf extension. A composite XDF file is represented by a directory that contains metadata and data subdirectories; the metadata and data files in those directories are split and individually compressed as XDF part files.

  2. It is a proprietary implementation inside Microsoft R Server. I can expand on this answer, but I would need you to refine the question "How does this type of file work?"

  3. An XDF file is stored on disk and does not sit in memory. RxXdfData() creates a data source object that points to the file without loading its data. Reading the file into a data frame, for example with rxImport() or rxDataStep() called without an output file, decompresses the data and places it in memory. Many Microsoft R "rx" functions can take a path to an XDF file directly as a data source or sink, and will manage reading segments into memory as required.

  4. The advantage of using XDF as a data source or sink is that Microsoft R Server does not need to buffer the entire file into memory to work with it. It allows partial reads and writes, and compression reduces the disk space used. It operates faster than reading from or writing to flat files because metadata is used to index the XDF. The disadvantage is primarily performance: data held in memory (data.frames) will be faster to operate on than data on disk in all cases.
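
To make points 3 and 4 more concrete, here is a minimal sketch. It assumes Microsoft R Client or Microsoft R Server with the RevoScaleR package available. The file name, the example data, and the compression level are made up for illustration and are not from the question.

```r
# Assumption: RevoScaleR ships with Microsoft R Client / Microsoft R Server.
library(RevoScaleR)

# Hypothetical example data and output path, for illustration only.
df <- data.frame(id = 1:1000, value = rnorm(1000))
xdfPath <- file.path(tempdir(), "example.xdf")

# Write the data frame to a standalone XDF file on disk.
# xdfCompressionLevel is the user-selectable compression level.
rxDataStep(inData = df, outFile = xdfPath, overwrite = TRUE,
           xdfCompressionLevel = 5)

# Create a data source object that points at the file.
# This does not load the data into memory.
xdf <- RxXdfData(xdfPath)

# Read only the file's metadata (row count, variable information).
info <- rxGetInfo(xdf, getVarInfo = TRUE)
print(info$numRows)

# "rx" functions accept the XDF data source directly and read it
# in chunks instead of loading the whole file at once.
smry <- rxSummary(~ value, data = xdf)
print(smry$sDataFrame)

# Pull a subset into memory as an ordinary data frame
# (no outFile given, so a data frame is returned).
firstRows <- rxDataStep(inData = xdf, rowSelection = id <= 10)
print(nrow(firstRows))
```

Only the last call materializes rows as a data.frame. The summary is computed from the file on disk, which is what lets XDF handle data sets larger than the available memory.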

Note: As with all files, the underlying operating system controls when a file is written from memory to disk. For the purpose of your question, you can assume that the XDF file resides on disk as a standard file.