Although there is no way to do what I asked directly, as @hadley said, there is a different way to look at the problem. What took me some time to understand is that dplyr treats each connection as a different source (or `src` for short). There is one `src` that is always there: the collection of objects in memory.
So in my question I was actually dealing with three sources:
- Objects in memory
- Database_1
- Database_2
Any join between two of these three sources requires permission for dplyr to create temporary tables and insert into them; any join within a single source is not an issue (the sketch just below illustrates the difference). So there are (at least) three possible ways to deal with this when you cannot get additional permissions to the databases.
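A minimal sketch of that difference, assuming `Database_1` and `Database_2` are the two open connections from my question and the table and column names are only illustrative:

```r
library(dplyr)

# Two lazy table references inside the same src (Database_1)
sites_db1   <- tbl(Database_1, "site_table")
samples_db1 <- tbl(Database_1, "sample_table")

# Join within one src: translated to SQL and executed inside Database_1,
# so no extra permissions are needed
within_src <- left_join(sites_db1, samples_db1, by = "site_id")

# Join between two srcs: dplyr must copy one table across as a temporary
# table (copy = TRUE), which is exactly what fails without permission to
# create and fill temporary tables
sites_db2 <- tbl(Database_2, "site_table")
# cross_src <- left_join(sites_db1, sites_db2, by = "site_id", copy = TRUE)
```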
1. Load everything into memory
If you pull all the objects you need from both databases into memory, you bring them into a shared `src` and will be able to do any joins on them. dplyr offers the `collect()` function to do so:
db_table_of_interest <- tbl(Database_1, "table_of_interest")
df_table_of_interest <- collect(db_table_of_interest)
This in fact turns the table into a local data frame. You will need to do it for every table of interest, in every database of interest. It works fine as long as you have enough RAM to hold all the data you are reading.
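For example, a rough sketch of the whole approach, again assuming `Database_1` and `Database_2` are the open connections and the table and column names are made up for illustration:

```r
library(dplyr)

# Pull both tables fully into local data frames
species_local <- collect(tbl(Database_1, "sites_by_species"))
environ_local <- collect(tbl(Database_2, "sites_by_environment"))

# Both now live in the in-memory src, so any join works without
# touching the databases again
combined <- inner_join(species_local, environ_local, by = "site_id")
```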
2. Copy identifiers between databases
If (as was our case) you need to join on record identifiers (like site or sample IDs) between two databases that store the same sites under different IDs, then the easiest way is to duplicate the IDs in a table in both databases. So instead of keeping an external sheet where you list the IDs of each site in each database and set up the matches, add one column to the site_table in each database with a reference to the other.
As long as you do not need to set up joined tables with the actual data, you will be fine: for example, when you need to read in a sites-by-species table and a sites-by-environment table, where the species and environment data are stored in separate databases that use different IDs for each site.
If you do need to join tables, then at least you will be able to pre-select in both databases before doing a `collect()` and reading everything into memory. That keeps the memory requirements down when there is a lot of data to read; see the sketch below.
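As an illustration, assuming the species table in Database_2 carries an extra column (here called site_id_db1) with the matching Database_1 site IDs, as described above; all other names are illustrative:

```r
library(dplyr)

wanted_sites <- c("A01", "A02", "B17")   # the sites you actually need

# Filtering and selecting are translated to SQL and run inside each
# database; only the reduced results are collected into memory
species_small <- tbl(Database_2, "sites_by_species") %>%
  filter(site_id_db1 %in% wanted_sites) %>%
  select(site_id_db1, species, abundance) %>%
  collect()

environ_small <- tbl(Database_1, "sites_by_environment") %>%
  filter(site_id %in% wanted_sites) %>%
  collect()

# The join itself then happens locally, on much smaller tables
combined <- inner_join(species_small, environ_small,
                       by = c("site_id_db1" = "site_id"))
```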
3. Copy everything into one analysis database
If for some reason more memory is not an option, then in theory you could copy all the data into a single database first. I cannot imagine this being viable unless there are only a few tables of interest, so that merging them into a new joint database is feasible; perhaps in cases where the data set is very large and does not change.
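If you do go that way, a minimal sketch could look like this, writing everything into a local SQLite file with dplyr's `copy_to()`; the analysis connection and the table names are again illustrative:

```r
library(dplyr)
library(DBI)

# One local database that will hold copies of all tables of interest
con_analysis <- dbConnect(RSQLite::SQLite(), "analysis.sqlite")

# Copy the tables of interest out of each source database
for (nm in c("site_table", "sites_by_species")) {
  copy_to(con_analysis, collect(tbl(Database_1, nm)), name = nm, temporary = FALSE)
}
copy_to(con_analysis, collect(tbl(Database_2, "sites_by_environment")),
        name = "sites_by_environment", temporary = FALSE)

# Every table now sits in one src, so joins need no extra permissions
```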
If there is an option I have missed in this list, I would be happy to know what other solutions you may have found.