I don't use RODBC, so I cannot test your exact problem, but I'll reproduce your situation and show that in my environment it does not fail.
Sample data:
library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000
library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
database = "mydb", server = "111.222.333.444,1433",
uid = "myuser", pwd = "mypassword")
(I'm not going to go into all of the options required for this.)
Manually-defined table
DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
Justifying pre-CreateTable
I find two possible error scenarios with SQL Server and DBI.
DBI::dbExecute(con, "drop table if exists r2test")
### also with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '
This is because SQL Server appears to not like scientific notation for field sizes. We can avoid this by changing scipen:
options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '
And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).
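The scipen behavior can be demonstrated without a database: the size in the generated DDL comes from R formatting the column width as a number, and by default R prefers the shorter scientific form.

```r
# How R renders 400000, which is what ends up in the generated CREATE TABLE:
format(400000)        # default, scipen = 0
# [1] "4e+05"
options(scipen = 99)  # penalize scientific notation
format(400000)
# [1] "400000"
```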
Pre-created table
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
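As an alternative to pre-creating the table, the odbc backend's dbWriteTable also accepts a field.types argument, so the column type can be forced in a single call; a sketch (I have not re-timed this):

```r
# Alternative (sketch): force the type of the large column in the
# dbWriteTable call itself via field.types, instead of dbCreateTable first.
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbWriteTable(con, "r2test", dat,
                  field.types = c(id = "int", chr = "nvarchar(max)"))
```

Unspecified columns still get their default inferred types; only the ones named in field.types are overridden.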
Note about "large fields"
When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,
DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
This is a long-standing issue with Microsoft's ODBC driver for SQL Server (driver "ODBC Driver 17 for SQL Server"); the problem has been around for years. No other driver (including FreeTDS, which also connects to SQL Server) is prone to this "feature". There is no indication that this will ever change; in fact, it's "formal" in the documentation, though "large" is not quantified.
I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might work with the SQLGetData function a little more intelligently, side-stepping the problem.
Ways to work around this problem:
- always put "large data" at the end of your list of selected columns;
- use FreeTDS instead of Microsoft's ODBC drivers; allegedly it is slightly slower (10%? I don't know), but I've successfully installed it on Windows/Linux and selected fields in crazy orders without problem;
- use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
- do all queries with "large data" using the bulk tools (bcp or sqlcmd); I believe they both deal better with it, though this is much less interactive than the R console;
- use RODBC (allegedly ... again, I don't know);
- don't use "large data" fields (...(max) or anything larger than ...(255) ... a not-well-defined number) ... perhaps not an option;
- RECENTLY, a PR (odbc!415) has finalized the ability to work around this large-field problem, so a GitHub install of the package (until it is released on CRAN) will work; or
- use a different DBMS than SQL Server ... perhaps not an option.
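If you must stay with Microsoft's driver, the first workaround can be wrapped in a small helper. This is a hypothetical function (not part of DBI or odbc) that selects caller-flagged "large" columns last and then restores the requested column order in R:

```r
# Hypothetical helper (a sketch, not tested against every server/driver):
# select the "large" columns last to avoid Invalid Descriptor Index,
# then restore the caller's requested column order in R.
query_large_last <- function(con, table, cols, large = character()) {
  ordered <- c(setdiff(cols, large), intersect(cols, large))
  sql <- sprintf(
    "select %s from %s",
    paste(DBI::dbQuoteIdentifier(con, ordered), collapse = ", "),
    DBI::dbQuoteIdentifier(con, table)
  )
  DBI::dbGetQuery(con, sql)[, cols, drop = FALSE]
}
# Usage: the caller asks for chr first, but it is fetched last internally.
# query_large_last(con, "r2test", c("chr", "id"), large = "chr")
```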
References:
In the odbc package, issue odbc/#10 appears to be the first appearance, and is referenced in all others in this package; related closed issues: #82, #86, #112, #171, #256, #331; and some recent discussion perhaps to work around this problem: #309, #358, #373.
Ultimately, odbc uses the nanodbc C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc/#149).
The source document at Microsoft: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
I use DBI and odbc with success (including fairly large nvarchar(max) fields), so perhaps that is an avenue for you to investigate. Otherwise, as joran mentioned in one of the comments on Alex's linked question, perhaps try the r-sig-db mailing list; I have no experience with it, though. – r2evans