I don't use RODBC, so I cannot test your exact problem, but I'll reproduce your situation and show that in my environment it does not fail.
Sample data:
library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000
library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
database = "mydb", server = "111.222.333.444,1433",
uid = "myuser", pwd = "mypassword")
(I'm not going to go into all of the options required for this.)
Manually-defined table
DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
Justifying pre-CreateTable
I find two possible error scenarios with SQL Server and DBI.
DBI::dbExecute(con, "drop table if exists r2test")
### also with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '
This is because SQL Server appears to not like scientific notation for field sizes. We can avoid this by changing scipen:
options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '
And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).
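The scipen behavior can be demonstrated without a database: the size in the generated DDL comes from R formatting the column width as a number, and by default R prefers the shorter scientific form.

```r
# How R renders 400000, which is what ends up in the generated CREATE TABLE:
format(400000)        # default, scipen = 0
# [1] "4e+05"
options(scipen = 99)  # penalize scientific notation
format(400000)
# [1] "400000"
```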
Pre-created table
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
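As an alternative to pre-creating the table, the odbc backend's dbWriteTable also accepts a field.types argument, so the column type can be forced in a single call; a sketch (I have not re-timed this):

```r
# Alternative (sketch): force the type of the large column in the
# dbWriteTable call itself via field.types, instead of dbCreateTable first.
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbWriteTable(con, "r2test", dat,
                  field.types = c(id = "int", chr = "nvarchar(max)"))
```

Unspecified columns still get their default inferred types; only the ones named in field.types are overridden.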
Note about "large fields"
When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,
DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
This is a long-standing issue with Microsoft's ODBC driver for SQL Server (driver "ODBC Driver 17 for SQL Server"); the problem has been around for years. No other driver (including FreeTDS, which also connects to SQL Server) is prone to this "feature". There is no indication that this will ever change; in fact, it's "formal" in the documentation, though "large" is not quantified.
I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might work with the SQLGetData function a little more intelligently, side-stepping the problem.
Ways to work around this problem:
- always put "large data" at the end of your list of selected columns;
- use FreeTDS instead of Microsoft's ODBC drivers; allegedly it is slightly slower (10%? I don't know), but I've successfully installed it on Windows/Linux and selected fields in crazy orders without problem;
- use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
- do all queries with "large data" using the bulk tools (bcp or sqlcmd); I believe they both deal better with it, though this is much less interactive than the R console;
- use RODBC (allegedly ... again, I don't know);
- don't use "large data" fields (...(max) or anything larger than ...(255) ... a not-well-defined number) ... perhaps not an option;
- RECENTLY, a PR (odbc!415) has finalized the ability to work around this large-field problem, so a GitHub install of the package (until it is released on CRAN) will work; or
- use a different DBMS than SQL Server ... perhaps not an option.
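If you must stay with Microsoft's driver, the first workaround can be wrapped in a small helper. This is a hypothetical function (not part of DBI or odbc) that selects caller-flagged "large" columns last and then restores the requested column order in R:

```r
# Hypothetical helper (a sketch, not tested against every server/driver):
# select the "large" columns last to avoid Invalid Descriptor Index,
# then restore the caller's requested column order in R.
query_large_last <- function(con, table, cols, large = character()) {
  ordered <- c(setdiff(cols, large), intersect(cols, large))
  sql <- sprintf(
    "select %s from %s",
    paste(DBI::dbQuoteIdentifier(con, ordered), collapse = ", "),
    DBI::dbQuoteIdentifier(con, table)
  )
  DBI::dbGetQuery(con, sql)[, cols, drop = FALSE]
}
# Usage: the caller asks for chr first, but it is fetched last internally.
# query_large_last(con, "r2test", c("chr", "id"), large = "chr")
```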
References:
In the odbc package, issue odbc/#10 appears to be the first appearance, and is referenced in all others in this package; related closed issues: #82, #86, #112, #171, #256, #331; and some recent discussion perhaps to work around this problem: #309, #358, #373.
Ultimately, odbc uses the nanodbc C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc/#149).
The source document at Microsoft: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
I use DBI and odbc with success (including fairly large nvarchar(max) fields), so perhaps that is an avenue for you to investigate. Otherwise, as joran mentioned in one of the comments on Alex's linked question, perhaps try the r-sig-db mailing list; I have no experience with it, though. – r2evans