[Update: looks like a aws glue get-table --database-name xyz --name tablename
will give me the raw materials for the table definition, so that's progress--just wondering if something exists that automatically assembles the pieces]
[Update 2: You can have Cloudtrail show an Athena table definition, without using glue or a glue crawler as discussed in this CloudTrail specific Athena documentation. It's still got some quirks (not defining some of the more complex data elements as structs), but it's better than what glue generates if pointed at cloudtrail.]
I've got a Glue cataloged S3 bucket with cloud trail logs. The JSON nests pretty deep. Down deep, there are some properties like encryptionContext:struct<aws\:cloudtrail\:arn:string,aws\:s3\:arn:string>
. Those colons in property names are throwing off Athena and causing queries to fail.
I've found out it would be easy to supply a mapping via Serde that could overcome this, but for that, I have to generate an external table definition for Athena, so that I can write a WITH SERDEPROPERTIES section in the table definition.
I'd like to auto-generate the Athena table definition (fairly complex) from the Glue catalog. Does anyone have pointers to such code or utility?
The approach to mapping the offending property names is described in this AWS Big Data Blog post, in the section "Walkthrough: Handling forbidden characters with mappings".
Here's an example of the error I'm currently getting when I do a simple select from the Athena table whose metadata is provided by Glue:
HIVE_METASTORE_ERROR: com.facebook.presto.spi.PrestoException: Error: : expected at the position 997 of 'struct,stackName:string,keyId:string,aggregateField:string,filter:struct,startTimes:array>>,host:array,bucketName:string,location:array,roleArn:string,roleSessionName:string,templateURL:string,encryption:array,parameters:array>,includeShared:boolean,topicArn:string,policyName:string,attributeName:string,attributeValue:string,protocol:string,endpoint:string,returnSubscriptionArn:boolean,name:string,attributes:struct,eventCategory:string,maxResults:int,notificationARNs:array,capabilities:array,tags:array,disableRollback:boolean,lookupAttributes:array>,policy:string,description:string,keyUsage:string,customerMasterKeySpec:string,origin:string,bypassPolicyLockoutSafetyCheck:boolean,aliasName:string,targetKeyId:string,trailName:string,encryptionContext:struct<aws:cloudtrail:arn:string,aws:s3:arn:string>,keySpec:string,trailNameList:array,includeShadowTrails:boolean,s3BucketName:string,s3KeyPrefix:string,snsTopicName:string,includeGlobalServiceEvents:boolean,isMultiRegionTrail:boolean,enableLogFileValidation:boolean,kmsKeyId:string,bucketPolicy:struct,Sid:string,Condition:struct>>>>,eventSelectors:array,excludeManagementEventSources:array>>,ServerSideEncryptionConfiguration:struct>>,tagging:array,Tagging:struct>>>,x-amz-acl:array,resourceIdList:array,logging:array,website:array,lifecycle:array,notification:array,versioning:array,publicAccessBlock:array,acl:array,cors:array,object-lock:array,requestPayment:array,replication:array,resourceArn:string,DescribeFlowLogsRequest:string>' but '\' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)