Hive tables contain the structured data stored in files under the HDFS folder given in the Hive table creation command.
With Cygnus 0.1, such structured data is achieved by means of CSV-like files; thus, adding a new file to the HDFS folder, or appending new data to an already existing file within that folder, is as easy as composing new CSV-like lines of data. The separator character must be the same one you specified when creating the table, e.g.:
create external table <table_name> (recvTimeTs bigint, recvTime string, entityId string, entityType string, attrName string, attrType string, attrValue string) row format delimited fields terminated by '|' location '/user/<myusername>/<mydataset>/';
Thus, with | as the separator in the example above, the new data lines must look like:
<ts>|<ts_ms>|<entity_name>|<entity_type>|<attribute_name>|<attribute_type>|<value>
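As an illustration, such a line could be composed as in the following minimal Python sketch; the field values are hypothetical, and actually writing the line to HDFS (e.g. via WebHDFS) is out of scope here:

```python
# Minimal sketch: compose a CSV-like data line matching the Hive table above.
# The sample field values are illustrative, not real Cygnus output.

def compose_csv_line(fields, separator="|"):
    """Join the field values with the separator used at table creation time."""
    return separator.join(str(f) for f in fields)

line = compose_csv_line([
    1393512381,                # recvTimeTs
    "2014-02-27T14:46:21",     # recvTime
    "Room1",                   # entityId
    "Room",                    # entityType
    "temperature",             # attrName
    "centigrade",              # attrType
    "26.5",                    # attrValue
])
print(line)
# 1393512381|2014-02-27T14:46:21|Room1|Room|temperature|centigrade|26.5
```

Note that the separator passed to compose_csv_line must match the one declared in the fields terminated by clause of the table.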
From Cygnus 0.2 (inclusive), the structured data is achieved by means of JSON files. In this case you do not have to deal with separators, nor with table creation (see this question), since JSON does not use separators and the table creation is automatic. You have to compose a new file, or new data to be appended to an already existing file, following one of these formats (depending on whether you are storing the data in row or column mode, respectively):
{"recvTimeTs":"13453464536", "recvTime":"2014-02-27T14:46:21", "entityId":"Room1", "entityType":"Room", "attrName":"temperature", "attrType":"centigrade", "attrValue":"26.5", "attrMd":[{"name":"ID", "type":"string", "value":"ground"}]}
{"recvTime":"2014-02-27T14:46:21", "temperature":"26.5", "temperature_md":[{"name":"ID", "type":"string", "value":"ground"}]}
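As an illustration, a row-mode record like the one above can be built with a JSON library rather than by string concatenation, which guarantees correctly quoted keys and values. A minimal Python sketch (the sample values are hypothetical):

```python
import json

# Minimal sketch: build a row-mode record like the one above. Using json.dumps
# guarantees valid JSON quoting and escaping of all keys and values.
record = {
    "recvTimeTs": "13453464536",
    "recvTime": "2014-02-27T14:46:21",
    "entityId": "Room1",
    "entityType": "Room",
    "attrName": "temperature",
    "attrType": "centigrade",
    "attrValue": "26.5",
    "attrMd": [{"name": "ID", "type": "string", "value": "ground"}],
}

# One JSON document per line, ready to be appended to the file in HDFS.
print(json.dumps(record))
```

The same approach works for column-mode records; only the keys change.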
It is worth mentioning that there exist scripts for migrating data from the 0.1-like format into the 0.2-like (or higher) format.