
I have logs that resemble the following:

value1 value2 "value 3 with spaces" value4

With the following format configuration in the storage plugin:

  "formats": {
    "csv": {
      "type": "text",
      "delimiter": " "
    }
  }

delimiting by " " gives me the following columns:

columns[0] | columns[1] | columns[2] | columns[3] | columns[4] | columns[5] | columns[6]
value1     | value2     | value      | 3          | with       | spaces     | value4

What I'd like is:

columns[0] | columns[1] | columns[2]              | columns[3] 
value1     | value2     | value 3 with spaces     | value4
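For reference, the gap between the two results is the gap between plain splitting on every space and a quote-aware parser. The small Python sketch below (standard library only, not Drill code) reproduces both behaviours for the sample line:

```python
import csv

line = 'value1 value2 "value 3 with spaces" value4'

# Naive splitting on every space: the quoted field is broken into pieces,
# giving 7 columns (here the quote characters stay attached to the pieces).
naive = line.split(" ")

# Quote-aware parsing: csv.reader treats text between double quotes as a
# single field even though it contains the delimiter.
quote_aware = next(csv.reader([line], delimiter=" ", quotechar='"'))

print(naive)
print(quote_aware)
```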
There's a feature targeted for release soon (my educated guess is December) that should work for you: issues.apache.org/jira/browse/DRILL-3423 - catpaws
@catpaws is this resolved in 1.3? - Dev
Sorry, it's not in 1.3. The target for DRILL-3423 is 1.4. - catpaws

1 Answer


To my knowledge, there is no way to make Drill skip delimiters that appear inside a quoted value. However, if value 3 is the only field that can contain spaces, one workaround I can think of is:

  • structure your first query so that columns[3] is always the last, for example:

select columns[0], columns[1], columns[2], columns[4], columns[3] from dfs.`default`.`/path/to/your/file`;

  • use the CONCAT() function to rebuild value 3 from its pieces in a separate column.
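The idea behind this workaround can be sketched outside Drill. Assuming value 3 is the only field that may contain spaces, the first two columns and the last column are fixed, and everything in between belongs to value 3. The Python function below (rebuild_row is a hypothetical helper name, not part of Drill) does on one split row what the concatenation step in the query is meant to do. Keep in mind that in Drill the number of middle columns varies from row to row, so a query with a fixed list of columns[N] references only handles rows whose value 3 has that many words.

```python
def rebuild_row(tokens):
    """Rejoin the pieces of value 3 from a row that was split on every space.

    Assumption: exactly two leading fields, one trailing field, and only the
    third field can contain spaces.
    """
    if len(tokens) < 4:
        raise ValueError("expected at least 4 tokens, got %d" % len(tokens))
    head = tokens[:2]        # value1, value2
    middle = tokens[2:-1]    # every piece of value 3
    tail = tokens[-1]        # under the assumption, value4 is the last token
    # Join the pieces with a single space and drop the surrounding quotes.
    # Note: runs of several spaces inside the original value collapse to one.
    value3 = " ".join(middle).strip('"')
    return head + [value3, tail]


row = ["value1", "value2", '"value', "3", "with", 'spaces"', "value4"]
print(rebuild_row(row))
# ['value1', 'value2', 'value 3 with spaces', 'value4']
```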

Another workaround is to change the delimiter in the file before Drill reads it. Depending on where you are ingesting your data from, this may or may not be feasible.
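If you do control the files, the re-delimiting step can be a short preprocessing script. The sketch below assumes Python is available in your pipeline (it uses only the standard csv module, nothing Drill provides): it parses each line with a space as the delimiter and a double quote as the quote character, then writes it back out tab-separated, so a text format configured with "delimiter": "\t" should see four columns. convert_space_to_tab is a hypothetical name.

```python
import csv
import io


def convert_space_to_tab(src, dst):
    """Re-delimit space-separated, double-quoted records as tab-separated.

    src and dst are text file objects. A quoted field containing spaces is
    parsed as one value and written out as a single tab-separated column.
    """
    reader = csv.reader(src, delimiter=" ", quotechar='"')
    # Default QUOTE_MINIMAL quoting: a field is only quoted if it contains a
    # tab, a double quote, or a line break, so ordinary spaces pass through.
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Usage with in-memory files; for real files, open both with newline="".
src = io.StringIO('value1 value2 "value 3 with spaces" value4\n')
dst = io.StringIO()
convert_space_to_tab(src, dst)
print(repr(dst.getvalue()))
```

Tab is chosen here only because it is unlikely to appear inside the values; any character that never occurs in your data works the same way.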

Good luck! If you are looking for more on Drill, check out MapR's Community page on Drill, which has code examples that might be helpful: https://community.mapr.com/community/products/apache-drill