0 votes

I am running Drill 1.15 in distributed mode on datanodes only (3 nodes with 32 GB of memory each). I am trying to read a Parquet file in HDFS that was generated by a Spark job.

The generated file reads just fine in Spark, but when reading it in Drill, all but a few columns fail with the error below.

org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Exception occurred while reading from disk. File: [file_name].parquet Column: Line Row Group Start: 111831 File: [file_name].parquet Column: Line Row Group Start: 111831 Fragment 0:0 [Error Id: [Error_id] on [host]:31010]

In the Drill config for dfs, I have the default config for the parquet format.
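For reference, this is roughly what the stock parquet entry in the dfs storage plugin looks like (a sketch of the default configuration; the full plugin JSON contains other formats and workspaces as well):

```json
{
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
```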

I am trying to run a simple query:

select * from dfs.`/hdfs/path/to/parquet/file.parquet`
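Since the error names the Line column, one way to narrow the failure down is to select columns individually instead of using *, for example (a hypothetical probe, using the column from the error message):

```sql
-- probe the column named in the DATA_READ ERROR on its own
SELECT `Line`
FROM dfs.`/hdfs/path/to/parquet/file.parquet`
LIMIT 10;
```

Repeating this per column would show whether the failure is limited to specific columns or their encodings.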

The file size is also only in the tens of MBs, not a lot.

I am using Spark 2.3 to generate the parquet file and Drill 1.15 to read it.

Is there any config I am missing, or some other point?

1
That's an interesting question, but not that valuable unless you can provide a minimal reproducible example. – user10465355
@user10465355 I have added a query sample and node information. Is there any other specific detail that you are looking for? I can definitely provide it. – Avik Aggarwal

1 Answer

1 vote

Looks like a bug.
Please create a Jira ticket and provide file.parquet and the log files.
Thanks