Exception in cleaner thread when reading files from S3 in local Spark

Question

Following is in my class-path:

aws-java-sdk-1.7.4.jar
hadoop-aws-2.7.3.jar
spark-sql-2.2.0.jar

Following code works fine:

object MySparkJob {
  def main(args:Array[String]):Unit = {
    val conf = new SparkConf().setAppName("MySparkTest").setMaster("local[4]")
    val spark = new SparkContext(conf)
    ......all credentials config stuff here.......
    val input = spark.textFile("s3a://mybucket/access_2017-10-30.log_10.0.0.176.gz")
    val pageStats = input filter(_.contains("pagestat"))
    val parsedRecords = pageStats flatMap accessLogRecordToCaseClass
    val evIDs = parsedRecords map (_.evid)
    println("Size is " + evIDs.collect().toSet.size)
    spark.stop
  }
}

I run the job with sbt clean compile run.

But in the console I see following warning with the exception wrapped:

17/11/10 15:22:28 WARN FileSystem: exception in the cleaner thread but it will continue to run
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3063)
        at java.lang.Thread.run(Thread.java:748)

Though this is a warning I still would like to understand why it happens. Probably some of you encountered similar thing in the past and can advise?

stevel stevel · Accepted Answer · 2017-11-14T11:45:51

Don't worry; it's just a cleanup tread being told to stop in application shutdown. HADOOP-12829 explicitly toned down the warning message when interrupted to ""Cleaner thread interrupted, will stop"

Exception in cleaner thread when reading files from S3 in local Spark

1 Answers