I am using Hive bundled with Spark. My Spark Streaming job writes 250 Parquet files to HDFS per batch, named like /hdfs/nodes/part-r-$partition_num-$job_hash.gz.parquet, so after one batch I have 250 files in HDFS, after two I have 500, and so on. My external Hive table, created with the Parquet storage format, points at /hdfs/nodes for its location, but it does not pick up the data in the new files after I rerun the program.
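For reference, here is a minimal sketch of my setup, assuming Spark 1.x's HiveContext; the table name and columns (nodes, id, value) are placeholders, not my real schema:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc is the existing SparkContext

// External table created once, pointing at the directory the streaming job writes to.
hiveContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS nodes (
    id STRING,
    value DOUBLE
  )
  STORED AS PARQUET
  LOCATION '/hdfs/nodes'
""")

// Inside each streaming batch, new Parquet part files are appended to the same path:
// batchDF.write.mode("append").parquet("/hdfs/nodes")
```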
Do Hive external tables pick up new files added to the table's location, or only changes to files that already existed when the table was created?
Also see my related question about automatically updating tables using Hive.