1
votes

I am new to SnappyData and I am trying to import a huge amount of data into it. The data is created by different sources, and each location stores its data as CSV files inside a zip file. Let's say the zips are named zip1, zip2, ... zipn, and each zip contains exactly the same files (header.csv, detail1.csv, detail2.csv, ... detail15.csv). Each .csv has the same structure across zips, meaning detail5.csv from zip1 has the same columns as detail5.csv from zipn. So my question is: how do I automate the import? Is there an import command for such a batch of data? The first time is easy because I use create external table, but how do I import the rest of the data? Or, better, how do I import all the data into a column table (because we have a lot of data) or a row table (because we can partition the data based on the location it comes from)?

1
I'll get this answered for you ASAP - plamb

1 Answer

0
votes

The fastest way to import CSV is to use the built-in Spark support for CSV in DataFrameReader. As far as I know, there is no built-in import command with the level of customization you need (many archives, each holding the same set of files). But you could easily run a snappy-job that selects the files with the same schema from within these archives and reads them in parallel using org.apache.spark.sql.DataFrameReader.csv.
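One possible way to do the "select files with the same schema" step is to unpack the archives first so that same-named CSVs land in the same directory. The sketch below is only an illustration using the Python standard library, not a SnappyData feature: the function name `extract_by_member` and the output layout `out_dir/<file name>/<zip name>.csv` are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def extract_by_member(zip_paths, out_dir):
    """Unpack each archive so that same-named CSVs end up in one directory.

    Layout produced (an assumption of this sketch):
        out_dir/<member stem>/<archive stem>.csv
    Returns a dict mapping member stem -> list of extracted file paths.
    """
    out_dir = Path(out_dir)
    grouped = {}
    for zip_path in map(Path, zip_paths):
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                member = Path(info.filename)
                # Skip directory entries and anything that is not a CSV.
                if info.is_dir() or member.suffix.lower() != ".csv":
                    continue
                # e.g. out_dir/detail5/ collects detail5.csv from every zip.
                target_dir = out_dir / member.stem
                target_dir.mkdir(parents=True, exist_ok=True)
                # Name the file after the archive so the source location survives.
                target = target_dir / f"{zip_path.stem}.csv"
                target.write_bytes(archive.read(info))
                grouped.setdefault(member.stem, []).append(target)
    return grouped
```

After this step, every directory such as `out_dir/detail5` holds files with one schema, so a single DataFrameReader.csv call on that directory can read them in parallel, and the resulting DataFrame can then be appended to the target table. Because each extracted file is named after its archive, the originating location can still be recovered from the file name if you want to use it as a partitioning column.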