
I started playing with streaming on my Databricks Community Edition, but after a few minutes of producing test events I ran into a problem. I believe it is somehow connected to temporary small files produced during the streaming process. I would like to find and remove them, but I can't find where they are stored. My exception is:

com.databricks.api.base.DatabricksServiceException: QUOTA_EXCEEDED: You have exceeded the maximum number of allowed files on Databricks Community Edition. To ensure free access, you are limited to 10000 files and 10 GB of storage in DBFS. Please use dbutils.fs to list and clean up files to restore service. You may have to wait a few minutes after cleaning up the files for the quota to be refreshed. (Files found: 11492);

I have tried running a shell script to count the files in each folder, but unfortunately I cannot find anything suspicious: mostly lib, usr, and other folders containing system or Python files are there, and I cannot find anything that could have been produced by my streaming job. This is the script I use:

# Count regular files under each directory one or two levels below /
find / -maxdepth 2 -mindepth 1 -type d | while IFS= read -r dir; do
  printf "%-25.25s : " "$dir"
  find "$dir" -type f | wc -l
done

Where can I find the cause of the "too many files" problem? Maybe it's not connected to streaming at all?

To be clear, I have not uploaded many custom files to /FileStore.


1 Answer


It looks like you have only checked for files on the local filesystem and not DBFS itself. You can take a look at DBFS by running the following cell in a Databricks notebook:

%fs
ls /

or:

%python
dbutils.fs.ls("/")

You could check for files there and remove them with dbutils.fs.rm or %fs rm. Also take a look at the /tmp folder on DBFS and delete any files there.
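If the 10,000 files are hiding somewhere under DBFS, a per-directory count can pinpoint them. Here is a minimal sketch, assuming the listed entries behave like the FileInfo objects returned by dbutils.fs.ls (a `.path` attribute and an `.isDir()` method); the lister is passed in as a parameter, so in a notebook you would call it with `dbutils.fs.ls`:

```python
def count_files(path, ls):
    """Recursively count files under `path`.

    `ls` is a listing function; in a Databricks notebook pass
    dbutils.fs.ls, whose FileInfo entries expose `.path` and `.isDir()`.
    """
    total = 0
    for entry in ls(path):
        if entry.isDir():
            # Descend into subdirectories and accumulate their counts.
            total += count_files(entry.path, ls)
        else:
            total += 1
    return total

# In a notebook (assumption: dbutils is available):
# for entry in dbutils.fs.ls("/"):
#     if entry.isDir():
#         print(entry.path, count_files(entry.path, dbutils.fs.ls))
```

Once the offending directory is found (streaming checkpoints often live under whatever checkpointLocation the query was configured with), it can be removed with dbutils.fs.rm(path, recurse=True).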