11 votes

A memory error occurs in Amazon SageMaker when preprocessing 2 GB of data stored in S3. Loading the data is no problem. The dataset has 7 million rows and 64 columns. One-hot encoding is also not possible: doing so results in a memory error. The notebook instance is ml.t2.medium. How can I solve this issue?
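For illustration, the preprocessing step that fails looks roughly like this (a minimal sketch; the S3 path and column names are placeholders, not the actual ones):

    import pandas as pd

    # Placeholder path; reading from s3:// requires the s3fs package.
    df = pd.read_csv("s3://my-bucket/data.csv")  # ~7M rows x 64 columns, loads fine

    # Dense one-hot encoding: pd.get_dummies materializes a full dense frame,
    # which can be many times larger than the 2 GB input.
    encoded = pd.get_dummies(df, columns=["cat_col_1", "cat_col_2"])  # MemoryError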

I ran into a similar problem. I opened a terminal (via Jupyter) on the same SageMaker machine, and there is plenty of memory, both RAM and disk (checked with free and df). It looks like a bug: everything works fine in the terminal, and I can allocate memory from there (e.g., by creating large objects in a Python REPL). - Tyler
Was a solution ever found for this? I'm running into it over a year later. - Valevalorin

2 Answers

3 votes

I assume you're processing the data on the notebook instance itself, right? A t2.medium has only 4 GB of RAM, so it's quite possible you're simply running out of memory.

Have you tried a larger instance? The specs are here: https://aws.amazon.com/sagemaker/pricing/instance-types/
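If a larger instance isn't an option, here is a sketch of two things that may help (psutil may need a pip install, and the S3 path and column names are placeholders): first confirm how much RAM is actually free, then use scikit-learn's OneHotEncoder, whose output is a scipy sparse matrix by default, instead of a dense pd.get_dummies frame:

    import pandas as pd
    import psutil
    from sklearn.preprocessing import OneHotEncoder

    # Confirm how much RAM is actually available before preprocessing.
    mem = psutil.virtual_memory()
    print(f"available: {mem.available / 1e9:.2f} GB of {mem.total / 1e9:.2f} GB")

    df = pd.read_csv("s3://my-bucket/data.csv")  # placeholder path
    cat_cols = ["cat_col_1", "cat_col_2"]        # placeholder column names

    # OneHotEncoder returns a sparse matrix by default, so memory grows with
    # the number of non-zeros rather than rows x total categories.
    encoder = OneHotEncoder(handle_unknown="ignore")
    encoded = encoder.fit_transform(df[cat_cols])  # scipy.sparse matrix

Keep the result sparse downstream; calling .toarray() on 7 million rows would reintroduce the blow-up.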

0 votes

Could you open a post with your question on the AWS SageMaker forum (https://forums.aws.amazon.com/forum.jspa?forumID=285)? That way, the SageMaker team will be able to help you out.