15
votes

I am trying to create an exe file using PyInstaller 3.2.1. As a test, I tried to build an exe from the following code:

import pandas as pd
print('hello world')

After a considerable amount of time (15+ minutes) I ended up with a dist folder of 620 MB and a build folder of 150 MB. I work on Windows using Python 3.5.2 |Anaconda custom (64-bit). It might be worth noting that the mkl files in the dist folder account for almost 300 MB. I run PyInstaller with 'pyinstaller.exe foo.py'. I tried using --exclude-module to exclude some dependencies but still ended up with huge files. Whether I use onefile or onedir makes no difference.

I am aware that the exe must contain some important files, but is it normal for the output to be almost 1 GB? If necessary, I can provide the warning log or anything else that could help solve the matter.

P.S. In parallel, my coworker created an exe from the same sample script and ended up with less than 100 MB. The difference is that he is not using Anaconda. Could that be the cause?

Any help will be appreciated.

6 Answers

15
votes

PyInstaller creates a big executable from conda packages and a small one from pip packages. From this simple Python code:

from pandas import DataFrame as df
print('h')

I get a 203 MB executable with the conda packages and a 30 MB executable with the pip packages. Still, conda is a nice replacement for plain virtualenv: I can develop with conda and Jupyter and create some mycode.py (a Jupyter notebook can be downloaded as a .py file into myfolder). My final solution is as follows. If you do not have it, install Miniconda, then open Anaconda Prompt from the Windows Start Menu:

    cd myfolder
    conda create -n exe python=3
    activate exe
    pip install pandas pyinstaller pypiwin32
    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > %CONDA_PREFIX%\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
    pyinstaller -F mycode.py

This creates a new environment named 'exe'. pypiwin32 is needed by PyInstaller but is not installed automatically, and hook-pandas.py is needed to build with pandas. Also, importing only submodules does not reduce the size of the executable, so I do not need this:

from pandas import DataFrame as df

but I can just use the usual code:

import pandas as pd

Also, non-ASCII (national) characters in paths can cause errors, so it is best to use a Windows user account with an English name for the development tools.
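The echo command above writes the hook into PyInstaller's own hooks folder. As a rough alternative sketch, the same one-line hook could be generated into a project-local folder and passed to PyInstaller with its --additional-hooks-dir option. The helper name write_hook and the folder name "hooks" are assumptions for illustration, not part of PyInstaller.

```python
import os


def write_hook(hooks_dir, package, hidden_imports):
    """Write a minimal PyInstaller hook file declaring hidden imports.

    `write_hook` is a hypothetical helper. It creates
    <hooks_dir>/hook-<package>.py containing a single `hiddenimports` line,
    the same content the echo command above produces.
    """
    os.makedirs(hooks_dir, exist_ok=True)
    hook_path = os.path.join(hooks_dir, "hook-{}.py".format(package))
    with open(hook_path, "w") as fh:
        fh.write("hiddenimports = {!r}\n".format(list(hidden_imports)))
    return hook_path
```

For example, after calling write_hook("hooks", "pandas", ["pandas._libs.tslibs.timedeltas"]), the build would be run as pyinstaller --additional-hooks-dir=hooks -F mycode.py.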

8
votes

This is probably because the Anaconda version of numpy is built using mkl.

If you want to reduce the size of the distributable, you could build in a separate virtual environment with the packages installed through pip instead of conda.
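To check how much of the output is really due to mkl, something like the following sketch can total the file sizes in the dist folder by file-name prefix. The function names and the default prefix tuple are assumptions for illustration; only the standard library is used.

```python
import os
from collections import defaultdict


def size_by_prefix(folder, prefixes=("mkl",)):
    """Sum file sizes under `folder`, grouped by file-name prefix.

    Files whose (lower-cased) name starts with none of `prefixes` are
    counted under "other". Returns a dict of group name -> size in bytes.
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            group = next(
                (p for p in prefixes if name.lower().startswith(p)), "other"
            )
            totals[group] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def format_report(totals):
    """Format the totals as text lines, largest group first."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        "{:<10} {:10.1f} MB".format(group, size / (1024 * 1024))
        for group, size in ordered
    )
```

Running print(format_report(size_by_prefix("dist"))) from the folder where pyinstaller was executed should show whether the mkl files dominate.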

6
votes

Here's a way to keep using conda and avoid mkl. Install numpy before installing pandas, using this alternate command:

    conda install -c conda-forge numpy

This avoids mkl and uses an OpenBLAS package in its place. There is a full explanation in this issue in the conda-forge numpy-feedstock GitHub repo.

4
votes

A simple solution while working with Anaconda:

- Make a new environment inside Anaconda Navigator. (The new environment is free of the large number of packages that cause the problem.)

- Open a terminal and use pip install to add the packages you need. (Make sure it is in the new environment.)

- Run pyinstaller.

I reduced my .exe from 300 MB to 30 MB.
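To make sure the terminal really is in the new, lean environment before running pip or pyinstaller, a quick check along these lines may help. It relies on conda environments having a conda-meta folder at their root; the helper names are hypothetical.

```python
import os
import sys
from importlib import metadata


def is_conda_env(prefix):
    """Return True if `prefix` looks like a conda environment.

    Conda environments contain a `conda-meta` directory at their root.
    """
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def installed_distributions():
    """Return the sorted names of packages visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") if dist.metadata is not None else None
        if name:
            names.add(name)
    return sorted(names)


def describe_environment():
    """Summarise the active environment: its path, type and package count."""
    return {
        "prefix": sys.prefix,
        "conda": is_conda_env(sys.prefix),
        "package_count": len(installed_distributions()),
    }
```

Calling print(describe_environment()) in the terminal shows which environment is active; a freshly created one should report only a small number of packages.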

2
votes

I have the Anaconda 3.5.5 build for Python on Windows 10 and was also getting excessively large executables using the Anaconda distribution.

I was able to correct this by doing the following:

  1. First create a virtual environment (forums suggest virtualenv, but this gave me problems, so I used venv instead)

    python -m venv C:/Python/NewEnv

This creates a virtual environment inside C:/Python/NewEnv with base python, pip and setuptools.

  2. Next switch to the newly created environment

    C:/Python/NewEnv/Scripts/activate

You'll know that the environment is different because your command prompt will be prefixed with the new environment name (NewEnv).

  3. Install numpy first, then scipy, then pandas

    pip install numpy==1.13.3
    pip install scipy==1.1.0
    pip install pandas==0.18.1
    pip install pypiwin32==223
    pip install pyinstaller==3.2

I had to use these versions: I tried different ones, but any later version of pandas gave me further issues.

  4. Once these have been installed you can compile your program

    C:/Python/NewEnv/Scripts/pyinstaller --onefile program.py

  5. This will create a .spec file. With this version of pandas and pyinstaller you'll need to modify it to add hidden imports, otherwise loading pandas from the executable will fail. (I'm not sure if there's a pyinstaller command to just create the spec file, but if there is, use that instead. See Amendment #1.)

There will be a hidden imports line inside the newly created .spec file:

    hiddenimports=[],

Change this to add pandas._libs.tslibs.timedeltas

    hiddenimports=['pandas._libs.tslibs.timedeltas'],

  6. Then you can compile your program again against the .spec file

    C:/Python/NewEnv/Scripts/pyinstaller --onefile program.spec
    

Note that this writes the program into whichever directory you are in, so change directories before executing pyinstaller.
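As a possible alternative to editing the .spec file by hand, the hidden import can be passed on the command line with PyInstaller's --hidden-import option. PyInstaller also ships a pyi-makespec command that writes a .spec file without building. The sketch below only assembles such an argument list; build_pyinstaller_args is a hypothetical helper name, not part of PyInstaller, and it has not been checked against the exact versions pinned above.

```python
def build_pyinstaller_args(script, hidden_imports=(), onefile=True):
    """Assemble a PyInstaller argument list (hypothetical helper).

    Each entry in `hidden_imports` becomes a `--hidden-import <module>` pair,
    which has the same effect as listing the module in `hiddenimports=[...]`
    inside the .spec file.
    """
    args = []
    if onefile:
        args.append("--onefile")
    for module in hidden_imports:
        args.extend(["--hidden-import", module])
    args.append(script)
    return args
```

For example, build_pyinstaller_args("program.py", ["pandas._libs.tslibs.timedeltas"]) gives the arguments for pyinstaller --onefile --hidden-import pandas._libs.tslibs.timedeltas program.py. The list can be handed to PyInstaller.__main__.run(...) from within the environment where pyinstaller is installed.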

Amendment #1: I see that it's possible to add a hook-pandas.py to the PyInstaller hooks. So after you install pyinstaller in the new environment, run

    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > C:\Python\NewEnv\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
-1
votes

You need a pure Python environment, not Anaconda.

Anaconda comes with too many unneeded packages. Install a new Python environment on another PC with as few packages as possible!

Then try pyinstaller again. With this method, pyinstaller reduced the file from 200 MB to 8 MB.

PS: If some packages are missing, you can pip install ...