Found this in the AWS documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html#develop-local-python. The glue-1.0 libraries are only compatible with Linux.
2 Answers
One thing you can try is to install Docker Desktop on Windows and run the Glue Docker container there.
If you want help setting up Glue on Docker, follow this article:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1
UPDATE
This AWS blog post covers the following options on Windows:
- Setting up the container to use Jupyter or Zeppelin notebooks
- Setting up the Docker image with PyCharm Professional
- Running against the CLI interpreter
My answer also uses Docker, but with openjdk:8 as the base image and a different approach from the one in the other answer.
Note: Some Docker commands may need to be changed to work on Windows. I don't have a Windows environment to test them.
Dockerfile
FROM openjdk:8
# Config for Glue-1.0
ENV GLUE_REPO=https://github.com/awslabs/aws-glue-libs.git
ENV SPARK_URL=https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz
ENV MAVEN_URL=https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz
ENV PYTHON_BIN=python3
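# Install tooling before cloning so the build does not depend on git being in the base image
RUN apt-get update && apt-get install -y awscli zip git tar ${PYTHON_BIN} ${PYTHON_BIN}-pip
# WORKDIR creates /glue if it does not exist
WORKDIR /glue
RUN git clone -b glue-1.0 $GLUE_REPO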
ADD ${MAVEN_URL} /tmp/maven.tar.gz
ADD ${SPARK_URL} /tmp/spark.tar.gz
RUN tar zxvf /tmp/maven.tar.gz -C ~/ && tar zxvf /tmp/spark.tar.gz -C ~/ && rm -rf /tmp/*
RUN echo 'export SPARK_HOME="$(ls -d /root/*spark*)"; export MAVEN_HOME="$(ls -d /root/*maven*)"; export PATH="$PATH:$MAVEN_HOME/bin:$SPARK_HOME/bin:/glue/bin"' >> ~/.bashrc
ENV PYSPARK_PYTHON=${PYTHON_BIN}
RUN pip3 install pytest boto3 moto
RUN bash -l -c 'bash ~/.profile && bash /glue/aws-glue-libs/bin/glue-setup.sh'
Build image
docker build -t awsglue/dev-1.0 .
Create a container
docker run -it --name glue-1.0 awsglue/dev-1.0
I prefer to mount my source code directory into the container and keep the container running in a separate terminal, so I can submit jobs or just use the shell. Choose whichever approach suits you. Optionally, if your code uses the AWS SDK, you may also want to mount your credentials directory.
docker run -it --mount src=C:\Users\username\.aws,target=/root/.aws,type=bind --mount src=C:\path\to\src,target=/glue/src,type=bind --name glue-1.0 awsglue/dev-1.0
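If you create the container from a script, the mount arguments can be assembled programmatically. The following is only a sketch: `build_run_command` is a hypothetical helper name, and the paths are the placeholders from the command above. It only builds the argument list and does not call Docker itself.

```python
def build_run_command(src_dir, aws_dir=None, name="glue-1.0", image="awsglue/dev-1.0"):
    """Return the `docker run` argument list for the Glue dev container.

    src_dir: host directory with the job sources, mounted at /glue/src.
    aws_dir: optional host credentials directory, mounted at /root/.aws.
    """
    cmd = ["docker", "run", "-it"]
    if aws_dir is not None:
        # Optional: expose host AWS credentials to the SDK inside the container
        cmd += ["--mount", f"type=bind,src={aws_dir},target=/root/.aws"]
    cmd += ["--mount", f"type=bind,src={src_dir},target=/glue/src"]
    cmd += ["--name", name, image]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories
    args = build_run_command(r"C:\path\to\src", aws_dir=r"C:\Users\username\.aws")
    print(" ".join(args))
```

Passing the list to `subprocess.run` avoids shell quoting problems, although paths containing commas would still need care because `--mount` uses commas as field separators.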
Use the commands below to start, stop or exec into the container
docker start glue-1.0
docker stop glue-1.0
docker exec -it glue-1.0 /bin/bash
Once inside, start the Glue PySpark shell with
./aws-glue-libs/bin/gluepyspark
Or submit a job
./aws-glue-libs/bin/gluesparksubmit src/job_name.py
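If you would rather submit a job from the host without opening a shell in the container first, the host path has to be translated to the mounted path under /glue/src. Here is a rough sketch of that mapping. The helper names `container_path` and `build_submit_command` are hypothetical, and the use of a login shell (`bash -l`) to pick up the variables exported in ~/.bashrc is an assumption I have not tested.

```python
import shlex
from pathlib import PurePosixPath, PureWindowsPath


def container_path(host_file, host_src, target="/glue/src"):
    """Map a file under the mounted host directory to its path in the container."""
    # PureWindowsPath accepts both backslashes and forward slashes
    relative = PureWindowsPath(host_file).relative_to(PureWindowsPath(host_src))
    return str(PurePosixPath(target).joinpath(*relative.parts))


def build_submit_command(host_file, host_src, name="glue-1.0"):
    """Return a `docker exec` argument list that submits the job script."""
    script = shlex.quote(container_path(host_file, host_src))
    # Assumption: a login shell is needed so SPARK_HOME and PATH from ~/.bashrc are set
    return ["docker", "exec", "-it", name, "bash", "-l", "-c",
            f"/glue/aws-glue-libs/bin/gluesparksubmit {script}"]


if __name__ == "__main__":
    # Placeholder paths matching the mount example
    print(build_submit_command(r"C:\path\to\src\job_name.py", r"C:\path\to\src"))
```

The container must already be running (`docker start glue-1.0`) for `docker exec` to work.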