Found https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html#develop-local-python. The glue-1.0 library is only compatible with Linux.

Have you tried setting it up on Windows? Did you encounter any issues? - Prabhakar Reddy

2 Answers

One thing you can try is to install Docker Desktop on Windows and then run the Glue container in it.

For help setting up Glue in Docker, follow this article:

https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1

UPDATE

This AWS blog post covers the following options on Windows:

  • Setting up the container to use Jupyter or Zeppelin notebooks
  • Setting up the Docker image with PyCharm Professional
  • Running against the CLI interpreter

My answer also uses Docker, but with openjdk:8 as the base image and a different approach from the one in the other answer.

Note: Some Docker commands may need to be changed to work on Windows. I don't have a Windows environment to test them.

Dockerfile

FROM openjdk:8

# Config for Glue-1.0
ENV GLUE_REPO=https://github.com/awslabs/aws-glue-libs.git
ENV SPARK_URL=https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz
ENV MAVEN_URL=https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz
ENV PYTHON_BIN=python3

# Install OS packages first so git is guaranteed to be present for the clone
RUN apt-get update && apt-get install -y awscli zip git tar ${PYTHON_BIN} ${PYTHON_BIN}-pip

# WORKDIR creates /glue if it does not exist; the repo is cloned to /glue/aws-glue-libs
WORKDIR /glue
RUN git clone -b glue-1.0 $GLUE_REPO

# ADD downloads remote URLs but does not unpack them, so extract explicitly
ADD ${MAVEN_URL} /tmp/maven.tar.gz
ADD ${SPARK_URL} /tmp/spark.tar.gz
RUN tar zxvf /tmp/maven.tar.gz -C ~/ && tar zxvf /tmp/spark.tar.gz -C ~/ && rm -rf /tmp/*

# Put Maven, Spark and the Glue scripts on the PATH for bash shells
RUN echo 'export SPARK_HOME="$(ls -d /root/*spark*)"; export MAVEN_HOME="$(ls -d /root/*maven*)"; export PATH="$PATH:$MAVEN_HOME/bin:$SPARK_HOME/bin:/glue/aws-glue-libs/bin"' >> ~/.bashrc
ENV PYSPARK_PYTHON=${PYTHON_BIN}

RUN pip3 install pytest boto3 moto

# A login shell reads ~/.profile, which on this Debian-based image sources ~/.bashrc,
# so mvn is on the PATH when glue-setup.sh downloads the Glue dependency jars
RUN bash -l -c 'bash /glue/aws-glue-libs/bin/glue-setup.sh'

Build image

docker build -t awsglue/dev-1.0 .

Create a container

docker run -it --name glue-1.0 awsglue/dev-1.0

I prefer to mount my source code directory into the container and keep the container running in a separate terminal, where I submit jobs or just use the shell. Choose whichever approach suits you. If your code uses the AWS SDK, you may also want to mount your AWS credentials directory. In that case, create the container with bind mounts instead:

docker run -it --mount src=C:\Users\username\.aws,target=/root/.aws,type=bind --mount src=C:\path\to\src,target=/glue/src,type=bind --name glue-1.0 awsglue/dev-1.0

Use the commands below to start or stop the container, or to open a shell in it:

docker start glue-1.0
docker stop glue-1.0
docker exec -it glue-1.0 /bin/bash

Once inside, use the following to start the Glue PySpark shell:

./aws-glue-libs/bin/gluepyspark

Or submit a job

./aws-glue-libs/bin/gluesparksubmit src/job_name.py
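The answer leaves src/job_name.py unspecified. As a quick smoke test of the container, a minimal job along these lines should work, assuming it is saved in the mounted source directory (so it appears as /glue/src/job_name.py) and submitted with gluesparksubmit from /glue. The file name, column names and sample rows are hypothetical; only GlueContext and the SparkSession it exposes come from the Glue and PySpark libraries.

```python
"""Hypothetical smoke-test job for the Glue 1.0 container (e.g. src/job_name.py).

Run inside the container with:
    ./aws-glue-libs/bin/gluesparksubmit src/job_name.py
"""


def sample_rows():
    # Hard-coded illustrative rows so the job needs no S3 or Data Catalog access.
    return [("alice", 1), ("bob", 2), ("carol", 3)]


def main():
    try:
        # Importable only where glue-setup.sh and gluesparksubmit have put
        # the awsglue package and PySpark on the Python path.
        from awsglue.context import GlueContext
        from pyspark.context import SparkContext
    except ImportError:
        print("awsglue/pyspark not found; submit this file with gluesparksubmit")
        return

    sc = SparkContext.getOrCreate()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session  # SparkSession wrapped by the GlueContext
    df = spark.createDataFrame(sample_rows(), ["name", "value"])
    df.show()
    print("row count:", df.count())
    sc.stop()


if __name__ == "__main__":
    main()
```

Outside the container, where awsglue is not importable, the script only prints a hint instead of failing with a traceback.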