Which are the best/typical use cases for each of them? Some documents say a Python shell job is suitable for simple jobs whereas a Spark job is for more complicated ones; is that correct?
AWS Glue is a managed service from AWS for quickly developing ETL jobs.
IMHO, development is very fast if you already know what needs to happen in your ETL pipeline.
Glue has three components: Discover, Develop, and Deploy.
In the Discover phase, automatic crawling (running or scheduling a crawler repeatedly) is the key feature that differentiates Glue from the other tools I have seen: a crawler infers the schema of your data and registers it as tables in the Glue Data Catalog.
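For example, a crawler can be created and started with a couple of boto3 calls. This is a minimal sketch; the crawler name, IAM role, database, and S3 path below are all placeholders:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create a crawler that scans an S3 prefix and registers tables
# in the Glue Data Catalog. Names and the IAM role are placeholders.
glue.create_crawler(
    Name="sales-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="sales_db",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/raw/sales/"}]},
    Schedule="cron(0 2 * * ? *)",  # optional: run daily at 02:00 UTC
)

# Run it on demand as well.
glue.start_crawler(Name="sales-data-crawler")
```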
Glue also has built-in integration with other AWS ecosystem services, whereas with plain Spark you have to set that up yourself.
Typical use cases of AWS Glue could be:
1) Loading data from data warehouses.
2) Building a data lake on Amazon S3 (see the sketch after this list).
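As an illustration of the data-lake use case, a minimal Glue PySpark job could read a table that a crawler registered in the Data Catalog and write it to S3 as Parquet. The database, table, and bucket names here are made up:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table that a crawler registered in the Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_sales"
)

# Write it back to S3 as Parquet, forming the data-lake layer.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/lake/sales/"},
    format="parquet",
)

job.commit()
```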
See this AWS presentation for more insight.
A custom Spark job can do the same things, but it has to be developed from scratch, and it has no built-in automatic crawling feature.
On the other hand, if you develop a Spark job for ETL yourself, you get fine-grained control for implementing complicated jobs, as the sketch below shows.
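For comparison, here is a hedged sketch of a standalone PySpark job, where every transformation step is under your control. The paths and column names are purely illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("custom-etl").getOrCreate()

# Read raw data; the path and columns are illustrative only.
orders = spark.read.parquet("s3://my-bucket/raw/orders/")

# Arbitrary, fine-grained transformations: filtering, derived
# columns, aggregation -- anything Spark supports.
daily_revenue = (
    orders.filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

daily_revenue.write.mode("overwrite").parquet(
    "s3://my-bucket/curated/daily_revenue/"
)
spark.stop()
```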
Both Glue and Spark share the same goal for ETL. AFAIK, Glue is best for simple jobs such as loading from a source to a destination, whereas a Spark job can perform a wide variety of transformations in a controlled way.
Conclusion: for simple ETL use cases (which can be done without much development experience), go with Glue. For customized ETL with many dependencies/transformations, go with a Spark job.