Distributed scheduler
This application can run on multiple hosts and schedules the execution of
arbitrary commands at a particular time or periodically.
There are two ways to communicate with the application: gRPC and REST. The
remote interfaces are specified in the dsched.proto file.
The corresponding REST API can also be found there in the form of API
annotations. We also provide generated Swagger files.
To specify task execution timing, we use the notation adopted by cron.
Scheduled tasks are stored in a file and loaded automatically during startup.
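As a reading aid for the cron notation, here is a minimal sketch that labels the fields of a statement. It assumes the six-field layout with seconds first, which is the form the example invocations later in this document use; the helper name describeSpec is hypothetical and is not part of dsched or of the cron library. It only splits the statement and does not validate the values.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldNames lists the fields in the assumed six-field order:
// seconds first, day of week last.
var fieldNames = []string{"second", "minute", "hour", "day of month", "month", "day of week"}

// describeSpec (hypothetical helper) splits a six-field cron statement and
// pairs each value with its field name. It does not validate the values.
func describeSpec(spec string) (map[string]string, error) {
	parts := strings.Fields(spec)
	if len(parts) != len(fieldNames) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(fieldNames), len(parts))
	}
	out := make(map[string]string, len(parts))
	for i, name := range fieldNames {
		out[name] = parts[i]
	}
	return out, nil
}

func main() {
	fields, err := describeSpec("0 30 * * * *")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the fields in their positional order.
	for _, name := range fieldNames {
		fmt.Printf("%-12s %s\n", name, fields[name])
	}
}
```

For a statement such as "0 30 * * * *", this shows that the second is fixed to 0 and the minute to 30, while the remaining fields are wildcards.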
Building
Install gRPC
Install gRPC gateway
To parse crontab statements and schedule task execution, we use the gopkg.in/robfig/cron.v2 library,
so it must be installed as well: go get -u gopkg.in/robfig/cron.v2. Documentation can be found here
Get the dsched package: go get -u gitlab.com/andreynech/dsched
Now it is possible to run the standard go build command in the dscheduler and
gateway directories to generate binaries for the scheduler and the REST/JSON
API gateway. It may also be helpful to examine our CI configuration file to
see how we set up the build environment.
Running
All the scheduling functionality is implemented by the dscheduler executable,
so it can be started at system startup or on demand. As described by
dscheduler --help, there are two command-line parameters:
-i string - File name to store task list (default "/var/run/dscheduler.db")
-p string - Endpoint to listen (default ":50051")
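For illustration, the two parameters above could be declared with Go's standard flag package roughly as follows. This is a sketch, not the actual dscheduler source: the config type and the parseArgs function are hypothetical names, and only the flag names, defaults, and help strings are taken from the help text.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the two settings documented by dscheduler --help.
// The type itself is an assumption made for this sketch.
type config struct {
	dbFile   string // file used to store the task list
	endpoint string // endpoint to listen on
}

// parseArgs (hypothetical) parses command-line arguments using the flag
// names and defaults shown in the help text.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("dscheduler", flag.ContinueOnError)
	fs.StringVar(&cfg.dbFile, "i", "/var/run/dscheduler.db", "File name to store task list")
	fs.StringVar(&cfg.endpoint, "p", ":50051", "Endpoint to listen")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("task file: %s, listening on %s\n", cfg.dbFile, cfg.endpoint)
}
```

Running the sketch with -i /tmp/tasks.db -p :6000 would override both defaults, in the same way the real executable is described to behave.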
If there is a need to offer a REST/JSON API, the gateway application located
in the gateway directory should be run. It can reside on the same host as
dscheduler, but typically it runs on another host that is reachable over
HTTP from outside and at the same time can talk to dscheduler running in the
internal network. This setup is also the reason for splitting the scheduler
and the gateway into two executables. gateway is a mostly generated
application and supports several command-line parameters, described by
running gateway --help. The important parameter is -sched_endpoint string,
the endpoint of the Scheduler service (default "localhost:50051"). It
specifies the host name and port where dscheduler listens for requests.
Scheduling tasks (testing)
There are three ways to control the scheduler server:
Using the Go client implemented in the cli/ directory
Using the Python client implemented in the py_cli directory
Using the REST/JSON API gateway and curl
The Go and Python clients have a similar set of command-line parameters.
$ ./cli --help
Usage of cli:
  -a string
        The command to execute at time specified by -c parameter
  -c string
        Statement in crontab format describes when to execute the command
  -e string
        Host:port to connect (default "localhost:50051")
  -l    List scheduled tasks
  -p    Purge all scheduled tasks
  -r int
        Remove the task with specified id from schedule
  -s    Schedule task. -c and -a arguments are required in this case
They use the gRPC protocol to talk to the scheduler server. Here are several
example invocations:
$ ./cli -l list currently scheduled tasks
$ ./cli -s -c "@every 0h00m10s" -a "df" schedule the df command for execution every 10 seconds
$ ./cli -s -c "0 30 * * * *" -a "ls -l" schedule the ls -l command to run at 30 minutes past every hour (the first field is seconds)
$ ./cli -r 3 remove the task with ID 3
$ ./cli -p remove all scheduled tasks
It is also possible to use curl to invoke dscheduler functionality through
the REST/JSON API gateway. Assuming that the dscheduler and gateway
applications are running, here are some invocations to list, add, and remove
scheduling entries from the same host (localhost):
curl 'http://localhost:8080/v1/scheduler/list' list currently scheduled tasks
curl -d '{"id":0, "cron":"@every 0h00m10s", "action":"ls"}' -X POST 'http://localhost:8080/v1/scheduler/add' schedule the ls command for execution every 10 seconds
curl -d '{"id":0, "cron":"0 30 * * * *", "action":"ls -l"}' -X POST 'http://localhost:8080/v1/scheduler/add' schedule ls -l to run at 30 minutes past every hour
curl -d '{"id":2}' -X POST 'http://localhost:8080/v1/scheduler/remove' remove the task with ID 2
curl -X POST 'http://localhost:8080/v1/scheduler/removeall' remove all scheduled tasks
All changes are automatically saved to the task file.
Thoughts on scheduler service discovery
In large deployment scenarios (hundreds of hosts, for example), it can be
challenging to find all the IP addresses and ports where the scheduler
service is running. It would be fairly easy to add support for Zeroconf
(Bonjour/Avahi) to simplify service discovery. As an alternative, it might
be possible to implement something similar to the CORBA Naming Service,
where running services register themselves and the location of the naming
service is well known. We decided to collect feedback before committing to a
particular service discovery implementation, so your input is very welcome!