crawlerWrapper

Run

Run it with

$GOPATH/bin/crawlerWrapper wc

and it will count the lines, words and bytes of each job you submit. Wow.

Job submission

Submit with

curl localhost:8123/submit -X POST -i -d '{"requestid": "026bff12-66c9-4b02-868c-cb3bbee1c08f", "Job": {"data": "foobar"}}'

Yes, the requestid MUST be a valid UUIDv4. Well, you can omit it, but you shouldn't. The Job field must be a JSON object. Strings, arrays and numbers will not work. It must be an object, but it can be any object. This basically means that your worker should probably understand JSON.
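The two rules above (UUIDv4 requestid, Job must be an object) can be checked on the client before submitting. What follows is only a sketch: build_submission is a hypothetical helper name, not part of crawlerWrapper, and it simply mirrors the field names used in the curl example.

```python
import json
import uuid


def build_submission(job, request_id=None):
    """Return the JSON body for POST /submit as a string (hypothetical helper)."""
    # The Job field must be a JSON object, so only a dict is accepted here.
    if not isinstance(job, dict):
        raise TypeError("Job must be a JSON object, got %s" % type(job).__name__)
    # Omitting the requestid is allowed but discouraged, so generate one.
    if request_id is None:
        request_id = str(uuid.uuid4())
    # uuid.UUID raises ValueError on malformed input; the version check
    # rejects well-formed UUIDs that are not version 4.
    parsed = uuid.UUID(request_id)
    if parsed.version != 4:
        raise ValueError("requestid must be a UUIDv4")
    return json.dumps({"requestid": str(parsed), "Job": job})
```

The returned string can be passed as-is to curl -d or to any HTTP client.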

If you chose wc as the worker, it will count the bytes of the literal string {"data": "foobar"}, without doing any JSON parsing.
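For contrast with wc, a worker that does parse its input might look like the sketch below. It assumes the wrapper hands the Job object to the worker on standard input, which is what the wc behaviour suggests but is not spelled out here. The names count_data_chars and run_worker, and the use of the data key, are purely illustrative.

```python
import json
import sys


def count_data_chars(raw):
    """Parse a Job object and return the length of its "data" string."""
    job = json.loads(raw)
    if not isinstance(job, dict):
        raise ValueError("expected a JSON object")
    return len(job["data"])


def run_worker(stdin=sys.stdin, stdout=sys.stdout):
    """Read one Job from stdin and write the character count to stdout."""
    # Assumption: the Job JSON arrives on stdin, as it appears to for wc.
    stdout.write("%d\n" % count_data_chars(stdin.read()))
```

For the example job this reports 6 (the length of foobar), whereas wc sees the 18 bytes of the raw JSON text. To use the sketch as an actual worker, the script would end with a call to run_worker().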

Worker output

You might expect that stdout coming from the worker is handled in some way. That's not the case. A worker is supposed to deal with the rest of the chain by itself, so no, the output is not automatically fed to some other worker or anything else.

However, there is at least a way to be notified when a job completes. This is done by adding two fields to the job submission.

curl localhost:8123/submit -X POST -i -d '{"requestid": "'$(python3 -c 'import uuid; print(uuid.uuid4());')'", "ResponseRequested": true, "PingbackURL": "http://google.it/"}'

Here we added two fields. ResponseRequested is a boolean specifying whether we care about knowing that the command has completed. If unspecified, it is false. If it is true, a POST will be made to the URL specified in PingbackURL. The format of this POST is specified in crawler.JobCompletionNotice. Please note that the POST is made even if the worker fails.
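To try the pingback locally, something has to listen on the PingbackURL. The following is a minimal sketch of such a receiver, using only the Python standard library. The names make_pingback_server and serve_in_background are made up for this example, and the body is stored as raw bytes because the layout of crawler.JobCompletionNotice is not described here.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_pingback_server(received, host="127.0.0.1", port=0):
    """Return an HTTPServer that appends every POST body to `received`."""

    class PingbackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            # Kept as raw bytes: the schema is defined by
            # crawler.JobCompletionNotice and is not assumed here.
            received.append(self.rfile.read(length))
            self.send_response(204)
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the example quiet

    # port=0 lets the OS pick a free port.
    return HTTPServer((host, port), PingbackHandler)


def serve_in_background(server):
    """Run the server in a daemon thread and return its base URL."""
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return "http://%s:%d/" % server.server_address
```

The URL returned by serve_in_background is what would go into PingbackURL. Since the POST arrives on worker errors too, a receiver should not treat a notification as proof of success.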