forked from avana/sciacquone

crawlerWrapper
==================


Run
-----

Run it with

```
$GOPATH/bin/crawlerWrapper wc
```

and it will count the lines, words and bytes of your requests. Wow.
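
If you want to see what a worker of your own might look like, here is a rough Go stand-in for `wc`. It is a sketch under one assumption: that crawlerWrapper hands each job to the worker on stdin, which is what the `wc` example implies. The program name `wccount` is made up for this example.

```go
// wccount is a tiny stand-in for `wc`: it reads one job from stdin and
// prints its line, word and byte counts.
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// count returns the number of newlines, whitespace-separated words and
// bytes in data, the same three figures `wc` prints by default.
func count(data []byte) (lines, words, size int) {
	return bytes.Count(data, []byte("\n")), len(bytes.Fields(data)), len(data)
}

func main() {
	// Assumption: the job arrives on stdin, as with the `wc` worker.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	lines, words, size := count(data)
	fmt.Println(lines, words, size)
}
```

Saved as `wccount.go`, `printf '{"data": "foobar"}' | go run wccount.go` prints `0 2 18`. Real `wc` pads its columns differently, but the numbers are the same.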

Job submission
-----------------

Submit a job with

```
curl localhost:8123/submit -X POST -i -d '{"requestid": "026bff12-66c9-4b02-868c-cb3bbee1c08f", "Job": {"data": "foobar"}}'
```
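
The same submission can be made from Go. This is a minimal sketch, assuming the server is the one listening on `localhost:8123` from the curl line and that it accepts the body regardless of content type (sending `application/json` is our choice; curl's `-d` actually labels it form-urlencoded). Since the `requestid` has to be a UUIDv4 and the Go standard library has no UUID package, `newUUIDv4` builds one from `crypto/rand` by setting the RFC 4122 version and variant bits. Both `newUUIDv4` and `submitJob` are hypothetical helpers, not part of crawlerWrapper.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// newUUIDv4 returns a random (version 4, RFC 4122 variant) UUID string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// submitJob POSTs job to baseURL+"/submit" with a fresh request ID.
// job must marshal to a JSON object.
func submitJob(baseURL string, job map[string]any) (*http.Response, error) {
	id, err := newUUIDv4()
	if err != nil {
		return nil, err
	}
	body, err := json.Marshal(map[string]any{
		"requestid": id,
		"Job":       job,
	})
	if err != nil {
		return nil, err
	}
	return http.Post(baseURL+"/submit", "application/json", bytes.NewReader(body))
}

func main() {
	resp, err := submitJob("http://localhost:8123", map[string]any{"data": "foobar"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "submit failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```

Taking the job as a `map[string]any` makes it hard to accidentally send a string, array or number as `Job`.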

Yes, the `requestid` MUST be a valid UUIDv4. Well, you can omit it, but you shouldn't. The `Job` field must be a
JSON object: strings, arrays and numbers will **not** work. It must be an object, but it can be any object.

This basically means that your worker should probably understand JSON.
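
Here is what "understanding JSON" could look like on the worker side: a Go sketch that parses the job and rejects anything that is not an object. It assumes, as the `wc` example suggests, that crawlerWrapper writes the `Job` object (and only that) to the worker's stdin. The `"data"` key is simply the one from the sample submission; `decodeJob` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// decodeJob parses raw as a JSON object. Strings, arrays, numbers and
// null are rejected, matching the rule that Job must be an object.
func decodeJob(raw []byte) (map[string]json.RawMessage, error) {
	var job map[string]json.RawMessage
	if err := json.Unmarshal(raw, &job); err != nil {
		return nil, err
	}
	if job == nil { // `null` unmarshals into a nil map without error
		return nil, fmt.Errorf("job is null, expected a JSON object")
	}
	return job, nil
}

func main() {
	// Assumption: the Job object arrives on stdin.
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	job, err := decodeJob(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid job:", err)
		os.Exit(1)
	}
	var data string
	if err := json.Unmarshal(job["data"], &data); err != nil {
		fmt.Fprintln(os.Stderr, `missing or non-string "data":`, err)
		os.Exit(1)
	}
	fmt.Printf("got data %q (%d bytes)\n", data, len(data))
}
```

Saved as `worker.go`, `echo '{"data": "foobar"}' | go run worker.go` prints `got data "foobar" (6 bytes)`. Keeping the values as `json.RawMessage` lets each key be decoded into whatever type the worker expects.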

If you choose `wc` as the worker, it will count the bytes of the literal string `{"data": "foobar"}` without doing
any JSON parsing.

Worker output
---------------

You might expect the worker's stdout to be handled in some way. It isn't. A worker is supposed to deal with the
rest of the chain by itself, so no, its output is not automatically fed to another worker or anywhere else.

However, there is at least one way to be notified when a job has completed: add two fields to the job
submission.

```
curl localhost:8123/submit -X POST -i -d '{"requestid": "'$(python3 -c 'import uuid; print(uuid.uuid4())')'", "Job": {"data": "foobar"}, "ResponseRequested": true, "PingbackURL": "http://google.it/"}'
```

Here we added two fields. `ResponseRequested` is a boolean specifying whether we care about knowing that the
command has completed; if unspecified, it is false. If it is true, a `POST` will be made to the URL
specified in `PingbackURL`. The format of this POST is specified in `crawler.JobCompletionNotice`. Note
that the `POST` is made even if the worker fails.
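
On the receiving end, something has to listen at `PingbackURL`. Here is a minimal Go receiver, assuming the notice is JSON-encoded. Since the exact fields of `crawler.JobCompletionNotice` are not spelled out here, it decodes into a generic map instead of guessing field names. The `/pingback` path, port `9000` and the helper names are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"log"
	"net/http"
)

// parseNotice decodes a completion notice into a generic map, since the
// exact fields of crawler.JobCompletionNotice are not documented here.
func parseNotice(body []byte) (map[string]any, error) {
	var notice map[string]any
	if err := json.Unmarshal(body, &notice); err != nil {
		return nil, err
	}
	if notice == nil {
		return nil, errors.New("notice is null, expected a JSON object")
	}
	return notice, nil
}

// pingbackHandler accepts the POST sent when a job ends.
func pingbackHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	notice, err := parseNotice(body)
	if err != nil {
		http.Error(w, "bad notice: "+err.Error(), http.StatusBadRequest)
		return
	}
	// Notices also arrive for failed jobs, so inspect the contents
	// before assuming success.
	log.Printf("completion notice: %v", notice)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// Hypothetical address: PingbackURL would then be
	// http://<this-host>:9000/pingback.
	http.HandleFunc("/pingback", pingbackHandler)
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```

Once you know the real notice format, swapping the map for a struct matching `crawler.JobCompletionNotice` is the natural next step.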