crawlerWrapper
==================
Run
-----
Run it with
```
$GOPATH/bin/crawlerWrapper wc
```
and `wc` will count the lines, words and bytes of each job you submit. Wow.
Job submission
-----------------
Submit with
```
curl localhost:8123/submit -X POST -i -d '{"requestid": "026bff12-66c9-4b02-868c-cb3bbee1c08f", "Job": {"data": "foobar"}}'
```
Yes, the `requestid` MUST be a valid UUIDv4. Well, you can omit it, but you shouldn't. The `Job` field must be
a JSON object. Strings, arrays and numbers will **not** work. It must be an object, but it can be any
object.
Since `Job` is always a JSON object, your worker should probably understand JSON.
If you chose `wc` as the worker, it will count the bytes of the literal string `{"data": "foobar"}` without doing
any JSON parsing.
Worker output
---------------
You might expect the worker's stdout to be handled in some way. It is not. A worker is
supposed to take care of the rest of the chain by itself, so no, the output is not automatically fed to some
other worker or anything else.
However, there is at least a way to be notified when a job completes. This is done by adding two fields to
the job submission.
```
curl localhost:8123/submit -X POST -i -d '{"requestid": "'$(python3 -c 'import uuid; print(uuid.uuid4());')'", "ResponseRequested": true, "PingbackURL": "http://google.it/"}'
```
Here we added two fields. `ResponseRequested` is a boolean specifying whether we care about knowing that the
command has completed. If unspecified, it is false. If it is true, then a `POST` will be made to the URL
specified in `PingbackURL`. The format of this `POST` is specified in `crawler.JobCompletionNotice`. Please
note that the `POST` will be made even on worker error.