pezcurrel
|
56c02456d6
|
First commit
|
2023-11-02 08:04:23 +01:00 |
|
pezcurrel
|
6ae7e07e77
|
Fixed bug in loading dead instances
|
2023-10-13 17:37:17 +02:00 |
|
pezcurrel
|
c453ba6e3f
|
Added “revive” action; minor changes
|
2023-10-02 16:28:47 +02:00 |
|
pezcurrel
|
56900c9caa
|
Added “require” for “parsetime.php”, it was missing
|
2023-07-01 20:38:55 +02:00 |
|
pezcurrel
|
ccc2826508
|
Added “links” key even to “urls” array in commented part
|
2023-07-01 20:25:47 +02:00 |
|
pezcurrel
|
fa68d393ac
|
Added “links” key to “hitspage” array
|
2023-07-01 20:12:38 +02:00 |
|
pezcurrel
|
69a320481c
|
Added languages to languages array
|
2023-06-27 16:17:13 +02:00 |
|
pezcurrel
|
c1a4a29edc
|
Added waituntilonline function
|
2023-01-18 07:55:41 +01:00 |
|
pezcurrel
|
4767f1f2df
|
Now it works from the “git root”, so file references in translation files are correct
|
2023-01-07 19:30:41 +01:00 |
|
pezcurrel
|
f69f2853db
|
Trap more signals; added shutdown function
|
2023-01-07 12:54:07 +01:00 |
|
pezcurrel
|
861fdb5345
|
Removed useless extra newline in signalHandler
|
2023-01-07 12:53:25 +01:00 |
|
pezcurrel
|
50b1b880a4
|
Made error messages in case of failed charset setting report mysql error number too
|
2023-01-06 22:06:40 +01:00 |
|
pezcurrel
|
0be17601f8
|
Made it run peerscrawl.php with a gracetime of 1 year once after the 26th of the month; minor changes
|
2023-01-06 17:08:07 +01:00 |
|
pezcurrel
|
2c66514b37
|
Made it run crawler.php with a gracetime of 1 year on first of the month; minor changes
|
2023-01-06 17:06:44 +01:00 |
|
pezcurrel
|
662e789bbf
|
Removed useless ability to write to a log file; other minor changes
|
2023-01-06 17:05:47 +01:00 |
|
pezcurrel
|
b22befdec2
|
Removed “excludead” option: use a high “gracetime” when needed; other minor changes
|
2023-01-06 17:03:41 +01:00 |
|
pezcurrel
|
3f656e0fa1
|
Removed useless ability to write to a log file; other minor changes
|
2023-01-06 17:02:04 +01:00 |
|
pezcurrel
|
874479fac7
|
Maintenance script, unifying crawl.bash and backup.bash; first commit
|
2023-01-05 20:08:13 +01:00 |
|
pezcurrel
|
f88b08a85a
|
Added eecho function; other minor changes
|
2023-01-05 20:03:36 +01:00 |
|
pezcurrel
|
83d3c6b5c5
|
Changed “discovered instances” to “instances responded” in message “working on ...”
|
2023-01-04 13:41:55 +01:00 |
|
pezcurrel
|
25fd358666
|
Commented out “tar” line for “run” dir: took long and wasn’t very useful
|
2023-01-04 13:40:53 +01:00 |
|
pezcurrel
|
3f86fc689b
|
“$instanswered” got set to true if nodeinfo responded; now it gets set to true only if api/v2/instance or api/v1/instance responded
|
2023-01-04 13:39:10 +01:00 |
|
pezcurrel
|
b114deba61
|
Report elapsed time in message for finished procs
|
2023-01-04 13:37:05 +01:00 |
|
pezcurrel
|
1d754f9588
|
Fixed getlangid: it used $lang instead of $code
|
2023-01-03 00:11:59 +01:00 |
|
pezcurrel
|
95bd83786e
|
Fixed a bug that caused “languages” to be always saved as “ourlanguages”
|
2023-01-02 08:37:07 +01:00 |
|
pezcurrel
|
6f296a1d27
|
Fixed writing langs ids to InstOurLangs; made detected languages fall back to declared languages when they can't be detected
|
2023-01-01 19:50:57 +01:00 |
|
pezcurrel
|
fa97078fe4
|
Passing calling __LINE__ through getlangsidsarr and getlangsid the right way
|
2023-01-01 19:26:39 +01:00 |
|
pezcurrel
|
87fbbbd2f9
|
Added validity check on declared and detected lang codes to getlangid function: when a lang code is not valid it falls back to the “en” default; added a getlangsidsarr function to cope with possible dupes and spare code
|
2023-01-01 19:21:21 +01:00 |
|
pezcurrel
|
c039d85d76
|
A little script to fix InstLangs, InstOurLangs and Languages table as of 2023-01-01
|
2023-01-01 18:23:34 +01:00 |
|
pezcurrel
|
24b9f9b911
|
Made it canonicalize lang codes in getlangid
|
2023-01-01 18:22:48 +01:00 |
|
pezcurrel
|
a4486403c7
|
Made language fetching normalize language code to a form with _ instead of -
|
2023-01-01 16:51:20 +01:00 |
|
pezcurrel
|
5ffd28f906
|
Added nl language
|
2023-01-01 11:14:25 +01:00 |
|
pezcurrel
|
0863c32703
|
Added InstChecks cleaning to “clean” action
|
2022-12-31 09:46:28 +01:00 |
|
pezcurrel
|
a4215c943d
|
Added a notify() to getlangid() to be done if Languages contains more than 1 record with given language code
|
2022-12-31 07:19:00 +01:00 |
|
pezcurrel
|
0c2f4e6b27
|
Added “intness” check on values from “tag trends history”
|
2022-12-31 07:13:02 +01:00 |
|
pezcurrel
|
b8e84c72c5
|
Fixed a misleading comment
|
2022-12-30 19:20:48 +01:00 |
|
pezcurrel
|
b40e474e76
|
Added code to set TotChecks and OkChecks
|
2022-12-30 18:12:07 +01:00 |
|
pezcurrel
|
b9392a0c25
|
Little script to set Instances.TotChecks and Instances.OkChecks according to records stored in InstChecks table
|
2022-12-30 17:16:11 +01:00 |
|
pezcurrel
|
f2d4119dfa
|
Fixed bug on domain block entity format check
|
2022-12-30 14:07:22 +01:00 |
|
pezcurrel
|
b39eafdb80
|
Fixed getlangid missing $opts; fixed array_pop on nonexistent $languages var
|
2022-12-29 12:49:38 +01:00 |
|
pezcurrel
|
94bfec5f78
|
“$list” parameter of function “crawl” is now passed by reference; “$list” now gets unset in any case after it has been looped through; this will hopefully decrease the amount of memory used by the script a bit
|
2022-12-29 09:28:18 +01:00 |
|
pezcurrel
|
ec33912ff8
|
Restructured the language management code a bit
|
2022-12-28 23:47:23 +01:00 |
|
pezcurrel
|
db749d2e7d
|
Consider the possibility that “our languages” have been locked
|
2022-12-28 19:17:05 +01:00 |
|
pezcurrel
|
b929d06302
|
Don’t INSERT an instance if it did not respond: it was useful before instance “deadness” was autonomously managed by peerscrawl.php
|
2022-12-28 18:59:09 +01:00 |
|
pezcurrel
|
e8d588c0f2
|
Refactored language management
|
2022-12-28 18:34:57 +01:00 |
|
pezcurrel
|
2b37228a1c
|
Fixed some flaws in detecting if Thumb and AdmAvatar are to be set to “unavailable”; fixed “noindex” logic, now it also explicitly sets AdmAccount to special “OPTED OUT” value when noindex=true
|
2022-12-28 17:09:25 +01:00 |
|
pezcurrel
|
870e3524ca
|
Fixed $maxround not set to actual max round
|
2022-12-28 17:06:39 +01:00 |
|
pezcurrel
|
6f5de9730e
|
When server thumb or admin avatar are unavailable, set them to “unavailable”
|
2022-12-28 07:01:29 +01:00 |
|
pezcurrel
|
cf158ceb73
|
Now it makes a tar.xz with “run” directory contents and removes them before running commands
|
2022-12-28 05:25:56 +01:00 |
|
pezcurrel
|
ce66aa56e9
|
Added instance blocks support
|
2022-12-27 23:02:31 +01:00 |
|
pezcurrel
|
430c35e17a
|
Little change in progress string
|
2022-12-27 23:01:39 +01:00 |
|
pezcurrel
|
73469fa012
|
Default pool size back to 10
|
2022-12-27 15:39:46 +01:00 |
|
pezcurrel
|
c0b9c19469
|
Added keys check on instance info fetched from api v2 and v1; subordinated language checks to “$instanswered”; made get_toot_languages cope better with possible errors
|
2022-12-27 09:05:31 +01:00 |
|
pezcurrel
|
f7f1ac4cb2
|
logfile has .gii.log extension, handy to select only these files when run from crawler.php
|
2022-12-26 22:10:20 +01:00 |
|
pezcurrel
|
ea0118d445
|
cmd now prepends exec to command; pipes get closed
|
2022-12-26 22:09:18 +01:00 |
|
pezcurrel
|
463ef7cd37
|
Removed a dangling “}” which was breaking the script
|
2022-12-26 18:06:54 +01:00 |
|
pezcurrel
|
ff2d6c09a8
|
Removed “peers.all” file; made the script write to “peers files” only on exit (be it clean or by interruption)
|
2022-12-26 16:40:10 +01:00 |
|
pezcurrel
|
44b6456695
|
Made check on peers more strict, made the script abandon checking when a peers entry is malformed
|
2022-12-26 15:27:14 +01:00 |
|
pezcurrel
|
d301d25fcc
|
Removed witches.live; have to check why checking its peers takes so long on the server while it doesn’t here; probably we just need to increase the server RAM and/or CPU
|
2022-12-26 15:02:50 +01:00 |
|
pezcurrel
|
3f7a5ff69c
|
Moved $graceline definition up, fixing a bug; made peers checking msgs more informative
|
2022-12-26 15:01:35 +01:00 |
|
pezcurrel
|
97f5b99654
|
Removed ckpeers function, it was overkill; added a preliminary check for “stringness” to the checks on each peer
|
2022-12-26 14:51:42 +01:00 |
|
pezcurrel
|
a4b2ae731c
|
Added witches.live since its peers list is huge and full of .activitypub-troll.cf
|
2022-12-26 14:50:26 +01:00 |
|
pezcurrel
|
926b1b0d73
|
Added ckpeers function to check if the json array returned by api/v1/instance/peers is well formed
|
2022-12-26 14:12:06 +01:00 |
|
pezcurrel
|
3c1621df1d
|
Added “LastOkCheckTS” to $instints (array of Instances columns of integer type)
|
2022-12-26 13:29:19 +01:00 |
|
pezcurrel
|
e820845775
|
Consider instances which have LastOkCheckTS=null but InsertTS>=$graceline as not dead, and to be checked
|
2022-12-26 13:28:09 +01:00 |
|
pezcurrel
|
61da12100f
|
Removed comment with Instances columns
|
2022-12-26 12:29:08 +01:00 |
|
pezcurrel
|
ebc458cc2c
|
Removed “resurrect” option and references to Instances.Dead
|
2022-12-26 12:28:21 +01:00 |
|
pezcurrel
|
1e1b2a99e9
|
Dropped Instances.Dead, using Instances.LastOkCheckTS now instead
|
2022-12-26 12:25:15 +01:00 |
|
pezcurrel
|
9b3cca9a45
|
Script to reasonably set Instances.LastOkCheckTS, first commit
|
2022-12-26 12:21:47 +01:00 |
|
pezcurrel
|
95b9ccfc31
|
Renamed “LastCheckOk” to “WasLastCheckOk”
|
2022-12-26 05:30:35 +01:00 |
|
pezcurrel
|
00caa1dcb9
|
Changed default for “deadline” option from 62 to 31 days
|
2022-12-26 05:17:59 +01:00 |
|
pezcurrel
|
44f437c928
|
Translated initial comment, made it more terse
|
2022-12-26 05:09:09 +01:00 |
|
pezcurrel
|
5312aea0cc
|
Added writing server rules in the db
|
2022-12-26 05:08:17 +01:00 |
|
pezcurrel
|
337eb32f51
|
Made mail optional, inactive by default
|
2022-12-25 23:45:31 +01:00 |
|
pezcurrel
|
429ab42ff5
|
Added 2 debug messages stating how many dead instances the script got from Instances and Peers tables
|
2022-12-25 18:55:54 +01:00 |
|
pezcurrel
|
119c9119c2
|
Made the logic for “deadline” much more terse
|
2022-12-25 18:41:13 +01:00 |
|
pezcurrel
|
acde202b2e
|
Made it write summary from crawl run into a log file of its own
|
2022-12-25 18:40:21 +01:00 |
|
pezcurrel
|
ba171bd5f2
|
Use specifically bash since we use &> redirect
|
2022-12-25 11:43:30 +01:00 |
|
pezcurrel
|
ec9b65e42f
|
Little script to run peerscrawl.php in loop; first commit
|
2022-12-25 11:32:49 +01:00 |
|
pezcurrel
|
10e2e1b58a
|
Added “lecho” for “message levels”, removed “gecho”, removed “verbose” option; removed “loop” option (do loop from a shell script if needed)
|
2022-12-25 11:32:08 +01:00 |
|
pezcurrel
|
1d0c6b799a
|
Small edit to “logminmsglev” and “tuiminmsglev” TUI option parsing errors
|
2022-12-25 11:29:34 +01:00 |
|
pezcurrel
|
d95bc70b8a
|
Exposed “deadline” option; minor changes
|
2022-12-25 09:47:04 +01:00 |
|
pezcurrel
|
c0802de828
|
Removed “restore” option: could work, but it’s not very useful and would require a big hassle; added loops and new found instances counters; made sighandler use mexit
|
2022-12-25 09:24:23 +01:00 |
|
pezcurrel
|
d6b77b0e29
|
Removed option “-p peers” from crawler cmdline because now peerscrawl directly writes new instances into the db
|
2022-12-24 08:59:33 +01:00 |
|
pezcurrel
|
9fabb3853b
|
Indeed
|
2022-12-23 19:13:37 +01:00 |
|
pezcurrel
|
96aa6f3aa9
|
That stuff there
|
2022-12-23 19:12:18 +01:00 |
|
pezcurrel
|
05fed0142c
|
Lowered a bit default values for “timeout” and “curltimeout”
|
2022-12-23 11:23:32 +01:00 |
|
pezcurrel
|
89a2ea0b26
|
Fixed “trending tags” ordering and fetching
|
2022-12-23 11:22:25 +01:00 |
|
pezcurrel
|
edee66b834
|
Temporarily disabled “restore” option because it needs more work to actually work
|
2022-12-22 15:32:30 +01:00 |
|
pezcurrel
|
61d0fcb3d8
|
Added “loop” option that allows running the crawl in an infinite loop or until sig(int|hup|term) is received; other minor changes
|
2022-12-22 15:05:55 +01:00 |
|
pezcurrel
|
6477e8812f
|
Exposed “curltimeout” option; changed “timeout” default from 5 to 10; changed “curltimeout” default from 10 to 20
|
2022-12-22 14:24:48 +01:00 |
|
pezcurrel
|
6d4ce26f98
|
Adapted “restore” code to the new workings; minor changes and fixes
|
2022-12-22 14:04:29 +01:00 |
|
pezcurrel
|
c27053314a
|
Added code to store and consider “instance checks” made by the script to independently mark peers as dead
|
2022-12-22 11:32:18 +01:00 |
|
pezcurrel
|
706c831e23
|
Little change in a message
|
2022-12-22 11:28:29 +01:00 |
|
pezcurrel
|
f8cdf2cf3b
|
Changed check against “activity” values, which are strings, not integers
|
2022-12-22 07:40:41 +01:00 |
|
pezcurrel
|
c6c3feb500
|
Removed leftovers of “jsonwrite” option
|
2022-12-22 07:05:21 +01:00 |
|
pezcurrel
|
277296512c
|
Explicitly set idn_to_ascii flags, otherwise with php 7.3 it complained
|
2022-12-21 22:15:40 +01:00 |
|
pezcurrel
|
9316e686b9
|
Big rewrite, made it shorter and hopefully a bit more readable
|
2022-12-21 22:07:05 +01:00 |
|
pezcurrel
|
732ea79480
|
Moved $mastodons definition further up
|
2022-12-21 22:06:10 +01:00 |
|
pezcurrel
|
1c524ffd69
|
Moved mysqli_close after the optional loading of dead instances from the db; renamed $eta to $tet
|
2022-12-21 22:05:15 +01:00 |
|