Commit graph

319 commits

Author SHA1 Message Date
pezcurrel
d502fde347 Removed useless, redundant code filtering the list of instances at the beginning of the “crawl” function: filtering already happens before adding an instance to next round list 2023-12-29 12:21:50 +01:00
pezcurrel
a677d3301b Added “-i, --includedead” option explanation to help text 2023-12-29 11:05:04 +01:00
pezcurrel
83868504d7 Super-optimized it; made it a bit more verbose and clearer 2023-12-29 10:04:05 +01:00
pezcurrel
9ece96fcd9 Modified according to new locale directory layout 2023-12-29 10:02:03 +01:00
pezcurrel
560db4d69d Renamed “donefp” var to “resurrfp” 2023-12-29 10:01:16 +01:00
pezcurrel
6a3948783f Began working on checks 2023-12-28 12:37:38 +01:00
pezcurrel
943c65b2ba Added regexps check before adding an inst to “Instances” and “Peers” tables 2023-12-27 21:55:33 +01:00
pezcurrel
738fa8c864 Added “.matdoes.dev” exclusion 2023-12-27 21:54:34 +01:00
pezcurrel
61a621e4d2 Added “checkspam” action 2023-12-27 21:53:44 +01:00
pezcurrel
6a53523a87 A script to download and parse Anti-Meta Fedi Pact’s list of instances blocking Threads; first commit 2023-12-27 16:46:58 +01:00
pezcurrel
9d3230877e Changed to not executable 2023-12-27 16:42:59 +01:00
pezcurrel
540c9d4440 Optimized, fixed a flaw 2023-12-27 16:42:31 +01:00
pezcurrel
0ed8165a53 Added “fedipact.php” 2023-12-27 16:41:29 +01:00
pezcurrel
8e5cc30412 Optimized “dryrun” behavior, fixed some flaws 2023-12-27 16:40:32 +01:00
pezcurrel
f31aeaf1db First commit 2023-12-27 08:50:36 +01:00
pezcurrel
7e6048f939 Fixed a bug in “Threads” status detection code 2023-12-27 08:50:19 +01:00
pezcurrel
3069415a2b First commit 2023-12-27 08:39:54 +01:00
pezcurrel
5e34f0f1f3 Removed executable file property 2023-12-27 00:59:32 +01:00
pezcurrel
bfbd28d8d3 Fixed some flaws 2023-12-27 00:46:49 +01:00
pezcurrel
7840524d0f First commit 2023-12-27 00:46:32 +01:00
pezcurrel
b2fa12b541 Updated to consider the new directory layout (all library files in “web/lib”) 2023-12-26 23:41:04 +01:00
pezcurrel
5a473b6064 When the list of moderated instances can be accessed, make “Threads” field default to “accessible” 2023-12-26 23:40:18 +01:00
pezcurrel
3ae455a459 Now Threads blocking status gets set for each instance into “Threads” column in “Instances” table 2023-12-26 23:11:01 +01:00
pezcurrel
c62aa9d9bb Removed useless .htaccess files 2023-12-26 11:22:58 +01:00
pezcurrel
721d892f64 Refactored directory tree 2023-12-26 11:17:54 +01:00
pezcurrel
272f7a2cd2 Added “BlockListAvailable” check 2023-12-24 22:36:38 +01:00
pezcurrel
a416c53915 Fixed “find+rm” command 2023-12-07 09:21:04 +01:00
pezcurrel
ce5280af47 Use “find -exec” instead of plain “rm” to delete all files in “run/” directory, to avoid hitting kernel ARG_MAX 2023-12-04 16:01:26 +01:00
pezcurrel
0ad282329d Made it consider site/lib/*.php too 2023-11-02 22:16:48 +01:00
pezcurrel
1be1194b62 Automatically add language code in masthelp.pot and fix “{singular}” and “{plural}” in en.masthelp.po 2023-11-02 20:58:56 +01:00
pezcurrel
b0e2ac7a5a Check if lang is defined in supplangs 2023-11-02 20:57:21 +01:00
pezcurrel
49a923e95f Using supplangs; other minor changes 2023-11-02 08:10:56 +01:00
pezcurrel
c45e8058c2 Using supplangs; other, minor changes 2023-11-02 08:09:50 +01:00
pezcurrel
41182ecb5b Renamed from “utstodate.php” 2023-11-02 08:07:55 +01:00
pezcurrel
9a606c059a Renamed from “uts2date.php” 2023-11-02 08:05:05 +01:00
pezcurrel
56c02456d6 First commit 2023-11-02 08:04:23 +01:00
pezcurrel
6ae7e07e77 Fixed bug in loading dead instances 2023-10-13 17:37:17 +02:00
pezcurrel
c453ba6e3f Added “revive” action; minor changes 2023-10-02 16:28:47 +02:00
pezcurrel
56900c9caa Added “require” for “parsetime.php”, it was missing 2023-07-01 20:38:55 +02:00
pezcurrel
ccc2826508 Added “links” key even to “urls” array in commented part 2023-07-01 20:25:47 +02:00
pezcurrel
fa68d393ac Added “links” key to “hitspage” array 2023-07-01 20:12:38 +02:00
pezcurrel
69a320481c Added languages to languages array 2023-06-27 16:17:13 +02:00
pezcurrel
c1a4a29edc Added waituntilonline function 2023-01-18 07:55:41 +01:00
pezcurrel
4767f1f2df Now it works from the “git root”, so file references in translation files are correct 2023-01-07 19:30:41 +01:00
pezcurrel
f69f2853db Trap more signals; added shutdown function 2023-01-07 12:54:07 +01:00
pezcurrel
861fdb5345 Removed useless extra newline in signalHandler 2023-01-07 12:53:25 +01:00
pezcurrel
50b1b880a4 Made error messages in case of failed charset setting report mysql error number too 2023-01-06 22:06:40 +01:00
pezcurrel
0be17601f8 Made it run peerscrawl.php with a gracetime of 1 year once after the 26th of the month; minor changes 2023-01-06 17:08:07 +01:00
pezcurrel
2c66514b37 Made it run crawler.php with a gracetime of 1 year on first of the month; minor changes 2023-01-06 17:06:44 +01:00
pezcurrel
662e789bbf Removed useless ability to write to a log file; other minor changes 2023-01-06 17:05:47 +01:00
pezcurrel
b22befdec2 Removed “excludead” option: use a high “gracetime” when needed; other minor changes 2023-01-06 17:03:41 +01:00
pezcurrel
3f656e0fa1 Removed useless ability to write to a log file; other minor changes 2023-01-06 17:02:04 +01:00
pezcurrel
874479fac7 Maintenance script, unifying crawl.bash and backup.bash; first commit 2023-01-05 20:08:13 +01:00
pezcurrel
f88b08a85a Added eecho function; other minor changes 2023-01-05 20:03:36 +01:00
pezcurrel
83d3c6b5c5 Changed “discovered instances” to “instances responded” in message “working on ...” 2023-01-04 13:41:55 +01:00
pezcurrel
25fd358666 Commented out “tar” line for “run” dir: took long and wasn’t very useful 2023-01-04 13:40:53 +01:00
pezcurrel
3f86fc689b “$instanswered” got set to true if nodeinfo responded; now it gets set to true only if api/v2/instance or api/v1/instance responded 2023-01-04 13:39:10 +01:00
pezcurrel
b114deba61 Report elapsed time in message for finished procs 2023-01-04 13:37:05 +01:00
pezcurrel
1d754f9588 Fixed getlangid: it used $lang instead of $code 2023-01-03 00:11:59 +01:00
pezcurrel
95bd83786e Fixed a bug that caused “languages” to be always saved as “ourlanguages” 2023-01-02 08:37:07 +01:00
pezcurrel
6f296a1d27 Fixed writing langs ids to InstOurLangs; made detected languages fall back to declared languages when they can't be detected 2023-01-01 19:50:57 +01:00
pezcurrel
fa97078fe4 Passing calling __LINE__ through getlangsidsarr and getlangsid the right way 2023-01-01 19:26:39 +01:00
pezcurrel
87fbbbd2f9 Added validity check on declared and detected lang codes to getlangid function: when a lang code is not valid it falls back to the “en” default; added a getlangsidsarr function to cope with possible dupes and spare code 2023-01-01 19:21:21 +01:00
pezcurrel
c039d85d76 A little script to fix InstLangs, InstOurLangs and Languages table as of 2023-01-01 2023-01-01 18:23:34 +01:00
pezcurrel
24b9f9b911 Made it canonicalize lang codes in getlangid 2023-01-01 18:22:48 +01:00
pezcurrel
a4486403c7 Made language fetching normalize language code to a form with _ instead of - 2023-01-01 16:51:20 +01:00
pezcurrel
5ffd28f906 Added nl language 2023-01-01 11:14:25 +01:00
pezcurrel
0863c32703 Added InstChecks cleaning to “clean” action 2022-12-31 09:46:28 +01:00
pezcurrel
a4215c943d Added a notify() to getlangid() to be done if Languages contains more than 1 record with given language code 2022-12-31 07:19:00 +01:00
pezcurrel
0c2f4e6b27 Added “intness” check on values from “tag trends history” 2022-12-31 07:13:02 +01:00
pezcurrel
b8e84c72c5 Fixed a misleading comment 2022-12-30 19:20:48 +01:00
pezcurrel
b40e474e76 Added code to set TotChecks and OkChecks 2022-12-30 18:12:07 +01:00
pezcurrel
b9392a0c25 Little script to set Instances.TotChecks and Instances.OkChecks according to records stored in InstChecks table 2022-12-30 17:16:11 +01:00
pezcurrel
f2d4119dfa Fixed bug on domain block entity format check 2022-12-30 14:07:22 +01:00
pezcurrel
b39eafdb80 Fixed getlangid missing $opts; fixed array_pop on inexistent $languages var 2022-12-29 12:49:38 +01:00
pezcurrel
94bfec5f78 “$list” parameter of function “crawl” is now passed by reference; “$list” now gets unset in any case after it has been looped through; this should slightly decrease the amount of memory used by the script 2022-12-29 09:28:18 +01:00
pezcurrel
ec33912ff8 Restructured a bit the language management code 2022-12-28 23:47:23 +01:00
pezcurrel
db749d2e7d Consider the possibility that “our languages” have been locked 2022-12-28 19:17:05 +01:00
pezcurrel
b929d06302 Don’t INSERT an instance if it did not respond: it was useful before instance “deadness” was autonomously managed by peerscrawl.php 2022-12-28 18:59:09 +01:00
pezcurrel
e8d588c0f2 Refactored language management 2022-12-28 18:34:57 +01:00
pezcurrel
2b37228a1c Fixed some flaws in detecting if Thumb and AdmAvatar are to be set to “unavailable”; fixed “noindex” logic, now it also explicitly sets AdmAccount to the special “OPTED OUT” value when noindex=true 2022-12-28 17:09:25 +01:00
pezcurrel
870e3524ca Fixed $maxround not set to actual max round 2022-12-28 17:06:39 +01:00
pezcurrel
6f5de9730e When server thumb or admin avatar are unavailable, set them to “unavailable” 2022-12-28 07:01:29 +01:00
pezcurrel
cf158ceb73 Now it makes a tar.xz with “run” directory contents and removes them before running commands 2022-12-28 05:25:56 +01:00
pezcurrel
ce66aa56e9 Added instance blocks support 2022-12-27 23:02:31 +01:00
pezcurrel
430c35e17a Little change in progress string 2022-12-27 23:01:39 +01:00
pezcurrel
73469fa012 Default pool size back to 10 2022-12-27 15:39:46 +01:00
pezcurrel
c0b9c19469 Added keys check on instance info fetched from api v2 and v1; subordinated language checks to “$instaswered”; made get_toot_languages better cope with possible errors 2022-12-27 09:05:31 +01:00
pezcurrel
f7f1ac4cb2 logfile has .gii.log extension, handy to select only these files when run from crawler.php 2022-12-26 22:10:20 +01:00
pezcurrel
ea0118d445 cmd now prepends exec to command; pipes get closed 2022-12-26 22:09:18 +01:00
pezcurrel
463ef7cd37 Removed a dangling “}” which was breaking the script 2022-12-26 18:06:54 +01:00
pezcurrel
ff2d6c09a8 Removed “peers.all” file; made the script write to “peers files” only on exit (be it clean or by interruption) 2022-12-26 16:40:10 +01:00
pezcurrel
44b6456695 Made check on peers more strict, made the script abandon checking when a peers entry is malformed 2022-12-26 15:27:14 +01:00
pezcurrel
d301d25fcc Removed witches.live, have to check why on the server it takes so long to check its peers, here it doesn’t, it’s probably just that we should increase the server ram and-or cpu 2022-12-26 15:02:50 +01:00
pezcurrel
3f7a5ff69c Moved $graceline definition up, fixing a bug; made peers checking msgs more informative 2022-12-26 15:01:35 +01:00
pezcurrel
97f5b99654 Removed ckpeers function, it was overkill; added a preliminary check for “stringness” to the checks on each peer 2022-12-26 14:51:42 +01:00
pezcurrel
a4b2ae731c Added witches.live since its peers list is huge and full of .activitypub-troll.cf 2022-12-26 14:50:26 +01:00
pezcurrel
926b1b0d73 Added ckpeers function to check if the json array returned by api/v1/instance/peers is well formed 2022-12-26 14:12:06 +01:00
pezcurrel
3c1621df1d Added “LastOkCheckTS” to $instints (array of Instances columns of integer type) 2022-12-26 13:29:19 +01:00
pezcurrel
e820845775 Consider instances which have LastOkCheckTS=null but InsertTS>=$graceline as not dead, and to be checked 2022-12-26 13:28:09 +01:00