Commit graph

324 commits

Author SHA1 Message Date
pezcurrel
f4aa3cb804 A script to set InsertTS field to something suitable when it is null 2022-12-18 18:24:35 +01:00
pezcurrel
c5debcb463 A function to delete an Instance record by ID, and all references to it in other tables; first commit 2022-12-18 18:23:52 +01:00
pezcurrel
ccc9f517fd Moved code to delete an Instances record and all its references in other tables to function delinstbyid in lib/delinstbyid.php; minor changes 2022-12-18 11:44:11 +01:00
pezcurrel
2b0e2398ae Made validhostname accept only valid hostnames :-)) (no ports or path specs) 2022-12-18 11:42:32 +01:00
pezcurrel
32251d1ba8 Added “deleteinstswhere” action 2022-12-18 11:41:09 +01:00
pezcurrel
7d2875075b Deleted 2022-12-18 09:35:24 +01:00
pezcurrel
690b54521b Moved 2022-12-18 09:35:04 +01:00
pezcurrel
f269bb901d Little cosmetic change 2022-12-18 07:00:49 +01:00
pezcurrel
e9b88d6735 Made $jsonfp be written into run dir 2022-12-18 07:00:19 +01:00
pezcurrel
a3ada274e7 Removed stdout/err redirect in cmd; passing proper descriptor and pipe to proc_open; minor changes 2022-12-18 06:59:25 +01:00
pezcurrel
a32a25e095 Many many changes :-)) 2022-12-18 00:34:27 +01:00
pezcurrel
d6dd03694c Removed $context 2022-12-17 22:54:32 +01:00
pezcurrel
441d16a42d ckratelimit goes to sleep only when x-ratelimit-remaining==0; can spit debug info; limit fetching chunks from users directories is now 40 2022-12-17 22:54:02 +01:00
pezcurrel
ca4367b719 Removed the unlinking attempt at lockfp before exit: it was already done before by shutdown; other little changes about open files closing and the like 2022-12-17 18:43:13 +01:00
pezcurrel
e5ad18e619 Fixed a typo 2022-12-17 18:40:55 +01:00
pezcurrel
5c605cbe5b Some little cosmetic (readability of log files) changes 2022-12-17 18:40:22 +01:00
pezcurrel
2571396253 Tuned to recent changes in crawler.php (and getinstinfo.php) 2022-12-17 17:36:46 +01:00
pezcurrel
ad8fa26306 Made mysql connection and charset setting errors more graceful; added “users” page to updstats; other minor changes 2022-12-17 17:35:35 +01:00
pezcurrel
2d1d28b002 Fixed regexp checking if max_charcters is an integer; made mexit use eecho again, moving the closing of logf after eecho(s); made logf be opened only if logminmsglev < 4 2022-12-17 17:33:46 +01:00
pezcurrel
d1f088a026 Command for subprocesses gets now built on the fly using cmd function; logfile doesn’t get opened if logminmsglev < 4; other minor changes 2022-12-17 17:31:24 +01:00
pezcurrel
6d897cfdff Removed “crawlernew” directory 2022-12-17 15:03:11 +01:00
pezcurrel
c7d5b50377 Adapted to new crawler version 2022-12-17 15:02:52 +01:00
pezcurrel
0b9e892aef Split old crawler.php in two; this is the part that coordinates 2022-12-17 15:02:20 +01:00
pezcurrel
7629a1caae Moved from subdir “crawlernew” 2022-12-17 15:00:36 +01:00
pezcurrel
b46469bfbb Cope with mysql errors even with php ver. < 8; check if $link is false before trying to close mysql connection in function mexit 2022-12-16 22:39:51 +01:00
pezcurrel
e46a82d923 Added suffix “s” to option “-t” in $cmd definition; cope with mysql errors even with php ver. < 8; other small changes 2022-12-16 22:38:16 +01:00
pezcurrel
3804171253 Renamed getfc to gurl 2022-12-16 21:59:26 +01:00
pezcurrel
d75a6445ae Fixed sleep and related message to actually use $options['udirfailst']; changed some options’ defaults; slightly changed the help text; slightly changed some messages in the “fetchusers” section 2022-12-16 21:23:14 +01:00
pezcurrel
b666fcfdda Use parsetime for “--timeout” too, changed a bit the help text accordingly 2022-12-16 21:19:34 +01:00
pezcurrel
2d31b7ca79 Changed a bit how option “jsonwrite” works; other minor changes 2022-12-16 19:25:45 +01:00
pezcurrel
f193bd294c A very tiny db test, first commit 2022-12-16 19:12:17 +01:00
pezcurrel
9529a938a7 Lots of changes, not very important :-)) 2022-12-16 19:06:47 +01:00
pezcurrel
bab5bd5dd2 Added mysqli_query error management for older php versions to function myq; minor changes 2022-12-16 19:05:46 +01:00
pezcurrel
4522bc3ea8 Added mysqli_query error management for older php versions to function myq 2022-12-16 19:02:41 +01:00
pezcurrel
513981b7e2 Fetches info from an instance, first commit 2022-12-16 00:00:37 +01:00
pezcurrel
1cafbe05ea Crawler new version, “multithreaded”, coordinator script, first commit 2022-12-16 00:00:06 +01:00
pezcurrel
3a720a90ac Added a warning when nodeinfo specs couldn’t be fetched; made it set New=1 even when host doesn’t respond and is not in the db 2022-12-15 12:45:20 +01:00
pezcurrel
9360bdc481 Made false positives for “IsMastodon” less likely (impossible?) 2022-12-12 22:40:17 +01:00
pezcurrel
483fbcd103 Added the possibility, for each set of records with the same URI, to choose one record to keep and delete the others, or to automatically keep the record with the lowest ID and delete the others 2022-12-12 21:32:35 +01:00
pezcurrel
90e85f8182 Took away “-t 10” option from crawler.php calls since 10 is now its default timeout 2022-12-12 17:06:51 +01:00
pezcurrel
e07fba673d Fixed call to non-existent function “mysq” to “myq” 2022-12-12 08:36:18 +01:00
pezcurrel
ec6324fb4f Made langs() shorten the $languages array to a maximum of 5 elements 2022-12-12 08:29:18 +01:00
pezcurrel
d80ba5ddc4 Fixed double “,” in a query inside langs() 2022-12-12 08:24:26 +01:00
pezcurrel
6f9260e08e myq() did not return results, now it does 2022-12-12 08:17:01 +01:00
pezcurrel
f6752a34bc Added function “myq” as a wrapper for mysqli_query managing exceptions; used it throughout the whole script 2022-12-12 08:12:29 +01:00
pezcurrel
2649e7d137 Info from nodeinfo didn’t end up in $info, now it does 2022-12-12 00:47:06 +01:00
pezcurrel
783e54a9f9 Little script to convert a unix timestamp in input to a date 2022-12-11 23:33:09 +01:00
pezcurrel
1701a2cfe6 Little script to search for records with the same “URI” in Instances table 2022-12-11 23:32:42 +01:00
pezcurrel
853be9f0e0 Little script to fix uris beginning with “https://” 2022-12-11 23:31:47 +01:00
pezcurrel
b16515f4e8 Lots of changes :-)) 2022-12-11 23:29:51 +01:00
pezcurrel
61ad655a62 Disabled fetching profile’s page when “noindex” is not set in account because it takes too long; disabled featured tags fetching for the same reason; other minor changes 2022-12-10 23:32:58 +01:00
pezcurrel
f343cb702e Changed some eecho messages importance 2022-12-10 13:57:30 +01:00
pezcurrel
4b7f6a199c Added truncs where needed; added code to check for “noindex” on user’s profile page when “noindex” is not set in accounts info 2022-12-10 12:35:22 +01:00
pezcurrel
18ce06871b Added ckratelimit() where useful; made it more flexible with lowercasing every header key; more work on fetching users from users directories 2022-12-09 22:53:18 +01:00
pezcurrel
8341f0e209 Fixed a cosmetic bug; some more work into users directories fetching 2022-12-09 19:25:44 +01:00
pezcurrel
ffd20debe6 Removed executable attribute 2022-12-08 14:35:55 +01:00
pezcurrel
667a0824ff Script to “normalize” language codes in the database 2022-12-08 13:55:47 +01:00
pezcurrel
5d073632f5 Added “pt_BR” language 2022-12-08 13:54:59 +01:00
pezcurrel
5aa4cde8f7 Added “pt_BR” language 2022-12-08 13:54:20 +01:00
pezcurrel
09765e566e Added “normalization” of language codes (dash to underscore) and “pt_BR” language 2022-12-08 13:53:43 +01:00
pezcurrel
6c2672ea00 Fixed and added some “gecho” messages 2022-12-08 00:07:31 +01:00
pezcurrel
8ea4acc17a Fixed wrong path in MAILCFG; stderr redirect to logfile; other minor changes 2022-12-08 00:06:16 +01:00
pezcurrel
c3a90ba6b8 Adding “fetchuser” option; changed “timeout” default from 5 to 10 seconds 2022-12-08 00:03:10 +01:00
pezcurrel
a07e59d52c Added option “--moreclauses” 2022-12-05 21:18:58 +01:00
pezcurrel
b0d2438d8e Updated with regexps matching new spammers 2022-12-04 08:51:03 +01:00
pezcurrel
32a8c1587c Using Kate instead of nano 2022-12-03 22:24:00 +01:00
pezcurrel
a41115f5ac First commit 2022-12-03 21:18:09 +01:00
pezcurrel
d64b0697a8 Backups dir for backup.bash 2022-12-03 15:20:01 +01:00
pezcurrel
876a76c6bb Crawl script 2022-12-03 14:48:14 +01:00
pezcurrel
0530de7d08 Backup script 2022-12-03 14:47:28 +01:00
pezcurrel
a9b5e72373 Added fedibird, ecko and hometown to the regex deciding whether an instance is mastodon or not 2022-12-03 14:44:59 +01:00
pezcurrel
c4b1a53439 Added German language (de) 2022-12-02 16:29:24 +01:00
pezcurrel
63b9271299 Translated to english; added “resurrect” action 2022-12-02 16:08:10 +01:00
pezcurrel
632202b69c Added “Done” eecho in the end of run 2022-12-02 16:07:05 +01:00
pezcurrel
fb357a12d4 Changed short form for option “excludedead” from “-d” to “-E” 2022-12-02 06:04:25 +01:00
pezcurrel
2e37b1dc6e Downgraded from 5.2.0 to 5.1.0 because bida server has php 7.3 and 5.1.0 is the last version supporting it 2022-12-02 04:36:21 +01:00
pezcurrel
64017b615a To import old “Noxious” table dump into new Nox* columns “Instances” 2022-12-01 19:42:45 +01:00
pezcurrel
d795e13e70 Old “Noxious” table dump 2022-12-01 19:41:57 +01:00
pezcurrel
e194bd597f Changed notifications “levels”, removed “eecho”es when “notify” is used since it “eecho”es by itself 2022-12-01 17:42:48 +01:00
pezcurrel
c84a838fbd Added Patrick Shur’s language detection library 2022-12-01 05:47:57 +01:00
pezcurrel
9087c4d015 Declare N as constant 2022-12-01 05:44:06 +01:00
pezcurrel
9bf4128d29 Removed setold, setnoxious actions; limited clean action to Notifications table; other small changes 2022-12-01 05:43:21 +01:00
pezcurrel
92cee555c7 Removed support for Blacklist table 2022-12-01 05:41:54 +01:00
pezcurrel
1eadd2f3ce Removed “StartNodes” loading and “Noxious” table loading 2022-11-30 07:19:14 +01:00
pezcurrel
f110701887 Solved merge conflicts 2022-11-29 18:24:53 +01:00
pezcurrel
a7b745eb75 Added new languages (ar, id) 2022-11-29 17:49:47 +01:00
pezcurrel
1a6446e82b Don’t use “black” as derogatory (“blacklist” -> “list of noxious instances”, etc.) 2022-11-29 17:36:49 +01:00
pezcurrel
f2881318a9 Updated help text 2022-11-29 09:05:57 +01:00
pezcurrel
55e0c82eee Added “clean” action 2022-11-29 08:54:41 +01:00
pezcurrel
9b87447da0 Changed “langs” function so that, if “api/v1/instance” returns a language different from the default en, it assumes it is right, because it has been explicitly set, and avoids doing autodetection of languages based on last toots 2022-11-23 20:35:53 +01:00
pezcurrel
86c26009e7 Added “NameGL” to the query in “langs” function (forgot it in previous commit) 2022-11-23 19:32:50 +01:00
pezcurrel
5421041a7c Added gl, uk support into “langs” function 2022-11-23 19:30:32 +01:00
pezcurrel
90b5703d7f Added support for “uk” (Ukrainian) language 2022-11-23 18:57:05 +01:00
pezcurrel
001ec473ba Made it interactive 2022-11-23 18:44:36 +01:00
pezcurrel
5b7d5c9ce6 Added “gl” (galician) to “” array 2022-11-23 13:02:42 +01:00
pezcurrel
f04061782e Updates related to new languages: cs, ru and uk 2022-11-23 10:58:13 +01:00
pezcurrel
e4b7aa2367 Added gl language 2022-11-21 11:22:28 +01:00
pezcurrel
fe1552ed6b Added fa and de languages 2022-11-21 09:35:55 +01:00
pezcurrel
0082da0c27 Got rid of some redundancies and nailed a bit validhostname() 2022-11-11 21:57:30 +01:00
pezcurrel
e7ec2eba39 Added “optimize” action to optimize every table in the db 2022-11-09 14:44:21 +01:00
pezcurrel
87ff532831 Use gmmktime instead of mktime in pgdatetomy 2022-11-01 07:44:55 +01:00
pezcurrel
628c8ac0a5 Deleted web/clitools/crawl.bash 2022-07-13 16:11:37 +02:00
pezcurrel
a8bcef7a58 Added timestamps to the messages 2022-07-13 12:45:57 +02:00
pezcurrel
dcbd52a04b Fixed a little flaw in get_toot_languages function 2022-05-06 09:21:56 +02:00
pezcurrel
136f99ba24 Removed the “-r” option 2022-05-06 06:29:19 +02:00
pezcurrel
5e2139eb25 Various changes 2022-05-06 06:28:29 +02:00
pezcurrel
baf745f315 Commented out useless code 2022-02-12 10:02:07 +01:00
pezcurrel
5c24f1370d (2) Added Galician translation; Updated about page; Updated language-detection 2022-02-12 09:35:20 +01:00
pezcurrel
b2da993c44 Translations update 2021-02-14 19:03:52 +01:00
pezcurrel
5f51c61ef6 ... 2020-11-01 09:05:03 +01:00
pezcurrel
17bfd27d3a ... 2020-10-31 07:33:40 +01:00
pezcurrel
7da003e8df ... 2020-10-31 07:21:59 +01:00
pezcurrel
b48253094f Changed table and code for stats 2020-10-31 06:57:34 +01:00
pezcurrel
5efb3b8b23 Bau 2020-10-30 19:33:09 +01:00
pezcurrel
768f4c2a38 ... 2020-10-27 17:07:47 +01:00
pezcurrel
67b3baf25c ... 2020-10-27 16:32:23 +01:00
pezcurrel
a89d7dafb1 ... 2020-10-26 16:44:18 +01:00
pezcurrel
1098d105fa ... 2020-10-26 16:05:59 +01:00
pezcurrel
df7aed49a7 ... 2020-10-23 19:00:43 +02:00
pezcurrel
371e4c9a2e ... 2020-10-22 17:54:05 +02:00
pezcurrel
049c1d839a ... 2020-10-21 15:26:31 +02:00
pezcurrel
20a9cec2b5 ... 2020-10-20 22:06:00 +02:00
pezcurrel
06f702ce71 ... 2020-10-20 16:31:16 +02:00
pezcurrel
5ec1354546 ... 2020-10-18 06:53:27 +02:00