mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-07 16:15:32 +02:00
Commit graph

320 commits

Author SHA1 Message Date
Jan Lukas Gernert
9f349f8c6f need reqwest streams 2025-05-04 18:00:59 +02:00
Jan Lukas Gernert
498008f630 bump version 2025-05-04 17:51:30 +02:00
Jan Lukas Gernert
ee53f58aeb Merge branch 'empty-body' into 'master'
check for empty http response and parsed documents without root element

See merge request news-flash/article_scraper!11
2025-05-04 15:50:59 +00:00
Jan Lukas Gernert
06990acbc0 fix libxml CI build 2025-05-04 17:38:46 +02:00
Jan Lukas Gernert
f361392c04 check for empty http response and parsed documents without root element 2025-05-04 17:34:33 +02:00
Jan Lukas Gernert
9b374a28c7 update ftr-site-config 2025-04-05 15:47:08 +02:00
Jan Lukas Gernert
b92500fca2 better error messages 2025-04-05 15:45:41 +02:00
Jan Lukas Gernert
0978335d3b [f] ignore url harvest error 2025-03-28 17:18:03 +01:00
Jan Lukas Gernert
9f56ed03b8 article_scraper: don't specify reqwest features 2025-03-10 13:42:31 +01:00
Jan Lukas Gernert
8cfcd6d9f3 clippy 2025-01-17 03:05:55 +01:00
Jan Lukas Gernert
ca1cc47af1 update CI image 2025-01-17 03:02:40 +01:00
Jan Lukas Gernert
7c658a4ba8 resolver 2 2025-01-17 02:58:41 +01:00
Jan Lukas Gernert
89eb87fa85 update thiserror, ftr-site-config submodule and bump version 2025-01-17 02:55:59 +01:00
Jan Lukas Gernert
7fcb781c68 remove useless format! 2024-11-02 11:34:47 +01:00
Jan Lukas Gernert
11ee29feda thumbnail: check for attribute with name property as well (fixes #4) 2024-11-02 11:30:55 +01:00
Jan Lukas Gernert
b3ce28632d update submodule 2024-07-10 11:59:21 +02:00
Jan Lukas Gernert
6932902b7b update CI image 2024-07-06 23:43:23 +02:00
Jan Lukas Gernert
c16e11fdda init parser according to (https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Thread-safety) 2024-07-06 23:38:43 +02:00
Jan Lukas Gernert
f4e4e64b9e absolute default size for embedded youtube videos 2024-06-10 22:27:10 +02:00
Jan Lukas Gernert
df8ebcbb35 treat iframes as valid empty tags 2024-06-10 22:06:48 +02:00
Jan Lukas Gernert
e01c8e9d34 negative score for thumbnails with emoji alt 2024-06-10 20:40:19 +02:00
Jan Lukas Gernert
06018d98d4 replace emoji images 2024-06-08 23:18:00 +02:00
Jan Lukas Gernert
11e9261bf2 fmt 2024-06-08 01:03:00 +02:00
Jan Lukas Gernert
3e5654e197 fix tests 2024-06-08 01:02:52 +02:00
Jan Lukas Gernert
65b26370a2 update ftr config 2024-03-24 22:11:49 +01:00
Jan Lukas Gernert
a80b8a8274 bump versions 2024-03-24 22:01:34 +01:00
Jan Lukas Gernert
eee7ffee05 update ftr config 2024-03-24 22:00:44 +01:00
Jan Lukas Gernert
e4140ff093 Merge branch 'reqwest-0.12' into 'master'
Reqwest 0.12

See merge request news-flash/article_scraper!10
2024-03-24 20:54:27 +00:00
Jan Lukas Gernert
689a72e6cd reqwest 0.12 2024-03-24 17:54:30 +01:00
Jan Lukas Gernert
0dcebe8b49 fmt 2024-02-13 19:36:58 +01:00
Jan Lukas Gernert
a1ee3b22f9 clippy 2024-02-13 19:35:29 +01:00
Jan Lukas Gernert
b13673ce3b do some null checks before unlinking nodes 2024-02-13 19:06:05 +01:00
Jan Lukas Gernert
ed8a83708b update deps & fix some flaky tests 2024-02-13 17:00:45 +01:00
Jan Lukas Gernert
f9812b556c update ftr config 2023-08-13 16:43:38 +02:00
Jan Lukas Gernert
acb7d1d000 port libxml workaround from hurl 2023-08-10 02:09:07 +02:00
Jan Lukas Gernert
6116ba38ae no need for head 2023-08-10 02:06:52 +02:00
Jan Lukas Gernert
8c7cdacd26 Revert "generate full html document"
This reverts commit 0133b20f06.
2023-08-10 02:06:08 +02:00
Jan Lukas Gernert
0133b20f06 generate full html document 2023-08-10 00:01:31 +02:00
Jan Lukas Gernert
1584649eb4 fix tests 2023-08-10 00:01:10 +02:00
Jan Lukas Gernert
2c76a89f9d add spiegel test 2023-08-09 23:57:25 +02:00
Jan Lukas Gernert
9aa6478e3c update heise test 2023-08-09 23:25:07 +02:00
Jan Lukas Gernert
b91014c685 clean html fragments 2023-08-03 10:40:44 +02:00
Jan Lukas Gernert
9c857a1481 Merge branch 'make-article-public' into 'master'
Make `Article` public

See merge request news-flash/article_scraper!9
2023-08-02 09:04:30 +00:00
Leonardo Fedalto
3211b91bad Make Article public 2023-08-01 21:39:48 +02:00
Jan Lukas Gernert
7a4f5c500d 400 2023-08-01 19:35:22 +02:00
Jan Lukas Gernert
a7e8661a09 update tests & defined youtube iframe height 2023-08-01 18:37:55 +02:00
Jan Lukas Gernert
eb1bfdbca0 print url 2023-07-28 07:09:50 +02:00
Jan Lukas Gernert
40f065d9cd allow downloads without content type smaller than 5mb 2023-07-28 07:03:50 +02:00
Jan Lukas Gernert
db007f752c dont clean video tags 2023-07-27 23:18:17 +02:00
Jan Lukas Gernert
bf7a89fef7 don't fail because of lacking content length 2023-07-23 15:39:24 +02:00