mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-07 16:15:32 +02:00
Commit graph

299 commits

Author SHA1 Message Date
Jan Lukas Gernert
06018d98d4 replace emoji images 2024-06-08 23:18:00 +02:00
Jan Lukas Gernert
11e9261bf2 fmt 2024-06-08 01:03:00 +02:00
Jan Lukas Gernert
3e5654e197 fix tests 2024-06-08 01:02:52 +02:00
Jan Lukas Gernert
65b26370a2 update ftr config 2024-03-24 22:11:49 +01:00
Jan Lukas Gernert
a80b8a8274 bump versions 2024-03-24 22:01:34 +01:00
Jan Lukas Gernert
eee7ffee05 update ftr config 2024-03-24 22:00:44 +01:00
Jan Lukas Gernert
e4140ff093 Merge branch 'reqwest-0.12' into 'master'
Reqwest 0.12

See merge request news-flash/article_scraper!10
2024-03-24 20:54:27 +00:00
Jan Lukas Gernert
689a72e6cd reqwest 0.12 2024-03-24 17:54:30 +01:00
Jan Lukas Gernert
0dcebe8b49 fmt 2024-02-13 19:36:58 +01:00
Jan Lukas Gernert
a1ee3b22f9 clippy 2024-02-13 19:35:29 +01:00
Jan Lukas Gernert
b13673ce3b do some null checks before unlinking nodes 2024-02-13 19:06:05 +01:00
Jan Lukas Gernert
ed8a83708b update deps & fix some flaky tests 2024-02-13 17:00:45 +01:00
Jan Lukas Gernert
f9812b556c update ftr config 2023-08-13 16:43:38 +02:00
Jan Lukas Gernert
acb7d1d000 port libxml workaround from hurl 2023-08-10 02:09:07 +02:00
Jan Lukas Gernert
6116ba38ae no need for head 2023-08-10 02:06:52 +02:00
Jan Lukas Gernert
8c7cdacd26 Revert "generate full html document"
This reverts commit 0133b20f06.
2023-08-10 02:06:08 +02:00
Jan Lukas Gernert
0133b20f06 generate full html document 2023-08-10 00:01:31 +02:00
Jan Lukas Gernert
1584649eb4 fix tests 2023-08-10 00:01:10 +02:00
Jan Lukas Gernert
2c76a89f9d add spiegel test 2023-08-09 23:57:25 +02:00
Jan Lukas Gernert
9aa6478e3c update heise test 2023-08-09 23:25:07 +02:00
Jan Lukas Gernert
b91014c685 clean html fragments 2023-08-03 10:40:44 +02:00
Jan Lukas Gernert
9c857a1481 Merge branch 'make-article-public' into 'master'
Make `Article` public

See merge request news-flash/article_scraper!9
2023-08-02 09:04:30 +00:00
Leonardo Fedalto
3211b91bad Make Article public 2023-08-01 21:39:48 +02:00
Jan Lukas Gernert
7a4f5c500d 400 2023-08-01 19:35:22 +02:00
Jan Lukas Gernert
a7e8661a09 update tests & defined youtube iframe height 2023-08-01 18:37:55 +02:00
Jan Lukas Gernert
eb1bfdbca0 print url 2023-07-28 07:09:50 +02:00
Jan Lukas Gernert
40f065d9cd allow downloads without content type smaller than 5mb 2023-07-28 07:03:50 +02:00
Jan Lukas Gernert
db007f752c dont clean video tags 2023-07-27 23:18:17 +02:00
Jan Lukas Gernert
bf7a89fef7 don't fail because of lacking content length 2023-07-23 15:39:24 +02:00
Jan Lukas Gernert
345518253a even if img has src 2023-07-22 20:03:32 +02:00
Jan Lukas Gernert
42eb9daf65 remove lazy loading attributes 2023-07-22 19:57:38 +02:00
Jan Lukas Gernert
d562d41b81 download single image 2023-07-16 21:40:10 +02:00
Jan Lukas Gernert
be40383b1a impl from reqwest error 2023-07-16 15:17:01 +02:00
Jan Lukas Gernert
d62aa8c31a clippy fixes 2023-06-29 19:59:38 +02:00
Jan Lukas Gernert
fcec0d83ee don't move content nodes to <article> root node
could fix potential crash?
2023-06-29 19:47:49 +02:00
Jan Lukas Gernert
fdb8d9a97e small fixes 2023-06-27 19:21:26 +02:00
Jan Lukas Gernert
4fd41d98cc add fn to parse thumbnail from html 2023-06-26 23:22:08 +02:00
Jan Lukas Gernert
e32015c1d0 add mercury leading image heuristics 2023-06-26 22:25:57 +02:00
Jan Lukas Gernert
e99a4b4f23 ignore test resources 2023-06-23 21:22:37 +02:00
Jan Lukas Gernert
a7983e873d (cargo-release) version 2.0.0 2023-06-23 21:17:19 +02:00
Jan Lukas Gernert
a036d03510 use ftr-site-config fork with heise patch 2023-06-23 21:15:36 +02:00
Jan Lukas Gernert
a31956531a fix download loop 2023-06-22 00:15:57 +02:00
Jan Lukas Gernert
582834cdf1 fixes 2023-06-21 23:48:09 +02:00
Jan Lukas Gernert
e0ccd7e0b3 split download & parsing 2023-06-21 23:04:21 +02:00
Jan Lukas Gernert
99c5f6220e fix golem test 2023-06-21 23:04:08 +02:00
Jan Lukas Gernert
d8ceee1403 remove <h1/2> duplicating the title 2023-04-30 09:24:00 +02:00
Jan Lukas Gernert
eb4b3603f5 remove artifact 2023-04-29 18:21:21 +02:00
Jan Lukas Gernert
16b102b313 replace multiple <br>s with single <p> 2023-04-29 18:20:58 +02:00
Jan Lukas Gernert
c4f8bd2bc2 fix heise crash: simpler way of checking for ancestor 2023-04-28 15:56:29 +02:00
Jan Lukas Gernert
44d01ad1c6 Merge branch 'hardwareluxx' into 'master'
Hardwareluxx

See merge request news-flash/article_scraper!8
2023-04-28 05:57:37 +00:00