Jan Lukas Gernert
|
c16e11fdda
|
init parser according to (https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Thread-safety)
|
2024-07-06 23:38:43 +02:00 |
|
Jan Lukas Gernert
|
f4e4e64b9e
|
absolute default size for embedded youtube videos
|
2024-06-10 22:27:10 +02:00 |
|
Jan Lukas Gernert
|
df8ebcbb35
|
treat iframes as valid empty tags
|
2024-06-10 22:06:48 +02:00 |
|
Jan Lukas Gernert
|
e01c8e9d34
|
negative score for thumbnails with emoji alt
|
2024-06-10 20:40:19 +02:00 |
|
Jan Lukas Gernert
|
06018d98d4
|
replace emoji images
|
2024-06-08 23:18:00 +02:00 |
|
Jan Lukas Gernert
|
11e9261bf2
|
fmt
|
2024-06-08 01:03:00 +02:00 |
|
Jan Lukas Gernert
|
3e5654e197
|
fix tests
|
2024-06-08 01:02:52 +02:00 |
|
Jan Lukas Gernert
|
65b26370a2
|
update ftr config
|
2024-03-24 22:11:49 +01:00 |
|
Jan Lukas Gernert
|
a80b8a8274
|
bump versions
|
2024-03-24 22:01:34 +01:00 |
|
Jan Lukas Gernert
|
eee7ffee05
|
update ftr config
|
2024-03-24 22:00:44 +01:00 |
|
Jan Lukas Gernert
|
e4140ff093
|
Merge branch 'reqwest-0.12' into 'master'
Reqwest 0.12
See merge request news-flash/article_scraper!10
|
2024-03-24 20:54:27 +00:00 |
|
Jan Lukas Gernert
|
689a72e6cd
|
reqwest 0.12
|
2024-03-24 17:54:30 +01:00 |
|
Jan Lukas Gernert
|
0dcebe8b49
|
fmt
|
2024-02-13 19:36:58 +01:00 |
|
Jan Lukas Gernert
|
a1ee3b22f9
|
clippy
|
2024-02-13 19:35:29 +01:00 |
|
Jan Lukas Gernert
|
b13673ce3b
|
do some null checks before unlinking nodes
|
2024-02-13 19:06:05 +01:00 |
|
Jan Lukas Gernert
|
ed8a83708b
|
update deps & fix some flaky tests
|
2024-02-13 17:00:45 +01:00 |
|
Jan Lukas Gernert
|
f9812b556c
|
update ftr config
|
2023-08-13 16:43:38 +02:00 |
|
Jan Lukas Gernert
|
acb7d1d000
|
port libxml workaround from hurl
|
2023-08-10 02:09:07 +02:00 |
|
Jan Lukas Gernert
|
6116ba38ae
|
no need for head
|
2023-08-10 02:06:52 +02:00 |
|
Jan Lukas Gernert
|
8c7cdacd26
|
Revert "generate full html document"
This reverts commit 0133b20f06.
|
2023-08-10 02:06:08 +02:00 |
|
Jan Lukas Gernert
|
0133b20f06
|
generate full html document
|
2023-08-10 00:01:31 +02:00 |
|
Jan Lukas Gernert
|
1584649eb4
|
fix tests
|
2023-08-10 00:01:10 +02:00 |
|
Jan Lukas Gernert
|
2c76a89f9d
|
add spiegel test
|
2023-08-09 23:57:25 +02:00 |
|
Jan Lukas Gernert
|
9aa6478e3c
|
update heise test
|
2023-08-09 23:25:07 +02:00 |
|
Jan Lukas Gernert
|
b91014c685
|
clean html fragments
|
2023-08-03 10:40:44 +02:00 |
|
Jan Lukas Gernert
|
9c857a1481
|
Merge branch 'make-article-public' into 'master'
Make `Article` public
See merge request news-flash/article_scraper!9
|
2023-08-02 09:04:30 +00:00 |
|
Leonardo Fedalto
|
3211b91bad
|
Make Article public
|
2023-08-01 21:39:48 +02:00 |
|
Jan Lukas Gernert
|
7a4f5c500d
|
400
|
2023-08-01 19:35:22 +02:00 |
|
Jan Lukas Gernert
|
a7e8661a09
|
update tests & defined youtube iframe height
|
2023-08-01 18:37:55 +02:00 |
|
Jan Lukas Gernert
|
eb1bfdbca0
|
print url
|
2023-07-28 07:09:50 +02:00 |
|
Jan Lukas Gernert
|
40f065d9cd
|
allow downloads without content type smaller than 5mb
|
2023-07-28 07:03:50 +02:00 |
|
Jan Lukas Gernert
|
db007f752c
|
don't clean video tags
|
2023-07-27 23:18:17 +02:00 |
|
Jan Lukas Gernert
|
bf7a89fef7
|
don't fail because of lacking content length
|
2023-07-23 15:39:24 +02:00 |
|
Jan Lukas Gernert
|
345518253a
|
even if img has src
|
2023-07-22 20:03:32 +02:00 |
|
Jan Lukas Gernert
|
42eb9daf65
|
remove lazy loading attributes
|
2023-07-22 19:57:38 +02:00 |
|
Jan Lukas Gernert
|
d562d41b81
|
download single image
|
2023-07-16 21:40:10 +02:00 |
|
Jan Lukas Gernert
|
be40383b1a
|
impl from reqwest error
|
2023-07-16 15:17:01 +02:00 |
|
Jan Lukas Gernert
|
d62aa8c31a
|
clippy fixes
|
2023-06-29 19:59:38 +02:00 |
|
Jan Lukas Gernert
|
fcec0d83ee
|
don't move content nodes to <article> root node
could fix potential crash?
|
2023-06-29 19:47:49 +02:00 |
|
Jan Lukas Gernert
|
fdb8d9a97e
|
small fixes
|
2023-06-27 19:21:26 +02:00 |
|
Jan Lukas Gernert
|
4fd41d98cc
|
add fn to parse thumbnail from html
|
2023-06-26 23:22:08 +02:00 |
|
Jan Lukas Gernert
|
e32015c1d0
|
add mercury leading image heuristics
|
2023-06-26 22:25:57 +02:00 |
|
Jan Lukas Gernert
|
e99a4b4f23
|
ignore test resources
|
2023-06-23 21:22:37 +02:00 |
|
Jan Lukas Gernert
|
a7983e873d
|
(cargo-release) version 2.0.0
|
2023-06-23 21:17:19 +02:00 |
|
Jan Lukas Gernert
|
a036d03510
|
use ftr-site-config fork with heise patch
|
2023-06-23 21:15:36 +02:00 |
|
Jan Lukas Gernert
|
a31956531a
|
fix download loop
|
2023-06-22 00:15:57 +02:00 |
|
Jan Lukas Gernert
|
582834cdf1
|
fixes
|
2023-06-21 23:48:09 +02:00 |
|
Jan Lukas Gernert
|
e0ccd7e0b3
|
split download & parsing
|
2023-06-21 23:04:21 +02:00 |
|
Jan Lukas Gernert
|
99c5f6220e
|
fix golem test
|
2023-06-21 23:04:08 +02:00 |
|
Jan Lukas Gernert
|
d8ceee1403
|
remove <h1/2> duplicating the title
|
2023-04-30 09:24:00 +02:00 |
|