Jan Lukas Gernert
|
acb7d1d000
|
port libxml workaround from hurl
|
2023-08-10 02:09:07 +02:00 |
|
Jan Lukas Gernert
|
6116ba38ae
|
no need for head
|
2023-08-10 02:06:52 +02:00 |
|
Jan Lukas Gernert
|
8c7cdacd26
|
Revert "generate full html document"
This reverts commit 0133b20f06.
|
2023-08-10 02:06:08 +02:00 |
|
Jan Lukas Gernert
|
0133b20f06
|
generate full html document
|
2023-08-10 00:01:31 +02:00 |
|
Jan Lukas Gernert
|
1584649eb4
|
fix tests
|
2023-08-10 00:01:10 +02:00 |
|
Jan Lukas Gernert
|
2c76a89f9d
|
add spiegel test
|
2023-08-09 23:57:25 +02:00 |
|
Jan Lukas Gernert
|
9aa6478e3c
|
update heise test
|
2023-08-09 23:25:07 +02:00 |
|
Jan Lukas Gernert
|
b91014c685
|
clean html fragments
|
2023-08-03 10:40:44 +02:00 |
|
Jan Lukas Gernert
|
9c857a1481
|
Merge branch 'make-article-public' into 'master'
Make `Article` public
See merge request news-flash/article_scraper!9
|
2023-08-02 09:04:30 +00:00 |
|
Leonardo Fedalto
|
3211b91bad
|
Make Article public
|
2023-08-01 21:39:48 +02:00 |
|
Jan Lukas Gernert
|
7a4f5c500d
|
400
|
2023-08-01 19:35:22 +02:00 |
|
Jan Lukas Gernert
|
a7e8661a09
|
update tests & defined youtube iframe height
|
2023-08-01 18:37:55 +02:00 |
|
Jan Lukas Gernert
|
eb1bfdbca0
|
print url
|
2023-07-28 07:09:50 +02:00 |
|
Jan Lukas Gernert
|
40f065d9cd
|
allow downloads without content type smaller than 5mb
|
2023-07-28 07:03:50 +02:00 |
|
Jan Lukas Gernert
|
db007f752c
|
don't clean video tags
|
2023-07-27 23:18:17 +02:00 |
|
Jan Lukas Gernert
|
bf7a89fef7
|
don't fail because of lacking content length
|
2023-07-23 15:39:24 +02:00 |
|
Jan Lukas Gernert
|
345518253a
|
even if img has src
|
2023-07-22 20:03:32 +02:00 |
|
Jan Lukas Gernert
|
42eb9daf65
|
remove lazy loading attributes
|
2023-07-22 19:57:38 +02:00 |
|
Jan Lukas Gernert
|
d562d41b81
|
download single image
|
2023-07-16 21:40:10 +02:00 |
|
Jan Lukas Gernert
|
be40383b1a
|
impl from reqwest error
|
2023-07-16 15:17:01 +02:00 |
|
Jan Lukas Gernert
|
d62aa8c31a
|
clippy fixes
|
2023-06-29 19:59:38 +02:00 |
|
Jan Lukas Gernert
|
fcec0d83ee
|
don't move content nodes to <article> root node
could fix potential crash?
|
2023-06-29 19:47:49 +02:00 |
|
Jan Lukas Gernert
|
fdb8d9a97e
|
small fixes
|
2023-06-27 19:21:26 +02:00 |
|
Jan Lukas Gernert
|
4fd41d98cc
|
add fn to parse thumbnail from html
|
2023-06-26 23:22:08 +02:00 |
|
Jan Lukas Gernert
|
e32015c1d0
|
add mercury leading image heuristics
|
2023-06-26 22:25:57 +02:00 |
|
Jan Lukas Gernert
|
e99a4b4f23
|
ignore test resources
|
2023-06-23 21:22:37 +02:00 |
|
Jan Lukas Gernert
|
a7983e873d
|
(cargo-release) version 2.0.0
|
2023-06-23 21:17:19 +02:00 |
|
Jan Lukas Gernert
|
a036d03510
|
use ftr-site-config fork with heise patch
|
2023-06-23 21:15:36 +02:00 |
|
Jan Lukas Gernert
|
a31956531a
|
fix download loop
|
2023-06-22 00:15:57 +02:00 |
|
Jan Lukas Gernert
|
582834cdf1
|
fixes
|
2023-06-21 23:48:09 +02:00 |
|
Jan Lukas Gernert
|
e0ccd7e0b3
|
split download & parsing
|
2023-06-21 23:04:21 +02:00 |
|
Jan Lukas Gernert
|
99c5f6220e
|
fix golem test
|
2023-06-21 23:04:08 +02:00 |
|
Jan Lukas Gernert
|
d8ceee1403
|
remove <h1/2> duplicating the title
|
2023-04-30 09:24:00 +02:00 |
|
Jan Lukas Gernert
|
eb4b3603f5
|
remove artifact
|
2023-04-29 18:21:21 +02:00 |
|
Jan Lukas Gernert
|
16b102b313
|
replace multiple <br>s with single <p>
|
2023-04-29 18:20:58 +02:00 |
|
Jan Lukas Gernert
|
c4f8bd2bc2
|
fix heise crash: simpler way of checking for ancestor
|
2023-04-28 15:56:29 +02:00 |
|
Jan Lukas Gernert
|
44d01ad1c6
|
Merge branch 'hardwareluxx' into 'master'
Hardwareluxx
See merge request news-flash/article_scraper!8
|
2023-04-28 05:57:37 +00:00 |
|
Jan Lukas Gernert
|
871b441776
|
parse image objects
|
2023-04-28 07:46:28 +02:00 |
|
Jan Lukas Gernert
|
572fada104
|
parse video objects
|
2023-04-27 19:03:07 +02:00 |
|
Jan Lukas Gernert
|
34a737c89c
|
overhaul non-readability tests
|
2023-04-27 07:40:28 +02:00 |
|
Jan Lukas Gernert
|
f737ab27fd
|
update readability test results
|
2023-04-26 21:04:35 +02:00 |
|
Jan Lukas Gernert
|
2a4f17d458
|
ignore image download test
|
2023-04-26 20:58:25 +02:00 |
|
Jan Lukas Gernert
|
62c0968619
|
remove empty nodes
|
2023-04-26 19:54:34 +02:00 |
|
Jan Lukas Gernert
|
5621a0ea54
|
fmt
|
2023-04-26 09:12:55 +02:00 |
|
Jan Lukas Gernert
|
fbb6585596
|
replace first occurrence only
|
2023-04-26 09:09:06 +02:00 |
|
Jan Lukas Gernert
|
afbc384b38
|
update ftr config
|
2023-04-26 07:45:40 +02:00 |
|
Jan Lukas Gernert
|
dd958fe30f
|
fix encoding
|
2023-04-26 07:44:32 +02:00 |
|
Jan Lukas Gernert
|
bd413a795c
|
fmt
|
2023-04-25 19:12:15 +02:00 |
|
Jan Lukas Gernert
|
a0161e92d4
|
next page fixes
|
2023-04-25 18:57:24 +02:00 |
|
Jan Lukas Gernert
|
37d317ad86
|
simplify iterating over dir
|
2023-04-25 08:58:15 +02:00 |
|