Jan Lukas Gernert
|
c4f8bd2bc2
|
fix heise crash: simpler way of checking for ancestor
|
2023-04-28 15:56:29 +02:00 |
|
Jan Lukas Gernert
|
44d01ad1c6
|
Merge branch 'hardwareluxx' into 'master'
Hardwareluxx
See merge request news-flash/article_scraper!8
|
2023-04-28 05:57:37 +00:00 |
|
Jan Lukas Gernert
|
871b441776
|
parse image objects
|
2023-04-28 07:46:28 +02:00 |
|
Jan Lukas Gernert
|
572fada104
|
parse video objects
|
2023-04-27 19:03:07 +02:00 |
|
Jan Lukas Gernert
|
34a737c89c
|
overhaul non-readability tests
|
2023-04-27 07:40:28 +02:00 |
|
Jan Lukas Gernert
|
f737ab27fd
|
update readability test results
|
2023-04-26 21:04:35 +02:00 |
|
Jan Lukas Gernert
|
2a4f17d458
|
ignore image download test
|
2023-04-26 20:58:25 +02:00 |
|
Jan Lukas Gernert
|
62c0968619
|
remove empty nodes
|
2023-04-26 19:54:34 +02:00 |
|
Jan Lukas Gernert
|
5621a0ea54
|
fmt
|
2023-04-26 09:12:55 +02:00 |
|
Jan Lukas Gernert
|
fbb6585596
|
replace first occurrence only
|
2023-04-26 09:09:06 +02:00 |
|
Jan Lukas Gernert
|
afbc384b38
|
update ftr config
|
2023-04-26 07:45:40 +02:00 |
|
Jan Lukas Gernert
|
dd958fe30f
|
fix encoding
|
2023-04-26 07:44:32 +02:00 |
|
Jan Lukas Gernert
|
bd413a795c
|
fmt
|
2023-04-25 19:12:15 +02:00 |
|
Jan Lukas Gernert
|
a0161e92d4
|
next page fixes
|
2023-04-25 18:57:24 +02:00 |
|
Jan Lukas Gernert
|
37d317ad86
|
simplify iterating over dir
|
2023-04-25 08:58:15 +02:00 |
|
Jan Lukas Gernert
|
309a60c5d0
|
update regex
|
2023-04-23 20:45:45 +02:00 |
|
Jan Lukas Gernert
|
c51f0fd731
|
cargo.toml metadata
|
2023-04-23 16:47:02 +02:00 |
|
Jan Lukas Gernert
|
1695e33f9e
|
fmt
|
2023-04-23 16:37:06 +02:00 |
|
Jan Lukas Gernert
|
57df2e6832
|
write some docs
|
2023-04-23 16:35:00 +02:00 |
|
Jan Lukas Gernert
|
bfb31dc188
|
fmt
|
2023-04-21 08:53:12 +02:00 |
|
Jan Lukas Gernert
|
baf2a8a15d
|
rename test
|
2023-04-21 08:47:25 +02:00 |
|
Jan Lukas Gernert
|
b4b5d802c9
|
only serialize root node
|
2023-04-21 08:46:10 +02:00 |
|
Jan Lukas Gernert
|
3f58a39fcf
|
dump node
|
2023-04-20 08:53:06 +02:00 |
|
Jan Lukas Gernert
|
cd3d3468a3
|
clean html
|
2023-04-20 08:41:10 +02:00 |
|
Jan Lukas Gernert
|
3096f28aae
|
empty clean html fn
|
2023-04-16 22:00:00 +02:00 |
|
Jan Lukas Gernert
|
f427b7c36f
|
cli: progress bar for image download
|
2023-04-16 21:31:11 +02:00 |
|
Jan Lukas Gernert
|
3dd7c7d57a
|
tmp: calc download size & print progress
|
2023-04-16 18:10:43 +02:00 |
|
Jan Lukas Gernert
|
ccc8223db0
|
cleanup & fixes
|
2023-04-14 17:50:39 +02:00 |
|
Jan Lukas Gernert
|
57f74c635b
|
fix clippy
|
2023-04-14 10:32:05 +02:00 |
|
Jan Lukas Gernert
|
3a465f2619
|
somehow made things much slower
|
2023-04-14 08:49:49 +02:00 |
|
Jan Lukas Gernert
|
4fd4dd39db
|
download images concurrently
|
2023-04-13 07:54:31 +02:00 |
|
Jan Lukas Gernert
|
35a14b0a5f
|
start improving image download
|
2023-04-12 08:27:22 +02:00 |
|
Jan Lukas Gernert
|
c198225012
|
eliminate additional head request
|
2023-04-11 07:49:01 +02:00 |
|
Jan Lukas Gernert
|
fa41633e11
|
cli to parse single page with ftr
|
2023-04-10 13:47:45 +02:00 |
|
Jan Lukas Gernert
|
d978059709
|
command to use readability extractor
|
2023-04-07 11:51:14 +02:00 |
|
Jan Lukas Gernert
|
063996d62f
|
readability cli
|
2023-04-06 08:53:19 +02:00 |
|
Jan Lukas Gernert
|
a2719c8c7e
|
first few cli args
|
2023-04-05 08:43:00 +02:00 |
|
Jan Lukas Gernert
|
4a7349a5fa
|
add cli crate
|
2023-04-04 08:42:04 +02:00 |
|
Jan Lukas Gernert
|
9832fa2c77
|
clippy fixes
|
2023-04-02 13:23:07 +02:00 |
|
Jan Lukas Gernert
|
acc2fe781a
|
port final tests from readability for now
|
2023-04-02 13:22:16 +02:00 |
|
Jan Lukas Gernert
|
fcc5cb0e88
|
fix hidden fallback images for wikipedia & add more tests
|
2023-04-02 09:55:25 +02:00 |
|
Jan Lukas Gernert
|
3fa8c9674d
|
fix relative srcset urls & more tests
|
2023-04-02 09:03:37 +02:00 |
|
Jan Lukas Gernert
|
15eec43ad9
|
6 more tests
|
2023-04-01 18:22:42 +02:00 |
|
Jan Lukas Gernert
|
cc6ff6d7e2
|
6 more tags & make seattletimes test consistent
|
2023-04-01 18:14:05 +02:00 |
|
Jan Lukas Gernert
|
0d6db710e8
|
4 more tests & remove share elements
|
2023-04-01 17:19:37 +02:00 |
|
Jan Lukas Gernert
|
be6e08bd6d
|
fix replacing font tags
|
2023-04-01 12:31:56 +02:00 |
|
Jan Lukas Gernert
|
253afc48f0
|
fmt
|
2023-03-31 21:21:44 +02:00 |
|
Jan Lukas Gernert
|
e09292d66d
|
qq -.-
|
2023-03-31 21:21:14 +02:00 |
|
Jan Lukas Gernert
|
a11bb1293b
|
adding more tests
|
2023-03-31 11:23:44 +02:00 |
|
Jan Lukas Gernert
|
c46d93058f
|
fix nytimes-3
|
2023-03-31 10:38:04 +02:00 |
|