mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-07 16:15:32 +02:00
Commit graph

250 commits

Author SHA1 Message Date
Jan Lukas Gernert
44d01ad1c6 Merge branch 'hardwareluxx' into 'master': Hardwareluxx (see merge request news-flash/article_scraper!8) 2023-04-28 05:57:37 +00:00
Jan Lukas Gernert
871b441776 parse image objects 2023-04-28 07:46:28 +02:00
Jan Lukas Gernert
572fada104 parse video objects 2023-04-27 19:03:07 +02:00
Jan Lukas Gernert
34a737c89c overhaul non-readability tests 2023-04-27 07:40:28 +02:00
Jan Lukas Gernert
f737ab27fd update readability test results 2023-04-26 21:04:35 +02:00
Jan Lukas Gernert
2a4f17d458 ignore image download test 2023-04-26 20:58:25 +02:00
Jan Lukas Gernert
62c0968619 remove empty nodes 2023-04-26 19:54:34 +02:00
Jan Lukas Gernert
5621a0ea54 fmt 2023-04-26 09:12:55 +02:00
Jan Lukas Gernert
fbb6585596 replace first occurence only 2023-04-26 09:09:06 +02:00
Jan Lukas Gernert
afbc384b38 update ftr config 2023-04-26 07:45:40 +02:00
Jan Lukas Gernert
dd958fe30f fix encoding 2023-04-26 07:44:32 +02:00
Jan Lukas Gernert
bd413a795c fmt 2023-04-25 19:12:15 +02:00
Jan Lukas Gernert
a0161e92d4 next page fixes 2023-04-25 18:57:24 +02:00
Jan Lukas Gernert
37d317ad86 simplify iterating over dir 2023-04-25 08:58:15 +02:00
Jan Lukas Gernert
309a60c5d0 update regex 2023-04-23 20:45:45 +02:00
Jan Lukas Gernert
c51f0fd731 cargo.toml metadata 2023-04-23 16:47:02 +02:00
Jan Lukas Gernert
1695e33f9e fmt 2023-04-23 16:37:06 +02:00
Jan Lukas Gernert
57df2e6832 write some docs 2023-04-23 16:35:00 +02:00
Jan Lukas Gernert
bfb31dc188 fmt 2023-04-21 08:53:12 +02:00
Jan Lukas Gernert
baf2a8a15d rename test 2023-04-21 08:47:25 +02:00
Jan Lukas Gernert
b4b5d802c9 only serialize root node 2023-04-21 08:46:10 +02:00
Jan Lukas Gernert
3f58a39fcf dump node 2023-04-20 08:53:06 +02:00
Jan Lukas Gernert
cd3d3468a3 clean html 2023-04-20 08:41:10 +02:00
Jan Lukas Gernert
3096f28aae empty clean html fn 2023-04-16 22:00:00 +02:00
Jan Lukas Gernert
f427b7c36f cli: progress bar for image download 2023-04-16 21:31:11 +02:00
Jan Lukas Gernert
3dd7c7d57a tmp: calc download size & print progress 2023-04-16 18:10:43 +02:00
Jan Lukas Gernert
ccc8223db0 cleanup & fixes 2023-04-14 17:50:39 +02:00
Jan Lukas Gernert
57f74c635b fix clippy 2023-04-14 10:32:05 +02:00
Jan Lukas Gernert
3a465f2619 somehow made things much slower 2023-04-14 08:49:49 +02:00
Jan Lukas Gernert
4fd4dd39db download images concurrently 2023-04-13 07:54:31 +02:00
Jan Lukas Gernert
35a14b0a5f start improving image download 2023-04-12 08:27:22 +02:00
Jan Lukas Gernert
c198225012 eliminate additional head request 2023-04-11 07:49:01 +02:00
Jan Lukas Gernert
fa41633e11 cli to parse single page with ftr 2023-04-10 13:47:45 +02:00
Jan Lukas Gernert
d978059709 command to use readability extractor 2023-04-07 11:51:14 +02:00
Jan Lukas Gernert
063996d62f readability cli 2023-04-06 08:53:19 +02:00
Jan Lukas Gernert
a2719c8c7e first few cli args 2023-04-05 08:43:00 +02:00
Jan Lukas Gernert
4a7349a5fa add cli crate 2023-04-04 08:42:04 +02:00
Jan Lukas Gernert
9832fa2c77 clippy fixes 2023-04-02 13:23:07 +02:00
Jan Lukas Gernert
acc2fe781a port final tests from readability for now 2023-04-02 13:22:16 +02:00
Jan Lukas Gernert
fcc5cb0e88 fix hidden fallback images for wikipedia & add more tests 2023-04-02 09:55:25 +02:00
Jan Lukas Gernert
3fa8c9674d fix relative srcset urls & more tests 2023-04-02 09:03:37 +02:00
Jan Lukas Gernert
15eec43ad9 6 more tests 2023-04-01 18:22:42 +02:00
Jan Lukas Gernert
cc6ff6d7e2 6 more tags & make seattletimes test consistent 2023-04-01 18:14:05 +02:00
Jan Lukas Gernert
0d6db710e8 4 more test & remove share elements 2023-04-01 17:19:37 +02:00
Jan Lukas Gernert
be6e08bd6d fix replacing font tags 2023-04-01 12:31:56 +02:00
Jan Lukas Gernert
253afc48f0 fmt 2023-03-31 21:21:44 +02:00
Jan Lukas Gernert
e09292d66d qq -.- 2023-03-31 21:21:14 +02:00
Jan Lukas Gernert
a11bb1293b adding more tests 2023-03-31 11:23:44 +02:00
Jan Lukas Gernert
c46d93058f fix nytimes-3 2023-03-31 10:38:04 +02:00
Jan Lukas Gernert
c42ffa57a2 start adding nytimes tests 2023-03-31 09:37:23 +02:00