mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-08 16:40:00 +02:00
Commit graph

53 commits

Author | SHA1 | Message | Date
Jan Lukas Gernert | 98c06e11f4 | improve title extraction | 2023-02-20 02:32:58 +01:00
Jan Lukas Gernert | c08f5afa5d | move stuff around | 2022-12-13 08:54:57 +01:00
Jan Lukas Gernert | 27be5a3204 | port failure -> thiserror | 2022-12-01 09:22:08 +01:00
Jan Lukas Gernert | d906f6b7fe | readability stub | 2022-10-08 23:10:26 +02:00
Jan Lukas Gernert | 273ddd832c | start refactor & fingerprints | 2022-10-08 23:09:00 +02:00
Jan Lukas Gernert | 7b205e8e27 | fmt | 2022-10-07 09:32:39 +02:00
Jan Lukas Gernert | 69659da983 | clippy fixes | 2022-10-07 09:20:10 +02:00
Jan Lukas Gernert | 8c2af14871 | special handling trying to find single page links: fixes youtube | 2022-10-07 08:48:09 +02:00
Jan Lukas Gernert | 7b1b027c6d | add support for header values: fixes golem test | 2022-10-07 07:17:33 +02:00
Jan Lukas Gernert | c1ae011fcd | use global rules | 2022-10-07 07:17:31 +02:00
Jan Lukas Gernert | 3a6a70ee64 | embedded config files | 2022-10-07 07:16:54 +02:00
Jan Lukas Gernert | aa09666f4c | async config loading | 2022-10-07 07:16:06 +02:00
Volker Weißmann | 593901c849 | Fixed spelling | 2022-06-15 19:15:51 +02:00
Jan Lukas Gernert | 76940232a5 | take url reference | 2021-01-21 08:53:51 +01:00
Jan Lukas Gernert | b73448b189 | fix clippy lints | 2021-01-06 10:32:43 +01:00
Jan Lukas Gernert | 7e05a98f30 | update to tokio 1.0 | 2021-01-06 09:53:47 +01:00
Jan Lukas Gernert | 196a106e7a | shut up clippy | 2020-06-07 13:40:08 +02:00
Jan Lukas Gernert | 6b6c52f315 | only use builtin youtube parsing if no config is provided | 2020-06-07 13:21:53 +02:00
Jan Lukas Gernert | 34eaf1eeb1 | fmt | 2020-06-07 12:53:33 +02:00
Jan Lukas Gernert | 82a0a46323 | special handling for youtube videos | 2020-06-07 12:39:44 +02:00
Jan Lukas Gernert | a42ececb2a | check if final url differs from original even without redirect status | 2020-06-06 05:18:25 +02:00
Felix Buehler | fa54b82e52 | [ci] add fmt + lint checking | 2020-05-30 13:07:10 +02:00
Felix Buehler | 0c3946dd5b | fix fmt+lint | 2020-05-29 18:55:00 +02:00
Jan Lukas Gernert | f51605a92c | naivedatetime -> datetime utc | 2020-05-20 16:33:40 +02:00
Jan Lukas Gernert | 1fd7173eac | update for newer deps | 2020-04-28 02:51:30 +02:00
Jan Lukas Gernert | d2960d8539 | require client for parsing | 2020-02-10 18:01:35 +01:00
Jan Lukas Gernert | 1ecc0fc4b4 | option to set custom reqwest client | 2020-02-03 17:46:54 +01:00
Jan Lukas Gernert | 98348b7e59 | tmp: dont strip scripts | 2020-01-27 16:47:13 +01:00
Jan Lukas Gernert | 23514aff9e | less dramatic logging | 2020-01-27 02:03:06 +01:00
Jan Lukas Gernert | afe661fe6c | only go for single page link if xpath res isn't empty | 2020-01-27 01:54:37 +01:00
Jan Lukas Gernert | e58acf828c | improve logging clearity | 2020-01-27 01:48:54 +01:00
Jan Lukas Gernert | c720dbc299 | fixup | 2020-01-27 01:35:15 +01:00
Jan Lukas Gernert | b272c99911 | fix missing '/' in url completion | 2020-01-27 01:21:21 +01:00
Jan Lukas Gernert | f570873aba | load config files in background thread | 2020-01-26 21:44:26 +01:00
Jan Lukas Gernert | d843809437 | update reqwest to stable | 2020-01-18 19:06:53 +01:00
Jan Lukas Gernert | 9e995122c4 | only strip topmost nodes in tree branches | 2019-12-19 17:36:48 +01:00
Jan Lukas Gernert | 26346839f2 | remove prints | 2019-11-19 19:28:49 +01:00
Jan Lukas Gernert | edfbca3cf3 | fix document going out of scope | 2019-11-19 14:41:08 +01:00
Jan Lukas Gernert | 2c6bfed550 | frickel | 2019-11-18 05:53:34 +01:00
Jan Lukas Gernert | 4b8af0d709 | wip: async | 2019-11-10 14:43:59 +01:00
Jan Lukas Gernert | 2137e84743 | update to new serialization api of libxml | 2019-09-26 21:28:05 +02:00
Jan Lukas Gernert | f9905c8a9d | download images parameter to parse method | 2019-09-24 02:43:36 +02:00
Jan Lukas Gernert | f1be8a2608 | make image downloader public | 2019-09-24 02:40:43 +02:00
Jan Lukas Gernert | 3ca59d7f02 | embed images as base64 inside article html | 2019-03-06 18:37:24 +01:00
Jan Lukas Gernert | e1905d3c2c | remove life time annotations added by rust 2018 | 2018-12-08 23:25:07 +01:00
Jan Lukas Gernert | aa26e099df | get rid of 'extern crate' | 2018-12-07 02:26:46 +01:00
Jan Lukas Gernert | b679f2e1fa | update to rust 2018 | 2018-12-07 02:19:40 +01:00
Jan Lukas Gernert | 6f38c2bc4c | merge | 2018-12-07 02:17:15 +01:00
Jan Lukas Gernert | fcea6cf5d1 | update to reqwest 0.9 | 2018-12-07 02:14:50 +01:00
Jan Lukas Gernert | fab4306ed9 | TIL: map_err | 2018-08-31 16:49:58 +02:00