mirror of https://gitlab.com/news-flash/article_scraper.git synced 2025-07-08 08:30:00 +02:00
Commit graph

55 commits

Author SHA1 Message Date
Jan Lukas Gernert
9e73b94f11 fix medicalnewstoday test 2023-03-30 07:58:11 +02:00
Jan Lukas Gernert
029aaffcea 2 passing tests & 2 failing tests 2023-03-29 18:08:00 +02:00
Jan Lukas Gernert
ded7cf5adb more tests & title fixes 2023-03-29 08:35:36 +02:00
Jan Lukas Gernert
e6c11ec684 4 more tests 2023-03-28 07:25:05 +02:00
Jan Lukas Gernert
d8a9d0a757 update lazy image fixing code 2023-03-27 21:10:48 +02:00
Jan Lukas Gernert
2189f527d7 fix strip unlikely table-child & add 2 new tests 2023-03-26 11:54:13 +02:00
Jan Lukas Gernert
873e081c33 clean js-links & add new test 2023-03-26 11:31:59 +02:00
Jan Lukas Gernert
b541cd73f8 whitespace fixes 2023-03-24 08:02:08 +01:00
Jan Lukas Gernert
2217c3c71a add new test 2023-03-20 20:55:41 +01:00
Jan Lukas Gernert
0901a37475 add new test 2023-03-20 00:09:10 +01:00
Jan Lukas Gernert
280c516cbe make cleaning more obvious 2023-03-19 23:09:06 +01:00
Jan Lukas Gernert
11e08ae505 move conditional cleaning right after parsing & port attribute cleaning from readability 2023-03-19 22:43:26 +01:00
Jan Lukas Gernert
47eed3a94f add hidden notes test 2023-03-19 19:54:41 +01:00
Jan Lukas Gernert
78b0ab693e add herald sun test 2023-03-19 19:51:31 +01:00
Jan Lukas Gernert
c90d05cf84 add heise test 2023-03-19 19:25:56 +01:00
Jan Lukas Gernert
41ee8eec2c add guardian test 2023-03-19 19:23:39 +01:00
Jan Lukas Gernert
914b66a0a2 add 3 more tests 2023-03-19 19:16:34 +01:00
Jan Lukas Gernert
001fd8f167 add engadget & firefox blog tests 2023-03-19 18:40:42 +01:00
Jan Lukas Gernert
32dd074b6d add embedded videos test 2023-03-19 15:39:08 +01:00
Jan Lukas Gernert
8309e227eb add 2nd ehow test 2023-03-19 15:35:09 +01:00
Jan Lukas Gernert
cb00f7add2 add ehow test 2023-03-19 13:31:35 +01:00
Jan Lukas Gernert
b5d8f43ef8 stabilize buzzfeed test 2023-03-12 23:13:52 +01:00
Jan Lukas Gernert
848291e4f3 small fixes 2023-03-12 23:13:28 +01:00
Jan Lukas Gernert
603b373e0d lots of fixes 2023-03-12 19:36:10 +01:00
Jan Lukas Gernert
14ba2ccb70 add dropbox test 2023-03-12 13:50:06 +01:00
Jan Lukas Gernert
23c156ab2c add new test 2023-03-12 13:39:17 +01:00
Jan Lukas Gernert
c19525f8cd add new test 2023-03-12 12:21:00 +01:00
Jan Lukas Gernert
779afd6245 fix cleaning of empty p/div-tags 2023-03-12 12:20:50 +01:00
Jan Lukas Gernert
d9c92ea42c add new test 2023-03-12 11:56:41 +01:00
Jan Lukas Gernert
fa63d297f8 add new test 2023-03-12 11:53:42 +01:00
Jan Lukas Gernert
c654f63319 add cnn test 2023-03-12 11:42:44 +01:00
Jan Lukas Gernert
6a58e45c7a add cnet test 2023-03-10 07:05:10 +01:00
Jan Lukas Gernert
a915d8fe67 update some older tests 2023-03-10 06:36:21 +01:00
Jan Lukas Gernert
7b6d22ebc8 add cnet-svg-classes test 2023-03-10 06:33:24 +01:00
Jan Lukas Gernert
3ece2522bb add clean links test 2023-03-09 21:24:29 +01:00
Jan Lukas Gernert
c5c6b788c8 add citilab test & fix noscript unwrapping 2023-03-09 20:10:03 +01:00
Jan Lukas Gernert
612f022879 add buzzfeed test 2023-03-06 01:36:37 +01:00
Jan Lukas Gernert
45b4141049 add new test 2023-03-06 00:04:23 +01:00
Jan Lukas Gernert
9c5ffda5de add breitbart test 2023-03-04 23:40:23 +01:00
Jan Lukas Gernert
e2b804d00a add blogger test 2023-03-04 17:41:22 +01:00
Jan Lukas Gernert
6964724102 add bbc test 2023-03-02 01:09:44 +01:00
Jan Lukas Gernert
4031750956 tag cleaning test 2023-03-01 01:37:44 +01:00
Jan Lukas Gernert
cea23f1638 always use fakehost url for tests 2023-03-01 00:46:35 +01:00
Jan Lukas Gernert
80de6d177c url completion test 2023-03-01 00:42:44 +01:00
Jan Lukas Gernert
451dd61547 add two new tests 2023-02-28 18:28:55 +01:00
Jan Lukas Gernert
aea57d0cf3 fix has_single_tag_inside_element & update tests 2023-02-28 03:59:48 +01:00
Jan Lukas Gernert
31a8033844 fixes, more sanitation & 1 more failing test 2023-02-28 01:50:13 +01:00
Jan Lukas Gernert
df999cd9fc more cleanups & more tests 2023-02-27 01:00:56 +01:00
Jan Lukas Gernert
0834c4d72a fixes 2023-02-26 02:22:53 +01:00
Jan Lukas Gernert
e3246af28b refactor & more testing 2023-02-25 00:42:26 +01:00