mirror of
https://gitlab.com/news-flash/article_scraper.git
synced 2025-07-07 16:15:32 +02:00
tag cleaning test
This commit is contained in:
parent
7c9e527827
commit
4031750956
4 changed files with 81 additions and 0 deletions
20
expected.html
Normal file
20
expected.html
Normal file
|
@ -0,0 +1,20 @@
|
|||
<article><DIV id="readability-page-1">
|
||||
<div>
|
||||
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
|
||||
tempor incididunt ut labore et dolore magna aliqua.</p>
|
||||
<p>Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
<div>
|
||||
<p>Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
</DIV></article>
|
|
@ -0,0 +1,20 @@
|
|||
<article><DIV id="readability-page-1">
|
||||
<div>
|
||||
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
|
||||
tempor incididunt ut labore et dolore magna aliqua.</p>
|
||||
<p>Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
<div>
|
||||
<p>Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
</DIV></article>
|
36
resources/tests/readability/basic-tags-cleaning/source.html
Normal file
36
resources/tests/readability/basic-tags-cleaning/source.html
Normal file
|
@ -0,0 +1,36 @@
|
|||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8"/>
|
||||
<title>Basic tag cleaning test</title>
|
||||
</head>
|
||||
<body>
|
||||
<article>
|
||||
<h1>Lorem</h1>
|
||||
<div>
|
||||
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
|
||||
tempor incididunt ut labore et dolore magna aliqua.</p>
|
||||
<p>Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<iframe src="about:blank">Iframe fallback test</iframe>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
<h2>Foo</h2>
|
||||
<div>
|
||||
<p>Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
|
||||
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
|
||||
consequat.</p>
|
||||
<object data="foo.swf" type="application/x-shockwave-flash" width="88" height="31">
|
||||
<param movie="foo.swf" />
|
||||
</object>
|
||||
<embed src="foo.swf"/>
|
||||
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
|
||||
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
|
||||
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
||||
</div>
|
||||
</article>
|
||||
</body>
|
||||
</html>
|
|
@ -91,6 +91,11 @@ async fn base_url_base_element_relative() {
|
|||
run_test("base-url-base-element-relative").await
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn basic_tags_cleaning() {
|
||||
run_test("basic-tags-cleaning").await
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn webmd_1() {
|
||||
run_test("webmd-1").await
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue