mirror of
https://gitlab.com/news-flash/article_scraper.git
synced 2025-07-07 16:15:32 +02:00
add clean links test
parent
c5c6b788c8
commit
3ece2522bb
5 changed files with 3258 additions and 178 deletions
170
expected.html
@@ -1,170 +0,0 @@
-<article><DIV id="readability-page-1"><article itemscope="itemscope" itemtype="https://schema.org/NewsArticle"><meta itemprop="datePublished" content="2019-04-30T13:39:00-04:00">
-<meta itemprop="dateModified" content="2019-04-30T13:40:00-04:00">
-<meta itemprop="mainEntityOfPage" content="https://www.citylab.com/design/2019/04/neon-signage-20th-century-history/588400/">
-<figure itemprop="image" itemscope="itemscope" itemtype="http://schema.org/ImageObject"><picture><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/940.jpg?mod=1556645448" media="(min-width: 1024px)"></source><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/lead_large.jpg?mod=1556645448" media="(min-width: 576px)"></source></picture><meta itemprop="height" content="128">
-<meta itemprop="width" content="300">
-<meta itemprop="url" content="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448">
-<picture><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448" media="(max-width: 575px)"></source><img src="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448" alt=""></picture><figcaption><span itemprop="caption">The Moulin Rouge cabaret in
-Paris</span><span itemprop="creator">Benoit
-Tessier/Reuters</span></figcaption></figure><div>
-<h2 itemprop="headline">
-Why Neon Is the Ultimate Symbol of the 20th Century
-</h2>
-<div><p><span><time>1:39 PM
-ET</time></span></p></div>
-</div>
-<h2 itemprop="description">
-The once-ubiquitous form of lighting was novel when it first emerged in the early 1900s,
-though it has since come to represent decline.
-</h2>
-<section id="article-section-1"><p>
-In the summer of 1898, the Scottish chemist Sir William Ramsay made a discovery that
-would eventually give the Moulin Rouge in Paris, the Las Vegas Strip, and New York’s
-Times Square their perpetual nighttime glow. Using the boiling point of argon as a
-reference point, Ramsay and his colleague Morris W. Travers isolated three more noble
-gases and gave them evocative Greek names: neon, krypton, and xenon. In so doing, the
-scientists bestowed a label of permanent novelty on the most famous of the trio—neon,
-which translates as “new.” This discovery was the foundation on which the French
-engineer Georges Claude crafted a new form of illumination over the next decade. He
-designed glass tubes in which neon gas could be trapped, then electrified, to create a
-light that glowed reliably for more than 1,000 hours.
-</p>
-<p>
-In the 2012 book <em>L’être et le Néon</em>, <a href="https://mitpress.mit.edu/books/being-and-neonness-translation-and-content-revised-augmented-and-updated-edition-luis-de-miranda" target="_blank">which
-has been newly translated into English by Michael Wells</a>, the philosopher Luis de
-Miranda weaves a history of neon lighting as both artifact and metaphor. <em>Being and
-Neonness</em>, as the book is called in its English edition, isn’t a typical
-material history. There are no photographs. Even de Miranda’s own example of a neon deli
-sign spotted in Paris is re-created typographically, with text in all caps and dashes
-forming the border of the sign, as one might attempt on Twitter. Fans of Miami Beach’s
-restored Art Deco hotels and California’s bowling alleys might be disappointed by the
-lack of glossy historical images. Nonetheless, de Miranda makes a convincing case for
-neon as a symbol of the grand modern ambitions of the 20th century.
-</p>
-<p>
-De Miranda beautifully evokes the notion of neon lighting as an icon of the 1900s in his
-introduction: “When we hear the word <em>neon</em>, an image pops into our heads: a
-combination of light, colors, symbols, and glass. This image is itself a mood. It
-carries an atmosphere. It speaks … of the essence of cities, of the poetry of nights, of
-the 20th century.” When neon lights debuted in Europe, they seemed dazzlingly
-futuristic. But their husky physicality started becoming obsolete by the 1960s, thanks
-in part to the widespread use of plastic for fluorescent signs. Neon signs exist today,
-though they’ve been eclipsed by newer technologies such as digital billboards, and they
-remain charmingly analog: Signs must be made by hand because there’s no cost-effective
-way to mass-produce them.
-</p>
-<p>
-In the 1910s, neon started being used for cosmopolitan flash in Paris at precisely the
-time and place where the first great modernist works were being created. De Miranda’s
-recounting of the ingenuity emerging from the French capital a century ago is thrilling
-to contemplate: the cubist art of Pablo Picasso, the radically deconstructed fashions of
-Coco Chanel, the stream-of-consciousness poetry of Gertrude Stein, and the genre-defying
-music of Claude Debussy—all of which heralded a new age of culture for Europe and for
-the world.
-</p></section><section id="article-section-2"><p>
-Amid this artistic groundswell, Georges Claude premiered his neon lights at the <a href="https://www.mondial-paris.com/en/visiteur/auto" target="_blank">Paris Motor Show</a> in
-December 1910, captivating visitors with 40-foot-tall tubes affixed to the building’s
-exterior. The lights shone orange-red because neon, by itself, produces that color.
-<em>Neon lighting</em> is a catchall term that describes the technology of glass tubing
-that contains gas or chemicals that glow when electrified. For example, neon fabricators
-use carbon dioxide to make white, and mercury to make blue. Claude acknowledged at the
-time that neon didn’t produce the ideal color for a standard light bulb and insisted
-that it posed no commercial threat to incandescent bulbs.
-</p>
-<p>
-Of course, the very quality that made neon fixtures a poor choice for interior lighting
-made them perfect for signs, de Miranda notes. The first of the neon signs was switched
-on in 1912, advertising a barbershop on Paris’s Boulevard Montmartre, and eventually
-they were adopted by cinemas and nightclubs. While Claude had a monopoly on neon
-lighting throughout the 1920s, the leaking of trade secrets and the expiration of a
-series of patents broke his hold on the rapidly expanding technology.
-</p></section><section id="article-section-3"><p>
-In the following decades, neon’s nonstop glow and vibrant colors turned ordinary
-buildings and surfaces into 24/7 billboards for businesses, large and small, that wanted
-to convey a sense of always being open. The first examples of neon in the United States
-debuted in Los Angeles, where the Packard Motor Car Company commissioned two large
-blue-and-orange <span>Packard</span> signs that literally stopped
-traffic because they distracted motorists. The lighting also featured heavily at the
-Chicago Century of Progress Exposition in 1933 and at the 1939 World’s Fair in New York.
-At the latter event, a massive neon sign reading <span>Futurama</span>
-lit the way to a General Motors exhibition that heralded “The World of Tomorrow.”
-</p>
-<figure><picture><img alt="" data-srcset="https://cdn.theatlantic.com/assets/media/img/posts/2019/04/AP_8912060228/cbd32b0e1.jpg"></picture><figcaption>
-Workers remove a hammer and sickle from a neon sign that reads “Glory to Communism,”
-visible on the roof of the Communist-run electricity-board headquarters in
-Czechoslovakia in 1989. (AP)
-</figcaption></figure><p>
-De Miranda points out that businesses weren’t alone in embracing neon’s ability to
-spread messages effectively. By the middle of the century, the lighting was being
-adopted for more political purposes. “In the 1960s, the Soviets deployed a vast
-‘neonization’ of the Eastern bloc capitals to emulate capitalist metropolises,” de
-Miranda writes. “Because consumer shops were rare in the Polish capital [of Warsaw],
-they did not hesitate to illuminate the façades of public buildings.” In other words, as
-opposed to the sole use of the more obvious forms of propaganda via posters or slogans,
-the mass introduction of neon lighting was a way of getting citizens of Communist cities
-to see their surroundings with the pizzazz and nighttime glamour of major Western
-capitals.
-</p></section><section id="article-section-4"><p>
-Neon, around this time, began to be phased out, thanks to cheaper and less
-labor-intensive alternatives. In addition, the global economic downturn of the 1970s
-yielded a landscape in which older, flickering neon signs, which perhaps their owners
-couldn’t afford to fix or replace, came to look like symbols of decline. Where such
-signs were once sophisticated and novel, they now seemed dated and even seedy.
-</p>
-<section><h2>
-Cities are changing fast. Keep up with the <b>CityLab Daily</b> newsletter.
-</h2>
-<label for="promo-email-input-email">The best way to follow issues you
-care about.</label></section><p>
-De Miranda understands this evolution by zooming out and looking at the 1900s as the
-“neon century.” The author draws a parallel between the physical form of neon lights,
-which again are essentially containers for electrified gases, and that of a glass
-capsule—suggesting they are a kind of message in a bottle from a time before the First
-World War. “Since then, [neon lights] have witnessed all the transformations that have
-created the world we live in,” de Miranda writes. “Today, they sometimes seem to
-maintain a hybrid status, somewhere between junkyards and museums, not unlike European
-capitals themselves.”
-</p>
-<figure><picture><img alt="" data-srcset="https://cdn.theatlantic.com/assets/media/img/posts/2019/04/AP_945361213236/888fdd750.jpg"></picture><figcaption>
-Martin Wartman, a student at Northern Kentucky University, works on a neon sign at
-the Neonworks of Cincinnati workshop connected to the American Sign Museum, in 2016.
-(John Minchillo / AP)
-</figcaption></figure><p>
-Another mark of neon’s hybridity: Its obsolescence started just as some contemporary
-artists began using the lights in their sculptures. Bruce Nauman’s 1968 work <em><a href="https://www.stedelijk.nl/en/collection/1097-bruce-nauman-my-name-as-though-it-were-written-on-the-surface-of-the-moon" target="_blank">My
-Name as Though It Were Written on the Surface of the Moon</a></em> poked fun at
-the space race—another symbol of 20th-century technological innovation whose moment has
-passed. The piece uses blue “neon” letters (mercury, actually) to spell out the name
-“bruce” in lowercase cursive, with each character repeated several times as if to convey
-a person speaking slowly in outer space. The British artist Tracey Emin has made <a href="https://www.artsy.net/collection/tracey-emin-neon-sculptures-and-prints" target="_blank">sculptures</a>
-that resemble neon Valentine’s Day candies: They read as garish and sentimental
-confections with pink, heart-shaped frames that surround blue text fragments. Drawing on
-the nostalgia-inducing quality of neon, the sculptures’ messages are redolent of
-old-fashioned movie dialogue, with titles such as “You Loved Me Like a Distant Star” and
-“The Kiss Was Beautiful.”
-</p>
-<p>
-Seeing neon lighting tamed in the context of a gallery display fits comfortably with de
-Miranda’s notion that neon technology is like a time capsule from another age. In
-museums, works of neon art and design coexist with objects that were ahead of their own
-time in years past—a poignant fate for a technology that made its name advertising “The
-World of Tomorrow.” Yet today neon is also experiencing a kind of craft revival. The
-fact that it can’t be mass-produced has made its fabrication something akin to a
-cherished artisanal technique. Bars and restaurants hire firms such as Let There Be Neon
-in Manhattan, or <a href="https://www.instagram.com/theneonqueen/" target="_blank">the L.A.-based master
-neon artist Lisa Schulte</a>, to create custom signs and works of art. Neon’s story
-even continues to glow from inside museums such as California’s <a href="https://www.neonmona.org/" target="_blank">Museum of Neon Art</a> and the Neon Museum in Las
-Vegas. If it can still be a vital medium for artists and designers working today,
-“neonness” need not only be trapped in the past. It might also capture the mysterious
-glow of the near future—just as it did a century ago.
-</p>
-<p><em>This article originally appeared on <a href="https://www.theatlantic.com/entertainment/archive/2019/04/being-and-neonness-neon-lights-symbol-20th-century/588184/" target="_blank">The
-Atlantic</a>.</em></p></section><section data-include="css:https://cdn.citylab.com/static/a/frontend/dist/citylab/css/components/author-article.cf4e8e0b143f.css"><h4>
-About the Author
-</h4>
-<div itemprop="author">
-<h5 itemprop="name"><a href="https://www.citylab.com/authors/sarah-archer/" target="_blank">Sarah Archer</a></h5>
-<p itemprop="description"><a href="https://www.citylab.com/authors/sarah-archer/" data-omni-click="inherit" target="_blank">Sarah Archer</a> is the author of <em>The
-Midcentury Kitchen</em>.
-</p>
-</div></section></article></DIV></article>
1369
resources/tests/readability/clean-links/expected.html
Normal file
File diff suppressed because it is too large
1863
resources/tests/readability/clean-links/source.html
Normal file
File diff suppressed because it is too large
@@ -502,16 +502,20 @@ impl FullTextParser {
         let node_vec = Util::evaluate_xpath(context, xpath, false)?;
         for mut node in node_vec {
             if let Some(url) = node.get_attribute(attribute) {
+                let trimmed_url = url.trim();
                 let is_relative_url = url::Url::parse(&url)
                     .err()
                     .map(|err| err == url::ParseError::RelativeUrlWithoutBase)
                     .unwrap_or(false);
 
-                if is_relative_url {
-                    let completed_url = article_url.join(&url)?;
-                    node.set_attribute(attribute, completed_url.as_str())
-                        .map_err(|_| FullTextParserError::Scrape)?;
-                }
+                let completed_url = if is_relative_url {
+                    article_url.join(trimmed_url)?
+                } else {
+                    Url::parse(trimmed_url)?
+                };
+
+                node.set_attribute(attribute, completed_url.as_str())
+                    .map_err(|_| FullTextParserError::Scrape)?;
             }
         }
         Ok(())
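Review note: the hunk above trims the attribute value, resolves it against the article URL when it is relative, and then writes the result back in both branches, so absolute URLs are rewritten in their trimmed form too. The flow can be sketched without the `url` crate as below. This is only an illustration under loud assumptions: `has_scheme` and `complete_url` are hypothetical helpers, not part of article_scraper, and the string-based join is a simplified stand-in for `url::Url::join` that handles only root-relative and sibling-relative paths (for example, it would misclassify `mailto:` links as relative).

```rust
/// Hypothetical stand-in for the "is this URL relative?" check.
/// A value counts as absolute only if it begins with `scheme://`.
fn has_scheme(s: &str) -> bool {
    match s.find("://") {
        Some(idx) if idx > 0 => s[..idx].chars().enumerate().all(|(i, c)| {
            if i == 0 {
                c.is_ascii_alphabetic()
            } else {
                c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.')
            }
        }),
        _ => false,
    }
}

/// Simplified completion: trim first, then either keep the absolute URL
/// or join the relative one onto the article URL.
fn complete_url(article_url: &str, raw: &str) -> String {
    // Attribute values in scraped HTML often carry stray whitespace or newlines.
    let trimmed = raw.trim();
    if has_scheme(trimmed) {
        return trimmed.to_string();
    }

    // Locate the end of `scheme://host` in the article URL.
    let scheme_end = article_url.find("://").map(|i| i + 3).unwrap_or(0);
    let path_start = article_url[scheme_end..]
        .find('/')
        .map(|i| i + scheme_end)
        .unwrap_or(article_url.len());

    if trimmed.starts_with('/') {
        // Root-relative: keep only the origin of the article URL.
        format!("{}{}", &article_url[..path_start], trimmed)
    } else {
        // Sibling-relative: replace the last path segment of the article URL.
        let dir_end = article_url
            .rfind('/')
            .filter(|&i| i >= path_start)
            .map(|i| i + 1);
        match dir_end {
            Some(end) => format!("{}{}", &article_url[..end], trimmed),
            None => format!("{}/{}", article_url, trimmed),
        }
    }
}
```

With an article URL of `https://example.com/a/b.html`, a padded value such as `"  /img/x.png \n"` comes back as `https://example.com/img/x.png`, and a padded absolute URL comes back unchanged apart from the trimming.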
@@ -867,7 +871,7 @@ impl FullTextParser {
             Util::clean_conditionally(&mut root, "ul");
             Util::clean_conditionally(&mut root, "div");
 
-            Self::clean_classes(&mut root)?;
+            Self::clean_attributes(&mut root)?;
             Self::simplify_nested_elements(&mut root)?;
         }
 
@@ -895,7 +899,7 @@ impl FullTextParser {
         }
     }
 
-    fn clean_classes(root: &mut Node) -> Result<(), FullTextParserError> {
+    fn clean_attributes(root: &mut Node) -> Result<(), FullTextParserError> {
         let mut node_iter = Some(root.clone());
 
         while let Some(mut node) = node_iter {
@@ -904,6 +908,11 @@ impl FullTextParser {
                 FullTextParserError::Xml
             })?;
 
+            node.remove_attribute("align").map_err(|e| {
+                log::error!("{e}");
+                FullTextParserError::Xml
+            })?;
+
             node.remove_attribute(constants::SCORE_ATTR).map_err(|e| {
                 log::error!("{e}");
                 FullTextParserError::Xml
@@ -915,6 +924,10 @@ impl FullTextParser {
                 FullTextParserError::Xml
             })?;
 
+            if node.get_name().to_uppercase() == "FONT" {
+                node.set_name("span").unwrap();
+            }
+
             node_iter = Util::next_node(&node, false);
         }
         Ok(())
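Review note: the two additions to `clean_attributes` above (dropping `align` and renaming `FONT` elements to `span`) can be illustrated on a toy tree. This is a minimal sketch under assumptions, not the crate's code: the `Node` struct and the `element` helper are hypothetical stand-ins for the libxml node type, the walk is recursive rather than driven by `Util::next_node`, and it covers only the two changes in this commit, not the other attributes the real function removes.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory element, standing in for the real XML node type.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    attributes: BTreeMap<String, String>,
    children: Vec<Node>,
}

/// Hypothetical convenience constructor for building small test trees.
fn element(name: &str, attrs: &[(&str, &str)], children: Vec<Node>) -> Node {
    Node {
        name: name.to_string(),
        attributes: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
        children,
    }
}

/// Walk the tree, removing presentational `align` attributes and
/// turning `<font>` elements into neutral `<span>` elements.
fn clean_attributes(node: &mut Node) {
    node.attributes.remove("align");

    // Case-insensitive match, mirroring the uppercase comparison in the diff.
    if node.name.eq_ignore_ascii_case("font") {
        node.name = "span".to_string();
    }

    for child in &mut node.children {
        clean_attributes(child);
    }
}
```

Other attributes on the renamed element, such as `color` in `<FONT color="red" align="left">`, are left in place by this sketch; only `align` is removed.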
@@ -19,7 +19,7 @@ async fn run_test(name: &str) {
     let xpath_ctx = crate::FullTextParser::get_xpath_ctx(&document).unwrap();
 
     crate::FullTextParser::strip_junk(&xpath_ctx, None, &empty_config);
 
     crate::FullTextParser::fix_urls(&xpath_ctx, &url);
     let mut article = Article {
         title: None,
@@ -126,6 +126,11 @@ async fn citylab_1() {
     run_test("citylab-1").await
 }
 
+#[tokio::test]
+async fn clean_links() {
+    run_test("clean-links").await
+}
+
 #[tokio::test]
 async fn webmd_1() {
     run_test("webmd-1").await