mirror of
https://gitlab.com/news-flash/article_scraper.git
synced 2025-07-07 16:15:32 +02:00
add clean links test
parent
c5c6b788c8
commit
3ece2522bb
5 changed files with 3258 additions and 178 deletions
170
expected.html
@@ -1,170 +0,0 @@
-<article><DIV id="readability-page-1"><article itemscope="itemscope" itemtype="https://schema.org/NewsArticle"><meta itemprop="datePublished" content="2019-04-30T13:39:00-04:00">
-<meta itemprop="dateModified" content="2019-04-30T13:40:00-04:00">
-<meta itemprop="mainEntityOfPage" content="https://www.citylab.com/design/2019/04/neon-signage-20th-century-history/588400/">
-<figure itemprop="image" itemscope="itemscope" itemtype="http://schema.org/ImageObject"><picture><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/940.jpg?mod=1556645448" media="(min-width: 1024px)"></source><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/lead_large.jpg?mod=1556645448" media="(min-width: 576px)"></source></picture><meta itemprop="height" content="128">
-<meta itemprop="width" content="300">
-<meta itemprop="url" content="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448">
-<picture><source srcset="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448" media="(max-width: 575px)"></source><img src="https://cdn.citylab.com/media/img/citylab/2019/04/mr1/300.jpg?mod=1556645448" alt=""></picture><figcaption><span itemprop="caption">The Moulin Rouge cabaret in
-Paris</span><span itemprop="creator">Benoit
-Tessier/Reuters</span></figcaption></figure><div>
-<h2 itemprop="headline">
-Why Neon Is the Ultimate Symbol of the 20th Century
-</h2>
-<div><p><span><time>1:39 PM
-ET</time></span></p></div>
-</div>
-<h2 itemprop="description">
-The once-ubiquitous form of lighting was novel when it first emerged in the early 1900s,
-though it has since come to represent decline.
-</h2>
-<section id="article-section-1"><p>
-In the summer of 1898, the Scottish chemist Sir William Ramsay made a discovery that
-would eventually give the Moulin Rouge in Paris, the Las Vegas Strip, and New York’s
-Times Square their perpetual nighttime glow. Using the boiling point of argon as a
-reference point, Ramsay and his colleague Morris W. Travers isolated three more noble
-gases and gave them evocative Greek names: neon, krypton, and xenon. In so doing, the
-scientists bestowed a label of permanent novelty on the most famous of the trio—neon,
-which translates as “new.” This discovery was the foundation on which the French
-engineer Georges Claude crafted a new form of illumination over the next decade. He
-designed glass tubes in which neon gas could be trapped, then electrified, to create a
-light that glowed reliably for more than 1,000 hours.
-</p>
-<p>
-In the 2012 book <em>L’être et le Néon</em>, <a href="https://mitpress.mit.edu/books/being-and-neonness-translation-and-content-revised-augmented-and-updated-edition-luis-de-miranda" target="_blank">which
-has been newly translated into English by Michael Wells</a>, the philosopher Luis de
-Miranda weaves a history of neon lighting as both artifact and metaphor. <em>Being and
-Neonness</em>, as the book is called in its English edition, isn’t a typical
-material history. There are no photographs. Even de Miranda’s own example of a neon deli
-sign spotted in Paris is re-created typographically, with text in all caps and dashes
-forming the border of the sign, as one might attempt on Twitter. Fans of Miami Beach’s
-restored Art Deco hotels and California’s bowling alleys might be disappointed by the
-lack of glossy historical images. Nonetheless, de Miranda makes a convincing case for
-neon as a symbol of the grand modern ambitions of the 20th century.
-</p>
-<p>
-De Miranda beautifully evokes the notion of neon lighting as an icon of the 1900s in his
-introduction: “When we hear the word <em>neon</em>, an image pops into our heads: a
-combination of light, colors, symbols, and glass. This image is itself a mood. It
-carries an atmosphere. It speaks … of the essence of cities, of the poetry of nights, of
-the 20th century.” When neon lights debuted in Europe, they seemed dazzlingly
-futuristic. But their husky physicality started becoming obsolete by the 1960s, thanks
-in part to the widespread use of plastic for fluorescent signs. Neon signs exist today,
-though they’ve been eclipsed by newer technologies such as digital billboards, and they
-remain charmingly analog: Signs must be made by hand because there’s no cost-effective
-way to mass-produce them.
-</p>
-<p>
-In the 1910s, neon started being used for cosmopolitan flash in Paris at precisely the
-time and place where the first great modernist works were being created. De Miranda’s
-recounting of the ingenuity emerging from the French capital a century ago is thrilling
-to contemplate: the cubist art of Pablo Picasso, the radically deconstructed fashions of
-Coco Chanel, the stream-of-consciousness poetry of Gertrude Stein, and the genre-defying
-music of Claude Debussy—all of which heralded a new age of culture for Europe and for
-the world.
-</p></section><section id="article-section-2"><p>
-Amid this artistic groundswell, Georges Claude premiered his neon lights at the <a href="https://www.mondial-paris.com/en/visiteur/auto" target="_blank">Paris Motor Show</a> in
-December 1910, captivating visitors with 40-foot-tall tubes affixed to the building’s
-exterior. The lights shone orange-red because neon, by itself, produces that color.
-<em>Neon lighting</em> is a catchall term that describes the technology of glass tubing
-that contains gas or chemicals that glow when electrified. For example, neon fabricators
-use carbon dioxide to make white, and mercury to make blue. Claude acknowledged at the
-time that neon didn’t produce the ideal color for a standard light bulb and insisted
-that it posed no commercial threat to incandescent bulbs.
-</p>
-<p>
-Of course, the very quality that made neon fixtures a poor choice for interior lighting
-made them perfect for signs, de Miranda notes. The first of the neon signs was switched
-on in 1912, advertising a barbershop on Paris’s Boulevard Montmartre, and eventually
-they were adopted by cinemas and nightclubs. While Claude had a monopoly on neon
-lighting throughout the 1920s, the leaking of trade secrets and the expiration of a
-series of patents broke his hold on the rapidly expanding technology.
-</p></section><section id="article-section-3"><p>
-In the following decades, neon’s nonstop glow and vibrant colors turned ordinary
-buildings and surfaces into 24/7 billboards for businesses, large and small, that wanted
-to convey a sense of always being open. The first examples of neon in the United States
-debuted in Los Angeles, where the Packard Motor Car Company commissioned two large
-blue-and-orange <span>Packard</span> signs that literally stopped
-traffic because they distracted motorists. The lighting also featured heavily at the
-Chicago Century of Progress Exposition in 1933 and at the 1939 World’s Fair in New York.
-At the latter event, a massive neon sign reading <span>Futurama</span>
-lit the way to a General Motors exhibition that heralded “The World of Tomorrow.”
-</p>
-<figure><picture><img alt="" data-srcset="https://cdn.theatlantic.com/assets/media/img/posts/2019/04/AP_8912060228/cbd32b0e1.jpg"></picture><figcaption>
-Workers remove a hammer and sickle from a neon sign that reads “Glory to Communism,”
-visible on the roof of the Communist-run electricity-board headquarters in
-Czechoslovakia in 1989. (AP)
-</figcaption></figure><p>
-De Miranda points out that businesses weren’t alone in embracing neon’s ability to
-spread messages effectively. By the middle of the century, the lighting was being
-adopted for more political purposes. “In the 1960s, the Soviets deployed a vast
-‘neonization’ of the Eastern bloc capitals to emulate capitalist metropolises,” de
-Miranda writes. “Because consumer shops were rare in the Polish capital [of Warsaw],
-they did not hesitate to illuminate the façades of public buildings.” In other words, as
-opposed to the sole use of the more obvious forms of propaganda via posters or slogans,
-the mass introduction of neon lighting was a way of getting citizens of Communist cities
-to see their surroundings with the pizzazz and nighttime glamour of major Western
-capitals.
-</p></section><section id="article-section-4"><p>
-Neon, around this time, began to be phased out, thanks to cheaper and less
-labor-intensive alternatives. In addition, the global economic downturn of the 1970s
-yielded a landscape in which older, flickering neon signs, which perhaps their owners
-couldn’t afford to fix or replace, came to look like symbols of decline. Where such
-signs were once sophisticated and novel, they now seemed dated and even seedy.
-</p>
-<section><h2>
-Cities are changing fast. Keep up with the <b>CityLab Daily</b> newsletter.
-</h2>
-<label for="promo-email-input-email">The best way to follow issues you
-care about.</label></section><p>
-De Miranda understands this evolution by zooming out and looking at the 1900s as the
-“neon century.” The author draws a parallel between the physical form of neon lights,
-which again are essentially containers for electrified gases, and that of a glass
-capsule—suggesting they are a kind of message in a bottle from a time before the First
-World War. “Since then, [neon lights] have witnessed all the transformations that have
-created the world we live in,” de Miranda writes. “Today, they sometimes seem to
-maintain a hybrid status, somewhere between junkyards and museums, not unlike European
-capitals themselves.”
-</p>
-<figure><picture><img alt="" data-srcset="https://cdn.theatlantic.com/assets/media/img/posts/2019/04/AP_945361213236/888fdd750.jpg"></picture><figcaption>
-Martin Wartman, a student at Northern Kentucky University, works on a neon sign at
-the Neonworks of Cincinnati workshop connected to the American Sign Museum, in 2016.
-(John Minchillo / AP)
-</figcaption></figure><p>
-Another mark of neon’s hybridity: Its obsolescence started just as some contemporary
-artists began using the lights in their sculptures. Bruce Nauman’s 1968 work <em><a href="https://www.stedelijk.nl/en/collection/1097-bruce-nauman-my-name-as-though-it-were-written-on-the-surface-of-the-moon" target="_blank">My
-Name as Though It Were Written on the Surface of the Moon</a></em> poked fun at
-the space race—another symbol of 20th-century technological innovation whose moment has
-passed. The piece uses blue “neon” letters (mercury, actually) to spell out the name
-“bruce” in lowercase cursive, with each character repeated several times as if to convey
-a person speaking slowly in outer space. The British artist Tracey Emin has made <a href="https://www.artsy.net/collection/tracey-emin-neon-sculptures-and-prints" target="_blank">sculptures</a>
-that resemble neon Valentine’s Day candies: They read as garish and sentimental
-confections with pink, heart-shaped frames that surround blue text fragments. Drawing on
-the nostalgia-inducing quality of neon, the sculptures’ messages are redolent of
-old-fashioned movie dialogue, with titles such as “You Loved Me Like a Distant Star” and
-“The Kiss Was Beautiful.”
-</p>
-<p>
-Seeing neon lighting tamed in the context of a gallery display fits comfortably with de
-Miranda’s notion that neon technology is like a time capsule from another age. In
-museums, works of neon art and design coexist with objects that were ahead of their own
-time in years past—a poignant fate for a technology that made its name advertising “The
-World of Tomorrow.” Yet today neon is also experiencing a kind of craft revival. The
-fact that it can’t be mass-produced has made its fabrication something akin to a
-cherished artisanal technique. Bars and restaurants hire firms such as Let There Be Neon
-in Manhattan, or <a href="https://www.instagram.com/theneonqueen/" target="_blank">the L.A.-based master
-neon artist Lisa Schulte</a>, to create custom signs and works of art. Neon’s story
-even continues to glow from inside museums such as California’s <a href="https://www.neonmona.org/" target="_blank">Museum of Neon Art</a> and the Neon Museum in Las
-Vegas. If it can still be a vital medium for artists and designers working today,
-“neonness” need not only be trapped in the past. It might also capture the mysterious
-glow of the near future—just as it did a century ago.
-</p>
-<p><em>This article originally appeared on <a href="https://www.theatlantic.com/entertainment/archive/2019/04/being-and-neonness-neon-lights-symbol-20th-century/588184/" target="_blank">The
-Atlantic</a>.</em></p></section><section data-include="css:https://cdn.citylab.com/static/a/frontend/dist/citylab/css/components/author-article.cf4e8e0b143f.css"><h4>
-About the Author
-</h4>
-<div itemprop="author">
-<h5 itemprop="name"><a href="https://www.citylab.com/authors/sarah-archer/" target="_blank">Sarah Archer</a></h5>
-<p itemprop="description"><a href="https://www.citylab.com/authors/sarah-archer/" data-omni-click="inherit" target="_blank">Sarah Archer</a> is the author of <em>The
-Midcentury Kitchen</em>.
-</p>
-</div></section></article></DIV></article>
1369
resources/tests/readability/clean-links/expected.html
Normal file
File diff suppressed because it is too large
1863
resources/tests/readability/clean-links/source.html
Normal file
File diff suppressed because it is too large
@@ -502,18 +502,22 @@ impl FullTextParser {
         let node_vec = Util::evaluate_xpath(context, xpath, false)?;
         for mut node in node_vec {
             if let Some(url) = node.get_attribute(attribute) {
+                let trimmed_url = url.trim();
                 let is_relative_url = url::Url::parse(&url)
                     .err()
                     .map(|err| err == url::ParseError::RelativeUrlWithoutBase)
                     .unwrap_or(false);

-                if is_relative_url {
-                    let completed_url = article_url.join(&url)?;
+                let completed_url = if is_relative_url {
+                    article_url.join(trimmed_url)?
+                } else {
+                    Url::parse(trimmed_url)?
+                };
+
                 node.set_attribute(attribute, completed_url.as_str())
                     .map_err(|_| FullTextParserError::Scrape)?;
-                }
             }
         }
         Ok(())
     }
@@ -867,7 +871,7 @@ impl FullTextParser {
             Util::clean_conditionally(&mut root, "ul");
             Util::clean_conditionally(&mut root, "div");

-            Self::clean_classes(&mut root)?;
+            Self::clean_attributes(&mut root)?;
             Self::simplify_nested_elements(&mut root)?;
         }
@@ -895,7 +899,7 @@ impl FullTextParser {
         }
     }

-    fn clean_classes(root: &mut Node) -> Result<(), FullTextParserError> {
+    fn clean_attributes(root: &mut Node) -> Result<(), FullTextParserError> {
         let mut node_iter = Some(root.clone());

         while let Some(mut node) = node_iter {
@@ -904,6 +908,11 @@ impl FullTextParser {
                 FullTextParserError::Xml
             })?;

+            node.remove_attribute("align").map_err(|e| {
+                log::error!("{e}");
+                FullTextParserError::Xml
+            })?;
+
             node.remove_attribute(constants::SCORE_ATTR).map_err(|e| {
                 log::error!("{e}");
                 FullTextParserError::Xml
@@ -915,6 +924,10 @@ impl FullTextParser {
                 FullTextParserError::Xml
             })?;

+            if node.get_name().to_uppercase() == "FONT" {
+                node.set_name("span").unwrap();
+            }
+
             node_iter = Util::next_node(&node, false);
         }
         Ok(())
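Reading the `clean_attributes` hunks together, the renamed pass seems to strip presentational attributes and rewrite `<font>` elements to `<span>` while it walks the tree. The following is only a minimal, std-only sketch of that idea, not the parser's actual code: `ToyNode`, `ToyNode::new`, and `STRIPPED_ATTRS` are hypothetical names, the `"class"` entry is inferred from the old function name `clean_classes`, and the sketch recurses over children instead of iterating with `Util::next_node` as the diff does.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an XML node; NOT the node type used in the diff.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode {
    name: String,
    attributes: BTreeMap<String, String>,
    children: Vec<ToyNode>,
}

impl ToyNode {
    // Convenience constructor for this sketch only (hypothetical helper).
    fn new(name: &str, attrs: &[(&str, &str)], children: Vec<ToyNode>) -> Self {
        ToyNode {
            name: name.to_string(),
            attributes: attrs
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            children,
        }
    }
}

// Assumed list: "align" appears in the diff; "class" is inferred from the
// old function name `clean_classes` and may not match the real code.
const STRIPPED_ATTRS: [&str; 2] = ["class", "align"];

// Remove the listed attributes and rename FONT elements to span,
// then apply the same cleanup to every descendant.
fn clean_attributes(node: &mut ToyNode) {
    for attr in STRIPPED_ATTRS.iter() {
        node.attributes.remove(*attr);
    }
    // Same case-insensitive comparison as the hunk: uppercase, then compare.
    if node.name.to_uppercase() == "FONT" {
        node.name = "span".to_string();
    }
    for child in node.children.iter_mut() {
        clean_attributes(child);
    }
}
```

Calling `clean_attributes` on a root built with `ToyNode::new("div", &[("class", "lead")], vec![...])` mutates the tree in place; attributes outside the assumed list, such as `href`, are left untouched.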
@@ -126,6 +126,11 @@ async fn citylab_1() {
     run_test("citylab-1").await
 }

+#[tokio::test]
+async fn clean_links() {
+    run_test("clean-links").await
+}
+
 #[tokio::test]
 async fn webmd_1() {
     run_test("webmd-1").await