Memoize Articles - Not Printing #61

theshapguy · 2014-07-07T19:41:19Z

Articles not being parsed from Memoize?

import newspaper

# Build the source with article memoization enabled
cnn_paper = newspaper.build('http://cnn.com', memoize_articles=True)

for article in cnn_paper.articles:
    print(article.url)

It runs the first time, when nothing is cached yet, and prints all the results. The second time, nothing is printed; the output is blank.

codelucas · 2014-07-08T11:09:32Z

"It runs for the first time as it is not cached and prints all the results, The second time nothing is printed"

That is expected behavior. On the second run, the articles from the first run are already cached, so they are not returned again. This follows from newspaper's typical use case: you usually don't want duplicate articles. With memoization on, you can extract from CNN freely without worrying about whether you extracted the same article twice in a row.
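To illustrate why the second run comes back empty, here is a minimal sketch of the idea in plain Python. It is a simplified model, not newspaper's actual implementation: the `filter_seen` helper, the in-memory `set` used as the cache, and the example URLs are all assumptions made for illustration (the real cache persists between runs of the script).

```python
def filter_seen(urls, cache):
    """Return only the URLs not already in `cache`, then record them as seen.

    Hypothetical helper: a simplified model of memoization, not newspaper's code.
    """
    fresh = [u for u in urls if u not in cache]
    cache.update(fresh)
    return fresh


# Hypothetical cache and URLs; the real library keeps its cache between runs.
cache = set()
found = ["http://example.com/story-1", "http://example.com/story-2"]

first_run = filter_seen(found, cache)   # nothing cached yet, so both URLs come back
second_run = filter_seen(found, cache)  # both URLs are cached now, so nothing comes back

print(first_run)
print(second_run)
```

The second call returns an empty list for the same input, which matches the blank output described in the question.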

If you don't like this behavior, you have two options:
1.) Turn memoization off. Nothing is cached, but you get all the articles every time.
2.) Go a step lower: instead of using newspaper.build, work with Article objects directly and choose which articles you want to cache and keep.
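The two options could look roughly like the sketch below. The wrapper function names are hypothetical and exist only to keep the two options apart; the imports are placed inside the functions so the snippet can be loaded without touching the network. It assumes the newspaper package is installed and that the URL you pass in is reachable.

```python
def build_without_memoization(url):
    """Option 1: build the source with memoization off (hypothetical wrapper name)."""
    import newspaper  # lazy import: only needed when the function is called
    # With memoize_articles=False, previously seen articles are not filtered out.
    return newspaper.build(url, memoize_articles=False)


def fetch_single_article(url):
    """Option 2: skip newspaper.build and handle one Article yourself (hypothetical wrapper name)."""
    from newspaper import Article
    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # extract title, text, and other fields
    return article


if __name__ == "__main__":
    # Option 1: every run prints every article URL that was found.
    paper = build_without_memoization("http://cnn.com")
    for article in paper.articles:
        print(article.url)
```

With option 2, deciding which article URLs to fetch, and whether to remember them between runs, is left entirely to your own code.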

Reference:
http://newspaper.readthedocs.org/en/latest/user_guide/quickstart.html#article-caching

theshapguy closed this as completed Jul 9, 2014
