forked from brighton36/libcraigscrape
-
Notifications
You must be signed in to change notification settings - Fork 1
/
TODO.txt
46 lines (39 loc) · 2.24 KB
/
TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
* Address that system_post spec
------------------------------------------
Possible features?:
* FEATURE: Rather then 'search/sss?query=artist+needed', maybe use CraigScrape::search_for('artist needed')
* FEATURE: Support for post submissions?
* FEATURE: Get the YAML template features working in the chris_report.yml
* should be pretty easy - I think we just add to the schema
* Might have to do this yourself with a search_templates section and an search ':include' thing
* Should we have 'defined searches' in the yaml and then apply these defitions to a report?
* It would be nice to have one search definition for 'art I want' and then allow these
* to be referenced in separate reports for 'art in florida', 'art in nyc', 'art everywhere esle' ..
* The other thing I'd like to do though is have "New art watch/Classic Artwatch" type reports that look
* separate - but that dont cause the crawl to happen twice. Just once with two reports per crawl..
* FEATURE: Stats in the email: bytes transferred, generation time, urls scrapped, posts scrapped
* FEATURE: We might want to parse the sub-locations in the geo-location code... (so for miami: brw/mia/wpb)
------------------------------------------
# Quickie craigscrape examples/tests:
$: << './lib'
require 'libcraigscrape'
CraigScrape.new('us/fl/miami', 'us/fl/keys').posts_since(Time.now - 3600*48, 'search/sss?query=z06')
CraigScrape.new('us/fl/miami').posts('search/sss?query=z06')
i=0; CraigScrape.new('us/fl/miami', 'us/fl/keys').each_post('rea', 'search/sss?query=rack'){|p| break if i>50; puts p.inspect ;i += 1;}
CraigScrape.new('us/fl/miami', 'us/fl/keys').each_listing('rea', 'search/sss?query=rack'){ |listing| puts listing.inspect }
CraigScrape.new('us/fl/miami', 'us/fl/keys').listings('rea', 'search/sss?query=rack')
-----------------------------------------
# Quickie way to analyze an email:
$: << './lib'
require 'libcraigscrape'
email=<<EOD
email contents here
EOD
r=/regex-here/
email.tr('\n', '').scan( /http\:\/\/[^ \"]+/).reject{|l| !/\.html/.match l}.each{ |l|
puts "\nURL: %s" % l
c = CraigScrape::Posting.new l
puts " * Label : %s" % r.match(c.label).to_a[0]
puts " * Contents: %s" % r.match(c.contents_as_plain).to_a[0]
}
-----------------------------------------