Class DataHarvester::Ad
In: scripts/ad.rb
Parent: Object

Represents a single ad, which may contain multiple items for sale.

Most of the work to manage the parsed data, including deciding when to split an ad into multiple items, is contained here.

Methods

Constants

AD_FIELDS = ['areacode', 'phone', 'email']   These fields apply to the ad itself, or to all items within the ad. If any of these fields appears more than once, the duplicates should be discarded rather than used to create a duplicate ad.

Attributes

atts  [R] 
cat  [R] 
fielded_atts  [R] 
id  [R] 
text  [R] 

Public Class methods

An ad is made up of:

 id - the unique id of the ad.
 cat - the category of the ad.
 text - the text of the ad.
 fielded_atts - hash of name/value pairs for the attributes
    specified for this ad.  This is used to compare
    against the extraction results.

 atts - hash of attributes extracted from the text.  The
    position in the original text is the key, and a
    [name, value] tuple is the value.  If there is no
    name, the first element of the tuple will be nil.
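
The layout described above can be pictured with a small sketch. AdSketch is a hypothetical stand-in and not the real class; the sample text, category and values are invented for illustration, and the actual constructor signature is not shown in this documentation.

```ruby
# Hypothetical stand-in for DataHarvester::Ad; it models only the
# five readable attributes, not any of the class's behaviour.
AdSketch = Struct.new(:id, :cat, :text, :fielded_atts, :atts)

text = 'Two apartments, 2 bed, call 555-1234'

ad = AdSketch.new(
  42,                                       # id: unique id of the ad
  'rentals',                                # cat: category (made-up value)
  text,                                     # text: raw text of the ad
  { 'bed' => '2', 'phone' => '555-1234' },  # fielded_atts: name/value pairs
  {
    # atts: position in the text => [name, value]
    text.index('2 bed')    => ['bed', '2'],
    # no name is known for this value, so the first element is nil
    text.index('555-1234') => [nil, '555-1234']
  }
)

ad.atts.each do |position, (name, value)|
  puts "#{position}: #{name.inspect} => #{value.inspect}"
end
```

Keying atts by position keeps each extracted value tied to where it was found in the text, even when no attribute name could be assigned to it.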

Public Instance methods

Applies the given rule, storing the results

Returns an array of hashes for the attributes in this ad. The first item in the array will represent the data common to all items, like the phone number. The rest will apply to specific items within the ad. The hashes themselves will be name/value pairs.

Underscores have special meaning. Any fields with the same name before an underscore, but different text after it, will be treated as the same field with different values. By convention, _1, _2, etc. are used. For example, ‘bed_1’ and ‘bed_2’ should cause the ad to split into 2 items that are identical except that they will have different values for ‘bed’.

At most one field using this underscore convention is supported, so if you have both ‘bed_1’ and ‘bath_1’, you will get only one of those fields back.
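
The split described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: split_items is a hypothetical helper, and two behaviours are assumptions not confirmed here, namely that fields without an underscore are copied into every item, and that when several underscore fields are present the first one encountered is the one kept.

```ruby
# Same values as the AD_FIELDS constant documented above.
AD_FIELDS = ['areacode', 'phone', 'email']

# Hypothetical helper: takes a hash of field name => value and returns
# [common_hash, item_hash, item_hash, ...].
def split_items(fields)
  common   = {}  # ad-level data, such as the phone number
  shared   = {}  # item-level data copied into every item (assumption)
  variants = {}  # base name => values collected from base_1, base_2, ...

  fields.each do |name, value|
    base, suffix = name.split('_', 2)
    if AD_FIELDS.include?(name)
      common[name] = value
    elsif suffix
      (variants[base] ||= []) << value
    else
      shared[name] = value
    end
  end

  # Only one underscore field is honoured; this sketch keeps the first
  # one it saw (assumption) and silently drops the others.
  base, values = variants.first
  items = base ? values.map { |v| shared.merge(base => v) } : [shared]

  [common] + items
end

result = split_items(
  'phone' => '555-1234',
  'price' => '900',
  'bed_1' => '2',
  'bed_2' => '3'
)
result.each { |hash| p hash }
```

With this input the sketch returns three hashes: one holding the phone number, then two items that share the price and differ only in their value for ‘bed’.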

Returns a String with the matched portions marked with <<match>>

Returns a printable string for the ad, for troubleshooting purposes.
