| Class | DataHarvester::Ad |
| In: | scripts/ad.rb |
| Parent: | Object |
Represents a single ad, which may contain multiple items for sale.
Most of the work of managing the parsed data, including deciding when to split an ad into multiple items, is contained here.
| AD_FIELDS | = | ['areacode', 'phone', 'email'] | These fields apply to the ad itself, or to all items within the ad. If there are duplicates of these, the duplicates should be tossed, rather than creating a duplicate ad. |
| atts | [R] | |
| cat | [R] | |
| fielded_atts | [R] | |
| id | [R] | |
| text | [R] | |
An ad is made up of:
id - the unique id of the ad.
cat - the category of the ad.
text - text of the ad.
fielded_atts - Hash of name/value pairs of attributes specified for this ad. This is used to compare against the extraction results.
atts - Hash of attributes extracted from the text. The position in the original text is the key; the value is a (name, value) tuple. If there is no name, the first part of the tuple will be nil.
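As a rough illustration of the shapes described above, the five parts of an ad might look like the following. This is only a sketch: the sample values, the use of a plain Hash in place of an Ad instance, and the use of character offsets as positions are all assumptions made for illustration, not taken from scripts/ad.rb.

```ruby
# Hypothetical sample data; only the field names come from the documentation.
ad = {
  id: 'ad-001',
  cat: 'apartments',
  text: '2 bed apt, call 555-1234',
  # Attributes supplied with the ad, used to check the extraction results.
  fielded_atts: { 'bed' => '2', 'phone' => '555-1234' },
  # Extracted attributes, keyed by position in the text (assumed here to be
  # a character offset). Each value is a [name, value] pair; the name is nil
  # when the extractor could not assign one.
  atts: {
    0  => ['bed', '2'],
    11 => [nil, 'call'],
    16 => ['phone', '555-1234']
  }
}

# Pick out the extracted attributes that have no name.
unnamed = ad[:atts].select { |_position, (name, _value)| name.nil? }
```

Keying atts by position keeps two extractions of the same name distinct when they occur at different places in the text.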
Returns an array of hashes for the attributes in this ad. The first item in the array will represent the data common to all items, like the phone number. The rest will apply to specific items within the ad. The hashes themselves will be name/value pairs.
Underscores have special meaning. Fields whose names are identical before an underscore but differ after it are treated as the same field with different values. By convention, the suffixes _1, _2, etc. are used. For example, ‘bed_1’ and ‘bed_2’ should cause the ad to split into 2 items that are identical except for their values for ‘bed’.
Only one field per ad may use this underscore convention, so if you have ‘bed_1’ and ‘bath_1’, you will only get one of those fields back.
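The splitting behaviour described above can be sketched roughly as follows. The helper name split_items is hypothetical and this is not the actual DataHarvester::Ad implementation. It also makes three assumptions the documentation does not state: AD_FIELDS go into the first (common) hash, other unsuffixed fields are copied into every item, and when several fields use the underscore convention, the first one encountered is the one kept.

```ruby
AD_FIELDS = ['areacode', 'phone', 'email'].freeze

# Hypothetical helper: turns a flat name => value hash into
# [common_hash, item_hash, item_hash, ...].
def split_items(fields)
  common   = {}  # ad-level fields (AD_FIELDS)
  shared   = {}  # unsuffixed fields copied into every item (assumption)
  variants = {}  # base name => values of its suffixed fields, in order seen

  fields.each do |name, value|
    base, suffix = name.split('_', 2)
    if suffix
      (variants[base] ||= []) << value
    elsif AD_FIELDS.include?(name)
      common[name] = value
    else
      shared[name] = value
    end
  end

  # Only one underscore field is supported; keep the first one seen
  # (assumption) and drop the others.
  split_name, values = variants.first
  items = if split_name
            values.map { |v| shared.merge(split_name => v) }
          else
            [shared]
          end
  [common] + items
end

result = split_items({
  'phone' => '555-1234', 'price' => '900',
  'bed_1' => '1', 'bed_2' => '2', 'bath_1' => '1'
})
```

Here result holds three hashes: the common data with ‘phone’, then two items that share ‘price’ and differ only in ‘bed’. The ‘bath_1’ field is dropped, which mirrors the one-field limit.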