Help needed again with Panorama 6 and Web Page


#1

Hello, for some reason I just cannot get a grip on extracting data from a web page. I received some help in the past, but that page does not work anymore. I have found a new one but cannot seem to get the data.

The new site is

and the data I am trying to extract is the
Year: into a field named newyr
Make: into a field named newmake
Model: into a field named newmodel
and
Types of Fuel: into a field named newfuel

Some sample VINs, which I have in a field named vin, are:
1GNKVHKD9GJ212190
1GKS2GKC2GR224013
1GYEC63TX3R174318
1GNFK16T23R316468
1GCGTCE30G1240485

any help would be appreciated. I seem to have a block on this type of stuff.

the code I have so far is (not much)

Local theSource
Loop
LoadURL theSource, “https://trustvin.com/"+vin+"-VIN
newyr= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newmake= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newmodel= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newfuel= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
DownRecord
Until info(“Stopped”)

Vic


#2

Here is the relevant part of the page source.

                            <td>VIN:</td>
                            <td>1GYEC63TX3R174318</td>
                        </tr>
                        <tr>
                            <td>
                                Year:
                            </td>
                            <td>
                                2003                                </td>
                        </tr>
                        <tr>
                            <td>
                                Make:
                            </td>
                            <td>
                                Cadillac                                </td>
                        </tr>
                        <tr>
                            <td>
                                Model:
                            </td>
                            <td>
                                Escalade                                </td>
                        </tr>

What you are looking for is the text that comes immediately before and after the text you want to extract. I would use a pair of nested tagdata( calls for each item. For example:

newyr=strip(tagdata(tagdata(theSource,"Year:","</tr>",1),"<td>","</td>",1))

There are a lot of white space characters preceding and following the text you want. The strip( function gets rid of them.
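For anyone who wants to try the nested-extraction idea outside Panorama, here is a minimal Python sketch of the same logic. The tagdata function below is only a rough stand-in I wrote for illustration (it is not Panorama's implementation, and its handling of the item number is an assumption), the field helper is a hypothetical name, and the page text is a trimmed copy of the source shown above.

```python
def tagdata(text, start, end, n=1):
    """Rough stand-in for Panorama's tagdata(: return the text between
    the n-th start/end pair, or "" if there is no such pair."""
    if n < 1:
        return ""
    pos = 0
    for _ in range(n):
        begin = text.find(start, pos)
        if begin == -1:
            return ""
        begin += len(start)
        stop = text.find(end, begin)
        if stop == -1:
            return ""
        pos = stop + len(end)
    return text[begin:stop]


# Trimmed copy of the page source quoted above.
page = """
<tr>
    <td>
        Year:
    </td>
    <td>
        2003    </td>
</tr>
<tr>
    <td>
        Make:
    </td>
    <td>
        Cadillac    </td>
</tr>
"""


def field(page, label):
    # Outer call: everything from the label to the end of its table row.
    row = tagdata(page, label, "</tr>")
    # Inner call: the first <td>...</td> cell inside that row, which is the value.
    # str.strip() plays the role of Panorama's strip( here.
    return tagdata(row, "<td>", "</td>").strip()


print(field(page, "Year:"))  # 2003
print(field(page, "Make:"))  # Cadillac
```

The outer call narrows the search to one table row, so the inner call cannot pick up a cell from some other row by accident.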

The type of fuel is much farther down in the page. That section looks like this.

                        <tr>
                <td>Types of Fuel:</td>
                <td>Regular Gasoline</td>
            </tr>

For that, you might use

newfuel=tagdata(tagdata(theSource,"Types of Fuel:","</tr>",1),"<td>","</td>",1)

I say “might” because I don’t know what that section will look like if the vehicle is multi-fuel.


#3

Thanks Dave,

Unfortunately I still cannot get this to work. Panorama 6 says that the following code is OK, and it seems to work its way down through the list of VINs, but it does not return any data from the web site. I have looked at chapter 3 of the Programming Techniques and at the weatherpage example, and still can’t see what is wrong. Any suggestions? (I put in messages to see what was being returned, as you will see.)

Thanks,
Vic

Local theSource
Loop
LoadURL theSource, “https://trustvin.com/"+vin+"-VIN”
newyr=strip(tagdata(tagdata(theSource,"Year:","</tr>",1),"<td>","</td>",1))
message "yr = " +newyr
newmake=strip(tagdata(tagdata(theSource,"Make:","</tr>",1),"<td>","</td>",1))
message "make = " +newmake
newmodel=strip(tagdata(tagdata(theSource,"Model:","</tr>",1),"<td>","</td>",1))
message "model = " +newmodel
newfuel=tagdata(tagdata(theSource,"Types of Fuel:","</tr>",1),"<td>","</td>",1)
message "fuel = " +newfuel
DownRecord
Until info(“Stopped”)

#4

Vic, you have a mixture of smart quotes and plain quotes in the LoadURL line. Make them all plain quotes.
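Smart quotes are hard to see in most fonts, so it can help to let a script find them. This is a small Python sketch, not anything built into Panorama; the function names find_smart_quotes and plain_quotes are made up for illustration. It reports where the curly quotes sit in a line of code and shows the line with plain quotes substituted.

```python
# Curly quote characters and their plain equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
}


def find_smart_quotes(line):
    """Return (position, character) for every smart quote in the line."""
    return [(i, ch) for i, ch in enumerate(line) if ch in SMART_QUOTES]


def plain_quotes(line):
    """Return the line with every smart quote replaced by a plain one."""
    return line.translate(str.maketrans(SMART_QUOTES))


# The LoadURL line from the procedure, with its smart quotes written as escapes.
line = 'LoadURL theSource, \u201chttps://trustvin.com/"+vin+"-VIN\u201d'

for position, ch in find_smart_quotes(line):
    print(position, repr(ch))

print(plain_quotes(line))
```

Note that plain_quotes replaces every curly quote it finds, including apostrophes inside text you meant to keep, so it is best used to spot the problem and then fix the line by hand.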


#5

THANK YOU, THANK YOU, THANK YOU! That was driving me nuts. I can’t tell you how much time you and Dave have saved me. I really do appreciate it.

Vic


#6

What you didn’t do, unfortunately, is check to see if any data was coming back from the loadurl statement. Since it is likely to be a lot of data, I would use the displaydata statement, like this:

loadurl theSource, ...
displaydata theSource

Had you done that, you would have immediately seen that it wasn’t the text you were expecting (probably a 404 error message). That would have narrowed the problem down to the loadurl line, rather than the entire procedure, making the problem much easier to find.

The secret to debugging is almost always figuring how to break the problem down into smaller sections. When you get to a small enough section, the answer is often obvious.
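The same "check what came back first" idea can be sketched in Python for anyone following along outside Panorama. This is only an illustration under my own assumptions: check_page is a hypothetical helper, and the label list is simply the four labels Vic is trying to extract. In Panorama itself, displaydata right after loadurl does the job.

```python
def check_page(page, labels=("Year:", "Make:", "Model:", "Types of Fuel:")):
    """Say whether the downloaded text looks like the page we expect,
    before any extraction is attempted."""
    if not page.strip():
        return "nothing came back"
    # Any label that is absent means the extraction for it cannot succeed.
    missing = [label for label in labels if label not in page]
    if missing:
        return "missing: " + ", ".join(missing)
    return "ok"


print(check_page(""))
print(check_page("<h1>404 Not Found</h1>"))
print(check_page("Year: 2003 Make: Cadillac Model: Escalade Types of Fuel: Regular Gasoline"))
```

If the check fails, the problem is in fetching the page, not in the tagdata( formulas, which cuts the search down to a single line.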

Wow Gary, you have a good eye to spot that out of all that code!


#7

Well, not so much a sharp eye as logical troubleshooting. Here is what I did to quickly get to the problem.

1. Confirmed the URL was valid in my browser.
2. Inserted a displaydata after the loadurl statement to see what was returned.
3. When the displaydata was never executed, checked the URL formula with:

message “https://trustvin.com/"+vin+"-VIN”

4. Worked out why the inner plain quotes, and the text between them, showed up literally in the output.
5. Noticed the opening smart quote at the start of the URL, the plain quotes in the middle, and the closing smart quote at the end.

Bingo! :fireworks: