Hello, for some reason I just cannot get a grip on extracting data from a web page. I received some help in the past, but that page does not work anymore. I have found a new one but cannot seem to get the data.
The new site is
and the data I am trying to extract is the
Year: into a field named newyr
Make: into a field named newmake
Model: into a field named newmodel
and
Types of Fuel: into a field named newfuel
Some sample VINs, which I have in a field named vin, are:
1GNKVHKD9GJ212190
1GKS2GKC2GR224013
1GYEC63TX3R174318
1GNFK16T23R316468
1GCGTCE30G1240485
Any help would be appreciated. I seem to have a block on this type of stuff.
The code I have so far (not much) is:
Local theSource
Loop
LoadURL theSource, “trustvin.com”
newyr= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newmake= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newmodel= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
newfuel= tagdata(theSource,DON’T KNOW WHAT GOES HERE")
DownRecord
Until info(“Stopped”)
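What you are looking for is the text that comes immediately before and after the text you want to extract. I would use a couple of nested tagdata( calls for each item: one to narrow the page down to the section that holds the value, and another to pick the value out of that section.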
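As a rough illustration of the nesting idea (written in Python rather than Panorama, and using a made-up page fragment, since the site's real markup is not shown in this thread), the helper below mimics what a tagdata(-style function does: it returns the text between a header and a trailer. The helper name `between`, the HTML snippet, and the tag strings are all assumptions; the actual page will need its own header and trailer text.

```python
def between(text, header, trailer):
    """Return the text found between the first header and the next trailer.

    Returns "" when either marker is missing, so a failed match is easy to spot.
    """
    start = text.find(header)
    if start == -1:
        return ""
    start += len(header)
    end = text.find(trailer, start)
    if end == -1:
        return ""
    return text[start:end]


# Hypothetical page fragment; the real site's markup will differ.
page = (
    '<table class="specs">'
    "<tr><td>Year:</td><td>2016</td></tr>"
    "<tr><td>Make:</td><td>Chevrolet</td></tr>"
    "</table>"
)

# First call narrows the page to the rest of the row that follows the label,
# second call pulls the value out of that row.
year_row = between(page, "<td>Year:</td>", "</tr>")
newyr = between(year_row, "<td>", "</td>")

make_row = between(page, "<td>Make:</td>", "</tr>")
newmake = between(make_row, "<td>", "</td>")

print(newyr, newmake)
```

The point of the two steps is that the label ("Year:") is unique on the page while the tags around the value usually are not, so the label anchors the search and the generic tags finish it.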
Unfortunately I still cannot get this to work. Panorama 6 says that the following code is OK, and it seems to work its way down through the list of VINs, but it does not return any data from the web site. I have looked at chapter 3 of the Programming Techniques and at the weatherpage example, and still can’t see what is wrong. Any suggestions? (I put in messages to see what was being returned, as you will see.)
What you didn’t do, unfortunately, is check to see if any data was coming back from the loadurl statement. Since it is likely to be a lot of data, I would use the displaydata statement, like this:
loadurl theSource, ...
displaydata theSource
Had you done that, you would have immediately seen that it wasn’t the text you were expecting (probably a 404 error message). That would have narrowed the problem down to the loadurl line, rather than the entire procedure, making the problem much easier to find.
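The same check can also be made mechanical. A small sketch, in Python for illustration only: before extracting anything, confirm that the fetched text actually contains the labels you plan to search for. The function name `missing_labels` and both sample strings are assumptions, not real responses from the site.

```python
def missing_labels(source, labels):
    """Return the labels that do not appear anywhere in the fetched text."""
    return [label for label in labels if label not in source]


labels = ["Year:", "Make:", "Model:", "Types of Fuel:"]

# Hypothetical responses; a real fetch would supply this text.
good_page = "Year: 2016 Make: Chevrolet Model: Traverse Types of Fuel: Gasoline"
error_page = "404 Not Found"

print(missing_labels(good_page, labels))   # labels missing from the good page
print(missing_labels(error_page, labels))  # labels missing from the error page
```

If every label is missing, the problem is in the fetch (the URL, most likely), not in the extraction formulas.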
The secret to debugging is almost always figuring out how to break the problem down into smaller sections. When you get to a small enough section, the answer is often obvious.
Wow Gary, you have a good eye to spot that out of all that code!
Well, not so much a sharp eye as logical troubleshooting. Here is what I did to quickly get to the problem.
1. Confirmed the URL was valid in my browser.
2. Inserted a displaydata after the loadurl statement to see what was returned.
3. When the displaydata was never executed, checked the URL formula with:
message “https://trustvin.com/"+vin+"-VIN”
4. Looked at why the inner quotes and the text between them were showing up in this output.
5. Noticed the opening smart quote at the start of the URL, the plain quotes in the middle, and the closing smart quote at the end.
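Smart quotes are hard to see by eye, so it can help to have a program point them out. The sketch below is Python, not Panorama, and is only meant to show the idea: scan a line of code for curly quote characters and report where they are. The function names are made up for this example; the sample line is the message statement from step 3, written with Unicode escapes so the two kinds of quotes are explicit.

```python
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_smart_quotes(line):
    """Return (position, character) pairs for every smart quote in the line."""
    return [(i, ch) for i, ch in enumerate(line) if ch in SMART_QUOTES]


def straighten(line):
    """Replace smart quotes with their plain ASCII equivalents."""
    return "".join(SMART_QUOTES.get(ch, ch) for ch in line)


# The message line from step 3: smart quotes at both ends, plain quotes inside.
line = 'message \u201chttps://trustvin.com/"+vin+"-VIN\u201d'

print(find_smart_quotes(line))
print(straighten(line))
```

Smart quotes usually sneak in when code is pasted from a word processor, an email, or a web page that converts quotes automatically, so a check like this is most useful right after pasting.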