Browser source code: how do I scrape the web browser object?


#1

Does anyone know how to get the source code of the page currently shown in a Web Browser object while browsing a site (e.g., cnn.com)? I’ve read the documentation and don’t see a clear way to do this.


#2

I finally found it in the files under “Panorama X Video Training” in the Help menu, specifically the Web Browser link.


#3

Using the example’s code, I can get the source code for a web page, but it is very slow: it takes several minutes to get 225 KB of source. My cable modem connection tests at 25 Mbps or faster, so the Internet connection is probably not the bottleneck.

The code to get the source of the web page is:

local scriptResult
objectaction "Browser Object name", "script", {document.body.innerHTML;}, scriptResult
displaydata scriptResult, {width=500 height=400 size=18}

If I replace the displaydata command with:

clipboard = scriptResult

the result is instantaneous, as expected.


#4

The displaydata command is quite slow when displaying large amounts of text. I’m not sure how to fix that, but it would be nice to.

If you want to scrape a web site whose address is known in advance, you don’t need the web browser object at all. You can use one of Panorama’s URL functions instead: url(, urltask(, loadurl( or posturl(. The only reason to involve the web browser is if you want to choose the page interactively. (Note that this has nothing to do with the displaydata speed issue.)


#5

I do need to navigate to the page, because it is on a secure site. Thank you for pointing out the alternative functions for “scraping.”