How to parse an HTML page to import a table daily?

1C:Enterprise platform integration capabilities and techniques

#1
Active user
Rating: 5
Joined: Sep 27, 2011
Company:

Hello!
I need to load a page from a site, parse it, and extract the data from a table.
Currently I am using a form field of the HTMLDocument type, and I have made a simplified HTML page:

Code
<html>
<head></head>
<body>
<div>
<table id="dataTable">
<tbody>
<tr><td>10</td><td>72</td><td>13</td></tr>
<tr><td>12</td><td>23</td><td>36</td></tr>
<tr><td>13</td><td>22</td><td>32</td></tr>
</tbody>
</table>
</div>
</body>
</html>

Here are the questions:
How can I get the table content parsed as a table?
How can I make it work in a scheduled task daily?

 
#2
Active user
Rating: 3
Joined: Sep 16, 2011
Company: individual

Hi!

Here is the script for your form:

Code
&AtClient
Procedure Parse(Command)
   // Start loading the page into the HTML document field.
   Items.HTMLDocument.Document.location.href = "http://domain.com/path/file.html";
   // Flag checked in the DocumentComplete handler so the page is parsed only once.
   ParsingRequired = True;
EndProcedure

&AtClient
Procedure HTMLDocumentDocumentComplete(Item)
   If Items.HTMLDocument.Document.readyState = "complete" And ParsingRequired Then
      // firstChild of the table is <tbody>; its children are the <tr> rows.
      For Each tr In Items.HTMLDocument.Document.all["dataTable"].firstChild.children Do
         For Each td In tr.children Do
            // Show the text of each cell as a user message.
            UserMessage = New UserMessage;
            UserMessage.Text = td.innerText;
            UserMessage.Message();
         EndDo;
      EndDo;
      ParsingRequired = False;
   EndIf;
EndProcedure

Edited: Samuel Harris - Nov 20, 2012 06:30 PM
 
#3
Active user
Rating: 5
Joined: Sep 27, 2011
Company:

Thanks a lot, Samuel!
And how can I execute this operation daily?

 
#4
Active user
Rating: 7
Joined: Sep 26, 2012
Company: individual

I think scheduled jobs can help here.
The 1C:Subsystems Library includes a service subsystem called "Scheduled Jobs".
See section 5.5.9.3 of the documentation.

 
#5
Active user
Rating: 5
Joined: Sep 27, 2011
Company:

Thanks, Ivan. But can a scheduled job open a form with an HTML document field and execute a command?

 
#6
Active user
Rating: 7
Joined: Sep 26, 2012
Company: individual

For the thin client, I would use the ManagedForm method AttachIdleHandler(<ProcedureName>, <Interval>, <Single>).
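
A minimal sketch of how that could look in the form module from post #2. The handler name ReloadPage and the choice of the OnOpen event are assumptions made for illustration; the interval is given in seconds, and the handler only fires while the client application and the form stay open.

```bsl
&AtClient
Procedure OnOpen(Cancel)
   // 86400 seconds = 24 hours; False means the handler repeats
   // instead of firing only once.
   AttachIdleHandler("ReloadPage", 86400, False);
EndProcedure

&AtClient
Procedure ReloadPage()
   // Same steps as the Parse command in post #2: start loading the page
   // and let the DocumentComplete handler do the parsing.
   Items.HTMLDocument.Document.location.href = "http://domain.com/path/file.html";
   ParsingRequired = True;
EndProcedure
```

The handler can be removed with DetachIdleHandler("ReloadPage") when it is no longer needed.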

 
#7
Active user
Rating: 5
Joined: Sep 27, 2011
Company:

So I would have to keep an application running to emulate a scheduled job... Is there a way to use the DOM on the server in a scheduled job?

 
#8
Active user
Rating: 7
Joined: Sep 26, 2012
Company: individual

The HTTPConnection object is available on the server. To use it, you need to know the name of the file you are requesting.

Code
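// Download the page to a temporary file on the server.
inputFile = TempFilesDir() + "input.txt";
Connection = New HTTPConnection("domain.com");
Connection.Get("/path/file.html", inputFile);
// Read the downloaded file into a string.
textFromFile = New TextDocument;
textFromFile.Read(inputFile);
html = textFromFile.GetText();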


The variable "html" will then contain the HTML text, which you can put into an HTMLDocument field.

 
#9
Active user
Rating: 6
Joined: Sep 16, 2011
Company:

And maybe XMLReader could help to parse this text using DOM...
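
A rough sketch of that idea, not tested against a live configuration. It assumes a platform version that provides HTMLReader (XMLReader accepts only well-formed XML, which real HTML pages often are not) and takes the text from the "html" variable of post #8. The function name ParseHTMLTable and the column names are made up for illustration. It collects every <tr> in the document, so a page with several tables would need an extra filter on the "dataTable" element, and depending on how the reader reports element names the tag names may need a different case.

```bsl
// Builds a DOM from HTML text on the server and collects the cells
// of every <tr> into a value table.
Function ParseHTMLTable(html)
   Reader = New HTMLReader;
   Reader.SetString(html);
   Builder = New DOMBuilder;
   Document = Builder.Read(Reader);

   Result = New ValueTable;
   For Each Row In Document.GetElementByTagName("tr") Do
      Cells = Row.GetElementByTagName("td");
      // Add columns on demand so rows of different width still fit.
      While Result.Columns.Count() < Cells.Count() Do
         Result.Columns.Add("Column" + Format(Result.Columns.Count() + 1, "NG=0"));
      EndDo;
      NewRow = Result.Add();
      Index = 0;
      For Each Cell In Cells Do
         NewRow[Index] = Cell.TextContent;
         Index = Index + 1;
      EndDo;
   EndDo;
   Return Result;
EndFunction
```

Because this uses no form and no HTML document field, it could be called from an exported procedure of a server common module, which is what a scheduled job runs.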

 
#10
Active user
Rating: 6
Joined: Sep 16, 2011
Company:

Or HTMLDocumentShell, method GetHTMLDocument...

 