How to parse HTML page to import a table daily?
#1
Active user
Points:: 0
Joined:: Sep 27, 2011

Hello!
I need to load a page from a website, parse it, and extract the data from a table.
Currently I am using a field of the HTMLDocument type on a form, and I have made a simplified HTML page:

Code
<html>
<head></head>
<body>
<div>
<table id="dataTable">
<tbody>
<tr><td>10</td><td>72</td><td>13</td></tr>
<tr><td>12</td><td>23</td><td>36</td></tr>
<tr><td>13</td><td>22</td><td>32</td></tr>
</tbody>
</table>
</div>
</body>
</html>

Here are the questions:
How can I get the table content parsed as a table?
How can I make it work in a scheduled task daily?

#2
Active user
Points:: 32
Joined:: Sep 16, 2011

Hi!

Here is the script for your form:

Code
&AtClient
Procedure Parse(Command)
   // Start loading the page; parsing happens once the document is complete.
   Items.HTMLDocument.Document.location.href = "http://domain.com/path/file.html";
   // ParsingRequired is a Boolean form attribute.
   ParsingRequired = True;
EndProcedure

&AtClient
Procedure HTMLDocumentDocumentComplete(Item)
   If Items.HTMLDocument.Document.readyState = "complete" And ParsingRequired Then
      // firstChild of the table is expected to be its <tbody>; its children are the <tr> rows.
      For Each tr In Items.HTMLDocument.Document.all["dataTable"].firstChild.children Do
         For Each td In tr.children Do
            // Show the text of each cell as a user message.
            UserMessage = New UserMessage;
            UserMessage.Text = td.innerText;
            UserMessage.Message();
         EndDo;
      EndDo;
      ParsingRequired = False;
   EndIf;
EndProcedure

#3
Active user
Points:: 0
Joined:: Sep 27, 2011

Thanks a lot, Samuel!
And how can I run this operation daily?

#4
Active user
Points:: 0
Joined:: Sep 26, 2012

I think scheduled jobs can help here.
The 1C:Subsystems Library includes a "Scheduled Jobs" service subsystem.
The topic is covered in chapter 5.5.9.3 of the documentation.

#5
Active user
Points:: 0
Joined:: Sep 27, 2011

Thanks, Ivan. But can a scheduled job open a form with an HTML document field and execute a command?

#6
Active user
Points:: 0
Joined:: Sep 26, 2012

For the thin client, I would use the ManagedForm method AttachIdleHandler(<ProcedureName>, <Interval>, <Single>).
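
A minimal sketch of how that call might look in the form module. The handler name ReloadPage is hypothetical, the interval of 86400 seconds (24 hours) is an assumption about what "daily" means here, and the form has to stay open for the handler to fire:

```bsl
&AtClient
Procedure OnOpen(Cancel)
   // Interval is given in seconds; False for <Single> keeps the handler repeating.
   AttachIdleHandler("ReloadPage", 86400, False);
EndProcedure

// Hypothetical handler: repeats the steps of the Parse command from the earlier reply.
&AtClient
Procedure ReloadPage()
   Items.HTMLDocument.Document.location.href = "http://domain.com/path/file.html";
   ParsingRequired = True;
EndProcedure
```

The existing HTMLDocumentDocumentComplete handler would then do the parsing each time the page finishes loading.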

#7
Active user
Points:: 0
Joined:: Sep 27, 2011

So I would have to keep an application running to emulate a scheduled job... Is there a way to use the DOM on the server in a scheduled job?

#8
Active user
Points:: 0
Joined:: Sep 26, 2012

The HTTPConnection object is available on the server. To use it, you need to know the path of the file you are requesting.

Code
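// Download the page to a temporary file on the server.
inputFile = GetTempFileName("html");
Connection = New HTTPConnection("domain.com");
Connection.Get("/path/file.html", inputFile);

// Read the downloaded file into a string, then remove the temporary file.
textFromFile = New TextDocument;
textFromFile.Read(inputFile);
html = textFromFile.GetText();
DeleteFiles(inputFile);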


The variable "html" will then contain the HTML source, which you can load into the HTMLDocument field.
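
To tie this to the scheduled job suggested earlier in the thread, the download could be wrapped in an exported procedure of a server common module. This is only a sketch: the module name TableImport and the procedure name ImportTableDaily are hypothetical, and the scheduled job itself (with its daily schedule) would still have to be defined in the Configurator and pointed at this method.

```bsl
// Hypothetical server common module "TableImport".
// A scheduled job would reference TableImport.ImportTableDaily as its method.
Procedure ImportTableDaily() Export
   inputFile = GetTempFileName("html");
   Try
      Connection = New HTTPConnection("domain.com");
      Connection.Get("/path/file.html", inputFile);

      textFromFile = New TextDocument;
      textFromFile.Read(inputFile);
      html = textFromFile.GetText();

      // Parse html and store the values here.
   Except
      // A scheduled job has no user interface, so record the failure in the event log.
      WriteLogEvent("TableImport", EventLogLevel.Error, , , ErrorDescription());
   EndTry;

   // Remove the temporary file if the download created it.
   TempFile = New File(inputFile);
   If TempFile.Exist() Then
      DeleteFiles(inputFile);
   EndIf;
EndProcedure
```

Because the procedure runs on the server without a form, the result cannot go into an HTMLDocument field; the text would have to be parsed in code.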

#9
Active user
Points:: 2
Joined:: Sep 16, 2011

And maybe XMLReader could help to parse this text using DOM...
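
A rough sketch of that idea, assuming the page text is well-formed XML with every tag closed (real-world HTML often is not, in which case XMLReader will raise an error). The function name ReadTableRows is hypothetical, and the sketch collects the cells of every table on the page rather than only the one with id "dataTable":

```bsl
// Returns an array of rows; each row is an array of cell texts (strings).
Function ReadTableRows(html)
   Rows = New Array;
   CurrentRow = Undefined;
   InsideCell = False;

   Reader = New XMLReader;
   Reader.SetString(html);
   While Reader.Read() Do
      If Reader.NodeType = XMLNodeType.StartElement Then
         If Reader.Name = "tr" Then
            // A new table row begins.
            CurrentRow = New Array;
         ElsIf Reader.Name = "td" Then
            InsideCell = True;
         EndIf;
      ElsIf Reader.NodeType = XMLNodeType.Text Then
         // Keep only text that sits inside a <td> of the current row.
         If InsideCell And CurrentRow <> Undefined Then
            CurrentRow.Add(Reader.Value);
         EndIf;
      ElsIf Reader.NodeType = XMLNodeType.EndElement Then
         If Reader.Name = "td" Then
            InsideCell = False;
         ElsIf Reader.Name = "tr" And CurrentRow <> Undefined Then
            Rows.Add(CurrentRow);
            CurrentRow = Undefined;
         EndIf;
      EndIf;
   EndDo;
   Reader.Close();

   Return Rows;
EndFunction
```

Since this uses no form items, it could run on the server, for example from a scheduled job, on the text obtained through HTTPConnection.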

#10
Active user
Points:: 2
Joined:: Sep 16, 2011

Or HTMLDocumentShell, method GetHTMLDocument...
