Optimizing client/server applied solutions for SaaS application delivery model





Introduction

At the time of writing this article, SaaS provisioning of applied solutions requires their integration with 1C:Subsystems Library. But we are working on moving this functionality to a new 1C:Cloud Technology Library to facilitate integration and update of subsystems required for development of SaaS solutions. Since this work is not yet finished, we will refer to 1C:Subsystems Library in this article.

Here we will discuss the essentials of applied solution development for the SaaS application delivery model, focusing on performance, resource costs, reliability, and correctness.

Many of the practices described here are also mentioned in other articles of the Best practices section. Some of them are based on our accumulated experience of correct or incorrect usage of 1C:Enterprise features.

Database integration

In this section we will discuss the most significant database integration tasks: the efficiency of database queries and the integrity of database operations. We will also discuss lesser but nonetheless important aspects of database operations.

Efficient queries

When writing queries, always keep in mind that the query should solve exactly your task. A query retrieves a data selection from a database. It seems obvious that the query should only retrieve data that is needed. If for any reason the query retrieves data that you do not need, trim it to minimize the selection size. In practice, developers often write code that retrieves a lot of data and never uses part of it. This is inefficient. If you have a single user working with your database, they might not notice the difference. If you have many users working simultaneously, each user's small inefficiency adds up to cause big problems in everyone's work.
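As a minimal sketch of this idea, the following fragment selects only the two fields that the algorithm uses and filters the records in the query itself. The catalog Catalog.Products, its Article attribute, the Group variable, and the ProcessProduct procedure are hypothetical names used only for illustration.

```bsl
// Hypothetical catalog Catalog.Products with an Article attribute.
Query = New Query;
Query.Text =
"SELECT
|   Products.Ref AS Ref,
|   Products.Article AS Article
|FROM
|   Catalog.Products AS Products
|WHERE
|   Products.Parent = &Group
|   AND NOT Products.DeletionMark";
Query.SetParameter("Group", Group);

Selection = Query.Execute().Select();
While Selection.Next() Do
    // Only Ref and Article are read; other attributes are never requested.
    ProcessProduct(Selection.Ref, Selection.Article); // hypothetical procedure
EndDo;
```

The filtering happens in the WHERE clause rather than in the loop, so records that are not needed never leave the DBMS.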

Another mistake is overestimating query capabilities and implementing the entire task logic in queries, which makes them quite complicated. Of course such queries return correct results but not as quickly as the developer wants.

For example, if a query contains 15 join operations, which in turn contain selection operations, OR expressions, and so on, it cannot be efficient. The database might be unable to process the query correctly. A clear sign of this problem is a timeout warning in a compiled query plan. It means that the query optimizer could not find the best query plan before the timeout expired, and it is likely that the query is too complicated and therefore inefficient. The severity of this problem varies depending on the DBMS, but you never know which DBMS your users will have, so do your best to make the queries equally efficient for all DBMS.

From the DBMS point of view, the complexity of the algorithm described in a query is not the sole reason for inefficiency. A simple desire to make a query clearer and more human-readable by using nested queries can also hinder performance. Nesting makes queries more complex, so you should not use it unless absolutely necessary.

To make a query clear, use simple table joining conditions. But if your conditions contain fields not included in the index, the joining is not efficient.

The number of tables included in a query can also affect its efficiency. With a large number of tables, the DBMS will be unable to build an efficient query plan. The time required to build the query plan grows almost exponentially depending on the number of tables. When the timeout expires, the DBMS will simply record the best query plan it managed to create by that time.

So, what is a large number of tables? Sometimes 5 or 7 tables in a single query is too much, sometimes more. You cannot tell for sure in advance but keep in mind that reducing the number of tables is one of the available optimization methods.

Efficient query plans

Earlier in this article we mentioned query plans. Now let us discuss them in detail.

When you write a query, it is your way to tell the DBMS what data you want to retrieve and what conditions apply to that data. The DBMS has many ways to complete your request. You cannot specify the desired way explicitly because the DBMS chooses the way automatically by building a query plan. A plan is a set of physical operators that the DBMS executes to retrieve the requested data. These operators can include scanning tables, scanning the index, selecting some record by index, various joins, and more. If you ever studied DBMS operations, you probably know what query plans look like.

From the business logic point of view it may seem that plans are not important because the query works regardless of the plan type. But the query plan might be inefficient. For example, searching a table with a million records by iterating over them is extremely inefficient and the query execution will take a lot of time. And if the table being searched is joined to another one (NESTED LOOPS), that million is multiplied by something and the search takes almost forever.

You can say that the DBMS can select an inefficient plan but you cannot tell it to use another plan anyway. True, you cannot specify a plan explicitly but you can learn the general principles of building those plans. They are based on the query text, as well as available indexes and statistics. And therefore you can implicitly affect the plan selection.

For example, 1C:Enterprise query language provides the option to use functions in queries. You can even use functions in query parameters, as in the following example:

WHERE
   TemporaryFiles.FileDate >= BEGINOFPERIOD(&Date, YEAR)

We recommend that you do not write queries like this because the DBMS will be unable to optimize the query well. The DBMS does not recognize the passed parameter as a constant, while actually it is a constant. It is clearly visible in the correct example:

// Query text
WHERE
   TemporaryFiles.FileDate >= &Date

// 1C:Enterprise script
Query.SetParameter("Date", BegOfYear(Date));

In the correct example the function value is calculated in 1C:Enterprise script and then passed to the query. 

This difference does not look big and many developers do not bother to remove functions from their queries because they find it convenient to store all of the logic in a single code fragment. Still, this is inefficient.

When you develop queries that are more or less complex, it is good to ensure that they use efficient query plans. There is only one guaranteed way to ensure this: run the query and then look at the generated query plan.

Note that you have to run the query using the actual amount of data and up-to-date DBMS statistics. The amount of data is important because selecting two records out of a table that contains ten records is always fast, but if the actual table contains ten thousand records, the same query may become slow. The statistics are important because the DBMS relies on them when it builds a plan. If you ensure that the statistics are recalculated before you run the query, the DBMS can generate a good plan. But this is difficult to achieve on a live database. If, for example, a massive amount of data was added to the database since the statistics were last updated, the DBMS might select a simpler but less efficient plan, based on the outdated statistics.

Some developers attempt to guess plans for simple queries but this usually does not help because such guessing is not easy.

So how can you look at the query plans? You can enable recording of those plans to the 1C:Enterprise technological log, which works with any DBMS. Alternatively, you can use the DBMS tools. For example, if you work with Microsoft SQL Server, you can use SQL Server Profiler.

Indexes

As we said earlier, indexes are the second important part of building efficient query plans.

In general, indexes speed up selection of data from tables. 1C:Enterprise generates table indexes automatically during database restructuring. Developers cannot explicitly specify which indexes they want created or tables where they will be created. But you can affect this by setting the Index property of some attributes to Index or Index with additional ordering. This does not guarantee that 1C:Enterprise builds a composite index based on all those fields, and this does not guarantee that indexing will be performed by those fields only. The resulting index structure is complex and it is based on the properties of the indexed configuration objects. You can get the indexes of a live database using the GetDBStorageStructureInfo() function of 1C:Enterprise script.
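The following sketch shows one possible way to list the indexes that the platform built for a single metadata object. Catalog.Products is a hypothetical object, and the column names of the returned tables (StorageTableName, Indexes, StorageIndexName, Fields, StorageFieldName) are given as I recall them, so check them against the Syntax Assistant of your platform version before relying on them.

```bsl
// Catalog.Products is a hypothetical metadata object.
MetadataObjects = New Array;
MetadataObjects.Add(Metadata.Catalogs.Products);

// True requests table and field names in DBMS terms.
StorageStructure = GetDBStorageStructureInfo(MetadataObjects, True);

For Each TableRow In StorageStructure Do
    For Each IndexRow In TableRow.Indexes Do
        FieldNames = New Array;
        For Each FieldRow In IndexRow.Fields Do
            FieldNames.Add(FieldRow.StorageFieldName);
        EndDo;
        // One line per index: table name, index name, ordered field list.
        Message(TableRow.StorageTableName + "." + IndexRow.StorageIndexName
            + ": " + StrConcat(FieldNames, ", "));
    EndDo;
EndDo;
```

The order of the fields in each index is the part that matters most when you compare an index with the conditions of a query.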

In practice, indexes added by developers are often inefficient because their applied solutions rarely use these indexes. Adding too many indexes also does not help because the DBMS will be unable to find suitable ones and therefore will generate an inefficient plan.

Indexes are not used as often as you might want them to be used. For example, a query joins another table by some fields, or a query includes selection by some fields. Let us name these fields A, B, and C, and let the index be built by fields B, C, and D. The index will not be used in these queries. To use this index, you have to select by fields B and C, or by fields B, C, and D. In other words, all the fields included in the condition must be located at the beginning of the index. Then the DBMS will be able to use the physical search by index operation (INDEX SEEK).
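To illustrate the rule, assume a hypothetical information register InformationRegister.Prices whose dimensions are declared in the order PriceType, Product, Characteristic, and assume that its index is built by these dimensions in the same order. The two query texts below differ only in their conditions.

```bsl
// Condition covers the leading index fields (PriceType, Product):
// the DBMS can use INDEX SEEK.
SeekQuery = New Query;
SeekQuery.Text =
"SELECT
|   Prices.Price AS Price
|FROM
|   InformationRegister.Prices AS Prices
|WHERE
|   Prices.PriceType = &PriceType
|   AND Prices.Product = &Product";

// Condition skips the leading field PriceType:
// the index fields do not match, so the DBMS is likely to scan.
ScanQuery = New Query;
ScanQuery.Text =
"SELECT
|   Prices.Price AS Price
|FROM
|   InformationRegister.Prices AS Prices
|WHERE
|   Prices.Product = &Product
|   AND Prices.Characteristic = &Characteristic";
```

Which plan the DBMS actually selects still depends on the data volume and statistics, so the only reliable check is to look at the generated plan.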

If the table does not have a suitable index, the DBMS will use index scanning (INDEX SCAN) or table scanning. Table scanning is almost impossible in 1C:Enterprise databases because each table has a clustered index. Still, this does not make a big difference because scanning a clustered index is as bad as scanning a table when you need to retrieve a small amount of data.

On the other hand, if you need to select 80% of table records or even 50%, scanning a table is possibly the best method. So if you find the SCAN operation in your query plan, this does not always mean that the plan is inefficient. It depends on the amount of data you want to select.

The DBMS can also select index scanning when the index does not match the selection conditions but still contains all of the required fields. In this scenario the DBMS has a choice between scanning a table and scanning an index. Indexes are usually smaller than tables, that is why the DBMS selects an index.

Another distinctive feature of indexes is related not to the fields included in the selection condition but to the fields that are selected from a table. For example, you have a catalog table and you want to select records where one of the catalog attributes has a specific value. But the selection includes all catalog attributes (in other words, almost all of the fields in each record). In this scenario we can almost guarantee that the index will not be used because the DBMS has to find the required records first, and then perform the LOOKUP operation on a table or a clustered index for each record to get the rest of the fields. The DBMS will only select this method if the selection includes a small number of records. And if the selection covers 10% of the table, the DBMS will use SCAN because it tends to select the simplest options.

Another scenario where the DBMS can select scanning a table or a clustered index involves a table of a small size. When a table contains just several thousand records, scanning is easier than involving LOOKUPs.

As we have mentioned before, each 1C:Enterprise table has a single clustered index. The index is automatically generated by the platform and developers cannot change it. The index is usually built by the primary table key. For tables that contain object data (catalogs, documents, and so on) the index is built by the Ref field. For information register tables the index is built by dimensions. For periodic information registers, the Period field is added to the dimension fields, and so on.

Another important index characteristic is its selectivity. It is reciprocal to the number of records that can be selected from the table using the index. Lower percentage of selectable records means higher selectivity. Primary keys have the maximum selectivity.

For example, selectivity is very high when you need to select something from a catalog by Ref because you will get a single record. On the other hand, indexes built by Boolean attributes do not make data selections faster. If the distribution of true and false values is even, half of the table records will contain True and the other half will contain False. The DBMS will not use such an index because it will not help to narrow the selection. It will only use such an index if the percentage of the values that you want selected is low (for example, when 99% of the records contain True, 1% contain False, and you want to select by False).

This explains why indexing by nonselective fields makes no sense. Maintaining indexes uses up resources and you do not want them wasted on unused indexes.

Transactions

The DBMS performs all data operations within transactions. A transaction is a sequence of operations that can be either fully completed or not completed at all. In other words, a scenario where only a part of transaction operations is completed is impossible. In computer science, ACID (Atomicity, Consistency, Isolation, and Durability) is a widely used set of requirements that guarantee the reliability of database transactions.

What are transactions used for? The simplest example is moving goods between warehouses. You need to write off the goods from one warehouse and add them to another. If you have these actions performed outside of a transaction, the goods might appear in the second warehouse without being removed from the first one, or vice versa. That is why all operations of that kind must be performed within transactions.

You will rarely need to open transactions explicitly using 1C:Enterprise script. For most of the handlers, the platform opens transactions automatically. Examples of such handlers include BeforeWrite, OnWrite, and BeforeDelete. But you can open transactions explicitly whenever needed using the methods BeginTransaction(), CommitTransaction(), and RollbackTransaction(). You cannot open a nested transaction inside a transaction. An attempt to open it increments the internal transaction counter but does not actually open a new one. If you roll back a nested transaction or it is automatically rolled back due to an exception, the main transaction is rolled back because we only have one transaction here.
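A minimal sketch of an explicit transaction for the warehouse example might look as follows. The procedures WriteOffGoods and ReceiveGoods and the variables passed to them are hypothetical.

```bsl
BeginTransaction();
Try
    // Hypothetical procedures: either both writes are committed or neither.
    WriteOffGoods(SourceWarehouse, Goods);
    ReceiveGoods(TargetWarehouse, Goods);
    CommitTransaction();
Except
    // Roll back everything done since BeginTransaction().
    RollbackTransaction();
    // Pass the original error to the caller.
    Raise;
EndTry;
```

Re-raising the exception matters when this code runs inside an outer transaction: because there is only one real transaction, the caller has to learn that it was rolled back.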

Managed locks

The concept of locks is related to transactions. If your transaction includes reading data that can be changed at any time, consider locking that data. If you do not, another transaction executed on this data might change it. As a result, the last goods unit left can be sold twice, which is incorrect.

On the other hand, if transactions do not change data or the data is not changed often, locking it is not necessary. The platform is unable to read "dirty data" anyway because the DBMS isolation level is high enough to prevent this. In the worst scenario you will have to wait until the data lock timeout expires if someone else is changing that data, and this does not seem bad. And if you add a managed lock here, this does not improve the result but generates additional load on the 1C:Enterprise lock manager. In fact, it just replaces the DBMS lock with a managed 1C:Enterprise lock, which brings no benefit.

Let us check some examples. When a document is being posted, you need to retrieve prices for generating register records. Let us assume that a user set the prices in a catalog interactively. If you only retrieve the prices once and you do not need to retrieve any data linked to prices at that time, locking is not required.

Another example: when posting a document, you need to specify an invoice for generating register records. Before you read invoice data, you have to lock the register by dimensions used to select the invoice, otherwise multiple documents can have the same invoice selected.

If you want to modify data within your transaction, you have to set an exclusive lock, otherwise you might get a deadlock. Getting deadlocks is pretty easy. For example, you have two documents that attempt to lock the same resource. First, they both set shared locks, and then they attempt to write the document. At that time the platform attempts to set an exclusive lock but cannot do it for either of the sessions, so both keep waiting. Finally, one of the transactions is rolled back due to an error.
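The following sketch sets an exclusive managed lock before the data is read and then changed. The accumulation register AccumulationRegister.GoodsInStock, its dimensions Product and Warehouse, and the variables with the same names are hypothetical.

```bsl
BeginTransaction();
Try
    // Hypothetical register with Product and Warehouse dimensions.
    Lock = New DataLock;
    LockItem = Lock.Add("AccumulationRegister.GoodsInStock");
    // Exclusive from the start: no shared lock is upgraded later.
    LockItem.Mode = DataLockMode.Exclusive;
    LockItem.SetValue("Product", Product);
    LockItem.SetValue("Warehouse", Warehouse);
    Lock.Lock();

    // Read the balance and write the register records here.

    CommitTransaction();
Except
    RollbackTransaction();
    Raise;
EndTry;
```

Restricting the lock to specific dimension values keeps other products and warehouses available to concurrent sessions.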

Changes in version 8.3

The method of handling Microsoft SQL Server and IBM DB2 transactions was changed in platform version 8.3. Prior to this version, any read from these DBMS that was performed outside of a transaction could result in "dirty reading". In other words, there was a good chance of reading data from uncommitted transactions. For example, reports included data from unposted documents. In version 8.3 "dirty reading" is no longer possible (unless you use compatibility with earlier versions). Therefore, setting shared locks is only needed when you read the same data several times within a single transaction.

Using long transactions

Transaction duration is important. It is hard to provide strict criteria for what is long, but long transactions should always be avoided because the DBMS writes all transaction operations to its transaction log. Most locks are kept until the end of a query, but data change locks are kept during the entire transaction. Microsoft SQL Server stores them in the server memory. Managed locks are always stored in 1C:Enterprise server memory. This is not a problem for DBMS with multiversion concurrency control (PostgreSQL and Oracle Database), but the software must work efficiently with all supported DBMS, so make transactions shorter whenever possible.

Longer transactions keep resources locked for longer periods, use up more disk space for the transaction log, and take more time to roll back. If you have a choice between one long transaction and several short ones, we recommend that you use the second option.
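As an illustration, the sketch below processes a large array of references in several short transactions instead of one long transaction. It assumes that the items are logically independent, so a failure in one portion does not invalidate the portions committed earlier. DocumentsToProcess, ProcessDocument, and the portion size are hypothetical.

```bsl
// DocumentsToProcess is an Array of references.
PortionSize = 100; // arbitrary illustrative value
FirstIndex = 0;
While FirstIndex < DocumentsToProcess.Count() Do
    LastIndex = Min(FirstIndex + PortionSize, DocumentsToProcess.Count()) - 1;
    // Each portion gets its own short transaction.
    BeginTransaction();
    Try
        For Position = FirstIndex To LastIndex Do
            ProcessDocument(DocumentsToProcess[Position]); // hypothetical procedure
        EndDo;
        CommitTransaction();
    Except
        RollbackTransaction();
        Raise;
    EndTry;
    FirstIndex = LastIndex + 1;
EndDo;
```

Locks and transaction log space are released after every portion, and a rollback affects at most one portion.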

Using dynamic lists

Since platform version 8.2, developers have the option to use custom queries in dynamic lists, and they use this feature a lot, sometimes a bit excessively. The option to display a lot of useful data in a list seems attractive, but sometimes it requires up to 20 joins with nested queries. Of course it is not efficient.

Dynamic lists use cursor queries and have three modes available. The most efficient mode is dynamic data reading, provided that the query is not too complicated (does not contain too many joins or restrictions at database record and field level). In this mode a dynamic list reads a data chunk and remembers the first and last rows. Once the user scrolls the list, it reads the next data chunk before or after the displayed one, depending on the scrolling direction.

This mode is efficient if the table where data is selected has an index that matches the filtering and sorting conditions. If the list uses filtering by a specific field, set its Index property to Index with additional ordering. If a suitable index is not found, the DBMS performs the SCAN operation, which significantly decreases the dynamic list performance. That is why dynamic list queries must be as simple as possible.

In the second mode dynamic data reading is disabled but the main table is specified. This mode is rarely used, mostly in filter criteria. Actually, it is still a cursor selection, but one that stores 1000 records at once in the server buffer. The client reads data from the buffer, which is less efficient than dynamic reading.

In the third mode, dynamic reading is disabled and there is no main table. The list reads the entire table and stores it in the buffer, which is very inefficient.

Support of various DBMS

The platform supports five DBMS: the file database, Microsoft SQL Server, IBM DB2, PostgreSQL, and Oracle Database. Applied solution developers should do their best to make their queries efficient in all of these DBMS but of course they will always face the DBMS differences.

In a perfect world, you should test your applied solution on all DBMS. In practice it is not always possible. We recommend that you test your solution at least with the file database (because it is a popular choice) and 1 or 2 third-party DBMS.

Web client support

If you design an applied solution for SaaS delivery model, it must provide its full functionality when accessed from a web client. An easy way to pass this check is performing all development and testing using the web client.

Web client users can use the file system extension. Sometimes developers rely on this extension without providing any alternative, which is incorrect. The file system extension is intended to improve user experience but it should not be the only available option. Otherwise users who do not install the extension cannot use all applied solution features. Users can have good reasons to avoid installing the file system extension (for example, it is not available for half of the supported browsers in the platform version 8.2). Even if the extension is available, some users do not have the right to install browser add-ins and some users simply choose not to install it.

User interface performance

You can estimate the user interface performance by the maximum response time to a single user action. Avoid scenarios where it takes more than 1 second.

To measure that second, you can try using the performance indicators of the platform or writing a measurement script. But both these methods are inaccurate because the time measured is different from the actual response time on a live system. The web client will show the least accurate result because it executes a lot of asynchronous operations. Therefore, the start time of a script fragment (for example, when an idle handler is triggered) can be very different from the time when the form drawing is finished so that users can interact with it.

The best measurement method is using a stopwatch.

We do not recommend taking measurements on the computer where you develop the applied solution. As a rule, computers used for development are more powerful than end-user computers. End users often have computers that are quite old, and they do not think of upgrading them in the near future.

In addition to a slow computer, we recommend that you use a browser with slow JavaScript execution. While testing the applied solution in a fast browser, you can overlook many issues. In fact, half of the applied solution performance depends on the browser. A slow computer with a fast browser often provides the same performance as a fast computer with a slow browser.

We also recommend that you use the Emulate delay on server calls and Slow connection features. If you want to perform thorough testing, the best option is using a mobile GPRS connection because in some cases it might hinder the performance even further.

Factors affecting the web client performance

Client/server calls are the main factor that affects the interface performance. We recommend that you minimize their number by combining several calls into one. This recommendation applies not only to the web client but also to the thin client.
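As a sketch of combining calls, the form module fragment below replaces two separate server calls with one call that returns everything the client needs in a structure. The form attributes Object.Product, Object.Price, and Balance, as well as the server functions CurrentPrice and CurrentBalance, are hypothetical.

```bsl
// Instead of two round trips:
//     Object.Price = GetPriceAtServer(Object.Product);
//     Balance = GetBalanceAtServer(Object.Product);
// make a single call.

&AtClient
Procedure ProductOnChange(Item)
    ProductData = GetProductDataAtServer(Object.Product);
    Object.Price = ProductData.Price;
    Balance = ProductData.Balance;
EndProcedure

&AtServerNoContext
Function GetProductDataAtServer(Val Product)
    Result = New Structure;
    Result.Insert("Price", CurrentPrice(Product));     // hypothetical function
    Result.Insert("Balance", CurrentBalance(Product)); // hypothetical function
    Return Result;
EndFunction
```

The server function is declared without context, so the form data is not transferred with the call.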

Also pay attention to the following:

  1. Changing forms using 1C:Enterprise script. This can seriously impact the form performance. The platform uses multilevel caching, in particular, it caches form descriptions. Normally (if a form does not include any script) the form is sent to the client computer when the user accesses it for the first time. During subsequent accesses, the form is retrieved from the cache. But if that form is changed using 1C:Enterprise script, it leads to the following disadvantages. First, each server call that opens the form retrieves not only the regular form data but also the form changes, which are not cached. Second, applying those changes takes time, which reduces the form performance.
  2. Complex forms are slower than simple ones. A form with a large number of items with conditional appearance has reduced performance (on typical end-user computers). Therefore, we do not recommend creating forms with 20 tabs. Instead, simplify the forms, split their functionality into several forms, turn off some functionality parts, and so on.
  3. Transferring large volumes of data can affect performance. It is only noticeable at low connection speed, but we still recommend that you reduce the size of transferred data whenever possible. Do not store rarely used data in forms because it might never be used in some scenarios. Instead, have such data retrieved from the server when needed. Alternatively, you can cache that data in a client module for future use.
  4. Use the Val keyword in procedure and function declarations. In client/server operations this keyword has a different meaning compared to operations within a single computer (client or server). When you use Val in a declaration of a server procedure parameter and the procedure is called from a client, the parameter value is not transferred back to the client. And if you do not use Val (which is the default scenario), it works as in the following example: a server procedure is called and an array is passed to it. The array will never be needed at the client side, it is simply a parameter that you no longer need. But once the server call is completed, the array is packed to XML or JSON (the web client uses the latter option) and then transferred back to the client. Of course this reduces the efficiency. Therefore, if you do not need the return value passed as a parameter, use Val with that parameter. Of course you can omit Val for Boolean parameters but it is still a bad practice.
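The effect of Val described in the last item can be sketched as follows. The procedure names and the SetProcessedFlag procedure are hypothetical.

```bsl
&AtClient
Procedure MarkSelected(Command)
    // SelectedDocuments is a hypothetical array of references.
    MarkAsProcessedAtServer(SelectedDocuments);
EndProcedure

// Val: the array is passed to the server only.
// Without Val it would also be serialized and sent back to the client
// when the call completes, although the client never reads it.
&AtServerNoContext
Procedure MarkAsProcessedAtServer(Val Documents)
    For Each Document In Documents Do
        SetProcessedFlag(Document); // hypothetical procedure
    EndDo;
EndProcedure
```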

Time-consuming operations

Long server calls are also worth mentioning. If a client/server call is not completed before the timeout expires, this can cause various issues. The timeout duration that is considered acceptable depends on various conditions.

For example, the Russian cloud service 1cFresh features a 75-second timeout. When the timeout expires, the web server no longer expects an answer from the 1C:Enterprise server, and the client receives an error message. Things are even worse for users of Mac computers because their default Safari browser features a built-in 8-second timeout. If a server call takes longer than 8 seconds, the application stops working. And in general, long server calls are not recommended because users cannot perform any actions while they are waiting for a server response.

We recommend that you use Long actions (a part of the Base functionality subsystem) of 1C:Subsystems Library 2.0 to avoid such issues. It calls the functionality that you want executed on the server in a background job. An idle handler on the client periodically checks whether a server response is available. In addition to solving the issue, this method gives you a bonus: the user work is not paused while waiting for the server response.
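The sketch below shows the underlying mechanism using platform objects only. It is not the API of the Long actions subsystem. The form attribute JobID (of the UUID type), the exported procedure ReportsServer.BuildReport of a common server module, and the ShowResult procedure are hypothetical.

```bsl
&AtClient
Procedure BuildReport(Command)
    JobID = StartJobAtServer();
    // Check the job in 2 seconds, once.
    AttachIdleHandler("CheckJob", 2, True);
EndProcedure

&AtClient
Procedure CheckJob()
    If JobCompletedAtServer(JobID) Then
        ShowResult(); // hypothetical client procedure
    Else
        // Not ready yet: schedule the next short check.
        AttachIdleHandler("CheckJob", 2, True);
    EndIf;
EndProcedure

&AtServerNoContext
Function StartJobAtServer()
    // Hypothetical exported procedure of a common server module.
    Job = BackgroundJobs.Execute("ReportsServer.BuildReport");
    Return Job.UUID;
EndFunction

&AtServerNoContext
Function JobCompletedAtServer(Val JobID)
    Job = BackgroundJobs.FindByUUID(JobID);
    If Job = Undefined Or Job.State = BackgroundJobState.Failed
        Or Job.State = BackgroundJobState.Canceled Then
        Raise "The background job was not completed.";
    EndIf;
    Return Job.State = BackgroundJobState.Completed;
EndFunction
```

Every server call in this scheme is short, so none of them approaches the web server or browser timeout, no matter how long the job itself runs.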

Resource saving

RAM

RAM is one of the most important and valuable resources, although these days, when each desktop computer can have 16 GB of RAM installed, one might easily forget that.

Still, you have to reduce RAM usage whenever possible. A server never has too much RAM. When hundreds of users are connected to it, any inefficient memory usage can hinder the performance. With thousands of users, it will drastically hinder the performance.

So, when you write your algorithms, always keep in mind that RAM is limited. If the volume of data used by your algorithm is virtually not limited, find a way to limit it. For example, you can use cursor selections similar to those used in dynamic lists.
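One possible way to limit the data volume is to read a table in portions and remember the last processed record, similar to what a dynamic list does. The sketch below assumes that references can be compared and ordered in the query language; verify this for your platform version. Catalog.Products, ProcessProduct, and the portion size are hypothetical.

```bsl
// Hypothetical catalog Catalog.Products; 1000 is an arbitrary portion size.
Query = New Query;
Query.Text =
"SELECT TOP 1000
|   Products.Ref AS Ref
|FROM
|   Catalog.Products AS Products
|WHERE
|   Products.Ref > &LastRef
|ORDER BY
|   Ref";

LastRef = Catalogs.Products.EmptyRef();
While True Do
    Query.SetParameter("LastRef", LastRef);
    Result = Query.Execute();
    If Result.IsEmpty() Then
        Break; // no more records
    EndIf;
    Selection = Result.Select();
    While Selection.Next() Do
        ProcessProduct(Selection.Ref); // hypothetical procedure
        // Remember where this portion ended.
        LastRef = Selection.Ref;
    EndDo;
EndDo;
```

At any moment the server holds at most one portion in memory, regardless of the table size.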

Pay special attention to scenarios where large data structures are created in the memory. For example, 1C:Enterprise script allows you to process files as a whole. The TextDocument object is intended for processing text files, DOMDocument is for XML files, and HTMLDocument is for HTML files. But you should avoid using them for processing large files because this loads the entire file into the memory together with a large amount of auxiliary data. In practice, this operation is only needed in some rare scenarios where you need access to specific parts of the file content. And the majority of practical tasks involve processing the entire file. Use consecutive reading and writing for this: XMLReader, TextReader, XMLWriter, and TextWriter. Methods of these objects read files by portions and provide the required memory usage efficiency.
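For example, a large text file can be processed line by line with TextReader. In this sketch the FileName variable and the ProcessLine procedure are hypothetical, and the UTF-8 encoding is an assumption about the file.

```bsl
// FileName holds the full path to a large text file.
Reader = New TextReader(FileName, TextEncoding.UTF8);
Line = Reader.ReadLine();
// ReadLine() returns Undefined at the end of the file.
While Line <> Undefined Do
    ProcessLine(Line); // hypothetical procedure
    Line = Reader.ReadLine();
EndDo;
Reader.Close();
```

Only the current line is kept in memory, while TextDocument would load the entire file.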

Memory leaks

Memory leaks are yet another RAM-related issue. They do not show up often but they cause a lot of problems. Diagnosing such issues is difficult even though the technological log has tools for this. It is easy to create a memory leak in 1C:Enterprise script: a single cyclic link is enough, as in the following example:

Data = New Structure;
Data.Insert("Key", Data);

Of course this example is not taken from a live system, but one can accidentally create a similar algorithm. For example, if you have a set of nested objects and the object from the last nesting level refers to the top one, you get a cyclic link.

So, let us see what happens when such an algorithm is executed. When all of the external links to the object cease to exist, the object is still not deleted from the memory. This happens because memory management is based on a reference counter. When you create a reference to an object, its counter is incremented. When the reference becomes unavailable or is explicitly broken, the counter is decremented. Since the object refers to itself, the counter never reaches zero and the memory is never released.
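If a cyclic link cannot be avoided, one option is to break it explicitly when the data is no longer needed. This is a minimal sketch based on the example above.

```bsl
Data = New Structure;
Data.Insert("Key", Data); // cyclic link: the structure refers to itself

// Work with Data here.

// Break the cycle before the variable goes out of scope.
// Clear() removes all items, including the reference to the structure itself.
Data.Clear();
```

After Clear() the only remaining reference is the Data variable, so the counter can reach zero when the variable is released.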

The recommendation to avoid writing this code is quite hard to implement. That is why you have to remember about memory leaks when you create data structures in the memory, otherwise you might have difficulties locating the issue later. Unfortunately, leaks cannot be found during debugging or testing. Instead, they show up on a live server when it runs out of memory.

Reusing return values

Modules where return values are reused can add up to inefficient memory usage. Analysis of custom configurations shows that developers often overestimate this feature and use it in scenarios where it does not provide any advantages. Reusing return values is actually caching and caching uses resources, so you have to ensure that you only cache what you really need.

For example, you can create common modules with reusable return values, which return string constants. But this does not make sense because getting a string constant directly is faster than getting it from a common module with reusable return values. On the other hand, reusing data retrieved from a database makes sense.

Also, when you cache something you have to ensure that this data will be frequently accessed. The cache does not store data forever: a value is cleared 20 minutes after its calculation or 6 minutes after its last use, whichever comes first. In addition to that, a value is deleted when the server working process runs out of memory, when a working process is restarted, or when a client is switched to another working process. If you do not have a chance to use the cached data, you waste the cache resources.

Incorrect input parameters can also become an issue in such modules. The range of input values should not be broad. For example, some configurations include functions that take contractors as arguments. This might be inefficient. If the database stores a large number of contractors and the chance that two users access the same contractor within a 5-minute period is low, the resources are wasted. While the waste seems small, it grows huge when multiplied by the number of users that simultaneously use the application.

Finally, a cache does not return copies of objects. It always returns a reference to the same object in the memory. One can easily change that object by mistake. In our practice we had a function that returned a reusable array and a value was added to the array during each call. Posting documents caused this array to grow uncontrollably. So we strongly recommend that, in order to avoid such issues, you use return values that cannot be changed, such as FixedArray or FixedStructure.
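A function of a common module with reusable return values might protect its result as in the following sketch. The function name and the catalog Catalog.Warehouses are hypothetical.

```bsl
// Function of a hypothetical common module with reusable return values.
Function ActiveWarehouses() Export
    Query = New Query;
    Query.Text =
    "SELECT
    |   Warehouses.Ref AS Ref
    |FROM
    |   Catalog.Warehouses AS Warehouses
    |WHERE
    |   NOT Warehouses.DeletionMark";
    Warehouses = Query.Execute().Unload().UnloadColumn("Ref");
    // Wrap the array so that callers cannot modify the cached value.
    Return New FixedArray(Warehouses);
EndFunction
```

A caller that needs a modifiable list has to build its own array from the fixed one, which leaves the cached value intact.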

Using temporary files

Smart usage of temporary files can also save resources. When creating temporary files, use the names returned by the GetTempFileName() function. The platform deletes files with such names automatically once the process that created them is finished. Server restarts and working process restarts are rare, so this cleanup does not happen often, but it is still better than having no automatic deletion at all. Of course, this feature does not eliminate the need to delete temporary files in 1C:Enterprise script as soon as you no longer need them. Otherwise they might be stored for a long time and eventually bring the server down when it runs out of disk space.
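The "create, use, delete explicitly" pattern can be sketched as follows. This is a Python illustration of the idea rather than 1C:Enterprise script: `tempfile.mkstemp` stands in for GetTempFileName(), and the `export_report` function is hypothetical.

```python
import os
import tempfile


def export_report(lines):
    # Let the runtime pick a unique temporary name (analogous to GetTempFileName()).
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
        with open(path, encoding="utf-8") as f:
            data = f.read()
        return data, path
    finally:
        # Delete the file as soon as it is no longer needed,
        # even if an exception was raised while processing it.
        if os.path.exists(path):
            os.remove(path)
```

The `finally` block is the important part: the file is removed on both the normal and the error path, so cleanup never depends on a process restart.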

Using nonseparated data

Pay special attention to nonseparated data. If you find a way to move some part of the data to the nonseparated area, this gives you several advantages at once. Still, be careful: nonseparated data increases the efficiency of the solution but also increases its complexity, and maintaining nonseparated data is a complex task.

Nonseparated data is stored as a single instance, which in itself is not a big advantage because it only slightly reduces the database size. What matters is that the data requires updating.

Let us review what happens when separated data (some classifier, for example) is updated. This includes accessing each database area and changing the classifier data within this area, which will take significant time. On a live database this might take hours. Therefore, we recommend that you only separate data that actually requires separation.

In many scenarios you can distinguish separated and nonseparated data. However, remember that data entered by users cannot be stored in the nonseparated area because in this case it becomes available to all users. Note that writing nonseparated data in a session that uses all separators is very risky.

Optimizing 1C:Enterprise script algorithms

You can also reduce resource usage with efficient 1C:Enterprise script algorithms. Even a small inefficiency in your code can cause delays of tens of seconds if that code is executed in a loop with tens of thousands of iterations. Watch such code fragments closely. A typical example: if you use a calculated value in a loop and it does not change between iterations, calculate it in advance and store it in a variable instead of calculating it during each iteration.
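A minimal sketch of moving a calculation out of a loop, written in Python for illustration. The `get_rate_percent` function is hypothetical and stands for any call whose result does not change between iterations, such as reading a constant or running a query. The call counter exists only to make the difference visible.

```python
calls = {"n": 0}


def get_rate_percent():
    # Hypothetical expensive call with a result that is the same every time.
    calls["n"] += 1
    return 20


def total_with_tax_slow(amounts_in_cents):
    total = 0
    for amount in amounts_in_cents:
        # The rate is requested again on every iteration.
        total += amount * (100 + get_rate_percent()) // 100
    return total


def total_with_tax_fast(amounts_in_cents):
    # The rate is requested once and stored in a variable.
    factor = 100 + get_rate_percent()
    total = 0
    for amount in amounts_in_cents:
        total += amount * factor // 100
    return total
```

Both functions return the same total, but the first one calls `get_rate_percent()` once per element while the second calls it exactly once regardless of the list length.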

Minimizing infobase update time

The infobase version update time is important because during all that time the infobase is not available to users. You can use 1C:Subsystems Library to streamline the update. Additionally, developers almost always write custom data processors for updating each infobase version.

So, let us see how such data processors can hinder performance. You can specify a configuration version for each data processor, and it will only be executed for that version. There is also an option to specify * instead of a version (a data processor with an asterisk). Such data processors are executed every time the infobase is updated, even if the change only includes a couple of script lines, and things get even worse if such data processors are separated. We strongly recommend against using such data processors.

We recommend that you do your best to optimize all infobase update data processors, even if they do not have asterisks. You can use the following optimization algorithm: create a nonseparated update data processor that stores its data in the nonseparated area. When a new version is available, it checks whether anything has changed. If it detects any changes, it starts the separated data processors; otherwise it does nothing. This data processor performs some actions during every update, but it does not impact performance because it is executed only once in a nonseparated session instead of twelve thousand times.
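The algorithm above can be sketched as follows. This is a Python illustration under stated assumptions, not code from 1C:Subsystems Library: the shared dictionary stands for data kept in the nonseparated area, the "fingerprint" stands for whatever the handler uses to detect a change (for example, a classifier version), and all names are hypothetical.

```python
def run_update(shared_state, new_fingerprint, data_areas, update_area):
    """Run per-area handlers only if the shared check detects a change.

    Returns the number of data areas that were actually processed.
    """
    # One cheap check in the nonseparated session.
    if shared_state.get("fingerprint") == new_fingerprint:
        return 0  # nothing changed: no data area is touched

    processed = 0
    for area in data_areas:
        update_area(area, new_fingerprint)  # the expensive separated handler
        processed += 1

    # Remember what has been applied so that the next update can skip the work.
    shared_state["fingerprint"] = new_fingerprint
    return processed
```

The cost of an update without relevant changes is a single comparison instead of one handler run per data area.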

Scheduled jobs

Scheduled jobs are the last aspect related to resource usage that we want to discuss. Note that the recommendations provided in this section apply to all infobases, not only to separated ones.

When specifying a job schedule, avoid frequent repetitions. This might cause issues even in nonseparated infobases. If a cluster hosts 10 infobases and each of them performs a scheduled job once per minute, this will inevitably lead to severe problems, so we recommend that you do not use this approach. Imagine a cluster where half of the infobases are not in use at the moment. Every minute a new session accesses all of the platform caches, loads the configuration to memory, performs some query, and releases everything. It is a total waste of resources.

You can consolidate scheduled jobs to reduce their total number.

The described issue impacts separated infobases more than nonseparated ones. Here is a simple example. A scheduled job "extract text for full-text search indexing" is started every 85 seconds. It checks whether any changes that require processing were made. If any changes are available, it extracts the data, performs the indexing and stores the result in the database. During most of its runs it does not find any changes, especially when run on an application such as 1C:AccountingSuite that does not offer many options for storing text data. Still, that job has its uses: when someone adds a Microsoft Word file, it should be indexed.

If that job were separated, running several thousand job instances every 85 seconds would crash the server. If we converted it into a job queue, it would be executed not in parallel but sequentially in all data areas every 85 seconds. In this example the measurements showed that, while a single job takes about 100 milliseconds, together they use up more than 100% of CPU time. And that job seemingly does nothing!
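A quick back-of-the-envelope check shows why such a cheap job adds up. The sketch below is in Python. The figure of 1,000 data areas is an assumption chosen as a conservative lower bound (the text speaks of several thousand), and the load is expressed relative to a single CPU core.

```python
def sequential_load_percent(areas, run_ms, period_s):
    """Share of one CPU core consumed by running the job once per area per period."""
    busy_ms = areas * run_ms       # total work per period
    period_ms = period_s * 1000    # time available per period
    return 100.0 * busy_ms / period_ms


# Assumed: 1,000 areas. From the text: about 100 ms per run, every 85 seconds.
load = sequential_load_percent(areas=1000, run_ms=100, period_s=85)
```

With these numbers the job needs 100 seconds of work in every 85-second window, that is, roughly 118% of one core, so the queue can never catch up. With several thousand areas the figure grows proportionally.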

You can introduce a flag that is set to True when an object that requires full-text search indexing is updated, and create a single scheduled job that checks the flag every 85 seconds and does not check any separated areas. If the flag is set to True, the scheduled job plans text extraction.
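A minimal sketch of the flag approach, in Python for illustration. The class and method names are hypothetical, and resetting the flag once extraction has been planned is an assumption that the text does not state.

```python
class TextExtractionPlanner:
    """One shared flag and one scheduled job instead of a job per data area."""

    def __init__(self, plan_extraction):
        self.flag = False
        self._plan_extraction = plan_extraction  # callback that queues the real work

    def object_changed(self):
        # Called when an object that requires full-text indexing is written.
        self.flag = True

    def scheduled_check(self):
        # Called by the single scheduled job, for example every 85 seconds.
        if not self.flag:
            return False  # the common case: nothing to do, no data area is touched
        self.flag = False  # assumption: reset so the work is planned once
        self._plan_extraction()
        return True
```

In the common case, where nothing has changed, the scheduled job costs a single flag check and never opens a session in any data area.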

This method helps with the majority of indexing tasks but it is not universal. For example, it does not work for email because in addition to sent messages we have messages delivered from external addresses and of course they will not have that flag set.

Finally, never use predefined separated scheduled jobs because they will crash the server.

Following platform architecture

In this section we will talk about the principles that originate from the platform architecture. Following these principles is important for client/server applications.

Restricting external resource usage

Using external resources might result in security breaches. 1C:Enterprise version 8.3 introduced cluster security profiles. A security profile is assigned to a specific infobase in a cluster. You can use security profiles to disable access to all external resources, which includes accessing the server file system, running COM objects, using 1C:Enterprise add-ins, running external reports and data processors, running applications installed on the server, and accessing Internet resources.

If a configuration requires access to external resources, this should be explicitly stated in the documentation and access to those resources should be limited. For example, in the event of file system access, ensure that writing to such locations as C:\Temp is not allowed. Instead, allow writing to a dedicated directory only. Otherwise you might find that, for example, the event log directory contains files that do not belong to the log.

Accessing the server file system

Although this recommendation is included in the best practices, developers often forget it. A cluster can include multiple servers, so saving files on the server between client/server calls does not make sense. During the next call you might not find the file because that call is processed by a different server.

If you need to preserve some data between server calls, put it in a temporary storage. Note that writing a temporary file will not help because the file is stored on the server that processed the call, and the next call might be directed to a physically different computer.
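The difference between a server-local file and a temporary storage can be shown with a toy model. The Python sketch below is not the platform API: the class, the method names, and the address format are hypothetical, and a shared dictionary stands for storage that every server in the cluster can reach.

```python
class ClusterServer:
    """Toy model of one server in a cluster (illustration only)."""

    def __init__(self, shared_storage):
        self.local_files = {}                 # visible only on this server
        self.shared_storage = shared_storage  # visible from every server

    def save_to_local_file(self, name, data):
        self.local_files[name] = data
        return name

    def read_local_file(self, name):
        # Returns None when the file was written on a different server.
        return self.local_files.get(name)

    def put_to_temp_storage(self, data):
        address = f"addr-{len(self.shared_storage)}"
        self.shared_storage[address] = data
        return address

    def get_from_temp_storage(self, address):
        return self.shared_storage[address]
```

If the first call is processed by one `ClusterServer` and the second by another, `read_local_file` finds nothing, while the address returned by `put_to_temp_storage` remains valid on both.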

Time zone support

And finally, a reminder about time zones. We recommend that you do not use the CurrentDate() function. When CurrentDate() is executed on the server, it returns the server date and time, while the client might be located in a different time zone. Using client time can also lead to inaccuracies because you cannot guarantee that the client date and time are correct.

We recommend that you use the CurrentSessionDate() function instead. In the SaaS mode, it retrieves the data area session date.
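The following Python sketch illustrates why server time and session time can disagree even about the calendar date. It is an illustration only: the UTC offsets (server at UTC+0, data area at UTC+3) and the sample instant are assumptions, and the helper function is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_time(instant_utc, utc_offset_hours):
    """Express the same instant in a zone with the given fixed UTC offset."""
    return instant_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# One and the same moment in time.
instant = datetime(2024, 1, 31, 22, 30, tzinfo=timezone.utc)

server_time = local_time(instant, 0)  # what a server-side clock would report
area_time = local_time(instant, 3)    # what a user in the data area expects
```

Here the server still reports January 31 while the user's day is already February 1, so a document dated with server time would land in the wrong month.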

If for any reason this logic does not suit your needs, you can implement a custom function in your application. We have not yet seen such implementations, but you can change the session time zone when needed.

In some cases you can use the client computer date, but only in very specific ones, such as reminders that are tied to client computers.

