In any application that talks to a database, database operations largely determine application performance. Most performance problems arise because the SQL being executed is not optimized, too many queries are executed, a query loads too much data, columns are not properly indexed, or nothing is cached and the application always hits the database. In this series, I will try to cover the strategies you need for a well-performing ORM-based application.

As we all know, the fundamental strategy for tuning application performance is to optimize the SQL queries. As a general practice, you avoid retrieving objects through many round trips to the database and instead fetch all the data required for a particular operation in a single SQL query, using joins to retrieve related entities. You also fetch only the data that is actually needed, so as to reduce the load on the database. However, this becomes an issue when you use an ORM, because you no longer write the SQL queries yourself; they are generated and executed by the underlying ORM engine.

Thankfully, an ORM engine like Hibernate provides various hooks to optimize the SQL as well as the number of trips made to the database. The most important of these hooks is the "fetching strategy", which defines what data is fetched, when, and how.

There are four fetching strategies for loading an object and its associations. (We will use the Department-Employee relationship for all the explanations.)

  1. Immediate fetching : In this strategy, the associated object is fetched immediately after the owning entity is fetched, either from the database using a separate SQL query or from the secondary cache. This is usually not an efficient strategy unless the associated object is cached in the secondary cache or separate queries are more efficient than a join query. You can define this strategy by setting lazy="false" and fetch="select" on the relationship property definition in the CFC.
    example :
    <cfproperty name="employees" fieldtype="one-to-many" cfc="employee" fkcolumn="deptId" lazy="false" fetch="select">

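    With this strategy, when the department object is loaded, its employees are loaded immediately using a separate SQL query. As a result, this strategy is extremely vulnerable to the 'N+1 select problem': loading N departments costs one query for the departments plus N more queries, one per department, for their employees.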

    pros : The association is loaded immediately, so the associated object can be accessed even after the ORM session is closed.
    cons : A large number of SQL statements get executed, causing more traffic between the application and the database. The association is loaded even if it is never needed.

    When to use : When the association is almost always read after loading the object and executing a separate SQL query is more efficient than executing a join query.
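To make the N+1 select problem concrete, here is a small back-of-the-envelope model in Python. It is not Hibernate or ColdFusion code, and the function name is made up for illustration; it only counts the SELECT statements that immediate fetching with separate queries would issue for a given number of departments.

```python
# Toy model (an assumption for illustration, not Hibernate itself):
# count the SELECTs issued when every department's employees are
# loaded with their own separate query.

def selects_with_separate_queries(department_count):
    """One query loads the departments, then one more runs per department."""
    query_for_departments = 1
    queries_for_employees = department_count  # one SELECT per department
    return query_for_departments + queries_for_employees


for n in (1, 10, 100):
    print(n, "departments ->", selects_with_separate_queries(n), "SELECTs")
```

The count grows linearly with the number of departments loaded, which is why this strategy is mostly worthwhile when the associated objects come from the secondary cache.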

  2. Lazy fetching : In this strategy, the associated object or collection is fetched lazily, i.e. only when required. For example, when you load a Department object, its associated employees are not loaded at all. They are loaded only when you access them. Accessing the association results in a new request to the database, but you control how much data is loaded and when. This helps reduce the database load because you fetch only the data that is required, which makes it a good default strategy. We will talk about this in much more detail in the next post. For the time being, let's just say this is the most commonly used and the default strategy for obvious reasons. You can define this strategy by setting lazy="true" or lazy="extra".
    example :

    <cfproperty name="employees" fieldtype="one-to-many" cfc="employee" fkcolumn="deptId" lazy="true" >

    pros : Only the minimum required data is loaded. This avoids loading the entire object graph into memory, and hence the performance is generally good.
    cons : If the association is always accessed after loading, this results in an extra SQL query. If the loaded object is accessed in another ORM session (i.e. it has become detached), extra care must be taken to avoid errors like 'LazyInitializationException' or 'NonUniqueObjectException'.

    When to use : When the association is not immediately read after loading the object. This is the most commonly used and default strategy.
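The load-on-first-access behaviour can be sketched with a toy Python class. This is only a model of the idea, not how Hibernate implements its proxies; the class, the loader callback and the fake query text are all hypothetical names introduced for the example.

```python
# Toy model of a lazily loaded collection (not Hibernate's implementation).

class Department:
    def __init__(self, dept_id, load_employees):
        self.dept_id = dept_id
        self._load_employees = load_employees  # hypothetical loader callback
        self._employees = None                 # collection not loaded yet

    def get_employees(self):
        # The simulated query runs only on first access;
        # later calls reuse the already loaded collection.
        if self._employees is None:
            self._employees = self._load_employees(self.dept_id)
        return self._employees


queries = []  # records every simulated SELECT


def fake_select_employees(dept_id):
    queries.append("select * from employee where deptId = %d" % dept_id)
    return ["emp-a", "emp-b"]


dept = Department(1001, fake_select_employees)
queries_after_load = len(queries)    # creating the department ran no employee query
dept.get_employees()
dept.get_employees()
queries_after_access = len(queries)  # only the first access ran a query

print(queries_after_load, queries_after_access)
```

In this model nothing is queried until get_employees() is called, and the second call does not query again.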

  3. Eager fetching : In this strategy, the associated object or collection is fetched together with the owning entity using a single SQL join query. This reduces the number of trips to the database and is a good optimization when you always access the associated object immediately after loading the owning entity. You can define this strategy by setting fetch="join" on the relationship property definition in the CFC.
    example :
    <cfproperty name="employees" fieldtype="one-to-many" cfc="employee" fkcolumn="deptId" fetch="join">

    With this strategy, when the department object is loaded, both the department and its employees are fetched from the database using a single join query.

    Even if eager fetching is not defined in the CFC metadata, it can be done at runtime using ORMExecuteQuery. This is very powerful when you want the association to be lazily loaded in most cases but need it loaded immediately in a few. In those cases, use a join fetch in the HQL and execute it using ORMExecuteQuery.

    Example :

    ORMExecuteQuery("from Department dept left join fetch dept.employees")
    ORMExecuteQuery("from Department dept left join fetch dept.employees where dept.id=1001")

    pros : The association is loaded immediately, so the associated object can be accessed even after the ORM session is closed. The association is loaded using a single join query, which is usually more efficient than executing multiple queries.
    cons : The association is loaded even if it is never needed. Since the query is a join, the result set returned by the database will typically contain a lot of repetitive data. If used for more than one collection of an entity, it produces a cartesian product of the collections' data and can therefore create a huge result set.

    When to use : When the association is almost always read after loading the object. It is more suitable for many-to-one and one-to-one associations, or for a single collection, where the associated objects can be loaded using a join query without much overhead.
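The cartesian product mentioned in the cons is easy to underestimate, so here is a rough Python calculation of how many rows a join query returns for one owning entity. It is simple arithmetic, not Hibernate code; the function name and the second collection (say, a hypothetical "projects" collection on Department) are assumptions made for the example.

```python
# Rough arithmetic, not Hibernate code: rows returned for ONE owning entity
# when each of its collections is fetched with a left outer join.

def join_rows(*collection_sizes):
    """Multiply the collection sizes; an empty collection still yields one row."""
    rows = 1
    for size in collection_sizes:
        rows *= max(size, 1)
    return rows


one_collection = join_rows(50)        # 50 employees
two_collections = join_rows(50, 20)   # 50 employees and 20 (hypothetical) projects

print(one_collection, two_collections)
```

With a single collection the department columns are repeated once per employee. With two collections the database returns 1000 rows to deliver only 70 associated objects, which is why join fetching more than one collection is usually avoided.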

  4. Batch fetching : This strategy tells Hibernate to optimize the second SQL select in immediate or lazy fetching so that a batch of objects or collections is loaded in a single query. This allows you to load a batch of proxied objects or uninitialized collections that are referenced in the current request. It is a blind-guess optimization technique, but it is very useful for loading nested trees.
    The concept of batch fetching is slightly confusing (at least I got confused when I first read about it), so pay careful attention here.
    Batch fetching is specified using the "batch-size" attribute on the CFC or on the relationship property. There are two ways you can tune batch fetching: on the CFC and on the collection.

    • Batch fetching at CFC level : This allows batch fetching of proxied objects and hence applies to one-to-one and many-to-one relationships. To give an example, consider the Employee-Department relationship where 25 employee instances are loaded in the request (ORM session). Each employee has a reference to its department, and the relationship is lazy. Therefore each employee object holds a proxy for its Department. If you now iterate through all the employees and call getDepartment() on each, by default up to 25 SELECT statements will be executed to retrieve the proxied departments, one for each Department proxy object. This can be batched by specifying the 'batch-size' attribute on the Department CFC like

      <cfcomponent table="Department" batch-size="10" …>

      When you call getDepartment() on the first employee object, Hibernate sees that departments should be batch fetched, and hence fetches up to 10 department objects that are proxied in the current request.
      So for 25 employee objects, Hibernate will execute at most three queries: in batches of 10, 10 and 5.
      Note that batch-size at the component level does not mean that whenever you load a Department object, 10 department objects get loaded into the session. It only means that if there are proxied Department instances in the session, up to 10 of those proxies will be loaded together.

    • Batch fetching at collections : This allows batch fetching of uninitialized value collections and one-to-many or many-to-many relationships. To give an example, consider the Department-Employee one-to-many relationship where 25 departments are loaded and each department has a lazy collection of employees. If you now iterate through the departments and call getEmployees() on each, by default 25 SELECT statements will be executed, one for each Department, to load its employee objects. This can be optimized by enabling batch fetching, which is done by specifying "batch-size" on the relationship property like

      In Department.cfc :

      <cfproperty name="employees" fieldtype="one-to-many" cfc="employee" fkcolumn="deptId" lazy="true" batch-size="10">

      One important thing to understand is that batch-size here does not mean that 10 employees will be loaded at a time for a department. It actually means that 10 employee collections (i.e. the employees of 10 department objects) will be loaded together.
      When you call getEmployees() on the first department, the employees of 9 other departments will also be fetched along with the one that was asked for.

    The value of the batch-size attribute should be chosen based on the expected number of proxied objects or uninitialized collections in the session.
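The batch arithmetic above (25 pending items with a batch size of 10 giving batches of 10, 10 and 5) can be checked with a short Python sketch. It is a simplified model rather than Hibernate's actual batching logic, which may split batches differently; the function name is invented for the example, and "pending" stands for either proxied objects or uninitialized collections.

```python
# Simplified model of batch fetching (an assumption, not Hibernate's code):
# split the pending proxies or uninitialized collections into batches.

def batch_sizes(pending, batch_size):
    """Return the number of items loaded by each successive batch query."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    remaining = pending
    while remaining > 0:
        current = min(batch_size, remaining)
        batches.append(current)
        remaining -= current
    return batches


print(batch_sizes(25, 10))  # [10, 10, 5]

# How the number of queries changes with the batch size for 25 pending items.
for size in (1, 5, 10, 25):
    print("batch size", size, "->", len(batch_sizes(25, size)), "queries")
```

A batch size of 1 is the unbatched case with 25 queries, while a batch size at or above the number of pending items loads everything in one query. This is why the expected number of proxies or collections in the session is the number to size against.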