Archive for category performance

Using CFFeed with URL sending compressed content

Today someone posted on the ColdFusion forum about a problem where cffeed could not handle a particular URL and threw an error. The URL he tried was http://movies.msn.com/rss/topcelebs and it failed with the error

Unable to read the source URL.
unknown compression method

This happens because the URL returns its response only in gzip-compressed format. When ColdFusion sent a request to this URL and asked for uncompressed data, it got nothing back and hence was unable to read the feed. A simple workaround is to use cfhttp to fetch the content, write it to a temporary file, and then use the cffeed tag to read that file. The important thing to keep in mind is to set an additional header in the cfhttp tag, using cfhttpparam, to indicate that compressed data is acceptable as well.

Here is the modified code where it first tries cffeed with the URL. If that fails, then it tries to use cfhttp to fetch the content and writes to a temporary file and then uses it in cffeed.

<cfset tempDir=GetTempDirectory()>
<cfset tempFile = GetTempFile(tempDir, "myfeed")>
<cfset tempFileName = GetFileInfo(tempFile).name>
 
<cftry>
    <cffeed action="read" source="http://movies.msn.com/rss/topcelebs" name="feedInStruct" >
<cfcatch type="any">
    <cfhttp url="http://movies.msn.com/rss/topcelebs" path="#tempDir#" file="#tempFileName#">
        <cfhttpparam type="header" name="Accept-Encoding" value="compress,gzip,deflate">
    </cfhttp>
    <cffeed action="read" source="#tempFile#" name="feedInStruct" >
</cfcatch>
</cftry>
<cfif FileExists(tempFile)>
    <cfset FileDelete(tempFile)>
</cfif>
 
<cfdump var="#feedInStruct#">


ColdFusion 8 : IsInstanceOf

If you use a lot of CFCs in your ColdFusion application, I am sure you have come across a situation where you need to know whether an object is an instance of a particular CFC. This is especially needed when you have components extending other components or you are passing objects around. ColdFusion 8 introduces a new function, IsInstanceOf, to do exactly that. It becomes even more useful now that we have interfaces in ColdFusion. And the icing on the cake is that it works even with Java objects, which means that you can use this function to find out whether a particular object is of a particular Java class type.

Here is how the function looks.

IsInstanceOf(object, typeName)

where typeName is the name of a component or interface, or a fully qualified Java class name.

It returns ‘true’ if

  • The object passed is an instance of a component which is the same as the specified type, inherits from it, or implements the specified interface. Just to be clear, a component ‘A’ inherits a component ‘B’ if A or any of its super components extends ‘B’. Similarly, a component ‘A’ implements an interface ‘B’ if A or any of its super components implements ‘B’, or if any interface that ‘A’ or its parents implement extends ‘B’.
  • The object passed is an instance of a Java class (created using cfobject or createObject for a Java class) which is the same as the specified class, inherits from the specified class, or implements the specified interface.

Here is an example

Intf.cfc

<cfinterface>
    <cffunction name = "foo">
    </cffunction>
</cfinterface>

Comp.cfc

<cfcomponent implements="Intf">
    <cffunction name = "foo">
        <cfoutput>In method foo</cfoutput>
    </cffunction>
</cfcomponent>

test.cfm

<cfset obj = CreateObject("Component", "Comp")>
<!--- Create a Java object --->
<cfset javaObj = CreateObject("java", "java.lang.StringBuffer")>
 
<cfoutput>object is of type Comp : #IsInstanceOf(obj, "Comp")#</cfoutput><br>
<cfoutput>object is of type Intf : #IsInstanceOf(obj, "Intf")#</cfoutput><br>
 
<cfoutput>java object is of type String : #IsInstanceOf(javaobj, "java.lang.String")#</cfoutput><br>
<cfoutput>java object is of type StringBuffer : #IsInstanceOf(javaobj, "java.lang.StringBuffer")#</cfoutput><br>
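The same check also follows the Java class hierarchy. Here is a minimal sketch, assuming the standard JDK classes, where java.lang.StringBuffer extends java.lang.Object and implements java.lang.CharSequence. The variable name sbObj is only for illustration, and init() is called so that the check runs against a constructed instance.

```cfml
<!--- init() calls the no-argument constructor of StringBuffer --->
<cfset sbObj = CreateObject("java", "java.lang.StringBuffer").init()>

<cfoutput>
    <!--- StringBuffer implements the CharSequence interface --->
    implements CharSequence : #IsInstanceOf(sbObj, "java.lang.CharSequence")#<br>
    <!--- every Java class inherits from java.lang.Object --->
    inherits Object : #IsInstanceOf(sbObj, "java.lang.Object")#<br>
</cfoutput>
```

Going by the rules listed above, both checks should report a match, because the second argument may name a parent class or an implemented interface and not just the exact class.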


ColdFusion 8 performance whitepaper

I am glad to see the ColdFusion community so excited about CF 8 performance. We got a very encouraging response regarding performance in our pre-release forums and we are thankful to everyone who gave such invaluable feedback. Though people started seeing very good performance gains in their applications on ColdFusion 8, we never talked about the performance benchmark numbers until CFUnited 2007, where Ben showed the performance numbers for ColdFusion 8. No wonder this is the #1 reason in the Top 8 reasons why you want ColdFusion 8 :-).

Extracting performance to this extent was not an easy task. We analyzed nearly 2.4 million lines of real-life CF application code to zero in on the most commonly used tags and functions. The main challenge came after that: analyze the generated Java code for all the tags, change the compiler to generate more optimized code, run it through a profiler, and optimize the CF engine for each of the tags and functions and their various combinations. And this went on and on through many, many iterations. Overall it was real fun :-)

I still remember the most exciting moment, when I ran the load test for CFC after I had made some code changes and the result was freaking unbelievable. It looked too good to be true, and I was literally running around with the code changes to run them on other machines and verify the result. That one small change gave nearly a 6x gain :-) . It is not that the CF7 code was inefficient or poorly written. It was only a matter of squeezing out as much as possible and adding some intelligence.

Check out the ColdFusion 8 performance whitepaper, which talks in much detail about the performance numbers for different areas, the methodology used for benchmarking, etc. Manju Kiran, who was my QA buddy for most of the features I worked on, did a tremendous job in setting up and running the benchmark and creating the meat of this wonderful document (and of course keeping me on my toes).


New File I/O in ColdFusion 8

Till now we have been using <cffile> for all kinds of file operations, and it does a very good job. If you want to read a file, you give the file to this tag and it gives you the content. If you want to write content to a file, you give the content and file name to this tag and it does that. If you want to copy, delete, or move your files, this tag does all of that. All very simple and short. But there are two particular issues which <cffile> does not address.

1. Reading/writing big files: Since <cffile> is a tag, it can only perform one-shot operations. To read, it has to read everything in one shot, and to write, you have to provide the entire content, which means that <cffile> has to keep the entire content in memory. This is not much of a concern if the file is just a few KB, but as the size grows beyond 100 KB or reaches a few MB, it can really hurt. It creates a memory crunch on the server, and if the load is high and many reads/writes of large files happen simultaneously, it can even lead to an OutOfMemory error on the server. Apart from the memory crunch, it also slows down the server because the VM needs to allocate/deallocate larger chunks of memory, which leads to longer and more frequent garbage collection cycles. At this point you might ask, why would I ever read or write such a big file? Well, I can think of a few reasons:

  • You need to process data that comes in a flat file
  • CSV parsing
  • Finding the MIME type of a file such as an mp3, image, or video
  • You want to create a log viewer
  • … many more

You get the idea, right?

2. Again, since <cffile> is a tag, it is not very easy to use inside cfscript. Either you have to step out of cfscript to use the tag, or you wrap the tag in a function and call that function. That is true of all tags, but cffile is so commonly used that this looks like a limitation.

The new File I/O introduced in ColdFusion 8 addresses both these problems. It is entirely based on functions, and that automatically takes care of problem 2: you no longer need to use cffile when you are inside cfscript. I will give more details on handling problem 2 in my next post. In this post I will mainly focus on reading/writing files in chunks using the new I/O.

The new I/O is based on the same philosophy that is used in other languages:

  1. You first open a file
  2. Perform read/write operations on it
  3. and close the file.

Let's look at each of the steps in a little detail.

Step 1 : Open a file : Here is the function to open a file

FileOpen(filepath [,mode] [,charset]) -> fileobject

Both mode and charset here are optional. Mode can be “read”, “readBinary”, “write” or “append”.

“read” mode, which is the default, is used to read a text file, and hence any read operation will give you text data from it. When the file is a text file, you can also optionally specify the charset of the file. So if the file contains UTF-8 or UTF-16 characters (or characters from any other charset), you need to specify it while opening the file.

“readBinary” is used to read a binary file, and hence any read operation will give you binary data, i.e. a byte array.

“write” mode opens the file for writing, which means that if the file already exists, it will be overwritten.

“append” mode, as the name suggests, opens the file for appending, which means that any write operation on that file object writes at the end of the file.

The FileOpen function returns a handle to the native file, and you need to use this handle for all further read/write operations. Of course, keep in mind that you cannot perform a “read” operation on a file handle that was opened in “write” mode, and vice versa.
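As a minimal sketch of the different ways to call FileOpen, assuming hypothetical file paths and a text file that was saved as UTF-8 (the handles are closed with FileClose, which is covered in Step 3):

```cfml
<cfscript>
    // Hypothetical paths, used only for illustration.

    // Text file in the default "read" mode, with the charset given explicitly
    textFile = FileOpen("c:\temp\notes.txt", "read", "UTF-8");

    // Binary file: read operations on this handle give a byte array
    binFile = FileOpen("c:\temp\song.mp3", "readBinary");

    // ... read from the handles here ...

    // Release both handles when done
    FileClose(textFile);
    FileClose(binFile);
</cfscript>
```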

Step 2 : Do Read/Write operations : Once you get the handle to the file object, you can perform multiple read/write operations using this handle. There are several functions to do that.

2A. Read Operation :

i) FileRead(fileobj, number of characters/bytes to read) : This gives you a way to read one chunk of data (say 1 KB) from the file at a time. Since you only read a chunk of data at a time, it does not create a memory crunch on the server. Since this is a read operation, the file must have been opened in “read” or “readBinary” mode. Depending on the mode in which the file was opened, this function returns the text or binary data read. One thing to note here: if the data remaining is less than the requested size, this function returns only the remaining data, i.e. if 100 characters remain in the file being read and you request 1000 characters, it returns 100 characters only.

ii) FileReadLine(fileobject) – This reads one line from the text file. To call this method, the file must have been opened in “read” mode.

Both these read operations can be called multiple times until you reach the end of the file. Once the end of the file has been reached, any further read call results in an “EndOfFile” error. So in order to avoid this error, you should always check whether you have reached the end of the file. The function to do that is

FileIsEOF(fileobj) : Just to be more clear, EOF here stands for “End of File”. This function will return true if the end of file has been reached otherwise will return false.

Here are a few examples of reading content from a file.

Read 1 KB of binary data at a time

<cfscript>
    myfile = FileOpen("c:\temp\song.mp3", "readbinary");
    while (! FileIsEOF(myfile)) { // continue the loop if the end of file has not reached
       x = FileRead(myfile, 1024); // read 1 kb binary data
       // ... process this binary data ...
    }
</cfscript>

Process a text file line by line

<cfscript>
    myfile = FileOpen("c:\temp\myfile.txt", "read");
    while (! FileIsEOF(myfile)) { // continue the loop if the end of file has not reached
        x = FileReadLine(myfile); // read a line
        // ... process this line ...
    }
</cfscript>

2B. Write operation

i) FileWrite(fileobject, content) – This will add the text or binary content to the file. The file must have been opened in “write” or “append” mode.

ii) FileWriteLine(fileobject, text) – This will add the text followed by a new line character to the file. Here again, the file must have been opened in “write” or “append” mode.

You might wonder: if both write operations add content to the file, what is the difference between “write” and “append” mode? The difference is only at the time of opening the file. As I said earlier, opening the file in “write” mode overwrites the file if it already exists and puts the file pointer at the beginning of the file, whereas opening the file in “append” mode simply puts the file pointer at the end of the file.

Any subsequent “write” calls, irrespective of “write” or “append” mode, will append the content to the file.
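The difference between the two modes can be sketched as follows. This is a minimal illustration that assumes a hypothetical file c:\temp\demo.log, and it uses FileClose (described in Step 3) to release each handle.

```cfml
<cfscript>
    // Hypothetical path, used only for illustration.
    logPath = "c:\temp\demo.log";

    // "write" mode: any existing content of the file is discarded on open
    f = FileOpen(logPath, "write");
    FileWriteLine(f, "first");
    FileWriteLine(f, "second"); // lands after "first", not on top of it
    FileClose(f);

    // "append" mode: the two lines written above are kept
    f = FileOpen(logPath, "append");
    FileWriteLine(f, "third");
    FileClose(f);
</cfscript>
```

After this runs, the file is expected to hold the three lines in the order they were written. Re-running the script starts again from an empty file because of the "write" open at the top.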

Here is an example of writing content to a file. This reads one line at a time from an input file, does some processing on it, and writes the resulting data to another file.

<cfscript>
    infile = FileOpen("c:\temp\input.txt", "read");
    outfile = FileOpen("C:\temp\result.txt", "write");
    while (! FileIsEOF(infile)) { // continue the loop if the end of file has not reached
        x = FileReadLine(infile); // read a line
        data = processLine(x);
        FileWriteLine(outfile, data);
    }
</cfscript>

Step 3 : Close the file : Once you are done with the read/write operations, you *must* close the handle to the file. The function to do that is

FileClose(fileobj)

What if you don’t close the file object? Well, the file remains locked by the server as long as it is open, and no other process can modify, rename, or delete it.
You might also ask, why doesn’t ColdFusion automatically take care of closing the file? Why should the developer be bothered about it? Well, ColdFusion does take care of it when the file object goes out of scope and is not kept in any accessible scope, but you can never be certain when exactly this will happen. It might happen immediately or it might happen hours later :-) .
Bottom line: you should make it a practice to call FileClose() once you are done with the file object.
Just to show its usage, I will complete the example I used for write.

<cfscript>
    infile = FileOpen("c:\temp\input.txt", "read");
    outfile = FileOpen("C:\temp\result.txt", "write");
    while (! FileIsEOF(infile)) { // continue the loop if the end of file has not reached
        x = FileReadLine(infile); // read a line
        data = processLine(x);
        FileWriteLine(outfile, data);
    }
    FileClose(infile);
    FileClose(outfile);
</cfscript>
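If an error is thrown while the file is being processed, the FileClose() calls at the end of the script are never reached and the file stays locked. One way to guard against that, shown here only as a sketch with tag-based error handling and a hypothetical input path, is to close the handle in the catch block as well and then rethrow the original error.

```cfml
<!--- Hypothetical path, used only for illustration --->
<cfset infile = FileOpen("c:\temp\input.txt", "read")>
<cftry>
    <cfloop condition="NOT FileIsEOF(infile)">
        <cfset line = FileReadLine(infile)>
        <!--- process the line here --->
    </cfloop>
    <!--- normal path: release the handle once reading is finished --->
    <cfset FileClose(infile)>
<cfcatch type="any">
    <!--- error path: release the handle, then pass the original error on --->
    <cfset FileClose(infile)>
    <cfrethrow>
</cfcatch>
</cftry>
```

FileOpen is deliberately kept outside the cftry so that the catch block never tries to close a handle that was not created.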

This set of functions helps greatly when you need to work with a file larger than about 10 KB.

Apart from this set of functions, ColdFusion 8 also adds a new language construct to read text files. With ColdFusion 8, you can use <cfloop> to iterate over “lines” or “characters” in a text file. This makes it very easy and convenient to do any kind of text file parsing or processing in your application. Let's take a look at the new syntax of cfloop for reading a file (and I really love this syntax :-) ).

New attributes in cfloop for reading a file :

“file” – path of the file to read
“characters” – number of characters to read in one iteration.

  1. Reading Lines : Below is the simplest syntax to read one line at a time from the file in a loop. This would read the entire file and the loop would end when the file has been completely read. The read content will be available in the index variable specified.

    <cfloop file="c:\temp\myfile.txt" index="line">
        <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    With cfloop, you can also iterate over a part of the file by specifying “from” and “to” values.
    Here is an example to loop over lines between 10 and 20.

    <cfloop file="c:\temp\myfile.txt" index="line" from=10 to=20>
        <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    “from” and “to” are both optional attributes: “from” defaults to ‘1’, i.e. the start of the file, and “to” defaults to the last line of the file.

    So to read the first 10 lines from the file, you can use

    <cfloop file="c:\temp\myfile.txt" index="line" to="10">
        <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    One word of caution here: if you use the “to” attribute, its value must not exceed the number of lines in the file, otherwise you will get an “EndOfFile” error. For example, if I had only 5 lines in my file and the value of “to” is 7, this would throw an error because lines 6 and 7 do not exist.

  2. Reading characters : For reading characters instead of lines, you need to provide a value for the “characters” attribute, and that many characters will be read in each iteration. The loop ends automatically when the end of the file has been reached. The content read will be available in the index variable specified.

    An example for that is

    <cfloop file="c:\temp\myfile.txt" index="chars" characters="1000">
        <cfset x=chars>
        <!--- do whatever with the characters --->
    </cfloop>

    One important thing to note here: in the last iteration, when the end of the file has been reached, the index variable will only have the remaining characters. For example, if I have 130 characters in the file and I run the loop to read chunks of 20 characters, in the last iteration the index variable will only hold the last 10 characters.

This completes the first part of the new File I/O, which mainly addresses the problem of working with larger files. However, this does not mean that you cannot or should not use these for smaller files. You can very much use them for all kinds of files. They are very simple to use and perform really well. Go ahead and play around with them!

Performance Tips : ColdFusion List

How many of you have written or seen code like this?

<cfset mylist="jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec">
<cfloop from="1" to="#ListLen(mylist)#" index="i">
 <cfset month = ListGetAt(mylist, i)>
 <!--- do something with this month --->
 <cfoutput>#month#</cfoutput>
</cfloop>

While there is nothing wrong with it syntactically or functionally, performance-wise it is very poor. Why? A ColdFusion list is nothing but a String (delimited by a delimiter). ColdFusion has no way to keep it in any other data structure, because you can also use it like a normal string. So what happens when you call any List function on this string? We parse the string using the delimiter, get the delimited tokens, and process those.
Now let's take the ListGetAt(list, index) function. It keeps parsing and fetching tokens until it reaches the required index. Imagine doing that in a loop. We parse the same string again and again, traversing from the beginning every time until we reach the current loop index. So in the Nth iteration, it starts from the beginning and tokenizes N times. Thus, by the time you have completed a loop over N elements, you have tokenized N*(N+1)/2 times. Isn’t that too costly? Lesson: never ever use ListGetAt() in a loop. Either iterate using <cfloop list="..."> or convert the list into an array using ListToArray() and iterate over that. Using cfloop is the most optimized way to do this.
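Both alternatives can be sketched as follows. This is a minimal illustration using a list of month names; the variable names months and month are only for illustration.

```cfml
<cfset mylist = "jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec">

<!--- Option 1: let cfloop walk the list itself --->
<cfloop list="#mylist#" index="month">
    <cfoutput>#month#</cfoutput>
</cfloop>

<!--- Option 2: tokenize once into an array, then index into the array --->
<cfset months = ListToArray(mylist)>
<cfloop from="1" to="#ArrayLen(months)#" index="i">
    <cfoutput>#months[i]#</cfoutput>
</cfloop>
```

In both cases the string is tokenized once, instead of once per iteration as with ListGetAt().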

Even if you are not iterating over the list but need to call ListGetAt() many times, it is better to convert it to an array and then look up the index in that.

The same applies to search functions like ListFind, ListContains, etc. If you need to call these multiple times on the same list, you will be better off converting the list to an array and searching in that.

If you need to append many items to the list, you will also get better performance by converting the list to an array and doing all the appends on that.

This does not mean you should not use lists at all, or that you should always convert a list to an array and work on that. If the number of operations you are doing on the list is small, you should stick with the list, because converting the list to an array is also costly. If you are inserting an element in the middle of a list, the list will be better than an array in most cases.

String Concatenation optimization

String concatenation is one of the most common but also one of the more expensive operations. It can hit performance severely if not used correctly. Performance goes down drastically if you append strings using ‘&’ or ListAppend() in a loop. I have seen application performance improve by 50-100% just by optimizing string concatenation (though that depends on how much concatenation is used in the app). So what do you do about it?
The simplest and most optimized way to do these append operations is to use Java’s StringBuffer. (I am sure you must be aware of it but still.. :) ) .
The code would look like

<cfset sb = createObject("java", "java.lang.StringBuffer")>
<cfloop from=1 to=100 index=i>
  <cfset sb.append("something")>
  <cfset sb.append(i)>
</cfloop>
<cfset result=sb.toString()>

Sometimes I feel that we should have a data structure like this in ColdFusion directly, but then again, what is wrong with using StringBuffer? It is like any other function that we would create. Isn’t it so?

If you are a purist and don’t want to use any Java API inside your CF app, there is another simple way to do the same thing. It uses a ColdFusion array to do what StringBuffer does. Instead of appending the string to the buffer, you append to the array using ArrayAppend(), and once you are done and want to get the string back, you use ArrayToList() with an empty string ("") as the delimiter. The code would look like

<cfset arr = ArrayNew(1)>
<cfloop from=1 to=100 index=i>
  <cfset ArrayAppend(arr, "something")>
  <cfset ArrayAppend(arr, i)>
</cfloop>
<cfset result=ArrayToList(arr,"")>

This gives much better performance than concatenation using ‘&’ or ListAppend(), but lower performance than StringBuffer. That is because of the overhead of creating the Array object and of the array append operations; ArrayToList() will in any case create a string buffer and append the strings to it.

You should use ‘&’ or ListAppend() only when there are only 2-3 strings to be concatenated. Otherwise always use either of the two techniques above.