Many applications run into performance problems when they move through the testing phase or reach production. If we look at the causes behind performance problems, a significant set of them is caused by poor data handling, and these applications need data handling best practices. Data handling is especially critical for applications that process huge amounts of data. Here are a few practical data handling best practices and tips for better Java application performance.
Reduce Amount of Data Transferred
In any Java application, a method call is made either to get something done on the callee side, or to have data processed by passing input data. Both operations involve data exchange between the caller and the processing method. As a rule of thumb, minimize the amount of data transferred to and from a method call. Less data brings many benefits: less processing, fewer objects to clean up, a smaller memory footprint, and so on. The program design should aim to reduce the amount of data transferred across methods, layers, applications, and even organizations. This can be achieved by exploring data-source-side processing, explained below.
Lazy Loading
This means delaying data fetching from the data store until the moment the data is actually needed. It is very beneficial for heavy objects. For example, consider a file entity in the database that has the file contents as a blob attribute along with other file characteristics. This blob can range from a few kilobytes to hundreds of megabytes. The middle-tier logic depends only on attributes other than this heavy blob attribute until it finally has to show the file contents. Lazy loading can be used to delay loading of the blob attribute until then.
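A minimal sketch of the idea, with hypothetical names: the lightweight attributes are loaded eagerly, while the heavy blob is fetched from the store only on first access.

```java
import java.util.function.Supplier;

public class FileEntity {
    private final String fileName;             // lightweight attribute, loaded eagerly
    private final Supplier<byte[]> blobLoader; // deferred fetch of the heavy contents
    private byte[] contents;                   // cached after the first load

    public FileEntity(String fileName, Supplier<byte[]> blobLoader) {
        this.fileName = fileName;
        this.blobLoader = blobLoader;
    }

    public String getFileName() { return fileName; }

    public byte[] getContents() {
        if (contents == null) {                // fetch only when actually needed
            contents = blobLoader.get();
        }
        return contents;
    }
}
```

In a real application the `Supplier` would wrap the database read of the blob column; ORM frameworks offer the same behavior declaratively, as noted later in this article.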
Repeated/Duplicate Data Calls
Repeatedly calling the data provider to get data can hamper performance considerably when each call is a remote call of some type, e.g. a database call, a web service call, or any call involving marshalling and unmarshalling of data. Use the Façade pattern instead: pull all required data in one call and minimize both the connection cost and the cost of data transfer over the network.
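A sketch of the Façade idea, with all names hypothetical: instead of triggering one remote call per id from a loop, the façade method pulls every required row in a single round trip. The map stands in for a remote data provider.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class CustomerFacade {
    private final Map<Integer, String> remoteStore; // stands in for a remote provider
    int remoteCalls = 0;                            // counted for illustration only

    public CustomerFacade(Map<Integer, String> remoteStore) {
        this.remoteStore = remoteStore;
    }

    // anti-pattern: callers loop over ids and pay the connection cost each time
    public String findOne(int id) {
        remoteCalls++;
        return remoteStore.get(id);
    }

    // façade method: one call fetches everything the caller needs
    public Map<Integer, String> findAll(Collection<Integer> ids) {
        remoteCalls++;                              // a single round trip
        Map<Integer, String> result = new HashMap<>();
        for (int id : ids) result.put(id, remoteStore.get(id));
        return result;
    }
}
```

With a real database the `findAll` call would translate to one query with an `IN` clause or a join, rather than N separate queries.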
Data Caching
Frequently used but infrequently changing data can be cached. Mostly it is static/master data that is cached, though nowadays applications requiring very high performance cache transactional data as well. As a simple rule, while designing the application, identify entities that will be needed frequently but change rarely or not at all, and cache those at an appropriate location. Refresh logic can also be employed for such entities.
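A minimal sketch of caching rarely changing master data with periodic refresh; the class name, loader, and refresh interval are all hypothetical. The loader stands in for the expensive source query.

```java
import java.util.Map;
import java.util.function.Supplier;

public class MasterDataCache {
    private final Supplier<Map<String, String>> loader; // e.g. a database query
    private final long refreshMillis;                   // refresh interval
    private Map<String, String> data;
    private long loadedAt;
    int loads = 0;                                      // counted for illustration only

    public MasterDataCache(Supplier<Map<String, String>> loader, long refreshMillis) {
        this.loader = loader;
        this.refreshMillis = refreshMillis;
    }

    public synchronized String get(String key) {
        long now = System.currentTimeMillis();
        if (data == null || now - loadedAt >= refreshMillis) {
            data = loader.get();                        // (re)load from the source
            loadedAt = now;
            loads++;
        }
        return data.get(key);                           // served from memory otherwise
    }
}
```

Production systems would typically reach for a ready-made cache (see the framework caching point later in this article) rather than hand-rolling one, but the trade-off being made is the same.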
Data Source Side Data Processing
It is good to process data at its source or storage location itself. Transferring a large amount of data to the client and then processing it involves the cost of transfer, and sometimes a client-specific reimplementation of the processing logic. For example, filtering data in Oracle can be as simple as adding a WHERE clause to the SELECT query, while fetching all the records and filtering them in a Java program requires row-by-row traversal and individual attribute comparison. That Java implementation may not be very optimal.
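A contrast sketch of the same filter expressed at the source versus in the middle tier; the table, column names, and row layout are hypothetical. Each `double[]` below stands in for a fetched row of `{id, amount}`.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SourceSideFilter {
    // Preferred: the database does the filtering, so only matching rows
    // ever travel over the network.
    static final String SOURCE_SIDE_SQL =
        "SELECT id, amount FROM orders WHERE amount > ?";

    // Client-side alternative: every row is transferred first, then filtered
    // in Java row by row -- exactly the cost described above.
    public static List<double[]> filterInJava(List<double[]> allRows, double minAmount) {
        return allRows.stream()
                      .filter(row -> row[1] > minAmount) // per-row attribute comparison
                      .collect(Collectors.toList());
    }
}
```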
Minimum Data Transformation and Conversion/ Incorrect Data Types
Converting data from one format to another involves a conversion cost. For a single value the performance hit may be tiny, but multiplied across thousands of records it becomes considerable. An example is selecting a value as a String and converting it to a double or another primitive, and vice versa.
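A small illustration of the point, with a hypothetical scenario: when a numeric column is read as a String, every record pays for a parse; reading it in the correct type avoids that work entirely.

```java
public class ConversionCost {
    // conversion path: one Double.parseDouble per record
    public static double sumFromStrings(String[] records) {
        double total = 0;
        for (String r : records) {
            total += Double.parseDouble(r); // repeated per-record conversion cost
        }
        return total;
    }

    // no-conversion path: data already held in the correct primitive type
    public static double sumFromDoubles(double[] records) {
        double total = 0;
        for (double r : records) total += r;
        return total;
    }
}
```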
Correct Collection to Hold Data
This is a very important consideration from a performance point of view. Java provides different collections to fulfil different needs. For example, there are raw collections such as ArrayList, where you can keep adding data without the collection performing any extra operation on it, and there is Vector, which is synchronized. It is good to ask: do I need synchronized data insertion into my collection? If not, don't go for Vector; try ArrayList or another collection depending on your other requirements.
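The decision above can be sketched as a pair of factory methods: pay for synchronization only when the list is genuinely shared across threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CollectionChoice {
    // single-threaded (or externally synchronized) use: no locking overhead
    public static List<String> forSingleThread() {
        return new ArrayList<>();
    }

    // shared across threads: a modern alternative to Vector is wrapping a
    // plain list only when synchronization is actually needed
    public static List<String> forMultiThread() {
        return Collections.synchronizedList(new ArrayList<>());
    }
}
```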
Sub Optimal Data Processing Algorithms
Sometimes it is the data processing algorithm or its implementation that has the performance problem. The implementation should have performance as one of its goals, along with others such as memory consumption. A few things that make algorithms sub-optimal are:
- Expensive calls, such as database calls, from inside loops
- Object declaration inside loops
- Unnecessary nested loops
- Storing the same objects in multiple collections
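The first two points above can be sketched together; the names are hypothetical, and the static method stands in for an expensive remote or database lookup. Hoisting it, and the reusable objects, out of the loop means the cost is paid once rather than once per element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LoopHygiene {
    static int lookupCount = 0;                   // counted for illustration only

    // stands in for an expensive remote/database lookup
    static double fetchExchangeRate() {
        lookupCount++;
        return 2.0;
    }

    public static List<String> formatPrices(double[] prices) {
        double rate = fetchExchangeRate();        // hoisted: one call, not one per element
        List<String> out = new ArrayList<>(prices.length); // sized and created once
        for (double p : prices) {
            out.add(String.format(Locale.ROOT, "%.2f", p * rate));
        }
        return out;
    }
}
```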
Release Heavy Data Objects
We cannot guarantee immediate garbage collection, but setting references to heavy objects to null as soon as they are no longer needed is a good practice, as it makes them eligible for collection sooner. Note that relying on finalize() for this is discouraged: finalizers can actually delay reclamation.
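A minimal sketch of the practice: dropping the reference to a heavy object as soon as its useful work is done, so the garbage collector is free to reclaim it before the method even returns.

```java
public class HeavyObjectRelease {
    public static int processReport() {
        byte[] reportData = new byte[1024 * 1024]; // stands in for a heavy object
        int size = reportData.length;              // ... use the data ...
        reportData = null;                         // eligible for GC from here on,
                                                   // not only when the method exits
        return size;
    }
}
```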
Utilize Technology Features for Optimal Data Processing
Many technologies provide features to help with data processing, including ready-made implementations of some of the points above. Below are a few examples:
– Prepared statement support in JDBC drivers such as Oracle's
– Framework caching, e.g. the Hibernate first-level and second-level cache
– Lazy loading support in the Hibernate framework
Data Serialization and Deserialization
Design to avoid these expensive operations where possible. If that is not possible, minimize the amount of data that is serialized. For example, in web applications, minimize the amount of data being put in the session.
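One way to minimize serialized data, sketched with a hypothetical class: marking heavy fields that can be reconstructed later as `transient` keeps them out of the serialized form entirely, shrinking what is written to the session or sent over the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class UserSession implements Serializable {
    private final String userId;            // small, serialized normally
    private transient byte[] cachedReport;  // heavy, skipped by serialization

    public UserSession(String userId, byte[] cachedReport) {
        this.userId = userId;
        this.cachedReport = cachedReport;
    }

    public String getUserId() { return userId; }
    public byte[] getCachedReport() { return cachedReport; }

    // helper for illustration: serialize and deserialize, returning the copy
    public static UserSession roundTrip(UserSession s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(s);
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (UserSession) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new AssertionError(e);
        }
    }
}
```

After deserialization the transient field comes back as null, so it must be cheap to rebuild on demand; that is the trade-off being made.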
Parallel Data Processing
If a huge amount of data needs to be processed, unrelated pieces of data can be processed in parallel. This reduces the total processing time.
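A sketch of parallel processing of independent elements using parallel streams (Java 8+); the per-element squaring is a stand-in for real, heavier processing, and parallelism only pays off when that work is substantial.

```java
import java.util.stream.IntStream;

public class ParallelProcessing {
    public static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()              // elements are unrelated, safe to split
                        .mapToLong(i -> (long) i * i)
                        .sum();                  // results combined at the end
    }
}
```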
Object Cloning
Instead of creating heavy objects from scratch, one can clone an existing object and modify only the required attributes, reusing whatever is possible. The amount of reuse can be controlled by choosing shallow or deep cloning.
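A sketch of the idea with a hypothetical class: the clone reuses everything from the original and only the required attribute is changed. This is a shallow copy; primitive and immutable fields are safe to share, while mutable fields would need a deep copy.

```java
public class ReportTemplate implements Cloneable {
    String title;
    int pageCount;

    public ReportTemplate(String title, int pageCount) {
        this.title = title;
        this.pageCount = pageCount;
    }

    @Override
    public ReportTemplate clone() {
        try {
            return (ReportTemplate) super.clone(); // shallow copy of all fields
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);           // cannot happen: we are Cloneable
        }
    }
}
```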