Sunday, February 15, 2009

How to Design Software for Flexibility, Reusability and Scalability without losing KISS principles!

Looking at current developments and blogs, modularization is currently one of the key issues in the Java community. OSGi and JSR294 are just two of the more prominent candidates that try to cover this topic. Both of them have considerable drawbacks - I think I've shown enough of OSGi's drawbacks in my previous articles ... JSR294 is not yet available and seems to lack the very important Java "write once, run everywhere" principle (http://www.infoq.com/articles/java7-module-system).

It's not for nothing that the topic is rushing through the community. Modularization definitely is one of the key issues of successful software. Why should you wait for either OSGi being usable or Java 7 being released? Chances are good that you cannot use either of them anyway due to your project's constraints. This article shows some architectural guidelines for building (micro) service-oriented applications with a very high degree of modularization, flexibility and reusability. Additionally, it shows you how you can assemble such an application, unit- and integration-test it easily, and how to automatically force your team members (and yourself!) to not break the modularization rules. Actually, you can have almost all of the superb concepts of efforts such as OSGi or the upcoming Java modularization concepts almost for free - Good news - Isn't it? :o)

BTW: Along with this article I've prepared some sources to get you started with the concepts - The sources can be downloaded here.

Up front, let me show you what a highly modular application looks like and why we want such a high degree of modularity. As an example, let's think about a blogging software (just so we don't see yet another threadbare pet shop). Such a system could, at a bare minimum, be built as a simple web application using some fancy libraries like Hibernate to store and load its data. If you think a bit more SOA, then you may split the application into two distinct parts, where the actual blogging services (e.g. finding and storing blogs, authenticating users, etc...) are implemented in a separate module that may be accessed through some kind of remote interface - giving you the ability to integrate with other applications. By splitting the application into two distinct parts we get a bit more than just interoperability! First, we have two distinct modules, one with a good potential for reuse. Second, since the two parts are autonomous, errors are local to each module, and thus the area you have to analyze when something goes wrong is much smaller. And third, it's much easier to test the two modules because both have well-defined interfaces at their borders - it's now e.g. pretty easy to test the user interface by just mocking or stubbing the service layer. An additional gain is the chance to scale at module borders. It is important to note that scalability does not automatically come with modularity! However, in our example chances are pretty good that we can scale our service layer by duplicating it onto different machines if necessary. This kind of coarse-grained modularity is good - and the good news is that by bringing these modularization concepts down to a more fine-grained scale we can gain many of the same benefits at that level!

Let's imagine what such a fine-grained modularization could look like. A first sketch of our service layer could show the following modules:
  • Data Access (DA): Reads data from and stores data to some kind of data store
  • Blog Editing (BE): Services for blog editors
  • Blog Browsing (BB): Provides blog data to applications (find blogs, get blog data, abstracts, etc...)
  • Full Text Search (FTS): Allows full text searches in blogs
  • Full Text Indexing (FTI): Indexes blogs for full text search

The first three modules are pretty obvious and have some kind of coupling (mostly just logical - coupling at the source code level only exists because BE and BB depend on DA). The same applies to full text indexing and searching. Besides getting all the benefits noted above, things get interesting when we think of assembling the application. We now have many possibilities to assemble applications. Our first approach will definitely be to build one application out of all parts. A couple of months later, when our blogger software gets popular, we need to scale... Whew! We now recognize that we cannot just duplicate our application, because our full text indexing software allows concurrent reads but only one application is allowed to write at a time. With the concepts described in this article we can simply reassemble our application (without changing our application or the modules!), e.g. into the following scenario:
  • Machine 1: BB + DA + FTS
  • Machine 2: BB + DA + FTS
  • Machine 3: BE + DA + FTI
Pretty cool - eh ?

Ok - enough of the introduction. Let's dive into the details. The most important thing is to understand what a module consists of and how it is built. In Java a module maps to a JAR file - or to a project if you like to think in build artifacts. It is implemented, tested and maintained autonomously. A module is self-contained. It knows how (but not when) to start and shut down by itself. It provides some services but hides the implementation details of its services from the outside world. Here's a simplistic example of the data access module (don't do it that way - it's just to understand the concept):

public interface BlogStorageService {
    BlogEntry findById(long id);
    void store(BlogEntry entry);
}

public class DataAccessModule {

    private BlogStorageService blogStorageService;

    public void start() {
        blogStorageService = new BlogStorageServiceImpl();
    }

    public BlogStorageService getBlogStorageService() {
        return blogStorageService;
    }
}
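Just to make the idea concrete, here's how an application could assemble and start such hand-written modules itself (a hypothetical sketch - this is exactly the glue work we'll delegate to an IOC container in a minute):

// Hand-wired assembly: the application decides when the module starts and
// passes published services across module borders explicitly.
DataAccessModule dataAccess = new DataAccessModule();
dataAccess.start();

BlogStorageService storage = dataAccess.getBlogStorageService();
BlogEntry entry = storage.findById(42L);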

The service interfaces (BlogStorageService) and data carriers (BlogEntry), as well as any other types to be used with your services, build your contract for the outside world. The module implementation itself (DataAccessModule) is the "glue code" that starts up your module. In reality, the glue code building up your module is far from trivial! It's easy to create your service implementations - but there are many other dependencies that need to be satisfied. Our BlogStorageServiceImpl will definitely need access to some kind of data source, configuration facilities, etc. Additionally, inter-module dependencies need to be resolved (e.g. Blog Browsing needs Data Access) - this leads to startup issues (what is started first, who has to wait for whom). Unless you are insane you will definitely delegate this task to some kind of IOC container. Although Spring is currently for sure the most popular IOC container, I prefer to use Tapestry IOC for this article (and for my current project) because it is extremely easy to use and addresses all of the above mentioned issues (cross-module injection, inter-module dependencies, synchronization issues, configuration and more). Here's our Tapestry IOC based Data Access Module (visit http://tapestry.apache.org/tapestry5/tapestry-ioc/ if you cannot imagine what the following code does...):


public class BlogStorageServiceImpl implements BlogStorageService {
    public BlogStorageServiceImpl(DataSource dataSource) { ... }
}

public class DataAccessModule {
    public static void bind(ServiceBinder binder) {
        binder.bind(BlogStorageService.class, BlogStorageServiceImpl.class);
    }
}


Tapestry IOC will take care of creating our service and will inject the required DataSource into the BlogStorageServiceImpl constructor. But wait - where does the DataSource come from? Maybe the next example will give you a bit more insight. Here's an example out of the Blog Browsing module:


public interface BlogBrowser {
    List<BlogEntry> findBlogsByUser(String userName);
}

public class BlogBrowserImpl implements BlogBrowser {
    public BlogBrowserImpl(BlogStorageService blogStorageService) { ... }
}

public class BlogBrowsingModule {
    public static void bind(ServiceBinder binder) {
        binder.bind(BlogBrowser.class, BlogBrowserImpl.class);
    }
}


BlogBrowserImpl requires an instance of BlogStorageService to work correctly. Tapestry IOC will take care of resolving this instance for us and will automatically pull it from DataAccessModule once that module is started. You just need to tell Tapestry IOC which modules your application consists of (you can e.g. do this by using a JAR manifest or - more practically - by using Tapestry's @SubModule annotation, which marks a module as requiring another one), as sketched below.
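Here's a minimal sketch of how that wiring could look. The @SubModule declaration and the ServiceBinder usage follow standard Tapestry IOC; the bootstrap snippet is my assumption of a typical setup, not code from this article's sources:

import org.apache.tapestry5.ioc.Registry;
import org.apache.tapestry5.ioc.RegistryBuilder;
import org.apache.tapestry5.ioc.ServiceBinder;
import org.apache.tapestry5.ioc.annotations.SubModule;

// BlogBrowsingModule declares that it requires DataAccessModule.
@SubModule(DataAccessModule.class)
public class BlogBrowsingModule {
    public static void bind(ServiceBinder binder) {
        binder.bind(BlogBrowser.class, BlogBrowserImpl.class);
    }
}

// Application bootstrap: registering the top-level module is enough -
// @SubModule transitively pulls in everything it depends on.
Registry registry = new RegistryBuilder().add(BlogBrowsingModule.class).build();
registry.performRegistryStartup();
BlogBrowser browser = registry.getService(BlogBrowser.class);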

This is pretty KISS - especially when you consider that we have already almost reached our vision! Actually the only thing that's missing right now is constraining you and your team. I always divide modules into three different areas:
  • The published area that contains service interfaces and any types the module wants to publish to the outside world.
  • The internal area where the service implementation and any module-internal stuff resides.
  • The module area where the glue code resides.

The areas are built by package conventions:
  • Everything in module.package.internal.* has internal area scope
  • Everything in module.package.module.* has module area scope
  • Everything else has published area scope

The following minimum rules apply:
  • Everyone can access the published area
  • Only the module itself can access the internal area
  • Module areas can access module areas of other modules

In many cases you will even want to introduce one additional rule:
  • Published areas may only access other published areas

This is of special interest if you plan to exchange serialized instances of your objects. In such a case you need to make sure that the receiving side can deserialize the object (and thus needs to know the class). You can enforce this by making your data carrier classes final and introducing the above rule. In addition, the last rule allows you to extract the published API at any time without introducing dependencies on internal stuff!
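To illustrate, here's a sketch of such a final data carrier (the blogger.dataaccess package matches the Classycle example below; the fields are assumptions for the example):

// Lives in the published area (neither the .internal nor the .module package),
// is final and Serializable, so every module can safely (de)serialize it.
package blogger.dataaccess;

import java.io.Serializable;

public final class BlogEntry implements Serializable {

    private final long id;
    private final String title;

    public BlogEntry(long id, String title) {
        this.id = id;
        this.title = title;
    }

    public long getId() { return id; }
    public String getTitle() { return title; }
}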

It's pretty easy to enforce these rules by adding tools like Classycle to your build chain. The following Classycle dependency definition file checks the mentioned rules (including the published-may-only-access-published rule):


[allModule] = blogger.*.module.*
[allInternal] = blogger.*.internal.*
[allPublished] = blogger.* excluding [allModule] [allInternal]

[moduleModule] = blogger.dataaccess.module.*
[moduleInternal] = blogger.dataaccess.internal.*
[modulePublished] = blogger.dataaccess.* excluding [moduleModule] [moduleInternal]

[allInternalWithoutModuleInternal] = [allInternal] excluding [moduleInternal]

check [modulePublished] independentOf [allModule] [allInternal]
check [moduleModule] independentOf [allInternalWithoutModuleInternal]
check [moduleInternal] independentOf [allInternalWithoutModuleInternal] [allModule]


Unit testing of modules is easy. Integration testing is easy too - because you only need to load the module that you want to test plus the modules it depends on. Basically, an integration test is just a minimal application assembly. If you made use of Tapestry's @SubModule facilities, or if you tell Tapestry to just load all modules that are on the classpath, this minimal application assembly boils down to just the specification of the module under test! Whenever it comes to unit and integration testing I strongly recommend Unitils (http://www.unitils.org) - an excellent framework that supports many best practices in unit and integration testing. I've written a simple Tapestry IOC extension for Unitils that allows us to specify the module(s) under test and injects services into our test. Here's an example integration test:


@RunWith(UnitilsJUnit4TestRunner.class)
@TapestryModule(BlogBrowsingModule.class)
public class BlogBrowsingIntegrationTest {

    @TapestryService
    public BlogBrowser browser;

    @Test
    public void testFindBlogEntriesForPeterRietzler() {
        ...
    }
}


That's integration testing brought down to the simplicity of a unit test. You can additionally use all the Unitils features to set up your database for the test, etc.
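For example, a sketch combining the extension with Unitils' DbUnit support could look like this (assuming the Unitils database module is configured; BlogEntries.xml is a hypothetical DbUnit data set, and @TapestryModule/@TapestryService come from the extension described above):

import static org.junit.Assert.assertFalse;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.unitils.UnitilsJUnit4TestRunner;
import org.unitils.dbunit.annotation.DataSet;

@RunWith(UnitilsJUnit4TestRunner.class)
@TapestryModule(BlogBrowsingModule.class)
@DataSet("BlogEntries.xml")  // seeds the test database before each test
public class BlogBrowsingDatabaseIntegrationTest {

    @TapestryService
    public BlogBrowser browser;

    @Test
    public void findsSeededBlogEntries() {
        assertFalse(browser.findBlogsByUser("peter.rietzler").isEmpty());
    }
}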

That's all folks! Just to summarize - with just a few lines of code (it really is NOT more) we've built an extremely modular and flexible architecture that brings you much of what OSGi or the upcoming JSR294 provide. More - we've brought integration testing down to almost unit-testing complexity, and we have an application that can be assembled out of its constituent parts in many different ways. And please don't tell me that's not KISS - I can't think of a simpler way to do it!

This article was written after the above concept was applied to a real-life project (not a blogger :o). The project currently consists of ~30 distinct modules and is expected to grow much further. I've worked on projects that follow the same principles (although they often missed the KISS described in this article) with more than 100 modules, and I know of other applications with more than 800 distinct modules.

The sources for this article contain everything that you need to get started with the concept in real-world projects (everything is still KISS but with some minor enhancements to help you with everyday tasks...):
  • Some module examples
  • The Unitils extensions
  • An integration testing example (showing how easy integration testing gets and demonstrating some application assembling options)
  • The Maven POMs, including the Classycle dependency integration

Wednesday, January 14, 2009

How to Pull the Best out of OSGi - Even without OSGi!

Whew! When looking at my blog statistics I seem to have hit a hot topic with my last blog entry about OSGi...

I am delighted to see such attention in the community - because OSGi's principles are first class! I somehow got hooked on Peter Kriens' and Oleg Zhurakousky's comments about "silver bundles". The world of the enterprise Java developer is a world of almost extreme diversity - when I start a greenfield enterprise Java application, I have to choose from a huge variety of components. Most of us decide on an open-source stack - and there we hit the point - no "silver bundles" in our stack.

In this article I'll illustrate how to employ OSGi's modularity principles without using OSGi as the runtime platform - and thereby restore many existing libraries' "silver bundle" status. Needless to say, we will miss some benefits of OSGi. However, there's a good chance that you don't need these features (e.g. version conflict resolution or dynamic bundle installation at runtime). Oleg noted that he does not think that "anyone in the right state of mind would deny OSGi" - and he's right - therefore I take special care that an application following these principles can be run within an OSGi runtime pretty easily once there is a need to turn over!

Everything mentioned in this article is available as source code! I don't know how to deal with the sources yet. I would really like to contribute the stuff to SpringSource because I am convinced it's highly valuable for all Spring users - but they want to sell DM server - so it may not be their primary wish to promote such things in Spring core. If you find these things useful and if you think this would be a win for Spring core, please leave a comment on this post! A comment to my last post noted another open source project (http://code.google.com/p/impala) that seems to have similar objectives - maybe I will join Impala and contribute the stuff there. I'd be happy about any suggestions! In the meantime you can download the source code here:

Before I get started let me make two assumptions:
Assumption #1: Most applications do not need the high dynamics of OSGi. These applications may be redeployed in case new modules become available.
Assumption #2: Most applications know the set of services they intend to use. They also know whether they can operate if a service is not available in the current deployment scenario (e.g. the module containing the service is not deployed in this installation). In OOP terms: you either get an object connecting you to the service or you don't get such an object. You don't have to care about service objects suddenly vanishing at runtime.

These assumptions define the boundaries. If you have an application that needs to heavily load and unload modules at runtime, and if you cannot predict who will contribute to your application (versioning conflicts, security, etc.), you will probably be perfectly happy with OSGi and you can stop reading. Eclipse is a perfect example. Please leave a comment if you know of server-side applications with such requirements!

Enough of the introduction ... Let's get started with the basic principles:
  1. An application is made out of a set of modules. A module is much like an OSGi bundle.
  2. A module provides services (described through Java interfaces) to others. The services are instantiated by the module itself. Services are very much like OSGi services.
  3. A loader discovers modules (by convention or through rules in special deployment situations) and starts the modules.
  4. Constraints (e.g. published vs. private packages) are checked at build time.
Today, Spring DM would be my choice for OSGi development. To make a later turn to OSGi realistic, and because Spring is an essential part of many chosen open source stacks, a module is pretty much the same as a Spring DM module. The module ships with at least one application context where our objects are instantiated and our services are exported:

<!-- module internal stuff -->
<bean id="myService" class="com.blogspot.peterrietzler.internal.MyServiceImpl"/>

<!-- service export. supports a feature set comparable to Spring DM -->
<modules:service ref="myService" interface="com.blogspot.peterrietzler.MyService">
    <modules:service-properties>
        <prop key="someQoS">true</prop>
    </modules:service-properties>
</modules:service>
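For completeness, the Java side of this export could look as follows (a sketch; the echo operation is made up - the point is the package placement, which mirrors the published vs. private package convention from principle #4):

// MyService.java - the published contract.
package com.blogspot.peterrietzler;

public interface MyService {
    String echo(String message);
}

// MyServiceImpl.java - internal, instantiated only by its own module.
package com.blogspot.peterrietzler.internal;

public class MyServiceImpl implements com.blogspot.peterrietzler.MyService {
    public String echo(String message) {
        return message;
    }
}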


Any other module may now reference our service. Here is an example using a plain reference matching any service, and one matching only a service with a required quality of service (filters use RFC 1960 syntax - the same as in OSGi):
<modules:reference id="myService" interface="com.blogspot.peterrietzler.MyService"/>
<modules:reference id="myService" interface="com.blogspot.peterrietzler.MyService" filter="(someQoS=true)"/>

You can also reference lists of services:

<modules:list id="myServices" interface="com.blogspot.peterrietzler.MyService"/>
<modules:list id="myServices" interface="com.blogspot.peterrietzler.MyService" filter="(someQoS=true)"/>

Modules are loaded independently of each other. Each module has its own IOC container. This is important because it allows aspects to be applied at module scope (self-containment!). If a service is not available, e.g. because the module is not part of the current deployment, application startup fails. If your module can start without a specific service being available, just mark the reference as optional and a null value or an empty list will be injected if the service is not available in your deployment:

<modules:reference id="myService" interface="com.blogspot.peterrietzler.MyService" optional="true"/>
<modules:list id="myServices" interface="com.blogspot.peterrietzler.MyService" optional="true"/>
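On the consuming side this means the service implementation must tolerate the absence of its collaborator. A minimal sketch (the SearchFacade class is hypothetical and would be wired through a normal constructor-arg):

// Receives null when the optional MyService reference cannot be satisfied
// in the current deployment.
public class SearchFacade {

    private final MyService myService; // may be null

    public SearchFacade(MyService myService) {
        this.myService = myService;
    }

    public boolean isSearchAvailable() {
        return myService != null;
    }
}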


When working with Spring DM you need to be careful with some parts of Spring, e.g. because Spring DM creates a proxy object for each bundle while other Spring components use object identities as keys in hash maps. The Hibernate session manager is a pretty good example, and I leave it up to your own imagination what can happen if you have a Hibernate session manager exported as a Spring DM service :o). Additionally, Spring prototype beans cannot be exported as a service. To reduce these hard-to-recognize pitfalls there is some special support for these two issues (note that this is not OSGi compliant and needs to be refactored to other design patterns in case of a turn over to OSGi):

  1. You can export a prototype scoped bean as a service. Each reference will be backed by one instance (as in standard Spring).
  2. Each exported singleton service will be represented by the same object within all modules (as in standard Spring).
Why bother with this, and why accept something that does not match Spring DM / OSGi concepts? Because these are really hard-to-observe pitfalls. Spring DM is so close to Spring that you intuitively think you can do the same things as with standard Spring. In fact there are some very subtle differences. You can e.g. run into situations where your application seems to work, but behind the scenes it only seems to work because it falls back to some (in this case unwanted) default strategies (such as creating a new Hibernate session and database transaction for each method call in your data access layer).

The last thing is module loading. I follow the Spring DM conventions: by default all XML files in the META-INF/spring directory are loaded into the module's IOC container. In order to mark something as a module you put a module.properties file into the META-INF/spring directory. In its most basic form this file is just a marker file that tells the module loader that this is a module to be loaded (much like the OSGi manifest). In more sophisticated scenarios the module.properties file can specify strategies for how a module should be instantiated. The minimal module.properties file just contains an artificial module name:

Name: MyModule

Application startup is easy:

new ModuleApplicationContextLoader(new ClasspathModuleLoader(), new DefaultServiceRegistry()).loadApplicationContexts();
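As a sketch, that same one-liner makes a whole-application integration test trivial (the test class is hypothetical; the loader calls are exactly the ones shown above):

import org.junit.Test;

public class ApplicationAssemblyIntegrationTest {

    @Test
    public void allModulesOnTheClasspathStartUp() {
        // Fails with an exception if any non-optional service reference
        // cannot be satisfied by the modules found on the classpath.
        new ModuleApplicationContextLoader(
                new ClasspathModuleLoader(),
                new DefaultServiceRegistry()).loadApplicationContexts();
    }
}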

To include a module in your startup process just put it on the classpath. As the sketch above suggests, this greatly simplifies integration testing. If you are using Maven you don't have to do any special setup since everything will already be on your classpath! For special situations and deployments you can either implement your own module loader or use module loading rules. Here is an example of a module loading rule file:

<?xml version="1.0" encoding="UTF-8"?>
<modules xmlns="http://www.rietzler.com/schema/spring/modules/loader/resolver"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:resolver="http://www.rietzler.com/schema/spring/modules/loader/resolver"
         xsi:schemaLocation="http://www.rietzler.com/schema/spring/modules/loader/resolver http://www.rietzler.com/schema/spring/modules/loader/resolver.xsd"
         includeByDefault="false">

    <!-- include only modules with matching names. evaluated against the module.properties file -->
    <include name=".*Include.*"/>

    <!-- include only modules matching the given RFC 1960 filter. evaluated against the module.properties file -->
    <include filter="(moduleQoS=xxx)"/>

    <!-- provide your own rules -->
    <rule class="com.rietzler.spring.modules.loader.XmlModuleResolverBuilderTest$IncludeByNameRule" parameters="MyModule"/>

</modules>


Build-time constraint checks are currently out of scope of the source code. However, I use Classycle (http://classycle.sourceforge.net) to enforce constraints similar to those OSGi checks at runtime. With Classycle it would be a pretty easy task to write a tool that understands OSGi exported packages and checks them against OSGi bundles.

For further details have a look at the source code. It's not final production quality, but it already contains source code documentation, and you can inspect the tests and test resources for further information.