Sunday, February 15, 2009

How to Design Software for Flexibility, Reusability and Scalability without losing KISS principles!

Looking at current developments and blogs, modularization is currently one of the key issues in the Java community. OSGi and JSR294 are just two of the more prominent candidates that try to cover this topic. Both of them have considerable drawbacks - I think I've shown enough of OSGi's drawbacks in my previous articles ... JSR294 is not yet available and seems to lack the very important Java "write once, run everywhere" principle (http://www.infoq.com/articles/java7-module-system).

The topic isn't rushing through the community for nothing. Modularization definitely is one of the key issues of successful software. But why should you wait for either OSGi being usable or Java 7 being released? Chances are good that you cannot use either of them anyway due to your project's constraints. This article shows some architectural constraints that let you build (micro) service-oriented applications with a very high degree of modularization, flexibility and reusability. Additionally, it shows how you can assemble such an application, unit and integration test it easily, and automatically force your team members (and yourself!) not to break the modularization rules. Actually, you can have almost all of the superb concepts of efforts such as OSGi or the upcoming Java modularization concepts almost for free - good news, isn't it? :o)

BTW: Along with this article I've prepared some sources to get you started with the concepts - The sources can be downloaded here.

Up front, let me show you what a highly modular application looks like and why we want such a high degree of modularity. As an example, let us think about blogging software (just to avoid yet another threadbare pet shop). At a bare minimum, such a system could be built as a simple web application using some fancy libraries like Hibernate to store and load its data. If you think a bit more SOA, you may split the application into two distinct parts, where the actual blogging services (e.g. finding and storing blogs, authenticating users, etc...) are implemented in a separate module that may be accessed through some kind of remote interface - giving you the ability to integrate with other applications. By splitting the application into two distinct parts we get a bit more than just interoperability! First, we have two distinct modules, one with good potential for reuse. Second, since the two parts are autonomous, errors are local to each module and thus the area you have to analyze is narrower. And third, it's much easier to test the two modules because both have well-defined interfaces at their borders - it's now e.g. pretty easy to test the user interface by just mocking or stubbing the service layer. An additional gain is the chance to scale at module borders. It is important to note that scalability does not automatically come with modularity! However, in our example chances are pretty good that we can scale our service layer by duplicating it onto different machines if necessary. This kind of coarse-grained modularity is good - and the good news is that by bringing these modularization concepts down to a more fine-grained scale we can gain many of the same benefits at that level!
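
To make the testing benefit a bit more tangible, here is a minimal sketch in plain Java. The names BlogService, BlogPage and BlogPageStubDemo are hypothetical and exist only for this illustration (they are not part of the article's sources). The point is simply that the user interface code only sees an interface, so a test can hand it a canned implementation instead of the real (possibly remote) service layer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical service-layer contract; the real one would live in the service module.
interface BlogService {
    List<String> findTitlesByAuthor(String author);
}

// Hypothetical piece of user interface code that depends only on the contract.
class BlogPage {
    private final BlogService service;

    BlogPage(BlogService service) {
        this.service = service;
    }

    String render(String author) {
        List<String> titles = service.findTitlesByAuthor(author);
        if (titles.isEmpty()) {
            return author + " has not written anything yet";
        }
        return author + ": " + String.join(", ", titles);
    }
}

class BlogPageStubDemo {
    public static void main(String[] args) {
        // The stub replaces the whole service layer with canned data.
        BlogService stub = author -> Arrays.asList("Hello", "World");
        System.out.println(new BlogPage(stub).render("peter")); // prints "peter: Hello, World"
    }
}
```

No database, no remote call, no container - the user interface logic is tested against the contract alone.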

Let's imagine what such a fine-grained modularization could look like. A first sketch of our service layer could show the following modules:
  • Data Access (DA): Reads data from and stores data to some kind of data store
  • Blog Editing (BE): Services for blog editors
  • Blog Browsing (BB): Provides blog data to applications (find blogs, get blog data, abstracts, etc...)
  • Full Text Search (FTS): Allows full text searches in blogs
  • Full Text Indexing (FTI): Indexes blogs for full text search

The first three modules are pretty obvious and have some kind of coupling (mostly logical - coupling on the source code level only exists for BE and BB, which both depend on DA). The same applies to full text indexing and searching. Besides getting all the benefits noted above, things get interesting when we think of assembling the application. We now have many possibilities to assemble applications. Our first approach will definitely be to build one application out of all parts. A couple of months later, when our blogger software gets popular, we need to scale... Whew! We now realize that we cannot just duplicate our application: our full text indexing software supports concurrent reads, but only one application is allowed to write at a time. With the concepts described in this article we can just reassemble our application (without changing our application or the modules!), e.g. into the following scenario:
  • Machine 1: BB + DA + FTS
  • Machine 2: BB + DA + FTS
  • Machine 3: BE + DA + FTI
Pretty cool - eh ?
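
One way to picture such a reassembly is as plain data: each machine gets a set of modules, and a small check verifies that the set is self-consistent. The sketch below is not taken from the article's sources. The BlogModule enum, the Assembly class and both checks are hypothetical, and while "BE and BB need DA" comes from the text above, treating DA, FTS and FTI as dependency-free is an assumption made for this illustration.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// The five modules of the example service layer.
enum BlogModule { DA, BE, BB, FTS, FTI }

class Assembly {
    // Source-level dependencies: BE and BB need DA (from the text);
    // the empty entries for DA, FTS and FTI are an assumption of this sketch.
    static final Map<BlogModule, Set<BlogModule>> REQUIRES = new EnumMap<>(BlogModule.class);
    static {
        REQUIRES.put(BlogModule.DA, EnumSet.noneOf(BlogModule.class));
        REQUIRES.put(BlogModule.BE, EnumSet.of(BlogModule.DA));
        REQUIRES.put(BlogModule.BB, EnumSet.of(BlogModule.DA));
        REQUIRES.put(BlogModule.FTS, EnumSet.noneOf(BlogModule.class));
        REQUIRES.put(BlogModule.FTI, EnumSet.noneOf(BlogModule.class));
    }

    // A machine is complete when every module it hosts finds its dependencies locally.
    static boolean isComplete(Set<BlogModule> machine) {
        for (BlogModule module : machine) {
            if (!machine.containsAll(REQUIRES.get(module))) {
                return false;
            }
        }
        return true;
    }

    // Only one application may write the full text index, so FTI must be deployed exactly once.
    static boolean hasSingleIndexer(List<Set<BlogModule>> machines) {
        int indexers = 0;
        for (Set<BlogModule> machine : machines) {
            if (machine.contains(BlogModule.FTI)) {
                indexers++;
            }
        }
        return indexers == 1;
    }

    public static void main(String[] args) {
        List<Set<BlogModule>> scaledOut = Arrays.asList(
                EnumSet.of(BlogModule.BB, BlogModule.DA, BlogModule.FTS),
                EnumSet.of(BlogModule.BB, BlogModule.DA, BlogModule.FTS),
                EnumSet.of(BlogModule.BE, BlogModule.DA, BlogModule.FTI));
        for (Set<BlogModule> machine : scaledOut) {
            System.out.println(machine + " complete: " + isComplete(machine));
        }
        System.out.println("single indexer: " + hasSingleIndexer(scaledOut));
    }
}
```

The modules themselves never appear in this code with anything but their names, which is exactly the point: the assembly is a deployment decision, not a code change.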

Ok - enough of the introduction. Let's dive into the details. The most important thing is to understand what a module consists of and how it is built. In Java a module maps to a JAR file - or to a project if you like to think in build artifacts. It is implemented, tested and maintained autonomously. A module is self-contained. It knows how (but not when) to start up and shut down by itself. It provides some services but hides the implementation details of its services from the outside world. Here's a simplistic example of the data access module (don't do it that way - it's just to understand the concept):

public interface BlogStorageService {
    BlogEntry findById(long id);
    void store(BlogEntry entry);
}

public class DataAccessModule {
    // Created in start(); the outside world only ever sees the interface.
    private BlogStorageService blogStorageService;

    public void start() {
        blogStorageService = new BlogStorageServiceImpl();
    }

    public BlogStorageService getBlogStorageService() {
        return blogStorageService;
    }
}

The service interfaces (BlogStorageService) and data carriers (BlogEntry), as well as any other types to be used with your services, form your contract with the outside world. The module implementation itself (DataAccessModule) is the "glue code" that starts up your module. In reality, the glue code building up your module is far from trivial! It's easy to create your service implementations - but there are many other dependencies that need to be satisfied. Our BlogStorageServiceImpl will definitely need access to some kind of data source, configuration facilities etc... Additionally, inter-module dependencies need to be resolved (e.g. Blog Browsing needs Data Access) - this leads to startup issues (what is started first, who has to wait for whom). Unless you are insane you will definitely delegate this task to some kind of IOC container. Although Spring is currently for sure the most popular IOC container, I prefer to use Tapestry IOC for this article (and for my current project) because it is extremely easy to use and addresses all of the above-mentioned issues (cross-module injection, inter-module dependencies, synchronization issues, configuration and more). Here's our Tapestry IOC based Data Access Module (visit http://tapestry.apache.org/tapestry5/tapestry-ioc/ if you cannot imagine what the following code does...):


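import javax.sql.DataSource;
import org.apache.tapestry5.ioc.ServiceBinder;

public class BlogStorageServiceImpl implements BlogStorageService {
    public BlogStorageServiceImpl(DataSource dataSource) { ... }
}

public class DataAccessModule {
    // Tapestry IOC calls this static method to register the module's services.
    public static void bind(ServiceBinder binder) {
        binder.bind(BlogStorageService.class, BlogStorageServiceImpl.class);
    }
}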


Tapestry IOC will take care of creating our service and will inject the required DataSource into the BlogStorageServiceImpl constructor. But wait - where does the DataSource come from? Maybe the next example will give you a bit more insight. Here's an example out of the Blog Browsing module:


public interface BlogBrowser {
    BlogEntry findBlogsByUser();
}

// Must implement the service interface it is bound to below.
public class BlogBrowserImpl implements BlogBrowser {
    public BlogBrowserImpl(BlogStorageService blogStorageService) { ... }
}

public class BlogBrowsingModule {
    public static void bind(ServiceBinder binder) {
        binder.bind(BlogBrowser.class, BlogBrowserImpl.class);
    }
}


BlogBrowserImpl requires an instance of BlogStorageService to work correctly. Tapestry IOC will take care of resolving this instance for us and it will automatically pull the instance from DataAccessModule once it's started. You just need someone telling Tapestry IOC which modules your application consists of (you can e.g. do this by using a JAR Manifest or - more practically - use Tapestry's @SubModule annotation that marks a module requiring another one).
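
To get a feeling for the kind of bookkeeping you are delegating here, the following plain Java sketch computes a startup order from declared module dependencies ("what is started first, who has to wait for whom"). It is not meant to describe how Tapestry IOC works internally; the StartupOrder class and the module names used as strings are hypothetical and only illustrate the problem the container takes off your shoulders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StartupOrder {
    // Depth-first walk: a module is appended only after everything it requires.
    static List<String> resolve(Map<String, List<String>> requires) {
        List<String> order = new ArrayList<>();
        Set<String> visiting = new HashSet<>();
        for (String module : requires.keySet()) {
            visit(module, requires, visiting, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> requires,
                              Set<String> visiting, List<String> order) {
        if (order.contains(module)) {
            return; // already scheduled for startup
        }
        if (!visiting.add(module)) {
            throw new IllegalStateException("Cyclic module dependency at " + module);
        }
        for (String dependency : requires.getOrDefault(module, Collections.<String>emptyList())) {
            visit(dependency, requires, visiting, order);
        }
        visiting.remove(module);
        order.add(module);
    }

    public static void main(String[] args) {
        Map<String, List<String>> requires = new LinkedHashMap<>();
        requires.put("BlogBrowsing", Arrays.asList("DataAccess"));
        requires.put("BlogEditing", Arrays.asList("DataAccess"));
        requires.put("DataAccess", Collections.<String>emptyList());
        System.out.println(resolve(requires)); // prints [DataAccess, BlogBrowsing, BlogEditing]
    }
}
```

Even this toy version needs cycle detection, and it does not yet deal with configuration, threading or shutdown - good reasons to leave the job to a container.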

This is pretty KISS - Especially when you think of the fact that we already almost reached our vision! Actually the only thing that's missing right now is constraining you and your team. I always divide modules into three different areas:
  • The published area that contains service interfaces and any types the module wants to publish to the outside world.
  • The internal area where the service implementation and any module internal stuff resides.
  • The module area where the glue code resides.

The areas are built by package conventions:
  • Everything in module.package.internal.* has internal area scope
  • Everything in module.package.module.* has module area scope
  • Everything else has published area scope

The following minimum rules apply:
  • Everyone can access the published area
  • Only the module itself can access the internal area
  • Module areas can access module areas of other modules

In many cases you will even want to introduce one additional rule:
  • Published areas may only access other published areas

This is of special interest if you plan to exchange serialized instances of your objects. In such a case you need to take care that the receiving side can deserialize the object (and thus needs to know the class). You can enforce this by declaring your data carrier classes final and introducing the above rule. In addition, the last rule allows you to extract the published API at any time without introducing dependencies on internal stuff!
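
Expressed as code, the rules boil down to a small decision function. The sketch below is purely illustrative and not part of the article's sources: the AreaRules class is hypothetical, it assumes class names of the form blogger.<module>.internal.* and blogger.<module>.module.* (with blogger.browsing being a made-up package name), and it follows the stricter variant used in the Classycle definition of this article, where internal code may not touch any module area.

```java
class AreaRules {
    enum Area { PUBLISHED, INTERNAL, MODULE }

    // Assumes names like blogger.<module>.<Type> or blogger.<module>.internal.<Type>.
    static String moduleOf(String className) {
        return className.split("\\.")[1];
    }

    static Area areaOf(String className) {
        String[] parts = className.split("\\.");
        if (parts.length > 3 && parts[2].equals("internal")) {
            return Area.INTERNAL;
        }
        if (parts.length > 3 && parts[2].equals("module")) {
            return Area.MODULE;
        }
        return Area.PUBLISHED; // everything else is published
    }

    static boolean mayAccess(String from, String to) {
        Area source = areaOf(from);
        switch (areaOf(to)) {
            case PUBLISHED:
                return true; // everyone can access the published area
            case INTERNAL:
                // only the module itself, and never from its own published area
                return moduleOf(from).equals(moduleOf(to)) && source != Area.PUBLISHED;
            default:
                // module (glue code) areas are only visible to other glue code
                return source == Area.MODULE;
        }
    }

    public static void main(String[] args) {
        String browserImpl = "blogger.browsing.internal.BlogBrowserImpl";
        String storageApi = "blogger.dataaccess.BlogStorageService";
        String storageImpl = "blogger.dataaccess.internal.BlogStorageServiceImpl";
        String dataAccessGlue = "blogger.dataaccess.module.DataAccessModule";
        String browsingGlue = "blogger.browsing.module.BlogBrowsingModule";

        System.out.println(mayAccess(browserImpl, storageApi));     // true
        System.out.println(mayAccess(browserImpl, storageImpl));    // false
        System.out.println(mayAccess(dataAccessGlue, storageImpl)); // true
        System.out.println(mayAccess(browsingGlue, dataAccessGlue)); // true
    }
}
```

In a real build you would of course not hand-roll this check but let a dependency checker verify it against the compiled classes.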

It's pretty easy to enforce these rules by adding tools like Classycle to your build chain. The following Classycle definition file checks the mentioned rules (including the "published may only access published" rule):


[allModule] = blogger.*.module.*
[allInternal] = blogger.*.internal.*
[allPublished] = blogger.* excluding [allModule] [allInternal]

[moduleModule] = blogger.dataaccess.module.*
[moduleInternal] = blogger.dataaccess.internal.*
[modulePublished] = blogger.dataaccess.* excluding [moduleModule] [moduleInternal]

[allInternalWithoutModuleInternal] = [allInternal] excluding [moduleInternal]

check [modulePublished] independentOf [allModule] [allInternal]
check [moduleModule] independentOf [allInternalWithoutModuleInternal]
check [moduleInternal] independentOf [allInternalWithoutModuleInternal] [allModule]


Unit testing of modules is easy. Integration testing is easy too - because you only need to load the module that you want to test and the modules this module depends on. Basically an integration test is just a minimal application assembly. If you make use of Tapestry's @SubModule facilities, or if you tell Tapestry to just load all modules that are on the classpath, this minimal application assembly shrinks to just the specification of the module under test! Whenever it comes to unit and integration testing I strongly recommend using Unitils (http://www.unitils.org) - an excellent framework that supports many best practices in unit and integration testing. I've written a simple Tapestry IOC extension for Unitils that allows us to specify the module(s) under test and injects services into our test. Here's an example integration test:


@RunWith(UnitilsJUnit4TestRunner.class)
@TapestryModule(BlogBrowsingModule.class)
public class BlogBrowsingIntegrationTest {

    // Injected by the Tapestry IOC extension for Unitils
    @TapestryService
    public BlogBrowser browser;

    @Test
    public void testFindBlogEntriesForPeterRietzler() {
        ...
    }
}


That's integration testing brought down to the simplicity of a unit test. You can additionally use all the Unitils stuff to set up your database for the test etc...

That's all folks! Just to summarize - with just a few lines of code (it really is NOT more) we've built an extremely modular and flexible architecture bringing much of the stuff that OSGi or the upcoming JSR294 provide. More - We've brought integration testing to an almost unit-testing level (from the aspect of complexity) and we have an application that can be assembled together out of its constituent parts in many different ways. And please don't tell me that's not KISS - I can't think of a simpler way to do it!

This article was written after the above concept was applied to a real-life project (not a blogger :o). The project currently consists of ~30 distinct modules and is expected to grow much further than that. I've worked on projects that follow the same principles (although they often missed the KISS described in this article) with more than 100 modules, and I know of other applications with more than 800 distinct modules.

The sources for this article contain everything that you need to get started with the concept in real-world projects (everything is still KISS but with some minor enhancements to help you with everyday tasks...):
  • Some module examples
  • The Unitils extensions
  • An integration testing example (showing how easy integration testing gets and demonstrating some application assembling options)
  • The Maven POMs including the Classycle dependency integration