Tuesday, November 30, 2010

Getting started with Apache Mahout

Recently I got an interesting problem to solve: how can text from different sources be classified automatically? Some time ago I read about a project which does this, along with many other text analysis tasks: Apache Mahout. Though it's not very mature yet (the current version is 0.4), it's very powerful and scalable. Built on top of another excellent project, Apache Hadoop, it is capable of analyzing huge data sets.

So I did a small project in order to understand how Apache Mahout works. I decided to use Apache Maven 2 to manage all dependencies, so I will start with the POM file.


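<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.acme</groupId>
  <artifactId>mahout</artifactId>
  <version>0.94</version>
  <name>Mahout Examples</name>
  <description>Scalable machine learning library examples</description>
  <packaging>jar</packaging>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <apache.mahout.version>0.4</apache.mahout.version>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <encoding>UTF-8</encoding>
          <source>1.6</source>
          <target>1.6</target>
          <optimize>true</optimize>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-utils</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.6.0</version>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jcl</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</project>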

Then I looked into the Apache Mahout examples and the algorithms available for text classification. One of the simplest, and still reasonably accurate, is the Naive Bayes classifier. Here is a code snippet:
package org.acme;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.mahout.classifier.ClassifierResult;
import org.apache.mahout.classifier.bayes.TrainClassifier;
import org.apache.mahout.classifier.bayes.algorithm.BayesAlgorithm;
import org.apache.mahout.classifier.bayes.common.BayesParameters;
import org.apache.mahout.classifier.bayes.datastore.InMemoryBayesDatastore;
import org.apache.mahout.classifier.bayes.exceptions.InvalidDatastoreException;
import org.apache.mahout.classifier.bayes.interfaces.Algorithm;
import org.apache.mahout.classifier.bayes.interfaces.Datastore;
import org.apache.mahout.classifier.bayes.model.ClassifierContext;
import org.apache.mahout.common.nlp.NGrams;

public class Starter {
    public static void main( final String[] args ) {
        // Classifier settings: unigrams, Naive Bayes, "OTHER" as the fallback category
        final BayesParameters params = new BayesParameters();
        params.setGramSize( 1 );
        params.set( "verbose", "true" );
        params.set( "classifierType", "bayes" );
        params.set( "defaultCat", "OTHER" );
        params.set( "encoding", "UTF-8" );
        params.set( "alpha_i", "1.0" );
        params.set( "dataSource", "hdfs" );
        params.set( "basePath", "/tmp/output" );

        BufferedReader reader = null;
        try {
            // 1. Train the model from the labelled files in /tmp/input (model goes to /tmp/output)
            Path input = new Path( "/tmp/input" );
            TrainClassifier.trainNaiveBayes( input, "/tmp/output", params );

            // 2. Load the trained model into memory
            Algorithm algorithm = new BayesAlgorithm();
            Datastore datastore = new InMemoryBayesDatastore( params );
            ClassifierContext classifier = new ClassifierContext( algorithm, datastore );
            classifier.initialize();

            // 3. Classify every line of the file passed as the first command-line argument
            reader = new BufferedReader( new FileReader( args[ 0 ] ) );
            String entry = reader.readLine();

            while( entry != null ) {
                List< String > document = new NGrams( entry,
                                Integer.parseInt( params.get( "gramSize" ) ) )
                                .generateNGramsWithoutLabel();

                ClassifierResult result = classifier.classifyDocument(
                                 document.toArray( new String[ document.size() ] ),
                                 params.get( "defaultCat" ) );

                // Print the assigned category next to the original text
                System.out.println( result.getLabel() + "\t" + entry );

                entry = reader.readLine();
            }
        } catch( final IOException ex ) {
            ex.printStackTrace();
        } catch( final InvalidDatastoreException ex ) {
            ex.printStackTrace();
        } finally {
            if( reader != null ) {
                try {
                    reader.close();
                } catch( final IOException ex ) {
                    // nothing useful to do if closing fails
                }
            }
        }
    }
}
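To make the idea behind Naive Bayes a bit more concrete, here is a toy multinomial Naive Bayes classifier in plain Java, with no Mahout or Hadoop involved. It is only an illustration of the technique, not Mahout's implementation: the class name, the simple unigram tokenizer and the smoothing constant (which I assume plays a role similar to the alpha_i parameter above) are all my own choices.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy multinomial Naive Bayes with additive smoothing (illustration only, not Mahout's code)
class TinyNaiveBayes {
    private final double alpha; // smoothing constant (assumed analogue of "alpha_i")
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<String, Map<String, Integer>>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<String, Integer>();
    private final Map<String, Integer> docsPerLabel = new HashMap<String, Integer>();
    private final Set<String> vocabulary = new HashSet<String>();
    private int totalDocs = 0;

    TinyNaiveBayes(double alpha) {
        this.alpha = alpha;
    }

    // Unigram tokenizer (gram size 1): lower-case, split on anything that is not a letter or digit
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{Nd}]+");
    }

    // Counts one labelled example
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.put(label, count(docsPerLabel, label) + 1);
        Map<String, Integer> counts = wordCounts.get(label);
        if (counts == null) {
            counts = new HashMap<String, Integer>();
            wordCounts.put(label, counts);
        }
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.put(token, count(counts, token) + 1);
            wordsPerLabel.put(label, count(wordsPerLabel, label) + 1);
            vocabulary.add(token);
        }
    }

    // Picks the label maximizing log P(label) + sum of log P(token | label)
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = Math.log((double) count(docsPerLabel, label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            double denominator = count(wordsPerLabel, label) + alpha * vocabulary.size();
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                score += Math.log((count(counts, token) + alpha) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    private static int count(Map<String, Integer> map, String key) {
        Integer value = map.get(key);
        return value == null ? 0 : value;
    }
}
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from driving a category's probability to zero.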
One important note: the classifier must be trained before it can classify anything. To train it, you provide labelled examples (the more, the better) for each category. These are plain text files in which each line starts with the category, followed by a tab and then the text itself. For example:
SUGGESTION  That's a great suggestion
QUESTION  Do you sell Microsoft Office?
...
The more training data you provide, the more accurate the classification is likely to be. All files must be placed in the '/tmp/input' folder; Apache Hadoop processes them first. :)
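For illustration, here is a small plain-Java sketch that writes and reads lines in the category-tab-text layout described above. The class names are hypothetical, and skipping lines that have no label before the tab is my own assumption; Mahout's own input handling may treat such lines differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// One labelled training example (hypothetical type)
class LabelledLine {
    final String label;
    final String text;

    LabelledLine(String label, String text) {
        this.label = label;
        this.text = text;
    }
}

class TrainingFormat {
    // Formats one example as CATEGORY<TAB>text
    static String format(String label, String text) {
        if (label.isEmpty() || label.contains("\t") || label.contains("\n") || text.contains("\n")) {
            throw new IllegalArgumentException("label must be non-empty and tab-free; no newlines allowed");
        }
        return label + "\t" + text;
    }

    // Splits each line at the first tab; lines without a label are skipped (an assumption)
    static List<LabelledLine> parse(BufferedReader reader) throws IOException {
        List<LabelledLine> examples = new ArrayList<LabelledLine>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue;
            }
            examples.add(new LabelledLine(line.substring(0, tab), line.substring(tab + 1)));
        }
        return examples;
    }
}
```

Splitting only at the first tab means the text itself may still contain tabs without confusing the label.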

Sunday, October 17, 2010

Injecting file dependency into Spring bean

Continuing to explore the power of the Spring Framework, I would like to explain how to inject a file resource into a Spring bean using the @Resource annotation.

First, let's start with the bean definition file. Basically, we need to declare the file dependency we would like to inject:

<?xml version="1.0" encoding="UTF-8"?>
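<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:context="http://www.springframework.org/schema/context"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
      http://www.springframework.org/schema/beans
      http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
      http://www.springframework.org/schema/context
      http://www.springframework.org/schema/context/spring-context-2.5.xsd">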

   <context:annotation-config />   
   <context:component-scan base-package="org.example" />

   <bean id="source" class="org.springframework.core.io.ClassPathResource">
      <constructor-arg index="0" value="some.file.txt" />
   </bean>

</beans>

With the configuration in place, we can inject the file into a bean.
package org.example;

import java.io.InputStream;
import java.util.List;

import javax.annotation.Resource;

import org.apache.commons.io.IOUtils;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.core.io.InputStreamSource;
import org.springframework.stereotype.Component;

@Component
public class SomeBean implements InitializingBean {
    // @Resource injects the bean named "source" (the ClassPathResource declared above);
    // an additional @Autowired is not needed
    @Resource( name = "source" )
    private InputStreamSource source;

    @Override
    public void afterPropertiesSet() throws Exception {
        final InputStream in = source.getInputStream();
        try {
            for( final String line: ( List< String > )IOUtils.readLines( in ) ) {
                // do something with each line here
            }
        } finally {
            IOUtils.closeQuietly( in );
        }
    }
}
That's it :)
A pretty easy and powerful technique.
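The bean only sees an InputStreamSource, but it may help to see what the injected ClassPathResource boils down to. Below is a rough JDK-only sketch (no Spring, no Commons IO) that resolves a name against the class loader and reads it line by line. The class and method names are hypothetical, and UTF-8 is an assumption: IOUtils.readLines(InputStream) without a charset uses the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Rough JDK-only stand-in for ClassPathResource + IOUtils.readLines (hypothetical helper)
class ClasspathLines {
    // Resolves the name relative to the classpath root, similar to ClassPathResource("some.file.txt")
    static List<String> readClasspathFile(String name) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathLines.class.getClassLoader();
        }
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException(name + " not found on the classpath");
        }
        return readLines(in);
    }

    // Reads every line as UTF-8 (an assumption) and always closes the stream
    static List<String> readLines(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            List<String> lines = new ArrayList<String>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } finally {
            reader.close();
        }
    }
}
```

A missing file fails fast with a FileNotFoundException, which is roughly what you would see from the Spring bean's afterPropertiesSet() when the resource cannot be found.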

Sunday, June 6, 2010

Testing BlazeDS remote objects with soapUI

Testing has never been easy. I have been following the TDD approach for at least the last 5-6 years and I am really excited about it. But for me, TDD is not only unit testing: it is the whole set of testing techniques I find appropriate for a particular project (unit tests, integration tests, performance tests, ...). Recently I discovered an excellent tool, soapUI. It has a bunch of useful features, but the one I would like to cover today is testing BlazeDS services over the AMF protocol.

Before we start with code snippets, let's copy the BlazeDS libraries to the bin/ext folder of the soapUI installation:
- commons-codec-1.3.jar
- commons-httpclient-3.0.1.jar
- commons-logging.jar
- flex-messaging-common.jar
- flex-messaging-core.jar
- flex-messaging-opt.jar
- flex-messaging-proxy.jar
- flex-messaging-remoting.jar

Among other very cool features, soapUI supports Groovy as a scripting language, which is just awesome, so all my examples will be in Groovy. Let's start with the necessary part: creating a connection and aliasing services.
import flex.messaging.io.amf.ASObject;
import flex.messaging.io.amf.client.AMFConnection;
import flex.messaging.messages.CommandMessage;
import flex.messaging.util.Base64.Encoder;
import flex.messaging.messages.Message;

def clientId = "soapUI." +  UUID.randomUUID().toString();
def amfConnection = new AMFConnection();
amfConnection.instantiateTypes = false

amfConnection.connect(  "http://localhost:8080/server/messagebroker/amf" );   
amfConnection.addAmfHeader( Message.FLEX_CLIENT_ID_HEADER, clientId );

// Create remote object aliases
amfConnection.registerAlias( "testService", "com.example.remoteobjects.TestFacade" );
With the connection established, we are ready to call methods of any aliased remote object. Here is a snippet that calls the service method foo(), which has no parameters:
// Calling service method without arguments
def result = amfConnection.call( "testService.foo" );
And here is a snippet that calls a service method foo() which accepts one parameter of type Person:
// Calling service method with object as argument
def person = new ASObject( "com.example.Person" );
person["name"]= "John Smith" ;

result = amfConnection.call( "testService.foo", person );
There's one issue I have omitted so far. If security is enabled for your channels, you must authenticate before calling any services. It's quite simple to do:
// Placeholders: set these to valid credentials for your server
def username = "user"
def password = "secret"

// BlazeDS expects the login body to be Base64( "username:password" )
def credentials = username + ":" + password

CommandMessage c = new CommandMessage();
c.setHeader( Message.FLEX_CLIENT_ID_HEADER, clientId );
c.setOperation( CommandMessage.LOGIN_OPERATION );
c.setDestination( "auth" );
c.setBody( encodeToBase64( credentials.getBytes( "UTF-8" ) ) );
amfConnection.call( null, c );

def encodeToBase64( final byte[] bytes ) {
    Encoder encoder = new Encoder( bytes.length );
    encoder.encode( bytes );
    return encoder.drain();
}
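To make the credential format explicit, here is the same encoding step in plain Java. It is a sketch that uses java.util.Base64 (Java 8+, so newer than the BlazeDS/soapUI setup in this post) instead of flex.messaging.util.Base64. The class and method names are hypothetical, and it assumes the server splits the decoded text at the first colon, which is why a colon in the user name is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the body of a BlazeDS LOGIN CommandMessage (hypothetical helper)
class AmfCredentials {
    static String loginBody(String username, String password) {
        // Assumption: the server splits the decoded text at the first ':'
        if (username.indexOf(':') >= 0) {
            throw new IllegalArgumentException("user name must not contain ':'");
        }
        String credentials = username + ":" + password;
        return Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, loginBody("admin", "secret") produces the same string you would pass to c.setBody(...) in the Groovy script: the credentials are encoded once, as a single "username:password" string.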
When we are done, let's be good citizens and close the connection:
amfConnection.close();
Again, if security is enabled for channels, log out before closing the connection:
CommandMessage logout = new CommandMessage();
logout.setHeader( Message.FLEX_CLIENT_ID_HEADER, clientId );
logout.setOperation( CommandMessage.LOGOUT_OPERATION );
logout.setDestination( "auth" );
amfConnection.call( null, logout );
With such a script in place, soapUI lets you create a load test based on it. It also supports quite complicated scenarios with many scripts involved and parameters passed from one to another. There is a very good blog which contains tons of useful information on how to use soapUI for different kinds of testing.

Sunday, May 16, 2010

Integrating Spring Flex

For better integration between Adobe BlazeDS and the Java platform, I would like to recommend a very useful project from the SpringSource portfolio: Spring Flex (also known as Spring BlazeDS Integration). It's pretty easy to get started with and, moreover, it integrates with other projects like the Spring Framework and Spring Security.

Let's start with a simple configuration:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:context="http://www.springframework.org/schema/context"
  xmlns:flex="http://www.springframework.org/schema/flex"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
      http://www.springframework.org/schema/context
      http://www.springframework.org/schema/context/spring-context-2.5.xsd  
      http://www.springframework.org/schema/beans
      http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
      http://www.springframework.org/schema/flex
      http://www.springframework.org/schema/flex/spring-flex-1.0.xsd">

   <context:annotation-config />   
   <context:component-scan base-package="org.example.flex" />
   
   <flex:message-broker id="_messageBroker" services-config-path="/WEB-INF/flex/services-config.xml">
       <flex:message-service default-channels="default-amf, secure-amf" />     
   </flex:message-broker> 
  
</beans>
Basically, these few lines do all the routine work: they start the Adobe BlazeDS MessageBroker (which handles the AMF protocol) and publish your classes annotated with @RemotingDestination as remote objects accessible to Flex clients.

The Adobe BlazeDS configuration, referenced here as /WEB-INF/flex/services-config.xml, is pretty standard. It includes the bare minimum needed to run a simple application.
  • /WEB-INF/flex/services-config.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <services-config>
        <services>
            <service-include file-path="remoting-config.xml" />
            <service-include file-path="proxy-config.xml" />
            <service-include file-path="messaging-config.xml" />     
    
         <default-channels>
             <channel ref="default-amf"/>
         </default-channels>
        </services>
    
        <channels>
            <channel-definition id="default-amf" class="mx.messaging.channels.AMFChannel">
                <endpoint url="http://{server.name}:{server.port}/{context.root}/messagebroker/amf/" class="flex.messaging.endpoints.AMFEndpoint"/>
            </channel-definition>
    
            <channel-definition id="secure-amf" class="mx.messaging.channels.SecureAMFChannel">
                <endpoint url="https://{server.name}:9400/{context.root}/messagebroker/amfsecure/" class="flex.messaging.endpoints.SecureAMFEndpoint"/>
            </channel-definition>
        </channels>
    </services-config>
    
  • /WEB-INF/flex/messaging-config.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <service id="message-service" class="flex.messaging.services.MessageService">
        <adapters>
            <adapter-definition id="actionscript" class="flex.messaging.services.messaging.adapters.ActionScriptAdapter" default="true"/>
            <adapter-definition id="jms" class="flex.messaging.services.messaging.adapters.JMSAdapter" />
        </adapters>
    </service>
    
  • /WEB-INF/flex/remoting-config.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <service id="remoting-service" class="flex.messaging.services.RemotingService">
        <adapters>
            <adapter-definition id="java-object" class="flex.messaging.services.remoting.adapters.JavaAdapter" default="true"/>
        </adapters>
    </service>
    
  • /WEB-INF/flex/proxy-config.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <service id="proxy-service" class="flex.messaging.services.HTTPProxyService">
        <properties>
            <connection-manager>
                <max-total-connections>100</max-total-connections>
                <default-max-connections-per-host>2</default-max-connections-per-host>
            </connection-manager>
            <allow-lax-ssl>true</allow-lax-ssl>
        </properties>
    
        <adapters>
            <adapter-definition id="http-proxy" class="flex.messaging.services.http.HTTPProxyAdapter" default="true"/>
            <adapter-definition id="soap-proxy" class="flex.messaging.services.http.SOAPProxyAdapter"/>
        </adapters>
    
        <destination id="DefaultHTTP">
         <properties>
             <url>/{context.root}/default.jsp</url>
         </properties>
        </destination>
    </service>
    
The configuration part is done. Let's create a simple remote object class:
package org.example.flex;

import org.springframework.flex.remoting.RemotingDestination;
import org.springframework.stereotype.Service;

@Service
@RemotingDestination( value = "simpleService", channels = { "default-amf", "secure-amf" } )
public class SimpleService {
    public Boolean test() {
 return Boolean.TRUE;
    }
}
That's it! SimpleService is declared as a simple POJO with the @RemotingDestination annotation; it will be discovered by component scanning and automatically published as a remote object on the "default-amf" and "secure-amf" channels.

Integrating Spring Security again takes just a few lines of configuration. Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:security="http://www.springframework.org/schema/security"
  xmlns:context="http://www.springframework.org/schema/context"
  xmlns:flex="http://www.springframework.org/schema/flex"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
 http://www.springframework.org/schema/context
 http://www.springframework.org/schema/context/spring-context-2.5.xsd  
 http://www.springframework.org/schema/beans
 http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
 http://www.springframework.org/schema/flex
 http://www.springframework.org/schema/flex/spring-flex-1.0.xsd
 http://www.springframework.org/schema/security 
 http://www.springframework.org/schema/security/spring-security-3.0.xsd">

    <context:annotation-config />   
    <context:component-scan base-package="org.example.flex" />

    <bean id="authenticationProvider" class="org.example.flex.CustomAuthenticationProvider" /> 

    <security:authentication-manager alias="authenticationManager">
        <security:authentication-provider ref="authenticationProvider" />  
    </security:authentication-manager> 
  
    <flex:message-broker id="_messageBroker" services-config-path="/WEB-INF/flex/services-config.xml">
       <flex:message-service default-channels="default-amf, secure-amf" />     
       <flex:secured authentication-manager="authenticationManager" />        
    </flex:message-broker>   
</beans>
Spring Flex also provides a bunch of interesting features, such as exception translators. It is worth looking at this project if you are developing Flex applications with Adobe BlazeDS.

Monday, April 19, 2010

Using Maven 2 and Ant's XMLTask to modify XML files

When we talk about software development, it's not only about writing code (high-quality code, for sure). It's also about a bunch of supporting processes like automated building, testing, deployment, integration, ... In this blog I am trying to touch on every aspect, so this post starts a series of articles about building Java projects with Apache Maven 2. The Maven web site has very good documentation, so I will skip the introductory part and concentrate on some practical issues which arise quite often.

Suppose you have XML configuration files and, depending on the build profile, you have to modify some parameters (database server address, JMS endpoints, ...). How do you do that with Apache Maven 2? Quite easily, using ... the Apache Ant integration for Apache Maven 2 (the maven-antrun-plugin). There is an excellent and very powerful Ant task for working with XML files: XMLTask. Let's make use of it!

<profiles>
  <profile>
    <id>testing</id>
    <build>
      <plugins>
        <plugin>
          <artifactId>maven-antrun-plugin</artifactId>
          <dependencies>
            <dependency>
              <groupId>com.oopsconsultancy</groupId>
              <artifactId>xmltask</artifactId>
              <version>1.14</version>
            </dependency>
          </dependencies>
          <executions>
            <execution>
              <phase>prepare-package</phase>
              <goals>
                <goal>run</goal>
              </goals>
              <configuration>
                <tasks>
                  <echo message="Using testing configuration" />
                  <taskdef name="xmltask"
                           classname="com.oopsconsultancy.xmltask.ant.XmlTask"
                           classpathref="maven.plugin.classpath"/>
                  <xmltask
                      source="${project.basedir}/src/main/webapp/WEB-INF/web.xml"
                      dest="${project.build.directory}/web.xml"
                      preserveType="true">
                    <remove path="//*[@id='<some id here>']" />
                  </xmltask>
                </tasks>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>

What this simple fragment does: for testing builds (activated with -Ptesting), it removes from web.xml all XML elements whose id attribute is <some id here> and writes the result to ${project.build.directory}/web.xml. Not very meaningful, but it gives an idea of how it works. XMLTask can do almost everything you need: insert/remove elements and XML fragments, insert/remove/modify attributes with values and properties, copy/cut/paste XML, and a lot more. I found it extremely useful.
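XMLTask's remove path is an ordinary XPath expression. To check what an expression like //*[@id='...'] matches before putting it into a build, the same removal can be approximated with the JDK's DOM and XPath APIs. This is only a sketch of the idea, not how XMLTask is implemented; the class and method names are hypothetical, and the id value is assumed not to contain a single quote.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// JDK-only approximation of <remove path="//*[@id='...']"/> (hypothetical helper, not XMLTask code)
class RemoveById {
    static String removeById(String xml, String id) throws Exception {
        // Not namespace-aware, so //* also matches elements in web.xml's default namespace
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Same kind of expression as in the POM; assumes the id contains no single quote
        NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='" + id + "']", doc, XPathConstants.NODESET);

        // Collect first, then detach, so the result list is not modified while iterating
        List<Node> toRemove = new ArrayList<Node>();
        for (int i = 0; i < matches.getLength(); i++) {
            toRemove.add(matches.item(i));
        }
        for (Node node : toRemove) {
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node);
            }
        }

        // Serialize the modified document back to a string
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Running it against a copy of web.xml with the id you plan to use is a quick way to confirm the XPath selects exactly the elements you expect.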

Sunday, April 18, 2010

On the wave of RIA, Adobe Flex and Java

This post will not be very technical, but I would like to share some of my experience with Internet application development.

I have been involved in web application development for quite a few years. I started with PHP, then moved to ASP.NET, then to JSF, then AJAX diluted all that stuff, and finally I moved to Adobe Flex. The trend is obvious: web applications must be as close to their desktop counterparts as possible. Adobe Flex is really cool, very coooool ... I haven't played with Microsoft Silverlight and JavaFX much, but it's all about the same.

The richer web applications become, the more features are requested from them. For developers it's a whole new world to explore. My current project is built on top of Adobe Flex and Java. It is worth saying that Adobe Flex and Java integrate very well via the BlazeDS (open source) or LCDS (commercial) bridges. SpringSource provides excellent support for Flex and BlazeDS development by means of the Spring BlazeDS Integration project.

What is all this about? Developing RIAs on top of the Java platform is a challenge which requires the developer to engage with a whole new technology stack. It's something which can't be done with the pure Java platform alone. JavaFX is coming, but too late. Will it be successful?

Nevertheless, I would like to encourage developers to consider Adobe Flex for their next web project. It's worth the time you will spend on it.

Saturday, January 30, 2010

Testing servlets with Spring

So far I haven't had a need to test servlets within a Spring Framework environment. But the issue came up recently, and I am going to share my experience testing a file upload servlet based on Apache Commons FileUpload and Spring.

Let's start with the file upload servlet implementation. I will omit some unnecessary details and concentrate on two issues: getting the application context and retrieving/saving the file to disk.
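import java.io.IOException;
import java.util.Iterator;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.springframework.context.ApplicationContext;
import org.springframework.web.context.support.WebApplicationContextUtils;

public class FileUploadServlet extends HttpServlet {
    @Override
    public void doPost( HttpServletRequest request, HttpServletResponse response )
            throws ServletException, IOException {
        ApplicationContext appContext = WebApplicationContextUtils
            .getRequiredWebApplicationContext( getServletContext() );
        // Get some beans here from application context ...

        DiskFileItemFactory factory = new DiskFileItemFactory();
        ServletFileUpload upload = new ServletFileUpload( factory );

        try {
            Iterator< ? > iter = upload.parseRequest( request ).iterator();
            while( iter.hasNext() ) {
                FileItem item = ( FileItem )iter.next();
                if( !item.isFormField() ) {
                    // store items here ...
                }
            }
            response.setStatus( HttpServletResponse.SC_OK );
        } catch( Exception e ) {
            response.setStatus( HttpServletResponse.SC_INTERNAL_SERVER_ERROR );
        } finally {
            response.flushBuffer();
        }
    }
}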
The servlet is ready. Let's develop a test case to verify it. There are basically three steps:
  • create a mock request (and response)
  • create a servlet instance and pass the Spring application context to it
  • wrap the file into the request and call the servlet's doPost()
The code fragment below shows how easily this can be done using Spring's testing scaffolding (thanks again, Spring team).
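import static org.junit.Assert.assertEquals;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import javax.servlet.http.HttpServletResponse;

import org.junit.Before;
import org.junit.Test;
import org.springframework.mock.web.MockHttpServletRequest;
import org.springframework.mock.web.MockHttpServletResponse;
import org.springframework.mock.web.MockServletConfig;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractJUnit4SpringContextTests;
import org.springframework.web.context.WebApplicationContext;
import org.springframework.web.context.support.StaticWebApplicationContext;

// Without explicit locations this loads UploadServletTestCase-context.xml next to the class
@ContextConfiguration
public class UploadServletTestCase extends AbstractJUnit4SpringContextTests {
    private byte[] buffer;

    @Before
    public void setUp() throws Exception {
        // Load file content from resource; read until EOF because available() is not reliable
        final InputStream in = getClass().getResourceAsStream( "test.pdf" );
        try {
            final ByteArrayOutputStream content = new ByteArrayOutputStream();
            final byte[] chunk = new byte[ 4096 ];
            int read;
            while( ( read = in.read( chunk ) ) != -1 ) {
                content.write( chunk, 0, read );
            }
            buffer = content.toByteArray();
        } finally {
            in.close();
        }
    }

    @Test
    public void testFileUpload() throws Exception {
        // create mock servlet config and pass Spring application context to it
        MockServletConfig sc = new MockServletConfig();
        StaticWebApplicationContext ctx = new StaticWebApplicationContext();
        ctx.setParent( applicationContext );
        ctx.setServletContext( sc.getServletContext() );
        ctx.refresh();
        sc.getServletContext().setAttribute(
            WebApplicationContext.ROOT_WEB_APPLICATION_CONTEXT_ATTRIBUTE, ctx );

        // create mock request (and response)
        MockHttpServletRequest request = new MockHttpServletRequest( "POST", "/upload" );
        MockHttpServletResponse response = new MockHttpServletResponse();

        // wrap file into request: the boundary is "---1234", so each delimiter is "--" + boundary
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            out.write( String.format( "-----1234\r\n"
                + "Content-Disposition: form-data; name=\"%s\"; filename=\"%s\"\r\n"
                + "Content-Type: %s\r\n"
                + "\r\n", "textField", "test.pdf", "application/pdf" ).getBytes( "UTF-8" ) );
            out.write( buffer );
            // the closing delimiter needs a trailing "--"
            out.write( "\r\n-----1234--\r\n".getBytes( "UTF-8" ) );
            out.flush();
            request.setContentType( "multipart/form-data; boundary=---1234" );
            request.setContent( out.toByteArray() );
        } finally {
            out.close();
        }

        // create servlet instance and call doPost()
        FileUploadServlet servlet = new FileUploadServlet();
        servlet.init( sc );
        servlet.doPost( request, response );

        assertEquals( HttpServletResponse.SC_OK, response.getStatus() );
        // do some checks to ensure file has been stored ...
    }
}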
The test case is ready. Depending on your upload storage strategy (disk, database, Amazon S3, ...), the test case should be extended to ensure that the upload servlet has stored the file in the proper location.
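The fiddliest part of the test is the hand-written multipart body. For reference, here is a small helper that builds a single-file multipart/form-data body following the RFC 2046 framing: every part starts with "--" plus the boundary, and the body ends with "--" plus the boundary plus a trailing "--". The class and method names are hypothetical; this is a sketch, not part of Spring or Commons FileUpload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Builds a one-file multipart/form-data body (hypothetical helper)
class MultipartBody {
    static byte[] singleFile(String boundary, String fieldName, String fileName,
                             String contentType, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Opening delimiter: "--" + boundary, then the part headers and an empty line
        String headers = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n";
        out.write(headers.getBytes("UTF-8"));
        out.write(content);
        // Closing delimiter: CRLF + "--" + boundary + "--"
        out.write(("\r\n--" + boundary + "--\r\n").getBytes("UTF-8"));
        return out.toByteArray();
    }

    // Value for the request's Content-Type header
    static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }
}
```

In the test, request.setContentType( MultipartBody.contentType( "---1234" ) ) and request.setContent( MultipartBody.singleFile( "---1234", "textField", "test.pdf", "application/pdf", buffer ) ) could replace the hand-assembled stream, which keeps the boundary in one place.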