Friday, November 19, 2010

Running a Pentaho transformation stored in a DB repository from Java

Maven configuration required for Kettle

<properties>
        <pentaho.kettle.version>4.0.1-GA</pentaho.kettle.version>
</properties>


<dependencies>
    <dependency>
        <groupId>pentaho.kettle</groupId>
        <artifactId>kettle-core</artifactId>
        <version>${pentaho.kettle.version}</version>
    </dependency>
    <dependency>
        <groupId>pentaho.kettle</groupId>
        <artifactId>kettle-db</artifactId>
        <version>${pentaho.kettle.version}</version>
    </dependency>
    <dependency>
        <groupId>pentaho.kettle</groupId>
        <artifactId>kettle-engine</artifactId>
        <version>${pentaho.kettle.version}</version>
    </dependency>
    <dependency>
        <groupId>pentaho.kettle</groupId>
        <artifactId>kettle-ui-swt</artifactId>
        <version>${pentaho.kettle.version}</version>
    </dependency>
    <dependency>
        <groupId>pentaho-library</groupId>
        <artifactId>libformula</artifactId>
        <version>1.1.7</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
        <version>2.5.16</version>
    </dependency>
    <dependency>
        <groupId>rhino</groupId>
        <artifactId>js</artifactId>
        <version>1.7R2</version>
    </dependency>
    <dependency>
        <groupId>javax.mail</groupId>
        <artifactId>mail</artifactId>
        <version>1.4.1</version>
    </dependency>
    <!-- Assuming the transformation is stored in a MySQL database -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.17</version>
    </dependency>
</dependencies>
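Note: Kettle artifacts are not published to Maven Central, so the Pentaho Maven repository should also be declared as a custom project repository. A sketch (the `id` is an arbitrary name; also note that newer Kettle versions changed the groupId from `pentaho.kettle` to `pentaho-kettle`):

```xml
<repositories>
    <repository>
        <id>pentaho</id>
        <url>http://repository.pentaho.org/artifactory/pentaho</url>
    </repository>
</repositories>
```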


Steps for running a Kettle transformation from Java code

Assumption: the transformations are stored in a database repository.

  • Initialize the Kettle environment

                     KettleEnvironment.init();

  • Create an instance of the Kettle database repository
                     KettleDatabaseRepository repository = new KettleDatabaseRepository();
  • Create an instance of DatabaseMeta

                    DatabaseMeta databaseMeta = new DatabaseMeta(
      type, type, "", host, databaseName, port, userName, password );

                      type --> database type, e.g. "MySQL"
                      host --> database server host name
                      databaseName --> schema/database name
                      port --> database port, e.g. "3306" for MySQL
                      userName, password --> credentials used to connect to the given database

  • Create an instance of the Kettle repository meta
                      KettleDatabaseRepositoryMeta kettleDatabaseMeta = new KettleDatabaseRepositoryMeta(
      repositoryId, repositoryName, "Repository description", databaseMeta );

                     repositoryId --> repository ID
                     repositoryName --> repository name
                     databaseMeta --> the DatabaseMeta defined above
  • Initialize the repository

                     repository.init( kettleDatabaseMeta );
  • Connect to the repository. The default user name/password is "admin"/"admin".

                     repository.connect( repUserName, repPassword );
  • Load the root directory

                     RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
  • Read the saved transformation metadata using the directory and the transformation name
                     TransMeta transformationMeta = repository.loadTransformation(
        transformationName, directory, null, true, null ) ;
  • Create a new transformation object using the transformation meta

                     Trans trans = new Trans( transformationMeta );

  • Set any transformation parameters 

                      trans.setParameterValue( parameterName, parameterValue);

  • Execute the transformation

                     trans.execute( null ); // You can pass arguments instead of null.
  • Wait until the transformation finishes

                     trans.waitUntilFinished();
  • Check for any errors during the execution
                     if ( trans.getErrors() > 0 ) {
                               // Errors running transformation.
                     } else {
                               // Transformation ran successfully.
                     }
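Putting the steps above together, here is a minimal sketch. The class name, connection settings, repository name, and transformation name are all illustrative placeholders, and the import paths assume the Kettle 4.x package layout; adjust them for your environment.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.repository.RepositoryDirectoryInterface;
import org.pentaho.di.repository.kdr.KettleDatabaseRepository;
import org.pentaho.di.repository.kdr.KettleDatabaseRepositoryMeta;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunRepositoryTransformation {

    public static void main( String[] args ) throws Exception {
        // Initialize the Kettle environment (loads core steps and plugins)
        KettleEnvironment.init();

        // Describe the MySQL database that hosts the repository
        DatabaseMeta databaseMeta = new DatabaseMeta(
            "MySQL", "MySQL", "", "localhost", "kettle_repo", "3306", "dbUser", "dbPassword" );

        // Point a database repository at it and connect (default credentials: admin/admin)
        KettleDatabaseRepositoryMeta kettleDatabaseMeta = new KettleDatabaseRepositoryMeta(
            "1", "myRepo", "Repository description", databaseMeta );
        KettleDatabaseRepository repository = new KettleDatabaseRepository();
        repository.init( kettleDatabaseMeta );
        repository.connect( "admin", "admin" );

        // Load the transformation metadata from the repository root directory
        RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
        TransMeta transformationMeta = repository.loadTransformation(
            "myTransformation", directory, null, true, null );

        // Create and run the transformation
        Trans trans = new Trans( transformationMeta );
        trans.setParameterValue( "someParameter", "someValue" ); // optional
        trans.execute( null ); // You can pass arguments instead of null.
        trans.waitUntilFinished();

        if ( trans.getErrors() > 0 ) {
            System.out.println( "Errors running transformation." );
        } else {
            System.out.println( "Transformation ran successfully." );
        }
    }
}
```

Because this depends on the Kettle jars and a live repository database, it is a sketch of the call sequence rather than something you can run standalone.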

32 comments:

  1. This was a really good example for kettle.
    Thanks a lot!!!

  2. Hi Ameeth.
    First of all, thank you for creating this post, as the example on the Pentaho wiki isn't working with Kettle 4.
    Can you provide some more steps for creating a HelloWorld kind of example for Kettle?

    The example you provided lists the steps; I am looking for the Java code.
    Thanks in Advance.

    Regards
    Abhijeet

  3. Hi Abhijeet, please check the post below for a Hello World kind of example:

    http://ameethpaatil.blogspot.com/2011/08/hello-world-with-pentaho.html

  4. thank you Ameeth for posting this example.

  5. How do you select the database driver? DatabaseMeta tries to load it as a plugin; how is this configured via this Maven example? Thanks for the simple example.

  6. If the repository database is, say, MySQL, then we need to add the below dependency in the Maven POM file:

    <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.17</version>
    </dependency>

    The driver is picked based on the type in DatabaseMeta. If the Kettle transformation accesses data from a database other than MySQL, that driver also needs to be added to the dependencies.

    Thanks for pointing this out; I will update the article to include the MySQL dependency.

  7. Hi Ameeth,
    Thank you for this post.

    While trying to run a transformation from Java, the following exception occurred. I am using a MySQL DB and am looking for a solution.

    Thanks in Advance.

    Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)

  8. Check your settings; something is trying to load the Oracle driver: "Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver".

  9. Hi,
    useful post.
    I'm using it for embedding Kettle 4.2.1 and I wish to point out some differences in Maven configuration.

    groupId of Kettle artifacts has changed to "pentaho-kettle" (was "pentaho.kettle")

    I will also add the not-so-obvious note that the Kettle Maven repo is at http://repository.pentaho.org/artifactory/pentaho and should be added as a custom project repository.

    Replies
    1. Ameeth,
      Great post, thank you.
      Got a follow-up question. Suppose my Kettle transformation throws an exception, something as trivial as a JavaScript error. I would like that error to show up in the app logs of the calling Java program; I see it in Sys.out, not in the app log. Can you think of anything I am missing?

      Thanks in advance.

    2. I have also faced the same issue. The Kettle libraries use a wrapper over the standard logging frameworks, so it is a bit difficult to configure this. By default output is redirected to the console. Need to check if this has been addressed in the latest versions of the Kettle jars.

  10. I am loading a .ktr file created by the Data Integration tool.

    When I run the application I am getting this exception:
    Unable to load step info from XML step node org.pentaho.di.core.exception.KettleStepLoaderException:
    Unable to load class for step/plugin with id [MongoDbInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.



    Someone suggested:

    Please check if you are adding VM argument while running:

    -DKETTLE_PLUGIN_BASE_FOLDERS=D:/LOCATION/data-integration/plugins

    Where do I need to add the plugin?

  11. Ameeth,
    Is it mandatory to have the Maven dependencies? I am running an Eclipse project and added all the dependent jars to the project. I added all the meta information like this:

    DatabaseMeta databaseMeta = new DatabaseMeta("SQLSERVER");
    databaseMeta.setHostname("5346ZH1");
    databaseMeta.setDBPort("1433");
    databaseMeta.setDBName("KettleRep");
    databaseMeta.setUsername("____");
    databaseMeta.setPassword("____");


    When I try to run the program, I get the exception:
    [Fatal Error] :1:1: Content is not allowed in prolog.
    Exception in thread "main" org.pentaho.di.core.exception.KettleXMLException:
    Error reading information from XML string :
    SQLSERVER
    Content is not allowed in prolog.

    I tried passing MSQL too, but it didn't work. How do I pass the database type as SQL Server without running into exceptions?

    Replies
    1. It doesn't need to be a Maven project.

      For SQL Server, try with "MSSQLServer" (not sure).

  12. I am trying to connect to a SQL Server database, but am getting the error
    Exception in thread "main" java.lang.RuntimeException: Database type not found!

    Can you please provide an example of the databaseMeta line that should be used for SQL Server databases?

    Thanks

  13. Hey Ameeth. I want to ask some questions.

    Is the repository necessary? What if I don't have one?
    I want to connect my transformation to my PostgreSQL database, but I don't have a repository yet; do I have to make one first?
    And one more thing: have you tried running a transformation involving a NoSQL database? Can you help me with this too?

    Thanks

    Replies
    1. A repository is not necessary. Check the simple example of running a Pentaho transformation from Java: http://ameethpaatil.blogspot.in/2011/08/hello-world-with-pentaho.html.
      I have not tried NoSQL, but the latest version does support NoSQL integration.

  14. Exception in thread "Thread-10" java.lang.NoClassDefFoundError: org/pentaho/metastore/api/IMetaStore
    at atrix.ace.service.JobExecutorServiceImpl.runJobKettleTransformation(JobExecutorServiceImpl.java:183)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:196)
    at com.sun.proxy.$Proxy39.runJobKettleTransformation(Unknown Source)
    at atrix.ace.service.Tasklet.run(Tasklet.java:49)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ClassNotFoundException: org.pentaho.metastore.api.IMetaStore
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1284)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1132)

    Replies
    1. Are you trying this example with version 4.0.1-GA?

    2. This example works with 4.0.1-GA. If you are using version 5.0.1 then you may need to include some extra dependencies. The exception is thrown for the org/pentaho/metastore/api/IMetaStore class, which is present in the Pentaho metastore jar, so you may need to include the below dependency:

      <dependency>
          <groupId>pentaho</groupId>
          <artifactId>metastore</artifactId>
          <version>5.0.1</version>
      </dependency>

  15. Hi Ameeth

    I am trying to run a .ktr file which has a Hadoop output step through Java, and I get the following error. The same ktr runs fine when I run it from Spoon. How do I ensure that it runs fine from my Java code as well?

    2015/11/18 13:47:23 - Property_Validation - ERROR (version 5.3.0.0-200, build 1 from 2015-01-20_19-50-27 by buildguy) : org.pentaho.di.core.exception.KettleException:
    2015/11/18 13:47:23 - Property_Validation - Unexpected error during transformation metadata load
    2015/11/18 13:47:23 - Property_Validation -
    2015/11/18 13:47:23 - Property_Validation - Missing plugins found while loading a transformation
    2015/11/18 13:47:23 - Property_Validation -
    2015/11/18 13:47:23 - Property_Validation - Step : HadoopFileOutputPlugin
    2015/11/18 13:47:23 - Property_Validation -
    2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1205)
    2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:648)
    2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job.execute(Job.java:716)
    2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job.access$000(Job.java:115)
    2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job$1.run(Job.java:835)
    2015/11/18 13:47:23 - Property_Validation - at java.lang.Thread.run(Thread.java:745)
    2015/11/18 13:47:23 - Property_Validation - Caused by: org.pentaho.di.core.exception.KettleMissingPluginsException:
    2015/11/18 13:47:23 - Property_Validation - Missing plugins found while loading a transformation
    2015/11/18 13:47:23 - Property_Validation -

    I have already added the below big data plugin dependency into my pom.xml file, but I still get the above error.

    <dependency>
        <groupId>pentaho</groupId>
        <artifactId>pentaho-big-data-plugin</artifactId>
        <version>5.2.1.0-148</version>
    </dependency>


    Can you please help ?

    Thanks
    Koushik

  16. Hi,
    I am able to execute the ktr file successfully, but I want to know the audit log, like how many records were inserted or processed, and other information. Could you please help me to get that audit log?

  17. Hi Ameeth,

    I created a ktr file using Spoon; inside the ktr I need to know how to read an Excel sheet dynamically.

    If needed, I can send my ktr.

    My scenario:

    Bulk upload (Excel) to DB using Pentaho.

    This is very urgent; can you help me?


    Thanks,
    Vincent Louis.

    Replies
    1. The Excel file comes from a (JSP, Spring) Java application.

    2. Getting the below error:

      ERROR 25-01 10:16:14,965 - DMC Usage Data DB.0 - org.pentaho.di.core.exception.KettleDatabaseBatchException:
      Error updating batch
      Batch entry 0 INSERT INTO dmcusage (id, report_date, file_download, ui_application, collection_id, asset_id, user_id, page_views, user_country, collection_type, design_element, isasset, version_flag, ui_application_flag, collection_name, user_region_name, report_year_segments, segments, season, artwork_property, published_date, collection_region, collection_lob, org_name, asset_lob_owner, dcp_nondcp, internal_external, ldap_pin, property_flag) VALUES ( NULL, '2015-11-1 0:0:0.000000 +5:30:0', 'CC247_1.pdf', 'DMC Version 4.70.49', NULL, NULL, 'WRIGE029', 0, 'Australia', NULL, NULL, 'No', 1, 'M', NULL, 'North America', NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'null', NULL, 'DCP', 'INTERNAL', 'null', NULL) was aborted. Call getNextException to see the cause.

      at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1403)
      at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose(TableOutput.java:611)
      at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2918)
      at org.pentaho.di.trans.steps.tableoutput.TableOutput.run(TableOutput.java:680)
      Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO dmcusage (id, report_date, file_download, ui_application, collection_id, asset_id, user_id, page_views, user_country, collection_type, design_element, isasset, version_flag, ui_application_flag, collection_name, user_region_name, report_year_segments, segments, season, artwork_property, published_date, collection_region, collection_lob, org_name, asset_lob_owner, dcp_nondcp, internal_external, ldap_pin, property_flag) VALUES ( NULL, '2015-11-1 0:0:0.000000 +5:30:0', 'CC247_1.pdf', 'DMC Version 4.70.49', NULL, NULL, 'WRIGE029', 0, 'Australia', NULL, NULL, 'No', 1, 'M', NULL, 'North America', NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'null', NULL, 'DCP', 'INTERNAL', 'null', NULL) was aborted. Call getNextException to see the cause.
      at org.postgresql.jdbc.PgStatement$BatchResultHandler.handleError(PgStatement.java:2356)
      at org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:395)
      at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1912)
      at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:338)
      at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:2534)
      at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1387)
      ... 3 more

    3. Not sure what this error can be, but check whether it is a database constraint error, since the INSERT statement passes NULL for the id column.

  18. Hello
    If I need to send several values for the same parameter,
    how can I do this?


    Set any transformation parameters

    trans.setParameterValue( parameterName, parameterValue);
    or
    trans.setVariable("parameterName", parameterValue);

    What would be the correct call? The ones mentioned do not work for me; I can only send a single value per parameter.

    Thanks!!!!

    Replies
    1. You can create a CSV file for the list of values and send the file name as a parameter. Use the CSV file input step to read from the file in your transformation.

    2. Thanks for answering.

      I tried this,
      but I get the following error:

      ERROR 26-10 08:25:56,957 - Text file input - Error initializing step [Text file input]
      ERROR 26-10 08:25:56,965 - Text file input - java.lang.NoSuchMethodError: org.pentaho.di.core.fileinput.FileInputList.createFileList(Lorg/pentaho/di/core/variables/VariableSpace;[Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;[Z)Lorg/pentaho/di/core/fileinput/FileInputList;
      org.pentaho.di.trans.steps.textfileinput.TextFileInputMeta.getTextFileList(TextFileInputMeta.java:1610)
      org.pentaho.di.trans.steps.textfileinput.TextFileInput.init(TextFileInput.java:1612)
      org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
      java.lang.Thread.run(Thread.java:745)

      ERROR 26-10 08:25:57,405 - Lecturas_Logica - Step [Text file input.0] failed to initialize!
