Maven configuration required for Kettle
<properties>
  <pentaho.kettle.version>4.0.1-GA</pentaho.kettle.version>
</properties>
<dependencies>
  <dependency>
    <groupId>pentaho.kettle</groupId>
    <artifactId>kettle-core</artifactId>
    <version>${pentaho.kettle.version}</version>
  </dependency>
  <dependency>
    <groupId>pentaho.kettle</groupId>
    <artifactId>kettle-db</artifactId>
    <version>${pentaho.kettle.version}</version>
  </dependency>
  <dependency>
    <groupId>pentaho.kettle</groupId>
    <artifactId>kettle-engine</artifactId>
    <version>${pentaho.kettle.version}</version>
  </dependency>
  <dependency>
    <groupId>pentaho.kettle</groupId>
    <artifactId>kettle-ui-swt</artifactId>
    <version>${pentaho.kettle.version}</version>
  </dependency>
  <dependency>
    <groupId>pentaho-library</groupId>
    <artifactId>libformula</artifactId>
    <version>1.1.7</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>2.5.16</version>
  </dependency>
  <dependency>
    <groupId>rhino</groupId>
    <artifactId>js</artifactId>
    <version>1.7R2</version>
  </dependency>
  <dependency>
    <groupId>javax.mail</groupId>
    <artifactId>mail</artifactId>
    <version>1.4.1</version>
  </dependency>
  <!-- Assuming the transformation is stored in a MySQL database -->
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.17</version>
  </dependency>
</dependencies>
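Note that these artifacts are not hosted on Maven Central. As an assumption based on a reader comment below, you may also need to declare the Pentaho repository in your POM (verify the URL for your version):

```xml
<!-- Assumption: custom repository hosting the Kettle artifacts -->
<repositories>
  <repository>
    <id>pentaho</id>
    <url>http://repository.pentaho.org/artifactory/pentaho</url>
  </repository>
</repositories>
```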
Steps for running the Kettle transformation from Java code
Assumption: the transformations are stored in a database repository
- Initialize the kettle environment
KettleEnvironment.init();
- Create an instance of the Kettle database repository
KettleDatabaseRepository repository = new KettleDatabaseRepository();
- Create an instance of the database meta
DatabaseMeta databaseMeta = new DatabaseMeta(
type, type, "", host, databaseName, port, userName, password );
type --> Database type, e.g. "MySql"
host --> Database server host name
databaseName --> Schema name
port --> Database port, e.g. 3306 for MySQL
userName, password --> Credentials to connect to the given database
- Create an instance of the Kettle repository meta
KettleDatabaseRepositoryMeta repositoryMeta = new KettleDatabaseRepositoryMeta(
repositoryId, repositoryName, "Transformation description", databaseMeta );
repositoryId --> Repository Id
repositoryName --> Repository name
databaseMeta --> DatabaseMeta defined above
- Initialize the repository
repository.init( repositoryMeta );
- Connect to the repository (the default user name/password is "admin"/"admin")
repository.connect( repUserName, repPassword );
- Load the root directory
RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
- Read the saved transformation metadata using the directory and the transformation name
TransMeta transformationMeta = repository.loadTransformation(
transformationName, directory, null, true, null );
- Create a new transformation object using the transformation meta
Trans trans = new Trans( transformationMeta );
- Set any transformation parameters
trans.setParameterValue( parameterName, parameterValue);
- Execute the transformation
trans.execute( null ); // You can pass arguments instead of null.
- Wait until the transformation finishes
trans.waitUntilFinished();
- Check for any errors during the execution
if ( trans.getErrors() > 0 ) {
//Errors running the transformation.
} else {
//Transformation ran successfully.
}
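The steps above can be put together into one sketch. This is a minimal, hedged example against the Kettle 4.x API as described in this post; the connection details, repository id/name, credentials, and transformation name are placeholder assumptions that must be replaced with your own values:

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.repository.RepositoryDirectoryInterface;
import org.pentaho.di.repository.kdr.KettleDatabaseRepository;
import org.pentaho.di.repository.kdr.KettleDatabaseRepositoryMeta;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunKettleTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment
        KettleEnvironment.init();

        // Placeholder connection details -- adjust for your environment
        DatabaseMeta databaseMeta = new DatabaseMeta(
                "kettle-repo", "MySql", "", "localhost", "kettle_repo", "3306",
                "dbUser", "dbPassword");

        // Placeholder repository id, name, and description
        KettleDatabaseRepositoryMeta repositoryMeta = new KettleDatabaseRepositoryMeta(
                "KettleRepo", "KettleRepo", "Transformation description", databaseMeta);

        KettleDatabaseRepository repository = new KettleDatabaseRepository();
        repository.init(repositoryMeta);
        repository.connect("admin", "admin");  // default repository credentials

        // Load the root directory and the saved transformation (placeholder name)
        RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
        TransMeta transformationMeta = repository.loadTransformation(
                "MyTransformation", directory, null, true, null);

        Trans trans = new Trans(transformationMeta);
        trans.setParameterValue("someParameter", "someValue");  // optional
        trans.execute(null);  // arguments can be passed instead of null
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            System.err.println("Errors running the transformation.");
        } else {
            System.out.println("Transformation ran successfully.");
        }
    }
}
```

This sketch requires the Kettle jars from the Maven configuration above on the classpath, plus the JDBC driver for the repository database.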
This was a really good example for Kettle.
Thanks a lot!!!
Hi Ameeth.
First of all, thank you for creating this post, as the example on the Pentaho wiki isn't working with Kettle 4.
Can you provide some more steps for creating a HelloWorld kind of example for Kettle?
What you provided are the steps; I am looking for the Java code.
Thanks in Advance.
Regards
Abhijeet
Hi Abhijeet, please check the below post for the Hello World kind of example:
http://ameethpaatil.blogspot.com/2011/08/hello-world-with-pentaho.html
Thank you Ameeth for posting this example.
How do you select the database driver? DatabaseMeta tries to load it as a plugin; how is this configured via this Maven example? Thanks for the simple example.
If the transformation database is, say, MySQL, then we need to add the below dependency in the Maven POM file:
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.17</version>
</dependency>
The driver is picked based on the type in DatabaseMeta. If the Kettle transformation accesses data from a database other than MySQL, then that driver also needs to be added to the dependencies.
Thanks for pointing this out; I will update the article to include the MySQL dependency.
Hi Ameeth,
Thank you for this post.
While trying to run a transformation from Java, the following exception occurred. I am using a MySQL DB. I am looking for a solution.
Thanks in Advance.
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
Check your settings; it is trying to load the Oracle driver: "Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver".
Help me.
Where do I download the API?
Hi,
Useful post.
I'm using it for embedding Kettle 4.2.1 and I wish to point out some differences in Maven configuration.
The groupId of Kettle artifacts has changed to "pentaho-kettle" (it was "pentaho.kettle").
I will also add the not-so-obvious note that the Kettle Maven repo is at http://repository.pentaho.org/artifactory/pentaho and should be added as a custom project repository.
Ameeth,
Great post, thank you.
Got a follow-up question. Suppose my Kettle transformation throws an exception, something as trivial as a JavaScript error. I would like that error to show up in the app logs of the calling Java program; I see it in Sys.out, not in the app log. Can you think of anything I am missing?
Thanks in advance.
I have also faced the same issue. The Kettle libraries use a wrapper over the standard logging frameworks, so it is a bit difficult to configure. By default the output is redirected to the console. Need to check if this has been addressed in the latest versions of the Kettle jars.
I am loading a .ktr file created by the Data Integration tool.
When I run the application I am getting this exception:
Unable to load step info from XML step node: org.pentaho.di.core.exception.KettleStepLoaderException:
Unable to load class for step/plugin with id [MongoDbInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.
Someone suggested:
Please check if you are adding the VM argument while running:
-DKETTLE_PLUGIN_BASE_FOLDERS=D:/LOCATION/data-integration/plugins
Where do I need to add the plugin?
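For reference, a hedged sketch of how the suggested VM argument is passed when launching an application from the command line (the jar name, main class, and plugin path below are placeholders):

```shell
# Placeholder paths and names: pass the plugin folder property at JVM startup
java -DKETTLE_PLUGIN_BASE_FOLDERS=/opt/data-integration/plugins \
     -cp myapp.jar com.example.Main
```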
Ameeth,
Is it mandatory to have the Maven dependencies? I am running an Eclipse project and added all the dependent jars to the project. Now I added all the meta information like this:
DatabaseMeta databaseMeta = new DatabaseMeta("SQLSERVER");
databaseMeta.setHostname("5346ZH1");
databaseMeta.setDBPort("1433");
databaseMeta.setDBName("KettleRep");
databaseMeta.setUsername("____");
databaseMeta.setPassword("____");
When I try to run the program, I get the exception:
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.pentaho.di.core.exception.KettleXMLException:
Error reading information from XML string :
SQLSERVER
Content is not allowed in prolog.
I tried passing MSQL too, but it didn't work. So how do I pass the database type as SQL Server without running into exceptions?
It doesn't need to be a Maven project.
For SQL Server, try with "MSSQLServer" (not sure).
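One likely cause of the "Content is not allowed in prolog" error quoted above ("Error reading information from XML string: SQLSERVER"): the single-argument DatabaseMeta(String) constructor parses its argument as an XML connection fragment, not as a type name. A hedged sketch using the multi-argument constructor instead (host and database name are taken from the question; the credentials are placeholders, and the type string should be verified against your Kettle version):

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.database.DatabaseMeta;

public class SqlServerMetaExample {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        // Use the multi-argument constructor; DatabaseMeta(String) expects an
        // XML string, which is why passing "SQLSERVER" fails to parse.
        DatabaseMeta databaseMeta = new DatabaseMeta(
                "KettleRep",    // connection name
                "MSSQLServer",  // database type id (suggested above; verify)
                "Native",       // access type
                "5346ZH1",      // host from the question
                "KettleRep",    // database name
                "1433",         // port
                "user", "password");  // placeholder credentials
        System.out.println(databaseMeta.getDatabaseName());
    }
}
```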
I am trying to connect to a SQL Server database, but am getting the error
Exception in thread "main" java.lang.RuntimeException: Database type not found!
Can you please provide an example of the databaseMeta line that should be used for SQL Server databases?
Thanks
Try with "MSSQLServer"
Hey Ameeth, I want to ask some questions.
Is the repository necessary? What if I don't have one?
I want to connect my transformation to my PostgreSQL database, but I don't have a repository yet; do I have to create one first?
And one more thing: have you tried running a transformation involving a NoSQL database? Can you help me with this too?
Thanks
Repository is not necessary. Check the simple example of running a Pentaho transformation from Java: http://ameethpaatil.blogspot.in/2011/08/hello-world-with-pentaho.html.
I have not tried NoSQL, but the latest version does support NoSQL integration.
Exception in thread "Thread-10" java.lang.NoClassDefFoundError: org/pentaho/metastore/api/IMetaStore
at atrix.ace.service.JobExecutorServiceImpl.runJobKettleTransformation(JobExecutorServiceImpl.java:183)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:196)
at com.sun.proxy.$Proxy39.runJobKettleTransformation(Unknown Source)
at atrix.ace.service.Tasklet.run(Tasklet.java:49)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.pentaho.metastore.api.IMetaStore
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1284)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1132)
Are you trying this example with version 4.0.1-GA?
5.0.1
This example works with 4.0.1-GA. If you are using version 5.0.1, you may need to include some extra dependencies. The exception is thrown for the org/pentaho/metastore/api/IMetaStore class, which is present in the Pentaho metastore jar, so you may need to include the dependency below:
<dependency>
  <groupId>pentaho</groupId>
  <artifactId>metastore</artifactId>
  <version>5.0.1</version>
</dependency>
Hi Ameeth
I am trying to run a .ktr file which has a Hadoop output step through Java, and I get the following error. The same .ktr runs fine when I run it from Spoon. How do I ensure that it runs fine from my Java code as well?
2015/11/18 13:47:23 - Property_Validation - ERROR (version 5.3.0.0-200, build 1 from 2015-01-20_19-50-27 by buildguy) : org.pentaho.di.core.exception.KettleException:
2015/11/18 13:47:23 - Property_Validation - Unexpected error during transformation metadata load
2015/11/18 13:47:23 - Property_Validation -
2015/11/18 13:47:23 - Property_Validation - Missing plugins found while loading a transformation
2015/11/18 13:47:23 - Property_Validation -
2015/11/18 13:47:23 - Property_Validation - Step : HadoopFileOutputPlugin
2015/11/18 13:47:23 - Property_Validation -
2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1205)
2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:648)
2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job.execute(Job.java:716)
2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job.access$000(Job.java:115)
2015/11/18 13:47:23 - Property_Validation - at org.pentaho.di.job.Job$1.run(Job.java:835)
2015/11/18 13:47:23 - Property_Validation - at java.lang.Thread.run(Thread.java:745)
2015/11/18 13:47:23 - Property_Validation - Caused by: org.pentaho.di.core.exception.KettleMissingPluginsException:
2015/11/18 13:47:23 - Property_Validation - Missing plugins found while loading a transformation
2015/11/18 13:47:23 - Property_Validation -
I have already added the below big data plugin dependency into my pom.xml file but still I get the above error.
<dependency>
  <groupId>pentaho</groupId>
  <artifactId>pentaho-big-data-plugin</artifactId>
  <version>5.2.1.0-148</version>
</dependency>
Can you please help ?
Thanks
Koushik
Hi,
I am able to execute the .ktr file successfully, but I want to get the audit log, e.g. how many records were inserted or processed, and other information. Could you please help me get that audit log?
Hi Ameeth,
I created a .ktr file using Spoon; inside the .ktr I need to know how to read the Excel sheet dynamically.
If you want, I will send my .ktr.
My Scenario:
Bulk upload (Excel) to DB using Pentaho.
This is very urgent; can you help me?
Thanks,
Vincent Louis.
The Excel file comes from a (JSP, Spring) Java application.
Getting the below error:
ERROR 25-01 10:16:14,965 - DMC Usage Data DB.0 - org.pentaho.di.core.exception.KettleDatabaseBatchException:
Error updating batch
Batch entry 0 INSERT INTO dmcusage (id, report_date, file_download, ui_application, collection_id, asset_id, user_id, page_views, user_country, collection_type, design_element, isasset, version_flag, ui_application_flag, collection_name, user_region_name, report_year_segments, segments, season, artwork_property, published_date, collection_region, collection_lob, org_name, asset_lob_owner, dcp_nondcp, internal_external, ldap_pin, property_flag) VALUES ( NULL, '2015-11-1 0:0:0.000000 +5:30:0', 'CC247_1.pdf', 'DMC Version 4.70.49', NULL, NULL, 'WRIGE029', 0, 'Australia', NULL, NULL, 'No', 1, 'M', NULL, 'North America', NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'null', NULL, 'DCP', 'INTERNAL', 'null', NULL) was aborted. Call getNextException to see the cause.
at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1403)
at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose(TableOutput.java:611)
at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2918)
at org.pentaho.di.trans.steps.tableoutput.TableOutput.run(TableOutput.java:680)
Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO dmcusage (id, report_date, file_download, ui_application, collection_id, asset_id, user_id, page_views, user_country, collection_type, design_element, isasset, version_flag, ui_application_flag, collection_name, user_region_name, report_year_segments, segments, season, artwork_property, published_date, collection_region, collection_lob, org_name, asset_lob_owner, dcp_nondcp, internal_external, ldap_pin, property_flag) VALUES ( NULL, '2015-11-1 0:0:0.000000 +5:30:0', 'CC247_1.pdf', 'DMC Version 4.70.49', NULL, NULL, 'WRIGE029', 0, 'Australia', NULL, NULL, 'No', 1, 'M', NULL, 'North America', NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'null', NULL, 'DCP', 'INTERNAL', 'null', NULL) was aborted. Call getNextException to see the cause.
at org.postgresql.jdbc.PgStatement$BatchResultHandler.handleError(PgStatement.java:2356)
at org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:395)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1912)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:338)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:2534)
at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1387)
... 3 more
Not sure what this error is, but check whether it is a database constraint error; I can see the insert statement has NULL for the id column.
Hello
If I need to send several values of the same parameter, how can I do this?
Set any transformation parameters
trans.setParameterValue( parameterName, parameterValue);
or
trans.setVariable("parameterName", parameterValue);
Which would be the correct statement? The ones mentioned do not work for me; I can only send a single value per parameter.
Thanks!!!!
You can create a CSV file for the list of data and send the file name as a parameter. You can use the CSV file input step to read from the file in your transformation.
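The suggested workaround can be sketched in plain Java: write the multiple values into one file, then pass only the file path as the single parameter. The file name and the parameter name ("CSV_FILE") are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvParamHelper {
    // Write several values (one per line) into a CSV file so that a single
    // file-name parameter can carry them all into the transformation.
    public static Path writeValues(List<String> values, Path dir) throws IOException {
        Path csv = dir.resolve("param_values.csv");  // hypothetical file name
        Files.write(csv, values);
        return csv;
    }

    public static void main(String[] args) throws IOException {
        Path csv = writeValues(List.of("value1", "value2", "value3"),
                Files.createTempDirectory("kettle-params"));
        System.out.println(Files.readAllLines(csv));  // prints [value1, value2, value3]
        // The path would then be passed as the single transformation parameter:
        // trans.setParameterValue("CSV_FILE", csv.toString());  // hypothetical parameter name
    }
}
```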
Thanks for answering.
I tried to do this,
but it gives me the following error:
ERROR 26-10 08:25:56,957 - Text file input - Error initializing step [Text file input]
ERROR 26-10 08:25:56,965 - Text file input - java.lang.NoSuchMethodError: org.pentaho.di.core.fileinput.FileInputList.createFileList(Lorg/pentaho/di/core/variables/VariableSpace;[Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;[Z)Lorg/pentaho/di/core/fileinput/FileInputList;
org.pentaho.di.trans.steps.textfileinput.TextFileInputMeta.getTextFileList(TextFileInputMeta.java:1610)
org.pentaho.di.trans.steps.textfileinput.TextFileInput.init(TextFileInput.java:1612)
org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
java.lang.Thread.run(Thread.java:745)
ERROR 26-10 08:25:57,405 - Lecturas_Logica - Step [Text file input.0] failed to initialize!