Trifacta Developer Guide
Version: 6.0.2
Doc Build Date: 05/24/2019
Copyright © Trifacta Inc. 2019 - All Rights Reserved. CONFIDENTIAL
For third-party license information, please select About Trifacta from the User
menu.
1. Developer
1.1 User-Defined Functions
1.1.1 Java UDFs
1.2 Create Custom Data Types Using RegEx
1.3 Command Line Interface
1.3.1 CLI Migration to APIs
1.3.2 Install CLI Tools
1.3.3 CLI for Connections
1.3.4 CLI for Jobs
1.3.4.1 CLI Example - Parameterize Job Runs
1.3.4.2 CLI Publishing Options File
1.3.5 CLI for User Admin
1.3.6 CLI Config File
1.4 API Reference
1.4.1 API Overview
1.4.2 API Authentication
1.4.2.1 Manage API Access Tokens
1.4.3 API Endpoints
1.4.3.1 v4 Endpoints
1.4.3.1.1 API AccessTokens Create v4
1.4.3.1.2 API AccessTokens Delete v4
1.4.3.1.3 API AccessTokens Get List v4
1.4.3.1.4 API AccessTokens Get v4
1.4.3.1.5 API Connections Create DryRun v4
1.4.3.1.6 API Connections Create v4
1.4.3.1.7 API Connections Delete v4
1.4.3.1.8 API Connections Get List v4
1.4.3.1.9 API Connections Get Status v4
1.4.3.1.10 API Connections Get v4
1.4.3.1.11 API Connections Patch v4
1.4.3.1.12 API Connections Permissions Create User v4
1.4.3.1.13 API Connections Permissions Delete User v4
1.4.3.1.14 API Connections Permissions Get User v4
1.4.3.1.15 API Connections Vendors Get List v4
1.4.3.1.16 API Deployments Create v4
1.4.3.1.17 API Deployments Delete v4
1.4.3.1.18 API Deployments Get List v4
1.4.3.1.19 API Deployments Get Release List v4
1.4.3.1.20 API Deployments Get v4
1.4.3.1.21 API Deployments Object Import Rules Patch v4
1.4.3.1.22 API Deployments Patch v4
1.4.3.1.23 API Deployments Run v4
1.4.3.1.24 API Deployments Value Import Rules Patch v4
1.4.3.1.25 API EMRClusters Create v4
1.4.3.1.26 API EMRClusters Delete v4
1.4.3.1.27 API EMRClusters Get Count v4
1.4.3.1.28 API EMRClusters Get List v4
1.4.3.1.29 API EMRClusters Get v4
1.4.3.1.30 API EMRClusters Patch v4
1.4.3.1.31 API Flows Create v4
1.4.3.1.32 API Flows Delete v4
1.4.3.1.33 API Flows Get List v4
1.4.3.1.34 API Flows Get v4
1.4.3.1.35 API Flows Package Get DryRun v4
1.4.3.1.36 API Flows Package Get v4
1.4.3.1.37 API Flows Package Post DryRun v4
1.4.3.1.38 API Flows Package Post v4
1.4.3.1.39 API Flows Patch v4
1.4.3.1.40 API ImportedDatasets Create v4
1.4.3.1.41 API ImportedDatasets Delete v4
1.4.3.1.42 API ImportedDatasets Get List v4
1.4.3.1.43 API ImportedDatasets Get v4
1.4.3.1.44 API ImportedDatasets Patch v4
1.4.3.1.45 API ImportedDatasets Post AddToFlow v4
1.4.3.1.46 API JobGroups Cancel v4
1.4.3.1.47 API JobGroups Create v4
1.4.3.1.48 API JobGroups Delete v4
1.4.3.1.49 API JobGroups Get Jobs v4
1.4.3.1.50 API JobGroups Get List v4
1.4.3.1.51 API JobGroups Get Publications v4
1.4.3.1.52 API JobGroups Get Status v4
1.4.3.1.53 API JobGroups Get v4
1.4.3.1.54 API JobGroups Put Publish v4
1.4.3.1.55 API OutputObjects Create v4
1.4.3.1.56 API OutputObjects Delete v4
1.4.3.1.57 API OutputObjects Get List v4
1.4.3.1.58 API OutputObjects Get v4
1.4.3.1.59 API OutputObjects Update v4
1.4.3.1.60 API People Create v4
1.4.3.1.61 API People Delete v4
1.4.3.1.62 API People Get List v4
1.4.3.1.63 API People Get v4
1.4.3.1.64 API People Patch v4
1.4.3.1.65 API Publications Create v4
1.4.3.1.66 API Publications Delete v4
1.4.3.1.67 API Publications Get List v4
1.4.3.1.68 API Publications Get v4
1.4.3.1.69 API Publications Update v4
1.4.3.1.70 API Releases Create DryRun v4
1.4.3.1.71 API Releases Create v4
1.4.3.1.72 API Releases Delete v4
1.4.3.1.73 API Releases Get v4
1.4.3.1.74 API Releases Package Get v4
1.4.3.1.75 API Releases Patch v4
1.4.3.1.76 API WrangledDatasets Create v4
1.4.3.1.77 API WrangledDatasets Delete v4
1.4.3.1.78 API WrangledDatasets Get List v4
1.4.3.1.79 API WrangledDatasets Get PrimaryInputDataset v4
1.4.3.1.80 API WrangledDatasets Get v4
1.4.3.1.81 API WrangledDatasets Patch v4
1.4.3.1.82 API WrangledDatasets Post AddToFlow v4
1.4.3.1.83 API WrangledDatasets Put PrimaryInputDataset v4
1.4.3.1.84 API WriteSettings Create v4
1.4.3.1.85 API WriteSettings Delete v4
1.4.3.1.86 API WriteSettings Get List v4
1.4.3.1.87 API WriteSettings Get v4
1.4.3.1.88 API WriteSettings Update v4
1.4.3.2 v3 Endpoints
1.4.3.2.1 API Connections Create v3
1.4.3.2.2 API Connections Delete v3
1.4.3.2.3 API Connections Get List v3
1.4.3.2.4 API Connections Get Status v3
1.4.3.2.5 API Connections Get v3
1.4.3.2.6 API Deployments Create v3
1.4.3.2.7 API Deployments Delete v3
1.4.3.2.8 API Deployments Get List v3
1.4.3.2.9 API Deployments Get Release List v3
1.4.3.2.10 API Deployments Get v3
1.4.3.2.11 API Deployments Object Import Rules Patch v3
1.4.3.2.12 API Deployments Patch v3
1.4.3.2.13 API Deployments Run v3
1.4.3.2.14 API Deployments Value Import Rules Patch v3
1.4.3.2.15 API Flows Create v3
1.4.3.2.16 API Flows Delete v3
1.4.3.2.17 API Flows Get List v3
1.4.3.2.18 API Flows Get v3
1.4.3.2.19 API Flows Package Get DryRun v3
1.4.3.2.20 API Flows Package Get v3
1.4.3.2.21 API Flows Package Post DryRun v3
1.4.3.2.22 API Flows Package Post v3
1.4.3.2.23 API Flows Patch v3
1.4.3.2.24 API ImportedDatasets Create v3
1.4.3.2.25 API ImportedDatasets Delete v3
1.4.3.2.26 API ImportedDatasets Get List v3
1.4.3.2.27 API ImportedDatasets Get v3
1.4.3.2.28 API ImportedDatasets Post AddToFlow v3
1.4.3.2.29 API JobGroups Create v3
1.4.3.2.30 API JobGroups Delete v3
1.4.3.2.31 API JobGroups Get Jobs v3
1.4.3.2.32 API JobGroups Get List v3
1.4.3.2.33 API JobGroups Get Status v3
1.4.3.2.34 API JobGroups Get v3
1.4.3.2.35 API JobGroups Put Publish v3
1.4.3.2.36 API People Create v3
1.4.3.2.37 API People Delete v3
1.4.3.2.38 API People Get List v3
1.4.3.2.39 API People Get v3
1.4.3.2.40 API People Patch v3
1.4.3.2.41 API Releases Create DryRun v3
1.4.3.2.42 API Releases Create v3
1.4.3.2.43 API Releases Delete v3
1.4.3.2.44 API Releases Get v3
1.4.3.2.45 API Releases Package Get v3
1.4.3.2.46 API Releases Patch v3
1.4.3.2.47 API WrangledDatasets Create v3
1.4.3.2.48 API WrangledDatasets Delete v3
1.4.3.2.49 API WrangledDatasets Get List v3
1.4.3.2.50 API WrangledDatasets Get PrimaryInputDataset v3
1.4.3.2.51 API WrangledDatasets Get v3
1.4.3.2.52 API WrangledDatasets Put PrimaryInputDataset v3
1.4.3.3 API Session Get
1.4.4 API Version Support Matrix
1.4.4.1 API Migration to v4
1.4.5 API - UI Integrations
1.4.5.1 UI Integration - Create Dataset
1.4.6 API Workflows
1.4.6.1 API Workflow - Develop a Flow
1.4.6.2 API Workflow - Deploy a Flow
1.4.6.3 API Workflow - Run Job on Dataset with Parameters
1.4.6.4 API Workflow - Publish Results
1.4.6.5 API Workflow - Manage Outputs
1.4.6.6 API Workflow - Swap Datasets
Developer
This section contains topics of interest to data engineers and other developers.
Topics:
User-Defined Functions
Java UDFs
Create Custom Data Types Using RegEx
Command Line Interface
CLI Migration to APIs
Install CLI Tools
CLI for Connections
CLI for Jobs
CLI Example - Parameterize Job Runs
CLI Publishing Options File
CLI for User Admin
CLI Config File
API Reference
API Overview
API Authentication
Manage API Access Tokens
API Endpoints
v4 Endpoints
User-Defined Functions
Contents:
UDF Service
Supported UDF Language Frameworks
Running a UDF within the Platform
The Trifacta® platform enables the creation of user-defined functions (UDFs) for use in your Trifacta deployment.
A user-defined function is a way to specify a custom process or transformation for use in your specific Trifacta
solution, using familiar development languages and third-party libraries. Through UDFs, you can apply enterprise-
or industry-specific expertise consistently across your data transformations. A user-defined function is a custom
function that is created in one of the supported language frameworks. Each user-defined function has a defined
set of inputs and generates a single output.
UDF Service
The following diagrams provide a high-level overview of the UDF service, which integrates
user-defined functions into recipe execution.
Diagram 1: This figure illustrates execution of a UDF in interactive mode, where a user interacts with the
Transformer grid.
Diagram 2: This figure illustrates how UDFs interact with Hadoop at job execution time.
Please use the following links to enable the creation of user-defined functions in the listed language.
Java UDFs
After you have created and tested your UDF, you can execute it by entering udf in the Search panel and
populating the rest of the step in the Transform Builder. In this example, the AdderUDF function is executed:
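A sketch of such a step, assembled from the parameters described in the notes below (the column name myCol, the argument value, and the output column name are illustrative, and the exact syntax may vary by release):
udf col: myCol name: 'AdderUDF' args: '1' as: 'myCol_plus_one'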
Notes:
The udf command causes the named UDF to run.
After you type the UDF name, your UDF should appear in a drop-down list. If it does not, please verify that the
UDF has been properly created, compiled, and registered and that the udf-service has been restarted.
The col argument is a comma-separated list of the source data to be used as inputs to the exec method.
The args argument is a string of comma-separated values used as inputs to the init method.
Optionally, the as parameter can be used to provide a specific name to the generated column. If it is not
used, a column name is generated.
NOTE: When a recipe containing a user-defined function is applied to text data, any non-printing (control)
characters cause records to be truncated by the running environment during Hadoop job execution. In
these cases, please execute the job on the Trifacta Server.
This section describes how to create and deploy Java-based user-defined functions (UDFs) into your Trifacta®
deployment.
NOTE: If you are installing custom UDFs and the Trifacta node does not have an Internet connection, you
should download the Java UDF SDK in an Internet-accessible location, build your customer UDF JAR
there, and then upload the JAR to the Trifacta node.
Overview
Each UDF takes one or more inputs and produces a single output value (map only).
Inputs and outputs must be one of the following types:
Bool
String
Long
Double
Known Limitations
In the Trifacta application, previews are not available for user-defined functions.
Retaining state information across the exec method is unstable. More information is provided below.
NOTE: When a recipe containing a user-defined function is applied to text data, any null characters
cause records to be truncated by the running environment during Trifacta Server job execution. In
these cases, please execute the job on Hadoop.
Enable Service
You must enable the Java UDF service in the Trifacta platform.
Steps:
1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json.
For more information, see Platform Configuration Methods.
2. Enable the correct flag:
"feature.enableUDFTransform.enabled": true,
Deployment
Steps:
1. Unzip java-custom-udf-sdk.zip.
2. (Optional) To generate Eclipse project files for the SDK, run the following from the SDK root directory:
gradlew eclipse
Creating a UDF
UDF Requirements
All UDFs must implement the TrifactaUDF interface. This interface defines the four methods that each UDF must
override: init, exec, inputSchema, and finish.
1. init method: Used for setting private variables in the UDF. This method may be a no-op function if no
variables must be set. See the Example - Concatenate strings below.
Tip: In this method, perform your data validation on the input parameters, including count, data
type, and other constraints.
NOTE: The init method must be specified but can be empty, if there are no input parameters.
2. exec method: Contains the functionality of the UDF. The output of the exec method must be one of the
supported types, and it must also match the class's generic type parameter. In the following example,
TrifactaUDF<String> returns a String. This method is run on each record.
Tip: In this method, you should check the number of input columns.
Keeping state that varies across calls to the exec method can lead to unexpected behavior.
One-time initialization, such as compiling a regular expression, is safe, but do not allow state
information to mutate across calls to exec. This is a known issue.
3. inputSchema method: Describes the schema of the list on which the exec method acts. The classes in the
schema must be among the supported input/output types described earlier.
4. finish method: The finish method is run at the end of the UDF. Typically, it is a no-op.
NOTE: If you are executing your UDF on the Spark running environment, the finish method is not
invoked at that point. Instead, it is invoked as part of the shutdown of the Java VM. This later
execution means that finish may never be invoked in situations such as a JVM crash.
The following code example concatenates two input strings in the List<Object>. This UDF can be easily
modified to concatenate more strings by modifying the inputSchema function.
package com.trifacta.trifactaudfs;

import java.io.IOException;
import java.util.List;

/**
 * Example UDF that concatenates two columns.
 */
public class ConcatUDF implements TrifactaUDF<String> {

  @Override
  public String exec(List<Object> inputs) throws IOException {
    if (inputs == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < inputSchema().length; i += 1) {
      // A null in any input column produces a null output.
      if (inputs.get(i) == null) {
        return null;
      }
      sb.append(inputs.get(i));
    }
    return sb.toString();
  }

  @SuppressWarnings("rawtypes")
  @Override
  public Class[] inputSchema() {
    return new Class[]{String.class, String.class};
  }

  @Override
  public void finish() throws IOException {
  }

  @Override
  public void init(List<Object> initArgs) {
    // No init arguments are required for this UDF.
  }
}
Notes:
The first line indicates that the function is part of the com.trifacta.trifactaudfs package.
The defined UDF class implements the TrifactaUDF interface, which is the base interface for UDFs.
It is parameterized with the return type of the UDF (a Java String in this case).
The input into the function is a list with input parameters in the order they are passed to the function
within the Trifacta platform. See Running Your UDF below.
The UDF checks the input data for null values, and if any nulls are detected, returns a null.
The inputSchema describes the input list passed into the exec method.
An error is thrown if the type of the data that is passed into the UDF does not match the schema.
The UDF must handle improper data. See Error Handling below.
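For example, to make ConcatUDF accept three input strings instead of two, only the schema needs to change; the loop in exec already iterates over inputSchema():
@SuppressWarnings("rawtypes")
@Override
public Class[] inputSchema() {
  // Three input columns instead of two; exec() concatenates them all.
  return new Class[]{String.class, String.class, String.class};
}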
In this example, a constant, which is defined through the init method, is added to the input value.
package com.trifacta.trifactaudfs;

import java.io.IOException;
import java.util.List;

/**
 * Example UDF. Adds a constant amount to an Integer column.
 */
public class AdderUDF implements TrifactaUDF<Long> {

  private Long _addAmount;

  @Override
  public void init(List<Object> initArgs) {
    // Validate the init arguments: exactly one constant is expected.
    if (initArgs.size() != 1) {
      System.out.println("AdderUDF takes in exactly one init argument");
    }
    _addAmount = (Long) initArgs.get(0);
  }

  @Override
  public Long exec(List<Object> input) {
    if (input == null) {
      return null;
    }
    // Exactly one input column is expected.
    if (input.size() != 1) {
      return null;
    }
    return (Long) input.get(0) + _addAmount;
  }

  @SuppressWarnings("rawtypes")
  @Override
  public Class[] inputSchema() {
    return new Class[]{Long.class};
  }

  @Override
  public void finish() throws IOException {
  }
}
Error Handling
The UDF must handle any error that may occur while processing the function. There are two ways of dealing with errors:
1. For null data generated in the exec method, a null value can be returned. It appears in the final generated
column.
2. Any error that stops the UDF in the init or exec methods should cause an IOException to be thrown. This
exception signals to the platform that an issue occurred with the UDF.
Tip: You can add to the Trifacta logs through Logger. Annotate your exceptions at the appropriate logging
level.
JUnit can be used to test the UDF. Below are examples of testing the two example UDFs.
Example - JUnit test for Concatenate strings:
ConcatUDF Test
@Test
public void concatUDFTest() throws IOException {
ConcatUDF concat = new ConcatUDF();
ArrayList<Object> input = new ArrayList<Object>();
input.add("hello");
input.add("world");
String result = concat.exec(input);
String expected = "helloworld";
assertEquals(expected, result);
}
AdderUDF Test
@Test
public void adderUDFTest() {
  AdderUDF add = new AdderUDF();
  ArrayList<Object> initArgs = new ArrayList<Object>(1);
  initArgs.add(1L);
  add.init(initArgs);
  ArrayList<Object> inputs1 = new ArrayList<Object>();
  inputs1.add(1L);
  long result = add.exec(inputs1);
  long expected = 2L;
  assertEquals(expected, result);
}
After writing the UDF, it must be compiled and included in a JAR before registering it with the platform. To compile
and package the function, run the following command from the root directory:
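Assuming the Gradle wrapper bundled with the SDK (used above to generate Eclipse project files), a typical invocation is:
./gradlew build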
The UDF code is assembled, and unit tests are executed. If all is well, the built JAR file is created in build/libs.
NOTE: Custom UDFs should be compiled to one or more JAR files. Avoid using the example JAR
filename, which can be overwritten on upgrade.
To avoid an Unsupported major.minor version error during execution, the JDK version used to compile
the UDF JAR file should be less than or equal to the JDK version on the Hadoop cluster.
If this is not possible, then set the sourceCompatibility and targetCompatibility properties in the local
build.gradle file to the JDK version on the Hadoop cluster prior to building the JAR file.
Example:
If the Hadoop cluster is on JDK 1.8, then add the following to the build.gradle file:
targetCompatibility = '1.8'
sourceCompatibility = '1.8'
Example configuration:
To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit
trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more
information, see Platform Configuration Methods.
Notes:
Set enableUDFTransform.enabled to true, which enables UDFs in general.
Under udf-service:
specify the full path to the JAR under additionalJars
append the paths of any extra JAR dependencies that your UDFs require under classpath
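A sketch of how the resulting block might look, using the key names described above (the JAR paths are hypothetical):
"udf-service": {
  ...
  "additionalJars": [
    "/opt/trifacta/libs/my-custom-udfs.jar"
  ],
  "classpath": "<existing classpath entries>:/opt/trifacta/libs/my-udf-dependency.jar",
  ...
}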
Steps:
1. After modifying the configuration, the udf-service must be restarted. If you created a new UDF, restart the
Trifacta application.
NOTE: For an existing UDF, you must rebuild the JAR first. Otherwise, the changes are not
recognized during service re-initialization.
2. As part of the restart, any newly added Java UDFs are registered with the application.
By default, the UDF service utilizes compression across the websockets when running on the cluster. HDInsight
clusters do not support compression on websockets.
To make sure the UDF service works on your HDInsight cluster, please do the following.
Steps:
1. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit
trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more
information, see Platform Configuration Methods.
2. Locate the udf-service configuration.
3. Insert the following extra property in the udf-service configuration area:
"udf-service": {
...
"jvmOptions":
["-Dorg.apache.tomcat.websocket.DISABLE_BUILTIN_EXTENSIONS=true"],
...
}
For more information on executing your UDF in the Transformer page, see User-Defined Functions.
Troubleshooting
If you execute a Java UDF, you may see an error similar to the following in the Transformer page:
When you check the udf.log file on the server, the following may be present:
The above issue is likely caused by the Photon running environment sending too much data through the
buffer of the UDF's websocket service. By default, this buffer size is set to 1048576 bytes (1 MB).
The Photon running environment processes data through the websocket service in batches of 1,024 rows at a
time for the input and output columns of the UDF. If the data in the input columns to the UDF or the output
columns from the UDF exceeds 1,024 characters in total size per row, the default buffer is too small, since
Photon processes 1,024 rows at a time (more than 1,024 characters x 1,024 rows exceeds 1,048,576 bytes). The query then fails.
When setting a new buffer size:
Assume that 1024 rows are processed from the buffer each time.
Identify the input columns and output columns for the UDF that is failing.
Identify the dataset that has the widest columns for both inputs and outputs.
Tip: You can use the LEN function to do string-based computations of column width. See
LEN Function.
Perform the following estimate on the widest set of input and output columns that you are processing:
Estimate the total expected number of characters for the input columns of the UDF.
Add a 20% buffer to the above estimate.
Repeat the above estimate for the widest output columns for the UDF.
Set your buffer size to the larger of the two estimates (input columns' width or output columns'
width).
Example: A UDF takes two inputs and produces one output:
If each input column is 256 characters, then the size of 1K rows of input would be 256 bytes * 2
(input cols) * 1024 rows = 0.5 MB.
If the output of the UDF per row is estimated to be 1024 characters, then the output estimate would
be 1024 bytes * 1024 rows = 1MB.
So, set the buffer size to the larger of the two estimates plus the 20% margin. In
this example, the buffer size should be 1.2 MB, or 1258291 bytes.
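The full calculation for this example:
input estimate: 256 characters x 2 input columns x 1,024 rows = 524,288 bytes
output estimate: 1,024 characters x 1 output column x 1,024 rows = 1,048,576 bytes
buffer size: max(524,288, 1,048,576) x 1.2 = 1,258,291 bytes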
Steps:
1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json.
For more information, see Platform Configuration Methods.
2. Change the following setting to your computed value:
"udf-service.outputBufferSize": 1048576,
Create Custom Data Types Using RegEx
On the server hosting the Trifacta platform, type definitions such as dictionaries and custom data types are stored
in the following directory:
/opt/trifacta/js-data/type-packs/trifacta
Before you begin creating custom data types, you should back up the type-packs/trifacta directory
to a location outside of your Trifacta deployment.
NOTE: The trifacta-extras directory in the type-packs directory contains experimental custom
data types. These data types are not officially supported. Please use with caution.
Directory contents:
The dictionaries sub-directory contains user-defined dictionaries.
NOTE: Please use the user interface to interact with your dictionaries. See Custom Type Dialog.
The types sub-directory contains individual custom data type definitions, each in a separate file.
The manifest.json file contains a JSON manifest of all of the custom dictionaries and types in the
system.
Examples
Each custom data type is created and stored in a separate file. The following example file contains a regular
expression method for validating data against the set of days of the week:
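A sketch of what such a file might contain, assembled from the parameter descriptions below (the exact field layout and values are illustrative only):
{
  "name": "DayOfWeek",
  "category": "Custom",
  "defaultProbability": 1e-15,
  "testCase": {
    "stripWhitespace": true,
    "regexes": [
      "^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)$"
    ],
    "probability": 0.001
  }
}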
Parameters:
name: Internal identifier for the custom type. Must be unique across all standard types and custom types.
NOTE: You should verify that your data type's name value does not conflict with other custom
data type names.
category: The category to assign to the type. The current categories are displayed within the data type drop-down for
each column.
defaultProbability: Assign a default probability for the custom type. See Defining probabilities below.
testCase: This block contains the regular expression specification to be applied to the column values.
stripWhitespace: When set to true, whitespace is removed from any value prior to validation. The original
value is untouched.
regexes: This array contains a set of regular expressions that are used to validate the column values. For a regex
type, a column value must match at least one expression in the set.
NOTE: All match patterns must be double-escaped in the regex expression. For example, to
replicate the \d pattern, you must enter: \\d.
Trifacta Wrangler Enterprise implements a version of regular expressions based on RE2 and PCRE regular
expressions.
probability: (optional) Assign an incremental change to the probability when a match is found between a value and one
of the regular expressions. See Defining probabilities below.
Tip: In the types sub-directory, you can review the regex-based types that are provided with the Trifacta
platform. While you should not edit these files directly, they may provide some guidance and some regex
tips on how to configure your own custom data types.
Defining probabilities
For your custom type, the probability values are used to determine the likelihood that matching values indicate
that the entire column is of the custom data type.
The defaultProbability value specifies the baseline probability that a match between a value and one
of the regular expressions indicates that the column is the specified type. On a logarithmic scale, values
are typically 1E-15 to 1E-20.
When a value is matched to one of the regular expressions, the probability value is used to increment
the baseline probability that the next matching value is of the specified type. This value should also be
expressed on a logarithmic scale (e.g. 0.001).
In this manner, a higher number of matching values increases the probability that the column is a match
to the custom type.
Probabilities become important primarily if you are creating a custom type that is a subset of an existing type. For
example, the Email Address custom type is a subset of String type. So, matches for the patterns expressed in the
Email Address definition should register a higher probability value than the same incremental for the String
type definition.
Tip: For custom types that are subsets of other, non-String types, you should lower the
defaultProbability of the baseline type by a factor of 10 (e.g. 1E-15 to 1E-16) and raise the same
probability in the custom type by a factor of 10 (e.g. 1E-14). In this manner, you can give higher probability
of matching to these subset types.
To the $CUSTOM_TYPE_DIR/manifest.json file, you must add the filenames of any custom types that you
have created and stored in the types directory:
{
"types": ["bodies-of-water.json", "dayofweek.json"],
"dictionaries": ["oceans", "seas"]
}
To enable use of your custom data types in the Trifacta platform, locate and edit the enabledSemanticTypes
property.
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For
more information, see Platform Configuration Methods.
NOTE: Add your entries to the items that are already present in enabledSemanticTypes. Do not delete
and replace entries.
where:
<CustomTypeName1> corresponds to the internal name value for your custom data type.
To add your custom types to the Trifacta platform, run the following command from the js-data directory:
Then, restart the platform.
Command Line Interface
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
The Trifacta® command line interface (CLI) enables scripted execution of jobs and management of users and
connections for the Trifacta platform. This section provides documentation on how to install and deploy the
command line tools and includes example commands for each supported action.
Topics:
CLI Migration to APIs
Install CLI Tools
CLI for Connections
CLI for Jobs
CLI for User Admin
CLI Config File
Logging
The CLI submits requests to the platform through the Trifacta application, which writes its logging information to
the following file:
/opt/trifacta/logs/webapp.log
"ranfrom": "cli"
Tip: From the output of the CLI, you should get in the habit of capturing the job, dataset, flow, or other
object identifier that the request is creating, modifying, or removing. These IDs are useful for parsing the
log file or locating the object in the application.
Administrators can download log files through the Trifacta node operating system or through the web interface for
the platform. For more information, see System Services and Logs.
Log Levels
By default, the logging level for the web application is set to INFO.
If you are attempting to debug an issue related to the CLI, you can change the logging level.
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For
more information, see Platform Configuration Methods.
The log level is defined in the following parameter:
"webapp.loggerOptions.level": "INFO",
CLI Migration to APIs
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta Command Line Interface (CLI) will
reach its end of life (EOL). This means:
A version of the CLI that is compatible with the release will no longer be available for use.
Old versions of the CLI will not work with the new version of the platform.
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
Before you upgrade to the next Trifacta Wrangler Enterprise release, you must migrate any scripts or other
automation projects that currently use the CLI to use the v4 versions of the APIs. This section provides
information on how to manage that migration.
General Differences
API authentication
Tip: The recommended method is to create an API access token for the user account that is to be
accessing the APIs. This feature may need to be enabled in your instance of the platform. For more
information, see Enable API Access Tokens.
Terminology
Depending on the version you are using, please use the following mapping between CLI and API terms:
Connection maps to Connection.
Job maps to JobGroup. In the application, a job that you launch is composed of one or more sub-jobs, such as ingest,
profiling, transformation, or sampling.
Recipe maps to WrangledDataset. In the APIs, this object is referenced by its internal platform name: wrangledDataset.
User maps to User. The API endpoint is people.
Parameters passed to the CLI are often user-friendly text values. The CLI tool then queries the appropriate REST
API endpoint and converts those values to internal identifiers.
When using the APIs, you must reference the internal identifiers directly.
Below is some information on how you can acquire the appropriate internal identifiers for each type of operation
supported by the CLI.
Object Identifiers
For each CLI command, there is an associated object identifier, which is used to uniquely reference the object. To
reference the object through the APIs, you must use the API unique id.
Tip: In the JSON response from the listed APIs, there may be multiple id values. To assist, you may find
it easier to use the secondary ids to locate each item.
NOTE: Each API endpoint returns only the objects to which the authenticating user has access. If other
users have personal objects that they need to migrate, they must provide access to them to the
authenticating user.
For example, for the script.cli object:
CLI unique id: n/a. See "Important notes on CLI packages" below.
CLI secondary id: Open the recipe in the Transformer page to acquire the wrangledDataset id.
API endpoint: API WrangledDatasets Get List v4
API unique id: id
Notes: This endpoint gets the list of available wrangled datasets (recipes), which are required for launching a
new job. That endpoint is API JobGroups Create v4.
The following example steps through the process of acquiring user ids so that you can use the APIs.
CLI - Get list of usernames:
The CLI references users via their platform usernames.
If your CLI scripts contain references to individual users, search them for:
If you want to acquire the list of all available usernames, it's easier to do that via the APIs.
API - Get list of users:
Use the following API endpoint to get the list of all users, including deleted and disabled users.
Endpoint http://example.com:3005/v4/people
Authentication Required
Method GET
Request Body None.
The response includes the following objects of interest:
email: This value maps to the username value in your CLI scripts.
id: Unique internal identifier that you can use in other people endpoints.
Tip: You must map each email address to its corresponding id value.
isDisabled: If true, the user is disabled and cannot use the platform.
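An illustrative response shape based on the fields above (all values are hypothetical):
[
  {
    "id": 4,
    "email": "<user_email>",
    "isDisabled": false
  }
]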
Unlike connection, job, or user objects, a CLI script package does not contain any references to platform objects
by design. These independent, self-contained objects can be used to run a script snapshot as a job at any time.
NOTE: When running jobs via the CLI, you are executing against a static recipe and other configuration
files on your local desktop. When you run via the APIs, you are executing against the current state of the
recipe object. So, if it is important that you execute your jobs against a read-only version of your recipe,
you should create copies of your flows before you run the job.
After you download it, however, the script package is no longer aware of any changes that have occurred to the
source objects on the platform, which has the following implications:
1. If source objects, such as the source recipe, have changed, those changes are not present in the CLI
package.
a. The above does not apply to data sources. In the downloaded CLI package, sources are referenced
directly.
Tip: When a recipe is open in the Transformer page, the page URL indicates the internal identifiers for the
flow and the wrangledDataset (recipe). In the following example URLs, the flow id is 11 and the
wrangledDataset id is 39:
http://example.com:3005/flows/11?recipe=39&tab=recipe
http://example.com:3005/data/11/39
Run Job
You can issue commands to the CLI to execute jobs using the local package downloaded from the Recipe panel.
NOTE: When you run a job using the CLI, you are executing against a snapshot of a recipe at the
moment in time when the package was downloaded. Please be sure that you are aware of the Important
notes on CLI packages in the previous section.
Below are the three files in the package and their API equivalents:
script.cli A CLI-only version of the The APIs reference the latest definition of the recipe through the
recipe to execute. wrangledDataset object. See API WrangledDatasets Get v4.
datasources.tsv A CLI-only set of links to the The APIs reference the latest saved version of any datasource using the
data sources used to execute importedDataset object. When running a job, the data sources referenced in
the recipe. the WrangledDataset object are automatically pulled into job execution.
publishopts.json A CLI-only set of JSON If these outputs are part of the output definitions for the recipe in Flow
definitions of the outputs that View, they are automatically generated as part of running the job. For
are generated when a job is more information, see Flow View Page.
executed. If these outputs are overrides to the Flow View definitions, you can
insert these outputs as writesettings objects in the request
body when you launch the job. An example of this is provided below.
For more information on managing writesettings via APIs,
see API Workflow - Manage Outputs.
CLI example:
NOTE: Inside the platform, this identifier is a reference to the jobGroup, which is the collection of sub-jobs
for a specified job. Sub-job types include: sampling, ingestion, transformation, and profiling. Collectively,
these appear under a single job identifier in the Trifacta application, and the same value is used as the
jobGroup Id in the APIs.
Default settings: After you have captured the wrangledDataset identifier, you can launch a new job using
default settings:
Endpoint http://localhost:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
  "wrangledDataset": {
    "id": <wrangledDatasetId>
  }
}
NOTE: A job group is composed of one or more sub-jobs for sampling, ingestion, transformation, and profiling,
where applicable. You can append ?embed=jobs to include sub-job information in the response.
Specify job overrides: The above request contains only the wrangledDataset identifier. All default output settings
are used.
If needed, you can override these default settings by specifying values as part of the request body. In the
following example, the relevant parameters from the CLI have been added as elements of the JSON body of the
request.
Through the APIs, you can also override the default files, formats, and locations where you output results in the
writesettings block.
Endpoint http://localhost:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": <wrangled_dataset_id>
},
"overrides": {
"execution": "spark",
"profiler": true,
"writesettings": [
{
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/MyDataset/42/clea
"action": "create",
"format": "json",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
},
"ranfrom": "cli"
}
You can specify publication options as part of your run_job command. In the following example, a single CSV file with
headers is written to a new file with each job execution.
Example (all one command):
After you queue a job through the CLI, you can review the status of the job through the application or through the
CLI.
CLI example:
Tip: You can acquire the job ID through the application as needed. For example, at some point in the
future, you might decide to publish to Hive the results from a job you executed two weeks ago. It might be
easiest to retrieve this job identifier from the Dataset Details page. See Dataset Details Page.
Endpoint http://localhost:3005/v4/jobGroups/42/status
Authentication Required
Method GET
Request Body None.
Reference Docs:
See API JobGroups Get Status v4.
Publish
After a job has successfully completed, you can publish the results to another datastore with which the platform is
integrated.
CLI example:
The following command publishes the results of jobId 42 through connectionId 1 to the dev database. Let's
assume that this is a Hive database.
Endpoint http://localhost:3005/v4/jobGroups/42/publish
Authentication Required
Method PUT
Reference Docs:
See API JobGroups Put Publish v4.
Get Publications
You can retrieve a JSON list of all publications that have been executed for a specific job.
A publication is an object that corresponds to the delivery of a job's results to an external datastore.
In the Trifacta application, publications are executed through the Publishing Dialog, which is available
through the Job Details page. See Publishing Dialog.
CLI example:
Endpoint http://localhost:3005/v4/jobGroups/42/publications
Authentication Required
Method GET
Request Body None.
NOTE: When appending data into a Redshift table, the columns displayed in the Transformer page must
match the order and data type of the columns in the target table.
CLI example:
In the following example, the results of jobId 42 are loaded into a Redshift table called table_42 using
connectionId 2.
Endpoint http://localhost:3005/v4/jobGroups/42/publish
Authentication Required
Method PUT
Request Body
{ "connection": {
"id": 2
},
"path": ["dev"],
"table": "table_42",
"action": "load",
"inputFormat": "avro",
"flowNodeId": 27
}
Reference Docs:
API JobGroups Put Publish v4
For existing tables, you can clear them and load them with results from a job. If the table does not exist, a new
one is created and populated.
CLI example:
Endpoint http://localhost:3005/v4/jobGroups/10/publish
Authentication Required
Method PUT
Request Body
{ "connection": {
"id": 2
},
"path": ["dev"],
"table": "table_43",
"action": "truncateAndLoad",
"inputFormat": "avro",
"flowNodeId": 27
}
Reference Docs:
API JobGroups Put Publish v4
You can use the CLI for basic management of your connections.
CLI Docs: CLI for Connections
Create Connection
To create a connection, you specify the connection parameters as part of your command line command.
CLI example:
Endpoint http://localhost:3005/v4/connections
Authentication Required
Method POST
Reference Docs:
API Connections Create v4
Edit Connection
In the CLI, you use the edit_connection action to pass in modifications to a connection that is specified using
the conn_name command line parameter.
CLI example:
In the following example, the description, host, and port number are being changed for the aSQLServerConnection
connection.
Endpoint http://localhost:3005/v4/connections/8
Authentication Required
Method PATCH
Request Body
{
"description": "This is my connection.",
"host": "mynewhost.com",
"port": 1234
}
Reference Docs:
See API Connections Patch v4.
List Connections
The CLI command list_connections dumps the JSON objects for all connections to a local file.
CLI example:
Tip: For any endpoint using a GET method, if you omit an object identifier, you retrieve all accessible
objects of that type from the platform.
Endpoint http://localhost:3005/v4/connections
Authentication Required
Method GET
Request Body None.
Reference Docs:
See API Connections Get List v4.
Delete Connection
For the CLI, you use the delete_connection command to remove connections that are specified by
conn_name.
CLI example:
Endpoint http://localhost:3005/v4/connections/4
Authentication Required
Method DELETE
Request Body None.
Reference Docs:
You can use the CLI for handling of some elements of user management.
NOTE: Some user account properties cannot be managed through the CLI. You must use the APIs or the
application for some tasks.
Create User
CLI example:
Endpoint http://www.example.com:3005/v4/people
Authentication Required
Method POST
Request Body
{
"accept": "accept",
"password": "Hello2U",
"password2": "Hello2U",
"email": "[email protected]",
"name": "Joe"
}
Reference Docs:
API People Create v4
Show User
You can retrieve a specific user object by username through the CLI.
CLI example:
Endpoint http://www.example.com:3005/v4/people/4
Authentication Required
Method GET
Request Body None.
Reference Docs:
API People Get v4.
Edit User
You can edit some properties through the CLI edit_user command.
CLI example:
Include only the parameters in the request that are being modified.
Endpoint http://www.example.com:3005/v4/people/4
Authentication Required
Method PATCH
Reference Docs:
API People Patch v4
Through the CLI, admins can generate password reset emails to be sent to specific users.
CLI example:
NOTE: The v4 endpoint equivalent of this CLI command is not available in Release 6.0. It will be
available at or before the End of Life (EOL) of v3 endpoints.
Disable User
Through the CLI, you can disable individual users by adding the disable flag as part of an edit_user directive.
CLI example:
Endpoint http://www.example.com:3005/v4/people/4
Authentication Required
Method PATCH
Reference Docs:
API People Patch v4
Delete User
CLI example:
In the following example, the user is deleted by username, and the user's assets are transferred to another user.
NOTE: Transfer of assets is not required. However, if the assets are not transferred, they are no longer
available.
NOTE: You must verify that the transfer step occurs successfully before you execute the deletion.
Deletion of a user cannot be undone.
NOTE: Transferring of assets does not check for access to the objects. It's possible that the receiving
user may not be able to access connections or datasets that were created by the original user. You may
wish to share those assets through the application before you perform the deletions.
Transfer of assets:
Endpoint http://www.example.com:3005/v4/people/7/assetTransfer/4
Authentication Required
Method PATCH
Request Body None.
Response Body
[
  [
    1,
    [
      0,
      [
        {
          "connectionId": 7,
          "personId": 7,
          "role": "owner",
          "createdAt": "2019-02-21T19:52:22.993Z",
          "updatedAt": "2019-02-21T19:52:22.993Z"
        }
      ]
    ]
  ]
]
NOTE: Please verify that you have received a response similar to the above before you delete the user. You
should also verify that the receiving user has the assets accessible in the application.
Delete user:
After assets have been transferred, users can be deleted by userId (4).
Endpoint http://www.example.com:3005/v4/people/4
Authentication Required
Method DELETE
Request Body None.
Reference Docs:
API People Delete v4
Install CLI Tools
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
Contents:
Download
Install
Upgrade
By default, the Trifacta® Command Line Interface (CLI) tools are installed on the Trifacta node during installation.
You can use them from there.
Optionally, you can install the CLI tools on a separate server. For example, you might want to create a dedicated
server from which you can run a set of predefined jobs on a periodic basis.
NOTE: The location from where you are running the CLI tools must be able to access the pre-installed
instance of the Trifacta platform.
This section describes how to download and install the CLI tools on a dedicated server.
Download
The Trifacta CLI installer is delivered as a separate file alongside the software distribution provided by Trifacta.
*.RPM for CentOS/RHEL
*.DEB for Ubuntu
The appropriate file should be downloaded to the server where you are installing the tools. For more information,
see Trifacta Support.
Install
Steps:
1. On the node where you are installing the tools, execute the installer command for your operating system:
CentOS/RHEL 6, CentOS/RHEL 7, or Ubuntu 16.04 (Xenial). Example invocations are sketched after these
steps. In each command:
X.Y.Z = the three-digit release number.
AAAA = internal build number.
2. When the installation is complete, you can begin using the tools. The tools are installed in the following
directory:
/opt/trifacta/bin/
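The installer commands referenced in step 1 are standard package installs; a sketch, with hypothetical package filenames:
For CentOS/RHEL:
sudo rpm -ivh trifacta-cli-X.Y.Z-AAAA.rpm
For Ubuntu:
sudo dpkg -i trifacta-cli_X.Y.Z-AAAA.deb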
Upgrade
When you upgrade to a new version of Trifacta Wrangler Enterprise, you must complete the following steps to
ensure that your CLI tools and scripts are upgraded:
NOTE: There is no guarantee of compatibility between versions of Trifacta Wrangler Enterprise CLI tools.
You should re-install the tools with each upgrade.
1. Download and install the new version of the CLI tools. See earlier in this section.
2. Unless changes are required, you can try to run your CLI scripts using the CLI packages that you
downloaded from the previous version. If your scripts fail when running jobs, then you should try to
re-download the packages from the Transformer page. For more information, see Recipe Panel.
For more information, see Changes to the Command Line Interface.
CLI for Connections
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
Contents:
Requirements
Command Reference
The command line interface lets you manage connections between the Trifacta® platform and various types of
datastores. You can use this CLI for the following:
Create, edit, or delete connections
NOTE: In this release, you cannot create SQL DW connections via the CLI. This known issue will
be fixed in a future release.
NOTE: Sharing of connections is not supported through the command line interface.
Requirements
NOTE: Some types of connections available through the UI cannot be created through the CLI. For more
information, see Connection Types.
The CLI must have access to a running Trifacta instance. You can specify the host and port of this
instance.
For each connection that you create, the Trifacta node must be able to access it through the listed host and
port.
Command Reference
/opt/trifacta/bin/
./trifacta_cli.py (parameters)
Parameters
Common
command_type: (Required) The type of CLI command to execute. Applies to all commands. For more information
on the available commands, see CLI for Jobs.
user_name: (Required) Trifacta username of the user to execute the job. Please specify the full username. Applies
to all commands.
NOTE: The user issuing the command must also have execute permissions on all parent folders in
the specified cli_output_path.
disable_ssl_certification: (Optional) When communicating over HTTPS, this setting can be used to override the
default behavior of validating the server certificate before executing the command. Applies to all commands.
NOTE: You must modify the host parameter value to include the appropriate port number for the
SSL connection.
NOTE: SSL connections are not supported for Hive, Redshift, or SQL Server.
The following parameters apply to managing connection objects only. Some of the preceding parameters may be
required for connection actions.
Connection types that can be managed through these commands include Hadoop Hive and Amazon Redshift,
among others.
conn_description: This text value is displayed to users when they create or edit connections of this type through
the Trifacta application. Applies to: create_connection, edit_connection.
conn_credential_location: The path to a JSON file containing the credentials for your connection, consistent with
the conn_credential_type. For more information on the expected format, see Credentials file below. Applies to:
create_connection, edit_connection.
conn_params_location: When you create a connection, you can reference a JSON file containing parameters to
apply during the creation of any connection of this type. See Params file below. Applies to: create_connection,
edit_connection.
General help is available through the following:
./trifacta_cli.py --help
Additional documentation might be available for individual commands using the -h flag. Example:
./trifacta_cli.py create_connection -h
Credentials file
You can store connection login credentials in a file on the Trifacta node. When managing connections, you can
reference this JSON credentials file in the command, which forces the use of encrypted versions of the credentials
stored in the file. Examples are provided below.
Example - Basic credentials:
This example applies for relational connection types: Oracle, PostGreSQL, SQL Server, and Teradata.
{
"username": "<your_username>",
"password": "<your_password>"
}
Example - Redshift credentials:
{
"username": "<your_user>",
"password": "<your_password>",
"iamRoleArn": "<your_IAM_role_ARN>"
}
NOTE: iamRoleArn is optional. For more information, see Configure for EC2 Role-Based Authentication.
Params file
In an external file, you can create a set of parameters to pass to any object for which you are creating a
connection. For example, when you create a connection to a database, you may need to reference a default
database to which any instance of the connection connects.
The following parameters are supported for each vendor.
Teradata: None.
Additional parameters:
Except for Redshift connections, you can submit additional configuration parameters using the connectStrOpts
key-value pair in the parameters file. Example:
"connectStrOpts": ";transportMode=http;httpPath=cliservice"
NOTE: Each vendor uses a specific separator between the connection URL and the connection string
options. For example, if you are creating or editing a Teradata connection and are submitting
connectStrOpts parameters, the string value must begin with a comma:
"connectStrOpts": ",Key1=Value1,Key2=Value2"
For more information, see the documentation provided with your database product.
"connectStrOpts": ",Key1=Value1,Key2=Value2?myView=custom1"
When the connection is created and used, the connection string might look like the following:
For submitting arbitrary parameters to Oracle, please see the example below.
Example - Hive params:
NOTE: By default, the Hive connection is defined to use TCP. If you are using HTTP to connect to Hive,
additional configuration is required, including insertion of additional parameters in your params file. See
Configure for Hive.
NOTE: If you are connecting to a Kerberos-enabled cluster, you must include the Kerberos principal for
Hive as part of the connectStrOpts value. See Configure for Hive.
{
"connectStrOpts": ";<depends_on_deployment>",
"defaultDatabase": "default",
"jdbc": "hive2"
}
For more information on connection string options for Hive, see Configure for Hive.
Example - Redshift params:
{
"defaultDatabase":"<your_database>",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS"
}
Example params specifying a default database:
{
"database":"<your_database>"
}
Example - Oracle params:
{
"service":"orcl"
}
For submitting arbitrary parameters to Oracle, the arbitrary string must follow the ORA format, in which most of the
connection string is replaced by parameters. For example:
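A sketch of such a params file entry, with illustrative host, port, and service values:
{
"connectStrOpts": "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orcl)))"
}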
In this case, the generated connection string might look like the following:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orcl)))
The original host, port, and service name values specified in the connection are ignored and replaced by these
values.
Examples
At the command line, all jobs must be executed through connection objects. For each datastore to which the Trifa
cta platform is connected, you must create at least one connection object and then reference it in any job
execution tasks.
Create connection
NOTE: For Hive, connections must be created as public connections (include the --conn_is_global
flag). You can only create one connection of each of these types.
For more information on creating a Hive connection through the CLI, see Configure for Hive.
Command
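A sketch of the command, reconstructed from the output fields below (the exact flag syntax is an assumption):
./trifacta_cli.py create_connection --user_name <trifacta_user> \
  --conn_type microsoft_sqlserver --conn_name aSQLServerConnection \
  --conn_host example.com --conn_port 1234 \
  --conn_credential_type basic \
  --conn_credential_location ~/.trifacta/config_conn.json \
  --conn_params_location ~/.trifacta/p.json \
  --host http://example.com:3005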
Output
{
"conn_credential_location": "~/.trifacta/config_conn.json",
"conn_credential_type": "basic",
"conn_host": "example.com",
"conn_id": 9,
"conn_name": "aSQLServerConnection",
"conn_params_location": "~/.trifacta/p.json",
"conn_port": "1234",
"conn_type": "microsoft_sqlserver",
"host": "http://example.com:3005",
"results": {
"createdAt": "2016-06-30T21:53:58.977Z",
"createdBy": 3,
"credential_type": "basic",
"credentials": [
{
"username": "<trifacta_user>"
}
],
"deleted_at": null,
"description": null,
"host": "example.com",
"id": 9,
"is_global": false,
"name": "aSQLServerConnection",
"port": 1234,
"type": "microsoft_sqlserver",
"updatedAt": "2016-06-30T21:53:58.977Z",
"updatedBy": 3
},
"status": "success",
"user_name": "<trifacta_user>"
}
Edit connection
Command
In the following command, all parameters specified within angled brackets are optional settings that can be
changed. The others are required to perform any edit.
You must specify the conn_name or the conn_id.
NOTE: If you are editing the connection's credentials, you must specify the conn_credential_type in
the command; it is required whenever you change any credential parameter. Because this step completely
replaces the old credentials, you must specify all credential parameters in the command.
Output
The following output assumes that only the host and cli_output_path values above contain new values:
JSON Response
List connections
Command
Tip: You can specify a conn_name or conn_id to return the information about a connection.
Listing connections
Found 2 connections for params {'noLimit': 'true'}.
Redshift:
description: None
host: dev.redshift.example.com
credentials: ["{u'username': u'<trifacta_user>'}"]
port: 5439
is_global: True
name: Redshift
id: 2
credential_type: custom
params:
extraLoadParams: BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS
defaultDatabase: dev
type: amazon_redshift
Hive:
description: None
host: dev.hive.example.com
credentials: ["{u'username': u'<trifacta_user>'}"]
port: 10000
is_global: True
name: Hive
id: 1
credential_type: conf
params:
jdbc: hive2
connectStrOpts:
defaultDatabase: default
type: hadoop_hive
JSON results written to conn_list.out.
JSON Response
{
"connections": [
{
"conn_createdAt": "2016-06-01T21:12:59.383Z",
"conn_createdBy": 2,
"conn_credential_type": "custom",
"conn_credentials": [
{
"username": "<trifacta_user>"
}
],
"conn_deleted_at": null,
"conn_description": null,
"conn_host": "dev.redshift.example.com",
Delete connection
Tip: You can delete a connection by using its internal connection identifier (conn_id), instead of its
connection name.
Command
Output
JSON Response
{
"conn_name": "aSQLServerConnection",
"host": "http://localhost:3005",
"status": "success",
"user_name": "<trifacta_user>"
}
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
Contents:
Requirements
The Command Line Interface for Jobs enables programmatic control over a variety of operations on the platform.
You can use the CLI to execute any of the following types of commands:
Run a job
NOTE: In this release, you cannot run jobs using datasets imported from Redshift or SQL DW
connections via the CLI.
NOTE: In this release, you cannot publish results to Redshift or SQL DW connections via the CLI.
This known issue will be fixed in a future release.
Requirements
The CLI must have access to a running instance of the Trifacta® platform. You can specify the host and
port of this instance.
If you are running jobs for a dataset with parameters, the downloaded assets reference only the first
matching file of the dataset. To run the job across all files in the dataset with parameters, you must build
the matching logic within your CLI script. For more information on datasets with parameters, see
Overview of Parameterization.
Command Reference
Execute the following command from the top-level Trifacta directory. The Python script references script.cli
and datasources.tsv as parameters.
For repeat executions of the same script.cli file, you can parameterize the values in the datasources.tsv.
/opt/trifacta/bin/
./trifacta_cli.py (parameters)
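For reference, a complete run_job invocation might look like the following sketch. All values are placeholders for your environment, and the exact flag syntax for optional parameters such as profiler is inferred from the parameter reference below:
/opt/trifacta/bin/trifacta_cli.py run_job \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_type=spark --profiler=on \
--script=/opt/trifacta/script.cli --data=/opt/trifacta/datasources.tsv \
--output_formats=json \
--output_path=hdfs://localhost:8020/trifacta/queryResults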
Parameters
Common
For more information on the following commands, see CLI for Connections.
user_name (Required) Trifacta username of the user to execute the job. Please specify the full username. Applies to: All
NOTE: The user issuing the command must also have execute permissions on all parent folders in the specified cli_output_path.
disable_ssl_certification (Optional) When communicating over HTTPS, this setting can be used to override the default behavior of validating the server certificate before executing the command. Applies to: All
NOTE: You must modify the host parameter value to include the appropriate port number for the SSL connection.
NOTE: SSL connections are not supported for Hive, Redshift, or SQL Server.
host (Required) The server and port number of the Trifacta instance. Replace this value with the host and port of the running Trifacta instance. If it is not provided, localhost:3005 is assumed. Applies to: All
conn_name Internal name of the connection. This name is referenced in your CLI scripts. It should be a single value without spaces. Applies to: load_data, publish, truncate_and_load
NOTE: This value must be unique among your connection names.
conn_id The internal identifier for the connection. When a connection is created, it is assigned an internal numeric identifier. This ID or the connection name can be used to reference the connection in future commands. Applies to: publish, load_data, truncate_and_load
job_id The internal identifier for the job. This value can be retrieved from the output of a completed run_job command. Applies to: get_job_status, publish, get_publications, load_data
profiler When on, profiling of your job is enabled. Default is off. Applies to: run_job
data Full UNIX path to the source TSV file. This file contains a URL pointing to the actual Hive or HDFS source: one TSV file for each job run. The executing user must have access to this file. Applies to: run_job
script Full UNIX path from the Trifacta root directory to the CLI script file. The executing user must have access. Applies to: run_job
publish_action (Optional) Defines the action taken on second and subsequent publish operations. Applies to: run_job
header (Optional) When true, the output for a CSV job with append or create publishing action includes the column headers as the first row. Default is false. Applies to: run_job
NOTE: If you use the header option, you must also include the single_file option, or this setting is ignored.
single_file (Optional) When true, CSV or JSON outputs are written to a single file. Default is false. Applies to: run_job
output_path The target location for the generated results, such as hdfs://host:port/path/filename.csv or s3://bucketName/path/filename.csv. This parameter specifies the base filename. If you are publishing files, the publish_action parameter value may change the exact filename that is written. Applies to: run_job
database Name of the Redshift or Hive database to which you are publishing or loading. Applies to: publish, load_data
table The table of the database to which you are publishing or loading. Applies to: publish, load_data
publish_format The format of the output file from which to publish to Hive or Redshift tables. Accepted values: csv, json, pqt (Parquet), or avro (Avro). Applies to: publish, get_publications
publish_opt_file Path to a file containing definitions for multiple file or table targets to which to write the job's results. For more information, see CLI Publishing Options File. Applies to: run_job
skip_publish_validation By default, the CLI automatically checks for schema validation when generating results to a pre-existing source. Include this parameter to skip that check. Applies to: run_job
Additional documentation may be available for individual commands using the following:
./trifacta_cli.py --help
Examples
A key function of the CLI is to execute jobs. You can also check job status through the command line interface
and then take subsequent publication actions using other commands.
This command requires a dataset and a CLI script. The CLI script is used to programmatically run a
recipe produced in the Transformer page.
For example, if you receive raw data each day, you can parameterize the execution of the same recipe against
daily downloads written to HDFS.
Each run of the CLI script creates a new job. A finished CLI job appears on the Jobs page.
Steps:
1. A recipe is specific to a dataset. In the Transformer page, open the Recipe Panel.
2. Click Download.
3. Select CLI Script.
4. Download to your desktop. The ZIP contains the following:
script.cli Contains the necessary code and configuration to access HDFS and the script in the
Trifacta database.
datasources.tsv Contains pointers to the source storage location of your datasource(s).
If you are running jobs for a dataset with parameters, the downloaded assets reference only
the first matching file of the dataset. To run the job across all files in the dataset with
parameters, you must build the matching logic within your CLI script. For more information on
datasets with parameters, see Overview of Parameterization.
For an example of how to add parameters in a local script, see
CLI Example - Parameterize Job Runs.
publishopts.json Template file for defining one or more publishing targets for running jobs. See
CLI Publishing Options File.
5. These files must be transferred to the Trifacta Server where you can reference them from the Trifacta root
directory.
Notes on connections and jobs
In the downloaded ZIP, the datasources.tsv file may contain a reference to the connection used to import the
dataset. However, if you are running the CLI in a Trifacta platform instance that is different from its source, this
connectionId may be different in the new environment. From the new environment, please do the following:
1. Use the list_connections operation to acquire the list of connections available in the new environment.
See CLI for Connections.
2. Acquire the Id value for the connection corresponding to the one used in datasources.tsv.
NOTE: The user who is executing the CLI script must be able to access the connection in the new
environment.
3. Edit datasources.tsv. Replace the connection Id value in the file with the value retrieved through the
CLI.
4. When the job is executed, it should properly connect to the source through the connection in the new
environment.
NOTE: This method of specifying a single-file publishing action has been superseded by a newer method,
which relies on an external file for specifying publishing targets. In a future release, this method may be
deprecated. For more information, see CLI Publishing Options File.
Output
JSON Response
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerValidValueHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerSamples.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerTypeCheckHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerInput.json"
]
"job_result_files": [
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/cleaned_table_1.json",
]
},
"job_id": 42,
"cli_script":
"/trifacta/queryResults/[email protected]/redshift-test/script.cli",
"job_type": "spark",
"profiler": "on",
"source_data":
"/trifacta/queryResults/[email protected]/redshift-test/datasources.tsv",
"host": "localhost:3005",
"output_path":
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/cleaned_table_1.json",
"user": "[email protected]",
"output_file_formats": [
"json"
]
}
You can specify publication options as part of your run_job command. In the following, a single CSV file with
headers is written to a new file with each job execution.
Example (All one command):
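A sketch of such a command, assuming the flag syntax --header=true, --single_file=true, and --publish_action=create for the options described above (all other values are placeholders):
/opt/trifacta/bin/trifacta_cli.py run_job \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_type=spark --profiler=on \
--script=/opt/trifacta/script.cli --data=/opt/trifacta/datasources.tsv \
--output_formats=csv \
--output_path=hdfs://localhost:8020/trifacta/queryResults/cleaned_table_1.csv \
--header=true --single_file=true --publish_action=create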
Output
JSON Response
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
3/.profiler/profilerValidValueHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
3/.profiler/profilerSamples.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
3/.profiler/profilerTypeCheckHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
3/.profiler/profilerInput.json"
]
"job_result_files": [
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
3/cleaned_table_1.csv",
]
},
"job_id": 43,
"cli_script":
"/trifacta/queryResults/[email protected]/redshift-test/script.cli",
"output_file_formats": [
"csv",
],
"job_type": "spark",
"host": "localhost:3005",
"job_output_path":
"/trifacta/queryResults/[email protected]/MyDataset/43/",
"user": "[email protected]",
"source_data":
"/trifacta/queryResults/[email protected]/redshift-test/datasources.tsv",
"profiler": "on"
}
As part of the CLI job, you can define multiple file or table targets to which to write the job results. For more
information, see CLI Publishing Options File.
After you queue a job through the CLI, you can review the status of the job through the application or through the
CLI.
Command
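A sketch of a status check, using the job_id returned by a run_job command (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py get_job_status \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_id=42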
Output
JSON Response
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerValidValueHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerSamples.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerTypeCheckHistograms.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/.profiler/profilerInput.json"
]
"job_result_files": [
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/cleaned_table_1.json",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/cleaned_table_1.csv",
"hdfs://localhost:8020/trifacta/queryResults/[email protected]/MyDataset/4
2/cleaned_table_1.avro",
]
},
"job_id": 42,
"cli_script":
"/trifacta/queryResults/[email protected]/redshift-test/script.cli",
"output_file_formats": [
"csv",
"json",
"avro",
],
"job_type": "spark",
"host": "localhost:3005",
"job_output_path":
"/trifacta/queryResults/[email protected]/MyDataset/42/",
"user": "[email protected]",
"source_data":
"/trifacta/queryResults/[email protected]/redshift-test/datasources.tsv",
"profiler": "on"
}
Publish
You can publish job results for completed jobs to specified database tables:
NOTE: Even if you are publishing to the default schema, you must preface the table value with
the name of the schema to use: MySchema.MyTable.
Publish commands can be executed as soon as the job identifier has been created. After the publish command is
submitted, the publish job is queued for execution after any related transform job has been completed.
NOTE: You cannot publish ad-hoc results for a job when another publishing job is in progress for the
same job through the application or the command line interface. Please wait until the previous job has
been published before retrying. This is a known issue.
You execute one publish command for each output that you wish to write to a supported database table.
Command
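A sketch of a publish command, using the parameters documented in the reference above (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py publish \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_id=42 \
--conn_name=aSQLServerConnection --database=dev \
--table=MySchema.table_job_42 --publish_format=avro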
Output
JSON Response
You can retrieve a JSON list of all publications that have been executed for a specific job.
Command
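A sketch of such a request, consistent with the example output below (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py get_publications \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_id=42 --publish_format=avro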
Output
Job with id 42 has 2 avro publication(s) associated with it. The list of
publications is available in "./publications.out".
JSON Response
{
"publications": [
{
"publication_target": "redshift",
"job_id": "42",
"database": "dev",
"publication_id": 69,
"app_host": "trifacta.example.com:3005",
"user": "[email protected]",
"table": "table_job_42",
"publish_format": "avro",
"connect_str": "jdbc:redshift://dev.example.com:5439/dev"
},
{
"publication_target": "hive",
"job_id": "42",
"database": "default",
"publication_id": 70,
"app_host": "trifacta.example.com:3005",
"user": "[email protected]",
"table": "table_job_42",
"publish_format": "avro",
"connect_str": "jdbc:hive2://hadoop:10000/default"
}
]
}
NOTE: When appending data into a Redshift table, the columns displayed in the Transformer page must
match the order and data type of the columns in the target table.
Command
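A sketch of a load_data (append) command, using the parameters documented in the reference above (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py load_data \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_id=42 \
--conn_name=Redshift --database=dev --table=MySchema.my_table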
Output
JSON Response
For existing tables, you can clear them and load them with results from a job. If the table does not exist, a new
one is created and populated.
Command
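A sketch of a truncate_and_load command, using the parameters documented in the reference above (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py truncate_and_load \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_id=42 \
--conn_id=2 --database=dev --table=MySchema.my_table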
Output
JSON Response
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
You can use the following Bash script to execute parameterized job runs on the Trifacta® node. This script
accepts parameters to identify the CLI package downloaded to the node and then runs the job, whose output
includes an identifier for the current date. In this manner, the script can be run on a daily basis on any number of
CLI packages.
The CLI package includes:
script.cli - script file
datasources.tsv - file containing a pointer to the storage location of the source data
These values are provided to the script as command-line parameters (--script and --source).
#!/bin/bash
## Parse command-line parameters. (The opening of this loop and the --script
## case are reconstructed here; they are implied by the --source case below.)
for i in "$@"
do
case $i in
--script=*)
ScriptParam="${i#*=}"
;;
--source=*)
SourceParam="${i#*=}"
;;
*)
# unknown option
echo "${i} parameter is not recognized. Please provide --script and --source values."
echo
exit 1
;;
esac
done
if [ -z "${ScriptParam}" ]
then
echo "--script param is required."
echo
exit 1
fi
if [ -z "${SourceParam}" ]
then
echo "--source param is required."
echo
exit 1
fi
## (Assignments of Script, Data, AppHost, JobType, User, Password,
## OutputFormats, and OutputPath are omitted from this excerpt.)
## Launch job
echo "/opt/trifacta/bin/trifacta_cli.py run_job --script=$Script
--data=$Data --host=$AppHost --job_type=$JobType --user_name=$User
--password=$Password --output_formats=$OutputFormats
--output_path=$OutputPath $extraArgs" >> stdout.txt
/opt/trifacta/bin/trifacta_cli.py run_job --script=$Script --data=$Data
--host=$AppHost --job_type=$JobType --user_name=$User
--password=$Password --output_formats=$OutputFormats
--output_path=$OutputPath $extraArgs >> stdout.txt 2>> stderr.txt
JobLaunched=$?
if [ "$JobLaunched" -eq 1 ]
then
echo "Failed to launch job. See stderr.txt for details"
exit 1
fi
JobInfo=''
JobStatus='Pending'
## (Extraction of JobId from the run_job output and polling of get_job_status
## are omitted from this excerpt.)
## If jobId exists..
if [ "$JobId" -ge 0 ]
then
echo "Job with Id " $JobId " launched"
if [ "$JobStatus" = 'Complete' ]
then
echo "Job "$JobId" is complete."
echo "Output path is $OutputPath"
else
You can use the above as a basic template for execution of any type of CLI command.
CLI Publishing Options File
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
If needed, you can specify multiple file or table targets as part of a single CLI job. In your CLI command, the path
on the Trifacta® node to this JSON file is specified as the publish_opt_file parameter, as in the following:
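A sketch of such a command (values are placeholders):
/opt/trifacta/bin/trifacta_cli.py run_job \
--user_name=<trifacta_user> --password=<password> \
--host=localhost:3005 --job_type=spark \
--script=/opt/trifacta/script.cli --data=/opt/trifacta/datasources.tsv \
--publish_opt_file=/opt/trifacta/publishopts.json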
Tip: To specify this file, you can run this job through the application. After the job has completed,
download the CLI script from the Recipe panel in the Transformer page. The downloaded
publishopts.json file contains the specification for the targets you just executed. See Recipe Panel.
NOTE: All of the following properties require valid values, unless noted.
File targets:
Property Description
path Full path to the target file. Path must include the protocol identifier, such as hdfs:// and the port number.
create - Create a new file with each subsequent publication. Filenames for subsequent job runs are
appended with the job number identifier.
append - The results of each subsequent job run are appended to the existing file contents.
replace - The results of each subsequent job run replace the same file. Previous job run results are lost
unless moved out of the location.
csv
json
avro
pqt
header If set to true, then output files in CSV format include a header row. Headers cannot be applied when compression
is enabled.
asSingleFile If set to true, then output files are written to a single file.
If set to false, then the output files are written to multiple files as needed.
compression (optional) This property can be used to specify any compression to apply to a text-based file. Supported compression
formats:
gzip
bzip2
snappy
If this is not specified, then no compression is applied to the output file.
Hive targets:
Property Description
create - Create a new table with each subsequent publication. Table names for subsequent job runs are
appended with a timestamp.
append - The results of each subsequent job run are appended to the existing table contents.
replace - The results of each subsequent job run are written to the same table, which has been emptied.
Previous job run results are lost unless moved out of the location (dropAndLoad).
overwrite - The results of each subsequent job run are written to a newly created table with the same name
as the output table from the previous job run (truncateAndLoad).
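A sketch of a single file target, assembled from the documented properties above. The path, header, and asSingleFile names come from the property tables; the top-level "file" key and the "action" and "format" property names are assumptions for illustration only, and the downloaded publishopts.json template remains the authoritative reference:
{
"file": [
{
"path": "hdfs://localhost:8020/trifacta/queryResults/cleaned_table_1.csv",
"action": "create",
"format": "csv",
"header": true,
"asSingleFile": true
}
]
}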
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
Contents:
Command Reference
Troubleshooting
The Command Line Interface for User Administration enables administrators to perform bulk user management
tasks on the platform. You can use the CLI to manage the following tasks:
Create, edit, or delete users.
Enable or disable an existing user.
Retrieve individual or all user profiles, including any security details.
Reset passwords.
Command Reference
/opt/trifacta/bin/
admin_username Username of the admin account to be used to execute the user admin command. Please specify the full username. Applies to: All
host (Optional) The server and port number of the Trifacta® instance. By default, this value is set to http://localhost:3005. Specify a new value if needed. In an SSO environment, use https://localhost:2443 (see below). Applies to: All
disable_ssl_certification (Optional) When communicating over HTTPS, this setting can be used to override the default behavior of validating the server certificate before executing the command. Applies to: All
enable (Optional) Put the user in an enabled state. Default is to enable the user. Applies to: create_user and edit_user
transfer_assets_to (Optional) When deleting a user, you can optionally transfer all of the user's assets to another user. Applies to: delete_user
Additional documentation may be available for individual commands using the following:
./trifacta_admin_cli.py --help
Config file
You can store Trifacta platform username and password information in an external file. See CLI Config File.
The following user account properties are exposed through the command line:
--hadoopPrincipal Hadoop principal value that is used to connect to the Hadoop environment. This setting Y
applies only when secure impersonation is enabled.
--outputHomeDir The output home directory for the user. By default, the results of each job executed by the Y
user are generated in a sub-directory within this one.
--isDisabled When set to True, the user account is disabled and cannot be used to login to the Y
application.
--email The email address associated with the user account. The email address is also the userID for Y
the account.
--ssoPrincipal The SSO principal value associated with the user account. This value only applies to Y
environments that are integrated with an enterprise Single Sign On solution.
--enableAdmin When set to True, this user account is a system administrator account. You should limit the Y
number of accounts that have system administrator access.
--disableAdmin When set to True, this user account is not a system administrator account. You should limit Y
the number of accounts that have system administrator access.
--lastLoginTime The timestamp of when the user account was most recently used to login to the application. N
Examples
User Admin under SSO
If you are in an SSO environment, the following properties require special values to properly authenticate with
AD/LDAP. All values are required:
Property Description
admin_username Use the SSO username for the platform admin user issuing the command.
NOTE: In an SSO environment, the default admin user account for the Trifacta platform does not work.
The issuing user must be an SSO user that has been promoted to admin within the Trifacta platform.
host This value must point to the SSO gateway on the Trifacta node and must include the port number. If you are
running the CLI on the Trifacta node, use the following:
https://localhost:2443
ssoPrincipal In SSO environments, this parameter is required. It must be set to the SSO principal value associated with the
user that is being modified.
Create user
Command
Example (all one command):
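A sketch of a create_user command. It assumes the admin password is supplied through the CLI config file (see CLI Config File); the exact set of flags required by create_user may differ in your environment:
./trifacta_admin_cli.py create_user \
--admin_username=<admin_user> --host=http://localhost:3005 \
--email=[email protected] \
--outputHomeDir=/trifacta/queryResults/[email protected]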
Notes
Add the --disable parameter to create the user in a disabled state.
Output
Show user
Command
Example (all one command):
Output
Edit user
Command
The following command changes the Single Sign On principal for the user to a new value. The values for other
user account settings found in the response below can be inserted in the command to modify those settings.
Example (all one command):
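A sketch of such a command, assuming an SSO environment and that --email identifies the account to modify:
./trifacta_admin_cli.py edit_user \
--admin_username=<admin_user> --host=https://localhost:2443 \
--email=[email protected] --ssoPrincipal=newPrincipal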
Output
Reset password
Command
The following command generates a URL for a specified user that enables the user to reset his or her account
password.
NOTE: The script returns a URL containing the hostname with which it was invoked. You should
invoke the script with a fully qualified domain name. If the returned hostname is not accessible to the
designated user, the hostname must be replaced before passing the URL to the user.
Output
Disable user
Command
The following command disables the specified user. Disabled users can no longer log in to the application and
cannot execute any jobs or commands at the command line.
Example (all one command):
Output
Delete user
Command
Delete the user [email protected] and transfer his assets to [email protected].
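A sketch of such a command, assuming that --email identifies the account to delete:
./trifacta_admin_cli.py delete_user \
--admin_username=<admin_user> --host=http://localhost:3005 \
--email=[email protected] --transfer_assets_to=[email protected]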
NOTE: The transfer of the deleted user's assets is optional. If it is invoked, the user to whom the assets
are assigned must have matching permissions on the datastores where the imported datasets are
located.
Output
Troubleshooting
If you are executing the Admin CLI in SSO mode on the localhost, you may receive the following error message to
standard output:
Exceeded 30 redirects
Solution:
This problem occurs when the CLI is run against the application, instead of the gateway proxy. Please insert the
host of the gateway proxy for the host parameter, instead of the host of the application.
In the next release of Trifacta® Wrangler Enterprise after Release 6.0, the Trifacta command line
interface tools will be removed from the product (End of Life). Before upgrading to that release or
a later one, you must migrate your usage of the CLI to use the REST APIs. For more information,
see CLI Migration to APIs.
As an alternative to including admin passwords in each command that is executed, you can insert a set of admin
credentials into a configuration file. The file location is the following:
~/.trifacta/config.json
In your scripts, you can specify just the value for admin_username, and the config file is checked for the
appropriate password, which is applied to the command.
NOTE: The permissions on this config file should be set such that only the user executing the command
can read the file.
API Reference
This section contains reference information on the REST APIs that are made available by the Trifacta® platform.
Topics:
API Overview
API Authentication
Manage API Access Tokens
API Endpoints
v4 Endpoints
API AccessTokens Create v4
API AccessTokens Delete v4
API AccessTokens Get List v4
API AccessTokens Get v4
API Connections Create DryRun v4
API Connections Create v4
API Connections Delete v4
API Connections Get List v4
API Connections Get Status v4
API Connections Get v4
API Connections Patch v4
API Connections Permissions Create User v4
API Connections Permissions Delete User v4
API Connections Permissions Get User v4
API Connections Vendors Get List v4
API Deployments Create v4
API Deployments Delete v4
API Deployments Get List v4
API Deployments Get Release List v4
API Deployments Get v4
API Deployments Object Import Rules Patch v4
API Deployments Patch v4
API Deployments Run v4
API Deployments Value Import Rules Patch v4
API EMRClusters Create v4
API EMRClusters Delete v4
API EMRClusters Get Count v4
API EMRClusters Get List v4
API EMRClusters Get v4
API EMRClusters Patch v4
API Overview
Contents:
Design Overview
URL Format
Naming Conventions
Operations and Methods
Embedding Associations
Media Type Headers
Authentication
SSL
Upload
Versioning and Endpoint Lifecycle
HTTP Status Codes and Errors
Caching
Use Cases
REST API Tasks
UI Integrations
About This Documentation
To enable programmatic control over its objects, the Trifacta® platform supports a range of REST API endpoints
across the objects in the platform. This section provides an overview of the API design, methods, and supported
use cases.
Supported operations:
Connections: Get information about connections
Datasets: Create, list, update, and delete operations on datasets
Swap datasets
Jobs and Results:
Launch job
Get job status
Publish job results
Create dataset from results
Get profile metadata:
Quality bar status
Schema (column names and types)
Users: Create, list, delete
Uses:
Can be used for automation of resource management for end-to-end workflow
Can be used to integrate wrangling experience in third-party application
See Use Cases below.
URL Format
<http/https>://<my_server>:<port_number>/<version>/<endpoint>/[resource_id]/[association][?args]
<http/https> HTTP protocol identifier. The protocol should be https in a production environment. Example: https
<port_number> Port number over which you access the Trifacta platform. By default, this value is 3005. Example: 3005
[resource_id] Internal identifier for the specific resource requested from the endpoint. This value defines the object against which the requested operation is performed. Example: /10
[association] If applicable, the association identifies the API endpoint that is requested using the context determined by the <endpoint> and the [resource_id]. Associations can also be referenced by query parameter. See Embedding Associations below. Example: /jobGroups
[?args] In some cases, arguments can be passed to the endpoint in the form of query parameters. Example: ?arg1=value1&arg2=value2
Naming Conventions
v4 conventions
Field names are in camelCase and are consistent with the resource name in the URL or with the embed URL parameter.
From early API versions, foreign keys have been replaced with identifiers like the following:
v3:
"createdBy": 1,
"updatedBy": 2,
v4:
"creator": {
"id": 1
},
"updater": {
"id": 2
},
Support for basic CRUD (Create, Read, Update, and Delete) operations across most platform objects.
NOTE: Some of these specific operations may not be supported in the current release. For a complete
list, see API Endpoints.
Embedding Associations
An association can be referenced using the above URL structuring or by applying the embed query parameter as
part of the reference to the specific resource. Example:
https://wrangler.example.com/v3/jobGroups/6?embed=flowNode
Example response:
{
"id": 6,
"description": "A nifty job group",
"flowNode": {
"id": 1,
"script": {
"id": 1
},
"terminal": true
...
}
}
NOTE: Some endpoints may accept and return a custom media type. These endpoints are documented.
Client request that expects a response body: the request header should include Accept: application/json
Client request that includes a request body: the request header Content-Type: application/json is required
Server response that includes a response body: the response header Content-Type: application/json is required
The REST APIs use the same authentication methods as the UI. Each call to an API endpoint must include
authentication credentials for a user with access to the requested objects. See API Authentication.
SSL
If SSL has been enabled for the Trifacta platform, requests to URL endpoints are automatically redirected to the
HTTPS equivalent.
Upload
Versioning and Endpoint Lifecycle
NOTE: API versioning is not synchronized to specific releases of Trifacta Wrangler Enterprise. For
example, some API endpoints for v4 may be updated, while v3 instances of the API endpoint are still
supported. APIs are designed to be backward compatible.
APIs are designed to be backward compatible so that scripts and other tooling built on a previous version of an
endpoint remain valid until the previous version has reached end-of-life. Each API is supported across a window
of Trifacta Wrangler Enterprise releases, after which you must reference a newer version of the API.
API endpoint routes follow a consistent structure and do not contain business logic.
Version information is available at the following endpoint:
<http/https>://<my_server>:<port_number>/<version>/version
HTTP Status Codes and Errors
The following error codes can apply to any of the above requests:
Caching
When a resource has been cached in the client, the client may set an If-Modified-Since header in HTTP
date format on the request. If so, GET /v3/<resource>/<id> returns 304 Not Modified when the client has a
cached version.
Use Cases
REST API Tasks
By chaining together sequences of calls to API endpoints, you can create, read, update, and delete objects using
identifiers accessible through the returned JSON. For more information, see API Endpoints.
For more information on endpoint workflows, see API Workflows.
UI Integrations
The REST APIs can also be used for integrating the core transformation experience of the Trifacta platform into a
third-party application. Using a series of URL-based calls, you can retrieve and display specified datasets in the
Transformer page, where authenticated users can wrangle datasets controlled by the third-party application. See
API - UI Integrations.
Unless otherwise noted, the documentation and examples apply to version 3 (v3) of the Trifacta platform APIs.
Examples may require modification to work in your environment.
API Authentication
Contents:
Required Permissions
API Access Token Authentication
Basic Authentication
SSO Authentication
Kerberos Authentication
Logout
Required Permissions
Authenticating user must be a valid user of the deployed instance of the Trifacta platform.
API access tokens can be acquired and applied to your requests to obscure sensitive Personally Identifiable
Information (PII) and are compliant with common privacy and security standards. These tokens last for a
preconfigured time period and can be renewed as needed.
NOTE: This feature may need to be enabled in your instance of the Trifacta platform. For more
information, see Enable API Access Tokens.
Basic Steps:
1. You submit a request to create a new access token.
a. You can create and delete access tokens through the Settings area of the Trifacta application. See
Access Tokens Page.
b. You can create access tokens through the REST API endpoint.
i. If you do not have a valid access token, you must submit your request to the endpoint using
one of the other forms of authentication.
ii. If you do have a valid access token, you can use it with your submission to generate a new
access token.
iii. See API AccessTokens Create v4.
2. With each request, you submit the token as part of the Authorization header.
3. Continue using the token. As needed, you can create and use additional tokens. There is no limit to the
number of tokens you can create.
Tip: API access tokens work seamlessly with platform-native SAML and LDAP SSO authentication. They
do not work with the reverse proxy method of SSO authentication. Details are below.
For more information on this process, see Manage API Access Tokens.
Basic Authentication
As request parameters, you can submit username/password under Basic Auth to any REST API endpoint.
NOTE: You must submit authentication credentials with each request to the platform.
NOTE: The user must have permissions to execute the endpoint action.
$ curl -u [email protected]:me_pwd \
-b ~/cookies.txt -c ~/cookies.txt \
http://<platform_host>:<platform_port_number>/v3/<endpoint>
where:
Parameter Description
-b and -c Required paths and filenames for storage of send and receive HTTP cookies.
<platform_port_number> Port number through which to access the Trifacta platform. Default is 3005.
SSO Authentication
You can use the APIs in SSO environments. Below, you can review the best method of authenticating to the APIs
based on your SSO environment:
Platform-native SAML API access tokens work seamlessly. Basic auth does not work.
Platform-native LDAP-AD API access tokens work seamlessly. Basic auth does not work.
Reverse proxy SAML Use basic auth described below. Additional configuration may be required.
Reverse proxy LDAP-AD Use basic auth described below. Additional configuration may be required.
In a single sign-on environment, you can use basic authentication to interact with the APIs.
NOTE: Enabling SSO integration with the Trifacta platform requires additional configuration. See
Configure SSO for AD-LDAP.
Example:
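A sketch, assuming the SSO gateway on port 2443 and the same cookie handling as the basic authentication example above:
$ curl -u [email protected]:me_pwd \
-b ~/cookies.txt -c ~/cookies.txt \
https://<platform_host>:2443/v3/<endpoint>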
NOTE: For the protocol identifier, you can also use https if SSL is enabled. See Install SSL Certificate.
Parameter Description
Kerberos Authentication
In a Kerberos environment, credentials must be submitted with each request using the SPNEGO Auth method.
Kerberos is a network authentication protocol for client/server applications.
SPNEGO provides a mechanism for extending Kerberos to Web applications through HTTP.
For more information on the differences, see
https://msdn.microsoft.com/en-us/library/ms995330.aspx#http-sso-2_topic2.
Credentials are authenticated by the KDC for each request.
NOTE: SPNEGO must be enabled and configured for your REST client or programming library.
$ curl -V
curl 7.51.0 (x86_64-apple-darwin16.0) libcurl/7.51.0 SecureTransport
zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap
ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM
NTLM_WB SSL libz UnixSockets
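A sketch of a SPNEGO request in cURL, using the options described below (values are placeholders):
$ curl --negotiate -u anything \
-b ~/cookies.txt -c ~/cookies.txt \
http://<platform_host>:<platform_port_number>/v3/<endpoint>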
where:
Parameter Description
--negotiate Enables SPNEGO use in cURL. This option requires a library built with GSS-API or SSPI support. If this option
is used several times, only the first one is used. Use --proxy-negotiate to enable Negotiate
(SPNEGO) for proxy authentication.
-u anything Required username. However, this username is ignored. Instead, the principal used in kinit is applied.
Logout
Contents:
Enable
Generate New Token
Via API
Via UI
Use Token
List Tokens
Renew Token
Delete Token
This section provides some workflow information for how to use API access tokens as part of your API projects on
the Trifacta® platform. An access token is a hashed string that enables authentication when submitted to any
endpoint of the platform. Access tokens limit exposure of clear-text authentication values and provide an easy
method of managing authentication outside of the browser.
Notes:
An access token is linked to its creator and can be generated by submitting a username/password
combination or another valid token from the same user.
If a token is created for userA, userB can be provided the token to impersonate userA.
Enable
This feature must be enabled in your instance of the platform. For more information, see
Enable API Access Tokens.
NOTE: The first time that you request a new API token, you must submit a separate form of
authentication to the endpoint. To generate new access tokens after you have created one, you can use a
valid access token if you have one.
Via API
Via UI
NOTE: Copy the value of the token to the clipboard and store it in a secure location for use with your
scripts.
Tip: If you wish to manage your token via the APIs, you should copy the Token ID value, too. The Token
ID can always be retrieved from the Trifacta application.
Use Token
After a token has been acquired, it must be included in each request to the server, for as long as it is valid.
NOTE: API access tokens are not used by users in the Trifacta application.
After you have acquired the token, you submit it with each API request to the platform.
Example - cURL:
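A sketch, assuming the Bearer authorization scheme:
$ curl -H "Authorization: Bearer (tokenValue)" \
http://<platform_host>:<platform_port_number>/v4/<endpoint>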
where:
(tokenValue) is the value returned for the token when it was created.
List Tokens
NOTE: For security reasons, you cannot acquire the actual token through any of these means.
Tip: You can see all of your current and expired tokens through the Trifacta application. See
Access Tokens Page.
Endpoint Description
API AccessTokens Get List v4 List all access tokens for your user account.
API AccessTokens Get v4 List your access token for the specified token ID.
Renew Token
New tokens can be acquired at any time using the Create method.
NOTE: It is the responsibility of the user to acquire a new API token before the current one expires. If a
token is permitted to expire, a request for a new token must include userId and password information.
Delete Token
Via API: Acquire the tokenId value for the token and use the delete endpoint. See
API AccessTokens Delete v4.
Via UI: In the Access Tokens page, select Delete Token... from the context menu for the token listing. See
Access Tokens Page.
API Endpoints
The following endpoints are available in this release of Trifacta Wrangler Enterprise. Please verify that you are
referring to the correct version of the endpoint.
Topics:
v4 Endpoints
API AccessTokens Create v4
API AccessTokens Delete v4
API AccessTokens Get List v4
API AccessTokens Get v4
API Connections Create DryRun v4
API Connections Create v4
API Connections Delete v4
API Connections Get List v4
API Connections Get Status v4
API Connections Get v4
API Connections Patch v4
API Connections Permissions Create User v4
API Connections Permissions Delete User v4
API Connections Permissions Get User v4
API Connections Vendors Get List v4
API Deployments Create v4
API Deployments Delete v4
API Deployments Get List v4
API Deployments Get Release List v4
API Deployments Get v4
API Deployments Object Import Rules Patch v4
API Deployments Patch v4
API Deployments Run v4
API Deployments Value Import Rules Patch v4
API EMRClusters Create v4
API EMRClusters Delete v4
API EMRClusters Get Count v4
API EMRClusters Get List v4
API EMRClusters Get v4
API EMRClusters Patch v4
API Flows Create v4
API Flows Delete v4
API Flows Get List v4
API Flows Get v4
API Flows Package Get DryRun v4
API Flows Package Get v4
API Flows Package Post DryRun v4
API Flows Package Post v4
API Flows Patch v4
API ImportedDatasets Create v4
API ImportedDatasets Delete v4
API ImportedDatasets Get List v4
API ImportedDatasets Get v4
v4 Endpoints
Contents:
Access Tokens
Connections
Connection permissions
Datasets and Recipes
EMR Clusters
Flows
Flow import and export
Jobgroups and Jobs
Outputs, Publications, and WriteSettings
Deployments and Releases
Users
These endpoints apply to version 4 of the APIs for the Trifacta® platform.
For more information on support for this version, see API Version Support Matrix.
Access Tokens
Connections
Connection permissions
EMR Clusters
The following endpoints apply only if the Trifacta platform has been integrated with an AWS Elastic MapReduce
(EMR) cluster.
/flows/package/dryRun POST Import dry run API Flows Package Post DryRun v4
/flows/:id/package/dryRun GET Export dry run API Flows Package Get DryRun v4
Users
Miscellaneous
API AccessTokens Create v4
Contents:
Required Permissions
Request
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/apiAccessTokens
Request Header:
If you do not have a valid access token to use at this time, you must submit a username/password
combination as part of the Authentication header.
If you have a valid access token, you can submit that token in your Authentication header with this request.
For more information, see API Authentication.
Request Body:
{
"lifetimeSeconds": 100,
"description": "My 100-second token"
}
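As a sketch, this request could be submitted via cURL as follows, using basic authentication for the first token:
$ curl -X POST http://<platform_host>:<platform_port_number>/v4/apiAccessTokens \
-u <user>:<password> \
-H "Content-Type: application/json" \
-d '{"lifetimeSeconds": 100, "description": "My 100-second token"}'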
Response
NOTE: If you receive a Route doesn't exist error message, please verify that the API access token
feature has been enabled in your instance of the platform. For more information, see
Enable API Access Tokens.
Reference
Request reference:
Response reference:
API AccessTokens Delete v4
Contents:
Required Permissions
Request
Response
Reference
If you delete an active access token, you may prevent the user from accessing the platform
outside of the Trifacta application.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/apiAccessTokens/<id>
where:
Parameter Description
/v4/apiAccessTokens/0bc1d49f-5475-4c62-a0ba-6ad269389ada
Request Body:
Empty.
Response
NOTE: If you receive a Route doesn't exist error message, please verify that the API access token
feature has been enabled in your instance of the platform. For more information, see
Enable API Access Tokens.
Reference
None.
API AccessTokens Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/apiAccessTokens
Request Body:
Empty.
Response
NOTE: If you receive a Route doesn't exist error message, please verify that the API access token
feature has been enabled in your instance of the platform. For more information, see
Enable API Access Tokens.
Reference
For more information on the properties of an access token, see API AccessTokens Get v4.
API AccessTokens Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
/v4/apiAccessTokens/<id>
where:
Parameter Description
/v4/apiAccessTokens/0bc1d49f-5475-4c62-a0ba-6ad269389ada
Request Body:
Empty.
Response
NOTE: If you receive a Route doesn't exist error message, please verify that the API access token
feature has been enabled in your instance of the platform. For more information, see
Enable API Access Tokens.
{
"tokenId": "0bc1d49f-5475-4c62-a0ba-6ad269389ada",
"description": "new token",
"expiredAt": "2020-01-15T20:58:28.175Z",
"lastUsed": null,
"createdAt": "2019-01-15T20:58:28.175Z"
}
Reference
API Connections Create DryRun v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: In this release, you cannot create Redshift or SQL DW connections via the API. Please create
these connections through the application. This known issue will be fixed in a future release.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/dryRun
NOTE: Relational connections require the creation and installation of an encryption key file on the Trifacta
node. This file must be present before the connection is created. See Create Encryption Key File.
This example creates a Postgres connection of basic credentials type. A valid username/password combination
must be specified in the credentials property.
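A sketch of such a request body, modeled on the SQL Server example in API Connections Create v4; the vendor and vendorName value "postgres" and the port 5432 are assumptions:
{
"host": "postgres.example.com",
"port": 5432,
"vendor": "postgres",
"vendorName": "postgres",
"name": "postgres_test",
"description": "",
"type": "jdbc",
"ssl": false,
"isGlobal": false,
"credentialType": "basic",
"credentialsShared": true,
"disableTypeInference": false,
"params": {
"connectStrOpts": ""
},
"credentials": [
{
"username": "<username>",
"password": "<password>"
}
]
}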
For more information on these properties, see API Connections Get v4.
Response
{
"result": "SUCCESS",
"reason": null
}
Reference
For more information on the response body properties, see API Connections Get v4.
API Connections Create v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: In this release, you cannot create Redshift or SQL DW connections via the API. Please create
these connections through the application. This known issue will be fixed in a future release.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections
NOTE: Relational connections require the creation and installation of an encryption key file on the Trifacta
node. This file must be present before the connection is created. See Create Encryption Key File.
This example creates a SQL Server connection of basic credentials type. A valid username/password
combination must be specified in the credentials property.
{
"connectParams": {
"vendor": "sqlserver",
"vendorName": "sqlserver",
"host": "sqlserver.example.com",
"port": "1433"
},
"host": "sqlserver.example.com",
"port": 1433,
"vendor": "sqlserver",
"params": {
"connectStrOpts": ""
},
"ssl": false,
"vendorName": "sqlserver",
"name": "sqlserver_test2",
"description": "",
"type": "jdbc",
"isGlobal": false,
"credentialType": "basic",
"credentialsShared": true,
"disableTypeInference": false,
"credentials": [
{
"username": "<username>",
"password": "<password>"
}
]
}
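As a sketch, the request could be submitted via cURL by saving the body above to a file (the filename here is arbitrary):
$ curl -X POST http://<platform_host>:<platform_port_number>/v4/connections \
-u <user>:<password> \
-H "Content-Type: application/json" \
-d @sqlserver_connection.json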
Property Description
type For more information on the value to insert for the connection, see Connection Types.
port Port number for the relational server. The default value varies between database vendors. For more information,
please see the documentation provided with your database distribution.
params (Optional) Set of JSON parameters that are passed to the database when initializing the connection. Depending on
the database vendor, you may be required to submit via this parameter the name of the default database. You can
also pass in optional parameters through the connectStrOpts parameter. For more information, see
CLI for Connections.
ssl (Optional) If set to true, the connection is made over SSL. The default is false.
NOTE: If you connect over SSL, you must modify the hostname value to use HTTPS.
credentialsShared If the connection is a global connection, the credentials to connect can be shared with other users when this
property is true. Otherwise, other users must provide their own credentials.
disableTypeInference By default, the Trifacta platform attempts to infer types when data is imported. For schematized sources, you may
prefer to disable type inference, instead using the types provided by the source.
When this setting is true, initial type inference by the platform is disabled for all data read through this connection.
credentials (Optional) If credentialType=basic, this property must contain the username and password to use to
connect to the relational source.
Property Description
host Set this value to hadoop to integrate with the Hive instance for the Hadoop cluster to which the Trifacta platform
is connected.
"jdbc": "hive2",
{
"connectParams": {
"vendor": "redshift",
"vendorName": "redshift",
"host": "redshift.example.com",
"port": "5439",
"defaultDatabase": "dev",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS
TRUNCATECOLUMNS"
},
"host": "redshift.example.com",
"port": 5439,
"vendor": "redshift",
"params": {
"connectStrOpts": "",
"defaultDatabase": "dev",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS
TRUNCATECOLUMNS"
},
"ssl": false,
"vendorName": "redshift",
"name": "redshift2",
"description": "Redshift connection",
"type": "jdbc",
"isGlobal": true,
"credentialType": "custom",
"credentialsShared": true,
"disableTypeInference": false,
"credentials": [
{"key":"user","value":"<username>"},
{"key":"password","value":"<password>"},
{"key":"iamRoleArn","value":"<IAM_role_ARN>"}
]
}
Property Description
The extraLoadParams value is used when you publish results to Redshift. For more information on these
values, see http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html.
credentials username and password must be specified in this key-value format, although the value for either can be an
empty string.
For more information on parameters and credentials, see Create Redshift Connections.
Response
Reference
For more information on the response body properties, see API Connections Get v4.
API Connections Delete v4
Contents:
Required Permissions
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<id>
where:
Parameter Description
/v4/connections/4
Request Body:
Empty.
Response
Reference
None.
API Connections Get List v4
Contents:
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections
/v4/connections?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"connectParams": {
"vendor": "teradata",
"vendorName": "teradata",
"host": "teradata.example.com",
Reference
For more information on the properties of a connection, see API Connections Get v4.
API Connections Get Status v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<id>/status
where:
Parameter Description
/v4/connections/10/status
Request Body:
Empty.
{
"result": "SUCCESS",
"reason": null
}
Reference
Property Description
For more information on debugging failures in relational connections, see Enable Relational Connections.
For more information on debugging Hive connections, see Configure for Hive.
For more information on debugging S3 connections, see Enable S3 Access.
reason If the result value is not SUCCESS, additional information may be included here.
API Connections Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<id>
Parameter Description
/v4/connections/3
Request Body:
Empty.
Response
Property Description
connectParams.extraLoadParams (if applicable) If the connection type supports it, this setting contains additional parameters to be
passed to the host when making the connection.
params This setting is populated with any parameters that are passed to the source during connection and
operations. For relational sources, this setting may include the default database and extra load
parameters.
ssl When true, the Trifacta platform uses SSL to connect to the source.
NOTE: After a connection has been made public, it cannot be made private again. It must be
deleted and recreated.
Default is false. A connection can be made public through the command line interface or the
Connections page. See Connections Page.
credentialType The type of credentials used for the connection. This value varies depending on where the credentials
are stored. See CLI for Connections.
credentialsShared If true, the credentials used for the connection are available for use by users who have been shared
the connection.
uuid A universal object identifier, which is unique across instances of the platform.
This internal identifier is particularly useful when creating import mapping rules.
disableTypeInference If set to true, type inference has been disabled for this connection. The default is false.
When type inference has been disabled, the Trifacta platform does not apply Trifacta types to data
when it is imported. For more information, see Configure Type Inference.
NOTE: For security reasons, you can store the connection's credentials in an external file on
the Trifacta Server, after which they do not appear in this setting. See CLI for Connections.
updater.id Internal identifier of the user who last updated the connection.
workspace.id Internal identifier of the workspace with which this connection is associated.
API Connections Patch v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<id>
where:
Parameter Description
/v4/connections/8
{
"params": {
"defaultDatabase": "my_default_db"
},
"description": "This connection uses a non-default default DB."
}
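As a sketch, the patch could be submitted via cURL as follows:
$ curl -X PATCH http://<platform_host>:<platform_port_number>/v4/connections/8 \
-u <user>:<password> \
-H "Content-Type: application/json" \
-d '{"params": {"defaultDatabase": "my_default_db"}, "description": "This connection uses a non-default default DB."}'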
Response
{
"id": 8,
"updater": {
"id": 1
},
"updatedAt": "2019-01-25T23:19:27.648Z"
}
Reference
For more information on the properties of a connection, see API Connections Get v4.
API Connections Permissions Create User v4
Contents:
Required Permissions
Request
Response
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/connections/<cid>/permissions/
Parameter Description
<cid> Internal identifier for the connection.
/v4/connections/10/permissions/
Request Body:
[
{
"personId": 26,
"role": "readOnly"
}
]
Response
{
"data": [
{
"role": "readOnly",
"createdAt": "2019-03-21T21:01:58.266Z",
"updatedAt": "2019-03-21T21:01:58.266Z",
"person": {
"id": 26
},
"connection": {
"id": 1
}
}
]
}
Contents:
Required Permissions
Request
Response
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<cid>/permissions/<uid>
Parameter Description
<cid> Internal identifier for the connection.
<uid> Internal identifier of the user whose permissions you are removing.
/v4/connections/10/permissions/6
Request Body:
None.
Response
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/<cid>/permissions/
Parameter Description
<cid> Internal identifier for the connection.
/v4/connections/7/permissions/
Request Body:
None.
Response
Reference
Property Description
personId Internal identifier of the user who has access to the connection
Contents:
Required Permissions
Request
Response
Reference
Get the list of all vendors of connections that are supported in the instance of the platform.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/connections/vendors
Request Body:
Empty.
Response
[
{
"name": "db2",
"displayName": "DB2",
"type": "db2",
"category": "relational",
"credentialType": "basic",
"operation": "import",
"connectionParameters": [
{
"name": "host",
"displayName": "Host",
"type": "string",
"required": true,
"category": "location",
"default": ""
},
{
"name": "port",
"displayName": "Port",
"type": "integer",
"required": true,
"category": "location",
"default": "1521"
Reference
Property Description
category Tab in the Connections screen where connections of this type can be created
credentialType Type of credentials that are accepted for this connection type. For example: basic.
connectionParameters properties:
For each of the connection properties, the following attributes are specified:
Attribute Description
displayName Display value of the property, which appears above the textbox in the application
required If true, the property must be populated to create a connection of this type.
category Defines the order for how the properties are listed in the Create Connection dialog.
If set to location, the order is determined by the order in which the properties are listed under the vendor object.
Contents:
Required Permissions
Request
Response
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/
Request Body:
{
"name": "Test Deployment"
}
Response
{
"id": 1,
"name": "Test Deployment",
"updatedAt": "2019-02-13T20:14:48.537Z",
"createdAt": "2019-02-13T20:14:48.537Z",
"creator": {
"id": 7
},
"updater": {
"id": 7
}
}
For more information on properties of a deployment, see API Deployments Get v4.
API Deployments Delete v4
Contents:
Required Permissions
Request
Response
Reference
Deleting a deployment removes all releases, packages, and flows underneath it. This step cannot
be undone.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/4
Request Body:
Empty.
Response
Reference
None.
API Deployments Get List v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 2,
"name": "Test Deployment 2",
"createdAt": "2019-02-13T20:15:39.147Z",
"updatedAt": "2019-02-13T20:15:39.147Z",
"numReleases": 0,
"latestRelease": null,
"creator": {
"id": 7
},
"updater": {
"id": 7
}
},
{
"id": 1,
"name": "Test Deployment",
"createdAt": "2019-02-13T20:14:48.537Z",
"updatedAt": "2019-02-13T20:14:48.537Z",
"numReleases": 0,
"latestRelease": null,
"creator": {
"id": 7
},
"updater": {
"id": 7
}
}
]
}
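For illustration, you can page through this endpoint using the limit and offset query parameters, assuming access-token authentication as described in API Authentication. Host and token values are placeholders:
# Retrieve the first 100 deployments
curl -H "Authorization: Bearer <token>" \
  "https://<platform-host>/v4/deployments?limit=100&offset=0"
# Retrieve the next page; keep increasing offset by the limit
# until fewer than 100 results are returned
curl -H "Authorization: Bearer <token>" \
  "https://<platform-host>/v4/deployments?limit=100&offset=100"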
Reference
For more information on the properties of a deployment, see API Deployments Get v4.
API Deployments Get Release List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of releases for the specified deployment for the authenticated user.
NOTE: Deployments and releases pertain to Production instances of the Trifacta® platform. For more
information, see Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/:id?embed=releases
/v4/deployments/:id?embed=releases&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
For more information on the properties of a release, see API Releases Get v4.
API Deployments Get v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/1
Request Body:
Empty.
Response
{
"id": 1,
"name": "Test Deployment",
"createdAt": "2019-02-13T20:14:48.537Z",
"updatedAt": "2019-02-13T20:14:48.537Z",
"creator": {
"id": 7
},
"updater": {
"id": 7
}
}
Reference
Property Description
name Display name for the deployment. This value appears in the user interface.
creator.id Internal identifier for the user who created the deployment.
updater.id Internal identifier for the user who last updated the deployment.
Contents:
Required Permissions
Request
Response
Reference
Create a list of object-based import rules for the specified deployment. Any previous rules applied to the
same objects are deleted.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
The response contains any previously created rules that have been deleted as a result of this change.
You can also make replacements in the import package based on value mappings. See
API Deployments Value Import Rules Patch v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>/objectImportRules
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/4/objectImportRules
NOTE: Rules are applied in the listed order. If you are applying multiple rules to the same object in the
import package, the second rule must reference the expected changes applied by the first rule.
This type of replacement applies if the imported packages contain sources that are imported through two separate
connections.
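Request Body:
A sketch of such a rule list, based on the rule properties shown in the response below (the uuid and identifier values are placeholders):
[
  {
    "tableName": "connections",
    "onCondition": {
      "uuid": "d75255f0-a245-11e7-8618-adc1dbb4bed0"
    },
    "withCondition": {
      "id": 1
    }
  }
]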
Response
The response body contains any previously created rules that have been deleted as a result of this update.
Response Body Example: All new rules, no deletions
If the update does not overwrite any previous rules, then no rules are deleted. So, the response looks like the
following:
{
"deleted": []
}
"data": [
{
"onCondition": {
"uuid": "d75255f0-a245-11e7-8618-adc1dbb4bed0"
},
"withCondition": {
"id": 1
},
"id": 1,
"tableName": "connections",
"createdAt": "2019-02-13T23:07:51.720Z",
"updatedAt": "2019-02-13T23:07:51.720Z",
"creator": {
"id": 7
},
"updater": {
"id": 7
},
"deployment": {
"id": 4
}
}
]
}
}
Reference
Property Description
onCondition The matching object identifier and the specified literal or pattern to match.
withCondition The identifier for the object type, as specified by the tableName value, which is being modified.
updater.id Internal identifier of the user who last updated the deleted rule.
deployment.id Internal identifier for the deployment to which to apply the import rule.
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/1
Request Body:
NOTE: For the PATCH method, only the properties that are being patched need to be submitted.
{
"name": "New Deployment Name"
}
Response
Reference
For more information on the properties of a deployment, see API Deployments Get v4.
API Deployments Run v4
Contents:
Required Permissions
Request
Response
Reference
Run the job for the active release of the specified deployment.
At least one manual output must be specified for the main flow within the package. See Flow View Page.
An active release must be specified for the deployment. See API Releases Patch v4.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>/run
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/4/run
Request Body:
Empty.
Request Body - Example for dataset with parameters:
In the following example, the request body contains overrides to the default job definition. In this case, the
override is to set a new value for the parameter of the dataset:
{
"overrides": {
"runParameters": {
"overrides": {
"data": [{
"key": "varRegion",
"value": "02"
}
]}
}
}
}
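A sketch of submitting this run request with curl, with the overrides saved to a local file (host, token, and file name are placeholders):
curl -X POST \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @run-overrides.json \
  https://<platform-host>/v4/deployments/4/run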
Response
{
"data": [
{
"reason": "JobStarted",
"sessionId": "14337009-1637-4948-a36f-16479d7138c6",
"id": 3
}
]
}
Reference
Property Description
id JobGroup identifier. For more information, see API JobGroups Get v4.
jobs.data.id Internal identifier for the individual jobs that compose the job group being executed.
Contents:
Required Permissions
Request
Response
Reference
Create a list of value-based import rules for the specified deployment. Any previous rules applied to the
same values are deleted.
The generated rules apply to all flows that are imported into the Production instance after they have been created.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
The response contains any previously created rules that have been deleted as a result of this change.
You can also make replacements in the import package based on object references. See
API Deployments Object Import Rules Patch v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>/valueImportRules
where:
Parameter Description
<id> Internal identifier for the deployment.
/v4/deployments/4/valueImportRules
NOTE: The executing user of any job must have access to any data source that is remapped in the new
instance.
Request Body:
[
{
"type": "s3Bucket",
"on": "wrangle-dev",
"with": "wrangle-prod"
}
]
NOTE: Rules are applied in the listed order. If you are applying multiple rules to the same object in the
import package, the second rule must reference the expected changes applied by the first rule.
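The explanation below refers to a pair of rules like the following sketch. The type value shown here is an assumption for illustration; the first rule is a literal match, and the second is a regular expression match:
[
  {
    "type": "path",
    "on": "klamath",
    "with": "klondike"
  },
  {
    "type": "path",
    "on": "/\\/dev\\//",
    "with": "/prod/"
  }
]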
In the above:
The first rule replaces the string klamath in the path to the source with the following value: klondike.
The second rule performs a regular expression match on the string /dev/. Since the match is described
using the regular expression syntax, the backslashes must be escaped. The replacement value is the
following literal: /prod/.
You can specify matching values as string literals or as regular expression patterns.
Response
The response body contains any previously created rules that have been deleted as a result of this update.
Response Body Example: All new rules, no deletions
If the update does not overwrite any previous rules, then no rules are deleted. So, the response looks like the
following:
{
"deleted": []
}
Reference
Property Description
updater.id Internal identifier of the user who last updated the rule.
deployment.id Internal identifier for the deployment from which the import rule was deleted.
Contents:
Required Permissions
Request
Response
Reference
NOTE: APIs for EMR clusters apply only to instances of the Trifacta platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage switching between EMR clusters when
needed.
NOTE: There can be only one EMR cluster registered with the platform at any time. The registered cluster
is always the active one.
Tip: You can use the PATCH method on the current EMRCluster object to update the EMR cluster ID that
is active for the platform. See API EMRClusters Patch v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/emrClusters/
Request Body:
{
"emrClusterId": "a-2BFK2$KVR7QQ",
"resourceBucket": "3dog-testing-emr-spark",
"resourcePath": "",
"region": "us-west-2"
}
Response
{
"id": 1,
"emrClusterId": "a-2BFK2$KVR7QQ",
"resourceBucket": "3dog-testing-emr-spark",
"resourcePath": "",
"region": "us-west-2",
"updatedAt": "2019-02-14T01:20:51.303Z",
"createdAt": "2019-02-14T01:20:51.303Z"
}
Reference
Contents:
Required Permissions
Request
Response
Reference
NOTE: APIs for EMR clusters apply only to deployments of the Trifacta® platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage failovers or prolonged outages of a primary
EMR cluster.
NOTE: There can be only one EMR cluster registered with the platform at any time. The registered cluster
is always the active one.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/emrClusters/<id>
where:
Parameter Description
<id> Internal identifier for the EMR cluster object.
/v4/emrClusters/1
Request Body:
Empty.
Response
Reference
None.
API EMRClusters Get Count v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: APIs for EMR clusters apply only to instances of the Trifacta® platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage switching between EMR clusters when
needed.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/emrClusters/count
Request Body:
Empty.
Response
{
"count": 1
}
Reference
None.
API EMRClusters Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get list of all EMR cluster IDs accessible by the authenticated user.
NOTE: APIs for EMR clusters apply only to instances of the Trifacta® platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage switching between EMR clusters when
needed.
NOTE: There can be only one EMR cluster registered with the platform at any time. The registered cluster
is always the active one.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/emrClusters/
Request Body:
Empty.
Response
Reference
Contents:
Required Permissions
Request
Response
Reference
NOTE: APIs for EMR clusters apply only to instances of the Trifacta® platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage switching between EMR clusters when
needed.
NOTE: There can be only one EMR cluster registered with the platform at any time. The registered cluster
is always the active one.
Version: v4
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/emrClusters/<id>
where:
Parameter Description
<id> Internal identifier for the EMR cluster object.
/v4/emrClusters/22101948
Request Body:
Empty.
Response
{
"id": 22101948,
"emrClusterId": "a-2BFK90SF0F9K",
"resourceBucket": "2dog-testing-emr",
"resourcePath": "",
"region": "us-west-2",
"createdAt": "2019-01-23T17:16:46.000Z",
"updatedAt": "2019-01-23T17:16:46.000Z"
}
Property Description
resourceBucket S3 bucket that contains the Trifacta libraries for EMR and Trifacta job logs
createdAt Timestamp for when the EMR cluster object was launched
updatedAt Timestamp for when the EMR cluster object was last updated
Contents:
Required Permissions
Request
Response
Reference
Tip: You can use this endpoint to switch the currently active EMR cluster to a new EMR cluster by
changing the identifier value for it.
NOTE: Modifying a cluster while jobs are running can result in erroneous reporting of job status. Perform
these modifications during off-peak hours.
NOTE: APIs for EMR clusters apply only to instances of the Trifacta® platform that are integrated with
Amazon EMR clusters. These APIs can be used to manage switching between EMR clusters when
needed.
NOTE: There can be only one EMR cluster registered with the platform at any time. The registered cluster
is always the active one.
Version: v4
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/emrClusters/<id>
where:
Parameter Description
<id> Internal identifier for the EMR cluster object.
/v4/emrClusters/1
Request Body:
Only the properties that you are updating need to be included in the request.
{
"resourceBucket": "3dog-testing-emr2",
"resourcePath": "default"
}
Response
{
"id": 1,
"updatedAt": "2019-02-14T01:23:14.344Z"
}
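For example, to switch the platform to a different EMR cluster, a minimal sketch of the request body (the cluster identifier is a placeholder):
{
  "emrClusterId": "<new-emr-cluster-id>"
}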
Reference
Contents:
Required Permissions
Request
Response
Reference
Create a new flow with specified name and optional description and target folder.
NOTE: You cannot add datasets to the flow through this endpoint. Moving pre-existing datasets into a
flow is not supported in this release. Create the flow first and then when you create the datasets,
associate them with the flow at the time of creation.
See API ImportedDatasets Create v4.
See API WrangledDatasets Create v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/
Request Body:
A name value is required. Other properties are optional.
{
"name": "My Flow",
"description": "This is my flow."
"folder": {
"id": 2
}
}
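A sketch of submitting this request with curl, with the request body above saved to a local file (host, token, and file name are placeholders):
curl -X POST \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @create-flow.json \
  https://<platform-host>/v4/flows/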
Response
{
"id": 18,
"updatedAt": "2019-01-08T20:27:53.422Z",
"createdAt": "2019-01-08T20:27:53.422Z",
"name": "My Flow",
"description": "This is my flow.",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"folder": {
"id": 2
},
"workspace": {
"id": 1
}
}
Reference
For more information on the properties of a flow, see API Flows Get v4.
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/flows/<id>
where:
Parameter Description
<id> Internal identifier for the flow.
/v4/flows/2
Request Body:
Empty.
Response
Reference
For more information on the properties of a flow, see API Flows Get v4.
API Flows Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
When authenticated, you can review all flows to which you have access.
Request
Endpoint:
/v4/flows
/v4/flows?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 9,
"name": "Intern Training",
"description": "(Please don't modify)",
"createdAt": "2019-01-08T18:14:37.851Z",
"updatedAt": "2019-01-08T18:57:26.824Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"folder": {
"id": 1
},
"workspace": {
"id": 1
}
}
]
}
Reference
For more information on the properties of a flow, see API Flows Get v4.
API Flows Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/<id>
Parameter Description
<id> Internal identifier for the flow.
/v4/flows/6
Request Body:
Empty.
Response
{
"id": 6,
"name": "2013 POS",
"description": null,
"createdAt": "2019-01-08T17:25:21.392Z",
"updatedAt": "2019-01-08T17:30:30.959Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"folder": null,
"workspace": {
"id": 1
}
}
Reference
Property Description
updater.id Internal identifier of the user who last updated the flow.
folder If the flow has been added to a folder, this value contains the path to the folder.
workspace.id Internal identifier of the workspace to which this flow belongs. In most environments, this value is 1.
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/<id>/package/dryRun
Parameter Description
<id> Internal identifier for the flow.
/v4/flows/7/package/dryRun
Request Body:
None.
Response
{ }
Reference
None.
API Flows Package Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/<id>/package
Parameter Description
<id> Internal identifier for the flow.
/v4/flows/7/package
Request Body:
None.
Response
Reference
None.
API Flows Package Post DryRun v4
Contents:
Required Permissions
Request
Response
Reference
Performs a dry-run of importing a flow package, which checks all permissions required to import the
package and validates any specified import rules.
For more information on import rules, see Define Import Mapping Rules.
Any errors are reported in the response.
After you have successfully completed a dry-run, you can execute a formal import. See
API Flows Package Post v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/package/dryRun
/v4/flows/package/dryRun
Request Body:
The request body must include the following key and value combination submitted as form data. This path is the
location of the ZIP package that you are importing.
key value
data "@path-to-file"
{"importRuleChanges":{"object":[],"value":[]},"flowName":"[7dd7da30]
2013 POS"}
Reference
None.
API Flows Package Post v4
Contents:
Required Permissions
Request
Response
Reference
Performs an import of a flow package, which also applies any specified import rules.
Before you import, you can perform a dry-run to check for errors. See API Flows Package Post DryRun v4.
For more information on import rules, see Define Import Mapping Rules.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/flows/package/
Request Body:
key value
data "@path-to-file"
Response
{
"importRuleChanges":{
"object":[],
"value":[]
},
"primaryFlowIds":[
6
],
"flows":[
{
"id":6,
"name":"[b19d9a70] 2013 POS",
"description":null,
"deleted_at":null,
"cpProject":null,
"workspaceId":1,
"createdAt":"2018-04-24T16:11:59.343Z",
"updatedAt":"2018-04-24T18:26:47.522Z",
"createdBy":1,
"updatedBy":1
}
],
"datasources":[
{
"id":10,
"size":"13757",
"path":"/uploads/1/0b6a7d7a-be8d-46c9-92cd-202f39fa5b1b/REF_PROD.txt",
"dynamicPath":null,
"type":"hdfs",
"cpProject":null,
"workspaceId":1,
"path":"/uploads/1/0b6a7d7a-be8d-46c9-92cd-202f39fa5b1b/POS-r03.txt",
"dynamicPath":null,
"type":"hdfs",
"cpProject":null,
"workspaceId":1,
"bucket":null,
"connectionId":null,
"deleted_at":null,
"blobHost":null,
"container":null,
"isSchematized":true,
"isDynamic":false,
"disableTypeInference":false,
"createdAt":"2018-04-24T16:12:00.812Z",
"updatedAt":"2018-04-24T18:26:47.597Z",
"createdBy":1,
"updatedBy":1,
"parsingScriptId":16
},
{
"id":12,
"size":"56976",
"path":"/uploads/1/0b6a7d7a-be8d-46c9-92cd-202f39fa5b1b/REF_CAL.txt",
"dynamicPath":null,
"type":"hdfs",
"cpProject":null,
"workspaceId":1,
"bucket":null,
"connectionId":null,
"deleted_at":null,
"blobHost":null,
"container":null,
"isSchematized":true,
"isDynamic":false,
"path":"/uploads/1/0b6a7d7a-be8d-46c9-92cd-202f39fa5b1b/POS-r02.txt",
"dynamicPath":null,
"type":"hdfs",
"cpProject":null,
"workspaceId":1,
"bucket":null,
"connectionId":null,
"deleted_at":null,
"blobHost":null,
"container":null,
"isSchematized":true,
"isDynamic":false,
"disableTypeInference":false,
"createdAt":"2018-04-24T16:12:01.848Z",
"updatedAt":"2018-04-24T18:26:47.603Z",
"createdBy":1,
"updatedBy":1,
"parsingScriptId":18
},
{
"id":14,
"size":"1799008",
"path":"/uploads/1/0b6a7d7a-be8d-46c9-92cd-202f39fa5b1b/POS-schema.csv",
"dynamicPath":null,
"type":"hdfs",
"cpProject":null,
"workspaceId":1,
"bucket":null,
"connectionId":null,
"deleted_at":null,
"blobHost":null,
"container":null,
"isSchematized":true,
"isDynamic":false,
"disableTypeInference":false,
"createdAt":"2018-04-24T16:12:03.402Z",
"updatedAt":"2018-04-24T18:26:47.607Z",
"createdBy":1,
"updatedBy":1,
"parsingScriptId":19
}
]
}
Reference
Node Description
flowNodes Objects (imported datasets, recipes, and reference objects) within the flow definition
outputobjects Output objects related to the Run Job settings for the flow. See Run Job Page.
Contents:
Required Permissions
Request
Response
Reference
NOTE: You cannot add datasets to the flow through this endpoint. Moving pre-existing datasets into a
flow is not supported in this release. Create the flow first and then when you create the datasets,
associate them with the flow at the time of creation.
See API ImportedDatasets Create v4.
See API WrangledDatasets Create v4.
Version: v4
Required Permissions
The authenticated user must be the owner of the flow that is being updated.
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
Parameter Description
/v4/flows/8
Request Body:
You can modify the following properties.
{
"name": "My Flow",
"description": "This is my flow."
}
NOTE: For the PATCH method, only the properties that are being patched need to be submitted.
Response
Reference
For more information on the properties of a flow, see API Flows Get v4.
Contents:
Required Permissions
Request and Response
Examples by Type
File (HDFS and S3 sources)
Hive
Relational
Relational with Custom SQL Query
Reference
NOTE: When an imported dataset is created via API, it is always imported as an unstructured dataset.
Any recipe that references this dataset should contain initial parsing steps required to structure the data.
NOTE: Do not create an imported dataset from a file that is being used by another imported dataset. If
you delete the newly created imported dataset, the file is removed, and the other dataset is corrupted.
Use a new file or make a copy of the first file first.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/importedDatasets
Examples by Type
Below, you can review the basic request body for creating imported datasets for various types of sources:
File (HDFS or S3 source)
Hive
Relational
Relational with Custom SQL Query
NOTE: The path value should not include the HDFS protocol, host, or port information. You only need to
provide the path on HDFS.
NOTE: The path value should not include the S3 protocol, host, or port information. You only need to
provide the path on S3.
{
"path":
"/tri-h26/uploads/1/343647c7-5b23-41c8-9397-b40a1ff415ea/USDA_Farmers_Mar
ket_2014.avro",
"type": "s3",
"bucket": "myBucket",
"name": "USDA Farmers Market 2014b",
"description": "USDA Farmers Market 2014 - copy"
}
Hive
{
"jdbcTable": "farmers_market_recipe_tri",
"jdbcPath": [
"default"
],
"columns": [
"fmid",
"market_name"
],
"filter": null,
"raw": null,
"id": 19,
"size": "-1",
"path": null,
"dynamicPath": null,
"type": "jdbc",
"bucket": null,
"isSchematized": true,
"isDynamic": false,
"disableTypeInference": false,
"createdAt": "2018-02-26T19:19:33.069Z",
"updatedAt": "2018-02-26T19:19:33.720Z",
"parsingRecipe": {
"id": 35
},
"relationalSource": {
"relationalPath": [
"default"
],
"columns": [
"fmid",
Relational
{
"visible": true,
"numFlows": 0,
"size": -1,
"type": "jdbc",
"jdbcType": "TABLE",
"jdbcPath": [
"public"
],
"jdbcTable": "datasources",
"columns": [
"id",
"size",
"path"
],
"connectionId": 3,
"name": "My DB Table"
}
Response Body:
{
"jdbcTable": "datasources",
"jdbcPath": [
"public"
],
"columns": [
"id",
"size",
"path"
],
"filter": null,
"raw": null,
"id": 23,
"size": "-1",
"path": null,
"dynamicPath": null,
You can submit custom SQL queries to relational or Hive connections. These custom SQL queries can be used to
pre-filter the data inside the database, which improves query performance and reduces the size of the imported dataset.
For more information, see Enable Custom SQL Query.
Request Body:
Notes:
See previous notes on queries to relational sources.
As part of the request body, you must submit the custom SQL query as the value for the raw property.
The following example queries a Hive source. Note the backtick escaping of the database and table names.
NOTE: Syntax for the custom SQL query varies between relational systems. For more information on
syntax examples, see Create Dataset with SQL.
{
"visible": true,
"numFlows": 0,
"size": -1,
"type": "jdbc",
"jdbcType": "TABLE",
"connectionId": 1,
"raw": "SELECT * FROM `default`.`farmers_market_recipe_tri`",
"name": "Farmer's Market Data - Custom SQL Query"
}
Response Body:
In the response, note that the source of the data is defined by the connectionId value and the SQL defined in
the raw value.
{
"jdbcTable": null,
"jdbcPath": null,
"columns": null,
"filter": null,
"raw": [
"SELECT * FROM `default`.`farmers_market_recipe_tri`"
],
"id": 21,
"size": "-1",
"path": null,
"dynamicPath": null,
"type": "jdbc",
"bucket": null,
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.
API ImportedDatasets Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/importedDatasets/<id>
where:
Parameter Description
<id> Internal identifier for the imported dataset.
/v4/importedDatasets/2
Request Body:
Empty.
Response
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.
API ImportedDatasets Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of accessible imported datasets for the authenticated user.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/importedDatasets
/v4/importedDatasets/?embed=connection
/v4/importedDatasets?embed=connection&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 56,
"size": "-1",
"path": null,
"dynamicPath": null,
"type": "jdbc",
"bucket": null,
"isSchematized": true,
"isDynamic": false,
"disableTypeInference": false,
"createdAt": "2018-01-31T23:51:36.179Z",
"updatedAt": "2018-01-31T23:51:37.025Z",
"parsingRecipe": {
"id": 111
},
"relationalSource": {
"relationalPath": [
"public"
],
"columns": [
"start_date",
"end_date"
],
"filter": null,
"raw": null,
"id": 10,
"tableName": "datetable",
"createdAt": "2018-01-31T23:51:36.187Z",
"updatedAt": "2018-01-31T23:51:36.187Z",
"importedDataset": {
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.
API ImportedDatasets Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/importedDatasets/<id>
where:
Parameter Description
<id> Internal identifier for the imported dataset.
Use the following embedded reference to include in the response data about the connection used to acquire the
source dataset, if it was created from a Hive or relational connection.
/v4/importedDatasets/<id>?embed=connection
Request Body:
Empty.
Response
{
"jdbcTable": "datetable",
"jdbcPath": [
"public"
],
"columns": [
"start_date",
"end_date"
],
"filter": null,
"raw": null,
"id": 56,
"size": "-1",
"path": null,
"dynamicPath": null,
"type": "jdbc",
"bucket": null,
"isSchematized": true,
"isDynamic": false,
"disableTypeInference": false,
"createdAt": "2018-01-31T23:51:36.179Z",
"updatedAt": "2018-01-31T23:51:37.025Z",
"connection": {
"id": 2,
"name": "redshift",
"description": "",
"type": "jdbc",
"isGlobal": true,
"credentialType": "custom",
"credentialsShared": true,
"uuid": "c54bbec0-e05f-11e7-aa39-995f61171ffd",
"disableTypeInference": false,
"createdAt": "2017-12-13T23:45:59.468Z",
"updatedAt": "2017-12-13T23:46:09.039Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
}
},
"parsingRecipe": {
"id": 111
},
"relationalSource": {
"relationalPath": [
"public"
],
Response Body - Example for dataset with parameters:
{
"id": 29,
"size": "292817",
"path":
"/trifacta/uploads/1/efeb54fc-efee-4d5f-a92b-a44c09c60aaa/POS-r01.txt",
"dynamicPath":
"/trifacta/uploads/1/efeb54fc-efee-4d5f-a92b-a44c09c60aaa/POS-r.txt",
"type": "hdfs",
"bucket": null,
"blobHost": null,
"container": null,
"isSchematized": true,
"isDynamic": true,
"disableTypeInference": false,
"createdAt": "2018-03-26T22:33:17.386Z",
"updatedAt": "2018-03-26T22:33:18.337Z",
"parsingRecipe": {
"id": 43
},
"runParameters": {
"data": [
{
"value": {
"pattern": {
"regex": {
"value": "[0-9][0-9]"
}
}
},
"insertionIndices": [
{
"index": 62,
"order": 0
}
],
"id": 2,
"type": "path",
"createdAt": "2018-03-26T22:33:17.533Z",
"updatedAt": "2018-03-26T22:33:17.662Z",
"runParameterEdit": {
"value": {
"pattern": {
"regex": {
Reference
Common Properties:
The following properties are common to file-based and JDBC datasets.
Property Description
path For HDFS and S3 file sources, this value defines the path to the source.
For uploaded sources, this value specifies the location on the default backend storage layer where the dataset has
been uploaded.
container (Azure only) If the dataset is stored on ADLS, this value specifies the container on the blob host where the
source is stored.
type Identifies the type of storage where the source is located. Values:
hdfs
s3
jdbc
blobHost (Azure only) If the dataset is stored on ADLS, this value specifies the blob host where the source is stored.
dynamicPath (Dataset with parameters only) Specifies the path without the parameters inserted into it. Full path is defined based
on this value and the data in the runParameters area.
isSchematized (If source file is avro, or type=jdbc) If true, schema information is available for the source.
isDynamic If true, the imported dataset is a dynamic dataset (dataset with parameters). For more information, see
Overview of Parameterization.
isConverted If true, the imported dataset has been converted to CSV format for storage.
disableTypeInference If true, the initial type inferencing performed on schematized sources by the Trifacta platform is disabled for this
source. For more information, see Configure Type Inference.
hasStructuring If true, initial parsing steps have been applied to the dataset.
runParameters If runtime parameters have been applied to the dataset, they are listed here. See below for more information.
name Display name of the imported dataset.
creator.id Internal identifier of the user who created the imported dataset
workspace.id Internal identifier of the workspace into which the dataset has been imported.
parsingRecipe.id If initial parsing is applied, this value contains the internal identifier of the recipe that performs the parsing.
connection.id Internal identifier of the connection to the server hosting the dataset.
If this value is null, the file was uploaded from a local file system.
To acquire the entire connection for this dataset, you can use either of the following endpoints:
/v4/importedDatasets?embed=connection
/v4/importedDatasets/:id?embed=connection
runParameters reference:
The following properties are available in the runParameters area:
Property Description
insertionIndices.index Index value for the location in the path where the parameter is applied
insertionIndices.order Order in which the parameter is applied at the insertion point:
0 - ascending
1 - descending
id Internal identifier for the parameter.
runParameterEdit Any runtime overrides applied to the parameter during job execution
importedDataset.id Internal identifier for the dataset to which the parameter is applied
creator.id Internal identifier of the user who created the dataset with parameters
updater.id Internal identifier of the last user who modified the dataset with parameters
overrideKey Any override values applied to the dataset with parameters at run time
storageLocation reference:
The following properties are available in the storageLocation area:
Property Description
fullUri The full URI to the location where the dataset is stored.
path For HDFS and S3 file sources, this value defines the path to the source.
For uploaded sources, this value specifies the location on the default backend storage layer where the dataset has been
uploaded.
workspaceId Internal identifier of the workspace into which the dataset has been imported.
type Identifies the type of storage where the source is located. Values:
hdfs
s3
jdbc
bucket (If type=s3) Bucket on S3 where source is stored.
blobHost (Azure only) If the dataset is stored on ADLS, this value specifies the blob host where the source is stored.
container (Azure only) If the dataset is stored on ADLS, this value specifies the container on the blob host where the source is
stored.
Tip: Changes in this value indicate that the source file has been modified.
Relational:
Property Description
jdbcPath Name of the database from which the source was queried.
raw If custom SQL has been applied to the data source to filter the data before it is imported, all SQL statements are listed.
size Size in bytes of the data. For relational sources, this value is -1, as the data is not available.
File:
File-based datasets support the common properties only.
Embedded connection:
For more information on the properties when the connection is embedded in the response, see
API Connections Create v4.
API ImportedDatasets Patch v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/importedDatasets/<id>
where:
Parameter Description
<id> Internal identifier for the imported dataset.
/v4/importedDatasets/8
Request Body:
Only the name and description properties should be modified. Modified properties must be included in the request.
In the following example, the name and the description of the imported dataset are modified:
{
"name": "My Imported DS"
"description": "This is my imported dataset."
}
Response
{
"id": 8,
"updater": {
"id": 1
},
"updatedAt": "2019-02-14T23:19:27.648Z"
}
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.
API ImportedDatasets Post AddToFlow v4
Contents:
Required Permissions
Request
Response
Reference
Add the specified imported dataset to a flow based on its internal identifier.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/importedDatasets/<id>/addToFlow
where:
Parameter Description
<id> Internal identifier for the imported dataset.
Request Body:
{
"flow": {
"id": 12
}
}
Response
Response Body:
{
"id": 47,
"wrangled": false,
"updatedAt": "2019-02-12T00:51:59.961Z",
"createdAt": "2019-02-12T00:51:59.931Z",
"referenceInfo": null,
"activeSample": {
"id": 52
},
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"recipe": {
"id": 37
},
"referencedFlowNode": null,
"flow": {
"id": 12
}
}
Property Description
referenceInfo Reference information for the new object in the flow. Since the dataset has just been added, this value should
be null.
activeSample.id Internal identifier for the currently active sample for the dataset.
If null, the dataset has not been wrangled in the Transformer page and does not have initial parsing steps.
referencedFlowNode Internal identifier of the node of the flow that this dataset references. Since this dataset is an imported dataset,
there is no reference. This value should be null.
For more information on the other properties, see API ImportedDatasets Get v4.
API JobGroups Cancel v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: If the job has completed, this endpoint does nothing. You must delete the completed job instead.
See API JobGroups Delete v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>/cancel
where:
Parameter Description
<id> Internal identifier for the job group.
/v4/jobGroups/14/cancel
Request Body:
Empty.
Response
Reference
For more information on the available status messages, see API JobGroups Get v4.
API JobGroups Create v4
Contents:
Required Permissions
Request
Response
Reference
Create a jobGroup, which launches the specified job as the authenticated user.
The request specification depends on one of the following conditions:
Dataset has already had a job run against it and just needs to be re-run.
Dataset has not had a job run, or the job definition needs to be re-specified.
NOTE: In this release, you cannot execute jobs sourced from datasets in Redshift or SQL DW or publish
to these locations via the API. This known issue will be fixed in a future release.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups
Request Body:
{
"wrangledDataset": {
"id": 7
}
}
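If the job has not been run before, or the job definition must be re-specified, overrides are included in the request. A sketch of such a request body, based on the overrides properties described in the Reference below; the writesettings property names and values are assumptions for illustration:
{
  "wrangledDataset": {
    "id": 7
  },
  "overrides": {
    "execution": "photon",
    "profiler": true,
    "writesettings": [
      {
        "path": "hdfs://<path-to-output>/results.csv",
        "action": "create",
        "format": "csv"
      }
    ]
  }
}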
Response
{
"reason": "JobStarted",
"sessionId": "eb3e98e0-02e3-11e8-a819-25c9559a2a2c",
"id": 9
}
Reference
Request Reference:
Property Description
wrangledDataset (required) Internal identifier for the object whose results you wish to generate. The recipes of all
preceding datasets on which this dataset depends are executed as part of the job.
overrides.execution (optional) Running environment on which the job is executed:
photon
spark
overrides.profiler (required, if first time running the job) When set to true, a visual profile of the job is generated as
specified by the profiling options for the platform. See Profiling Options.
overrides.writesettings (required, if first time running the job) These settings define the publishing options for the job.
ranfrom (optional) If this value is set to null, then the job does not show up in the Job Details page.
Response reference:
Property Description
reason Current state of the job group at the time of the API call. Since this call creates the job group, this value is always
JobStarted in the response to this call.
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>
Parameter Description
<id> Internal identifier for the job group.
/v4/jobGroups/2
Request Body:
Empty.
Response
Reference
None.
API JobGroups Get Jobs v4
Contents:
Required Permissions
Request
Response
Reference
Get list of jobs for the specified jobGroup. For more information on jobGroups, see API JobGroups Get v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>/jobs
Parameter Description
<id> Internal identifier for the job group.
/v4/jobGroups/20/jobs
Request Body:
Empty.
Response
{
"data": [
{
"id": 41,
"status": "Complete",
"jobType": "wrangle",
"sampleSize": 100,
"percentComplete": 100,
"lastHeartbeatAt": "2019-02-11T23:55:32.604Z",
"createdAt": "2019-02-11T23:55:32.044Z",
"updatedAt": "2019-02-11T23:55:34.563Z",
"creator": {
"id": 1
},
"jobGroup": {
"id": 20
},
"errorMessage": null
},
{
"id": 42,
"status": "Complete",
"jobType": "filewriter",
"sampleSize": 100,
"percentComplete": 100,
"lastHeartbeatAt": "2019-02-11T23:55:34.676Z",
"createdAt": "2019-02-11T23:55:32.087Z",
"updatedAt": "2019-02-11T23:55:35.006Z",
"creator": {
"id": 1
},
"jobGroup": {
"id": 20
}
}
]
}
Reference
Property Description
status Current status of the job. See API JobGroups Get v4.
percentComplete Percentage of completion of the job at the time of the request. 100 means that the job has finished or failed.
jobGroup.id Internal identifier for the job group to which the job belongs.
For more information on the other properties, see API JobGroups Get v4.
API JobGroups Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
Request
Endpoint:
/v4/jobGroups
/v4/jobGroups/?embed=jobs,wrangledDataset
/v4/jobGroups/?embed=jobs,wrangledDataset&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 20,
"status": "Complete",
"ranfrom": "ui",
"ranfor": "recipe",
"createdAt": "2019-02-11T23:55:31.804Z",
"profilingEnabled": true,
"updatedAt": "2019-02-11T23:55:35.445Z",
"runParameterReferenceDate": "2019-02-11T23:55:31.000Z",
"name": null,
"wrangledDataset": {
"id": 45,
"flow": {
Reference
Some properties related to the jobGroup appear only in this endpoint. They are listed below.
For more information on the properties of a jobGroup, see API JobGroups Get v4.
Property Description
wrangledDataset.flow.id Internal identifier for the flow containing the recipe that was executed with the job.
wrangledDataset.flow.name Name of the flow containing the recipe that was executed with the job.
wrangledDataset.flow.associatedPeople.* All users who have access to the flow and their roles
jobGroupRunParameterOverrides.* These values define any parameter overrides that were applied during the job in the following
key-value form:
"jobGroupRunParameterOverrides": {
"data": [
{
"value": {
"variable": {
"value": "basic_types1"
}
},
"id": 1,
"overrideKey": "name",
"isImplicit": false,
"createdAt":
"2018-03-21T06:56:57.042Z",
"updatedAt":
"2018-03-21T06:56:57.042Z",
"jobGroup": {
"id": 93
}
}
]
},
runParameterEdits.* runParameterEdits contains the state history of all parameters and their values during
job execution
Contents:
Required Permissions
Request
Response
Reference
Get list of publications for the specified jobGroup. A publication is an export of job results from the platform after
they have been initially generated.
For more information on publications, see API Publications Get v4.
For more information on jobGroups, see API JobGroups Get v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>/publications
where:
Parameter Description
/v4/jobGroups/22/publications
Request Body:
Empty.
Response
{
"data": [
{
"path": [
"default"
],
Reference
For more information on the properties of a publication, see API Publications Get v4.
For more information on the other properties, see API JobGroups Get v4.
API JobGroups Get Status v4
Contents:
Required Permissions
Request
Response
Reference
Get current status of the specified jobGroup. For more information on jobGroups, see API JobGroups Get v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>/status
where:
Parameter Description
<id> Internal identifier for the job group.
/v4/jobGroups/2/status
Tip: You can retrieve all accessible jobGroups of a given status by applying a status query parameter to the base endpoint:
/v4/jobGroups/?status=Failed
Request Body:
Empty.
Response Body:
Returned response is the jobGroup definition for all accessible jobGroups where status=Failed. See
API JobGroups Get v4.
Response
"Complete"
Reference
For more information on the available status messages, see API JobGroups Get v4.
API JobGroups Get v4
Contents:
Required Permissions
Request
Response
Reference
Get information on the specified job group. A job group is a job that is executed from a specific node in a flow.
The job group may contain:
Wrangling job on the dataset associated with the node
Jobs on all datasets on which the selected job may depend
A profiling job for the job group
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v4/jobGroups/<id>
where:
Parameter Description
<id> Internal identifier for the job group.
Embed Parameter Description
jobs Embed information about the child jobs within the job group. Array includes information on transformation, profiling, and
publishing jobs that are part of the job group.
wrangledDataset This field contains the internal identifier for the dataset on which the job was run.
/v4/jobGroups/<id>?embed=jobs,wrangledDataset
/v4/jobGroups/8
Request Body:
Empty.
Response
{
"id": 8,
"name": null,
"description": null,
"ranfrom": "ui",
"ranfor": "recipe",
"status": "Complete",
"profilingEnabled": true,
"runParameterReferenceDate": "2018-01-25T18:01:15.000Z",
"createdAt": "2018-01-25T18:01:16.456Z",
"updatedAt": "2018-01-25T18:01:21.082Z",
"jobs": {
"data": [
{
"id": 2,
"createdAt": "2018-01-25T18:01:16.687Z",
"updatedAt": "2018-01-25T18:01:21.071Z",
Reference
Property Description
ranfrom Identifies how the job was run:
ui - Trifacta application
cli - command line interface
ranfor Description of the object for which the job was run. For example: recipe.
status Current state of the job group:
Created - job group has been created based on the current action.
Pending - job group is queued for execution.
InProgress - job group is currently running.
Complete - job group has completed successfully.
Failed - job group has failed.
Canceled - job group was canceled by user action.
profilingEnabled When true, a profiling job was executed as part of this job group.
runParameterReferenceDate When a recipe is executed with dynamic parameters, this parameter is stored with the timestamp at the time
of execution. It can be used in debugging execution issues. Do not modify this value.
updatedAt Timestamp for when the job group was last updated
jobs A list of all jobs that were launched based on this job group. For more information, see
API JobGroups Get Jobs v4.
wrangledDataset Internal identifier of the object from where the job group was executed. For more information, see
API WrangledDatasets Get v4.
workspace.id Internal identifier for the workspace where the job was executed
creator.id Internal identifier for the user who created the job group
updater.id Internal identifier for the user who last updated the job group
snapshot.id Internal identifier of the data snapshot for the job group
flowRun
NOTE: This parameter is used for internal platform purposes. Do not modify.
Contents:
Required Permissions
Request
Response
Reference
For a specified jobGroup, this endpoint performs an ad-hoc publish of the results to the designated target.
Target information is based on the specified connection.
Job results to be published are based on the specified jobGroup.
You can specify:
Database and table to which to publish
Type of action to be applied to the target table. Details are below.
Supported targets:
Hive
Redshift
For more information on jobGroups, see API JobGroups Get v4.
For additional examples, see API Workflow - Publish Results.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/jobGroups/<id>/publish
where:
Parameter Description
/v4/jobGroups/31/publish
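Request Body:
The request body identifies the target for publication. A sketch, based on the properties described in the Reference below; the connection, table, and inputFormat property names are assumptions for illustration:
{
  "connection": {
    "id": 1
  },
  "path": ["default"],
  "table": "my_published_results",
  "action": "createAndLoad",
  "inputFormat": "pqt"
}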
Response
{
"jobgroupId": 31,
"reason": "JobStarted",
"sessionId": "f6c5f350-2102-11e9-bb80-9faf7b15f235"
}
Reference
Request Reference:
Property Description
path Name of database to which to write the results. This value must be enclosed in square brackets.
action Type of writing action to perform with the results. Supported actions:
create - Create a new table with each publication. This table is empty except for the schema, which is taken
from the results. A new table receives a timestamp extension to its name.
load - Append a pre-existing table with the results of the data. The schema of the results and the table must
match.
createAndLoad - Create a new table with each publication and load it with the results data. A new table
receives a timestamp extension to its name.
truncateAndLoad - Truncate a pre-existing table and load it with fresh data from the results.
dropAndLoad - Drop the target table and load a new table with the schema and data from the results.
Supported formats by target:
Hive:
avro
pqt
Redshift:
NOTE: For results to be written to Redshift, the source must be stored in S3 and accessed through an S3
connection.
NOTE: By default, data is published to Redshift using the public schema. To publish using a different
schema, preface the table value with the name of the schema to use: MySchema.MyTable.
csv
json
avro
flowNodeId The internal identifier for the recipe (wrangledDataset) from which the job was executed.
For more information on the available status messages, see API JobGroups Get v4.
API OutputObjects Create v4
Contents:
Required Permissions
Request
Response
Reference
Create an outputobject.
Version: v4
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
NOTE: If an outputobject already exists for the recipe (flowNodeId) to which you are posting, you must
either modify the object instead or delete it before posting your new object.
Endpoint:
/v4/outputobjects
Request Body:
The following creates an outputobject with an embedded writesettings object to write an Avro file to the specified
location:
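A minimal sketch, assuming writesettings entries that specify path, action, and format (property names are based on the writesettings properties described in this guide; values are placeholders):
{
  "execution": "photon",
  "profiler": true,
  "flowNodeId": 13,
  "writesettings": [
    {
      "path": "hdfs://<path-to-output>/results.avro",
      "action": "create",
      "format": "avro"
    }
  ]
}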
Response
Reference
For more information on the properties of an outputobject, see API OutputObjects Get v4.
API OutputObjects Delete v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/outputobjects/<id>
where:
Parameter Description
<id> Internal identifier for the outputobject.
/v4/outputobjects/3
Request Body:
Empty.
Response
Reference
For more information on the properties of an outputobject, see API OutputObjects Get v4.
API OutputObjects Get List v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/outputobjects
Response
{
"data": [
{
Reference
For more information on the properties of a write setting, see API OutputObjects Get v4.
API OutputObjects Get v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/outputobjects/<id>?embed=writesettings,publications
/v4/outputobjects/3/
Request Body:
Empty.
Response
{
"id": 3,
"execution": "photon",
"profiler": true,
"isAdhoc": true,
"createdAt": "2018-11-08T18:51:56.633Z",
"updatedAt": "2018-11-08T18:52:44.535Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"flowNode": {
"id": 13
}
}
Reference
Property Description
execution The execution engine on which the output is generated. Possible values:
photon - Job is executed on the Trifacta Server. This environment is suitable for small- to medium-sized jobs.
spark - Job is executed on the Hadoop cluster to which the Trifacta platform is connected. See Configure for Spark.
emrSpark - Job is executed on the integrated EMR cluster. See Configure for EMR.
databricksSpark - Job is executed on the Azure Databricks cluster connected to the platform. See
Configure for Azure Databricks.
profiler If true, a visual profile of the results is generated as part of the output.
isAdhoc If true, the outputobject is used for ad-hoc execution of the recipe.
If false, the outputobject is used when a scheduled job on the recipe is executed.
updater.id Internal identifier of the user who last updated the object
flowNodeId Internal identifier of the recipe with which the object is associated
For more information on the writesettings properties, see API WriteSettings Get v4.
For more information on the publications properties, see API Publications Get v4.
API OutputObjects Update v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/outputobjects/<id>
/v4/outputobjects/3/
Request Body:
The following changes the running environment used for the outputobject and enables visual profiling. For more
information on the available properties, see API OutputObjects Get v4.
{
"execution": "spark",
"profiler": true
}
Response
Reference
For more information on the properties of the outputobject, see API OutputObjects Get v4.
API People Create v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/people/
Request Body:
{
"accept": "accept",
"password": "foo",
"password2": "foo",
"email": "[email protected]",
"name": "abc2",
"isAdmin": false,
"ssoPrincipal": null,
"hadoopPrincipal": null,
"lastLoginTime": null,
"awsConfig": null
}
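A sketch of submitting this request with curl, following the style of the curl examples later in this guide (the host, port, credentials, and email address are illustrative):
curl -X POST \
  http://example.com:3005/v4/people/ \
  -H 'authorization: Basic <base64-encoded-credentials>' \
  -H 'content-type: application/json' \
  -d '{"accept": "accept", "password": "foo", "password2": "foo", "email": "user@example.com", "name": "abc2", "isAdmin": false}'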
Response
{
"isDisabled": false,
"forcePasswordChange": false,
"state": "active",
"id": 9,
"email": "[email protected]",
"name": "Test1",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"isAdmin": false,
"updatedAt": "2019-01-09T20:23:31.560Z",
"createdAt": "2019-01-09T20:23:31.560Z",
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"lastStateChange": null,
"fileUploadPath": "/trifacta/uploads",
"awsConfig": null
}
Request properties:
Property Description
password2 This value confirms the value for password. These two property values must be identical.
For more information on the properties of a user, see API People Get v4.
API People Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/people/<id>
where:
Parameter Description
<id> Internal identifier of the user to delete
Example:
/v4/people/6
Request Body:
Empty.
Reference
For more information on the properties of a user, see API People Get v4.
API People Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/people
/v4/people?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
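For example, to page through all users, advance offset by limit until a request returns fewer results than the limit (host and port are illustrative; authentication headers are required, as described in API Authentication):
curl 'http://example.com:3005/v4/people?limit=100&offset=0'
curl 'http://example.com:3005/v4/people?limit=100&offset=100'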
Response
{
"data": [
{
"id": 4,
"email": "[email protected]",
"name": "Test1",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"isAdmin": false,
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"isDisabled": false,
"forcePasswordChange": false,
"state": "active",
"lastStateChange": null,
"createdAt": "2019-01-09T20:23:31.560Z",
"updatedAt": "2019-01-09T20:25:03.000Z",
"fileUploadPath": "/trifacta/uploads",
"awsConfig": null
},
{
"id": 3,
"email": "[email protected]",
"name": "Test User95203645",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"isAdmin": false,
"outputHomeDir":
"/trifacta/queryResults/[email protected]",
"isDisabled": false,
"forcePasswordChange": false,
"state": "active",
"lastStateChange": null,
"createdAt": "2019-01-09T10:39:50.310Z",
"updatedAt": "2019-01-09T10:39:50.349Z",
"fileUploadPath": "/trifacta/uploads",
"awsConfig": null
},
{
"id": 2,
"email": "[email protected]",
"name": "Test User83466845",
"ssoPrincipal": null,
Reference
For more information on the properties of a user, see API People Get v4.
API People Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/people/<id>
where:
Parameter Description
<id> Internal identifier of the user
Example:
/v4/people/4
Request Body:
Empty.
Response
{
"id": 2,
"email": "[email protected]",
"name": "Joe Guy",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"isAdmin": false,
"isDisabled": false,
"forcePasswordChange": false,
"state": "active",
"lastStateChange": null,
"createdAt": "2019-02-12T09:04:52.073Z",
"updatedAt": "2019-02-12T09:04:52.073Z",
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"fileUploadPath": "/trifacta/uploads",
"awsConfig": null
}
Reference
Property Description
ssoPrincipal (If SSO is enabled) Principal value of the user for single-sign on
hadoopPrincipal (If secure impersonation is enabled) Hadoop principal value for the user, which determines permissions on the
Hadoop cluster
forcePasswordChange (if enabled) When set to true, the user must change the account password on next login.
lastStateChange Timestamp for when the value of the state parameter was changed.
updatedAt Timestamp for when the user account was last modified
outputHomeDir Home directory where the user's generated results are written
fileUploadPath Path on backend datastore where files uploaded from the user's desktop are stored for use as imported datasets.
awsConfig (If AWS integration is enabled) Value contains the S3 credentials, default bucket, and any extra buckets to which
the user has access
API People Patch v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/people/<id>
where:
Parameter Description
<id> Internal identifier of the user to modify
Example:
/v4/people/6
Request Body:
NOTE: For the PATCH method, only the properties that are being patched need to be submitted.
{
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"email": "[email protected]",
"name": "Joe Example",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"isAdmin": false,
"isDisabled": false,
"forcePasswordChange": true,
"awsConfig": null
}
A minimal patch might modify only two properties:
{
"isAdmin": false,
"isDisabled": true
}
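A sketch of submitting the minimal patch above with curl (host, port, and credentials are illustrative):
curl -X PATCH \
  http://example.com:3005/v4/people/6 \
  -H 'authorization: Basic <base64-encoded-credentials>' \
  -H 'content-type: application/json' \
  -d '{"isAdmin": false, "isDisabled": true}'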
Response
{
"id": 6,
"updatedAt": "2018-01-24T23:49:08.199Z"
}
Reference
For more information on these properties, see API People Get v4.
API Publications Create v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/publications
Request Body:
The following creates a publications object and associates it with outputobject ID=3:
{
"path": [
"default"
],
"tableName": "newTable",
"targetType": "hive",
"action": "create",
"outputObject": {
"id": 3
},
"connection": {
"id": 1
}
}
Response
Reference
For more information on the properties of a publications object, see API Publications Get v4.
API Publications Delete v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/publications/<id>
where:
Property Description
<id> Internal identifier of the publications object to delete
Example:
/v4/publications/3
Request Body:
Empty.
Response
Reference
For more information on the properties of a publications object, see API Publications Get v4.
API Publications Get List v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/publications
Response
For more information on the properties of a publication, see API Publications Get v4.
API Publications Get v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/publications/<id>
/v4/publications/1/
Response
{
"path": [
"default"
],
"id": 1,
"tableName": "POS_r01",
"targetType": "hive",
"action": "dropAndLoad",
"createdAt": "2018-11-08T18:52:43.871Z",
"updatedAt": "2018-11-08T18:52:43.871Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"outputObject": {
"id": 3
},
"connection": {
"id": 1
}
}
Reference
Property Description
targetType The type of target to which the results are written. Possible values:
hive
redshift
sqldatawarehouse
For more information, see Connection Types.
action The write action to apply to the table, in the event that the table exists (for example, create or dropAndLoad,
as shown in the response above).
updatedAt Timestamp for when the publications object was last updated
updater.id Internal identifier of the user who last updated the object
outputObject.id Internal identifier of the outputobject with which the publication object is associated
API Publications Patch v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/publications/<id>
/v4/publications/3/
Request Body:
The following changes the path, table name, and action applied to the table for the specified publication. For
more information on the properties, see API Publications Get v4.
{
"path": [
"default"
],
"tableName": "MyTable-DropAndLoad",
"action": "dropAndLoad"
}
Response
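The response body is not included in this excerpt. Based on the shape returned by the other PATCH endpoints in this guide, it likely contains the object identifier, the updater, and the update timestamp; all values below are illustrative:
{
  "id": 3,
  "updater": {
    "id": 1
  },
  "updatedAt": "2018-11-08T18:52:43.871Z"
}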
Reference
For more information on the properties of the publications object, see API Publications Get v4.
API Releases Create DryRun v4
Contents:
Required Permissions
Request
Response
Reference
Perform a dry-run of creating a release for the specified deployment. The dry-run checks all permissions
required to import the package and validates any specified import rules.
For more information on import rules, see Define Import Mapping Rules.
If they occur, errors are reported in the response.
After you have successfully completed a dry-run, you can formally create the release via API. See
API Releases Create v4.
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>/releases/dryRun
where:
Parameter Description
<id> Internal identifier of the deployment
Example:
/v4/deployments/2/releases/dryRun
Request Body:
The request body must include the following key and value combination submitted as form data:
key value
data "@path-to-file"
curl -X POST \
http://example.com:3005/v4/deployments/1/releases/dryRun \
-H 'authorization: Basic c29sc29uQHRyaWZhY3RhLmNvbTphZG1pbg==' \
-H 'cache-control: no-cache' \
-H 'content-type: multipart/form-data' \
-F data=@<path-to-file>
Response
{
"importRuleChanges": {
"object": [],
"value": []
},
"deletedObjects": {},
"primaryFlowIds": [
8
],
"flows": [
{
"id": 8,
"name": "2013 POS",
"description": null,
"deleted_at": null,
Reference
For more information on import rule changes, see Define Import Mapping Rules.
API Releases Create v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/deployments/<id>/releases
where:
Parameter Description
<id> Internal identifier of the deployment
Example:
/v4/deployments/2/releases
Request Body:
The request body must include the following key and value combination submitted as form data:
key value
data "@path-to-file"
curl -X POST \
http://example.com:3005/v4/deployments/1/releases \
-H 'authorization: Basic c29sc29uQHRyaWZhY3RhLmNvbTphZG1pbg==' \
-H 'cache-control: no-cache' \
-H 'content-type: multipart/form-data' \
-F data=@<path-to-file>
Response
{
"importRuleChanges": {
"object": [],
"value": []
},
"deletedObjects": {},
"primaryFlowIds": [
6
],
"flows": [
{
"id": 6,
"name": "2013 POS",
"description": null,
"deleted_at": null,
"cpProject": null,
"workspaceId": 1,
"createdAt": "2019-02-13T18:39:03.426Z",
"updatedAt": "2019-02-13T20:39:41.775Z",
"createdBy": 7,
"updatedBy": 7,
"folderId": null
}
],
"datasources": [
Reference
For more information on import rule changes, see Define Import Mapping Rules.
API Releases Delete v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/releases/<id>
where:
Parameter Description
<id> Internal identifier of the release to delete
Request Body:
Empty.
Response
Reference
None.
API Releases Get v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/releases/<id>
where:
Parameter Description
<id> Internal identifier of the release
Example:
/v4/releases/1
Request Body:
Empty.
Response
{
"id": 1,
"notes": "example_flow",
"packageUuid": "9bae78c0-2fcb-11e9-9523-77f56ed58844",
"active": null,
"createdAt": "2019-02-13T20:39:41.764Z",
"updatedAt": "2019-02-13T20:42:08.746Z",
"deployment": {
"id": 1
},
"creator": {
"id": 7
},
"updater": {
"id": 7
}
}
Reference
Property Description
notes Display value for notes that you can add to describe the release
active If true, the release is the active one for the deployment.
deployment.id Internal identifier for the deployment to which the release is assigned
creator.id Internal identifier for the user who created the release
updater.id Internal identifier for the user who last updated the release
API Releases Package Get v4
Contents:
Required Permissions
Request
Response
Reference
Retrieve a package containing the definition of the flow for the specified release.
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
NOTE: This method exports flows from a Production instance, which is different from exporting using the
flows/:id/package endpoint, which exports from the Dev instance. Connection identifiers and paths may differ
between the two instances. This method is typically used for archiving flows from the Deployment
Manager.
Response body is the contents of the package. Package contents are a ZIPped version of the flow definition.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/releases/<id>/package
where:
Parameter Description
<id> Internal identifier of the release
Example:
/v4/releases/7/package
Request Body:
None.
Response
Reference
None.
API Releases Patch v4
Contents:
Required Permissions
Request
Response
Reference
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/releases/<id>
where:
Parameter Description
<id> Internal identifier of the release to modify
Example:
/v4/releases/2
NOTE: You can have only one active release per deployment. If this release is made active as part of this
execution, the currently active release is made inactive.
NOTE: For the PATCH method, only the properties that are being patched need to be submitted.
Tip: You can use this endpoint to deactivate a release, which prevents its jobs from being run. If there is
no active release for the deployment, no jobs are run via the deployment job run endpoint. See
API Deployments Run v4.
Request Body:
{
"active": true
}
Response
{
"id": 2,
"updater": {
"id": 7
},
"updatedAt": "2019-02-13T20:55:21.276Z"
}
Reference
For more information on the properties of a release, see API Releases Get v4.
API WrangledDatasets Create v4
Contents:
Required Permissions
Request
Response
Reference
Create a new wrangled dataset from the specified imported dataset or wrangled dataset. The wrangled dataset is
owned by the authenticated user.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/
Request Body:
{
"name": "POS - PROD 3",
"wrangledDataset": {
"id": 8
},
"flow": {
"id": 1
}
}
Response
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v4.
API WrangledDatasets Delete v4
Contents:
Required Permissions
Request
Response
Reference
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>
where:
Parameter Description
<id> Internal identifier of the wrangled dataset to delete
Example:
/v4/wrangledDatasets/7
Request Body:
Empty.
Response
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v4.
API WrangledDatasets Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of accessible wrangled datasets for the authenticated user.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets
/v4/wrangledDatasets?embed=flow
/v4/wrangledDatasets?embed=flow&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 7,
"wrangled": true,
"createdAt": "2018-02-06T19:47:56.146Z",
"updatedAt": "2018-02-06T19:47:56.183Z",
"recipe": {
"id": 7
},
"flow": {
"id": 1,
Reference
For more information on the properties of a wrangled dataset, see API ImportedDatasets Get v4.
For more information on the embedded flow properties, see API Flows Get v4.
API WrangledDatasets Get PrimaryInputDataset v4
Contents:
Required Permissions
Request
Response
Reference
Get the primary input dataset for the specified wrangled dataset. For a wrangled dataset, its primary input
dataset is the original dataset from which the wrangled dataset was created.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>/primaryInputDataset
where:
Parameter Description
<id> Internal identifier of the wrangled dataset
Request Body:
Empty.
Response
{
"wrangledDataset": {
"id": 11,
"wrangled": true,
"createdAt": "2018-04-24T16:12:14.018Z",
"updatedAt": "2018-04-24T17:05:06.741Z",
"referenceInfo": {
"id": 1
},
"activeSample": {
"id": 11
},
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"recipe": {
"id": 11
},
"flow": {
"id": 3
}
}
}
Reference
Imported Dataset:
For more information on these properties, see API ImportedDatasets Get v4.
Wrangled Dataset:
Property Description
referenceInfo.id Internal identifier of the object that provides input to this one.
activeSample.id Internal identifier of the sample currently associated with this recipe.
updater.id Internal identifier of the user who last modified this recipe.
For more information on the other properties, see API WrangledDatasets Get v4.
API WrangledDatasets Get v4
Contents:
Required Permissions
Request
Response
Reference
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>
/v4/wrangledDatasets/<id>?embed=flow
where:
Parameter Description
<id> Internal identifier of the wrangled dataset
Request Body:
Empty.
Response
{
"id": 6,
"wrangled": true,
"createdAt": "2018-02-06T19:43:02.791Z",
"updatedAt": "2018-02-06T19:43:02.838Z",
"recipe": {
"id": 6
},
"name": "POS-r01",
"description": null,
"referenceInfo": null,
"activeSample": {
"id": 6
},
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"flow": {
"id": 1
}
}
Reference
Wrangled Dataset:
These properties apply to the source of the wrangled dataset.
Property Description
activeSample Internal identifier of the currently active sample for this dataset
creator Internal identifier of the user who created the wrangled dataset
updater Internal identifier of the user who last updated the wrangled dataset
Embedded Flow:
For more information on the embedded flow properties, see API Flows Get v4.
API WrangledDatasets Patch v4
Contents:
Required Permissions
Request
Response
Reference
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>
where:
Parameter Description
<id> Internal identifier of the wrangled dataset to modify
Example:
/v4/wrangledDatasets/12
Request Body:
{
"name": "Wrangled DS 2",
"description": "This is my wrangled dataset #2."
}
Response
{
"id": 12,
"updater": {
"id": 1
},
"updatedAt": "2019-02-14T23:08:44.984Z"
}
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v4.
API WrangledDatasets Post AddToFlow v4
Contents:
Required Permissions
Request
Response
Reference
Add the specified wrangled dataset to a flow as a reference. A reference is a link from one flow to the output of a
wrangled dataset that is sourced from another flow.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>/addToFlow
where:
Parameter Description
<id> Internal identifier of the wrangled dataset to add to the flow
Example:
/v4/wrangledDatasets/15/addToFlow
Request Body:
{
"flow": {
"id": 27
}
}
Response
Response Body:
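The response body is not included in this excerpt. Based on the Reference table below, a response likely resembles the following sketch (all identifiers are illustrative):
{
  "id": 42,
  "wrangled": true,
  "referenceInfo": {
    "id": 5
  },
  "activeSample": {
    "id": 15
  },
  "recipe": {
    "id": 16
  },
  "flow": {
    "id": 27
  }
}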
Reference
Property Description
referenceInfo If the wrangled dataset has a reference object defined for it, its information is listed here.
activeSample Internal identifier for the currently active sample for the dataset.
recipe Internal identifier for the recipe associated with the dataset in its new flow.
If null, the dataset has not been wrangled in the Transformer page.
API WrangledDatasets Put PrimaryInputDataset v4
Update the primary input dataset for the specified wrangled dataset. Each wrangled dataset must have one and
only one primary input dataset, which can be an imported or wrangled dataset.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
This action performs a dataset swap for the source of a wrangled dataset, which can be done through the UI. See
Flow View Page.
Tip: After you have created a job via API, you can use this API to swap out the source data for the job's
dataset. In this manner, you can rapidly re-execute a pre-existing job using fresh data. See
API JobGroups Create v4.
Version: v4
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/wrangledDatasets/<id>/primaryInputDataset
where:
Parameter Description
<id> Internal identifier of the wrangled dataset
Example:
/v4/wrangledDatasets/14/primaryInputDataset
Request Body:
{
"wrangledDataset": {
"id": 13
}
}
Response
Reference
For more information on these properties, see API WrangledDatasets Get PrimaryInputDataset v4.
API WriteSettings Create v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/writesettings
Request Body:
The following creates a writesettings object that writes a new Parquet file to the designated location each time that
the job is run.
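The example request body is not included in this excerpt. The following sketch is reconstructed from the response below; the path and outputObjectId values are illustrative:
{
  "path": "hdfs://hadoop:50070/trifacta/queryResults/POS_r03.pqt",
  "action": "create",
  "format": "pqt",
  "outputObjectId": 5
}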
Response
{
"delim": ",",
"id": 7,
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS_r03.p
qt",
"action": "create",
"format": "pqt",
"compression": "none",
"header": false,
"asSingleFile": false,
"prefix": null,
"suffix": "_increment",
"hasQuotes": false,
"updatedAt": "2018-11-08T00:15:22.948Z",
"createdAt": "2018-11-08T00:15:22.948Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"outputObjectId": 5
}
Reference
For more information on the properties of a writesettings object, see API WriteSettings Get v4.
API WriteSettings Delete v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/writesettings/<id>
where:
Property Description
<id> Internal identifier of the writesettings object to delete
Request Body:
Empty.
Response
Reference
For more information on the properties of a writesettings object, see API WriteSettings Get v4.
API WriteSettings Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of accessible writesettings objects for the authenticated user.
Version: v4
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/writesettings
Response
{
"data": [
{
"delim": ",",
"id": 6,
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS_r01.pqt"
,
"action": "create",
"format": "pqt",
"compression": "none",
"header": false,
"asSingleFile": false,
"prefix": null,
"suffix": "_increment",
"hasQuotes": false,
"createdAt": "2018-11-07T23:47:15.144Z",
"updatedAt": "2018-11-07T23:47:15.144Z",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"outputObject": {
"id": 5
}
},
{
"delim": ",",
"id": 1,
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/AllFileForma
Reference
For more information on the properties of a writesettings object, see API WriteSettings Get v4.
API WriteSettings Get v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: writesettings values are required if you are running the specified job for the dataset for the
first time.
NOTE: To specify multiple outputs, you can include additional writesettings objects in the request.
For example, if you want to generate output to csv and json, you can duplicate the writesettings object
for csv and change the format value in the second one to json.
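For example, a pair of writesettings objects producing both csv and json outputs might look like the following sketch (paths are illustrative; the enclosing structure depends on the request in which the objects are embedded):
[
  {
    "path": "hdfs://hadoop:50070/trifacta/queryResults/results.csv",
    "action": "create",
    "format": "csv"
  },
  {
    "path": "hdfs://hadoop:50070/trifacta/queryResults/results.json",
    "action": "create",
    "format": "json"
  }
]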
These settings correspond to values that you can apply through the UI or through the command line interface.
For UI information, see Run Job Page.
For CLI information, see CLI for Jobs.
Version: v4
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/writesettings/<id>
/v4/writesettings/6/
Request Body:
Empty.
Response
Reference
Property Description
path (required) The fully qualified path to the output location where to write the results
action (required) If the output file or directory exists, you can specify one of the following actions to apply (for
example, create or append, as referenced below).
format (required) Output format for the results. Possible values:
csv
json
avro
pqt (Parquet)
tde (Tableau)
NOTE: To specify multiple output formats, create an additional writesettings object for each output format.
compression (optional) For csv and json results, you can optionally compress them using bzip2 or gzip compression. Default
is none.
NOTE: If compression is applied, the filename in the path value must end with the appropriate extension for
the type of compression: .gz for gzip or .bz2 for bzip2.
header (optional) For csv results with action set to create or append, this value determines if a header row with
column names is inserted at the top of the results. Default is false.
asSingleFile (optional) For csv and json results, this value determines if the results are concatenated into a single file or stored as
multiple files. Default is false.
prefix
NOTE: Do not use or modify. For internal platform use only.
suffix
NOTE: Do not use or modify. For internal platform use only.
updatedAt Timestamp for when the writesettings object was last updated
updater.id Internal identifier of the user who last updated the object
outputObject.id If specified, this value is the internal identifier of the outputobject with which this writesettings object is associated.
API WriteSettings Update v4
Contents:
Required Permissions
Request
Response
Reference
Relevant terms:
Term Description
outputobjects An outputobject is a definition of one or more types of outputs and how they are generated. It must be
associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings A writesettings object defines file-based outputs within an outputobject. Settings include path, format,
compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputobject. Settings
include the connection to use, path, table type, and write action to apply.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v4/writesettings/<id>
/v4/writesettings/6/
Request Body:
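The example request body is not included in this excerpt. As with the other update endpoints in this guide, submit only the writesettings properties to change; the value below is illustrative:
{
  "action": "append"
}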
Response
{
"id": 6,
"updater": {
"id": 1
},
"updatedAt": "2018-11-08T00:13:43.819Z"
}
Reference
For more information on the properties of the writesettings object, see API WriteSettings Get v4.
v3 Endpoints
These endpoints apply to version 3 of the APIs for the Trifacta® platform.
For more information on support for this version, see API Version Support Matrix.
Connections
Flows
/flows/package/dryRun POST Import dry run API Flows Package Post DryRun v3
/flows/:id/package/dryRun GET Export dry run API Flows Package Get DryRun v3
Miscellaneous
API Connections Create v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Connections Create v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: In this release, you cannot create SQL DW connections via the API. Please create these
connections through the application. This known issue will be fixed in a future release.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/connections
NOTE: Relational connections require the creation and installation of an encryption key file on the Trifacta
node. This file must be present before the connection is created. See Create Encryption Key File.
This example creates a SQL Server connection of basic type. A valid username/password combination must be
specified in the credentials property.
{
"name": "sqlserver",
"description": "",
"isGlobal": false,
"type": "jdbc",
"host": "sqlserver.example.com",
"port": 1433,
"vendor": "sqlserver",
"params": {
"connectStrOpts": ""
},
"ssl": false,
"credentialType": "basic",
"credentials": [
{
"username": "<username>",
"password": "<password>"
}
]
}
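A sketch of submitting this body with curl, following the style of the curl examples in this guide (host, port, and credentials are illustrative; sqlserver-connection.json is assumed to contain the body above):
curl -X POST \
  http://example.com:3005/v3/connections \
  -H 'authorization: Basic <base64-encoded-credentials>' \
  -H 'content-type: application/json' \
  -d @sqlserver-connection.json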
Property Description
isGlobal (Optional) If true, the connection is available to all users. The default is false.
type For more information on the value to insert for the connection, see Connection Types.
port Port number for the relational server. The default value varies between database vendors. For more information, please
see the documentation provided with your database distribution.
vendor For more information on the value to insert for the connection, see Connection Types.
params (Optional) Set of JSON parameters that are passed to the database when initializing the connection. Depending on the
database vendor, you may be required to submit via this parameter the name of the default database. You can also pass in
optional parameters through the connectStrOpts parameter. For more information, see CLI for Connections.
NOTE: If you connect over SSL, you must modify the hostname value to use HTTPS.
credentials (Optional) If credentialType=basic, this property must contain the username and password to use to connect to
the relational source.
{
"host": "hadoop",
"port": 10000,
"vendor": "hive",
"params": {
"jdbc": "hive2",
"connectStrOpts": "",
"defaultDatabase": ""
},
"ssl": false,
"name": "hiveAPI",
"description": "Hive conn via API",
"type": "jdbc",
"isGlobal": true,
"credentialType": "conf",
"credentialsShared": true
}
Property Description
host Set this value to hadoop to integrate with the Hive instance for the Hadoop cluster to which the Trifacta platform
is connected.
"jdbc": "hive2",
isGlobal
NOTE: For Hive connections, this value must be set to true.
{
"host": "redshift.example.net",
"port": 5439,
"vendor": "redshift",
"params": {
"connectStrOpts": "",
"defaultDatabase": "dev",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS
TRUNCATECOLUMNS"
},
"ssl": true,
"name": "redshift",
"description": "Redshiftconn",
"type": "jdbc",
"isGlobal": true,
"credentialType": "custom",
"credentialsShared": true,
"credentials": [
{"key":"user","value":"<userId>"},
{"key":"password","value":"<PWD>"},
{"key":"iamRoleArn","value":"<IAM_role_ARN>"}
]
}
Property Description
extraLoadParams The extraLoadParams value is used when you publish results to Redshift. For more information on these
values, see http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html.
credentials username and password must be specified in this key-value format, although the value for either can be an
empty string.
For more information on parameters and credentials, see Create Redshift Connections.
Response
{
"connectString": "jdbc:sqlserver://sqlserver.example.com:1433",
"id": 5,
"host": "sqlserver.example.com",
"port": 1433,
"vendor": "sqlserver",
"params": {
"connectStrOpts": ""
},
"ssl": false,
"name": "sqlserver",
"description": "",
"type": "jdbc",
"createdBy": 1,
"isGlobal": false,
"credentialType": "basic",
"createdAt": "2017-07-05T18:00:19.165Z",
"updatedAt": "2017-07-05T18:00:19.165Z",
"updatedBy": 1,
"credentials": [
{
"username": "<username>"
}
]
}
Reference
For more information on the response body properties, see API Connections Get v3.
API Connections Delete v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Connections Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/connections/<id>
where:
Parameter Description
<id> Internal identifier of the connection to delete
Example:
/v3/connections/4
Request Body:
Empty.
Response
Reference
None.
API Connections Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Connections Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/connections
/v3/connections?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
Reference
For more information on the properties of a connection, see API Connections Get v3.
API Connections Get Status v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Connections Get Status v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/connections/<id>/status
where:
Parameter Description
<id> Internal identifier of the connection
Example:
/v3/connections/10/status
Request Body:
Empty.
Response
{
"result": "SUCCESS",
"reason": null
}
Reference
Property Description
result The result of the connection status check. If the check succeeds, this value is SUCCESS.
For more information on debugging failures in relational connections, see Enable Relational Connections.
For more information on debugging Hive connections, see Configure for Hive.
For more information on debugging S3 connections, see Enable S3 Access.
reason If the result value is not SUCCESS, additional information may be included here.
API Connections Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Connections Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/connections/<id>
where:
Parameter Description
<id> Internal identifier of the connection
Example:
/v3/connections/3
Request Body:
Empty.
Response
{
"data": [
{
"connectParams": {
"vendor": "postgres",
"host": "localhost",
"port": "5432",
"database": "trifacta"
},
"id": 10,
"host": "localhost",
"port": 5432,
"vendor": "postgres",
"params": {
"connectStrOpts": "",
"database": "trifacta"
},
"ssl": false,
"name": "postgres",
"description": "",
"type": "jdbc",
"createdBy": 1,
"isGlobal": false,
"credentialType": "basic",
"credentialsShared": true,
"uuid": "7d173c90-c4e1-11e7-a768-71cd1fa636c3",
"createdAt": "2017-11-09T00:04:00.345Z",
"updatedAt": "2017-11-09T00:04:00.345Z",
"updatedBy": 1,
"credentials": [
Reference
Property Description
params This setting is populated with any parameters that are passed to the source during connection and operations.
For relational sources, this setting may include the default database.
ssl When true, the Trifacta platform uses SSL to connect to the source.
createdBy Internal identifier for the user who created the connection
isGlobal If true, the connection is available to all users.
NOTE: After a connection has been made public, it cannot be made private again. It must be deleted
and recreated.
Default is false. A connection can be made public through the command line interface or the Connections
page. See Connections Page.
credentialType The type of credentials used for the connection. This value varies depending on where the credentials are
stored. See CLI for Connections.
credentialsShared If true, the credentials used for the connection are available for use by users who have been shared the
connection.
uuid A universal object identifier, which is unique across instances of the platform.
This internal identifier is particularly useful when creating import mapping rules.
updatedBy Internal identifier for the user who last updated the connection
credentials If present, these values are the credentials used to connect to the database.
NOTE: For security reasons, you can store the connection's credentials in an external file on the Trifacta
Server, after which they do not appear in this setting. See CLI for Connections.
API Deployments Create v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Create v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/
Request Body:
{
"name": "Test Deployment"
}
Response
Reference
For more information on properties of a deployment, see API Deployments Get v3.
API Deployments Delete v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Delete v4
Contents:
Required Permissions
Request
Response
Reference
Deleting a deployment removes all releases, packages, and flows underneath it. This step cannot
be undone.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>
where:
Parameter Description
<id> Internal identifier of the deployment to delete
Example:
/v3/deployments/4
Request Body:
Empty.
Response
Reference
None.
API Deployments Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Get List v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments
/v3/deployments?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 2,
"name": "Deployment 2",
"createdAt": "2017-10-12T17:45:18.485Z",
"updatedAt": "2017-10-12T17:45:18.485Z",
"createdBy": 1,
"updatedBy": 1
},
{
"id": 1,
"name": "My First Deployment",
"createdAt": "2017-10-10T00:36:49.278Z",
"updatedAt": "2017-10-10T00:36:49.278Z",
"createdBy": 1,
"updatedBy": 1
}
],
"count": 2
}
Reference
For more information on the properties of a deployment, see API Deployments Get v3.
API Deployments Get Release List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Get Release List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of releases for the specified deployment for the authenticated user.
NOTE: Deployments and releases pertain to Production instances of the Trifacta® platform. For more
information, see Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/:id?embed=releases
/v3/deployments/:id?embed=releases&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"id": 1,
"name": "Payment10-SteveO",
"createdAt": "2017-09-26T07:00:00.000Z",
"updatedAt": "2017-10-12T23:47:56.801Z",
"createdBy": 1,
"updatedBy": 1,
"releases": [
{
"id": 2,
"notes": "Testing with a new format",
"packageUuid": "11d472a0-a799-11e7-9c5c-9dd7feba47aa",
"active": null,
"createdAt": "2017-10-02T19:07:24.311Z",
"updatedAt": "2017-10-05T12:21:46.177Z",
"deploymentId": 1,
"createdBy": 1,
"updatedBy": 1
},
{
"id": 1,
"notes": null,
"packageUuid": "6648f8c0-a9e6-11e7-a092-8394937c7038",
"active": true,
"createdAt": "2017-10-05T16:01:27.881Z",
"updatedAt": "2017-10-12T20:07:42.143Z",
"deploymentId": 1,
"createdBy": 1,
"updatedBy": 1
}
]
}
Reference
For more information on the properties of a release, see API Releases Get v3.
API Deployments Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Get v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>
where:
Parameter Description
/v3/deployments/3
Request Body:
Empty.
Response
Reference
Property Description
name Display name for the deployment. This value appears in the user interface.
createdBy Internal identifier for the user who created the deployment.
updatedBy Internal identifier for the user who last updated the deployment.
API Deployments Object Import Rules Patch v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Object Import Rules Patch v4
Contents:
Required Permissions
Request
Response
Reference
Create a list of object-based import rules for the specified deployment. Any previous rules applied to the
same object are deleted.
The generated rules apply to all flows that are imported into the deployment after the rules have been created.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
The response contains any previously created rules that have been deleted as a result of this change.
You can also make replacements in the import package based on value mappings. See
API Deployments Value Import Rules Patch v3.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>/objectImportRules
where:
Parameter Description
<id> Internal identifier of the deployment
Example:
/v3/deployments/4/objectImportRules
[{"tableName":"connections","onCondition":{"uuid":"d75255f0-a245-11e7-861
8-adc1dbb4bed0"},"withCondition":{"id":1}}]
This type of replacement applies if the imported packages contain sources that are imported through two separate
connections:
{"tableName":"connections","onCondition":{"uuid":"d75255f0-a245-11e7-8618
-adc1dbb4bed0"},"withCondition":{"id":1}},
{"tableName":"connections","onCondition":{"uuid":"d552045e0-c314-22b5-941
0-acd1bcd8eea2"},"withCondition":{"id":2}}
]
Response
The response body contains any previously created rules that have been deleted as a result of this update.
Response Body Example: All new rules, no deletions
If the update does not overwrite any previous rules, then no rules are deleted. So, the response looks like the
following:
{
"deleted": []
}
Reference
Property Description
onCondition The matching object identifier and the specified literal or pattern to match.
withCondition The identifier for the object type, as specified by the tableName value, which is being modified.
API Deployments Patch v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Patch v4
Contents:
Required Permissions
Request
Response
Reference
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>
where:
Parameter Description
<id> Internal identifier of the deployment to modify
Example:
/v3/deployments/2
Request Body:
{
"name": "New Deployment Name"
}
Response
{
"id": 2,
"updatedBy": 1,
"updatedAt": "2017-10-13T00:06:12.147Z"
}
Reference
For more information on the properties of a deployment, see API Deployments Get v3.
API Deployments Run v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Run v4
Contents:
Required Permissions
Request
Response
Reference
Run the job for the active release of the specified deployment.
At least one manual output must be specified for the main flow within the package. See Flow View Page.
An active release must be specified for the deployment. See API Releases Patch v3.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>/run
where:
Parameter Description
<id> Internal identifier of the deployment
Example:
/v3/deployments/4/run
Request Body:
Empty.
Request Body - dataset with parameters:
You can apply parameter overrides when running a deployment. Add the following structure to the request body
(a sketch appears after this table):
where:
Item Description
key The name of the parameter to override.
value The string value to assign to the parameter for the job run.
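The override structure itself is not included in this excerpt. The following is a sketch based on the key/value items described above; the parameter name, value, and exact nesting are illustrative assumptions:
{
  "overrides": {
    "runParameters": {
      "overrides": {
        "data": [
          {
            "key": "myVariable",
            "value": "myValue"
          }
        ]
      }
    }
  }
}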
Response
{
"data": [
{
"reason": "JobStarted",
"sessionId": "dd6a90e0-c353-11e7-ad4e-7f2dd2ae4621",
"id": 33,
"jobs": {
"data": [
{
"id": 68
},
{
"id": 69
},
{
"id": 70
}
]
}
}
]
}
Reference
Property Description
id JobGroup identifier. For more information, see API JobGroups Get v3.
jobs.data.id Internal identifier for the individual jobs that compose the job group being executed.
API Deployments Value Import Rules Patch v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Deployments Value Import Rules Patch v4
Contents:
Required Permissions
Request
Response
Reference
Create a list of value-based import rules for the specified deployment. Any previous rules applied to the
same values are deleted.
The generated rules apply to all flows that are imported into the Production instance after they have been created.
NOTE: Deployments pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
The response contains any previously created rules that have been deleted as a result of this change.
You can also make replacements in the import package based on object references. See
API Deployments Object Import Rules Patch v3.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>/valueImportRules
where:
Parameter Description
<id> Internal identifier of the deployment
Example:
/v3/deployments/4/valueImportRules
NOTE: The executing user of any job must have access to any data source that is remapped in the new
instance.
[{"type":"s3Bucket","on":"wrangle-dev","with":"wrangle-prod"}]
NOTE: Rules are applied in the listed order. If you are applying multiple rules to the same object in the
import package, the second rule must reference the expected changes applied by the first rule.
[
{"type":"fileLocation","on":"klamath","with":"klondike"},
{"type":"fileLocation","on":"/\/dev\//","with":"/prod/"}
]
In the above:
The first rule replaces the string klamath in the path to the source with the following value: klondike.
The second rule performs a regular expression match on the string /dev/. Since the match is described
using the regular expression syntax, the backslashes must be escaped. The replacement value is the
following literal: /prod/.
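In curl form, the rules above could be submitted as follows (placeholder host and credentials):
curl -X PATCH \
  http://example.com:3005/v3/deployments/4/valueImportRules \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '[
    {"type":"fileLocation","on":"klamath","with":"klondike"},
    {"type":"fileLocation","on":"/\/dev\//","with":"/prod/"}
  ]'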
You can specify matching values as string literals or as regular expressions (delimited by slashes), as shown above.
Response
The response body contains any previously created rules that have been deleted as a result of this update.
Response Body Example: All new rules, no deletions
If the update does not overwrite any previous rules, then no rules are deleted, and the response looks like the following:
{
"deleted": []
}
Response Body Example: Previous rules deleted
{
"deleted": [
{
"on": "wrangle-dev",
"id": 1,
"type": "s3Bucket",
"with": "wrangle-prod",
"createdBy": 3,
"updatedBy": 3,
"createdAt": "2017-11-07T02:16:57.743Z",
"updatedAt": "2017-11-07T02:16:57.743Z",
"deploymentId": 1
}
]
}
Property Description
deploymentId Internal identifier for the deployment to which to apply the import rule.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Create v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: You cannot add datasets to the flow through this endpoint. Moving pre-existing datasets into a
flow is not supported in this release. Create the flow first; then, when you create the datasets,
associate them with the flow at the time of creation.
See API ImportedDatasets Create v3.
See API WrangledDatasets Create v3.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
Request Body:
A name value is required.
{
"name": "My Flow",
"description": "This is my flow."
}
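A minimal curl sketch, assuming the endpoint is /v3/flows, consistent with the other flow endpoints in this guide (placeholder host and credentials):
curl -X POST \
  http://example.com:3005/v3/flows \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{ "name": "My Flow", "description": "This is my flow." }'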
Response
{
"id": 6,
"name": "My Flow",
"description": "This is my flow.",
"createdBy": 1,
"updatedBy": 1,
"updatedAt": "2017-02-17T17:08:57.848Z",
"createdAt": "2017-02-17T17:08:57.848Z"
}
Reference
For more information on the properties of a flow, see API Flows Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/<id>
where:
Parameter Description
/v3/flows/2
Request Body:
Empty.
Response
Reference
For more information on the properties of a flow, see API Flows Get v3.
API Flows Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
If you are not logged in or are logged in as a non-admin user, you can retrieve only your flows.
If you are logged in as an admin, you can retrieve all flows in the platform.
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows
/v3/flows?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
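As a sketch, you can page through the full list from the shell; this assumes the list is wrapped in a data array (as other list responses in this guide are), that the jq utility is available, and that the host and credentials are placeholders:
# Hypothetical pagination loop.
limit=100
offset=0
while true; do
  page=$(curl -s "http://example.com:3005/v3/flows?limit=${limit}&offset=${offset}" \
    -H 'authorization: Basic <base64-credentials>')
  count=$(echo "$page" | jq '.data | length')
  echo "$page" | jq -r '.data[].id'   # print each flow id on this page
  [ "$count" -lt "$limit" ] && break  # a short page means no more results
  offset=$((offset + limit))
done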
Request Body:
Empty.
Response
For more information on the properties of a flow, see API Flows Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/<id>
Parameter Description
/v3/flows/10
Request Body:
Empty.
Response
[
{
"id": 10,
"name": "2013 POS",
"description": null,
"deleted_at": null,
"cpProject": null,
"createdAt": "2017-11-07T17:02:34.662Z",
"updatedAt": "2017-11-07T17:02:34.662Z",
"createdBy": 1,
"updatedBy": 1,
"associatedPeople": [
{
"outputHomeDir":
"/trifacta/queryResults/[email protected]",
"name": "Administrator",
"email": "[email protected]",
"id": 1,
"flowpermission": {
"flowId": 3,
"personId": 1,
"role": "owner"
}
}
]
}
]
Property Description
updatedBy Internal identifier of the user who last updated the flow
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Package Get DryRun v4
Contents:
Required Permissions
Request
Response
Reference
Performs a dry-run of generating a flow package and exporting it, which checks all permissions required to
export the package.
Any permissions errors are reported in the response.
Version: v3
Required Permissions
Request
Endpoint:
/v3/flows/<id>/package/dryRun
Parameter Description
/v3/flows/7/package/dryRun
Request Body:
None.
Response
{}
Reference
None.
API Flows Package Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Package Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/<id>/package
Parameter Description
/v3/flows/7/package
Request Body:
None.
Response
Reference
None.
API Flows Package Post DryRun v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Package Post DryRun v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/package/dryRun
/v3/flows/package/dryRun
Request Body:
The request body must include the following key and value combination submitted as form data. This path is the
location of the ZIP package that you are importing.
key value
data "@path-to-file"
curl -X POST \
http://example.com:3005/v3/flows/package/dryRun \
-H 'authorization: Basic c29sc29uQHRyaWZhY3RhLmNvbTphZG1pbg==' \
-H 'cache-control: no-cache' \
-H 'content-type: multipart/form-data' \
-F [email protected]
Response
Reference
None.
API Flows Package Post v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Package Post v4
Contents:
Required Permissions
Request
Response
Reference
Performs an import of a flow package, which also applies any specified import rules.
Before you import, you can perform a dry-run to check for errors. See API Flows Package Post DryRun v3.
For more information on import rules, see Define Import Mapping Rules.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/package/
/v3/flows/package
key value
data "@path-to-file"
curl -X POST \
http://example.com:3005/v3/flows/package \
-H 'authorization: Basic c29sc29uQHRyaWZhY3RhLmNvbTphZG1pbg==' \
-H 'cache-control: no-cache' \
-H 'content-type: multipart/form-data' \
-F [email protected]
Response
{
"importRuleChanges": {
"object": [],
"value": []
},
"flowName": "[267f4340] 2013 POS"
}
Reference
None.
API Flows Patch v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Flows Patch v4
Contents:
Required Permissions
Request
Response
Reference
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/flows/<id>
Parameter Description
/v3/flows/6
Request Body:
You can modify the following properties.
{
"name": "My Flow",
"description": "This is my flow."
}
Response
Reference
For more information on the properties of a flow, see API Flows Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API ImportedDatasets Create v4
Contents:
Required Permissions
Request and Response
Examples by Type
File (HDFS and S3 sources)
Hive
Relational
Relational with Custom SQL Query
Reference
Create an imported dataset from an available resource. The created dataset is owned by the authenticated user.
NOTE: When an imported dataset is created via API, it is always imported as an unstructured dataset.
Any recipe that references this dataset should contain initial parsing steps required to structure the data.
NOTE: Do not create an imported dataset from a file that is being used by another imported dataset. If
you delete the newly created imported dataset, the file is removed, and the other dataset is corrupted.
Use a new file, or make a copy of the file first.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
Examples by Type
Below, you can review the basic request body for creating imported datasets for various types of sources:
File (HDFS or S3 source)
Hive
Relational
Relational with Custom SQL Query
NOTE: The path value should not include the HDFS protocol, host, or port information. You only need to
provide the path on HDFS.
{
"path":
"/trifacta/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
"type": "hdfs",
"bucket": null,
"name": "POS-r02b.txt",
"description": "POS-r02 - copy"
}
NOTE: The path value should not include the S3 protocol, host, or port information. You only need to
provide the path on S3.
{
"path":
"/trifacta/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
"type": "s3",
"bucket": "myBucket",
"name": "POS-r02b.txt",
"description": "POS-r02 - copy"
}
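A curl sketch for the S3 request body above, assuming the endpoint is /v3/importedDatasets, consistent with the neighboring sections (placeholder host and credentials):
curl -X POST \
  http://example.com:3005/v3/importedDatasets \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{
    "path": "/trifacta/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
    "type": "s3",
    "bucket": "myBucket",
    "name": "POS-r02b.txt",
    "description": "POS-r02 - copy"
  }'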
{
"id": 8,
"size": "281032",
"path":
"/trifacta/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
"isSharedWithAll": false,
"type": "hdfs",
"bucket": null,
"isSchematized": false,
"createdBy": 1,
"updatedBy": 1,
"updatedAt": "2017-02-08T18:38:56.640Z",
"createdAt": "2017-02-08T18:38:56.560Z",
"connectionId": null,
"parsingScriptId": 14,
"cpProject": null
}
Hive
{
"visible": true,
"numFlows": 0,
"size": -1,
"type": "jdbc",
"jdbcType": "TABLE",
"jdbcPath": [
"DB1"
],
"jdbcTable": "MyHiveTable",
"columns": [
"column1",
"column2"
],
"connectionId": 16,
"name": "My Hive Table"
}
Relational
{
"jdbcTable": "MyOracleTable",
"jdbcPath": [
"OracleDB_1"
],
"columns": [
"I",
"J",
"K"
],
"filter": null,
"raw": null,
"isSharedWithAll": false,
"id": 195,
"size": "65536",
"type": "jdbc",
"connectionId": 7,
"createdBy": 1,
"updatedBy": 1,
"updatedAt": "2017-02-17T18:10:48.662Z",
"createdAt": "2017-02-17T18:10:47.441Z",
"path": null,
"bucket": null,
"parsingScriptId": 372,
"cpProject": null,
"isSchematized": true
}
You can submit custom SQL queries to relational or Hive connections. These custom queries can pre-filter the
data inside the database, improving the performance of the query and reducing the size of the resulting dataset.
For more information, see Enable Custom SQL Query.
Request Body:
Notes:
See previous notes on queries to relational sources.
As part of the request body, you must submit the custom SQL query as the value for the raw property.
The following example is valid for Oracle databases. Note the escaping of the double-quote marks.
NOTE: Syntax for the custom SQL query varies between relational systems. For more information on
syntax examples, see Create Dataset with SQL.
{
"visible": true,
"numFlows": 0,
"type": "jdbc",
"jdbcType": "TABLE",
"connectionId": 7,
"raw": "SELECT INST#,BUCKET#,INST_LOB# FROM
\"AUDSYS\".\"CLI_SWP$7395268a$1$1\"",
"size": -1,
"name": "SQL Dataset 1"
}
Response Body:
In the response, note that the source of the data is defined by the connectionId value and the SQL defined in
the raw value.
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v3.
API ImportedDatasets Delete v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API ImportedDatasets Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v3/importedDatasets/<id>
where:
Property Description
/v3/importedDatasets/2
Request Body:
Empty.
Response
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v3.
API ImportedDatasets Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API ImportedDatasets Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of accessible imported datasets for the authenticated user.
Version: v3
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/importedDatasets
/v3/importedDatasets/?embed=connection
/v3/importedDatasets?embed=connection&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 15,
"size": "-1",
"path": null,
"isSharedWithAll": false,
Reference
For more information on the properties of an imported dataset, see API ImportedDatasets Get v3.
API ImportedDatasets Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API ImportedDatasets Get v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/importedDatasets/<id>
where:
Parameter Description
Use the following embedded reference to embed in the response data about the connection used to acquire the
source dataset if it was created from a Hive or relational connection.
/v3/importedDatasets/<id>?embed=connection
/v3/importedDatasets/196
Request Body:
Empty.
Response
{
"id": 196,
"size": "-1",
"path": null,
"isSharedWithAll": false,
"type": "jdbc",
"bucket": null,
"isSchematized": true,
"createdAt": "2017-02-17T19:07:12.757Z",
"updatedAt": "2017-02-17T19:09:10.117Z",
"createdBy": 1,
"updatedBy": 1,
"name": "SQL Dataset 1 – 4",
"description": null,
"connection": {
"id": 7
},
"parsingRecipe": {
"id": 378
},
"relationalSource": {
"relationalPath": null,
"columns": null,
"filter": null,
"raw": [
"SELECT INST#,BUCKET#,INST_LOB# FROM
\"AUDSYS\".\"CLI_SWP$7395268a$1$1\""
],
"id": 109,
"tableName": null,
"createdAt": "2017-02-17T19:07:12.767Z",
"updatedAt": "2017-02-17T19:07:12.767Z",
"datasourceId": 196
}
}
{
"id": 313,
"size": "35651584",
"path": null,
"isSharedWithAll": false,
"type": "jdbc",
"bucket": null,
"isSchematized": true,
"createdAt": "2017-02-22T23:33:54.400Z",
"updatedAt": "2017-02-22T23:34:18.148Z",
"createdBy": 1,
"updatedBy": 1,
"name": "TestOracleDS",
"description": null,
"connection": {
"id": 7,
"name": "Oracle",
"description": "",
"type": "jdbc",
"createdBy": 1,
"isGlobal": true,
"credentialType": "basic",
"createdAt": "2017-01-11T01:21:54.950Z",
"updatedAt": "2017-01-11T01:22:20.107Z",
"updatedBy": 1
},
"parsingRecipe": {
"id": 645
},
"relationalSource": {
"relationalPath": [
"AUDSYS"
],
"columns": [
"INST#",
"BUCKET#",
"INST_LOB#",
"MAX_SEQ#",
"FLUSH_SCN",
"FLUSH_TIME",
"MIN_SCN",
"MAX_SCN",
"MIN_TIME",
"MAX_TIME",
"SID#",
"SERIAL#",
"STATUS",
"LOG_PIECE"
],
"filter": null,
Reference
Common Properties:
The following properties are common to file-based and JDBC datasets.
Property Description
path For HDFS and S3 file sources, this value defines the path to the source.
isSharedWithAll If true, the source is shared among all users of the platform.
type Identifies the type of storage where the source is located. Values:
hdfs
s3
jdbc
bucket (If type=s3) Bucket on S3 where source is stored.
isSchematized (If source file is avro, or type=jdbc) If true, schema information is available for the source.
createdBy Internal identifier of the user who created the imported dataset
updatedBy Internal identifier of the user who last updated the imported dataset
connection Internal identifier of the connection to the server hosting the dataset.
If this value is null, the file was uploaded from a local file system.
To acquire the entire connection for this dataset, you can use either of the following endpoints:
/v3/importedDatasets?embed=connection
/v3/importedDatasets/:id?embed=connection
parsingRecipe Internal identifier of the recipe that is used to parse the imported dataset for wrangling.
relationalPath Name of the database from which the source was queried.
raw If custom SQL has been applied to the data source to filter the data before it is imported, all SQL statements are listed.
File:
File-based datasets support the common properties only.
API ImportedDatasets Post AddToFlow v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API ImportedDatasets Post AddToFlow v4
Contents:
Required Permissions
Request
Response
Reference
Add the specified imported dataset to a flow based on its internal identifier.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
where:
Parameter Description
/v3/importedDatasets/4/addToFlow
Request Body:
{
"flow": {
"id": 4
}
}
Response
Response Body:
{
"id": 14,
"createdBy": 1,
"updatedBy": 1,
"scriptId": 7,
"flowId": 4,
"wrangled": false,
"updatedAt": "2017-06-28T19:38:29.275Z",
"createdAt": "2017-06-28T19:38:29.016Z",
"flowNodeId": null,
"deleted_at": null,
"activesampleId": 15
}
Property Description
If null, the dataset has not been wrangled in the Transformer page.
activesampleId Internal identifier for the currently active sample for the dataset.
For more information on the other properties, see API ImportedDatasets Get v3.
API JobGroups Create v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Create v4
Contents:
Required Permissions
Request
Response
Reference
Create a jobGroup, which launches the specified job as the authenticated user.
The request specification depends on one of the following conditions:
Dataset has already had a job run against it and just needs to be re-run.
Dataset has not had a job run, or the job definition needs to be re-specified.
NOTE: In this release, you cannot execute jobs sourced from datasets in Redshift or SQL DW or publish
to these locations via the API. This known issue will be fixed in a future release.
Version: v3
Required Permissions
Request
Endpoint:
/v3/jobGroups
{
"wrangledDataset": {
"id": 7
}
}
{
"wrangledDataset": {
"id": 1
},
"overrides": {
"execution": "photon",
"profiler": false,
"writesettings": [
{
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/cdr_txt.csv
",
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
},
"ranfrom": "cli"
}
{
"wrangledDataset": {
"id": 1
},
"ranfrom": "cli",
"runParameters": {
"overrides": {
"data": [
{
"key": "myParamName",
"value": "override value"
}
]
}
},
"overrides": {
"execution": "photon",
"profiler": false,
"writesettings": [
{
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/cdr_txt.csv
",
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
}
}
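A minimal curl sketch that submits the simplest of the request bodies above (placeholder host and credentials):
curl -X POST \
  http://example.com:3005/v3/jobGroups \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{ "wrangledDataset": { "id": 7 } }'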
Response
Reference
Request Reference:
Property Description
wrangledDataset (required) Internal identifier for the object whose results you wish to generate. The recipes of all
preceding datasets on which this dataset depends are executed as part of the job.
runParameters.overrides Use this section to specify key-value pairs for parameter overrides to be applied during job execution.
overrides.execution (required, if first time running the job) Indicates the running environment on which the job is
executed. Accepted values:
photon
spark - Spark job on the integrated Hadoop cluster
databricksSpark - Spark implementation on Azure Databricks
For more information, see Running Environment Options.
overrides.profiler (required, if first time running the job) When set to true, a visual profile of the job is generated as
specified by the profiling options for the platform. See Profiling Options.
overrides.writesettings (required, if first time running the job) These settings define the publishing options for the job. See
below.
ranfrom (optional) If this value is set to null, then the job does not show up in the Job Details page.
writesettings Reference:
The writesettings values allow you to specify aspects of the publication of results to the specified path location.
NOTE: writesettings values are required if you are running this specified job for the dataset for the
first time.
NOTE: To specify multiple outputs, you can include additional writesettings objects in the request.
For example, if you want to generate output to csv and json, you can duplicate the writesettings
object for csv and change the format value in the second one to json (see the sketch below).
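For example, a writesettings array that generates both csv and json output might look like the following sketch; the paths are placeholder values:
"writesettings": [
  {
    "path": "hdfs://hadoop:50070/trifacta/queryResults/user/results.csv",
    "action": "create",
    "format": "csv"
  },
  {
    "path": "hdfs://hadoop:50070/trifacta/queryResults/user/results.json",
    "action": "create",
    "format": "json"
  }
]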
These settings correspond to values that you can apply through the UI or through the command line interface.
For UI information, see Run Job Page.
For CLI information, see CLI for Jobs.
path (required) The fully qualified path to the output location where the results are written
action (required) If the output file or directory exists, you can specify one of the following actions:
format (required) Output format for the results. Specify one of the following values:
csv
json
avro
pqt
NOTE: To specify multiple output formats, create an additional writesettings object for each output
format.
compression (optional) For csv and json results, you can optionally compress them using bzip2 or gzip compression.
Default is none.
header (optional) For csv results with action set to create or append, this value determines if a header row with
column names is inserted at the top of the results. Default is false.
asSingleFile (optional) For csv and json results, this value determines if the results are concatenated into a single file or stored
as multiple files. Default is false.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
/v3/jobGroups/<id>
where:
Parameter Description
/v3/jobGroups/2
Request Body:
Empty.
Response
Reference
None.
API JobGroups Get Jobs v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Get Jobs v4
Contents:
Required Permissions
Request
Response
Reference
Get list of jobs for the specified jobGroup. For more information on jobGroups, see API JobGroups Get v3.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v3/jobGroups/<id>/jobs
where:
Parameter Description
/v3/jobGroups/2/jobs
Request Body:
Empty.
Response
{
"data": [
{
"id": 5,
"writeSetting": {
"id": 3
},
"scriptResult": {
"id": 4
},
"createdAt": "2017-05-05T20:38:15.883Z",
"updatedAt": "2017-05-05T20:38:19.411Z",
"status": "Complete",
"jobType": "filewriter",
"sampleSize": 100,
"percentComplete": 100,
"cpJob": null,
"createdBy": 1,
"errorMessage": null,
"jobGroup": {
"id": 3
}
},
{
"id": 6,
Reference
Property Description
writeSetting User-visible output settings. Contents may vary depending on the type of output.
scriptResult Internal identifier for job execution. Used by other dependent jobs to identify where to write results to or to collect
results from.
executionLanguage Indicator for the running environment where the job was executed. Values:
status Current status of the job. See API JobGroups Get v3.
percentComplete Percentage of completion of the job at the time of the request. 100 means that the job has finished or failed.
For more information on the other properties, see API JobGroups Get v3.
API JobGroups Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Get List v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/jobGroups
/v3/jobGroups/?embed=jobs,wrangledDataset
/v3/jobGroups/?embed=jobs,wrangledDataset&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
For more information on the properties of a jobGroup, see API JobGroups Get v3.
API JobGroups Get Status v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Get Status v4
Contents:
Required Permissions
Request
Response
Reference
Get current status of the specified jobGroup. For more information on jobGroups, see API JobGroups Get List v3.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/jobGroups/<id>/status
where:
Parameter Description
/v3/jobGroups/2/status
Request Body:
Empty.
Response
Reference
For more information on the available status messages, see API JobGroups Get List v3.
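As a usage sketch, you can poll this endpoint from the shell until the job group reaches a terminal state; the host, credentials, and the assumption that the response text contains the status value are placeholders:
# Hypothetical polling loop.
while true; do
  status=$(curl -s http://example.com:3005/v3/jobGroups/2/status \
    -H 'authorization: Basic <base64-credentials>')
  echo "current status: $status"
  case "$status" in
    *Complete*|*Failed*|*Canceled*) break ;;  # terminal states per API JobGroups Get v3
  esac
  sleep 10
done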
API JobGroups Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Get v4
Contents:
Required Permissions
Request
Response
Reference
Get information on the specified job group. A job group is a job that is executed from a specific node in a flow.
The job group may contain:
Wrangling job on the dataset associated with the node
Jobs on all datasets on which the selected job may depend
A profiling job for the job group
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/jobGroups/<id>
where:
Parameter Description
Embed Parameter Description
wrangledDataset This field contains the internal identifier for the dataset on which the job was run.
/v3/jobGroups/<id>?embed=jobs,wrangledDataset
/v3/jobGroups/8
Request Body:
Empty.
Response
{
"id": 8,
"name": null,
"description": null,
"ranfrom": "ui",
"status": "Complete",
"profilingEnabled": true,
"createdAt": "2017-01-31T19:59:23.804Z",
"updatedAt": "2017-01-31T20:00:28.278Z",
"createdBy": 2,
"updatedBy": 2,
"wrangledDataset": {
"id": 92
},
"snapshot": {
"id": 53
},
"wrangleScript": {
"id": 60
},
"jobs": {
"data": null
}
}
Reference
Property Description
ranfrom Identifies how the job group was run:
ui - Trifacta application
cli - command line interface
status Current status of the job group:
Created - job group has been created based on the current action.
Pending - job group is queued for execution.
InProgress - job group is currently running.
Complete - job group has completed successfully.
Failed - job group has failed.
Canceled - job group was canceled by user action.
profilingEnabled When true, a profiling job was executed as part of this job group.
updatedAt Timestamp for when the job group was last updated
createdBy Internal identifier for the user who created the job group
updatedBy Internal identifier for the user who last updated the job group
wrangledDataset Internal identifier of the object from where the job group was executed.
snapshot Internal identifier of the data snapshot for the job group
wrangleScript Internal identifier of the Wrangle script to execute for the job group
jobs A list of all jobs that were launched based on this job group
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API JobGroups Put Publish v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/jobGroups/<id>/publish
where:
Parameter Description
/v3/jobGroups/2/publish
Request Body:
{
"connection": {
"id": 1
},
"path": ["default"],
"table": "test_table3",
"action": "create",
"inputFormat": "avro",
"flowNodeId": 10
}
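A curl sketch for this request (placeholder host and credentials):
curl -X PUT \
  http://example.com:3005/v3/jobGroups/2/publish \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{
    "connection": { "id": 1 },
    "path": ["default"],
    "table": "test_table3",
    "action": "create",
    "inputFormat": "avro",
    "flowNodeId": 10
  }'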
Response
{
"jobgroupId": 2,
"jobIds": [
11
],
"reason": "JobStarted",
"sessionId": "d9f13aa0-3b35-11e7-9bff-c764dff4fad8"
}
Reference
Request Reference:
Property Description
action Type of writing action to perform with the results. Supported actions:
create - Create a new table with each publication. This table is empty except for the schema, which is taken
from the results. A new table receives a timestamp extension to its name.
load - Append a pre-existing table with the results of the data. The schema of the results and the table must
match.
createAndLoad - Create a new table with each publication and load it with the results data. A new table
receives a timestamp extension to its name.
truncateAndLoad - Truncate a pre-existing table and load it with fresh data from the results.
dropAndLoad - Drop the target table and load a new table with the schema and data from the results.
Hive:
avro
pqt
Redshift:
NOTE: For results to be written to Redshift, the source must be stored in S3 and accessed through an S3
connection.
NOTE: By default, data is published to Redshift using the public schema. To publish using a different
schema, preface the table value with the name of the schema to use: MySchema.MyTable.
csv
json
avro
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API People Create v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/people/
Request Body:
{
"accept": "accept",
"password": "foo",
"password2": "foo",
"email": "[email protected]",
"name": "abc2"
}
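In curl form (placeholder host and credentials):
curl -X POST \
  http://example.com:3005/v3/people/ \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{
    "accept": "accept",
    "password": "foo",
    "password2": "foo",
    "email": "[email protected]",
    "name": "abc2"
  }'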
Response
Reference
Request properties:
Property Description
password2 This value confirms the value for password. These two property values must be identical.
For more information on the properties of a user, see API People Get v3.
API People Delete v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API People Delete v4
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v3/people/<id>
where:
Parameter Description
/v3/people/2
Request Body:
Empty.
Response
Reference
For more information on the properties of a user, see API People Get v3.
API People Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API People Get List v4
Contents:
Required Permissions
Request
Response
Reference
If you are not logged in or are logged in as a non-admin user, you can retrieve a very limited set of properties
for each user.
If you are logged in as an admin, you can retrieve the full property set for each user.
Request
Endpoint:
/v3/people
/v3/people?limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"name": "ExampleUser",
"email": "[email protected]",
"id": 955
},
{
"name": "Example User 2",
"email": "[email protected]",
"id": 888
}
]
}
{
"data": [
{
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"id": 955,
"email": "[email protected]",
"name": "ExampleUser",
"ssoPrincipal": null,
"hadoopPrincipal": "ExampleUser",
"cpPrincipal": null,
"isAdmin": false,
"isDisabled": false,
"lastLoginTime": null,
"createdAt": "2017-01-25T23:22:36.707Z",
"updatedAt": "2017-01-25T23:22:36.707Z",
"awsConfig": null
},
{
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"id": 888,
"email": "[email protected]",
"name": "Example User 2",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"cpPrincipal": null,
"isAdmin": false,
"isDisabled": false,
"lastLoginTime": null,
"createdAt": "2017-01-25T21:38:59.537Z",
"updatedAt": "2017-01-25T21:38:59.537Z",
"awsConfig": null
}
]
}
Reference
For more information on the properties of a user, see API People Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API People Get v4
Contents:
Required Permissions
Request
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/people/<id>
where:
Parameter Description
/v3/people/2
Request Body:
Empty.
Response
Reference
Property Description
outputHomeDir Home directory where the user's generated results are written
ssoPrincipal (If SSO is enabled) Principal value of the user for single-sign on
hadoopPrincipal (If secure impersonation is enabled) Hadoop principal value for the user, which determines permissions on the Hadoop
cluster
cpPrincipal (If enabled) Principal value used to integrate with cloud platform
lastLoginTime Timestamp for the last time that the user logged in
updatedAt Timestamp for when the user account was last modified
awsConfig (If AWS integration is enabled) Value contains the S3 credentials, default bucket, and any extra buckets to which the user
has access
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API People Patch v4
Contents:
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/people/<id>
where:
Parameter Description
/v3/people/2
Request Body:
{
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"email": "[email protected]",
"name": "Joe Example",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"cpPrincipal": null,
"isAdmin": false,
"isDisabled": false,
"awsConfig": null
}
Response
{
"id": 2,
"updatedAt": "2017-05-18T19:46:46.839Z"
}
Reference
For more information on these properties, see API People Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Releases Create DryRun v4
Contents:
Required Permissions
Request
Response
Reference
Performs a dry-run of creating a release for the specified deployment, which checks all permissions
required to import the package, as well as any specified import rules.
For more information on import rules, see Define Import Mapping Rules.
If they occur, errors are reported in the response.
After you have successfully completed a dry-run, you can formally create the release via API. See
API Releases Create v3.
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
/v3/deployments/<id>/releases/dryRun
where:
Parameter Description
/v3/deployments/2/releases/dryRun
Request Body:
The request body must include the following key and value combination submitted as form data:
key value
data "@path-to-file"
curl -X POST \
http://example.com:3005/v3/deployments/1/releases/dryRun \
-H 'authorization: Basic c29sc29uQHRyaWZhY3RhLmNvbTphZG1pbg==' \
-H 'cache-control: no-cache' \
-H 'content-type: multipart/form-data' \
-F [email protected]
Response
{
"importRuleChanges": {
"object": [],
"value": []
},
"flowName": "POS-r01 Flow"
}
Reference
For more information on import rule changes, see Define Import Mapping Rules.
API Releases Create v3
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/deployments/<id>/releases
where:
Parameter Description
/v3/deployments/2/releases
Request Body:
The request body must include the following key and value combination submitted as form data:
key value
data "@path-to-file"
Response
{
"importRuleChanges": {
"object": [],
"value": []
},
"flowName": "POS-r01 Flow"
}
Reference
For more information on import rule changes, see Define Import Mapping Rules.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Releases Delete v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
Required Permissions
Request
Endpoint:
/v3/releases/<id>
where:
Parameter Description
/v3/releases/2
Request Body:
Empty.
Response
Reference
None.
API Releases Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Releases Get v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/releases/<id>
where:
Parameter Description
/v3/releases/2
Request Body:
Empty.
Response
{
"id": 2,
"notes": "My second release",
"packageUuid": "b6b76bc0-a1c6-11e7-8c9d-f53cb0bb7b0a",
"active": null,
"createdAt": "2017-08-01T07:00:00.000Z",
"updatedAt": "2017-10-05T12:26:36.326Z",
"deploymentId": 1,
"createdBy": 1,
"updatedBy": 2
}
Property Description
notes Display value for notes that you can add to describe the release
active If true, the release is the active one for the deployment.
createdBy Internal identifier for the user who created the release.
updatedBy Internal identifier for the user who last updated the release.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Releases Package Get v4
Contents:
Required Permissions
Request
Response
Reference
Retrieve a package containing the definition of the flow for the specified release.
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
NOTE: This method exports flows from a Production instance, which is different from exporting using the
flows/:id/package endpoint, which exports from the Dev instance. Connection identifiers and paths may
differ between the two instances. This method is typically used for archiving flows from the Deployment
Manager.
Response body is the contents of the package. Package contents are a ZIPped version of the flow definition.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v3/releases/<id>/package
Parameter Description
/v3/releases/7/package
Request Body:
None.
Response
Reference
None.
API Releases Patch v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API Releases Patch v4
Contents:
Required Permissions
Request
Response
Reference
NOTE: Releases pertain to Production instances of the Trifacta® platform. For more information, see
Overview of Deployment Manager.
Version: v3
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/releases/<id>
where:
Parameter Description
/v3/releases/2
NOTE: You can have only one active release per deployment. If this release is made active as part of this
execution, the currently active release is made inactive.
Tip: You can use this endpoint to deactivate a release, which prevents its jobs from being run. If there is
no active release for the deployment, no jobs are run via the deployment job run endpoint. See
API Deployments Run v3.
{
"active": true
}
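A curl sketch for activating a release (placeholder host and credentials):
curl -X PATCH \
  http://example.com:3005/v3/releases/2 \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{ "active": true }'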
Response
Reference
For more information on the properties of a release, see API Releases Get v3.
API WrangledDatasets Create v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Create v4
Contents:
Required Permissions
Request
Response
Reference
Create a new wrangled dataset from the specified imported dataset or wrangled dataset. The wrangled dataset is
owned by the authenticated user.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/wrangledDatasets/
{
"name": "Copy of Wrangled Dataset 18",
"wrangledDataset": {
"id": 18
},
"flow": {
"id": 1
}
}
Response
{
"id": 23,
"flowId": 2,
"scriptId": 24,
"wrangled": true,
"createdBy": 1,
"updatedBy": 1,
"updatedAt": "2017-02-08T20:28:06.067Z",
"createdAt": "2017-02-08T20:28:06.067Z",
"flowNodeId": null,
"deleted_at": null,
"activesampleId": null,
"name": "Copy of Imported Dataset 2",
"active": true
}
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v3.
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Delete v4
Contents:
Required Permissions
Request
Response
Reference
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
where:
Parameter Description
/v3/wrangledDatasets/2
Request Body:
Empty.
Response
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v3.
API WrangledDatasets Get List v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Get List v4
Contents:
Required Permissions
Request
Response
Reference
Get the list of accessible wrangled datasets for the authenticated user.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Endpoint:
/v3/wrangledDatasets
/v3/wrangledDatasets?embed=flow
/v3/wrangledDatasets?embed=flow&limit=100&offset=2
If the count of retrieved results is less than the limit, you have reached the end of the results.
Request Body:
Empty.
Response
{
"data": [
{
"id": 35,
"wrangled": true,
"createdAt": "2017-02-03T05:16:55.844Z",
"updatedAt": "2017-02-03T05:16:56.998Z",
"createdBy": 1,
"updatedBy": 1,
"name": "base_type_map_array_record_large",
"description": null,
"activeSample": {
"id": 12
},
"flow": {
"id": 12
},
Reference
For more information on the properties of a wrangled dataset, see API WrangledDatasets Get v3.
For more information on the embedded flow properties, see API Flows Get v3.
API WrangledDatasets Get PrimaryInputDataset v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Get PrimaryInputDataset v4
Contents:
Required Permissions
Request
Response
Reference
Get the primary input dataset for the specified wrangled dataset. For a wrangled dataset, its primary input
dataset is the original dataset from which the wrangled dataset was created.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/wrangledDatasets/<id>/primaryInputDataset
where:
Parameter Description
Request Body:
Empty.
Response
{
"importedDataset": {
"id": 47,
"size": "292817",
"path":
"/trifacta/uploads/1/2a677cbe-ca19-4d47-b038-65cda938588d/POS-r01.txt",
"isSharedWithAll": false,
"type": "hdfs",
"cpProject": null,
"bucket": null,
"connectionId": null,
"isSchematized": false,
"createdAt": "2017-02-21T17:54:56.621Z",
"updatedAt": "2017-02-21T17:54:56.840Z",
"createdBy": 1,
"updatedBy": 1,
"parsingScriptId": 92
}
}
Reference
Imported Dataset:
For more information on these properties, see API ImportedDatasets Get v3.
Wrangled Dataset:
Property Description
flowNodeId Internal identifier for the node of the flow to which the dataset is attached
For more information on the other properties, see API WrangledDatasets Get v3.
API WrangledDatasets Get v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Get v4
Contents:
Required Permissions
Request
Response
Reference
Version: v3
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/wrangledDatasets/<id>
/v3/wrangledDatasets/<id>?embed=flow
where:
Parameter Description
/v3/wrangledDatasets/35/
Request Body:
Empty.
Response
{
"id": 35,
"wrangled": true,
"createdAt": "2017-02-03T05:16:55.844Z",
"updatedAt": "2017-02-03T05:16:56.998Z",
"createdBy": 1,
"updatedBy": 1,
"name": "base_type_map_array_record_large",
"description": null,
"activeSample": {
"id": 12
},
"flow": {
"id": 12,
"name": "base_type_map_array_record_large Flow",
"description": null,
"createdAt": "2017-02-03T05:16:55.684Z",
"updatedAt": "2017-02-03T05:16:55.684Z",
"createdBy": 1,
"updatedBy": 1
},
"script": {
"id": 36
}
}
Wrangled Dataset:
These properties apply to the source of the wrangled dataset.
Property Description
createdBy Internal identifier of the user who created the wrangled dataset
updatedBy Internal identifier of the user who last updated the wrangled dataset
activeSample Internal identifier of the currently active sample for this dataset
Embedded Flow:
For more information on the embedded flow properties, see API Flows Get v3.
API WrangledDatasets Put PrimaryInputDataset v3
The v3 APIs are scheduled for End of Life (EOL). You should migrate to using the latest version. For more
information, see API Version Support Matrix.
Latest version of this endpoint: API WrangledDatasets Put PrimaryInputDataset v4
Contents:
Required Permissions
Request
Response
Reference
Updates the primary input dataset for the specified wrangled dataset. Each wrangled dataset must have one and
only one primary input dataset, which can be an imported or wrangled dataset.
Tip: In the Trifacta application UI, the WrangledDataset object is called a recipe.
This action performs a dataset swap for the source of a wrangled dataset, which can be done through the UI. See
Flow View Page.
Tip: After you have created a job via API, you can use this API to swap out the source data for the job's
dataset. In this manner, you can rapidly re-execute a pre-existing job using fresh data. See
API JobGroups Create v3.
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v3/wrangledDatasets/<id>/primaryInputDataset
where:
Parameter Description
/v3/wrangledDatasets/3/primaryInputDataset
{
"importedDataset": {
"id": <id>
}
}
{
"wrangledDataset": {
"id": <id>
}
}
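A curl sketch that swaps in an imported dataset; the host, credentials, and dataset id 47 are placeholders:
curl -X PUT \
  http://example.com:3005/v3/wrangledDatasets/3/primaryInputDataset \
  -H 'authorization: Basic <base64-credentials>' \
  -H 'content-type: application/json' \
  -d '{ "importedDataset": { "id": 47 } }'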
Response
Reference
For more information on these properties, see API WrangledDatasets Get PrimaryInputDataset v3.
API Session Get
Contents:
Required Permissions
Request
Response
Reference
Required Permissions
NOTE: Each request to the Trifacta® platform must include authentication credentials. See
API Authentication.
Request
Endpoint:
/v2/session
Request Body:
Empty.
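A minimal curl sketch (placeholder host and credentials):
curl -X GET \
  http://example.com:3005/v2/session \
  -H 'authorization: Basic <base64-credentials>'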
Response
{
"outputHomeDir": "/trifacta/queryResults/[email protected]",
"id": 1,
"email": "[email protected]",
"name": "My Account",
"ssoPrincipal": null,
"hadoopPrincipal": null,
"cpPrincipal": null,
"isAdmin": true,
"isDisabled": false,
"forcePasswordChange": false,
"lastLoginTime": "2018-01-24T21:03:54.813Z",
"deleted_at": null,
"createdAt": "2018-01-24T08:29:11.248Z",
"updatedAt": "2018-01-24T21:03:54.813Z",
"awsconfigId": null,
"roles": [
{
"id": 1,
"role": "dataAdmin",
"createdAt": "2018-01-24T08:28:34.369Z",
"updatedAt": "2018-01-24T08:28:34.369Z",
"peopleworkspaces": {
"workspaceId": 1,
"personId": 1,
"roleId": 1,
"createdAt": "2018-01-24T08:29:11.360Z",
"updatedAt": "2018-01-24T08:29:11.360Z"
}
},
{
"id": 2,
"role": "wrangler",
"createdAt": "2018-01-24T08:28:34.369Z",
"updatedAt": "2018-01-24T08:28:34.369Z",
"peopleworkspaces": {
"workspaceId": 1,
"personId": 1,
"roleId": 2,
"createdAt": "2018-01-24T08:29:11.362Z",
"updatedAt": "2018-01-24T08:29:11.362Z"
}
},
{
"id": 4,
"role": "admin",
Reference
User information:
For more information on user properties, see API People Get v3.
Roles:
Property Description
dataAdmin
Wrangler
NOTE: All valid user accounts must have the Wrangler role.
createdAt Timestamp for when the role was added to the user account.
updatedAt Timestamp for when the role was last updated for the user account.
API Version: v4
Status: Active
Active Support Start Date: 2018-04-06 (Release 5.0)

API Version: v3
Status: Active
Active Support Start Date: 2017-02-27 (Release 4.0)
Active Support End / Maintenance Start Date: The next release of Trifacta Wrangler Enterprise after Release 6.0
End of Life Date: The next release of Trifacta Wrangler Enterprise after Release 6.0
API Migration to v4
Contents:
Connections
Datasets and Recipes
Flows
Flow import and export
This document describes how to migrate your existing usage of the v3 endpoints to their v4 equivalents.
NOTE: In the next release of Trifacta Wrangler Enterprise after Release 6.0, the v3 endpoints reach End of Life
(EOL) and will no longer be available in the product. You must migrate your API endpoint usage to v4.
This section contains a mapping of documentation between the publicly available v3 endpoints and their v4
equivalents.
NOTE: Except as noted, these v3 behaviors should be reflected in the v4 endpoints. Please be sure to
review the notes.
Legend:
Item Description
:id indicates that a numerical internal identifier for the relevant object must be included.
Method REST method to execute
Connections
/vX/connections (POST): API Connections Create v3 -> API Connections Create v4. Notes: changes to the request body and to the response body (connection object).
/vX/connections/:id (GET): API Connections Get v3 -> API Connections Get v4. Notes: reference content on the new version of the connection object.
Flows
/vX/flows (GET): API Flows Get List v3 -> API Flows Get List v4.
/vX/flows/:id (GET): API Flows Get v3 -> API Flows Get v4. Notes: associatedPeople parameters from v3 are no longer available.
Flow import and export
/vX/flows/package/dryRun (POST): API Flows Package Post DryRun v3 -> API Flows Package Post DryRun v4.
/vX/flows/package (POST): API Flows Package Post v3 -> API Flows Package Post v4. Notes: more parameters returned in the generated object.
/vX/flows/:id/package/dryRun (GET): API Flows Package Get DryRun v3 -> API Flows Package Get DryRun v4.
/vX/flows/:id/package (GET): API Flows Package Get v3 -> API Flows Package Get v4.
Job Groups
/vX/jobGroups (POST): API JobGroups Create v3 -> API JobGroups Create v4. Notes: the v4 version supports adding overrides for datasets with parameters through the API endpoint.
/vX/jobGroups (GET): API JobGroups Get List v3 -> API JobGroups Get List v4. Notes: returned information in the v4 version includes runtime parameter overrides that were applied.
/vX/jobGroups/:id/jobs (GET): API JobGroups Get Jobs v3 -> API JobGroups Get Jobs v4. Notes: leaner and more informative response in v4.
/vX/jobGroups/:id/status (GET): API JobGroups Get Status v3 -> API JobGroups Get Status v4. Notes: v4 documentation includes a method of acquiring the status of all jobs with a specified status (e.g. Failed).
Deployments
/vX/deployments (GET): API Deployments Get List v3 -> API Deployments Get List v4.
/vX/deployments/:id/objectImportRules (PATCH): API Deployments Object Import Rules Patch v3 -> API Deployments Object Import Rules Patch v4.
/vX/deployments/:id/valueImportRules (PATCH): API Deployments Value Import Rules Patch v3 -> API Deployments Value Import Rules Patch v4.
/vX/deployments/:id?embed=releases (GET): API Deployments Get Release List v3 -> API Deployments Get Release List v4.
/vX/deployments/:id/releases/dryRun (POST): API Releases Create DryRun v3 -> API Releases Create DryRun v4.
Users
/vX/people (POST): API People Create v3 -> API People Create v4. Notes: same set of required parameters; more available parameters through the v4 endpoint.
/vX/people (GET): API People Get List v3 -> API People Get List v4.
/vX/people/:id (PATCH): API People Patch v3 -> API People Patch v4. Notes: more editable parameters.
/vX/people/:id (GET): API People Get v3 -> API People Get v4. Notes: more parameters available in the user object.
API - UI Integrations
You can automate execution of tasks against the Trifacta platform by referencing URL destinations through your
Chrome browser.
Pre-requisites:
If the user has not authenticated with the Trifacta platform, the user is redirected to the Login page, where
they can log in before completing the UI integration.
The integration must be executed through Google Chrome.
How to:
To execute a UI integration, log in to the platform and execute the following:
[http|https]://<host>:<port>/<UI endpoint>
UI Integrations:
UI Integration - Create Dataset
Contents:
Pre-requisites
Authentication
Sources of Data
Step-by-Step Guide
Using the following URL endpoint, you can create a dataset from another application through the Trifacta
application.
NOTE: This integration is not supported in the Wrangler Enterprise desktop application.
Pre-requisites
If you are calling from a source application, you must be logged into that application first. See
Authentication below.
You must authenticate with the Trifacta platform before you are redirected to the target destination. See
API - UI Integrations.
This URL integration is supported on HDFS and S3 datastores.
It is assumed that there are no conflicting datasets with the names that are used to create the dataset in
this set of steps. No name validation is performed as part of this action.
Authentication
NOTE: Before using any UI integration, you must first log in to the application. If you are not logged in, you
are redirected to the login page, where you can input your credentials before reaching your target URL.
In addition to authentication with the Trifacta platform, the authenticated user must also have the appropriate
permissions to access the assets on the datastore. This includes:
Permissions to access the folder or directory
Appropriate impersonated user configured for the account, if secure impersonation is enabled.
If this dataset is going to be executed later via command line interface, you must create the dataset with
the same user that will execute the job.
For more information, see S3 Browser.
Sources of Data
You can use this integration to create datasets from single files or a single directory. Below are some example
URLs for sources from Hadoop HDFS or S3:
Datastore | Source | Example URL | Notes
HDFS | Directory | hdfs:///user/warehouse/campaign_data/ | User can choose the file through the UI to use for the dataset.
HDFS | File | hdfs:///user/warehouse/campaign_data/d000001_01.csv | User can complete the steps through the UI to create the dataset.
S3 | Directory | s3:///3fad-demo/data/biosci/source/ | User can choose the file through the UI to use for the dataset.
S3 | File | s3:///3fad-demo/data/biosci/source/1-DRUG15Q1.txt | User can complete the steps through the UI to create the dataset.
NOTE: The above results assume that the user has the appropriate permissions to access the file or
directory. If the user lacks permissions, an HTTP 404 error is displayed.
Step-by-Step Guide
Steps:
1. Acquire the target URL for the datastore through the Trifacta® application or through the datastore itself.
Examples URLs:
a. HDFS (file):
hdfs:///user/warehouse/campaign_data/d000001_01.csv
b. S3 (directory):
s3:///3fad-demo/data/biosci/source/
2. Navigate the browser to the appropriate URL in the Trifacta platform. The following example applies to the
HDFS file example from above. It must be preceded by the base URL for the platform. For more
information, see API - UI Integrations.
<base_url>/import/data?uri=hdfs:///user/warehouse/campaign_data/d000001_01.csv
In the following example, the flowId is 31, and the dataset id is 186:
http://latest-dev.trifacta.net:3005/flows/31#dataset=186
The flowId is consistent across all datasets that you imported through the above steps.
6. You can open the datasets and wrangle them as needed.
You can run jobs on the dataset through the following interfaces:
UI: See Run Job Page.
API: See API JobGroups Create v3.
CLI: See CLI for Jobs.
API Workflows
In this section, you can review examples of how to execute workflows using one or more of the available API
endpoints.
Topics:
API Workflow - Develop a Flow
API Workflow - Deploy a Flow
API Workflow - Run Job on Dataset with Parameters
API Workflow - Publish Results
API Workflow - Manage Outputs
API Workflow - Swap Datasets
Contents:
Overview
Example Datasets
Step - Create Containing Flow
Step - Create Datasets
Step - Wrangle Data
Step - Run Job
Step - Monitoring Your Job
Step - Re-run Job
This example walks through the process of creating, identifying, and executing a job through automated methods.
For this example, these tasks are accomplished using the following methods:
NOTE: This API workflow applies to a Development instance of the Trifacta® platform, which is the
default platform instance type. For more information on Development and Production instances, see
Overview of Deployment Manager.
1. Locate or create flow. The datasets that you wrangle must be contained within a flow. You can add them
to an existing flow or create a new one through the APIs.
2. Create dataset. Through the APIs, you create an imported dataset from an asset that is accessible through
one of the established connections. Then, you create the recipe object through the API.
a. For the recipe, you must retrieve the internal identifier.
b. Through the application, you modify the recipe for the dataset.
3. Automate job execution. Using the APIs, you can automate execution of the wrangling of the dataset.
a. As needed, this job can be re-executed on a periodic basis or whenever the source files are
updated.
Example Datasets
In this example, you are attempting to wrangle monthly point of sale (POS) data from three separate regions into a
single dataset for the state. This monthly data must be enhanced with information about the products and stores
in the state. So, the example has a combination of transactional and reference data, which must be brought
together into a single dataset.
Tip: To facilitate re-execution of this job each month, the transactional data should be stored in a
dedicated directory. This directory can be overwritten with next month's data using the same filenames.
As long as the new files are structured in an identical manner to the original ones, the new month's data
can be processed by re-running the API aspects of this workflow.
Example Files:
The following files are stored on your HDFS deployment:
NOTE: The reference and transactional data are stored in separate directories. In this case, you can
assume that the user has read access through his Trifacta account to these directories, although this
access must be enabled and configured for real use cases.
Base URL:
For purposes of this example, the base URL for the Trifacta platform is the following:
http://www.example.com:3005
To begin, you must locate a flow or create a flow through the APIs to contain the datasets that you are importing.
NOTE: You cannot add datasets to the flow through the flows endpoint. Moving pre-existing datasets
into a flow is not supported in this release. Create or locate the flow first and then when you create the
datasets, associate them with the flow at the time of creation.
See API ImportedDatasets Create v4.
See API WrangledDatasets Create v4.
Locate:
NOTE: If you know the display name value for the flow and are confident that it is not shared with any
other flows, you can use the APIs to retrieve the flowId. See API Flows Get List v4.
Endpoint http://www.example.com:3005/v4/flows
Authentication Required
Method POST
Request Body
{
"name": "Point of Sale - 2013",
"description": "Point of Sale data for state"
}
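A cURL sketch of this request, following the conventions of the other cURL examples in this guide (host and token are placeholders):
curl -X POST \
http://www.example.com:3005/v4/flows \
-H 'authorization: Basic <auth_token>' \
-H 'content-type: application/json' \
-d '{
"name": "Point of Sale - 2013",
"description": "Point of Sale data for state"
}'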
2. The response should be status code 201 - Created with a response body like the following:
Checkpoint: You have identified or created the flow to contain your dataset or datasets.
Endpoint http://www.example.com:3005/v4/importedDataset
Authentication Required
Method POST
3. You should receive a 201 - Created response with a response body similar to the following:
{
"id": 8,
"size": "281032",
"path": "/user/pos/POS-r01.txt",
"dynamicPath": null,
"type": "hdfs",
"bucket": null,
"isSchematized": true,
"isDynamic": false,
"disableTypeInference": false,
"updatedAt": "2017-02-08T18:38:56.640Z",
"createdAt": "2017-02-08T18:38:56.560Z",
"parsingScriptId": {
"id": 14
},
"runParameters": {
"data": []
},
"name": "POS-r01.txt",
"description": "POS-r01.txt",
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"connection": null
}
4. You must retain the id value so you can reference it when you create the recipe.
5. See API ImportedDatasets Create v4.
6. Next, you create the recipe. Construct the following request:
Endpoint http://www.example.com:3005/v4/wrangledDatasets
Authentication Required
Method POST
7. You should receive a 201 - Created response with a response body similar to the following:
{
"id": 23,
"wrangled": true,
"updatedAt": "2018-02-06T19:59:22.735Z",
"createdAt": "2018-02-06T19:59:22.698Z",
"name": "POS-r01",
"active": true,
"referenceInfo": null,
"activeSample": {
"id": 23
},
"creator": {
"id": 1
},
"updater": {
"id": 1
},
"recipe": {
"id": 23
},
"flow": {
"id": 10
}
}
8. From the recipe, you must retain the value for the id. For more information, see
API WrangledDatasets Create v4.
9. Repeat the above steps for each of the source files that you are adding to your flow.
Checkpoint: You have created a flow with multiple imported datasets and recipes.
After you have created the flow with all of your source datasets, you can wrangle the base dataset to integrate all
of the sources into it.
Steps for Transactional data:
NOTE: When you join or union one dataset into another, changes made in the joined
dataset are automatically propagated to the dataset where it has been joined.
You can repeat the above general process to integrate the reference data for stores.
Checkpoint: You have created a flow with multiple datasets and have integrated all of the relevant data
into a single dataset.
Through the APIs, you can specify and run a job. In the above example, you must run the job for the terminal
dataset, which is POS-r01 in this case. This dataset contains references to all of the other datasets. When the job
is run, the recipes for the other datasets are also applied to the terminal dataset, which ensures that the output
reflects the proper integration of these other datasets into POS-r01.
Steps:
1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example,
this identifier was 23.
2. Construct a request using the following:
Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body:
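The request body itself is not reproduced here. Based on the description in the next step and the writesettings structure used elsewhere in this guide, it would resemble the following sketch; the output path is illustrative only:
{
"wrangledDataset": {
"id": 23
},
"overrides": {
"execution": "photon",
"profiler": true,
"writesettings": [
{
"path": "<path-to-output>/POS-r01.csv",
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
}
}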
3. In the above example, the specified job has been launched for recipe 23 to execute on the Photon running
environment with profiling enabled.
a. Output format is CSV to the designated path. For more information on these properties, see
API JobGroups Create v4.
b. Output is written as a new file with no overwriting of previous files.
4. A response code of 201 - Created is returned. The response body should look like the following:
{
"reason": "JobStarted",
"sessionId": "9c2c6220-ef2d-11e6-b644-6dbff703bdfc",
"id": 3
}
You can monitor the status of your job through the following endpoint:
Endpoint http://www.example.com:3005/v4/jobgroup/<id>/status
Authentication Required
Method GET
Request Body None.
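A cURL sketch of this status check, using the jobGroup id (3) returned above (token placeholder assumed):
curl -X GET \
http://www.example.com:3005/v4/jobGroups/3/status \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache'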
When the job has successfully completed, the returned status message is the following:
In the future, you can re-run the job exactly as you specified it by executing the following call:
Tip: You can swap imported datasets before re-running the job. For example, if you have uploaded a new
file, you can change the primary input dataset for the dataset and then use the following API call to re-run
the job as specified. See API WrangledDatasets Put PrimaryInputDataset v4.
Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 23
}
}
Contents:
Overview
Pre-requisites
Workflow
Step - Get Flow Id
Step - Export a Flow
Step - Create Deployment
Step - Create Connection
Step - Create Import Rules
Step - Import Release
Step - Activate Release
Step - Run Deployment
Step - Iterate
Step - Set up Production Schedule
Pre-requisites
Finished flow: This example assumes that you have finished development of a flow with the following
characteristics:
Single dataset imported from a table through a Redshift connection
Single JSON output
Separate Dev and Prod instances: Although it is possible to deploy flows to the same instance in which they are
developed, this example assumes that you are deploying from a Dev instance to a completely separate Prod
instance. The following implications apply:
Separate user accounts to access Dev (User1) and Prod (Admin2) instances.
Tip: You should do all of your recipe development and testing in Dev/Test. Avoid making changes
in a Prod environment.
NOTE: Although these are separate user accounts, the assumption is that the same admin-level
user is using these accounts through the APIs.
New connections must be created in the Prod instance to access the production version of the database
table.
Workflow
In this example, your environment contains separate Dev and Prod instances, each of which has a different set of
users.
Tip: Dev environment work can be done through the UI, which may
be easier.
Example Flow:
The first general step is for the Dev user (User1) to get the flowId and export the flow from the Dev instance.
Steps:
Tip: If it's easier, you can gather the flowId from the user interface in Flow View. In the following example,
the flowId is 21:
http://www.wrangle-dev.example.com:3005/flows/21
1. Through the APIs, you can retrieve the list of flows using the following call:
Endpoint http://www.wrangle-dev.example.com:3005/v4/flows
Authentication Required
Method GET
Request Body None.
2. The response should be status code 200 - OK with a response body like the following:
Tip: This step may be easier to do through the UI in the Dev instance.
Steps:
1. Export flowId=21:
Endpoint http://www.wrangle-dev.example.com:3005/v4/flows/21/package
Authentication Required
Method GET
Request Body None.
2. The response should be status code 200 - OK. The response body is the flow itself.
3. Download and save this file to your local desktop. Let's assume that the filename you choose is
flow-WrangleOrders.zip.
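A cURL sketch of this export, saving the package to the local filename chosen above (token placeholder assumed):
curl -X GET \
http://www.wrangle-dev.example.com:3005/v4/flows/21/package \
-H 'authorization: Basic <auth_token>' \
-o flow-WrangleOrders.zip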
For more information, see API Flows Package Get v4.
In the Prod environment, you can create the deployment from which you can manage the new flow. Note that the
following information has changed for this environment:
userId: Admin2
baseURL: http://www.wrangle-prod.example.com:3005
Steps:
1. Through the APIs, you can create a deployment using the following call:
Endpoint http://www.wrangle-prod.example.com:3005/v4/deployments
Authentication Required
NOTE: Username and password credentials must be submitted for the Admin2 account.
Method POST
Request Body
{
"name": "Production Orders"
}
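A cURL sketch of this request; the token placeholder stands for the Admin2 account's credentials:
curl -X POST \
http://www.wrangle-prod.example.com:3005/v4/deployments \
-H 'authorization: Basic <admin2_auth_token>' \
-H 'content-type: application/json' \
-d '{
"name": "Production Orders"
}'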
2. The response should be status code 201 - Created with a response body like the following:
When a flow is exported, its connections are not included in the export. Before you import the flow into a new
environment:
Connections must be created or recreated in the Prod environment. In some cases, you may need to point
to production versions of the data contained in completely different databases.
Rules must be created to remap the connection to use in the imported flow.
This section and the next one step through these processes.
Steps:
1. From the Dev environment, you collect the connection information for the flow:
Endpoint http://www.wrangle-dev.example.com:3005/v4/connections
Authentication Required
NOTE: Username and password credentials must be submitted for the User1 account.
Method GET
Request Body None.
2. The response should be status code 200 - OK with a response body like the following:
{
"data": [
{
"connectParams": {
"vendor": "redshift",
"vendorName": "redshift",
"host": "dev-redshift.example.com",
"port": "5439"
},
"id": 9,
Endpoint http://www.wrangle-prod.example.com:3005/v4/connections
Authentication Required
NOTE: Username and password credentials must be submitted for the Admin2 account.
Method POST
Request Body
{
"connectParams": {
"vendor": "redshift",
"vendorName": "redshift",
"host": "prod-redshift.example.com",
"port": "5439"
},
"host": "prod-redshift.example.com",
"port": 5439,
"vendor": "redshift",
"params": {
"connectStrOpts": "",
"defaultDatabase": "prodWrangleDB",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL
TRIMBLANKS TRUNCATECOLUMNS"
},
"vendorName": "redshift",
"name": "Redshift Conn Prod",
"description": "",
"isGlobal": true,
"type": "jdbc",
"ssl": false,
"credentialType": "custom",
"credentials": [
{
"username": "prodDBUser",
"password": "<password>"
}
]
}
5. The response should be status code 201 - Created with a response body like the following:
6. When you hit the /v4/connections endpoint again, you can retrieve the connectionId for this
connection. In this case, let's assume that the connectionId value is 12.
Now that you have defined the connection to use to acquire the production data from within the production
environment, you must create an import rule to remap from the Dev connection to the Prod connection within the
flow definition. This rule is applied during the import process to ensure that the flow is working after it has been
imported.
In this case, you must remap the uuid value for the Dev connection, which is written into the flow definition, with
the connection Id value from the Prod instance.
For more information on import rules, see Define Import Mapping Rules.
Steps:
1. From the Dev environment, you collect the connection information for the flow:
Endpoint http://www.wrangle-dev.example.com:3005/v4/connections
Authentication Required
NOTE: Username and password credentials must be submitted for the User1 account.
Method GET
Request Body None.
2. The response should be status code 200 - OK with a response body like the following:
{
"data": [
{
"connectParams": {
"vendor": "redshift",
"vendorName": "redshift",
"host": "dev-redshift.example.com",
"port": "5439"
},
"id": 9,
"host": "dev-redshift.example.com",
"port": 5439,
"vendor": "redshift",
"params": {
"connectStrOpts": "",
"defaultDatabase": "devWrangleDB",
"extraLoadParams": "BLANKSASNULL EMPTYASNULL
TRIMBLANKS TRUNCATECOLUMNS"
},
"ssl": false,
"vendorName": "redshift",
"name": "Dev Redshift Conn",
"description": "",
"type": "jdbc",
"isGlobal": true,
"credentialType": "custom",
3. From the above information, you retain the following, which uniquely identifies the connection object,
regardless of the instance to which it belongs:
"uuid": "b8014610-ce56-11e7-9739-27deec2c3249",
4. Against the Prod environment, you now create an import mapping rule:
Endpoint http://www.wrangle-prod.example.com:3005/v4/deployments/3/objectImportRules
Authentication Required
Method PATCH
Request Body:
[{"tableName":"connections","onCondition":{"uuid":
"b8014610-ce56-11e7-9739-27deec2c3249"},"withCondition":{"id":12}}]
5. The response should be status code 200 - OK with a response body like the following:
{
"deleted": []
}
Since the method is a PATCH, you are updating the rule set that applies to all imports for this deployment.
In this case, there were no pre-existing rules, so the response indicates that nothing was deleted. If another
set of import rules is submitted later, the rule you just created is deleted.
See API Deployments Object Import Rules Patch v4.
See API Deployments Value Import Rules Patch v4.
You are now ready to import the package into the release.
Steps:
1. Against the Prod environment, you now import the package:
Endpoint http://www.wrangle-prod.example.com:3005/v4/deployments/3/releases
Authentication Required
Method POST
Request Body (form-data):
key: data
value: "@path-to-flow-WrangleOrders.zip"
2. The response should be status code 201 - Created with a response body like the following:
{ "importRuleChanges": {
"object": [{"tableName":"connections","onCondition":{"uuid":
"b8014610-ce56-11e7-9739-27deec2c3249"},"withCondition":{"id":12}}],
"value": []
},
"flowName": "Wrangle Orders"
}
When a package is imported into a release, the release is automatically set as the active release for the
deployment. If at some point in the future, you need to change the active release, you can use the following
endpoint to do so.
Steps:
1. Against the Prod environment, use the following endpoint:
Endpoint http://www.wrangle-prod.example.com:3005/v4/releases/5
Authentication Required
Method PATCH
Request Body
{
"active": true
}
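A cURL sketch of this activation:
curl -X PATCH \
http://www.wrangle-prod.example.com:3005/v4/releases/5 \
-H 'authorization: Basic <admin2_auth_token>' \
-H 'content-type: application/json' \
-d '{
"active": true
}'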
2. The response should be status code 200 - OK with a response body like the following:
{
"id": 3,
"updater": {
"id": 3
},
"updatedAt": "2017-11-28T00:06:12.147Z"
}
You can now execute a test run of the deployment to verify that the job executes properly.
NOTE: When you run a deployment, you run the primary flow in the active release for that deployment.
Running the flow generates the output objects for all recipes in the flow.
NOTE: For datasets with parameters, you can apply parameter overrides through the request body
through the following API call. For more information, see API Deployments Run v4.
Steps:
1. Against the Prod environment, use the following endpoint:
Endpoint http://www.wrangle-prod.example.com:3005/v4/deployments/3/run
Authentication Required
Method POST
Request Body None.
2. The response should be status code 201 - Created with a response body like the following:
{
"data": [
{
"reason": "JobStarted",
"sessionId": "dd6a90e0-c353-11e7-ad4e-7f2dd2ae4621",
"id": 33
}
]
}
Step - Iterate
If you need to make changes to fix issues related to running the job:
Recipe changes should be made in the Dev environment and then passed through export and import of the
flow into the Prod deployment.
Connection issues:
Check Flow View in the Prod instance to see if there are any red dots on the objects in the package.
If so, your import rules need to be fixed.
Verify that you can import data through the connection.
Output problems could be related to permissions on the target location.
When you are satisfied with how the production version of your flow is working, you can set up periodic schedules
using a third-party tool to execute the job on a regular basis.
Contents:
Overview
Basic Workflow
Example Datasets
Step - Create Containing Flow
Step - Create Datasets with Parameters
Example 1 - Dataset with Datetime parameter
Example 2 - Dataset with Variable
Example 3 - Dataset with pattern parameter
Step - Wrangle Data
Step - Run Job
Example 1 - Dataset with Datetime parameter
Example 2 - Dataset with Variable
Example 3 - Dataset with pattern parameter
Step - Monitoring Your Job
Step - Re-run Job
Overview
This example workflow describes how to run jobs on datasets with parameters through the Trifacta® platform. A
dataset with parameters is a dataset in which some part of the path to the data objects has been parameterized.
Since one or more of the parts of the path can vary, you can build a dataset with parameters to capture data that
spans multiple files. For example, datasets with parameters can be used to parameterize serialized data by region,
date, or other variable.
NOTE: This API workflow only works with version 4 (v4) or later of the APIs.
Basic Workflow
The basic method by which you build and run a job for a dataset with parameters is very similar to the method
for a non-parameterized dataset, with a few notable exceptions. The steps in this workflow follow the same
sequence as the standard workflow. Where the steps overlap, links have been provided to the non-parameterized
workflow. For more information, see API Workflow - Develop a Flow.
Example Datasets
This example covers three different datasets, each of which features a different type of dataset with parameters.
1. Datetime parameter: In this example, a directory is used to store daily orders transactions. This dataset must be
defined with a Datetime parameter to capture the preceding 7 days of data. Jobs can be configured to process all
of this data as it appears in the directory.
2. Variable: This dataset segments data into four timezones across the US. These timezones are defined using the
following text values in the path: pacific, mountain, central, and eastern. In this case, you can create a
parameter called region, which can be overridden at runtime to be set to one of these four values during job
execution.
3. Pattern: This dataset is stored as a set of sequentially numbered POS files, which are matched using a Trifacta
pattern. See the pattern example later in this section.
You must create the flow to host your dataset with parameters.
In the response, you must capture and retain the flow identifier.
For more information, see API Workflow - Develop a Flow.
NOTE: When you import a dataset with parameters, only the first matching dataset is used for the initial
file. If you want to see data from other matching files, you must collect a new sample within the
Transformer page.
MyFiles/1/Datetime/2018-04-06-orders.csv
MyFiles/1/Datetime/2018-04-05-orders.csv
MyFiles/1/Datetime/2018-04-04-orders.csv
MyFiles/1/Datetime/2018-04-03-orders.csv
MyFiles/1/Datetime/2018-04-02-orders.csv
MyFiles/1/Datetime/2018-04-01-orders.csv
MyFiles/1/Datetime/2018-03-31-orders.csv
When you navigate to the directory through the application, you mouse over one of these files and select
Parameterize.
In the window, select the date value (e.g. YYYY-MM-DD) and then click the Datetime icon.
Datetime Parameter:
Format: YYYY-MM-DD
Date Range: Date is last 7 days.
Click Save.
The Datetime parameter should match with all files in the directory. Import this dataset and wrangle it.
After you wrangle the dataset, return to Flow View and select the recipe. You should be able to extract the flowId
and recipeId values from the URL.
For purposes of this example, here are some key values:
flowId: 35
recipeId: 127
When you navigate to the directory through the application, you mouse over one of these files and select
Parameterize.
In the window, select the region value, which could be one of the following depending on the file: eastern,
central, mountain, or pacific. Click the Variable icon.
Variable Parameter:
Name: region
Default Value: Set this default to pacific.
Click Save.
In this case, the variable only matches one value in the directory. However, when you apply runtime overrides to
the region variable, you can set it to any value.
MyFiles/1/pattern/POS-r01.csv
MyFiles/1/pattern/POS-r02.csv
MyFiles/1/pattern/POS-r03.csv
When you navigate to the directory through the application, you mouse over one of these files and select
Parameterize.
In the window, select the two numeric digits (e.g. 02). Click the Pattern icon.
Pattern Parameter:
Type: Trifacta pattern
Matching regular expression: {digit}{2}
Click Save.
In this case, the Trifacta pattern should match any sequence of two digits in a row. In the above example, this
expression matches: 01, 02, and 03, all of the files in the directory.
Checkpoint: You have created flows for each type of dataset with parameters.
After you have created your dataset with parameter, you can wrangle it through the application. For more
information, see Transformer Page.
Below, you can review the API calls to run a job for each type of dataset with parameters, including relevant
information about overrides.
In the following example, a job is created for the dataset with the Datetime parameter. The date value
2018-04-03 appears only in the output path defined in the writesettings.
NOTE: You cannot apply runtime parameter overrides to these types of datasets with parameters.
1. Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 127
},
"overrides": {
"execution": "photon",
"profiler": true,
"writesettings": [
{
"path":
"MyFiles/queryResults/[email protected]/2018-04-03-orders.csv",
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
},
"runParameters": {}
}
2. A response code of 201 - Created is returned. The response body should look like the following:
{
"reason": "JobStarted",
"sessionId": "5b883530-3920-11e8-a37a-db6dae3c6e43",
"id": 29
}
In the following example, the region variable has been overwritten with the value central to execute the job
on orders-central.csv:
1. Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
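The request body is not reproduced here. Based on the description in the next step and the override structure shown in the re-run example at the end of this section, it would resemble the following sketch; the output path is illustrative only:
{
"wrangledDataset": {
"id": 123
},
"overrides": {
"execution": "photon",
"profiler": true,
"writesettings": [
{
"path": "MyFiles/queryResults/[email protected]/orders-central.csv",
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
},
"runParameters": {
"overrides": {
"data": [{
"key": "region",
"value": "central"
}]
}
}
}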
2. In the above example, the job has been launched for recipe 123 to execute on the Photon running
environment with profiling enabled.
a. Output format is CSV to the designated path. For more information on these properties, see
API JobGroups Create v4.
b. Output is written as a new file with no overwriting of previous files.
3. A response code of 201 - Created is returned. The response body should look like the following:
{
"reason": "JobStarted",
"sessionId": "aa0f9f00-391f-11e8-a37a-db6dae3c6e43",
"id": 27
}
In the following example, the value 02 has been inserted into the pattern to execute the job on POS-r02.csv:
1. Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 121
},
"overrides": {
"execution": "photon",
"profiler": false,
"writesettings": [
{
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS-r02.
"action": "create",
"format": "csv",
"compression": "none",
"header": false,
"asSingleFile": false
}
]
},
"runParameters": {}
}
2. In the above example, the job has been launched for recipe 121 to execute on the Photon running
environment with profiling disabled.
a. Output format is CSV to the designated path. For more information on these properties, see
API JobGroups Create v4.
b. Output is written as a new file with no overwriting of previous files.
3. A response code of 201 - Created is returned. The response body should look like the following:
{
"reason": "JobStarted",
"sessionId": "16424a60-3920-11e8-a37a-db6dae3c6e43",
"id": 28
}
After the job has been created and you have captured the jobGroup Id, you can use it to monitor the status of your
job. For more information, see API JobGroups Get Status v4.
If you need to re-run the job as specified, you can use the wrangledDataset identifier to re-run the most recent job.
Tip: When you re-run a job, you can change any variable values as part of the request.
Example request:
Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 123
},
"runParameters": {
"overrides": {
"data": [{
"key": "region",
"value": "central"
}]
}
}
}
Contents:
Overview
Basic Workflow
Step - Create Connections
Step - Run Job
Step - Publish Results to Hive
Step - Publish Results to Redshift
Step - Publish Results to Tableau Server
Step - Publish Results to SQL DW
Overview
After you have run a job to generate results, you can publish those results to different targets as needed. This
section describes how to automate those publishing steps through the APIs.
NOTE: This workflow applies to re-publishing job results after you have already generated them.
In the application, you can publish after generating results. See Publishing Dialog.
Basic Workflow
1. Create connections to each target to which you wish to publish. Connections must support write
operations.
2. Specify a job whose output meets the requirements for the target.
3. Run the job.
4. When the job completes, publish the results to the target(s).
For each target, you must have access to create a connection to it. After a connection is created, it can be reused,
so you may find it easier to create them through the application.
Other connections must be created through the application. Links to instructions are provided below.
NOTE: Connections created through the application must be created through the Connections page,
which is used for creating read/write connections. Do not create these connections through the Import
Data page. See Connections Page.
Target: Redshift
Output format: Avro
Example connectionId: 2
Create via API: N
Documentation: Create Redshift Connections
Notes: Requires S3 set as the base storage layer. See Set Base Storage Layer.
Before you publish results to a different datastore, you must generate results and store them in HDFS.
NOTE: To produce some output formats, you must run the job on your Hadoop cluster.
Identifier | Value
jobId | 2
flowId | 3
For more information on running a job, see API JobGroups Create v4.
For more information on the publishing endpoint, see API JobGroups Put Publish v4.
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table
table in the default Hive schema through connectionId=1.
NOTE: To publish to Hive, the targeted database is predefined in the connection object. For the path value
in the request body, you must specify the schema in this database to use. Schema information is not
available through the API. To explore the available schemas, click the Hive icon in the Import Data page. The
schemas are the first level of listed objects. For more information, see Import Data Page.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication Required
Method PUT
Request Body
{
"connection": {
"id": 1
},
"path": ["default"],
"table": "test_table",
"action": "create",
"inputFormat": "avro",
"flowNodeId": 10
}
Response:
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table2
table in the public Redshift schema through connectionId=2.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication Required
Method PUT
Request Body
{
"connection": {
"id": 2
},
"path": ["public"],
"table": "test_table2",
"action": "create",
"inputFormat": "avro",
"flowNodeId": 10
}
Response:
The following uses the TDE results from the specified job (jobId = 2) to publish the results to the test_table3
table in the default Tableau Server database through connectionId=3.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication Required
Method PUT
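The request body is not reproduced in this extract. By analogy with the other publishing targets in this section, it would resemble the following sketch; all values other than the connection id, table name, path, and input format described above are illustrative:
{
"connection": {
"id": 3
},
"path": ["default"],
"table": "test_table3",
"action": "create",
"inputFormat": "tde",
"flowNodeId": 10
}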
Response:
The following uses the Parquet results from the specified job (jobId = 2) to publish the results to the
test_table4 table in the dbo SQL DW database through connectionId=4.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication Required
Method PUT
Request Body
{
"connection": {
"id": 4
},
"path": ["dbo"],
"table": "test_table4",
"action": "createAndLoad",
"inputFormat": "pqt",
"flowNodeId": 10
}
Response:
Contents:
Overview
Basic Workflow
Step - Get Recipe ID
Step - Create OutputObject
Step - Run a Test Job
Step - Create WriteSettings Object
Step - Get Connection ID for Publication
Step - Create a Publication
Overview
Through the APIs, you can separately manage the outputs associated with an individual recipe. This workflow
describes how to create output objects, which are associated with your recipe, and how to publish those outputs
to different datastores in varying formats. You can continue to modify the output objects and their related write
settings and publications independently of managing the wrangling process. Whenever you need new results, you
can reference the wrangled dataset with which your outputs have been associated, and the job is executed and
published in the appropriate manner to your targets.
Relevant terms:
outputobjects: An outputobject is a definition of one or more types of outputs and how they are generated. It must
be associated with a recipe.
NOTE: An outputobject must be created for a recipe before you can run a job on it. One and only one
outputobject can be associated with a recipe.
writesettings: A writesettings object defines file-based outputs within an outputobject. Settings include path,
format, compression, and delimiters.
publications: A publications object is used to specify a table-based output and is associated with an outputobject.
Settings include the connection to use, path, table type, and write action to apply.
NOTE: If you need to make changes for purposes of a specific job run, you can add overrides to the
request for the job. These overrides apply only for the current job. For more information, see
API JobGroups Create v4.
1. Get the internal identifier for the recipe for which you are building outputs.
2. Create the outputobject for the recipe.
3. Create a writesettings object and associate it with the outputobject.
4. Run a test job, if desired.
5. For any publication, get the internal identifier for the connection to use.
6. Create a publication object and associate it with the outputobject.
7. Run your job.
NOTE: In the APIs, a recipe is identified by its internal name, a wrangled dataset.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/wrangleddatasets
Authentication Required
Method GET
Request Body None.
cURL example:
curl -X GET \
http://www.wrangle-dev.example.com:3005/v4/wrangleddatasets \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache'
Checkpoint: In the above, let's assume that the recipe identifier of interest is wrangledDataset=11.
This means that the flow where it is hosted is flow.id=4. Retain this information for later.
The following example includes an embedded writesettings object, which generates an Avro file output. You
can remove this embedded object if desired, but you must create a writesettings object before you can
generate an output.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/outputobjects
Authentication Required
Method POST
cURL example:
curl -X POST \
http://www.wrangle-dev.example.com/v4/outputobjects \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{
"execution": "photon",
"profiler": true,
"isAdhoc": true,
"writeSettings": {
"data": [
{
"delim": ",",
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS_01.avro"
,
"action": "create",
"format": "avro",
"compression": "none",
"header": false,
"asSingleFile": false,
"prefix": null,
"suffix": "_increment",
"hasQuotes": false
}
]
},
"flowNode": {
"id": 11
}
}'
Checkpoint: You've created an outputobject (id=4) and an embedded writesettings object and have
associated them with the appropriate recipe flowNodeId=11. You can now run a job for this recipe
generating the specified output.
Now that outputs have been defined for the recipe, you can just execute a job on the specified recipe
flowNodeId=11:
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 11
}
}
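A cURL sketch of this job request (token placeholder assumed):
curl -X POST \
http://www.wrangle-dev.example.com:3005/v4/jobGroups \
-H 'authorization: Basic <auth_token>' \
-H 'content-type: application/json' \
-d '{
"wrangledDataset": {
"id": 11
}
}'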
Response:
NOTE: To re-run the job against its currently specified outputs, writesettings, and publications, you only
need the recipe ID. If you need to make changes for purposes of a specific job run, you can add overrides
to the request for the job. These overrides apply only for the current job. For more information, see
API JobGroups Create v4.
Endpoint http://www.wrangle-dev.example.com:3005/v4/writesettings/
Authentication Required
Method POST
Request Body
{
"delim": ",",
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS_r03.pqt",
"action": "create",
"format": "pqt",
"compression": "none",
"header": false,
"asSingleFile": false,
"prefix": null,
"suffix": "_increment",
"hasQuotes": false,
"outputObjectId": 4
}
Response:
cURL example:
curl -X POST \
http://www.wrangle-dev.example.com/v4/writesettings \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{ "delim": ",",
"path":
"hdfs://hadoop:50070/trifacta/queryResults/[email protected]/POS_r03.pqt"
,
"action": "create",
"format": "pqt",
"compression": "none",
"header": false,
"asSingleFile": false,
"prefix": null,
"suffix": "_increment",
"hasQuotes": false,
"outputObject": {
"id": 4
}
}'
Checkpoint: You've added a new writesettings object and associated it with your outputobject (id=4).
When you run the job again, the Parquet output is also generated.
To generate a publication, you must identify the connection through which you are publishing the results.
Below, the request returns a single connection to Hive (id=1).
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/connections
Authentication Required
Method GET
Request Body None.
Response:
cURL example:
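The example was not captured in this extract; it would follow the same pattern as the earlier connections request (token placeholder assumed):
curl -X GET \
http://www.wrangle-dev.example.com:3005/v4/connections \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache'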
You can create publications that publish table-based outputs through specified connections. In the following, a
Hive table is written out to the default database through connectionId = 1. This publication is associated with
the outputObject id=4.
Request:
Endpoint http://www.wrangle-dev.example.com:3005/v4/publications
Authentication Required
Method POST
Request Body
{
"path": [
"default"
],
"tableName": "myPublishedHiveTable",
"targetType": "hive",
"action": "create",
"outputObject": {
"id": 4
},
"connection": {
"id": 1
}
}
cURL example:
curl -X POST \
http://latest-dev.trifacta.net:3005/v4/publications \
-H 'authorization: Basic <auth_token>' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{
"path": [
"default"
],
"tableName": "myPublishedHiveTable",
"targetType": "hive",
"action": "create",
"outputObject": {
"id": 4
},
"connection": {
"id": 1
}
}'
Contents:
Overview
Example Datasets
Assumptions
Step - Import Dataset
Step - Swap Dataset from Recipe
Step - Rerun Job
Step - Monitor Your Job
Step - Schedule Your Job
Overview
After you have created a flow, imported a dataset, and created a recipe for that dataset, you may need to swap in
a different dataset and run the recipe against that one. This workflow steps through that process via the APIs.
NOTE: If you are processing multiple parallel datasources in a single job, you should create a dataset
with parameters and then run the job. For more information, see
API Workflow - Run Job on Dataset with Parameters.
Example Datasets
In this example, you are wrangling data from orders placed in different regions on a quarterly basis. When a new
file drops, you want to be able to swap out the current dataset that is assigned to the recipe and swap in the new
one. Then, run the job.
Example Files:
The following files are stored on your HDFS deployment:
Assumptions
You have already created a flow, which contains the following imported dataset and recipe:
NOTE: When an imported dataset is created via API, it is always imported as an unstructured dataset.
Any recipe that references this dataset should contain initial parsing steps required to structure the data.
Tip: Through the UI, you can import one of your datasets as unstructured. Create a recipe for this dataset
and then edit it. In the Recipe panel, you should be able to see the structuring steps. Back in Flow View,
you can chain your structural recipe off of this one. Dataset swapping should happen on the first recipe.
Object | Name | Id
flow | MyCo-Orders-Quarter | 2
job | n/a | 3
Base URL:
For purposes of this example, the base URL for the Trifacta platform is the following:
http://www.example.com:3005
NOTE: You cannot add datasets to the flow through the flows endpoint. Moving pre-existing datasets
into a flow is not supported in this release. Create or locate the flow first and then when you create the
datasets, associate them with the flow at the time of creation.
See API ImportedDatasets Create v4.
See API WrangledDatasets Create v4.
The following steps describe how to create an imported dataset and assign it to the flow that has already been
created (flowId=2).
Steps:
1. To create an imported dataset, you must acquire the following information about the source.
a. path
b. type
c. name
d. description
e. bucket (if a file stored on S3)
2. In this example, the file you are importing is MyCo-orders-west-Q2.txt. Since the files are similar in
nature and are stored in the same directory, you can acquire this information by gathering the information
from the imported dataset that is already part of the flow. Execute the following:
Endpoint http://www.example.com:3005/v4/importedDatasets
Authentication Required
Method POST
Request Body
{
"path": "/user/orders/MyCo-orders-west-Q2.txt",
"type": "hdfs",
"bucket": null,
"name": "MyCo-orders-west-Q2.txt",
"description": "MyCo-orders-west-Q2"
}
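A cURL sketch of this import (token placeholder assumed):
curl -X POST \
http://www.example.com:3005/v4/importedDatasets \
-H 'authorization: Basic <auth_token>' \
-H 'content-type: application/json' \
-d '{
"path": "/user/orders/MyCo-orders-west-Q2.txt",
"type": "hdfs",
"bucket": null,
"name": "MyCo-orders-west-Q2.txt",
"description": "MyCo-orders-west-Q2"
}'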
3. The response should be a 201 - Created status code with something like the following:
4. You must retain the id value so you can reference it when you create the recipe.
5. See API ImportedDatasets Create v4.
Checkpoint: You have imported a dataset that is unstructured and is not associated with any flow.
The next step is to swap the primary input dataset for the recipe to point at the newly imported dataset. This step
automatically adds the imported dataset to the flow and drops the previous imported dataset from the flow.
1. Use the following to swap the primary input dataset for the recipe:
Endpoint http://www.example.com:3005/v4/wrangledDatasets/9/primaryInputDataset
Authentication Required
Method PUT
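The request body is not reproduced in this extract. It references the imported dataset to swap in; assuming the create call in the previous step returned id=10 (an illustrative value), it would resemble:
{
"importedDataset": {
"id": 10
}
}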
2. The response should be a 200 - OK status code with something like the following:
3. The new imported dataset is now the primary input for the recipe, and the old imported dataset has been
removed from the flow.
To execute a job on this recipe, you can simply re-run any job that was executed on the old imported dataset,
since you reference the job by jobId and wrangledDataset (recipe) Id.
Endpoint http://www.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
"wrangledDataset": {
"id": 9
}
}
After the job has been queued, you can track it to completion. See API Workflow - Develop a Flow.
When you are satisfied with how your flow is working, you can set up periodic schedules using a third-party tool to
execute the job on a regular basis.
The tool must hit the above endpoints to swap in the new dataset and run the job.