Case study: Defining depth through parameterization in DataStage

Enterprise · Product and UX · Web

Paul Hwang
4 min read · Jul 17, 2021

DataStage is an AI-powered data integration service that lives within the Cloud Pak for Data platform. Our team has been working on modernizing this product, translating the legacy version into something more fitting for current customers. One key feature of DataStage is the ability to parameterize any property: a user can substitute a variable for the value of any property within a stage, allowing them to expedite their workflow with preset values and stay flexible with those values at job runtime.
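To make the idea concrete, here is a minimal sketch of how a parameterized property could resolve at runtime. The #name# token convention, property names, and values below are my own illustration, not DataStage's actual implementation.

```python
import re

# Hypothetical sketch: a stage property holds a #name# token at design
# time, and the token is swapped for a concrete value at job runtime.
TOKEN = re.compile(r"#(\w+)#")

def resolve(properties: dict, params: dict) -> dict:
    """Replace #name# tokens in string property values with runtime values."""
    def substitute(value):
        if isinstance(value, str):
            return TOKEN.sub(lambda m: str(params[m.group(1)]), value)
        return value
    return {key: substitute(value) for key, value in properties.items()}

# A stage property parameterized at design time...
stage_properties = {"hostname": "#db_host#", "port": "#db_port#"}

# ...takes on concrete values when the job runs.
print(resolve(stage_properties, {"db_host": "prod.example.com", "db_port": 50000}))
# {'hostname': 'prod.example.com', 'port': '50000'}
```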

When design picked up this project, it came with an initial set of mockups created with development resources alone. Most of the user needs had already been thoroughly defined by project management and backed up with plenty of documentation. This gave design a good springboard to go straight into the iteration phase.

The design work done on parameters in DataStage would set the paradigm for how this feature is handled across all products within IBM.

👥 Persona and use cases

Daniel, a data engineer, is responsible for transforming data into a more useful format for analysis. He works with databases (Amazon, DB2, Microsoft SQL Server, etc.) and transforms their data in DataStage. He uses parameters to put data where other personas (Data Stewards, Data Scientists, and Business Analysts) can find it, so that they can easily plug in their own data. In general, his pain points include correcting pipelines he didn't design, being the person many depend on to fix failures, and being unable to put data where other personas can use it. He wants templating and automation to do his job more quickly.

There were two goals: modernize the legacy product and rethink the experience to be intuitive to new users.

Within the product, parameters appear in three places:

  • On the DataStage canvas, where users create flows
  • In Watson projects, where they can be managed
  • In jobs, where new values can be assigned before runtime when scheduling runs

(For the sake of this case study, I will cover the flows for using parameters on the canvas and in jobs.)

From the canvas

This feature was designed and implemented with the canvas view first. From the canvas, parameters can be defined on the fly from the properties panel, within each input field. They can also be defined up front from the flow parameters tearsheet, before the flow is built.

Typically, a user would go to the field they wanted to parameterize, click the icon that appears, go into the tearsheet, and define or select the parameters they wanted to assign.

Alongside this, we implemented a secondary solution for power users: they can manually type a parameter into the field without opening the tearsheet, aided by a look-ahead enhancement (sketched below).
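A rough sketch of what that look-ahead could do under the hood: as the user types a token by hand, suggest the flow parameters that match the typed prefix. The parameter names here are made up for illustration.

```python
# Hypothetical: parameters already defined on the flow.
flow_parameters = ["db_host", "db_port", "db_user", "output_path"]

def lookahead(typed: str, defined=flow_parameters) -> list:
    """Suggest defined parameters whose names start with the typed prefix."""
    prefix = typed.lstrip("#").lower()
    return [name for name in defined if name.lower().startswith(prefix)]

print(lookahead("#db"))  # ['db_host', 'db_port', 'db_user']
```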

Setting runtime parameters while scheduling jobs

After designing a flow, users can run it with the default values, or with values they manually set on the canvas. However, a key use case for our persona is the ability to automate these processes. Within Cloud Pak for Data, users can schedule jobs to run on their own, using sets of values created ahead of time for these jobs.
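Conceptually, a scheduled run resolves its values by layering a chosen value set over the flow's defaults. This is a sketch under my own assumptions; the set name and values are invented, not DataStage's actual data model.

```python
# Hypothetical defaults defined on the flow, and a named value set
# created ahead of time for scheduled runs.
defaults = {"db_host": "dev.example.com", "db_port": "50000"}
value_sets = {"nightly-prod": {"db_host": "prod.example.com"}}

def runtime_values(defaults: dict, value_set_name: str) -> dict:
    """Defaults apply unless the chosen value set overrides them."""
    return {**defaults, **value_sets.get(value_set_name, {})}

print(runtime_values(defaults, "nightly-prod"))
# {'db_host': 'prod.example.com', 'db_port': '50000'}
```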
