Understand Complex Data Flows and Add a New Feature in Java Spring.
Posted 06 May 2024
- java
In this chat transcript, the user wants to add a new feature to Waltz – a complex Java Spring application. Waltz is a large, open source project that is currently used by Fortune 500 financial institutions.
The data type of a physical flow in the context of the provided code and project, "Waltz," refers to the type of data that is being transferred or communicated through a physical flow within the system. Physical flows represent the movement or transfer of data from one system component to another, and data types classify this data into specific categories for better organization, management, and understanding of how information is structured and utilized throughout the application.
Determination of Data Type for a Physical Flow
In the provided code snippets, several key pieces help us understand how data types for physical flows are determined, managed, and utilized:
- Physical Flow and Data Type Association: In the file waltz-data/src/main/java/org/finos/waltz/data/physical_flow/PhysicalFlowDao.java (lines 488-498), a query associates physical flows with data types through the PHYSICAL_SPEC_DATA_TYPE table. This association is crucial for understanding what kind of data a physical flow carries.
- Data Type Selection in UI: The Svelte component DataTypeSelectionStep.svelte (lines 96-130) within the waltz-ng/client/physical-flows/svelte directory provides a UI for users to select data types from a tree structure. When users select data types, these selections are added to the physical specification and, by extension, to the logical flow.
- Storing and Querying Data Types: Queries and functions across the PhysicalFlowDao and other parts of the Waltz application navigate the relationships between physical flows, their specifications, and associated data types. The design links data types to physical flows explicitly through their specifications, which makes it possible to see what data each flow transfers.
dsl.select(PHYSICAL_SPECIFICATION.ID, DATA_TYPE.ID, DATA_TYPE.NAME)
...
.innerJoin(DATA_TYPE).on(PHYSICAL_SPEC_DATA_TYPE.DATA_TYPE_ID.eq(DATA_TYPE.ID))
...
<DataTypeTreeSelector multiSelect={true}
on:select={onSelectDataType}
...
A physical flow's data type categorizes the type of data the flow carries or transfers between parts of the system. This categorization is realized through the system's database structure and is reflected in the application's UI, where users can manage and understand these data types. Through this framework, "Waltz" enables precise control and visibility over data flows within the application, aiding in data governance, architecture, and system analysis efforts.
The relationship between physical_flow, physical_specification, and physical_spec_data_type within Waltz is aimed at tracking and managing detailed information about how data moves within an organization, from a more granular perspective. Here's an explanation based on the provided code snippets and design documents:
- Physical Flow (physical_flow): Represents an actual instance of data moving from a source to a destination within the organization. This could encapsulate a variety of transport mechanisms, data formats, and frequencies of data exchange. Physical flows are linked to logical flows, providing a detailed implementation view of the abstract pathways defined by logical flows.
- Physical Specification (physical_specification): Defines the technical characteristics of how data is formatted and transported in a physical flow. It includes details such as the data format, transport method, and any schemas necessary for understanding the data being exchanged. Physical specifications provide the technical blueprint which physical flows implement.
- Physical Spec Data Type (physical_spec_data_type): Associates a data type with a physical specification. This association specifies what kind of data (defined by data types) a physical specification can carry or process. It allows for the management of metadata about the data flowing through physical flows, including the ability to track data lineage, govern data quality, and manage data lifecycle.
Relationship Dynamics
[ Logical Flow ] --- has many ---> [ Physical Flow ]
                                         |
                                         '--- realized by ---> [ Physical Specification ]
                                                                        |
                                                                        '--- has datatypes ---> [ Physical Spec Data Type ]
This structured approach facilitates comprehensive data flow management, from abstract data movement policies to concrete data exchange implementations, governed by well-defined data types.
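To make the relationship concrete, here is a minimal in-memory sketch in Java. The record names and fields are hypothetical stand-ins for the tables discussed above, not Waltz's generated classes, and the real schema has many more columns. It shows only the key point: data types attach to the specification, so a physical flow reaches them through its specification id.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified stand-ins for the Waltz tables.
record LogicalFlow(long id, String source, String target) {}
record PhysicalSpecification(long id, String name, String format) {}
record PhysicalFlow(long id, long logicalFlowId, long specificationId) {}
record PhysicalSpecDataType(long specificationId, long dataTypeId) {}

final class FlowModel {
    // A flow has no data types of its own; it inherits them from the
    // specification it points at.
    static List<Long> dataTypeIdsForFlow(PhysicalFlow flow,
                                         List<PhysicalSpecDataType> links) {
        return links.stream()
                .filter(link -> link.specificationId() == flow.specificationId())
                .map(PhysicalSpecDataType::dataTypeId)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Two physical flows that share a specification would therefore report the same data types, which is worth keeping in mind when reading export results.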
The export request for physical flows in the Waltz application involves a series of steps, combining data from various tables and utilizing conditions to filter and present the relevant information. The primary handling is done within the PhysicalFlowExtractor.java class, with specific emphasis on creating queries to fetch the necessary data.
Export Handling Process
- Initiating the Export Request
  - The export request is likely initiated through an HTTP endpoint, as indicated by the sequence diagram, where a POST request is made to the /data-extract/physical-flows/all/APPLICATION/ route.
- Preparing Queries
  - prepareAllFlowsQuery(EntityReference ref) creates a union query of both produced and consumed physical flows relative to a given entity, identified by ref.
  - It relies on two helper methods, prepareProducesQuery and prepareConsumesQuery, which construct conditions based on ownership and participation (source or target) in the logical and physical flows.
- Constructing Condition-based Queries
  - The getQuery(Condition condition) method constructs the core query and applies the supplied condition to filter physical flows.
  - The query joins PHYSICAL_FLOW, PHYSICAL_SPECIFICATION, and LOGICAL_FLOW, plus aliased ENUM_VALUE tables that supply the criticality, transport, frequency, and data format kind values.
- Fetching EnumValue Attributes
  - The ENUM_VALUE table is aliased and joined once per attribute (e.g., criticality, frequency) to obtain the display name of each attribute for the export.
Involved Tables
- PHYSICAL_FLOW and PHYSICAL_SPECIFICATION tables are primary sources, storing details about each physical flow and its specifications respectively.
- LOGICAL_FLOW table is involved to relate physical flows to their logical counterparts.
- ENUM_VALUE table is queried multiple times with different aliases (e.g., criticality_value, transport_value, etc.) to resolve the enumerations for flow characteristics like transport kind, frequency, and criticality.
The export process depends heavily on conditional querying and on the repeated joins to ENUM_VALUE, which enrich the export data with meaningful labels instead of raw IDs or keys. Retrieving, filtering, and joining the data in this sequence ensures that the exported data is both comprehensive and contextually rich.
This code architecture and process outline how Waltz handles export requests for physical flows, showcasing the application's robust backend capabilities to process and present data from its complex schema.
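The repeated ENUM_VALUE joins can be pictured with a small in-memory sketch. This is an illustration only: the class, the type names, and the keys below are hypothetical, and the real lookup happens in SQL. The idea it demonstrates is that a label is identified by the pair (type, key), and that a missing match yields no label rather than dropping the row, as a left join would.

```java
import java.util.HashMap;
import java.util.Map;

final class EnumLabels {
    // Keyed by "TYPE/KEY", mirroring the (type, key) pair used in each join.
    private final Map<String, String> displayNames = new HashMap<>();

    void register(String type, String key, String displayName) {
        displayNames.put(type + "/" + key, displayName);
    }

    // Returns null when nothing matches, like an unmatched left join column.
    String labelFor(String type, String key) {
        return displayNames.get(type + "/" + key);
    }
}
```

Including the type in the lookup matters because the same key could exist under different enum kinds; matching on the key alone could return the wrong display name.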
To address the issue described, we need to modify the getQuery method to include data type information by joining the PHYSICAL_SPEC_DATA_TYPE and PHYSICAL_SPECIFICATION tables, ensuring we do not alter the existing functionality of the physical flow export. Here is the enhanced method with the required additions:
waltz-data/src/main/java/org/finos/waltz/data/physical_flow/PhysicalFlowDao.java
Enhanced getQuery Method:
private SelectConditionStep<Record> getQuery(Condition condition) {
    // One ENUM_VALUE alias per attribute so each can be joined independently
    EnumValue criticalityValue = ENUM_VALUE.as("criticality_value");
    EnumValue transportValue = ENUM_VALUE.as("transport_value");
    EnumValue frequencyValue = ENUM_VALUE.as("frequency_value");
    EnumValue dataFormatKindValue = ENUM_VALUE.as("data_format_kind_value");
    // Assuming there's a suitable alias for PHYSICAL_SPEC_DATA_TYPE table
    Table<?> physicalSpecDataType = PHYSICAL_SPEC_DATA_TYPE.as("physical_spec_data_type");

    return dsl
            .select(
                    PHYSICAL_FLOW.ID,
                    PHYSICAL_SPECIFICATION.NAME.as("Name"),
                    PHYSICAL_FLOW.EXTERNAL_ID.as("External Id"))
            .select(SOURCE_AND_TARGET_NAME_AND_ASSET_CODE)
            .select(
                    dataFormatKindValue.DISPLAY_NAME.as("Format"),
                    transportValue.DISPLAY_NAME.as("Transport"),
                    frequencyValue.DISPLAY_NAME.as("Frequency"),
                    criticalityValue.DISPLAY_NAME.as("Criticality"),
                    PHYSICAL_SPECIFICATION.DESCRIPTION.as("Description"),
                    // Added to include data type information
                    physicalSpecDataType.field("data_type_id").as("DataTypeId"))
            .from(PHYSICAL_SPECIFICATION)
            .innerJoin(PHYSICAL_FLOW)
            .on(PHYSICAL_FLOW.SPECIFICATION_ID.eq(PHYSICAL_SPECIFICATION.ID))
            .innerJoin(LOGICAL_FLOW)
            .on(LOGICAL_FLOW.ID.eq(PHYSICAL_FLOW.LOGICAL_FLOW_ID))
            // Join with PHYSICAL_SPEC_DATA_TYPE to fetch data type information;
            // a left join keeps flows whose specification has no data types
            .leftJoin(physicalSpecDataType)
            .on(PHYSICAL_SPECIFICATION.ID.eq(physicalSpecDataType.field("specification_id", Long.class)))
            .leftJoin(criticalityValue).on(PHYSICAL_FLOW.CRITICALITY.eq(criticalityValue.KEY)
                    .and(criticalityValue.TYPE.eq(EnumValueKind.PHYSICAL_FLOW_CRITICALITY.dbValue())))
            .leftJoin(transportValue).on(PHYSICAL_FLOW.TRANSPORT.eq(transportValue.KEY)
                    .and(transportValue.TYPE.eq(EnumValueKind.TRANSPORT_KIND.dbValue())))
            .leftJoin(frequencyValue).on(PHYSICAL_FLOW.FREQUENCY.eq(frequencyValue.KEY)
                    .and(frequencyValue.TYPE.eq(EnumValueKind.FREQUENCY.dbValue())))
            .leftJoin(dataFormatKindValue).on(PHYSICAL_SPECIFICATION.FORMAT.eq(dataFormatKindValue.KEY)
                    .and(dataFormatKindValue.TYPE.eq(EnumValueKind.DATA_FORMAT_KIND.dbValue())))
            .where(condition);
}
Note: The snippet assumes you have access to a PHYSICAL_SPEC_DATA_TYPE alias or table within this context, which is used to fetch data_type_id. You may need to adjust the field names and join conditions based on the actual schema of your database and how the PHYSICAL_SPEC_DATA_TYPE table relates to the PHYSICAL_SPECIFICATION table.
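This change adds PHYSICAL_SPEC_DATA_TYPE to the chain of joins already present to extract data type information (data_type_id) associated with each physical specification, and subsequently, with each physical flow. Because a specification can carry several data types, the left join returns one row per flow and data type, so a flow with multiple data types will appear on multiple rows of the export. Make sure to replace "physical_spec_data_type" and "data_type_id" with the correct identifiers if they differ in your implementation.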
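A left join on PHYSICAL_SPEC_DATA_TYPE produces one result row per flow and data type. If the export should keep one row per flow, one option is to collapse those rows after fetching. The sketch below is a hypothetical post-processing step, not part of the Waltz code base. It assumes each fetched row has been reduced to a flow id and a data type name (the names would require an additional join to DATA_TYPE), and the Row class and the "; " separator are illustrative choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DataTypeColumn {
    // Hypothetical projection of one joined result row.
    static final class Row {
        final long flowId;
        final String dataTypeName; // null when the left join found no data type

        Row(long flowId, String dataTypeName) {
            this.flowId = flowId;
            this.dataTypeName = dataTypeName;
        }
    }

    // Groups rows by flow id (keeping first-seen order) and joins the names.
    static Map<Long, String> collapse(List<Row> rows) {
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> names = grouped.computeIfAbsent(row.flowId, id -> new ArrayList<>());
            if (row.dataTypeName != null) {
                names.add(row.dataTypeName);
            }
        }
        Map<Long, String> result = new LinkedHashMap<>();
        grouped.forEach((flowId, names) -> result.put(flowId, String.join("; ", names)));
        return result;
    }
}
```

A flow whose specification has no data types still appears in the result, with an empty string, which matches the intent of using a left join rather than an inner join. Aggregating in SQL is another route, but the available aggregate functions depend on the database in use.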