PIG Validation

This document describes the concept of the PIG App, which is used to validate the standard in preparation.

CASCaDE is a project to standardize collaboration in systems engineering with respect to data format and ontology. A Request for Proposal (RFP) was accepted by the OMG in December 2024. Information in different formats and from diverse sources is transformed and integrated into a common knowledge graph.

A publicly available reference implementation shall validate the concepts of the standard as developed by the CASCaDE submission team. Validation is successful if real-world data is ingested and the information needs of all users in the product lifecycle are met. Users and software vendors are given the opportunity to influence the project to ensure their ideas are taken on board. A joint effort on fundamental features (where differentiation isn't possible anyway) avoids duplicate work, improves quality and assures interoperability.

This model is authored with Archi using the ArchiMate 3.2 notation, then transformed to SpecIF via the ArchiMate Open Exchange file format, and finally transformed to HTML. The PIG App will support a similar workflow (among others) for the chosen source data formats.

Model Diagrams

▣  Validation Process

A process to validate the conceptual and technical choices made by the standard in preparation. It is preliminary and needs further discussion and detailing with the target users. Ultimately, the standard must satisfy their use-cases and needs in general.

Subject to validation:

  • Has the PIG metamodel enough expressive power to deal with real-world data?
  • Can the data be transformed between RDF/Turtle, JSON-LD and GQL without loss? In all directions and in a full round-trip?
  • Are the transformation results consistent, complete, comprehensible and finally useful?
  • Can the users' 'competency questions' be satisfactorily answered?

[Figure: Validation Workflow]

Model Elements (Glossary)

□  Apply Competency Question

Once transformed to RDF/Turtle, still according to the PIG metamodel, user-defined competency questions shall be applied to the test data. These queries shall validate that the graph fulfills the information needs of the various user roles accessing the data. An important criterion is that the same (i.e. standard) queries yield the desired results with different data sets; only then is normalization with respect to syntax and semantics successful. It is expected, however, that the competency questions, i.e. the queries, depend on the ontology with the current set of preferred terms.

○  Business Use-Case

A set of use-cases describing the users' needs. Must be exemplary (concrete), relevant and representative.

□  CASCaDE Validation Process

□  Check Schema, Consistency and Completeness

Once transformed according to the PIG metamodel, the data shall be checked with respect to schema (shape, format), consistency (constraints) and completeness. Both formally and by expert inspection.

□  Create Test-Data

Create test-data in the original format as produced by popular authoring systems of the domain, for example IBM DOORS for requirements management or Cameo for systems engineering.

Initially, test-data should be small and cover relevant and typical aspects of the domain to drive and validate the development of transformations. Later on, real or near-real project data 'from the field' with growing complexity should be supplied.

□  Export Plugin

Transforms a selected subgraph from the internal data format according to the PIG metamodel to a desired output format.

The following transformations are planned for validating the approach (the standard to submit):

□  File System

A network file system. Options to consider:

  • Windows/macOS/Linux NFS
  • Samba
  • Git

□  Import Plugin

Transforms a source data format to the internal data format according to the PIG metamodel. A full transformation of the source data may not be necessary for the given use-cases, so emphasis is put on the relevant entities and relationships of the source data.

Transformations may use:

  • XSLT in case of XML source data
  • Scripting
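
The scripting variant can be sketched with the Python standard library alone: a small, hypothetical XML export of an authoring tool is mapped onto (subject, predicate, object) triples in the spirit of the PIG metamodel. The XML shape and vocabulary are invented; a real import would follow the tool's actual export schema:

```python
# Scripted import: map relevant XML entities onto triples,
# ignoring everything not needed for the given use-cases.
import xml.etree.ElementTree as ET

xml_src = """
<requirements>
  <requirement id="req1"><title>Start-up time</title></requirement>
  <requirement id="req2"><title>Shutdown time</title></requirement>
</requirements>
"""

EX = "http://example.org/"
triples = []
for req in ET.fromstring(xml_src).iter("requirement"):
    subject = EX + req.get("id")
    triples.append((subject, "rdf:type", EX + "Requirement"))
    triples.append((subject, EX + "title", req.findtext("title")))

print(len(triples))  # 4: two triples per requirement
```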

The following transformations are planned for validating the approach (the standard to submit):

□  Original Authoring System

A system to create and maintain information somewhere in the product lifecycle. For example IBM DOORS for requirements management or Dassault Cameo for systems engineering.

□  PIG App

A web application for creating, reading, updating and deleting data elements per class. The app is configured by the classes loaded at initialization time; the classes govern both the available element types and the dialog layout for modifying the data. Thus, the same software serves applications of varying complexity.

The architecture includes a plugin mechanism to allow the deployment of new transformations or storage adapters without building and deploying a complete new image, for details see 'Design Plugin Mechanism'.
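
One common shape for such a mechanism is a run-time plugin registry, so that a new transformation ships independently of the app image. A minimal sketch; the names and the registration contract are assumptions, the real contract is defined in 'Design Plugin Mechanism':

```python
# Plugin registry: transformations are looked up by name at run time
# instead of being compiled into the application.
from typing import Callable, Dict

PLUGINS: Dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that registers a transformation plugin under a name."""
    def wrap(fn: Callable[[str], str]):
        PLUGINS[name] = fn
        return fn
    return wrap

@register("uppercase-demo")
def uppercase(data: str) -> str:
    """A stand-in for a real import/export transformation."""
    return data.upper()

print(PLUGINS["uppercase-demo"]("pig"))  # PIG
```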

A major challenge is an optimal design of the programming class structure ('scaffold') and of the representations in JSON-LD, RDF/Turtle and GQL according to the PIG metamodel. The goal is to allow easy transformations between all representations. For details see Optimize the 'magic tetrahedron'.

Details of the development, build, integration and deployment environment are discussed in 'Select Development Environment and Programming Language'.

Further aspects:

  • Reporting Interface (e.g. Power-BI), server-side per API or client-side per file (export plugin).
  • Permission Management

○  Sink Data

The most general term for data produced by the PIG App for further processing.

○  Source Data

The most general term for data ingested by the PIG App from previous processing steps.

○  Test Data [Original]

Test data in a format provided by an Original Authoring System.

○  Test Data [RDF]

Test data in RDF/Turtle format to allow further processing and reasoning with tools of the RDF ecosystem.

□  Triple-Store

A standard database for knowledge graphs, e.g. Apache Jena Fuseki.

□  UI Plugin

User Interface, such as

  • viewer,
  • editor by forms or diagramming.

The user interface is class-driven, i.e. the details of the entity, relationship or organizer classes with their respective property classes determine the UI. For example, an editing form for a class with three property classes will have three fields, each expecting input according to its datatype and range. Likewise, a graphical editor for a UML interaction diagram will present a tailored palette with drawing rules for interaction diagrams.
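
The class-driven idea can be illustrated with a short sketch: a form description is derived from the property classes of an entity class, so the same code renders a form for any class. The class definitions below are invented for illustration and do not reflect the PIG metamodel:

```python
# Derive form fields from class definitions: one input field per
# property class, typed by the property's datatype.
from dataclasses import dataclass
from typing import List

@dataclass
class PropertyClass:
    name: str
    datatype: str  # e.g. "string", "integer", "date"

@dataclass
class EntityClass:
    name: str
    properties: List[PropertyClass]

def form_fields(ec: EntityClass) -> List[str]:
    """Build one field description per property class."""
    return [f"{p.name} <input type={p.datatype}>" for p in ec.properties]

req = EntityClass("Requirement", [
    PropertyClass("title", "string"),
    PropertyClass("priority", "integer"),
    PropertyClass("dueDate", "date"),
])
print(len(form_fields(req)))  # 3 fields, one per property class
```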

□  Validation Plugin

Validates a package or a collection of packages, with

  • schema checking
  • constraint/consistency checking.