The first chapter in the wftk "book" is about data for a very important reason, and one I missed for the entire first year I worked with the wftk back at the turn of the century: without an organizational principle for the data sources used by a workflow system, your expressive power is very weak. The better you can describe a variety of data sources, the more useful your workflow system will be.
Data in the wftk is organized into lists. A list is more or less equivalent to an SQL table, with a few differences: first, every record in a wftk list must have a unique key; second, the individual records are not restricted to having the same fields; and third, documents may be attached to any arbitrary record. Still, I find the SQL table to be a useful abstraction when thinking of lists, and of course the session actually exposes a DBI handle if you just want to work with SQL from the get-go.
But a list can also simply represent an arbitrary data structure. By implementing subclasses of Workflow::wftk::Data, we can set up nearly anything to be addressed by a workflow system. Is your data stored as rows in a file? Elements in an XML file? Individual files in a subdirectory? Actual rows in a MySQL or ODBC table? No problem; the workflow system can see that data, and manipulate it directly, once you've defined its structure. The wftk can even copy data from one source to another with a single command, or use index tables in one storage form for data in a second storage form. It also understands lists of data nested within the records of another list. The idea is that whatever form your data takes, the wftk should be able to work with it.
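To make the "one interface, many storage forms" idea concrete, here is a minimal sketch in Python. It is not the Workflow::wftk::Data API: the class names (`DataSource`, `FlatFileSource`, `DirectorySource`), the `copy_all` helper, and the JSON storage format are all assumptions made up for illustration. It only shows how a common interface lets records move between a flat file and a directory of files.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface; NOT the Workflow::wftk::Data API."""

    @abstractmethod
    def keys(self): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, record): ...


class FlatFileSource(DataSource):
    """All records in one file, one JSON object per line (assumed format)."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        return {row["key"]: row["record"] for row in rows}

    def keys(self):
        return list(self._load())

    def get(self, key):
        return self._load()[key]

    def put(self, key, record):
        data = self._load()
        data[key] = record
        # Rewrite the whole file; fine for a sketch, slow for real data.
        with open(self.path, "w", encoding="utf-8") as fh:
            for k, rec in data.items():
                fh.write(json.dumps({"key": k, "record": rec}) + "\n")


class DirectorySource(DataSource):
    """One JSON file per record; the file name (minus .json) is the key."""

    def __init__(self, dirname):
        self.dirname = dirname
        os.makedirs(dirname, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.dirname, key + ".json")

    def keys(self):
        return sorted(name[:-5] for name in os.listdir(self.dirname)
                      if name.endswith(".json"))

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as fh:
            return json.load(fh)

    def put(self, key, record):
        with open(self._path(key), "w", encoding="utf-8") as fh:
            json.dump(record, fh)


def copy_all(src, dst):
    """Copy every record from one source to another, whatever the storage."""
    for key in src.keys():
        dst.put(key, src.get(key))


workdir = tempfile.mkdtemp()
flat = FlatFileSource(os.path.join(workdir, "tasks.jsonl"))
flat.put("t1", {"title": "Write chapter"})
flat.put("t2", {"title": "Review chapter", "priority": 2})

folder = DirectorySource(os.path.join(workdir, "tasks"))
copy_all(flat, folder)
```

The point of the design is that `copy_all` never needs to know which storage form it is talking to; adding a new form means writing one more subclass.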
Each item in a list is a record. A record is essentially just a hash of named values. The different records in a list will often, perhaps usually, all have the same fields -- but that is not required.
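As a rough illustration of these rules (unique keys, records as hashes, no fixed set of fields), here is a small Python sketch. The `MemoryList` class and its methods are hypothetical names invented for this example and say nothing about how the wftk itself implements lists.

```python
class MemoryList:
    """Toy in-memory list: each record is a dict stored under a unique key."""

    def __init__(self):
        self._records = {}

    def add(self, key, **fields):
        # Every record must have a unique key.
        if key in self._records:
            raise KeyError(f"duplicate key: {key!r}")
        self._records[key] = dict(fields)

    def get(self, key):
        return self._records[key]

    def keys(self):
        return list(self._records)


tasks = MemoryList()
# The two records share 'title' but otherwise have different fields.
tasks.add("t1", title="Write chapter", owner="alice")
tasks.add("t2", title="Review chapter", due="2005-06-01", priority=2)
```

Unlike an SQL table, nothing here declares the columns up front; each record simply carries whatever named values it was given.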
In addition to field-based data, the record may also store historical data about changes made to it, actions performed on it, or events involving it. We'll see this later, when the facility is used to store the enactment of a workflow process, but it is available to any record in the wftk. The only requirement is that you define where this historical information is stored, if the storage mode you're using doesn't do it for you. For instance, if a list is stored in a MySQL table, you'll have to put historical data somewhere else (it could be in another MySQL table, or perhaps in a separate log file).
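The separation between a record's fields and its history can be sketched as follows. This is only an illustration in Python under my own assumptions: `HistoryLog`, `note`, `for_record`, and `update` are hypothetical names, and the in-memory log stands in for whatever separate table or log file you would actually configure.

```python
from datetime import datetime, timezone


class HistoryLog:
    """Hypothetical append-only log kept apart from the records it describes."""

    def __init__(self):
        self._entries = []

    def note(self, key, event, detail=""):
        self._entries.append({
            "key": key,
            "event": event,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def for_record(self, key):
        return [e for e in self._entries if e["key"] == key]


# The field data lives in one place, the history in another.
records = {"t1": {"title": "Write chapter", "status": "open"}}
history = HistoryLog()


def update(key, field, value):
    """Change a field and record the change in the separate log."""
    old = records[key].get(field)
    records[key][field] = value
    history.note(key, "changed", f"{field}: {old!r} -> {value!r}")


update("t1", "status", "done")
```

The record itself stays a plain hash of current values; everything about how it got that way is looked up by key in the log.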
The wftk also understands document management. To any record, you can attach any set of arbitrary documents (although a given list may restrict your ability to do so with wanton abandon). These attachments can be anything -- from incoming faxes to source code to ... whatever. The wftk will handle version control for you if necessary. Document management can also simply be a descriptive system to track files managed externally, for instance the code files in a programming project.
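Here is one way to picture attachments with simple version control, again as a Python sketch rather than anything the wftk actually exposes. The `Attachments` class, its method names, and the 1-based version numbering are assumptions for the sake of the example.

```python
class Attachments:
    """Hypothetical versioned attachment store, keyed by (record key, name)."""

    def __init__(self):
        self._versions = {}

    def attach(self, record_key, name, content):
        """Store new content and return its 1-based version number."""
        versions = self._versions.setdefault((record_key, name), [])
        versions.append(content)
        return len(versions)

    def fetch(self, record_key, name, version=None):
        """Return the latest version, or a specific one if requested."""
        versions = self._versions[(record_key, name)]
        return versions[-1] if version is None else versions[version - 1]

    def names(self, record_key):
        """List the attachment names held for one record."""
        return sorted(n for (k, n) in self._versions if k == record_key)


docs = Attachments()
docs.attach("t1", "draft.txt", b"first draft")
docs.attach("t1", "draft.txt", b"second draft")
docs.attach("t1", "fax.tif", b"scanned fax bytes")
```

Earlier versions stay retrievable by number, while a plain `fetch` always returns the most recent content.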
Here is an index to the information in the data chapter:
- 02-a Basic data list manipulation: memory-based lists
- 02-b File lists: storing data in flat files
- 02-c Directory lists: storing data in files in a subdirectory
  - Storing record-based data in a directory
  - Reading directory metadata using SQL
  - (Restricting the files made visible to a list
  - (Composite keys
  - (Treating file contents as attachments
  - (Using alternative record parser/dumpers in a directory (e.g. XML)
  - (Keeping attachments and data files in the same directory
  - (Keeping attachments in a directory but data in a flat file
  - (Using a directory attachment storage for record supplementation
- 02-d (Storing data in MySQL
- 02-e (Generic DBI: storing data anywhere other people have worked out for us
- 02-f (Writing custom list classes
- 02-g (Indexing and other complex SQL use
- 02-h Working with records
  - The basic record: flat values
  - (Setting default values
  - (Creating records in other speedy ways
  - Record storage in files
  - Tying hashes
  - Inline text bodies
  - List values
  - Boolean values
  - (Comments and blank lines as metadata
  - (Text bodies not named 'body'
  - Subrecords and dotted nomenclature
  - Extracting data from records
  - (Dumping and parsing records
  - (Dumping and parsing records as XML
  - (Dumping and parsing records using arbitrary parsers/formatters (e.g. MIME email)
  - (Templates I: publishing records for human consumption
  - (Record storage I: saving pieces of records in different places
  - (Record storage II: attachments, both inline and not
  - (Record storage III: specifying shallow and deep retrieval
- 02-i (Historical data and enactments
- 02-j Document management
  - (Attaching documents to records
  - (Treating a normal data field as an attachment
  - (Dead-simple document management: filesystem plus metadata
  - (Multiple attachments and references: the folder model
  - (Version control
  - (Checkin and checkout
  - (Retention management
  - (Recipe: mailing documents into an archive
  - (Recipe: tracking a CVS system
- 02-k (Data within data: references and sublists
- 02-l (Publishing human-readable data