Tektur - Real-time CCMS

DITA Publishing + Review & Approval

Tektur will be an easy-to-use Component Content Management System that allows for real-time collaboration.

It starts with a fast web-based listing that can handle thousands of DITA topics.

You can add new topics using a Word-like tagless XML editor and import or export existing topics in ZIP format.

The system has evolved from the UWE project (see below), which was alive from 2012 until 2014 but was then abandoned due to a lack of spare time. It is now continued under a new name and with fresh ideas.

This is the development blog maintained by Alex Düsel.

October 12th, 2019

Comment navigation and attachments

Improved the discussion functionality. There is a paper clip icon now...

Tektur: Add attachments to comments

You can open the Attachments dialog by clicking on that paper clip...

Tektur: Add and view attachments

An approver may change the state of a comment and disable attachment upload...

Tektur: Reject a comment

You need to give a reason for the rejection...

Tektur: Provide a rejection message

A reader may look at the comments...

Tektur: Comments on the Reader's view

When hovering over a comment the discussion panel will show up - pin it with a click. There are also arrows for jumping from one comment to the next, going through the entire publication.

Tektur: Attachments in Reader's view

August 18th, 2019

Frontpage

Added a simple frontpage to the system that lists all publications on the server. Also, latest changes and latest comments are listed.

There is a (German-language) test server with some test documents, go to:

www.publiziere.de

Tektur: Frontpage with publications, changes and comments

I have also come up with a REST-like interface. Check a few sample requests:

July 14th, 2019

Simple HTML format

My friend George has come up with the idea that it would be really nice if you could just download a set of plain HTML pages which make up the book. Put this package on any webserver or even on the filesystem and voila, your work is published.

As far as I know none of the popular CMS systems, blog engines or wikis can do this. See how it works:

Tektur: Download Simple HTML Format

There's a new button on the Reader's view. Click it to download a ZIP containing HTML pages, images and an old school HTML frameset.

Tektur: Contents of a Simple HTML ZIP package

The design looks rather simple but it can be easily customised to meet your individual needs. The content is semantically well-tagged and there is just one CSS file.

Tektur: Simple HTML Format featuring an HTML4 frameset

I used an HTML frameset for separating the TOC from the content. While this may not be the most elegant solution, it is at least widespread.

May 26th, 2019

Markdown Importer

Just finished the Markdown Importer. It works on a document/chapter level as well as on a paragraph level. Some screenshots:

Tektur: Import of Markdown content on a paragraph-level
Tektur: Import of Markdown content on a document- / chapter-level

The implementation is pretty straightforward and summed up below:

  1. Convert Markdown to HTML using showdown.js.

  2. Transform HTML to DITA XML using XSLT.

  3. Flatten nested chapters (h3 to h6) in order to reflect the DITA topic vs. section hierarchy.

  4. Pick up result and import.
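
Step 3 can be pictured with a small sketch. It assumes, hypothetically, that h1 and h2 headings open a new DITA topic and that every deeper heading (h3 to h6) collapses into a flat section inside the current topic; the real XSLT may draw that line elsewhere. The function name `flattenHeadings` and the block shape are made up for illustration.

```javascript
// Hypothetical sketch: group a flat list of heading/paragraph blocks into
// topics (h1, h2) that contain non-nested sections (h3 to h6).
function flattenHeadings(blocks) {
  const topics = [];
  let topic = null;
  let section = null;

  // Content that appears before any heading still needs a topic to live in.
  const ensureTopic = () => {
    if (!topic) {
      topic = { title: '', body: [], sections: [] };
      topics.push(topic);
    }
    return topic;
  };

  for (const block of blocks) {
    if (block.type === 'heading' && block.level <= 2) {
      // A top-level heading starts a new topic.
      topic = { title: block.text, body: [], sections: [] };
      topics.push(topic);
      section = null;
    } else if (block.type === 'heading') {
      // h3..h6 all end up on the same level: a section of the current topic.
      section = { title: block.text, body: [] };
      ensureTopic().sections.push(section);
    } else {
      // Plain content goes into the open section, else into the topic body.
      const target = section || ensureTopic();
      target.body.push(block.text);
    }
  }
  return topics;
}
```

With this grouping an h5 that follows an h3 does not nest; both become sibling sections of the same topic.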

I tried it on the test file of John Gruber (who is one of the inventors of the Markdown syntax). The result looks quite nice in the editor:

Tektur: Imported Markdown test document

So why do we need a Markdown Importer here? It so happens that most programmers use Markdown for documenting their projects, for example on GitHub. With this feature, Tektur allows software developers to contribute to a "master" document that may be assembled in a "post-authoring" workflow step, ready to be published as a PDF or any other format ...

April 19th, 2019

🔍 Fulltext-Search

Implemented full-text search! Normally I would use Apache Solr as a search engine but this time I gave the text index of MongoDB a chance in order to keep the stack easy and clean. The whole application is written in JavaScript/NodeJS and XSLT with MongoDB as database and Apache FOP as XSL-FO formatter. That's all. I am quite happy with this architectural decision:

Tektur: Full-text search over all topics of some books

I am thinking about including more options like searching over certain elements. It does already work for source code snippets:

Tektur: Full-text search over all source code snippets
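
For readers who have not used the MongoDB text index, here is a minimal sketch of how such a search is typically expressed. The field names (`title`, `body`) and the helper `buildTextSearch` are assumptions, not Tektur's actual schema. The snippet only builds the index specification and the query objects, so it runs without a database.

```javascript
// Index specification: fields marked as 'text' are covered by the text index.
// (Field names are hypothetical.)
const textIndexSpec = { title: 'text', body: 'text' };

// Build the filter and options for a full-text query, ranked by relevance.
function buildTextSearch(term, limit = 20) {
  return {
    filter: { $text: { $search: term } },
    options: {
      // textScore is the relevance score MongoDB computes for $text matches.
      projection: { title: 1, score: { $meta: 'textScore' } },
      sort: { score: { $meta: 'textScore' } },
      limit,
    },
  };
}

// With the Node.js driver this would be used roughly like so:
//   await collection.createIndex(textIndexSpec);
//   const { filter, options } = buildTextSearch('xslt');
//   const hits = await collection.find(filter, options).toArray();
```

Restricting the search to certain elements could then be a matter of indexing those elements in a separate field, but that is only one possible design.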

March 31st, 2019

Lazy-Tags & minor HTML improvements

Introducing "Lazy-Tags" ... At some point we are just too lazy to extend the DITA model, add some new icons to the editor toolbar, customize the XSLT templates and CSS if we just want to include some simple markup tags that will only have some technical functionality and no semantic meaning. This is why I invented lazy-tags:

Tektur: "Lazy-tags" get parsed and marked green while editing

There are also some minor improvements for the HTML format:

  1. A button on the header brings up all accepted comments.

  2. Commented paragraphs are marked.

  3. Chapters that contain comments are marked in the TOC.

  4. A panel with comments slides up.

  5. Jump to the next and previous pages with the arrow buttons.

  6. Download all DITA topics in a ZIP.

Tektur: Display of comments in the HTML

March 3rd, 2019

Finalized PDF

I have been working on the PDF layout. Finally I updated to the latest Apache FOP version which makes things a lot easier. This is how a bookmark tree looks in a browser:

Tektur: PDF Bookmarks used as a TOC navigation

This feature is quite hidden in Acrobat DC - for some reason - and the bookmark view doesn't open on initial startup. You have to navigate through a number of dialogs:

Acrobat DC: Open the bookmark view (German)

The bookmarks act as a TOC:

Tektur: Bookmark view in Acrobat DC

Hazard statements are now page-wide featuring new icons:

Tektur: Hazard statement rendered in a PDF

All auto-generated registers come with a description and a hyperlink back to the related piece of text:

Tektur: Display of the Glossary page in the Appendix of the PDF
Tektur: Display of the List of Figures in the Appendix of the PDF
Tektur: Display of the List of References in the Appendix of the PDF
Tektur: Display of the Index in the Appendix of the PDF

Note that these registers are not numbered in the TOC in order to distinguish them from other chapters:

Tektur: A TOC page of the PDF

Last but not least: Syntax-highlighting of source code - very important to me since I have started writing a book about XSLT and XQuery:

Tektur: PDF - Syntax-Highlighting of Source Code

As the book project progresses, I will ask my designer-friend if she is willing to contribute some logos (page header and cover page) and illustrations in order to make the whole thing look more professional :-]

December 27th, 2018

HTML Format & XSLT Book

Long time no see! Just added HTML format:

Tektur: Create HTML format

There is still some work to do:

Tektur: Early HTML5 format

I decided to write a book about XSLT - just for testing purposes:

This will take a while... ;-]

June 26th, 2018

Complete Reviewing Workflow

So here's the complete workflow of the reviewing features. Let's start with a basic scenario:

Let's see this in action:

Alice selects the topic called "Installation" and starts editing...

She saves her changes and goes back to her editor-list. She opens the reviewer-dialog...

She assigns Bob the reviewer role and Eve the approver role...

Also, she puts the topic into state "In review". She observes that the workflow of this topic has changed to "review", when looking at the editor-list...

Bob gets notified by email and examines his reviewer-list...

Amongst some topics from Ernie and Bert he also checks out the topic from Alice. He goes straight into that topic and submits some comments...

Back in the reviewer-list he notices that there are three open comments...

Eve gets notified by email and also checks her review-list. Amongst some topics from Winston and Jules she observes a topic from Alice. She goes right into that topic...

Having the approver role, she may reject or accept the comments. She is okay with most of Bob's comments but does not agree on the typo...

She submits a rejection message...

The rejection message is appended to Bob's comment. Bob gets notified by email...

Back in the review-list she checks that the statistic panel is updated...

Alice decides to include the feedback from Bob and Eve. She puts the topic back into state "editing". The topic is also removed from the review-list of Bob and Eve...

When opening the editor, Alice observes that there is a new button "show comments"...

When clicking on that button, she will only see the comments that were accepted by Eve. She will not see the comment that was rejected...

After committing her changes and corrections, she sets the state of each comment to "done". The comment panel will fade out. Also note that the comments cannot be changed in the editor view...

Bob, who is the owner of the comments, may delete them in the next reviewing step, which will be initiated by Alice, the owner of this topic...

May 25th, 2018

Reject Discussions

Having the Approver role you can reject comments together with a rejection message. The Reviewer will be notified by email about the changes.

Tektur: Reviewer adds a comment
Tektur: Approver rejects the comment
Tektur: Approver submits a rejection message
Tektur: Rejection message is attached to the comment

April 19th, 2018

Manage Discussions

You can add comments to and remove comments from any piece of text using an inline editor widget. This is what it looks like:

Tektur: Add comments to the topic element
Tektur: Add comments to the task element
Tektur: Add comments to the map element

March 10th, 2018

Reviewer-Listing & Review-Editor

Finished with the reviewer listing. There are columns for the assignment type { approve | review } and for the number of comments. All columns are filterable and sortable:

Tektur: Just one row in the reviewer listing

When you click on the topic title you will get to the reviewer screen:

Tektur: Reviewing view of a topic

Every piece of text is decorated with a little speech bubble icon. You can put comments by clicking on such a bubble. This is where the story ends for now. To be continued in April :-]

February 25th, 2018

Review & Approval

Started working on the workflow features. The first version will be quite simple. For now, there is a speech bubble button that lets you configure approval and review roles for each topic, task and map:

Tektur: Show the approval and review dialog

There is also a new entry in the top menu called "reviewer". If you click it, then you will get to your list of items that you were assigned to by your peer editors. The configuration is located in a popup dialog:

Tektur: Assign approvers and reviewers

A Tektur document can reside in two different states:

February 24th, 2018

RTF (Word) Import / Export

Added a new output format. It's RTF that you can open and edit with your Micro$oft Word Processor.

Tektur: Creation of the RTF format

RTF will not be cached; it is created anew every time you click the button.

I put another tab on the plain-text importer dialog (also note the fullscreen button) in order to import RTF content from Word or from a website:

Tektur: Import of Word content

Paragraphs and simple list structures are currently supported. All other content will be merged into a plain-text block. This is what the input of my test case looks like (I copied this piece of text into the popup dialog):

Tektur: Copy and Paste of Word content

Conversion to DITA is done on-the-fly with the following result:

Tektur: Copied Word into the content editor

There is also a configurable widget for inserting special characters:

Tektur: Insert a special character

BTW: I was not able to put a decent filter icon into the input boxes of the lists because jsGrid does not provide this functionality.

Thus, I searched the Unicode character set and found a magnifying lens! This is how it looks in my source code editor:

The result looks like this:

Tektur: Magnifying lenses in input boxes

Does not perfectly fit together with the other icons but at least there is an icon :-]

February 8th, 2018

Versioning

View a previous version of a topic, task or map. Just click the "More" button located on the action bar:

Tektur: Show more actions

The popup dialog lists all previous versions together with a timestamp:

Tektur: List of previous versions

Show a previous version in read-only mode:

Tektur: HTML preview of a DITA map

When you click on the button Set as Current then the current version of the document will be replaced. You will be prompted for confirmation:

Tektur: Confirmation dialog when restoring a previous version

This is what the read-only mode of a topic/task looks like:

Tektur: Read-only task

January 13th, 2018

Function Follows Form

Yes, I know! It should be exactly the other way round... But I was just too lazy to think about the GUI concept again.

Currently there is a main panel with a listing or an editor on it. It depends on the state of the application. There is also a popup dialog which shows up in the middle of the screen.

Reworked this dialog in order to improve the UX. See how it looked before:

Tektur: Previous version of the FCM dialog

That is okay for testing... But in production, the user input must be validated - even better: The user should only be able to select reasonable values.

This dialog is definitely too small. I worked around this by attaching a tooltip on the right side. This is how it looks now:

Tektur: Current version of the FCM dialog

Also, stumbled across some cool new HTML5 features:

There is an <input type="range"> now. No need for implementing your own JavaScript slider anymore. Not to forget the new <input type="color">...

December 27th, 2017

Some more random usability features

1.) You can add a description to each topic, task or map in the properties section.

Tektur: Metadata description of a topic.

It will be displayed as a tooltip when hovering over the topics listing

Tektur: Metadata description in a tooltip.

2.) When clicking on a breadcrumb the DITA Help box is updated with a link to the corresponding documentation on the DITA homepage.

Tektur: Link to the DITA homepage.

3.) Changing a topic reference in the map view lets you also select topics from a different project. Note that the text input field at the top of the dialog box can be used to filter the result list.

Tektur: Select topics from another project.

A topic that is linked from another project will be marked as "external".

Tektur: External topic reference.

4.) Links may reference topics from another project. Note that the link text will be updated with title changes without a refresh.

Tektur: Link text to topics from another project.

5.) There is a "more info" button next to each list entry.

Tektur: More info button.

When you click on this button you will see a little dialog, that displays two fields:

Tektur: Where-used-list.

November 9th, 2017

Same procedure as every year

The task is probably the technical editor's most beloved element. Check out the specification on the DITA homepage. Also note that the PDF layout is work-in-progress (rendered by an "old" version of Apache FOP, v1.0).

Tektur: Not a real-world example of a DITA task.

It also looks good in the editing view:

More of a real-world example would look like this:

Tektur: Real-world example of a DITA task.

Editing view:

Tektur: Editor view of the real-world task.

September 21st, 2017

Major Redesign

Brand new design and styles...

Tektur: New design of the splash screen
Tektur: Show list actions on mouse over
Tektur: New icons in the TOC
Tektur: Redesign of the editor toolbar

August 28th, 2017

Toolbar Icons

The gfx department is working hard on some custom icons for the toolbar of the editor panel...

Tektur: Custom icons for the editor panel

June 18th, 2017

Icons and other gfx

Just ordered some new custom icons and gfx from a designer-friend. Also spent some money on quality assurance and testing. The editor part is 80% completed; entirely written in JavaScript/NodeJS. Thanks to NW.js it runs both on a website and as a desktop application. Here's a screenshot animation of most recent work:

Tektur: Screenshot animation of most recent work

April 22nd, 2017

Tektur Peer-2-Peer Real-Time XML CMS

Actually I was thinking about an easy way to distribute the NodeJS binary and MongoDB together with the Tektur application code.

Finally I came across a different approach to the whole thing. Why not bundle the browser control, the NodeJS binary and the database together as a desktop application with server features?

This is pretty easy using NW.js. It comes with a NodeJS binary and a Chromium browser control, and it runs on Windows and Mac.

Replaced the MongoDB with TingoDB and voila here is Tektur running in a Desktop window - but still being a server application, that listens on port 8080 for other browsers to connect.

Tektur: Running in a Desktop window

Developing this thought further consequently leads to a new architectural idea:

Tektur: Peers at work

Tektur will be a server-less peer-to-peer system.

Each user brings her own disk space and computing power, making the system very scalable and secure.

Topics that were created by user XYZ will stay on the computer of XYZ. Well, that’s the idea, long term.

April 16th, 2017

Maps Editor

Started implementing the DITA maps editor. I am reusing a control which I wrote for the UWE project some years ago… basically some custom JavaScript that sits on top of the well-crafted jQuery TreeTable plugin.

The user can arrange the DITA map by drag 'n drop / buttons for inserting, deleting and moving the topics.

A click on the topic/map titles shows a popup dialog with further options. See some screenshots below:

Tektur: Select the map title
Tektur: Edit DITA elements title, navtitle, shortdesc and keywords
Tektur: Click on a button in the map editor
Tektur: Insert a new topic reference
Tektur: Define the target of the topic reference
Tektur: Drag 'n drop the topic reference

March 25th, 2017

RTFM - Read The F*** ahem Fine Manual

Started a documentation project for Tektur at www.stylesheet-entwicklung.de (in German).

While editing the documentation site using Plone as CMS I realized once more that the integrated editor of Tektur is far better than TinyMCE, CKEditor and the like.

For example, it is a nightmare to reorder complex list items with TinyMCE using copy and paste. You'll end up editing HTML tags in the source view. This is definitely not user friendly...

February 18th, 2017

Table Input

Tektur features CALS tables, which are more complex than HTML tables.

Only a subset of CALS attributes can be rendered in the editor. Other properties have effect only in the PDF output, such as @frame:

Tektur: DITA @frame attribute

Also @pgwide that allows for setting the table page-wide or column-wide.

Tektur: DITA @pgwide attribute

CALS tables may suffer from an erroneous markup constellation, due to their complex nature.

Therefore we check each user input. If the input breaks the table geometry, the current action is automatically undone, leaving the table in a consistent state.

The "tablefixer algorithm", which I designed a while ago (scroll down and see my post from August 17th, 2014), was reused in order to implement this feature.
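
To illustrate the check-and-undo idea, here is a minimal sketch. It is not the tablefixer algorithm; it uses a simplified, hypothetical table model in which each cell may carry `colspan` and `rowspan` counts instead of the CALS @namest/@nameend/@morerows attributes, and the names `isGeometryValid` and `applyEdit` are invented for the example.

```javascript
// Simplified model: a table is { cols, rows }, each row is a list of cells,
// each cell may have colspan and rowspan (both default to 1).
function isGeometryValid(table) {
  // carry[c] > 0 means column c of the current row is still occupied by a
  // cell that spans down from an earlier row.
  let carry = new Array(table.cols).fill(0);
  for (const row of table.rows) {
    const next = carry.map((n) => Math.max(0, n - 1));
    let col = 0;
    const skipOccupied = () => {
      while (col < table.cols && carry[col] > 0) col++;
    };
    for (const cell of row) {
      skipOccupied();
      const colspan = cell.colspan || 1;
      const rowspan = cell.rowspan || 1;
      if (col + colspan > table.cols) return false; // row is too wide
      for (let k = 0; k < colspan; k++) {
        if (carry[col + k] > 0) return false; // overlaps a spanning cell
        next[col + k] = rowspan - 1;
      }
      col += colspan;
    }
    skipOccupied();
    if (col !== table.cols) return false; // row is too short
    carry = next;
  }
  // No cell may span past the last row.
  return carry.every((n) => n === 0);
}

// Apply an edit to a copy; keep the original if the geometry breaks.
function applyEdit(table, edit) {
  const candidate = edit(JSON.parse(JSON.stringify(table)));
  return isGeometryValid(candidate) ? candidate : table;
}
```

Working on a copy makes the "undo" trivial: a rejected edit simply never replaces the consistent table.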

Some screenshots follow:

Tektur: Insert a new table

Insert a table on the next allowed position by pressing the table button in the toolbar. (Note that, according to the DITA content model, there are several restrictions when inserting a table)

Tektur: New table dialog

Edit important properties in the "Insert Table" dialog...

Tektur: Table context menu

A right-click on any place in the table area will bring up the standard context menu with a special entry "<table> Popup". A click on this entry will show the "Edit Table" dialog.

Tektur: "Edit Table" dialog

Expert users may also use the standard properties dialog in order to modify special CALS attributes:

Tektur: Standard properties dialog

For example one could set a custom frame option for the PDF output:

Tektur: Set the DITA @frame attribute
Tektur: Set value on @frame

and set a description for this table:

Tektur: Add a description to a table
Tektur: Type text in the field table description

That's all: Tables are feature-complete, yeah :-]

Oh wait: There is something left to do. I'd like to set the width of each table cell using drag'n drop. Currently I am experimenting with a JS library colResizable which works just fine, but still having some issues with copy 'n paste and spanned columns...

February 4th, 2017

The Awesome User Graph

Rights and roles are fully configurable. Custom roles may be:

When users of Tektur work on different books with different roles it can get rather complex. Fortunately we can visualize relationships as a graph:

Tektur: The User Graph shows all roles of a certain user

This fancy graph (which is not a real-world example!) has been implemented using the fine Springy.js library (A force directed graph layout algorithm in JavaScript). If you click on a username the graph will be expanded with the user's role information.

Basically the configuration of the system, which can be edited using an HTML5 form, is transformed to XML data, which is transformed to JSON data, which finally acts as the input for the Springy library. So easy!
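
The last step of that pipeline might look roughly like the sketch below. It skips the XML stage and starts from a hypothetical, already parsed role configuration; the config shape and the function name `rolesToGraph` are assumptions. The output is a plain nodes/edges structure of the kind Springy's `loadJSON` is documented to accept, which is worth double-checking against the library version in use.

```javascript
// Hypothetical input: for each user, the books they work on and in which role.
function rolesToGraph(config) {
  const nodes = new Set();
  const edges = [];
  for (const [user, assignments] of Object.entries(config)) {
    nodes.add(user);
    for (const { book, role } of assignments) {
      nodes.add(book);
      // One edge per assignment; the role travels along as edge data.
      edges.push([user, book, { label: role }]);
    }
  }
  return { nodes: [...nodes], edges };
}

const exampleGraph = rolesToGraph({
  alex: [{ book: 'XSLT Book', role: 'author' }],
  sepp: [{ book: 'XSLT Book', role: 'reviewer' }],
});
console.log(JSON.stringify(exampleGraph));
```

Using a Set for the nodes keeps a book that is shared by several users from showing up twice in the graph.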

January 29th, 2017

Synchronized Profile Data with Django Admin

Django comes with an excellent Admin interface out-of-the-box. Today I have been synchronizing the registration of new users:

Tektur: Register yourself as a new user.

The Profile dialog may be checked and changed at any time:

Tektur: Select the Profile popup dialog from the top main menu.

The administrators of Tektur may change further options in the Django Admin interface:

Tektur: Change global user roles in the Django admin interface.

The NodeJS view will be updated with the changes in real-time. Also note the link "User Graph":

Tektur: Check global user roles in the Profile popup.

In addition to the global roles which apply to every book in the system, the user may also be assigned to specific topics or maps. This complex relationship will be implemented using a graph (Coming soon).

January 14th, 2017

Topic Listing is Feature-Complete

Just finished work on the topics listing. I am giving you a short idea about the "business plan". The first release of Tektur, which is scheduled for the end of 2018, will be a "teaser". It will feature the basic functionality:

The system will be put online with a large set of demo documents. I will invite some people in the techdoc industry for beta-testing...

Here are some screenshots of most recent work:

Tektur: Layout options for each document.

Major design decision: The Layouter dialog is now connected to each element in the listing. Thus each element may be styled individually.

Tektur: Topic Listing with Extended Filter Options + Document Actions.

Extensive search and filter options are located in the table header. The list will be updated in milliseconds. Filter options are persisted while changing the HTML5 single-page views. Document actions are as follows:

Tektur: More Topic Properties on Info Action click

The popup dialog for the more info box will also show all DITA maps where this topic is used. The user can click on a link and will be redirected to the corresponding map.
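
A where-used lookup of this kind can be sketched in a few lines. The data shape (maps holding a `topicrefs` array of topic ids) and the name `whereUsed` are assumptions made for the example, not Tektur's data model.

```javascript
// Return id and title of every map that references the given topic.
function whereUsed(topicId, maps) {
  return maps
    .filter((map) => map.topicrefs.includes(topicId))
    .map((map) => ({ id: map.id, title: map.title }));
}

const exampleMaps = [
  { id: 'm1', title: 'User Guide', topicrefs: ['t1', 't2'] },
  { id: 'm2', title: 'Admin Guide', topicrefs: ['t2', 't3'] },
];
console.log(whereUsed('t2', exampleMaps).map((m) => m.title).join(', '));
```

Each returned entry carries enough information to render a link to the corresponding map.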

January 4th, 2017

Working with the XML editor

Tektur: Editor user interface details

December 26th, 2016

Topic metadata

Added some slidable metadata input fields to the editor view (Note: The datepicker widget is implemented using the Pikaday JS library).

Tektur: The due date of this topic can be changed

December 4th, 2016

The topics listing

Started working on the topics listing. I have been using the fast jsGrid library.

Tektur: The topics listing

As you can see, there is a lot more space in the column "actions". The first action is "Export the topic as a DITA ZIP file", indicated by the disk icon. Other actions may be:

Most of these actions have already been implemented for the UWE project.

December 3rd, 2016

XML related concepts of Tektur

Modularization is one of the key principles of an XML CMS. Whereas ordinary CMS systems let the user organize the content into pages, sections, etc. and do not support the reuse of content on a fine-grained level, Tektur will automatically split up the content into small pieces of information. This allows for maximum reuse.

Also the underlying XML DITA information model allows for semantic tagging of text paragraphs, procedures, tables, figures, and many other customizable semantical elements, that can be searched for, filtered out, linked to and exported.

Generalization is a DITA concept that allows Tektur content to be processed in other DITA systems and vice versa. Even if there is customized and tailored data, generalization will add another level of abstraction to the transferred data, making it equal - without information loss.

Validities is a techdoc specific requirement. It boils down to a fine-grained conditional processing of the data. The conditions can be set by the editor and are loosely coupled to the data. Those conditions can be changed at runtime, making conditional processing a flexible approach to increasing product diversification, without changing the actual data but only the configuration of the system.
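
A minimal sketch of such conditional processing, assuming a hypothetical data shape in which each element may list the values it is valid for per condition. The names `isValidFor` and `applyValidities` are invented for the example.

```javascript
// An element is kept when every condition it declares matches the active
// configuration; elements without conditions are always kept.
function isValidFor(element, active) {
  return Object.entries(element.conditions || {}).every(
    ([key, allowed]) => allowed.includes(active[key])
  );
}

function applyValidities(elements, active) {
  return elements.filter((element) => isValidFor(element, active));
}

const exampleElements = [
  { id: 'p1' },
  { id: 'p2', conditions: { product: ['basic'] } },
  { id: 'p3', conditions: { product: ['pro'], market: ['eu', 'us'] } },
];

// Switching the configuration changes the output, not the data.
console.log(applyValidities(exampleElements, { product: 'pro', market: 'eu' }).map((e) => e.id));
console.log(applyValidities(exampleElements, { product: 'basic', market: 'eu' }).map((e) => e.id));
```

The content itself stays untouched; only the `active` configuration decides which variant of the document is produced.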

Versioning is a common feature of all CMS systems. But with an XML CMS it is possible to mark changes on a fine-grained level.

That is, all changed words, all deleted words and all new words are marked.

Changes can easily be related to users and to versions. These changes can be listed, filtered out or processed further. For example it is easy to mark TOC entries when there is a change somewhere in a subchapter.
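
Word-level change marking can be sketched with a classic longest-common-subsequence diff. This is a generic textbook technique shown for illustration, not necessarily how Tektur computes its changes; the function name `diffWords` is made up. A changed word shows up as a deletion followed by an insertion.

```javascript
// Compare two texts word by word and return a list of keep/delete/insert ops.
function diffWords(oldText, newText) {
  const a = oldText.split(/\s+/).filter(Boolean);
  const b = newText.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start and emit one operation per word.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ op: 'keep', word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ op: 'delete', word: a[i] });
      i++;
    } else {
      ops.push({ op: 'insert', word: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ op: 'delete', word: a[i++] });
  while (j < b.length) ops.push({ op: 'insert', word: b[j++] });
  return ops;
}
```

Attaching a user name and a version number to each operation would give the kind of list that can be filtered or used to mark TOC entries.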

References to other text parts on different pages are a common feature in print publications and in electronic formats.

This functionality is commonly known as hyperlinking. With an XML CMS, hyperlinks can target all elements, e.g. paragraphs, figures, tables, chapters, ... and can be put at any place in the content.

The text of the hyperlink can dynamically be generated by the system, together with the correct page number, properly set for different languages and with different boilerplate text.

But there is a difficulty with this approach: the correct version of the target text part must be determined. As you can guess, this is tricky because of the modularization and versioning features of Tektur.
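
Generating the link text per language can be pictured as a lookup of boilerplate templates. The templates, the target shape and the function name `linkText` below are purely illustrative assumptions; resolving the correct target version and page number is the hard part and is not shown.

```javascript
// One boilerplate template per language (example wording only).
const linkBoilerplate = {
  en: ({ title, page }) => `see "${title}" on page ${page}`,
  de: ({ title, page }) => `siehe "${title}" auf Seite ${page}`,
};

// Render the text of a cross-reference; fall back to English if the
// requested language has no template.
function linkText(target, lang) {
  const template = linkBoilerplate[lang] || linkBoilerplate.en;
  return template(target);
}

console.log(linkText({ title: 'Installation', page: 12 }, 'de'));
```

Because the text is generated at publishing time, a changed title or page number never has to be corrected by hand.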

Automatic Typesetting is an indispensable feature in the field of technical documentation. Custom layout is slow, expensive and often stuck to the data. We could transform HTML input into a PDF. But this is not sufficient.

Here is a simple example:

A list is declared in HTML using <ul> and <li> elements. If you want the list to have a title, you probably want to put a paragraph before the list, like so:

<p>This is the list title</p>
<ul>
	<li>First list item</li>
	<li>This is the second list item</li>
</ul>

Having this structure, there is no way of telling the PDF renderer to keep the title of the list together with the list on the same page. A page break between the two would certainly look ugly.

One could say keep a preceding paragraph together with the list. But what if this list does not contain a title? What if the preceding paragraph spans half of the page?

With XML you could declare:

<ul>
	<title>This is the list title</title>
	<li>First list item</li>
	<li>This is the second list item</li>
</ul>

This makes the structure belong together semantically. Now you can tell the PDF renderer: keep all <ul> elements together or split on <li> elements, which results in much better automatic typesetting.

November 26th, 2016

Short notes on the workflow logic

The workflow logic of Tektur will be rather simple.

It is just XML data that holds information on who will have the permission to perform certain actions on the information objects of the system, which are maps, topics, tasks, images, comments, messages, notifications, and queues.

Actions may be: view, add, edit, remove, comment, accept, reject, open, postpone, finish, access and sendTo.

Users can be added to groups, e.g. authors, reviewers, approvers, managers... Each group will have its own xml configuration and its own list of work queues.

Transition of workflow steps will be done by switching the active work queue for group XYZ and sending information objects to queues.

Here is a sketch:

Tektur: Workflow Logic

The XML configuration will internally be mapped to JSON that is stored in MongoDB. Since this configuration has to be looked up very often (each time an information object model is queried) but will be updated only now and then, this approach should be sufficient.
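
As a rough illustration of a read-heavy permission lookup, the sketch below flattens a JSON configuration into an index once and answers each query with a single set lookup. The configuration shape, the user names and the functions `buildPermissionIndex` and `can` are hypothetical; only the action and object names are taken from the lists above.

```javascript
// Hypothetical JSON form of the workflow configuration.
const workflowConfig = {
  groups: {
    authors: { topics: ['view', 'add', 'edit', 'sendTo'], comments: ['view'] },
    reviewers: { topics: ['view'], comments: ['view', 'add', 'edit', 'remove'] },
    approvers: { topics: ['view'], comments: ['view', 'accept', 'reject'] },
  },
  members: { alice: ['authors'], bob: ['reviewers'], eve: ['approvers'] },
};

// Built once per configuration change: one entry per user/object/action.
function buildPermissionIndex(config) {
  const index = new Set();
  for (const [user, groups] of Object.entries(config.members)) {
    for (const group of groups) {
      const rights = config.groups[group] || {};
      for (const [objectType, actions] of Object.entries(rights)) {
        for (const action of actions) {
          index.add(`${user}|${objectType}|${action}`);
        }
      }
    }
  }
  return index;
}

// Called on every query: a single lookup in the prebuilt index.
function can(index, user, action, objectType) {
  return index.has(`${user}|${objectType}|${action}`);
}

const permissionIndex = buildPermissionIndex(workflowConfig);
console.log(can(permissionIndex, 'eve', 'reject', 'comments'));
console.log(can(permissionIndex, 'bob', 'reject', 'comments'));
```

Rebuilding the index only when the configuration changes keeps the frequent lookups cheap, which fits the access pattern described above.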

There won't be too much object orientation in the code since OO tends to slow down systems, especially in scripting languages...

November 19th, 2016

More about the XML editor

Free toolbar icons are hard to find. Looking forward to receiving better icons from a designer friend soon.

Tektur: Integrated DITA XML Editor

November 13th, 2016

Confirm dialogs

Confirm dialogs are so important. Easy to overlook when designing the application. Maybe hard to implement later on. Here's a screenshot of the confirm dialog in the editor view.

Tektur: Confirm dialog in the (early) editor view

November 6th, 2016

Topics Importer

I have been working on the Topics Importer. There will be two options when importing data:

For now, there is only the second option available. The corresponding popup dialog can be selected from the top menu.

Tektur: Drag 'n Drop of topics to be imported

A ZIP to be imported must contain one XML file (the DITA topic) and may also contain multiple resource files. Zips can be uploaded by dragging them onto the marked area. (Implemented with the excellent Dropzone JS library)

Once the files are uploaded, the server will check for corrupted or invalid files and present the user with a list.

The user may correct the import by dragging files onto the upload area again or may cancel the upload. When clicking the import button the content of valid ZIPs is populated to the database.

Here's a sketch of this process:

Tektur: GUI Importer control flow
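
The rule that a ZIP must contain one XML file plus optional resources can be sketched as a check on the list of entry names. Reading the archive and detecting corrupted files are left out; the function name `validateImport` and the result shape are assumptions for the example.

```javascript
// Validate the entry names of one uploaded ZIP.
function validateImport(entryNames) {
  // Directory entries end with a slash and carry no content.
  const files = entryNames.filter((name) => !name.endsWith('/'));
  const xmlFiles = files.filter((name) => /\.xml$/i.test(name));

  if (xmlFiles.length !== 1) {
    return {
      valid: false,
      reason: `expected exactly one XML file, found ${xmlFiles.length}`,
    };
  }
  return {
    valid: true,
    topic: xmlFiles[0],
    resources: files.filter((name) => name !== xmlFiles[0]),
  };
}

console.log(validateImport(['topic.xml', 'img/', 'img/a.png']));
console.log(validateImport(['a.png']));
```

The `reason` string is the kind of information that could feed the list presented to the user after the upload.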

Note that all connected browsers will be updated with the changes in real-time using Websockets and NodeJS.

September 13th, 2016

More Mockups

Making mockups for popups...

TekturCMS: Popup Dialog Mockups

There is definitely some need for more colors. How about "Raw Sienna" and "Aquamarine" for the highlighting stuff?

TekturCMS: "Raw Sienna" Color Scheme
TekturCMS: "Aquamarine" Color Scheme

September 6th, 2016

Tektur Design

Since I liked the paper-and-pencil style of the ARC-Space game (see below) I will be reusing this design - together with the Courier New font family - for a simple and clean "Typewriter" look-and-feel. Just started with two dialogs:

TekturCMS: Layouter Dialog 1
TekturCMS: Layouter Dialog 2

September 5th, 2016

Tektur Tech Stack

Here's a first sketch of the architecture. Django will stay but only for the administrative backend and XSLT transformations (Django and Python evidently do a good job in both areas).

TekturCMS: Architecture

August 16th, 2016

First JSON response

Basic concept: DITA topics are stored in MongoDB. Every topic is implemented by an item in a NoSQL collection. Every user maintains his own list of items that will be merged with other lists if users collaborate.

This is the JSON response for adding a test-topic:

{
  "state": "success",
  "data": {
    "_id": "57b7c545bf08c4d40f8dd008",
    "username": "alex",
    "dateIssued": "2016-08-20T02:49:41.747Z",
    "logged_in": true,
    "test1": "10000",
    "test2": "0",
    "test3": "10",
    "test4": "0",
    "test5": "33",
    "token": "aa32cd53-e762-4c4c-9da9-8ea6f16de1a7",
    "usernames": [
      "h",
      "alex",
      "sepp"
    ],
    "application": {
      "name": "Tektur",
      "debugmode": true,
      "tracker_id": "55a8cd3e700a9d6823742074",
      "configuration": {}
    },
    "dateUpdated": "2016-08-20T02:50:20.981Z",
    "topics": [
      {
        "element_id": "c399e3c3-4361-4b76-b6e9-52bd0de89fb2",
        "title": "New Topic",
        "body": "<p id=\"0cf33019-31db-4c06-a525-5261589e01c6\">New Para</p>"
      },
      {
        "element_id": "7c466009-f742-4305-a6f3-f8248b999f2d",
        "title": "New Topic",
        "body": "<p id=\"73260170-c659-4813-a1f9-7f11c008589c\">New Para</p>"
      },
      {
        "element_id": "b5e7f9ac-4bae-4ec8-8bcb-88b9a0a2e250",
        "title": "New Topic",
        "body": "<p id=\"2de8a6c1-3909-468d-8627-3f463d7d24dd\">New Para</p>"
      }
    ]
  },
  "message": "Successfully added DITA test topic!"
}

I will add more useful data to this skeleton within the next few weeks... developing the system bottom-up.

August 13th, 2016

Tektur KickOff

During the last 3 years I have learnt a lot about what's important and what's not for a new kid in the XML CMS town. Requirements:

More or less all of these points have already been evaluated. Now it's time to put all the pieces together.

Some parts of the UWE stack will stay whereas others will be dropped:

Actually I feel a bit sad, because there won't be a single line of Python in the code any more. Python is still my favourite programming language, but obviously it's much better to build the stack on one specific technology than on a mixture of programming languages.

Since this will be my leisure time project, do not expect any production ready results before 2020.

August 30th, 2015

ARC-Space Game Logic

The basic game logic of ARC-Space is as follows:
                        
1.) CHARACTERS and ITEMS
                        
CHARACTER classes may have certain ABILITIES in order 
to perform certain ACTIVITIES on available ITEM CLASSES or
on available other CHARACTER classes in a certain
EVENT... and may gain experience and money from that.
                        
The goal of the game is to maximize experience and wealth.
(Just like in real life - what a waste of time ;)
                       
Let's look at some random names:
                        
                    CHARACTERS
                        |_THIEVES
                            |_[...]
                        |_FIGHTERS
                            |_[...]
                        |_CLERICS
                            |_[...]
                        |_EXPLORERS
                            |_[...]
                        |_SCIENTISTS
                            |_[...]
                        |_CRUSADERS
                            |_[...]
                        |_TRADERS
                            |_[...]
                        |_SLAVETRADERS
                            |_[...]
                        |_ALIENS
                            |_[...]
                    ITEMS
                        |_SHIPS
                            |_VESSELS
                                |_[...]
                            |_EXPLORERS
                                |_[...]
                            |_TRANSPORTERS
                                |_[...]
                            |_BATTLESHIPS
                                |_[...]
                            |_[...]
                        |_CABINS
                            |_BRIDGE
                                |_[...]
                            |_COMPUTER
                                |_[...]
                            |_PROPULSION
                                |_[...]
                            |_WEAPONS
                                |_[...]
                            |_SHIELDS
                                |_[...]
                            |_WORKSHOPS
                                |_[...]
                            |_RADARS
                                |_[...]
                            |_CAMOUFLAGES
                                |_[...]
                            |_PRISONS
                                |_[...]
                            |_LODGINGS
                                |_[...]
                            |_TANKS
                                |_[...]
                            |_STORAGES
                                |_[...]
                            |_LABORS
                                |_[...]
                            |_[...]
                        |_COMMODITY
                            |_AGRICULTURAL
                                |_[...]
                            |_LIVESTOCK
                                |_[...]
                            |_ENERGY
                                |_[...]
                            |_METALS
                                |_[...]
                            |_PRECIOUS
                                |_[...]
                            |_[...]
                    EVENTS
                        |_DISCOVERY
                            |_[...]
                        |_BATTLE
                            |_[...]
                        |_EMERGENCY
                            |_[...]
                        |_TRADE
                            |_[...]
                        |_EXPLORATION
                            |_[...]
                        |_SHOP
                            |_[...]
                        |_HANGAR
                            |_[...]
                        |_[...]
                        
                        
So here's a more specific example of the abstract view given in 1.)
                        
2.) ABILITIES:
                        
a) A TRADER is able to buy a TRANSPORTER when he/she reaches a SHOP 
b) A TRADER is able to equip the TRANSPORTER with a STORAGE CABIN 
   when he/she reaches a HANGAR EVENT
c) A TRADER is able to trade some COMMODITY ITEMS 
   when he/she has a TRANSPORTER SHIP with a STORAGE CABIN 
   and reaches a TRADE EVENT
                       
[...]
                        
For a simple configuration of our ARC-SPACE! game we could use the 
following JSON structure:
                        
3.) CONDITIONS:
                        
                        ARCSpace = { "characters" : [
                        		{ "id" : 1,
                        		  "name" : "character",
                                          "type" : "trader",
                                          "abilities" : [
                                              { "id" : 1,
                                                "name" : "buy_transporter",
                                                "conditions" : [
                                                    { "id" : 1,
                                                      "name" : "has_enough_money" } ] },
                                              { "id" : 2,
                                                "name" : "equip_with_storage_cabin",
                                                "conditions" : [
                                                    { "id" : 1,
                                                      "name" : "has_a_transporter" } ] },
                                              { "id" : 3,
                                                "name" : "trade_commodity",
                                                "conditions" : [ 
                                                    { "id" : 1,
                                                      "name" : "has_a_storage_cabin"},
                                                    { "id" : 2,
                                                      "name" : "has_goods_to_trade" }] } },
                                               [...]
                        
                                        ],
                                        "test_players" : [
                                            { "id" : 1,
                                              "name" : "alexdd",
                                              "money" : 1000000,
                                              "experience" : 30,
                                              "character" : "trader",
                        
                                            [...]
                                    }, [...]
                               ],
                               "events" : [
                                   { "id" : 1,
                                     "name" : "trade",
                                     "characters" : ["trader", "thieves", "slavetraders"],
                                     "probability" : 5,
                                      [...]
                                   },
                                   {"id" : 2,
                                    "name" : "shop",
                                    "characters" : [ "fighter","trader","cleric",
                                                   "explorer","scientist"],
                                     "probability" : 10,
                                    [...]
                                   },
                                   {"id" : 3,
                                    "name" : "hangar",
                                    "characters" : [ "#all" ],
                                    "probability" : some formula
                                    [...]
                                   }],
                        
4.) EVENTS
                        
a) EVENTS will be randomly triggered with some constant factors on the player's 
   state machine.
b) Only the characters set in the list will be active; other characters may not 
   get the event.
c) Only the abilities set in the character's ability list will be available, 
   that means:
d) items and characters that cannot interact with the player in a certain event 
   will be invisible.
                        
==============
                        
As you can see the whole ARC-Space! world can be encoded in one big chunk
of JSON which will be put into a MongoDB for maximum processing speed.                   
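
The rules from 3.) and 4.) can be tried out with a few lines of Python. This is only a sketch under assumptions: the config is a trimmed-down copy of the structure above with conditions flattened to plain names, "probability" is read as a relative weight, the weight of the "hangar" event is a placeholder (the original only says "some formula"), and the function names are made up for this example.

```python
import random

CONFIG = {
    "characters": [
        {"type": "trader",
         "abilities": [
             {"name": "buy_transporter",
              "conditions": ["has_enough_money"]},
             {"name": "equip_with_storage_cabin",
              "conditions": ["has_a_transporter"]},
             {"name": "trade_commodity",
              "conditions": ["has_a_storage_cabin", "has_goods_to_trade"]},
         ]},
    ],
    "events": [
        {"name": "trade", "probability": 5,
         "characters": ["trader", "thieves", "slavetraders"]},
        {"name": "shop", "probability": 10,
         "characters": ["fighter", "trader", "cleric", "explorer", "scientist"]},
        # placeholder weight, the original leaves this as "some formula"
        {"name": "hangar", "probability": 1, "characters": ["#all"]},
    ],
}


def events_for(character_type, config):
    """Rule 4b: only events that list the character (or "#all") are active."""
    return [e for e in config["events"]
            if "#all" in e["characters"] or character_type in e["characters"]]


def pick_event(character_type, config, rng=random):
    """Rule 4a: draw one active event, weighted by its "probability"."""
    candidates = events_for(character_type, config)
    if not candidates:
        return None
    weights = [e["probability"] for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def available_abilities(character_type, facts, config):
    """Rule 4c: abilities of the character whose conditions all hold."""
    for character in config["characters"]:
        if character["type"] == character_type:
            return [a["name"] for a in character["abilities"]
                    if all(c in facts for c in a["conditions"])]
    return []
```

For example, `available_abilities("trader", {"has_enough_money"}, CONFIG)` yields only `buy_transporter`, because the trader owns neither a transporter nor a storage cabin yet.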
                    

August 20th, 2015

Some more screenshots of ARC-Space!

ARC-Space will feature "paper-and-pen" style

ARC-Space: Login screen
ARC-Space: Universe screen
ARC-Space: Hangar screen
ARC-Space: System screen
ARC-Space: Shop screen
ARC-Space: Battle screen

May 8th, 2015

Started with a Space Game

Just started with developing an HTML5 adventure game. The user interface will be rather minimalistic, concentrating on the game play.

Design: Before there was "green-on-black" and a starfield animation. I removed it for simplicity, check demo here. In fact I like the ARC-Space! ASCII animation which is in the battle screen a lot.

August 17th, 2014

Rapid Algorithm Design with Python

About 10 years ago I competed in the Shortest Python Programming Contest. With 151 bytes of code for a simple program I ranked in the midfield and I was quite happy :-]

See the resulting program:

# -*- coding: iso8859-1 -*-
# This was the first solution that came to my mind,
# and it put me on the right track straight away...
                       
def seven_seg1(input):
    data = ( ' _ ','  ',' _ ',' _ ','   ',' _ ',' _ ',' _ ',' _ ',' _ ', 
             '| |',' |',' _|',' _|','|_|','|_ ','|_ ','  |','|_|','|_|', 
             '|_|',' |','|_ ',' _|','  |',' _|','|_|','  |','|_|',' _|') 
    return '\n'.join([''.join([data[int(c)+j*10] for c in input]) for j in range(3)])
                        
# Now it was time to optimize. The next version is only an intermediate step
# to work out the correct array indices for the version after it:
                        
def seven_seg2(input):
    return '\n'.join([''.join([('010010000034556774666475456465')[int(c)+j*10] \
       for c in input]) for j in range(3)]) \
      .replace('0', ' _ ').replace('1','   ').replace('3','| |').replace('4','  |')\
      .replace('5',' _|').replace('6','|_|').replace('7','|_ ')
                        
# By now things already look pretty short, and the following versions
# each gained only a few more characters...
                        
def seven_seg3(input):
    ascii = (' _ ','   ','| |','  |',' _|','|_|','|_ ')
    return '\n'.join([''.join([ ascii[int(('010010000023445663555364345354')\
      [int(c)+j*10])] for c in input]) for j in range(3)])
                        
def seven_seg4(i):return '\n'.join([''.join([' _ +   +| |+  |+ _|+|_|+|_ '
    .split('+')[int('010010000023445663555364345354'[int(c)+j*10])]\
    for c in i]) for j in 0,1,2])
                        
def seven_seg5(i):return '\n'.join([''.join([' _    | |  | _||_||_ '\
    [int('010010000023445663555364345354'[int(c)+j*10])*3:][:3] \
    for c in i]) for j in 0,1,2])
                        
# At this point a comment by another participant in the newsgroup helped me:
# he said he had three "magic numbers". That is how I finally arrived at
# my contest entry, which I then submitted...
                        
def seven_seg6(i):x=''.join;return x([x(['    _ | |  | _||_||_ '
    [int(str(m)[int(c)])*3:][:3]for c in i])+'\n'\
    for m in 1011011111,2344566355,5364345354])
                        
# And it really works! My trick was two nested "list comprehensions",
# plus a "generator expression" at the end to save a few more bytes.
                        
print seven_seg6("123456789")                      
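
The contest entry above is Python 2 and golfed for size. For readers who want to see the "magic numbers" idea without counting bytes, here is an ungolfed Python 3 sketch of the same lookup; the names `SEGMENTS`, `ROWS` and `seven_seg` are mine and not part of the contest code.

```python
# The seven distinct three-character cells a digit row can consist of.
SEGMENTS = ("   ", " _ ", "| |", "  |", " _|", "|_|", "|_ ")

# One "magic number" per output row: the d-th decimal digit of a number
# is the SEGMENTS index to print for input digit d in that row.
ROWS = ("1011011111", "2344566355", "5364345354")


def seven_seg(digits):
    """Render a string of decimal digits as three lines of ASCII segments."""
    return "".join(
        "".join(SEGMENTS[int(row[int(d)])] for d in digits) + "\n"
        for row in ROWS
    )


print(seven_seg("123456789"))
```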
                    

Since then I have been using Python as my favourite tool for trying out ideas on algorithm design.

Consider the following CALS table:

              
<TABLE TOCENTRY="1">
    <TITLE>Table 1</TITLE>
    <TGROUP ALIGN="LEFT" CHAR="" CHAROFF="50" COLS="3">
        <COLSPEC COLNAME="COL1">
        <COLSPEC COLNAME="COL2">
        <COLSPEC COLNAME="COL3">
        <TBODY VALIGN="TOP">
            <ROW>
                <ENTRY MOREROWS="2">R1C1 - R2C1 - R3C1</ENTRY>
                <ENTRY>R1C2</ENTRY>
                <ENTRY>R1C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY MOREROWS="1">R2C2 - R3C2</ENTRY>
                <ENTRY MOREROWS="1">R2C3 - R3C3</ENTRY>
            </ROW>
            <ROW>
               <ENTRY></ENTRY> <!-- broken row -->
            </ROW>
            <ROW>
                <ENTRY MOREROWS="1">R4C1 - R5C1</ENTRY>
                <ENTRY>R4C2</ENTRY>
                <ENTRY>R4C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY MOREROWS="1">R5C2 - R6C2</ENTRY>
                <ENTRY MOREROWS="1">R5C3 - R6C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY>R6C1</ENTRY>
            </ROW>
            <ROW>
                <ENTRY MOREROWS="1">R7C1 - R8.C1</ENTRY>
                <ENTRY MOREROWS="2">R7C2 - R8C2 - R9C2</ENTRY>
                <ENTRY MOREROWS="1">R7C3 - R8C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY></ENTRY>  <!-- broken row -->
            </ROW>
            <ROW>
                <ENTRY>R9C1</ENTRY>
                <ENTRY>R9C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY MOREROWS="1">R10.C1 - R11C1</ENTRY>
                <ENTRY MOREROWS="1">R10C2 - R11C2</ENTRY>
                <ENTRY>R10C3</ENTRY>
            </ROW>
            <ROW>
                <ENTRY>R11C3</ENTRY>
            </ROW>
        </TBODY>
        </TGROUP>
  </TABLE>
                    

According to the CALS specification (https://www.oasis-open.org/specs/tm9901.html) it may suffer from erroneous markup constellations, which may be:

Erroneous markup constellations

In this particular case the rows marked with <!-- broken row --> should be removed and the @morerows attributes on preceding <entry> elements should be corrected. This is not a trivial task since you have to take vertical spanning (@morerows) and horizontal spanning (@namest, @nameend - not in the example above) into account...

So let's start with a quick hack in Python. For sure we will need some helper data structures such as lists and dictionaries. Also the regex module should be included since we will perform some textual substitutions, I guess.

import re

input_file = open('broken.sgml','r')
tag_ended = False
buff = ""
pos = 0
tags = []
row_counter = 0
entry_counter = 0
table_counter = 0
num_cols = 0
more_rows = []
entries = []
colspecs = []
broken_tags = []
tmp_row = []
content = ""
tree = {}
log = ""
                

Next we need to read the data file, which is an SGML file. Well, there is the fine SGMLParser class in the sgmllib library... Hmmm, however, all we need to do is tokenize a text file according to the tags in it and save the tags for later use. So let us not make things more complicated than they are:

## read data

while 1:
  character = input_file.read(1)
  if not character:
    input_file.close()
    break
  if character == '>':
    tag_ended = True
  elif character == '<':
    tags.append((pos,buff,content))
    pos+=1
    buff = ""
    content = ""
    tag_ended = False
  elif not tag_ended:
    buff+=character
  else:
    content+=character
tags.append((pos,buff,content))                      

Having our tags saved in the tags list, we can now loop over it and search for the broken rows... which is really not so trivial, since we need to check each tag for consistency of its attribute settings with respect to the table geometry.

## analyze data

for tag in tags:
  if tag[1].lower().startswith("table"):
    row_counter = 0
    ncols = 0
    more_rows = []
    table_counter += 1
    colspecs = []
  elif tag[1].lower().startswith("tgroup"):
    try:
      ncols = int([attr.split("=") for attr in tag[1].split(" ") \
              if attr.lower().startswith("cols")][0][1].strip('"/'))
    except:
      log+= "\n"
    for i in range(ncols):
      more_rows.append(0)
  elif tag[1].lower().startswith("colspec"):
    try:
      colname = [attr.split("=") for attr in tag[1].split(" ") \
              if attr.lower().startswith("colname")][0][1].strip('"/')
    except:
      log +="\n"
    colspecs.append(colname)
  elif tag[1].lower().startswith("row"):
    row_counter += 1
    entry_counter = 0
    entries = []
    tmp_row = []
    tmp_row.append(tag)
  elif tag[1].lower().startswith("/entry"):
    tmp_row.append(tag)
  elif tag[1].lower().startswith("entry"):
    morerows=0
    namest=None
    nameend=None
    for attr in tag[1].split(" "):
      if attr.lower().startswith("morerows"):
        morerows = int(attr.split("=")[1].strip('"/'))
      elif attr.lower().startswith("namest"):
        namest = attr.split("=")[1].strip('"/')
      elif attr.lower().startswith("nameend"):
        nameend = attr.split("=")[1].strip('"/')
    if namest and nameend:
      pass
    elif namest or nameend:
      log+= "Table %d: entry has only one of namest/nameend\n" % table_counter
    entries.append({ "morerows" : morerows, "namest" : namest, "nameend" : nameend})
    tmp_row.append(tag)
  elif tag[1].lower().startswith("/row"):
    resolved_entries = []
    for entry in entries:
      if entry["namest"] and entry["nameend"]:
        spanning = 0
        try:
          start = colspecs.index(entry["namest"])
          end = colspecs.index(entry["nameend"])
          spanning = abs(end-start)
        except:
          log += "Table %d: namest/nameend does not match any colspec\n" % table_counter
        for i in range(spanning):
          resolved_entries.append(entry["morerows"])
      resolved_entries.append(entry["morerows"])
    ncells = len(resolved_entries) 
    nspans = len([num for num in more_rows if num > 0])
    added = 0
    if ncells + nspans > ncols: # this is the culprit
      if len(tmp_row) in (2,3) and tmp_row[1][2].strip(" \n\r") == "":
        broken_tags.append(tmp_row)
        added = 1
      log+="Table %d, row %d: too many cells\n" % (table_counter, row_counter)
    i = 0
    for j in range(ncols):
      if more_rows[j] > 0:
        more_rows[j] -= 1
      elif i < ncells:
        tmp = resolved_entries[i]
        if tmp > 0:
          more_rows[j] = tmp
        i+=1
      else:
        log+= "Table %d, row %d: too few cells\n" % (table_counter, row_counter)
        if len(tmp_row) in (2,3) and tmp_row[1][2].strip(" \n\r") == "":
          broken_tags.append(tmp_row)
        break
  elif tag[1].lower().startswith("/table"):
    spans = [num for num in more_rows if num > 0]    
    if len(spans) > 0:
      log+= "Table %d: unresolved row span at end of table\n" % table_counter
  tree[str(tag[0])] = [tag[1],tag[2]]		    
		    

Well, I must say having the source code now copied to the website it looks way more complex than it is ;-] For sure, there may be some optimizations and with some more Python wizardry the code would look much shorter and cleaner, I guess. As you can see with the last line, we saved all tags in a dictionary with the tag number as key and the tuple (name, content) as value. This is the data structure we will need for fixing the corrupt table:

## fix data
 
for tag in reversed(broken_tags):
  del tree[str(tag[0][0])] # delete the culprit
  i = 1
  while not "/row" in tree[str(tag[0][0]+i)][0].lower():
    del tree[str(tag[0][0]+i)]
    i+=1
  del tree[str(tag[0][0]+i)]
  num = tag[0][0]
  row_counter = 0
  while num > 1: # adjust preceding morerows settings go up until top of table
    num-=1
    elem = tree[str(num)]
    if elem[0].lower().startswith("row"):
      row_counter += 1
    elif elem[0].lower().startswith("entry"):
      try:
        morerows = int([attr.split("=") for attr in elem[0].split(" ") \
                  if attr.lower().startswith("morerows")][0][1].strip('"'))
      except:
        morerows = 0
      if row_counter <= morerows: #check if already enough rows traversed and
        new_morerows = morerows - 1 # there is no more span reaching
      else:
        new_morerows = morerows
      elem[0] = re.sub(r'MOREROWS=\"[0-9]*\"','MOREROWS=\"' \
                    +str(new_morerows)+'\"',elem[0], flags=re.I)
      tree[str(num)] = elem		    
		    

Okay, okay, this piece of code definitely needs some more explanation. Having identified the culprit rows (broken_tags) in the analyze phase and knowing their tag numbers, we can safely delete the culprits and their child entry elements from the result dictionary. We do this from the bottom of the table upwards. After deleting a row, we also fix the @morerows attributes on preceding entry elements that are in range of the deleted row; that is, we decrement them by one. Finally we output the saved SGML and some error messages:

print log
for key in sorted(tree.keys(),key=int):
  print "<"+tree[key][0]+">"+tree[key][1], 		    
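
The "delete a row, then shorten the spans that reached it" step is easier to see on a stripped-down model. The following Python 3 sketch is not the tablefixer code: it assumes every row is just a list of morerows values (one per entry), and the function name `delete_row` is made up for this example.

```python
def delete_row(rows, k):
    """Remove row k and shorten every vertical span from above that covered it.

    rows: list of rows, each a list of morerows values (one per entry).
    Returns a new list; the input is left untouched.
    """
    fixed = []
    for r, row in enumerate(rows):
        if r == k:
            continue  # drop the culprit row
        if r < k:
            # an entry in row r with morerows m covers rows r+1 .. r+m,
            # so it reaches the deleted row exactly when r + m >= k
            row = [m - 1 if r + m >= k else m for m in row]
        fixed.append(list(row))
    return fixed


# First three rows of the example table: the third row is the broken one.
print(delete_row([[2, 0, 0], [1, 1], [0]], 2))
```

On the first three rows of the example table this turns the spans 2, 1, 1 into 1, 0, 0, which matches the "decrement by one" rule described above.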
		    

Conclusion: For me Python is the perfect tool for trying some stuff out. You do not need to worry about infrastructure and code conventions. Just hack the problem down and be happy about quick and working results. No need for intensive testing and too much theory on paper :-] You can download the source code and test data from my GitHub account: https://github.com/alexdd/tablefixer/archive/master.zip

May 18th, 2014

How to structure a simple stylesheet pipeline with XSLT

Set up a folder for each processing step, each with a main entry point (main.xsl), like so:

XSLT: Folder and file structure of the XSLT pipeline

(Note that apart from "main.xsl" all other filenames are not a real world example.)

main.xsl will include the substep stylesheets in the same folder and will also be the interface to the controller.

Each main.xsl stylesheet will take exactly 3 arguments - no more no less - which are:

Ideally input data will be merged into one big XML chunk for easier processing.

The same goes for config chunks - since lookup of a specific config option could be done by checking against a unique identifier, I guess.

The controller (in Python of course) will execute each step, hand out intermediate results to other processes and print some fancy status messages ;-]
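
A controller along these lines might look like the sketch below. It makes several assumptions that are not part of the setup described above: step folders are executed in sorted name order, `transform` is a hypothetical callable wrapping whatever XSLT processor is used, and only the input data and the config are passed along (the actual three arguments of main.xsl are not modelled here).

```python
import os


def discover_steps(pipeline_dir):
    """Return the main.xsl path of every step folder, in sorted folder order."""
    steps = []
    for name in sorted(os.listdir(pipeline_dir)):
        main = os.path.join(pipeline_dir, name, "main.xsl")
        if os.path.isfile(main):  # folders without main.xsl are not steps
            steps.append(main)
    return steps


def run_pipeline(pipeline_dir, data, config, transform):
    """Feed `data` through every step; each result is the next step's input."""
    steps = discover_steps(pipeline_dir)
    for number, stylesheet in enumerate(steps, start=1):
        print("[step %d/%d] %s" % (number, len(steps), stylesheet))
        data = transform(stylesheet, data, config)
    return data
```

Keeping the XSLT processor behind a single callable means the controller itself can be tested with a dummy `transform` before any stylesheet exists.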

There is also a more professional approach using XProc.

April 5th, 2014

Perfect XSL Stylesheet Specification

Some philosophical thoughts on stylesheet development:

There won't be any turn-around-times resulting from comparing test documents with the specification document back-and-forth, because the specification document and the test document will be the same :-]

April 5th, 2014

PushBox on Github

Just put the Java source code of PushBox on Github. This J2ME game scored more than 10000 downloads in 2005, was featured on many sites, was free to play and fun to make :-] Compared to Flappy Bird it was neither addictive nor could I make any money out of it... maybe someone would like to port its fine tile engine to HTML5?

PushBox: Start Screen
PushBox: Intro Screen
PushBox: Ingame Screen
PushBox: Win Screen

January 26th, 2014

Handling many parameter options in XSLT

So called "stylesheet parameterization" is an obnoxious task. Basically it boils down to refactoring of conditional statements:

<xsl:choose>
    <xsl:when test="$myParameter='this_option'">
        <!-- do this -->
    </xsl:when>
    <xsl:when test="$myParameter='that_option'">
        <!-- do that -->
    </xsl:when>
        [...]
</xsl:choose>
                    

You can have these statements in so many places and in so many variations! Maintainability will suffer due to functionality switches being spread across the code base. In my opinion the best way to prevent this circumstance is by using XSLT's import precedence.

Say we have a "core" stylesheet which works for one option and has been intensively tested, then we can import this stylesheet and override certain templates in our "sub" stylesheets - just like OOP.

For example say that we have a template for rendering the header in our core (super) stylesheet:

                   
<xsl:template name="render-header">
    <!-- print logo on the left side spanning two rows-->
    <!-- print some metadata right side first row -->
    <!-- print a running header right side second row -->
</xsl:template>
                    

In our sub stylesheet we will then declare another rule for our header template (because here we want to have the header elements in a different order and the metadata section being removed).

<xsl:template name="render-header">
    <!-- print a running header on left side -->
    <!-- print logo on right side -->
</xsl:template>
                    

You will get best results with this approach when your stylesheet structure is split into a fine grained and heavily cohesive set of XSLT templates. If you investigate UWE's XSLT stylesheet then you will certainly stumble across the following construct:

<xsl:variable name="margin-width">
    <xsl:call-template name="get-margin-width"/>
</xsl:variable>
                    

Since XSLT variables cannot be overridden but are a convenient and indispensable way of assigning a variable value to an attribute inline (like so)

width="{$margin-width}"
                    

the construct above allows us to override the template "get-margin-width" in a sub stylesheet while the variable "margin-width" is still set in the super stylesheet :-]
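
Since the comparison with OOP has come up: the same construct written as a Python analogy might look like this. It is only an illustration of the override mechanism, not a translation of UWE's stylesheet; the class names and the widths "20mm" and "15mm" are made up.

```python
class CoreStylesheet:
    """Plays the role of the tested core (super) stylesheet."""

    def get_margin_width(self):
        # counterpart of the named template "get-margin-width"
        return "20mm"

    def render_header(self):
        # counterpart of the named template "render-header"
        return ["logo", "metadata", "running header"]

    def render_page(self):
        # counterpart of <xsl:variable name="margin-width"> wrapping the
        # call-template: the value is looked up through an overridable hook
        margin_width = self.get_margin_width()
        return {"width": margin_width, "header": self.render_header()}


class SubStylesheet(CoreStylesheet):
    """Plays the role of the importing stylesheet that overrides templates."""

    def get_margin_width(self):
        return "15mm"

    def render_header(self):
        return ["running header", "logo"]


print(CoreStylesheet().render_page())
print(SubStylesheet().render_page())
```

`render_page` is never touched in the subclass, yet it picks up the overridden width, which is what the variable-plus-named-template construct achieves in XSLT.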

January 26th, 2014

Ways of XML processing

import random
 
input_file = open('confidential_document.xml','r')
 
UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
NUMBERS = '0123456789'
LOWER = 'abcdefghijklmnopqrstuvwxyz'
 
tag_ended = False
entity_ended = True
output_file = open('obfuscated_document.xml','w')
 
while True:
  character = input_file.read(1)
  if not character:
      input_file.close()
      break
  if character == '>':       
      tag_ended = True
  elif character == '<':
      tag_ended = False
  elif tag_ended:
      if character == '&':
          entity_ended = False
      elif character == ';':
          entity_ended = True
      elif entity_ended:
          if character in UPPER:
              output_file.write(random.choice(UPPER))
              continue
          elif character in LOWER:
              output_file.write(random.choice(LOWER))
              continue               
          elif character in NUMBERS:
              output_file.write(random.choice(NUMBERS))
              continue
  output_file.write(character)
output_file.close()
                    

What it actually does is render your XML PCDATA content unreadable in order to hide confidential information from third parties. Very useful if you want to give away XML test data or show off some PDFs from another project.

How it works should be rather self explanatory. The XML file is read character by character and treated like a text file. Well, there is definitely no faster way...

If you are facing requirements with a little bit more "business logic" involved then you may want to check out XML shallow parsers, see here http://www.cs.sfu.ca/~cameron/REX.html and for an implementation example you may go to the Python cookbook on activestate

December 6th, 2013

UWE, slotmachine, etc. on GitHub

Most prominently: UWE's PDF stylesheet: XSL stylesheets which generate good-looking, customizable PDF output (and RTF) from a single XML source using Apache FOP as PDF formatter. May also be used as a core library in your FOP stylesheet project. Most Asian languages are supported; advanced Back-of-the-Book index sorting routines are included, plus a translated boilerplate text library. GitHub repository | example PDF | example RTF (Word)

Since UWE's stylesheet was automatically generated by an unfinished Python program...

...this is really not a good example of how to structure a stylesheet!

GRADES XSL stylesheet that generates grade tables for the German high school's "Kollegstufe", using Apache FOP as PDF formatter. The "Kollegstufe" comprises the 12th and 13th year of high school. This demo XSLT stylesheet shows some important XSLT and XSL-FO features. See GitHub repository | PDF

SLOTMACHINE_PY If you have ever wondered how a State-of-the-Art slotmachine works, then this Python program is for you. It features a complete state machine, bonus scheme and special rules of a commercial slotmachine. Well actually I modified the rules in order to get a better payout rate :-] GitHub repository

ControlFlowCanvas_JS This is a <canvas> demo which I wrote for my university in 2007. You can interactively edit the edges of a control flow graph. GitHub repository. See Demo HTML page.

XMLDifferExperiment This code implements the diffing algo which I described on this post. See GitHub repository.

dtd2xml_PY generates usable random XML test data out of a given input DTD. Element occurrences, text parts, language and randomness can be configured in dtd2xml.ini. See GitHub repository.

November 22nd, 2013

BOTB index sorter unleashed

Let's get the index sorting right :-] Commercial products, like Antenna House i18n support library seem to be rather expensive (5000$ per server license).

In particular, when you take into account that language specific sort order should be in the public domain and can easily be retrieved from sites like unicode.org and Wikipedia.

Also Java's standard sorting routines are fully localized and may be used with Saxon's extension mechanism or by applying XSLT's collation attribute on the <xsl:sort> element.

Actually a pure XSLT solution to this problem would be quite fancy. But since implementing complex algorithms in plain XSLT can get dirty very quickly, you probably want to fall back to writing a Saxon extension. Do you?

Not for me :-] I am proud to come up with a pure and customizable XSLT solution. Let me show you how it works - first of all the basics:

Document Definition

We will let the editor put index entries in any place on the document flow, having the following structure for an index entry:

                       
    <index group-label="FOO">
        <first sort-term="bar">UWE</first>
        <second>Single-Source Publishing</second>
        <third>XML System</third>
    </index>
                    

When generating the Back-of-the-Book index, all index entries will be gathered, sorted and formatted with up to three levels, leaving sub-elements intact. This means that we can have our index entries formatted in different colors (for diffing purposes), with different fonts (for special characters or terms in a different language, e.g. English product names in a Chinese document) or with some terms emphasized in bold for easier recognition of important parts.

Well, it's not a complete solution yet, but important languages are included. So let's start with the requirements.

Requirements:

Implementation:

We need to distinguish between two main activities. First we have to assign a group title to each term, and then we sort this term within the group it belongs to. That's it for the sorting part, but not for the formatting part, which actually turned out to be pretty tricky, e.g. see my post on eliminating duplicate page numbers in FOP intermediate format.

Sorting:

For the sorting part we have found 5 cases: and two procedures:

Assigning group labels:

Assigning a group label will be different for each language variant:

Group label algorithm in detail:

For western languages we use a "zipper style" algorithm: on the "left side" we have our (sorted) group label to character mappings and on the "right side" we have our sorted list of terms.

If we traverse the sorted list of terms and compare each first letter with the group labels on the left side, incrementing both lists' indices in parallel, then we are able to set group labels in O(n+m) time.

For the sorting we can use Quicksort on the Unicode positions, which makes O(n+m) + O(n log n) in total for western languages (O(n log n) being Quicksort's average case). This looks pretty optimal.
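The zipper idea can be sketched in a few lines of Python. This is only an illustration and not UWE's XSLT implementation: the function name, the shape of the group table and the sample terms are assumptions made for the example. Each group is described by the first character it starts with, and a group covers every first letter up to the start of the next group.

```python
import string


def assign_group_labels(sorted_terms, sorted_groups):
    """Zipper-style labelling in O(n + m).

    sorted_terms  -- terms, already sorted by their upper-cased spelling
    sorted_groups -- (start_char, label) pairs sorted by start_char
                     (hypothetical layout, chosen for this sketch)
    """
    labelled = []
    g = 0  # index into the "left side", only ever moves forward
    for term in sorted_terms:
        first = term[:1].upper()
        # Advance the group index while the next group still starts
        # at or before the first letter of the current term.
        while g + 1 < len(sorted_groups) and sorted_groups[g + 1][0] <= first:
            g += 1
        labelled.append((sorted_groups[g][1], term))
    return labelled


# Hypothetical group table: a catch-all, one group for digits, one per letter.
GROUPS = [("", "Symbols"), ("0", "0-9")] + [(c, c) for c in string.ascii_uppercase]

terms = sorted(["bar", "Apple", "alpha", "3D", "Zebra"], key=str.upper)
labelled = assign_group_labels(terms, GROUPS)
```

Because neither index ever moves backwards, each term and each group is visited once. The simple table above puts every character between two group starts into the earlier group, so a real table would need extra entries for symbols that sort between digits and letters.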

As already mentioned above, for Mandarin Chinese we can find group labels in O(1) time by providing an array of the group label letters (actually one big string) with array index = Unicode position - offset. This makes sorting Mandarin Chinese even faster: O(n log n).

Chinese group labels can actually be retrieved from the unicode.org website. See the Unihan Database (http://www.unicode.org/Public/UNIDATA/Unihan.zip), in particular the file Unihan_Readings.txt.
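A minimal Python sketch of the lookup string, assuming the tab-separated line layout of Unihan_Readings.txt (code point, field name, value) and using the kMandarin field. The function names, the offset constant, the filler character and the four-entry sample are assumptions for illustration; a real table would cover the whole CJK block.

```python
import unicodedata

CJK_OFFSET = 0x4E00  # first code point of the CJK Unified Ideographs block


def build_label_string(unihan_lines, offset=CJK_OFFSET, size=4, filler="#"):
    """Build one big string so that labels[code_point - offset] is the group label."""
    labels = [filler] * size
    for line in unihan_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        code, field, value = line.rstrip("\n").split("\t", 2)
        if field != "kMandarin":
            continue
        index = int(code[2:], 16) - offset  # "U+4E00" -> 0
        if 0 <= index < size:
            # Decompose so that a tone mark never ends up in the label.
            first = unicodedata.normalize("NFD", value)[0]
            labels[index] = first.upper()
    return "".join(labels)


def group_label(term, labels, offset=CJK_OFFSET, fallback="#"):
    """O(1) lookup: index = Unicode position - offset."""
    index = ord(term[0]) - offset
    if 0 <= index < len(labels):
        return labels[index]
    return fallback


# Illustrative sample in the Unihan_Readings.txt layout.
SAMPLE = [
    "# comment line",
    "U+4E00\tkMandarin\ty\u012b",
    "U+4E01\tkMandarin\td\u012bng",
    "U+4E03\tkMandarin\tq\u012b",
    "U+4E00\tkCantonese\tjat1",
]
LABELS = build_label_string(SAMPLE, size=4)
```

Code points without a reading keep the filler character, so the string index always stays aligned with the Unicode position.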

For all other languages we set group labels according to the procedures described above, also providing specific collation rules, because we want to be able to sort special characters, numbers and symbols in a different order. With this approach we will fulfill all requirements a technical editor may come up with in the future :-]

Sorting within groups

Sorting will be achieved using Java standard functionality. Java implements the Unicode Collation algorithm (http://unicode.org/reports/tr10/tr10-12.html), which means that we can feed our algorithm with custom collation rules and get a custom sort order by defining these collation rules.

A previous version of this program was realised using a Saxon9 extension and plain Java. The class RuleBasedCollator (http://java.sun.com/j2se/1.4.2/docs/api/java/text/RuleBasedCollator.html) was used for this purpose. Now I'll be using the <xsl:sort> element with the @lang and @collation attributes set. Basically, sorting will be performed with Quicksort on the Unicode positions, with some exceptions kept in mind.
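To illustrate what a custom sort order means in practice, here is a small Python stand-in. It is not Java's RuleBasedCollator and not the <xsl:sort> setup described above; it only ranks characters by their position in an alphabet string, which is a much simpler technique with a single comparison level. The function name and the sample words are assumptions.

```python
import unicodedata


def make_sort_key(alphabet):
    """Return a key function that orders terms by a custom alphabet."""
    alphabet = unicodedata.normalize("NFC", alphabet)
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)  # characters outside the alphabet sort last

    def key(term):
        folded = unicodedata.normalize("NFC", term).lower()
        # The character itself breaks ties between unknown characters.
        return [(rank.get(ch, unknown), ch) for ch in folded]

    return key


# Swedish places its three extra letters after z, in this order.
SWEDISH = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

words = ["\u00f6l", "zebra", "\u00e4rta", "apa", "\u00e5l"]
by_alphabet = sorted(words, key=make_sort_key(SWEDISH))
by_code_point = sorted(words)
```

Sorting by plain Unicode position would put the word starting with U+00E4 before the one starting with U+00E5, whereas the custom alphabet reverses them. That difference is the kind of exception collation rules are meant to express.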

Formatting

This is actually the tricky part. Whereas sorting can be achieved with standard algorithms and standard tools, formatting requires some deep XSL-FO and XSLT know-how.

We can distinguish between two problems:

For the first part we can apply the following algorithm:

For all sorted entries:

This sounds trivial. Unfortunately we can get into trouble when printing page numbers, because as clearly shown in the for loop above we do not print some entries, and therefore we also do not print the links and page numbers which refer to text paragraphs. This is a big problem!

To solve this we put a fingerprint onto each index entry in order to find those "hidden" links later, and pass the complete list of sorted entries as a parameter into the for loop above.

For each entry that will not be printed (because it is a duplicate) we save its fingerprint and loop over the provided sorted entries structure to set a link and page number to the text paragraph it refers to (by comparing fingerprints). Now we have an O(n^2) performance only because of page numbers...
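The fingerprint idea can be sketched in Python as follows. This is an illustration under assumptions, not the stylesheet itself: the entry layout (a tuple of level texts plus a page number), the SHA-1 fingerprint and the function names are made up for the example.

```python
import hashlib


def fingerprint(levels):
    """Identify an index entry by the text of its (up to three) levels."""
    joined = "\u001f".join(levels)  # unit separator keeps the levels apart
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def collect_page_refs(sorted_entries):
    """sorted_entries: list of (levels, page) tuples, already sorted.

    Returns one (levels, pages) item per printed entry, where pages also
    contains the page numbers of the duplicates that are not printed.
    """
    printed = []
    previous = None
    for levels, _page in sorted_entries:
        fp = fingerprint(levels)
        if fp == previous:
            continue  # duplicate: the entry text is not printed again
        previous = fp
        # Inner loop over the complete list: this is the O(n^2) part.
        pages = [page for other, page in sorted_entries
                 if fingerprint(other) == fp]
        printed.append((levels, pages))
    return printed


entries = [
    (("UWE", "PDF"), 3),
    (("UWE", "PDF"), 7),
    (("XSLT",), 7),
]
index = collect_page_refs(entries)
```

In Python a dictionary keyed by fingerprint would avoid the inner loop. The nested form is kept here because it mirrors the procedure described above.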

And the fun continues when it comes to eliminating duplicate page numbers :-]

Conclusion

As you can see, you can get a very good looking BOTB index by applying trivial algorithms, lookup tables provided by unicode.org and sort orders powered by Wikipedia. No need for a 5000$ commercial library. Within the next post I will release UWE's stylesheet and DTD into the public domain. Stay tuned :-]

June 29th, 2013

UWE Page-Flip Booklet

Creating animated page-flip books using Flash has been around for years. HTML5 can do the same now. You may check out the turnjs library, which turns any set of HTML pages into a page-flip animation. A more advanced approach is combining the pdfjs library and turnjs in order to convert a PDF into a page-flip booklet. This approach has been under development and a sample can be found on this page. Unfortunately the turnjs library is not free for my purpose, so I am going to stick with the little jQuery plugin booklet, putting PNGs onto each page, and voila, here's the result. I added this as a special feature to the newly added PNG output format and renamed it to PFLIP format :-]

Try it out.

June 22nd, 2013

PNG thumbnail format

Just added a new format which was pretty cheap to implement, since the transformation step basically interfaces with ImageMagick in order to convert a PDF into multiple PNG images (one image per page), provided as a ZIP archive.

UWE: Select the output format - New PNG format

The size of the resulting images can be customized in the settings dialog.

Use cases:

June 22nd, 2013

Co-editors cont'd

Assign co-editors to chapters by selecting a username on each chapter title bar. The selected user will then be notified with an UWE message.

UWE: Co-editor assignment

June 16th, 2013

Co-Editors

Started working on a co-editor feature: You can add and remove other users to/from your list of co-editors. Co-editors may edit certain chapters of your documents but may not change the overall document structure. Co-editor updates will be handled in real-time using NodeJS. Here's a screenshot of the settings dialog:

UWE: Co-Editor Feature

April 6th, 2013

Settings and more settings

Since PDF A4 will be the leading output format, I added a few more parameters to the PDF A4 converter. In particular, we can set page margins and the width of the margin column via the web interface without touching any XSLT stylesheets.

Also font sizes of headlines and floating text can be dynamically set in the option settings dialog.

UWE: Main settings
UWE: document structure settings
UWE: PDF settings

March 20th, 2013

Text variables in UWE

Globally defined text variables help you keep important information up-to-date.

UWE: Define text variables in settings dialog
UWE: Insert a text variable into the document using the placeholder dialog
UWE: Textvariables are marked just like other placeholder shortcuts in the editor view
UWE: ...and will be replaced in all output formats

March 12th, 2013

Started Working towards v0.9

Included some more boilerplate images today, mainly safety signs which you can get royalty-free from Wikipedia, e.g. here http://de.wikipedia.org/wiki/Warnzeichen.

UWE: Insertion of boilerplate images

March 9th, 2013

UWE v0.8 released

Check out UWE's new version here http://www.uwe-editor.de/login/ and log in with user:test and password:test or create a new account. Do not be shy :-] You may also read the fine manual (German).

March 9th, 2013

Finishing touches on UWE's groupware tools

Just integrated a "Messenger" entry in the menu bar that shows the messenger panel and all of the chat messages and comments in which the editor's username has been marked. By clicking on a specific message, the related document and specific comment/message will be loaded in a new browser tab.

UWE: Messenger Panel

Also got fancy with tooltips and put them in every reasonable place, e.g. when mouse hovering over a document link in a comment:

UWE: Metadata Tooltip in comments

In the documents list view:

UWE: Metadata infos in document list

February 24th, 2013

UWE's real-time commenting service

Here's a screenshot of UWE's new real-time commenting service. You can put comments on each paragraph. Your fellow peer editors will see your comments immediately without the need to refresh the web page.

UWE: Real-time comments
UWE's HTML Preview format bundles some collaborative features:

Of course all of these features are optional and may be turned on and off in the menu bar.

February 17th, 2013

Introducing UWE Messenger

I'm sorry that the blog hasn't been updated much over the last 6 months, but do not worry, UWE is still under heavy development :-]

Currently I have been in the process of evaluating some new technology which is:

Using these two JS libraries implementing the new discussion application was pretty simple.

I guess that moving forward towards collaborative editing of documents in real-time will be much more of a challenge.

But the good thing is that this complex task has already been the subject of intensive research. Applications which use CKEditor (which is the driving force behind UWE) as a basis for the client side already exist elsewhere on the web, see here http://cooffice.ntu.edu.sg/cockeditor/ or for an even more fancy screencast here http://vsr.informatik.tu-chemnitz.de/demo/GCI/ckeditor.html.

This is what the brand new discussion application looks like:

UWE: Messenger: Communicate your changes effectively to all other users.

November 25th, 2012

Subtitles

Added a new element subtitle. A subtitle element will be printed in bold and will be held together with the preceding element on the same page. It will be numbered just like table headers and figure captions. Subtitles can be inserted using the placeholder dialog or shorthand: Just type [[subtitle: a new subtitle]] in a new paragraph.
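How such a shorthand could be turned into an XML element is sketched below in Python. This is an assumption about the mechanics, not UWE's actual preprocessing step: the regular expression and the function name are made up for the example, and only the [[subtitle: ...]] syntax and the subtitle element name are taken from the description above.

```python
import re
from xml.sax.saxutils import escape

# [[name: content]] where name is a plain element name (hypothetical pattern).
SHORTHAND = re.compile(r"\[\[([A-Za-z][\w-]*):\s*(.*?)\]\]")


def expand_shorthand(text):
    """Replace every [[name: content]] shorthand by <name>content</name>."""
    def to_element(match):
        name = match.group(1)
        content = match.group(2).strip()
        # Escape &, < and > so the result stays well-formed XML.
        return "<{0}>{1}</{0}>".format(name, escape(content))

    return SHORTHAND.sub(to_element, text)


result = expand_shorthand("[[subtitle: a new subtitle]]")
```

Text outside the double brackets is passed through unchanged, so the function can be applied to a whole paragraph.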

UWE: Placeholder Dialog window for editing subtitle elements

November 10th, 2012

Long time no see

I put UWE development on hold for a while but finally came back this weekend with two new features:

1. The Verbatim element which comes in handy when editing source code and other preformatted text.

UWE: Input of verbatim element

This is what the new element looks like in editor mode. Text can be copied and pasted from source code files and other documents.

UWE: verbatim element in HTML preview

Thanks to Google Prettify, keywords in source code will be automagically highlighted in the HTML preview.

UWE: PDF page with verbatim element

2. The Chaptertoc element which lets you insert tables of contents in subchapters. These TOC elements will only be available in the PDF "book" formatting option.

UWE: Chapter-toc element set on level 2

August 28th, 2012

Added interactive tag-cloud

UWE: Animated Tag-Cloud on the front page.

August 22nd, 2012

Finalized UWEv0.5 content model

UWEv0.5 is featuring a small but well-designed set of semantic elements. The purpose of the element role is yet to be specified. It will transport meta information for exporting data into other formats (e.g. PI-MOD).

UWE: DTD content model for the UWEv0.5 editorial publishing system

More layout-specific elements such as TOC, index, list of figures, backpages, ... can be set when configuring PDF output in the metadata dialog.

August 16th, 2012

Safety messages according to ANSI Z535.6

Just implemented safety messages according to ANSI Z535.6. You may read an interesting article on this issue on the tekom site.

UWE: Insert templates for safety messages

Insert templates for safety messages using a drop-down menu in the UWE editor view and replace the placeholder text for cause and consequences.

UWE: This is what the output looks like in HTML format.
UWE: ...and this is what the PDF will look like.

August 6th, 2012

UWE speaks English!

Working towards UWEv0.8: All boilerplate text strings of the user interface need to be translated into English. Fortunately Django comes with built-in i18n support, so this task could have been completed in a pretty straightforward way using the recommended GNU gettext modules.

But as outlined in a previous post, I want to make UWE very portable (Window$). Therefore I fell back to plain old Python dictionaries.

I put all multi-lingual strings into one file called locale.py (this file has been attached to this post).
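A dictionary-based string table could look roughly like this. The layout, the keys, the sample strings and the lookup function are assumptions for illustration and do not reproduce the attached locale.py.

```python
# Hypothetical layout: one dictionary per language code.
STRINGS = {
    "en": {"save": "Save", "new_document": "New document"},
    "de": {"save": "Speichern", "new_document": "Neues Dokument"},
}

DEFAULT_LANGUAGE = "en"


def translate(key, language=DEFAULT_LANGUAGE):
    """Look up a UI string, falling back to English and then to the key itself."""
    table = STRINGS.get(language, STRINGS[DEFAULT_LANGUAGE])
    return table.get(key, STRINGS[DEFAULT_LANGUAGE].get(key, key))


label = translate("save", "de")
```

With this shape, adding a language means adding one more dictionary, and a missing translation degrades to the English string instead of raising an error.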

UWE: Localization Dropdown

If you want me to include more languages, feel free to translate the strings into the language of your choice and send a new locale.py file back to me :-]

July 27th, 2012

UWEv0.5 launched

I have just released UWEv0.5 one month earlier than planned because I have a new job coming up in September. See https://www.uwe-editor.de

You may log in as user:test and password:test (http://uwe-editor.de/login/). Then play around with the test document.

UWE has been built upon rock solid Django technology, e.g. Django serves the Mozilla add-on site which receives a lot of traffic:

See: http://reinout.vanrees.org/weblog/2011/06/06/large-mozilla-sites.html

July 10th, 2012

Very good looking documents!

Working on the manual live in the editor:

UWE: Writing the manual with UWE

And the output: Please note that the page in the screenshot has been generated by free Open-Source Software and has been edited only by using a browser window.

UWE: Screenshot of pages 10 and 11 in UWE's manual
Two difficult layout issues are:

I think Apache FOP does a very good job and can compete with commercial solutions, which cost up to 7000$ per server (for the XSL-FO renderer alone).

July 9th, 2012

UWE v0.5 is feature-complete

Proud to announce that UWE version 0.5 is feature-complete. Today I have started working on the manual, using UWE of course.

UWE: First four pages of UWE's manual

July 8th, 2012

Diffing Screenshot

The diffing algorithm as I described in the previous post works awesome! Here is a screenshot:

Displaying XML Diffing Information in UWE's HTML Preview

Note: This functionality can be turned on and off just by clicking a button without triggering a page reload.

July 2nd, 2012

Diffing in UWE

One of UWE's key features will be managing and comparing document versions. Whereas commercial solutions employ a rather scientific approach, see e.g. here, and/or a rather complex one, see here, the so-called "diffing feature" in UWE will be implemented in a very simple way. UWE's WYSIWYG editor is the only place where you can modify UWE documents. Thus if we assign a unique ID to each element that we insert (paragraphs, lists, tables, images, ...) we will be able to use the following algorithm in order to mark changes when comparing two different versions of one document:

FIRST STEP: Analyze versions

At this point we have marked elements in both versions. But what we want to have is one single document in which all marked elements are merged in the correct order. Thus the next step will be merging the old and the new version. Actually this step amounts to copying elements which have been marked as DELETED from the old version into the new version. The tricky part is putting these elements into the right place, but with some magic XPath selectors we have successfully coped with this problem.

SECOND STEP: Merging

Now we have one document with all elements marked. Everything could have been done using XSLT stylesheets so far. But when detecting atomic text changes we will need to use Python's difflib.

THIRD STEP: Copy the old text of CHANGED elements into the merged document in order to use Python's difflib.

FOURTH STEP: Use Python's difflib in an XSLT stylesheet extension call on the merged document.

FIFTH STEP: A simple XML to HTML transformation will visualize all changes: red colored and crossed through text for deleted elements and green colored text for added elements.
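The steps above can be sketched in plain Python. This is a simplified illustration under assumptions, not UWE's XSLT-based implementation: each version is modelled as a dictionary that maps element IDs to text in document order, and the function names, the ADDED and UNCHANGED marks and the <del>/<ins> output markup are chosen for the example. Only difflib.SequenceMatcher is the real standard library API.

```python
import difflib


def classify(old, new):
    """Mark every element ID by comparing the two versions."""
    marks = {}
    for eid, text in new.items():
        if eid not in old:
            marks[eid] = "ADDED"
        elif old[eid] != text:
            marks[eid] = "CHANGED"
        else:
            marks[eid] = "UNCHANGED"
    for eid in old:
        if eid not in new:
            marks[eid] = "DELETED"
    return marks


def merge(old, new):
    """Order of IDs in the merged document: the new version plus the
    DELETED elements, each put back right after its old predecessor."""
    merged = list(new)
    previous = None
    for eid in old:
        if eid not in new:
            position = merged.index(previous) + 1 if previous in merged else 0
            merged.insert(position, eid)
        previous = eid
    return merged


def word_diff(old_text, new_text):
    """Word-level changes inside one CHANGED element."""
    a, b = old_text.split(), new_text.split()
    out = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if tag in ("insert", "replace"):
            out.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
    return " ".join(out)


old = {"p1": "Hello world", "p2": "Old paragraph", "p3": "Stays"}
new = {"p1": "Hello there world", "p3": "Stays", "p4": "Brand new"}
marks = classify(old, new)
order = merge(old, new)
changed = word_diff(old["p1"], new["p1"])
```

The sketch relies on dictionaries keeping insertion order (Python 3.7 and later). Moved elements are not handled specially, they simply keep their position from the new version.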

June 25th, 2012

UWE in a box

Having even more UWE promotion material in the pipeline:

UWE promo box!

June 19th, 2012

Portability counts: UWE's lightweight software components

Maximum portability is one of the main design goals of UWE's system architecture. Since some commercial systems are so hard to configure/install and are bloated with a huge amount of underlying technologies and third-party components, I wanted to make a system that runs from a USB stick, just to show off how powerful and lightweight Python technology can be. Therefore all backend components are portable as well and fit on a USB device: CherryPy's HTTP server, the SQLite database and Whoosh full-text search. UWE's setup scales easily for a team of 2-5 members and hundreds of documents and can be considered a problem-adequate solution, but just in case you want more: all backend components could easily be replaced by mature counterparts, e.g. the Nginx web server, the Postgres database and Lucene full-text search. Thanks to Python's ability to be bundled into an .exe file, UWE runs from a USB stick and installation boils down to extracting a ZIP archive onto your hard drive.

Here is a complete list of UWE's third party software components:

June 13th, 2012

Added RTF format for postprocessing documents in MS WORD and OpenOffice

Since version 0.95 Apache FOP supports RTF format. I added it to the UWE output formats and UWE v0.5 is almost feature-complete. Remaining work will be adding export and import functionality, applying a diffing algorithm for comparing documents, copying documents across user folders and making the JavaScript GUI work with all major browsers including Chrome.

UWE output formats HTML, RTF and PDF

June 11th, 2012

Botb index implementation in UWE

Just finished implementing the back-of-the-book index feature using CKEditor's new placeholder plugin. You can insert index entry level 1 and index entry level 2 elements into the popup dialog as shown in the following screenshot.

UWE: Input of index entries separated by commas and marked as xe1 and xe2

This is what the index will finally look like in HTML preview and PDF format:

UWE: Botb Index in HTML Preview format is displayed on the sidebar
UWE: Index is optional in PDF format and will be printed at the end of the book as usual

Since it was so easy to implement and I did not need to touch the JavaScript code very much I will also use CKEditor placeholder plugin for insertion of globally defined text variables, non-breaking text and footnotes. Transformation into corresponding XML tags is performed using regular expressions on the XSLT side in a preprocessing step.

June 8th, 2012

suppress-duplicate-pagenumber property in Apache FOP

Today I have stumbled across a common problem: We do not want to have duplicate page numbers in back-of-the-book index entries. Commercial XSL-FO rendering engines like Antenna House Formatter come with a built-in solution, see http://www.antennahouse.com/xslfo/axf5-extension.htm#axf.suppress-duplicate-page-number . Unfortunately Apache FOP does not provide such a solution. A quick google shows that this problem still remains unresolved in the public domain, see e.g. http://comments.gmane.org/gmane.text.xml.fop.user/28122 . So in the rare case that an editor declares two index entries on the same page, we will have an output like this:

If we investigate the AreaTree intermediate XML format of Apache FOP, the corresponding section will probably look like this:

We can apply an XSLT transformation on the AreaTree XML in order to remove the duplicate page number and the accompanying comma. I tried that with the following rules:

Copy all elements into the output tree, but if we find an <inlineparent> element with the same text content as its preceding sibling, then do not copy it. To be more specific, check for number content only, and only if the <inlineparent> elements are separated by commas and if the <inlineparent> elements are links. Speaking in XPath:

And voila all duplicate page numbers are gone:

As you can see, there is too much space after the last number. I tried to insert spaces and dots into the leader according to the number of removed page numbers, but you never know how many digits the removed number will consist of. A perfect solution will require a deeper understanding of the rather undocumented AreaTree format and will involve some calculations. For now I can live with this approach.
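The filtering rule can also be sketched outside XSLT. The Python version below works on a simplified, hypothetical stand-in for one index line and not on real AreaTree output: only the <inlineparent> name is taken from the description above, while the <lineArea> and <text> elements and the function name are assumptions, and the check for links is left out.

```python
import xml.etree.ElementTree as ET


def strip_duplicate_pages(parent):
    """Remove a page number (and the comma before it) when it repeats
    the previously kept page number on the same line."""
    last_number = None
    comma = None
    for child in list(parent):
        text = "".join(child.itertext()).strip()
        if text == ",":
            comma = child  # remember the separator, decide on the next number
        elif text.isdigit():
            if text == last_number and comma is not None:
                parent.remove(comma)
                parent.remove(child)
            else:
                last_number = text
            comma = None
        else:
            # Anything else (e.g. the index term) ends the number run.
            last_number = None
            comma = None
    return parent


line = ET.fromstring(
    "<lineArea>"
    "<inlineparent>UWE</inlineparent>"
    "<inlineparent>3</inlineparent><text>,</text>"
    "<inlineparent>3</inlineparent><text>,</text>"
    "<inlineparent>7</inlineparent>"
    "</lineArea>"
)
cleaned = ["".join(c.itertext()) for c in strip_duplicate_pages(line)]
```

Like the XSLT approach, this only removes content and does not recalculate any positions, so the spacing problem described above remains.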

June 8th, 2012

UWE (Uwe is not a WYSIWYG Editor)

Hi there! I am currently in the process of putting all my XSLT know-how into one big project. Yes! I am creating an XML content management system from scratch, featuring the latest web technology without any commercial third-party components. Today I am proud to show you some screenshots:

The user folder view lists all documents and images and lets you create any available format e.g. PDF on the fly
Split screen with fullscreen option lets you edit document metadata and get the chapter structure right
A mix between structural input and WYSIWYG Editor allows you to edit complex XML data
Also integrates a feature rich formula editor
Any registered user can easily add comments on published documents
Features a real full-text search over all objects in the database
Admin screen where you can configure and edit all objects in the database
Simple two column layout in resulting PDF
Margin layout with colored background in resulting PDF

2000-2008

Older stuff

Well, this is my older homepage (in German)

Copyright (c) by Alex Düsel 2000-2020. All rights reserved.