UML Diagrams with MetaUML
I'm a big fan of UML as a standardized notation. I haven't been a big fan, though, of UML generation software. My first experience
with UML diagramming software was Rational Rose: I
was working for a real estate firm with deep enough pockets to buy RR licenses at the time, or else I doubt I ever would have tried it.
What I rapidly found out was that Rational Rose is really great for trivial cases, but breaks down entirely for cases where it might
actually be useful. It could, of course, scan your entire project and auto-generate a UML diagram for you. It couldn't, however, figure
out which relationships were meaningful and which weren't, and which methods and fields were relevant to a human — so you ended up
looking at everything. Are serialVersionNumbers relevant fields? What about relationships to java.lang.String? Getters and setters? Is a class with a java.util.LinkedList field in a relationship with LinkedList, or in a one-to-many association with whatever the LinkedList contains?
I found that I was spending as much time removing irrelevant details from diagrams just to put something useful
together as I would have spent just adding all of the classes "by hand", and then I had to remove those details again when I tried to re-import
the latest changes. It was also insanely opinionated on how things should look, so adding annotations and moving things around was usually
an exercise in masochism.
Then, of course, there's Visio. It seems like the perfect solution: a box-and-line generator program! It was definitely easier to work with than Rational Rose, but of course was incredibly manual in a drag, drop, point and click sort of way. Even then, it was pretty limited in functionality. If I tried to draw multiple associations out of one class to many others, they all had to originate from the middle of the top, bottom, left or right of the box — it wouldn't let me have one association originate from the middle, one from halfway between the middle and the top and a third from halfway between the middle and the bottom (at least not if I wanted the association to move with the class itself). If I had multiple attributes of the same class (very common!) there was no way to annotate a relationship with two different association names.
Visio is also Windows only; I can run PowerPoint, Word, Excel and even Outlook on my Mac, but it doesn't look like we'll ever get Visio for Mac. There's the web-based Gliffy, but in spite of their best efforts, it can't even match Visio for performance. I tried ArgoUML, which somehow managed to combine the worst of Rational Rose (minus, I guess, the license fees) with the worst of Visio. And it looks like it requires JDK 1.6 (that's a maximum, not a minimum!).
Maybe there's a way to work around all of those limitations if you spend a lot of time reading the (not-so-user-friendly) documentation. Since generating UML diagrams was just a small part of my job, though, I never could justify the time it would have taken just to see if that was possible. Instead, I ended up manually working around the limitations and, in most cases, generating substandard documentation. And of course, being the command-line-fanatic that I am, I was always irritated by having to interrupt my flow by clicking the mouse, returning to the keyboard, clicking the mouse, returning to the keyboard... Being GUI-focused applications, they precluded any sort of useful automation. Rational Rose automated too much; the others didn't automate quite enough.
I finally found what I think is the optimal approach (at least for folks like me): MetaUML.
Honestly, TeX, LaTeX, MetaFont, MetaPost, etc. have been on my to-learn list for a long time. I dabbled in TeX for formatting mathematical equations when I was in grad school, but I succumbed to time pressure and embarrassed myself by turning in a thesis written using MS Word instead. I finally found a good excuse to at least scratch the surface while learning to generate diagrams using MetaUML. Rather than dragging boxes and lines around, you use a text editor to write an input file like the one in listing 1.
input metauml;
beginfig(1);
Class.widget("Widget")()
("draw()");
Class.textInput("TextInput")()();
Class.button("Button")()();
Class.window("Window")()();
leftToRight(30)(textInput, button, window);
topToBottom(30)(widget, button);
drawObjects(widget, textInput, button, window);
link(inheritance)(textInput.n -- widget.s);
link(inheritance)(button.n -- widget.s);
link(inheritance)(window.n -- widget.s);
link(aggregation)(widget.e -- window.n);
endfig;
end
This is pretty self-explanatory (the .n, .s, etc. refer to the north, south, east and west sides of an element). In particular, it works out how to position everything without my needing to explicitly place any of the components; in this relatively simple case, I can just tell it how the objects should be related. The resulting output from the mptopdf command is shown in figure 1.
I don't like the straight lines; I can "stair-step" them with a few extra changes as shown in listing 2.
input metauml;
beginfig(1);
Class.widget("Widget")()
("draw()");
Class.textInput("TextInput")()();
Class.button("Button")()();
Class.window("Window")()();
leftToRight(30)(textInput, button, window);
topToBottom(30)(widget, button);
drawObjects(widget, textInput, button, window);
link(inheritance)(pathStepY(textInput.n, widget.s, 10));
link(inheritance)(pathStepY(button.n, widget.s, 10));
link(inheritance)(pathStepY(window.n, widget.s, 10));
link(aggregation)(rpathStepX(window.e, widget.e, 20));
endfig;
end
If you use the topToBottom and leftToRight macros to position things and then you add fields or methods later, mptopdf pushes the associated object down or to the right to make room!
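One way to picture why this works is a toy model of the layout in Python. This is only a sketch under assumptions: left_to_right is a hypothetical helper, not part of MetaUML, the widths are made up, and it assumes that leftToRight(30) keeps a 30-unit gap between the right edge of one box and the left edge of the next. The point is that positions are derived from the box sizes rather than typed in, so a box that grows moves its neighbors along.

```python
def left_to_right(spacing, widths):
    """Return the left edge of each box for a row laid out with a fixed gap.

    Hypothetical helper for illustration only; MetaUML itself states the
    layout as equations and lets MetaPost solve them.
    """
    lefts = []
    x = 0
    for width in widths:
        lefts.append(x)
        # The next box starts after this box plus the gap
        x += width + spacing
    return lefts


# Three boxes with made-up widths
print(left_to_right(30, [80, 60, 70]))   # [0, 110, 200]
# Widening the first box (say, after adding a long method name) moves the rest
print(left_to_right(30, [120, 60, 70]))  # [0, 150, 240]
```

Nothing in the second call mentions the positions of the second and third boxes; they move because they are defined relative to the first one.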
Still, MetaUML is part of the TeX/MetaPost ecosystem, so it takes some getting used to:
- One of the hardest things to get used to with MetaUML is that the coordinate system works like ordinary Cartesian coordinates (positive Y goes up, not down like we're used to), and the origin moves around to accommodate the drawing rather than staying fixed: if you define an object at -100, -100, the origin shifts 100 units up and to the right, rather than the object just being invisible as it would be with most other graphics libraries.
- Each drawObjects call overwrites the previous one(s), so everything has to be drawn at once, but the links have to follow the objects.
- MetaUML is based on MetaPost, which itself is based on Donald Knuth's MetaFont: an equation solver for font creation. This is how the topToBottom and leftToRight macros do their job. You'll get an inscrutable "Inconsistent equation" error if the constraints are actually unsolvable.
- If you try to link two objects without including the edges to link, you'll get an inexplicable:
! Missing `)' has been inserted.
<to be read again> {
---> curl1}..{curl1}
l.50 link(aggregation)(Container -- Contained);
- A MetaPost file is a program and, just like any program, you'll probably need to debug it after a certain point. Since it's declarative, there are fewer debugging options than with imperative programming languages, but show (which is documented in the MetaPost guide but not the MetaUML guide) works well enough.
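The moving origin from the first bullet can be pictured with a small Python sketch. It is a toy model and not how MetaPost computes anything: to_screen is a hypothetical helper, and the points are made up. It maps Cartesian points (Y up) onto the top-left-origin coordinates most graphics libraries use (Y down) by fitting the page to the bounding box of everything drawn.

```python
def to_screen(points):
    """Map Cartesian points (y up) to top-left-origin coordinates (y down).

    Hypothetical helper: the page is fitted to the bounding box of the
    points, so nothing is clipped and the origin lands wherever it must.
    """
    min_x = min(x for x, _ in points)
    max_y = max(y for _, y in points)
    return [(x - min_x, max_y - y) for x, y in points]


# An object at (-100, -100), the origin itself, and one more point
print(to_screen([(-100, -100), (0, 0), (50, 20)]))
# [(0, 120), (100, 20), (150, 0)]
```

The point at (-100, -100) stays visible at the left edge of the page, and the origin is no longer in a corner: it ends up 100 units in from the left.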
Of course, even though I can still work from the command line, there's still a lot of tedious typing involved if you're trying to document an existing code base. To me, the best compromise between the opinionated automation of Rational Rose and the open world of Visio would be a converter from source code to MetaUML format whose output I could then manipulate in a text editor. Fortunately, since MetaUML's input is text, it's not too hard to put together a simple "parser" that does exactly that. The Python file in listing 3 reads in a Java source file, looks for private/public/protected markers and outputs the structure in MetaUML format. (That's right, I wrote a Java parser in Python! You got a problem with that?)
import re
import sys

if len(sys.argv) < 2:
    print("Usage: convertJavaToMetaUML <filename>")
    sys.exit(1)

# TODO: doesn't deal with inner classes
infilename = sys.argv[1]
className = ''
fields = []
methods = []

# Tokens: names and types (dotted, generic or array), parentheses, commas,
# '=' and ';', and the /* and */ comment delimiters
tokenizer = re.compile(r'[a-zA-Z0-9_<>\.\[\]]+|\(|\)|,|=|;|/\*|\*/')

with open(infilename) as infile:
    for line in infile:
        content = line.strip()
        # will skip package-private methods...
        if content.startswith(('private', 'protected', 'public')):
            tokens = tokenizer.findall(content)
            stage = 0
            umlDecl = ''
            returnType = ''
            parameterReturnType = ''
            first = True
            isMethod = False
            while len(tokens) > 0:
                token = tokens.pop(0).strip()
                # ignore whitespace
                if len(token) == 0:
                    continue
                # skip inline comments, including the closing */
                if token == '/*':
                    while len(tokens) > 0 and token != '*/':
                        token = tokens.pop(0)
                    continue
                # Special handling for declarations like Map<String, Object>: if the token contains a < but not a >,
                # keep concatenating tokens until the end delimiter is found
                if '<' in token:
                    supp = token
                    while len(tokens) > 0 and '>' not in supp:
                        supp = tokens.pop(0)
                        token += supp
                # public, private or protected
                if stage == 0:
                    if token == 'private':
                        umlDecl += '-'
                    elif token == 'protected':
                        umlDecl += '#'
                    elif token == 'public':
                        umlDecl += '+'
                    stage += 1
                elif stage == 1:  # scanning for return type
                    if token in ['abstract', 'final', 'static']:
                        continue
                    # This is either a class declaration, a field, a method, or a constructor.
                    if token == 'class':
                        while len(tokens) > 0 and className == '':
                            className = tokens.pop(0).strip()
                        tokens = []  # don't care about implements or extends (at least for now...)
                    else:
                        if token == className:  # it's a constructor, don't output anything
                            tokens = []
                        else:
                            returnType = token
                            stage += 1
                elif stage == 2:  # expecting method or field declaration
                    # skip getters, setters and standard methods
                    if token.startswith(('get', 'set')) or token in ('toString', 'hashCode'):
                        tokens = []
                        returnType = ''
                        break
                    umlDecl += token
                    stage += 1
                elif stage == 3:  # if this is a '(', starts a method. Otherwise, starts a variable
                    if token == '=':
                        tokens = []  # variable declaration; ignore rest of line
                        continue
                    if token == ';':
                        continue
                    if token == '(':
                        isMethod = True
                    umlDecl += token
                    stage += 1
                elif stage == 4:  # scanning for parameters
                    if token == ')':
                        break
                    if token in ['final', ',']:
                        continue
                    parameterReturnType = token
                    stage += 1
                elif stage == 5:  # parameter name; emit it as "name: type"
                    umlDecl += ('' if first else ', ') + token + ': ' + parameterReturnType
                    first = False
                    stage -= 1
            if returnType != '':
                if isMethod:
                    methods.append('%s): %s' % (umlDecl, returnType))
                else:
                    fields.append('%s: %s' % (umlDecl, returnType))
            else:
                # Class declarations, constructors and skipped accessors end up here;
                # report on stderr so the MetaUML output on stdout stays clean
                print('%s had no return type' % content, file=sys.stderr)

# Emit the class in MetaUML format: Class.name("Name")(fields)(methods);
print('Class.%s("%s")(' % (className, className))
print(',\n'.join('"%s"' % field for field in fields))
print(')(')
print(',\n'.join('"%s"' % method for method in methods))
print(');')
This doesn't capture the parameters after the first line on multiline declarations; if I made it any more complicated, I'd probably be better off using a proper lexical parser, but this is simple enough to capture the sort of code that comes out of code generators like JAXB that I find myself trying to get my head around quite a bit.
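If the multiline case ever became annoying enough, one cheap workaround would be a pre-pass that glues continuation lines back together before the tokenizer sees them. The sketch below is an assumption about how that could look, not part of the listing above: join_declarations is a hypothetical helper that keeps appending lines until the parentheses balance. Parentheses inside string literals or comments would fool it.

```python
def join_declarations(lines):
    """Merge lines until every '(' has a matching ')'.

    Hypothetical helper for illustration: it only counts parentheses and
    knows nothing about Java strings or comments.
    """
    joined = []
    pending = ''
    for line in lines:
        text = line.strip()
        pending = (pending + ' ' + text) if pending else text
        # Still inside an open parameter list? Keep collecting lines.
        if pending.count('(') <= pending.count(')'):
            joined.append(pending)
            pending = ''
    if pending:
        # Unbalanced input at end of file: keep what we have
        joined.append(pending)
    return joined


source = [
    'public void resize(int width,',
    '        int height) {',
    '}',
]
print(join_declarations(source))
# ['public void resize(int width, int height) {', '}']
```

The joined lines could then be fed to the same loop in place of the raw file contents.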