Hi Christoph,
Your analysis is perfectly right.
As described at http://root.cern.ch/root/HowtoWriteTree.html,
Root provides two modes for creating a Tree:
- Serial Mode based on the class Streamer function
- Split mode
Most Root examples illustrate how to use one of these two modes,
rarely the two modes together. In a real application, the two modes
will most likely have to be combined, as a compromise between
functionality and performance.
The serial mode
================
-Advantages: Let's assume that obj is a TObject (or derived).
obj.Streamer takes care of serializing this object into a buffer.
If obj has members that are other objects or pointers to objects,
in turn all these objects are serialized into the same buffer.
In case obj is a graph (the same object may be referenced by several
pointers), the Root serialization mechanism takes care of writing
this object to the buffer only once. When reading this object back,
Root will also take care of rebuilding the object graph as it was
when it was written.
For example, when you select the "SaveAs file.root" option in the
"File" canvas menu, the object "canvas" is serialized via
canvas->Streamer. This canvas may have subpads, each subpad may
contain many objects. The same object may be in several pads.
In the same way if you have a pointer to a complex event structure,
event->Streamer takes care of serializing all objects in the event
making sure that objects are saved only once.
This technique is pretty simple and efficient. It must be used
if you want to preserve the internal relationships between
the objects referenced.
-Disadvantages: The problem with the serialization mode is that
you must read back the complete object graph as it was written.
You cannot read only a subset of it.
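The write-once behaviour of the serial mode can be illustrated with a
small standalone sketch. This is not Root's Streamer code. Node, Record,
WriteGraph and ReadGraph are hypothetical names made up for this
example, and the "buffer" is just a vector of records. It only shows
the idea: an object reached through several pointers is written once,
and the sharing is restored when the whole buffer is read back.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical object: a name plus pointers to other objects.
struct Node {
    std::string name;
    std::vector<const Node*> children;
};

// One serialized record: the name plus the buffer indices of the children.
struct Record {
    std::string name;
    std::vector<std::size_t> children;
};

// Writes `node` and everything reachable from it into `buffer`.
// `seen` maps an object's address to its index in the buffer, so an
// object referenced by several pointers is written only once.
std::size_t WriteGraph(const Node* node, std::vector<Record>& buffer,
                       std::map<const Node*, std::size_t>& seen) {
    assert(node != nullptr);
    auto it = seen.find(node);
    if (it != seen.end()) return it->second;  // already written: reuse index
    const std::size_t index = buffer.size();
    seen[node] = index;                       // register before recursing
    buffer.push_back(Record{node->name, {}});
    for (const Node* child : node->children) {
        const std::size_t childIndex = WriteGraph(child, buffer, seen);
        buffer[index].children.push_back(childIndex);
    }
    return index;
}

// Rebuilds the graph: one Node per record, sharing restored from the
// indices. The complete buffer is read; a subset cannot be picked out.
std::vector<Node> ReadGraph(const std::vector<Record>& buffer) {
    std::vector<Node> nodes(buffer.size());
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        nodes[i].name = buffer[i].name;
        for (std::size_t c : buffer[i].children)
            nodes[i].children.push_back(&nodes[c]);
    }
    return nodes;
}
```

With a canvas holding two pads that both point to the same histogram,
WriteGraph produces four records, not five, and after ReadGraph both
pads point to the same rebuilt histogram again.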
The Split mode
==============
-Advantages: The split mode is provided to structure the output file
such that one can later access a subset of an object. The default
Root split mechanism tries to analyze the object components
down to basic types if possible, and for each subcomponent it creates
a separate branch. This branch has its own buffer. Since version 2.0,
a branch may also be written to a separate file
(see TBranch::SetFile).
During the analysis, only the branches referenced in a query are read
into memory. This facility may considerably reduce the analysis time.
In particular, in case of arrays of identical objects (very frequent
for objects like hits, digits, tracks), Root provides a
very efficient class, TClonesArray, which bypasses the inefficient
new/delete operators.
-Disadvantages: The automatic split mode has many restrictions
(listed at the URL above). We hope to remove some of
these limitations in future versions. In particular,
cross-references between objects residing in different branches
cannot be handled automatically.
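The benefit of one buffer per branch can be sketched in plain C++.
This is only an illustration of the principle, not the TTree/TBranch
implementation. SimpleEvent, SplitStore, Fill and Sum are hypothetical
names invented for the example, and each "branch" is just a vector
holding one member for all entries.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical event made of three basic-type members.
struct SimpleEvent {
    double energy;
    double px;
    double py;
};

// A "branch" here is one buffer holding a single member for all entries.
using Branch = std::vector<double>;

struct SplitStore {
    std::map<std::string, Branch> branches;
    std::size_t reads = 0;  // number of values touched while reading

    // Split one event into its members; each goes to its own branch.
    void Fill(const SimpleEvent& e) {
        branches["energy"].push_back(e.energy);
        branches["px"].push_back(e.px);
        branches["py"].push_back(e.py);
    }

    // Sum one member over all entries, reading only that branch.
    // Throws std::out_of_range if the branch does not exist.
    double Sum(const std::string& name) {
        const Branch& b = branches.at(name);
        double total = 0.0;
        for (double v : b) {
            total += v;
            ++reads;
        }
        return total;
    }
};
```

A query on "energy" over N entries touches N values here, whereas
reading back N serialized events would touch all three members of
each one.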
The best of the two worlds
==========================
A good object model is clearly a compromise between preserving the
internal object structure and the requirement to access subsets of
an object graph/tree.
To take the example of an Event class, a good structure should look
like:
- object header
- some pointers to object graphs (will be serialized)
- as many pointers to TClonesArray as possible.
Ideally, one should be able to automatically split the Event class
into branches (case of $ROOTSYS/test/Event example). This example
combines the two modes. We also provide a different example (ATLFast)
where branches are built by the application.
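The Event structure listed above could look roughly like the following
standalone sketch. It is not the $ROOTSYS/test/Event code and does not
use Root classes. ReusablePool is a hypothetical stand-in for the idea
behind TClonesArray (slots are allocated once and reused for every
event), and EventHeader, Track and Hit are made-up types. The pointers
to serialized object graphs are left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the idea behind TClonesArray: slots are
// allocated once and reused for every event, instead of calling
// new/delete for each object of each event.
template <typename T>
class ReusablePool {
public:
    // Returns the next free slot; allocates only if no old slot is left.
    // The slot keeps its previous contents, so the caller overwrites it.
    // The reference is valid only until the next call to Next().
    T& Next() {
        if (used_ == slots_.size()) slots_.emplace_back();
        return slots_[used_++];
    }
    // Marks all slots as free again; nothing is deallocated.
    void Clear() { used_ = 0; }
    std::size_t Size() const { return used_; }
    std::size_t Capacity() const { return slots_.size(); }
    const T& At(std::size_t i) const {
        assert(i < used_);
        return slots_[i];
    }

private:
    std::vector<T> slots_;
    std::size_t used_ = 0;
};

struct EventHeader { int run = 0; int number = 0; };
struct Track { double px = 0.0, py = 0.0, pz = 0.0; };
struct Hit { double x = 0.0, y = 0.0, z = 0.0; };

struct Event {
    EventHeader header;          // object header
    ReusablePool<Track> tracks;  // would be a TClonesArray in Root
    ReusablePool<Hit> hits;      // would be a TClonesArray in Root

    // Called before filling the next event; the slots stay allocated.
    void Clear() {
        tracks.Clear();
        hits.Clear();
    }
};
```

After the first few events the pools have grown to their working size,
and filling further events no longer allocates anything.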
It could be that some special classes must be added to the system to
cover more general cases. We will be happy to add such classes to Root
if they appear to be of general interest.
Rene Brun