Re: again ROOT db (long)

Rene Brun (Rene.Brun@cern.ch)
Tue, 31 Mar 1998 11:18:30 +0200


Christoph Borgmeier wrote:
>
> Dear ROOT developers and users,
>
> please let me come back once more to my old problem of pointers in the
> ROOT DB. In reply to my previous questions on this topic, I was strongly
> advised to store data in a ROOT DB as shown in the `event' and ATLFAST
> examples. Let me recollect this:
>
> All objects of an event should be stored in TClonesArrays, one
> TClonesArray for each type. Integer indices describe relations between
> table entries. The advantages of this are clear:
>
> * fast vertical access
> * good compression possibilities
> * TClonesArrays avoid some memory allocation and deallocation
>
> But it is obvious that this approach also has some restrictions:
>
> * all objects of one type must be stored in the same array. That might
> lead to problems with temporary and semi-temporary objects, i.e. objects
> which should not be stored or only be stored under certain circumstances.
>
> * all `integer pointers' point into the same array. This forbids the use
> of polymorphism, which is a major advantage of the ROOT system.
>
> * the objects pointed to are not defined by language constructs, the
> relations are not stored explicitly in the DB. So any code reading the DB
> has to have already built in the additional information about the
> relations.
>
> It seems to me that there are many possible applications for such missing
> features, e.g. reconstructed decays with generalized particles like
> different kinds of tracks and calorimeter information. They could also be
> related to different types of detector hits and calorimeter clusters.
>
> One could also have different sets of reconstructed tracks, e.g. simple
> hit fits or results of global vertex fits. One might also get a set of
> competing reconstructions and store only some of them. These things will
> have a rich substructure and will not fit well into `flat' arrays.
>
> ROOT provides part of the necessary power, e.g. the possibility to store
> canvases with deep polymorphic substructures. But up to now, I failed to
> store parts of my events in a similar way.
>
> The first reason (*) is that there might be cross-links inside the object
> tree. These should not lead to the same object being stored twice. There
> seems to be a hash table mechanism to avoid this, but I could not make it
> work yet.
>
> The second problem is the deleting of the sub-objects. In my event
> example, the objects containing the pointers do not necessarily _own_ the
> objects pointed to (maybe because of *), that's why I tried to create my
> own garbage collection for this. (It failed so far to free all memory of
> ~1000 events, but I am still struggling.)
>
> My question now: has anyone else found a need for the features described
> here and maybe solved the two problems? Am I overlooking important aspects?
>
> Best regards and thank you very much for the help
> Christoph
>

Hi Christoph,
Your analysis is perfectly right.
As described at http://root.cern.ch/root/HowtoWriteTree.html,
Root provides two modes for creating a Tree:
- Serial Mode based on the class Streamer function
- Split mode

Most Root examples illustrate one of these two modes, rarely both
together. A real application will most likely have to combine the
two modes, as a compromise between functionality and performance.

The serial mode
================
-Advantages: Let's assume that obj is a TObject (or derived).
obj.Streamer takes care of serializing this object into a buffer.
If obj has members that are other objects or pointers to objects,
all these objects are in turn serialized into the same buffer.
If obj is a graph (the same object may be referenced by several
pointers), the Root serialization mechanism takes care of writing
this object to the buffer only once. When reading the object back,
Root also takes care of rebuilding the object graph as it was when
it was written.
For example, when you select the "SaveAs file.root" option in the
"File" canvas menu, the object "canvas" is serialized via
canvas->Streamer. This canvas may have subpads, each subpad may
contain many objects. The same object may be in several pads.
In the same way if you have a pointer to a complex event structure,
event->Streamer takes care of serializing all objects in the event
making sure that objects are saved only once.
This technique is pretty simple and efficient. It must be used
if you want to preserve the internal relationships between
the objects referenced.

-Disadvantages: The problem with the serialization mode is that
you must read back the complete object graph as it was written.
You cannot read only a subset of it.
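
The "written only once" behaviour described above can be illustrated
with a small standalone sketch. This is not Root's Streamer code; Node,
Record, GraphWriter and ReadGraph are hypothetical names, and only the
C++ standard library is assumed. Each distinct object gets one record,
and pointers are replaced by record indices so that the links can be
rebuilt on reading.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type standing in for a TObject-derived class.
struct Node {
    std::string name;
    std::vector<Node*> children;  // non-owning links; a child may be shared
};

// One serialized record: the node's name plus the indices of its children.
struct Record {
    std::string name;
    std::vector<std::size_t> children;
};

// Gives each distinct node exactly one record, however many pointers reach it.
class GraphWriter {
public:
    std::size_t Write(const Node* n) {
        auto it = seen_.find(n);
        if (it != seen_.end()) return it->second;  // already written: reuse index
        std::size_t id = records_.size();
        seen_[n] = id;  // register before recursing, so cycles terminate
        records_.push_back(Record{n->name, {}});
        for (const Node* c : n->children) {
            std::size_t cid = Write(c);  // may grow records_, so index afterwards
            records_[id].children.push_back(cid);
        }
        return id;
    }
    const std::vector<Record>& records() const { return records_; }

private:
    std::map<const Node*, std::size_t> seen_;
    std::vector<Record> records_;
};

// Rebuilds the graph: one new node per record, then the links by index.
std::vector<std::unique_ptr<Node>> ReadGraph(const std::vector<Record>& recs) {
    std::vector<std::unique_ptr<Node>> nodes;
    for (const Record& r : recs)
        nodes.push_back(std::unique_ptr<Node>(new Node{r.name, {}}));
    for (std::size_t i = 0; i < recs.size(); ++i)
        for (std::size_t cid : recs[i].children)
            nodes[i]->children.push_back(nodes[cid].get());
    return nodes;
}
```

Note that the reader returns the whole graph at once, which is
exactly the disadvantage mentioned above: there is no way to ask for
only a part of it.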

The Split mode
==============
-Advantages: The split mode is provided to structure the output file
such that one can later access a subset of an object. The default
Root split mechanism tries to analyze the object components
down to basic types if possible and creates a separate branch for
each subcomponent. Each branch has its own buffer. Since version 2.0,
a branch may also be written to a separate file
(see TBranch::SetFile).
During the analysis, only the branches referenced in a query are read
into memory. This facility may considerably reduce the analysis time.
In particular, in case of arrays of identical objects (very frequent
for objects like hits, digits, tracks), Root provides a
very efficient class TClonesArray to bypass the inefficient
new/delete operators.

-Disadvantages: The automatic split mode has many restrictions
(listed in the URL above). We hope to remove some of
these limitations in future versions. In particular,
cross-references between objects residing in different branches
cannot be handled automatically.
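
The idea behind the split mode can be sketched without Root as
follows. This is only an illustration under simplifying assumptions:
Track, SplitTracks and SumPx are hypothetical names, and each "branch"
is a plain in-memory vector, with none of the buffering, compression
or file handling of a real TBranch. Every data member goes into its
own column, so a query on one member never touches the others.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical track type. 'vertex' is an integer index into some vertex
// array: a cross-reference that the application must resolve by itself.
struct Track {
    float px;
    float py;
    int vertex;
};

// One column ("branch") per data member of Track.
struct SplitTracks {
    std::vector<float> px;
    std::vector<float> py;
    std::vector<int> vertex;

    // Decomposes a track into its basic types, one value per column.
    void Fill(const Track& t) {
        px.push_back(t.px);
        py.push_back(t.py);
        vertex.push_back(t.vertex);
    }
    std::size_t size() const { return px.size(); }
};

// A "query" that references only px: the py and vertex columns are not read.
float SumPx(const SplitTracks& s) {
    return std::accumulate(s.px.begin(), s.px.end(), 0.0f);
}
```

The integer vertex member also shows the restriction mentioned above:
nothing in the stored data says which array the index refers to, so
the reading code has to know.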

The best of the two worlds
==========================
A good object model is a compromise between preserving the internal
object structure and the requirement to access subsets of an object
graph/tree.
To take the example of an Event class, a good structure should look
like:
- object header
- some pointers to object graphs (will be serialized)
- As many pointers to TClonesArray as possible.
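
As a rough standalone sketch of such a layout (not the actual
$ROOTSYS/test/Event class), the three parts could look like the code
below. All names are hypothetical. ReusableArray only imitates the
TClonesArray idea of reusing already constructed slots from one event
to the next instead of calling new/delete for every object.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Header { int run; int event; };
struct Track { float px, py, pz; };

// Stand-in for the TClonesArray idea: slots are constructed once and then
// reused; Clear() only resets the count and frees nothing.
template <typename T>
class ReusableArray {
public:
    // Returns the next free slot, constructing a new one only if needed.
    // The slot may hold stale values from a previous event, and the
    // reference is invalidated if a later call has to grow the storage.
    T& Next() {
        if (used_ == slots_.size()) slots_.emplace_back();
        return slots_[used_++];
    }
    void Clear() { used_ = 0; }
    std::size_t size() const { return used_; }             // slots in use
    std::size_t allocated() const { return slots_.size(); } // slots constructed
    const T& operator[](std::size_t i) const { return slots_[i]; }

private:
    std::vector<T> slots_;
    std::size_t used_ = 0;
};

// A small polymorphic-style graph that is best kept together.
struct DecayNode {
    int pdg;
    std::vector<std::shared_ptr<DecayNode>> daughters;
};

struct Event {
    Header header;                     // object header
    std::shared_ptr<DecayNode> decay;  // object graph: serialized as a whole
    ReusableArray<Track> tracks;       // flat array: candidate for splitting
};
```

In an event loop one would call tracks.Clear() at the start of each
event and fill the slots again through Next(); after the first few
events no further allocation is needed for the tracks.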

Ideally, one should be able to automatically split the Event class
into branches (as in the $ROOTSYS/test/Event example). This example
combines the two modes. We also provide a different example (ATLFast)
where branches are built by the application.
It could be that some special classes must be added to the system to
cover more general cases. We will be happy to add such classes to Root
if they appear to be of general interest.

Rene Brun