Speaking about restrictions and limitations, we should always keep in mind
that *reasonable* restrictions and conventions are very useful.
Moreover, on closer inspection some restrictions turn out to be not restrictions
at all but simply *rules*, and I'll try to illustrate this point below.
Christoph Borgmeier writes:
> * all objects of one type must be stored in the same array. That might
> lead to problems with temporary and semi-temporary objects, i.e. objects
> which should not be stored or only be stored under certain circumstances.
>
Why isn't it possible to have two TClonesArrays or TObjArrays?
I'm presently working on comparing two different pattern-recognition
algorithms, and two TObjArrays of tracks (with different names!)
and two different arrays of track segments (again, with different
names) coexist in the code just fine...
For the objects which have to be stored only under certain
circumstances, one could keep pointers to them in the event object
and set a pointer to zero when the corresponding object is not to be
written out.
> * all `integer pointers' point into the same array. This forbids the use
> of polymorphism, which is a major advantage of the ROOT system.
>
Why isn't it possible for a track object to have one integer data
member holding the index of its primary vertex and another integer
holding the index of the calorimeter tower the track points to?
> * the objects pointed to are not defined by language constructs, the
> relations are not stored explicitly in the DB. So any code reading the DB
> has to have already built in the additional information about the
> relations.
>
In the case of ROOT, it is a Streamer function which writes/reads
an object to/from a ROOT file. So the code writing the ROOT file already
has to have built-in knowledge about the things it writes out.
Since the same Streamer function does both reading and writing,
there is nothing wrong with the same "knowledge" being available
on the read branch.
> It seems to me that there are many possible applications for such missing
> features, e.g. reconstructed decays with generalized particles like
> different kinds of tracks and calorimeter information. They could also be
> related to different types of detector hits and calorimeter clusters.
>
> One could also have different sets of reconstructed tracks, e.g. simple
> hit fits or results of global vertex fits. One might also get a set of
> competing reconstructions and store only some of them. These things will
> have a rich substructure and will not fit well into `flat' arrays.
>
> ROOT provides part of the necessary power, e.g. the possibility to store
> canvases with deep polymorphic substructures. But up to now, I failed to
> store parts of my events in a similar way.
>
Here is an important comment: if we compare the requirements on the run-time
representation of objects with those on their persistent representation,
it is easy to see that they are very different. At run time one needs
a representation which is as convenient and efficient as
possible, so, for example, it makes sense to store in a track/particle
object px, py, pz, pt, mass, eta, phi and energy, calculated once,
to avoid repeated calculations of square roots (again: root...),
sines, cosines, etc.
On the other hand, when the object is being written out, one of the major
requirements is data compression and compactness, so it is quite enough
to write out just 4 of the 8 numbers listed above. This simple
example shows that in real life it may not be necessary to make
persistent all the cross-linked structures which exist at run time.
There is the practical experience of the BaBar collaboration, who are using
Objectivity/DB (which has all the nice features listed by Christoph)
for I/O. To keep I/O efficient, the BaBar people create large persistent
objects which they call "banks", pack the run-time objects into the banks,
and write the banks into Objectivity/DB. After the banks are read back, they
are unpacked and the run-time objects are restored.
Conclusion: to keep I/O efficient, one may well want to consider
writing out "flat arrays" rather than "rich substructures".
> The first reason (*) is, that there might be cross-links inside the object
> tree. These should not lead to the same object stored twice, there seems
> to be a hash table mechanism to avoid this, but I could not make it work
> yet.
>
BTW, working with integers (indices of the objects in their arrays)
rather than with pointers solves this problem automatically: two objects
referring to the same third object simply store the same index, and the
object itself is written out only once, as an element of its array.
> The second problem is the deleting of the sub-objects. In my event
> example, the objects containing the pointers do not necessarily _own_ the
> objects pointed to (maybe because of *), that's why I tried to create my
> own garbage collection for this. (It failed so far to free all memory of
> ~1000 events, but I am still struggling.)
This is a general problem with lists (containers) keeping
pointers to objects. Normally people use one of the two
following solutions:
- one could have lists which "know" whether or not they own their
objects (have a DeleteObjects flag), so the destructor acts according
to the value of this flag;
- one could use Clear() instead of Delete() in the destructor for
containers which do not own the objects stored in them (the ROOT case).
I don't see any practical difference between these two
approaches: in both cases it is the user who decides whether the objects
pointed to have to be deleted... Just minor differences in coding, and
a matter of personal taste...
I wouldn't like people to get the impression that I'm arguing with Christoph.
I agree with his observations, but it seems that they are mostly theoretical.
What I tried to show is that for each concrete case mentioned by Christoph
there is a practical solution with which I personally am pretty comfortable.
But again, all this is mostly a matter of taste.
Regards, Pasha.