Q. What is SWC format?
A. A file in SWC format contains information representing a
digitally reconstructed neuron. SWC is non-proprietary and
stores the minimum number of parameters required to represent a
vector-based three-dimensional reconstruction. Files may begin
with header lines above the data values, each beginning with #.
Parameters are organized into 7 columns, and each row
represents one trace point. From left to right, these
columns are: unique identity value for the trace point, structure
type, x coordinate, y coordinate, z coordinate, radius, and identity
value of the parent (i.e. the trace point that comes before and
connects to the current trace point). The first 10 points of an
example SWC file are provided below:
#Example header text here
1 2 4882 1797 19 9 -1
2 2 4882 1797 19 9 1
3 2 4875 1821 19 9 2
4 2 4852 1849 19 21 3
5 2 4842 1827 18 12 4
6 2 4835 1816 18 7 5
7 2 4827 1807 18 7 6
8 2 4814 1797 18 4 3
9 2 4803 1785 18 4 8
10 2 4785 1763 18 4 9
The third data row represents one trace point which has been given an identity = 3, type = 2 (i.e. axon), X = 4875, Y = 1821, Z = 19, radius = 9, and whose parent is trace point 2 (i.e. the trace point represented in the row directly above it).
Columns 1, 2, and 7 are always integers. Columns 3, 4, 5, and 6 are expressed in whatever units were used in the reconstruction process (e.g. pixels, micrometers, etc.) and can have decimal points.
Column 1 (Identity #) must always increase by 1 from one row to the next, whereas column 7 (Parent Identity #) values have no such restriction but must be less than the column 1 value in the same row. Note that both rows 4 and 8 in the above example connect directly to row 3, meaning that row 3 must be a bifurcation point.
Row 1 has a parent = -1, which means that this row does not have a parent and is thus the root of the reconstruction.
The commonly accepted values for Column 2 that are pertinent to the DIADEM datasets are: 1 = cell body; 2 = axon; and 3 = dendrite.
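The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the DIADEM metric: the names `SwcPoint`, `parse_swc`, `roots`, `bifurcations`, and `terminations` are hypothetical, and the sketch assumes well-formed input with exactly 7 whitespace-separated columns.

```python
from collections import Counter, namedtuple

# Field names are illustrative; the SWC format only defines the column order.
SwcPoint = namedtuple("SwcPoint", "id type x y z radius parent")

EXAMPLE = """\
#Example header text here
1 2 4882 1797 19 9 -1
2 2 4882 1797 19 9 1
3 2 4875 1821 19 9 2
4 2 4852 1849 19 21 3
5 2 4842 1827 18 12 4
6 2 4835 1816 18 7 5
7 2 4827 1807 18 7 6
8 2 4814 1797 18 4 3
9 2 4803 1785 18 4 8
10 2 4785 1763 18 4 9
"""


def parse_swc(text):
    """Return a list of SwcPoint, skipping header lines that start with '#'."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        # Columns 1, 2, and 7 are integers; columns 3 to 6 may have decimals.
        points.append(SwcPoint(int(f[0]), int(f[1]), float(f[2]),
                               float(f[3]), float(f[4]), float(f[5]),
                               int(f[6])))
    return points


def roots(points):
    """Trace points whose parent is -1."""
    return [p.id for p in points if p.parent == -1]


def bifurcations(points):
    """Trace points that are the parent of two or more other points."""
    child_counts = Counter(p.parent for p in points if p.parent != -1)
    return sorted(pid for pid, n in child_counts.items() if n >= 2)


def terminations(points):
    """Trace points that are nobody's parent."""
    parents = {p.parent for p in points}
    return sorted(p.id for p in points if p.id not in parents)


points = parse_swc(EXAMPLE)
print(roots(points))         # [1]
print(bifurcations(points))  # [3]
print(terminations(points))  # [7, 10]
```

For the example file, the sketch reports trace point 1 as the root and trace point 3 as the only bifurcation, matching the description above.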
Q. Is there a difference in the metric for different data
sets other than dimensionality?
A. There are a number of differences in the metric between
datasets which are detailed on the individual dataset readme
pages (except for the threshold differences).
The thresholds, both distance and path length error
thresholds, differ between datasets to account for the resolution
in both XY and Z directions. The thresholds for spur
(a small terminal branch) removal vary by dataset as well
(some don't remove spurs). Also, the Neuromuscular
projection dataset handles terminations in a different
manner because of the rosette structures.
The manual reconstructions end at the beginning of those
structures, but the metric will not punish automated
reconstructions that trace into the rosette structures.
Q. How can we set the matching threshold?
A. The readme for the current (post-competition) version of the metric
describes how threshold parameters can be set.
Q. In the first data set, "Cerebellar Climbing Fibers",
the individual planes are merges of a panel of capture
stacks. Distortions in neuron shape are visible at some
of the boundaries. Is it possible either to
fix this or, alternatively, to release the data as individual
stacks that need to be merged?
A. Merging was not performed by hand, but with the leading
software controlling the motorized stage of the microscope.
Mechanical error limits in stage movement are evident at
the small scale of climbing fibers though they are not large
enough to impair manual tracing.
The released data set corresponds to the original acquired
images and was not tiled in a post-processing step.
Thus, individual panel stacks are not available.
Although the results may not be optimal, they are
representative of the typical experimental configuration
in a modern neuroanatomy lab.
Q. Do different datasets have drastically different
thresholds?
A. Yes. The thresholds are as follows:
Cerebellar Climbing Fiber
  XY Euclidean Distance: 37.33 pixels (1.4 microns)
  Z Euclidean Distance: 4 images (1.3 microns)
  XY Path Error: 0.075
  Z Path Error: 0.18
Hippocampal CA3 Interneuron
  XY Euclidean Distance: 11 pixels (2.4 microns)
  Z Euclidean Distance: 14 images (4.67 microns)
  XY Path Error: 0.08
  Z Path Error: N/A
Neocortical Layer 1 Axon
  XY Euclidean Distance: 4.76 pixels (1.4 microns)
  Z Euclidean Distance: 5 images (5 microns)
  XY Path Error: 0.07
  Z Path Error: 0.18
Neuromuscular Projection Fiber
  XY Euclidean Distance: 32 pixels (1.2 microns)
  Z Euclidean Distance: N/A
  XY Path Error: 0.04
  Z Path Error: N/A
Olfactory Projection Fiber
  XY Euclidean Distance: 3.94 pixels (1.3 microns)
  Z Euclidean Distance: 5 images (5 microns)
  XY Path Error: 0.08
  Z Path Error: 0.2
Visual Cortical Layer 6 Neuron
  XY Euclidean Distance: 9 pixels
  Z Euclidean Distance: 6 images
  XY Path Error: 0.08
  Z Path Error: 0.2
Q. Is the source code for the DIADEM metric
(scoring function) available?
A. Yes, it can be downloaded
here.
The following are the MD5sums for the DIADEM metric source code:
If downloaded on/after May 7, 2012:
2b06bbe336d763e269ef2555d7c4c3ef
If downloaded between February 19, 2010 and May 7, 2012:
a7c9daa3564e947e22f0b707a0bb3a95
If downloaded between January 26, 2010 and February 19, 2010:
82368ef91ede897b87559d24909e80f6
If downloaded between November 25, 2009 and January 26, 2010:
14d147ba30e84de13d343fb349c70ca2
If downloaded before November 25, 2009:
4b1b8cb075e53b7f45fb9c3e65c79ce1
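To check which release a downloaded copy corresponds to, its MD5 sum can be compared against the values listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `identify_release` and the file name in the usage note are hypothetical.

```python
import hashlib

# MD5 sums copied from the list above, mapped to their download windows.
KNOWN_MD5 = {
    "2b06bbe336d763e269ef2555d7c4c3ef": "on/after May 7, 2012",
    "a7c9daa3564e947e22f0b707a0bb3a95": "February 19, 2010 to May 7, 2012",
    "82368ef91ede897b87559d24909e80f6": "January 26, 2010 to February 19, 2010",
    "14d147ba30e84de13d343fb349c70ca2": "November 25, 2009 to January 26, 2010",
    "4b1b8cb075e53b7f45fb9c3e65c79ce1": "before November 25, 2009",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_release(path):
    """Return the matching download window, or None if the sum is unknown."""
    return KNOWN_MD5.get(md5_of_file(path))
```

For example, `identify_release("DiademMetric.zip")` (an assumed file name) returns the download window whose MD5 sum matches, or `None` if the file matches none of the listed sums.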
Q. Why is the DIADEM metric so complex?
Was the choice somewhat arbitrary?
A. The metric implements our best attempt to quantify the human judgement
of what differentiates a good reconstruction from a bad one. Since we agreed
on a "manual" gold standard, there is some inherent arbitrariness.
The basic idea is simple: the nodes of the trees should be in the right
position, their topological interconnectivity should be accurate,
and the path distance should be within a reasonable range. However, there are many
different cases of possible "errors" or "variations", and these are
judged differently depending on the impact they have on the overall structure.
Moreover, the various datasets have different characteristics (representative
of experimental diversity encountered in real-lab scenarios) which are
reflected in additional requirements. These qualifications account for most
of the metric complexity.
Q. What program can be used to open the .rar data set files?
Is there a free, downloadable program for this?
A. Please carefully read the
Data Set General Readme on the website.
It indicates PeaZip as one example.
A possible alternative is Zipgenius.
These were both free last time we checked.
A Google search for ".rar" will find many other options. As usual with freeware, read carefully during installation to make sure you uncheck any add-on programs you don't want. PeaZip didn't have any last time we checked, but just in case...
Q. Is the output of the algorithm supposed to be the "segmented" tree, i.e.
a binary file in which the voxels representing the neuron have one value and
everything else has a different value? Or is the "digital reconstruction"
simply the 1-voxel thick centerline that can be extracted from this
segmentation, which then allows for determining interbranch length,
bifurcation and termination nodes etc.?
A. A digital reconstruction, the output of the algorithm, consists of a series
of interconnected vectors, not voxels. Although in principle each of these
vectors is associated with a thickness, the DIADEM metric only considers the
branching topology, path distance, and position of the nodes, so diameter
does not affect the computation of the score.
Q. Are edge-detection, image thresholding, and tree enhancing filters part
of the purpose of the challenge?
A. Any methods that can help automate the production of digital
reconstructions from sets of images may be relevant to the DIADEM challenge.
Q. In the manual reconstructions provided for the Olfactory Projection training
data set, some of the branch points appear slightly misaligned with the
underlying labeled structure. Will this affect the scoring?
A. An example of a branch point that appears slightly misaligned with the
underlying structure (from the OP_2 Training Round data set) is shown in the
figure below (red arrow).
Manual reconstructions have been tested to see if these points affect scoring.
Specifically, a correctly re-aligned reconstruction was compared to the
original file included in one of the data sets. None of the non-terminating
nodes were missed (see next FAQ for further observations on terminating nodes).
We have therefore left the reconstructions as they were originally traced.
Q. In the manual reconstructions provided for the Olfactory Projection training
data set, some of the termination points appear to vary in terms of distance
from the underlying labeled structures. Will this affect the scoring?
A. An example of two termination points that end at varying distances compared
to the underlying structures (from the OP_2 Training Round data set) is shown
in the figure below (red arrows).
Manual reconstructions have been tested to see if variation in the positions of
termination trace points affects scoring. Specifically, a correctly re-aligned
reconstruction was compared to the original file included in one of the data
sets. Two terminal nodes were missed, resulting in a final score of 0.989.
This score is nearly perfect and well within the typical range observed between
two manual reconstructions by independent experts from the same underlying
image stack. Because such minor differences are unlikely to affect algorithm
rankings, we left the traces as they were originally traced.
Q. Does the metric account for possible floating point error in determining
whether a node is within threshold distance in the Z-direction?
A. The current version of the metric provides a
small additional margin to the Z component of the distance threshold in order
to ensure that no floating point error can affect scoring.
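The reason such a margin helps can be illustrated with a short Python sketch. This is not the metric's code: the function name `within_z_threshold` and the margin value of 1e-9 are assumptions chosen only for illustration, and the margin actually used by the metric is not stated here.

```python
def within_z_threshold(z_a, z_b, threshold, margin=1e-9):
    """Compare a Z distance to a threshold, allowing a small extra margin.

    The default margin is a hypothetical value for illustration only.
    """
    return abs(z_a - z_b) <= threshold + margin


# 0.1 + 0.2 is stored as 0.30000000000000004 in binary floating point,
# so a strict comparison against 0.3 fails even though the values are
# equal on paper.
z = 0.1 + 0.2
print(abs(z - 0.0) <= 0.3)              # False
print(within_z_threshold(z, 0.0, 0.3))  # True
```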
Q. In the swc file, does column 2 (type or tag of the tracing point) influence
the scores, that is, should the program correctly determine whether it is an
axon or a dendrite?
A. No. The type value in column 2 does not influence the score.
Q. The same branch between two bifurcations can be divided by intermediate
points differently. Does this choice affect the score?
A. The metric is based on the location of the nodes (bifurcations and
terminations), but the distance along the path does affect the computation of
the score (as explained in the Rules of the competition). Therefore, the
intermediate points should follow the image path as accurately as
necessary to ensure that the branch path length is accurately reproduced.
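The effect of intermediate points on path length can be seen in a small Python sketch. It uses the coordinates of trace points 3 through 7 from the SWC example in the first FAQ entry and sums plain, unweighted Euclidean segment lengths in the file's own units. This is an assumption for illustration; the metric may treat XY and Z distances differently. The function name `path_length` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def path_length(points):
    """Sum of straight-line distances between consecutive (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Trace points 3 -> 4 -> 5 -> 6 -> 7 from the example SWC file.
detailed = [
    (4875, 1821, 19),
    (4852, 1849, 19),
    (4842, 1827, 18),
    (4835, 1816, 18),
    (4827, 1807, 18),
]

# Same end points, but with every intermediate point dropped.
shortcut = [detailed[0], detailed[-1]]

print(f"{path_length(detailed):.1f}")  # 85.5
print(f"{path_length(shortcut):.1f}")  # 50.0
```

Both traces place the end nodes in the same positions, yet the version without intermediate points reports a much shorter path, which is why sparse intermediate points can change the score.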
Q. The DIADEM metric provides a dramatically incorrect score and/or ignores a
large portion of my SWC file. What am I doing wrong?
A. Most likely you have an older release of the metric
(version prior to 11/25/2009) and need to download the more recent version.
An error in the previous release of the metric occurred if any line of data
did not contain the precise formatting expected (e.g. tabs between data, any
character other than a normal space at the end of a line).
The line for node 4 in the climbing fiber CF_1.swc contained a tab at the end.
This caused the line to be ignored and thus all descendant nodes could not be
attached to the tree. Ultimately the metric would conclude without a clear
error, but would likely return very poor scores for automated traces run
against the gold standard CF_1.swc. The updated metric ignores any whitespace
at the end of a data line, though any non-whitespace characters
(or any incorrect formatting) cause the program to terminate with an error
message detailing the file and line number of the improper data format.
Tabs and spaces are now treated equally to provide greater flexibility,
though other programs may have more demanding format constraints.
As before, lines beginning with the "#" symbol are seen as comments and are
ignored.
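If you write your own SWC reader, the same tolerant behavior can be approximated as follows. This is a minimal Python sketch of the rules described above, not the metric's actual code; the function name `parse_swc_line` and the wording of the error messages are hypothetical, and treating blank lines as ignorable is an added assumption.

```python
def parse_swc_line(line, filename="<input>", lineno=0):
    """Parse one SWC line into a 7-tuple, or return None for a comment.

    Tabs and spaces are treated equally and trailing whitespace is ignored.
    Anything else that is malformed raises ValueError naming file and line.
    """
    stripped = line.strip()
    # Assumption: blank lines are skipped along with '#' comment lines.
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()  # splits on any run of spaces or tabs
    if len(fields) != 7:
        raise ValueError(
            f"{filename}:{lineno}: expected 7 columns, found {len(fields)}")
    try:
        return (int(fields[0]), int(fields[1]),
                float(fields[2]), float(fields[3]),
                float(fields[4]), float(fields[5]),
                int(fields[6]))
    except ValueError as exc:
        raise ValueError(
            f"{filename}:{lineno}: malformed value ({exc})") from None
```

With this sketch, a data line that ends in a tab, such as `"4 2 4852 1849 19 21 3\t"`, parses normally, while a line with stray non-whitespace characters raises an error that names the file and line number.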
Q. In some data sets, there are several branches that I would have manually
traced differently than the training reconstruction.
How can these be considered objective gold standards to evaluate automated
tracings?
A. Experimental data is sometimes ambiguous, and arbitrary choices are
occasionally unavoidable. Lab providers have confirmed that there is
subjectivity in the more complex data sets. The scoring thresholds
should account for much of the subjectivity. If you feel certain that
a point should have been traced differently, it is strongly suggested
that you trace it how you feel it should be traced. Getting hung up trying
to develop an algorithm that works around such problems is
counter-productive to the purpose of DIADEM.