by Stephen Plaza, Manager of FlyEM & Connectome Research Team Leader
April 23, 2020
After successfully producing the largest dense connectome in the world, do we now possess a recipe for efficiently mapping even larger nervous systems? As with any recipe, the key to success rests not only in having great ingredients (image quality, segmentation, personnel -- subjects of other blog entries) but in their successful mixing. This technical post focuses on our team’s strategies for extracting a connectome given great ingredients and explores the potential relevance for even larger connectomes.
Preamble and Overall Philosophy
Before delving into a recipe, we should note that this discussion of connectome-building strategies is premature. The largest connectomes before FlyEM’s half fly brain (hemibrain) comprised hundreds or thousands of neurons. The hemibrain improves the state-of-the-art scale by 10x, but another factor of 5-10x will be needed for an entire fly nervous system, and several orders of magnitude more for a mouse. Even ignoring scale, there is an important question of what a connectome is, or, put differently, what the actual deliverable needed for success is. Do you need just the morphology of neurons and the large pathways, or is a more precise reconstruction required for analysis?
How does one plan for the next-largest connectome? As with many complex tasks, growing expertise and the ability to recognize pitfalls are important. Trying out new methods at large scale is a recipe... for disaster. In FlyEM, we piloted many smaller studies, which stress-tested various aspects of the pipeline and suggested the best parameters (ingredients) for sample preparation and segmentation.
Other high-level things of note:
- Even with major advances in machine learning, we should expect a robust need for manual quality control for the first connectomes (see previous blog entry on this topic).
- Rotten ingredients give you a rotten product. It pays to spend a lot of time tuning sample preparation and segmentation (the subject of future blog entries).
- If your effort is large, the kitchen needs to be well managed. We had over 100 people involved in the hemibrain project at various times. Careful management and fast-paced research seem contradictory, but a nice balance is essential. This is an area of passion for me and will be discussed in a future blog entry.
- Bad methods can also ruin good ingredients. For example, even the latest segmentation advances leave many small errors. If we focus on those small errors rather than on overall reconstruction quality, they will dominate proofreading costs while having limited impact on the reconstruction.
- Biological priors for neuron shapes (such as light atlases) provide useful context for connectome reconstructions. This also suggests possible advantages of using lower-resolution EM datasets to comprehensively extract a neuron library beforehand. Furthermore, since datasets can contain catastrophic imaging errors or sample defects even after extensive pre-screening, reconstructing multiple high-resolution datasets of the same regions might be necessary to patch dataset gaps or bootstrap subsequent reconstructions.
Perhaps most importantly, we emphasize the need to attack the complex connectome reconstruction effort from the top down. One of the challenges of reconstructing a connectome is that reconstruction errors in a single neuron can, due to the small-world properties of the network, greatly impact the larger connectome. We emphasize top-level quality checks to counterbalance these potential large-scale errors. This approach also has the advantage of producing a “rough” connectome more quickly, which can then be refined with additional time, albeit with diminishing returns.
Recipe
The following (rather technical) recipe is for reconstructing a fly brain and ventral nerve cord connectome containing over 100,000 neurons with 100 million(?) connections in ~1 year after assembling all the ingredients. For simplicity, we define a connectome as one in which all neurons are morphologically reconstructed, large pathways are correctly identified, and the dataset is accessible to the community for targeted revision. You can modify the recipe based on reconstruction size and overall goals. The main strategy is to iteratively refine the reconstruction, starting with the largest segments.
Ingredients
- Grayscale image dataset with few alignment and imaging artifacts (generally the case for FIB-SEM using our sample preparation)
- Segmentation that balances false merge and split errors (use multiple training passes)
- Segmentation hypotheses for focused proofreading (note: ~1000 decisions can be done per day per proofreader; aim for a >50% merge rate; see the effort estimate after this list)
- Synapse prediction (multiple training passes)
- Light microscopy for as many neuron shapes as possible
- Proofreading tools (e.g., neu3, neuTu, neuroglancer)
- Proofreading staff and computer equipment (~50 proofreaders, each trained for at least 2 months)
- Management staff for proofreading training and orchestrating quality control
- Software team to manage the pipeline (4-8 people)
- Biological experts (including in EM ultrastructure) (as many as you can get)
- A willing sponsor! (Janelia)
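As a rough sanity check on the effort estimates in the instructions below, the proofreading throughput noted above can be converted into proofreading-years. The snippet below is only a back-of-the-envelope sketch; the ~200 working days per proofreader-year is an assumed figure, not one from the project.

```python
# Back-of-the-envelope conversion from proofreading decisions to proofreading-years.
# ~1000 decisions/day comes from the ingredient list above; ~200 working days
# per year is an assumption used only for illustration.
DECISIONS_PER_DAY = 1000
WORKING_DAYS_PER_YEAR = 200  # assumption

def proofreading_years(num_decisions: int) -> float:
    """Convert a count of focused-proofreading decisions into proofreader-years."""
    return num_decisions / DECISIONS_PER_DAY / WORKING_DAYS_PER_YEAR

# Example: the ~2 million working-set decisions in the instructions below
# work out to roughly 10 proofreading-years.
print(proofreading_years(2_000_000))  # -> 10.0
```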
Instructions (numbers in parentheses are estimates)
- Create an initial set of high-level neuron constraints by manually annotating >100,000 somata (>50 proofreading days)
- Determine a “working set” of segments based on the segments (ordered from largest to smallest) that together contain ~25% of the predicted synapses (~1 million segments); see the selection sketch after this list
- Determine an “anchor set” of segments based on the largest segments in the working set (~300,000 segments), which ideally is a relatively tight superset of the actual neuron backbones (>100,000)
- Check and fix each “anchor” segment for major false mergers using top-down cleaving (1-2 minutes per segment, or ~6 proofreading years); a cleaving sketch follows this list
- Focus-proofread the decisions linking anchor segments (~1 proofreading year)
- Examine each remaining anchor segment for large potential false splits and mergers, producing a “rough out” of each neuron (~15 minutes per body, or ~15 proofreading years). This can be aided by comparing morphologically similar neurons identified with morphology- or connectivity-based clustering (see CBLAST and the clustering sketch after this list).
- Focus-proofread all decisions between segments in the working set (~2 million decisions, or ~10 proofreading years)
- Manually link the remaining segments in the working set not merged by focused proofreading (~300,000 segments, or ~15 proofreading years)
- Validate synapse prediction. It is not possible to meaningfully improve overall synapse accuracy even by manually annotating millions of synapses (annotating just ~10% of 100 million connections could take ~50 proofreading years). The goal is instead to improve the prediction by iteratively sampling the dataset (~2 proofreading years; see the sampling sketch after this list):
  - Generate ground truth from small samples in each brain region
  - Generate ground truth from small samples of neurons of different types
  - Retrain the synapse prediction classifier, re-predict, and spot-check the results
- Roughly group neurons together by similar connectivity and morphology (~1 proofreading year)
- Classify neurons and perform a final verification (~2 expert years of validation)
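To make the working-set and anchor-set selection concrete, here is a minimal sketch of how such sets could be derived from per-segment predicted synapse counts. The function, its name, and the default thresholds are illustrative assumptions rather than our production pipeline code; synapse count stands in as a proxy for segment size.

```python
import numpy as np

def select_segments(synapse_counts, working_fraction=0.25, anchor_size=300_000):
    """Sketch of working-set / anchor-set selection.

    synapse_counts: dict mapping segment id -> number of predicted synapses.
    Returns (working_set, anchor_set) as arrays of segment ids, both ordered
    from largest to smallest segment (by predicted synapse count).
    """
    ids = np.array(list(synapse_counts.keys()))
    counts = np.array(list(synapse_counts.values()))

    order = np.argsort(-counts)                # largest segments first
    cumulative = np.cumsum(counts[order])
    target = working_fraction * counts.sum()
    n_working = int(np.searchsorted(cumulative, target)) + 1

    working_set = ids[order[:n_working]]       # segments covering ~25% of synapses
    anchor_set = working_set[:anchor_size]     # largest segments in the working set
    return working_set, anchor_set
```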
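Top-down cleaving can be pictured as seeded splitting over a supervoxel adjacency graph: the proofreader drops a few seeds on a falsely merged body, and each supervoxel is assigned to the seed it grows from. The sketch below illustrates that idea with a simple multi-source breadth-first growth; the actual cleave operation in our tools exploits the agglomeration structure and is more sophisticated, so treat this only as an intuition aid.

```python
from collections import deque

def cleave(adjacency, seeds):
    """Assign every supervoxel of a falsely merged body to one of the seeded
    pieces by multi-source breadth-first growth over the adjacency graph.

    adjacency: dict supervoxel id -> iterable of neighboring supervoxel ids.
    seeds: dict supervoxel id -> body label chosen by the proofreader.
    Returns a dict supervoxel id -> body label.
    """
    labels = dict(seeds)
    queue = deque(seeds)                 # start growing from every seed at once
    while queue:
        sv = queue.popleft()
        for neighbor in adjacency.get(sv, ()):
            if neighbor not in labels:   # the first seed to reach a supervoxel claims it
                labels[neighbor] = labels[sv]
                queue.append(neighbor)
    return labels
```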
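Grouping neurons with similar wiring, used both for roughing out anchor bodies and for the later grouping step, can be approximated by clustering connectivity profiles. The following is a sketch in the spirit of CBLAST rather than the actual CBLAST implementation; the matrix layout and the distance threshold are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_by_connectivity(connectivity, distance_threshold=0.5):
    """Group neurons with similar wiring.

    connectivity: (n_neurons x n_partner_types) array of synapse counts,
    e.g. each neuron's outputs aggregated per partner cell type.
    Returns an integer cluster label per neuron.
    """
    connectivity = np.asarray(connectivity, dtype=float)
    # Cosine distance compares relative wiring patterns rather than absolute
    # synapse counts, so large and small neurons of the same type can cluster together.
    dists = pdist(connectivity, metric="cosine")
    tree = linkage(dists, method="average")
    return fcluster(tree, t=distance_threshold, criterion="distance")
```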
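For synapse validation, the point is to estimate and improve prediction quality from small, targeted samples rather than exhaustive annotation. A minimal sketch of region-stratified sampling and per-region precision estimation might look like the following; the data layout (a 'region' key on each prediction and a reviewer-set 'correct' flag) is hypothetical.

```python
import random

def sample_for_validation(predicted_synapses, regions, per_region=200, seed=0):
    """Draw a small stratified sample of predicted synapses per brain region
    for manual review, instead of annotating millions of connections."""
    rng = random.Random(seed)
    sample = {}
    for region in regions:
        in_region = [s for s in predicted_synapses if s["region"] == region]
        sample[region] = rng.sample(in_region, min(per_region, len(in_region)))
    return sample

def estimate_precision(reviewed):
    """Estimate per-region precision from manually reviewed samples, where an
    annotator has set a boolean 'correct' flag on each sampled prediction."""
    return {
        region: sum(s["correct"] for s in items) / len(items)
        for region, items in reviewed.items() if items
    }
```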
The above recipe is what FlyEM plans to do for our next dataset. As with most plans, it will probably change as technology gets better or problems occur. The quality of the segmentation or the presence of complicated, novel motifs in the data (such as very different synapse morphologies in different regions) can present serious challenges to staff training and timeline projections. While the next big connectome might use new ingredients with unexpected challenges, FlyEM’s experiences can help form a recipe for success.