Tools and environments for data processing and analysis

Several funding initiatives support limited data processing by the NF-OSI to transform data into new forms of value, but the vast majority of value comes from researchers in the community reanalyzing or otherwise reusing the data. To help with such efforts, this guide summarizes known tools and environments for creating new data and knowledge from data shared on the portal. The options represent different paths, each with its own considerations and constraints. One especially helpful consideration is adoption & support: whether an option has official integration with Synapse, is known to be used successfully in practice by the community, or has been evaluated by our staff as a potentially useful resource for the community and a candidate for more official support.

This documentation will be updated to reflect new developments and experiences. If you would like to share your experience, suggest community additions and improvements, or have questions, feel free to reach out to nf-osi@sagebionetworks.org.

Comparison Table

| Tool / Environment | Category | Adoption & Support | Data Access Method | Standout Features | Best Use Cases | Limitations / Considerations | Resources |
|---|---|---|---|---|---|---|---|
| cBioPortal | Visualization Platform | External integration | Web UI; linked from certain portal Datasets | Interactive genomics visualization, cohort exploration, mutation analysis | Quick exploratory analysis, clinical summaries, mutation screening | Can't customize analyses; only certain datasets available (from pre-processed data) | Demo video & tutorial |
| Cavatica (Seven Bridges) | Managed Platform | Synapse integration since 2024 | DRS connection + UI | Collaborative workspaces, ergonomic pre-built R and Python environments | Collaborative genomics projects, custom analyses | Cloud costs; learning curve | Demo video & tutorial |
| Pluto.bio | Managed Platform | Synapse integration (coming end of 2025 - early 2026) | DRS connection + UI | Off-the-shelf standard pipelines (including nf-core), easy collaboration, guided analysis | Standard genomics analyses, pre-built workflows | Limited pipeline flexibility; cloud costs; newer platform | In development |
| Terra.bio | Managed Platform | Synapse integration (coming 2026) | DRS connection + UI | Collaborative workspaces, thousands of pipelines | Collaborative projects, custom analyses, specific grant / award has facilitated billing with Terra | Cloud costs; learning curve | |
| Institutional HPC | Self-Managed Compute | Community validated | Command line/API client download | High computational power, flexible tool installation, familiar environment | Large-scale workflows, custom pipeline development | Institution-specific access; harder to collaborate and reproduce externally | |
| Private Cloud (AWS/GCP/Azure) | Self-Managed Compute | Community validated | Cloud storage transfer | Scalable compute, collaborative sharing, flexible analysis environments | Team collaborations, scalable analyses, custom environments | Cloud costs; requires cloud expertise | |
| Local Machine | Self-Managed Compute | Community validated | Command line/API client/web UI download | Complete control, offline analysis, familiar environment | Small datasets, method development, proof-of-concept work | Hardware limitations; slow for large datasets; hardest to share work | Demo video & tutorial |
| Code Ocean | Managed Platform | Staff evaluated | Command line/API client | Reproducible research environment, standard pipeline support (nf-core) | Reproducible analyses, standardized methods, includes publication workflows | Pay-per-compute model | |
| Biomni (open-source version) | Agentic AI Platform, Self-Managed Compute | Staff evaluated | Command line/API client | TBD | TBD | API costs; more analysis than low-level raw data processing | In development |
| FutureHouse | Agentic AI Platform, Managed Platform | Staff evaluated | Command line/API client | TBD | TBD | API costs; more analysis than low-level raw data processing | In development |
| Jataware Biome | Agentic AI Platform, Managed Platform | Staff evaluated | Command line/API client | TBD | TBD | API costs; more analysis than low-level raw data processing | In development |

Helpful Notes

DRS (Data Repository Service) Connections

DRS is a standardized API that enables secure, direct data access between repositories and analysis platforms. When a platform has a "DRS connection," there is no need to manually download and upload data: the analysis platform can access files directly from Synapse, given proper permissions.
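
To make the idea concrete, here is a minimal Python sketch of how a client might resolve a DRS reference. It assumes the hostname-based drs:// URI form and the /ga4gh/drs/v1/objects/ path from the GA4GH DRS v1 specification. The host drs.example.org, the object ID syn123.1, and the bearer-token header are illustrative assumptions, not details confirmed for Synapse or for any platform listed in this guide.

```python
import json
import urllib.request
from urllib.parse import quote, urlparse

# Path prefix defined by the GA4GH DRS v1 specification.
DRS_API_PREFIX = "/ga4gh/drs/v1/objects/"


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object record."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object ID: {drs_uri!r}")
    # Percent-encode the ID so it is safe to embed in a URL path.
    return f"https://{parsed.netloc}{DRS_API_PREFIX}{quote(object_id, safe='')}"


def fetch_drs_object(drs_uri: str, token: str) -> dict:
    """Fetch the object's metadata record as a dict.

    Assumption: the server accepts a bearer token; real deployments
    may use a different authorization scheme.
    """
    request = urllib.request.Request(
        drs_uri_to_url(drs_uri),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical host and object ID, used only to show the mapping.
    print(drs_uri_to_url("drs://drs.example.org/syn123.1"))
```

An analysis platform with a DRS connection performs this kind of lookup on your behalf, which is why no manual file transfer is needed.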

You authorize the connection once, then data appears automatically in your analysis workspace without manual file transfers. The advantages include:

  • Faster workflow setup (no waiting for downloads)

  • Reduced storage costs (data stays in Synapse until needed)

  • Better data provenance tracking

  • Automatic permission enforcement

Example workflow with DRS: Select your Synapse dataset → authorize platform access → data appears ready for analysis

Example workflow without DRS: Download data from Synapse → upload to analysis platform → begin analysis

Platforms like Cavatica use DRS to provide seamless integration, while self-managed compute environments require manual data transfer using command-line tools or APIs.
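
For the manual path on self-managed compute, the download step can be scripted. The sketch below is one possible approach, assuming the Synapse command-line client (the synapse executable from the synapseclient package) is installed and already configured with your credentials. The Synapse ID syn123 and the downloads folder are placeholders, and the helper function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Synapse entity IDs have the form "syn" followed by digits.
SYNAPSE_ID = re.compile(r"syn\d+")


def build_synapse_get_command(synapse_id, download_dir, recursive=False):
    """Build the argument list for a `synapse get` download."""
    if not SYNAPSE_ID.fullmatch(synapse_id):
        raise ValueError(f"not a Synapse ID: {synapse_id!r}")
    command = ["synapse", "get"]
    if recursive:
        # -r downloads the contents of a folder or project recursively.
        command.append("-r")
    command += ["--downloadLocation", str(Path(download_dir)), synapse_id]
    return command


def download_from_synapse(synapse_id, download_dir, recursive=False):
    """Run the download; raises CalledProcessError if the CLI fails."""
    command = build_synapse_get_command(synapse_id, download_dir, recursive)
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)
    return Path(download_dir)


if __name__ == "__main__":
    # Placeholder ID: print the command instead of running it.
    print(" ".join(build_synapse_get_command("syn123", "downloads", recursive=True)))
```

After the download finishes, the files still have to be moved into whichever analysis environment you use. That extra transfer is the step a DRS connection removes.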