Metadata Wrangling

Metadata is standardized information provided along with data that helps with data organization and context. It is data about the data. Metadata comes from the original contributors who share their data and is curated by the NF-OSI so that it properly supports data usability.

To effectively reuse data, researchers benefit from understanding what metadata is available and how it is packaged. Researchers who already contribute to the portal will be more familiar with what is available. In most cases, the required metadata will be present as:

A manifest: A summary .csv file listing each file that has been uploaded to a dataset location, with properties characterizing the file and the data in the file. Example properties that provide context are assay, species, and linked individual and sample IDs. The properties present depend on the data content. This metadata is also available on each file as an “annotation”, which can be seen in the Synapse web UI. The manifest compiles the metadata into a single, easily downloadable .csv for analysis.
Annotations: Metadata is also available as annotations, which are used to find and filter data directly in the Synapse web UI. They should match what is pre-compiled in the manifest .csv.
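
Once a manifest has been downloaded, its properties can be used to select files for analysis. The following is a minimal sketch using only the Python standard library. The column names (name, assay, species, individualID, specimenID), the file names, and the values are assumptions made for illustration; the actual properties depend on the dataset.

```python
import csv
import io

# Hypothetical manifest content; real column names and values depend on the dataset.
SAMPLE_MANIFEST = """\
name,assay,species,individualID,specimenID
sample_a.fastq.gz,rnaSeq,Human,patient_1,specimen_1
sample_b.fastq.gz,rnaSeq,Mouse,mouse_7,specimen_2
sample_c.bam,wholeGenomeSeq,Human,patient_1,specimen_3
"""


def filter_manifest(rows, **criteria):
    """Return the rows whose properties equal every given criterion."""
    return [
        row
        for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# In practice, replace io.StringIO(...) with open("manifest.csv", newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_MANIFEST)))

# Keep only the human RNA-seq files.
human_rnaseq = filter_manifest(rows, assay="rnaSeq", species="Human")
for row in human_rnaseq:
    print(row["name"], row["individualID"], row["specimenID"])
```

Matching is exact and case-sensitive here, so the values passed to filter_manifest need to be spelled exactly as they appear in the manifest.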

Use the NF metadata dictionary

The metadata dictionary is a helpful reference when searching for data and interpreting metadata during data reuse, for example when:

You need to know what properties exist and what metadata terms mean so you can properly search for and find the data you need.
You want to understand the difference between two tumor terms when doing analysis.

Frequently Asked Questions

For certain data and analyses, I want more extensive clinical data (i.e. individual-level patient demographics and phenotypes) connected to the assay data than the metadata provides. Where can this be found?

If available, this clinical data is present as a .csv file or as a Synapse table in the project. However, such data is usually harder for contributors to share in full detail and usually comes with more restrictions.
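
When a clinical .csv is available, it can be connected to the assay files through a shared individual identifier. The sketch below assumes that both the manifest and the clinical file carry a column named individualID; that name, the clinical columns (sex, age), and all values are hypothetical and should be checked against the actual files.

```python
import csv
import io

# Hypothetical manifest and clinical tables; real columns depend on the project.
MANIFEST = """\
name,assay,individualID
sample_a.fastq.gz,rnaSeq,patient_1
sample_b.fastq.gz,rnaSeq,patient_2
sample_c.bam,wholeGenomeSeq,patient_3
"""

CLINICAL = """\
individualID,sex,age
patient_1,Female,12
patient_2,Male,30
"""


def attach_clinical(manifest_rows, clinical_rows, key="individualID"):
    """Add clinical columns to each manifest row that has a matching individual."""
    by_individual = {row[key]: row for row in clinical_rows}
    merged = []
    for row in manifest_rows:
        clinical = by_individual.get(row[key], {})
        # Manifest values take precedence if both tables share a column name.
        merged.append({**clinical, **row})
    return merged


manifest_rows = list(csv.DictReader(io.StringIO(MANIFEST)))
clinical_rows = list(csv.DictReader(io.StringIO(CLINICAL)))
merged = attach_clinical(manifest_rows, clinical_rows)

for row in merged:
    # Files without a clinical record simply lack the clinical columns.
    print(row["name"], row.get("sex", "n/a"), row.get("age", "n/a"))
```

Files whose individual has no clinical record are kept rather than dropped, so gaps in the clinical data stay visible during analysis.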