BLAM Data Management Standard Operating Procedure

Following these principles ensures that our data are organized, permanent, and reproducible. Please follow them for all data collected in the lab, as well as for all analyses of those data.

  • Data Management

    • We maintain a lab server, blamhub, at, for storing all data generated by the lab and for backing up analysis code. You can access blamhub from anywhere within the Hopkins network, and from outside the network via VPN.

    • All raw data should be copied to blamhub immediately after collection. If you do not have access to blamhub, please talk to Alex about creating an account. Along with the raw data, also include a backup of the source code for the experiment software used to collect the data.

    • You are encouraged to also keep a backup of your analysis code and intermediate steps in your data analysis (e.g., processed data containing reach direction, reaction time, etc.) on blamhub, on GitHub, or both. See below for details on using GitHub.

  • Publication Archiving

    • For all published/submitted work, data and accompanying analysis code should be archived in a dedicated folder within the ‘Publications’ folder on blamhub. Feel free to organize the data in whatever way you feel is suitable; however, please ensure that the folder includes:

    • A copy of the paper

    • An obvious first port of call (‘readme.txt’ or ‘runme.m’) that explains which piece of code does what.

    • Analysis code covering the full pipeline from raw data to all plots, tables, and statistics appearing in the manuscript, ideally as separate m-files (e.g., ‘makeFig1.m’, ‘Expt1_stats.m’).

    • A document detailing (de-identified) participant information (participant ID, dates run, age, gender, handedness, etc.). For patients, this should also include any clinical evaluations conducted as part of the experiment (e.g. ICARS, MOCA).
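    • As a sketch, a publication folder satisfying the checklist above might be laid out like this (all names are hypothetical):

```
Publications/
└── SmithEtAl_2025/
    ├── SmithEtAl_2025_manuscript.pdf   (copy of the paper)
    ├── readme.txt                      (first port of call)
    ├── participants.csv                (de-identified participant info)
    ├── rawdata/                        (raw data as collected)
    └── analysis/
        ├── makeFig1.m
        └── Expt1_stats.m
```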

    • Data and code associated with a publication can also be stored in the BLAM Dataverse (described below). However, they will only store 1 TB of data/code for free per collection, so if you have a large dataset, you can store the full raw data on blamhub and compactified raw data in our Dataverse.

    • Thank you for your cooperation in helping keep our science reproducible and permanent!

  • Code: GitHub

    • GitHub is a version control tool for software development and sharing that allows you to commit snapshots of your code and publish them for easy sharing. (Technically, git is the local software on your computer that performs version control, and GitHub is the site where repositories can be backed up and shared.)

    • Github is most useful for core code that might be used repeatedly in the future, for archiving finalized versions of code (e.g. analysis scripts for a finished paper), or for collaborative development.

    • Alex wrote a terrific introductory guide to using GitHub:

    • It’s best to set up your own personal GitHub account. You can then be added to the BLAM-Lab-Projects organization. [how to do this?]
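    • A minimal local workflow looks like the sketch below; the repository name, user details, and remote URL are hypothetical, and publishing requires first creating an empty repository on github.com.

```shell
#!/bin/sh
# Sketch: put an analysis folder under version control and commit a snapshot.
# Repository name, user details, and remote URL are all hypothetical.
set -e

mkdir -p reach_analysis && cd reach_analysis
git init -q
git config user.email "you@jhu.edu"   # placeholder identity
git config user.name  "Your Name"

echo "% analysis entry point" > runme.m
git add runme.m
git commit -q -m "Initial commit of analysis code"

# Publishing (after creating the repository on github.com):
#   git remote add origin git@github.com:BLAM-Lab-Projects/reach_analysis.git
#   git push -u origin main
git log --oneline
```

      Commit small, self-contained snapshots as you work; the history then doubles as a record of how the analysis evolved.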

  • Archiving data and code in the BLAM Dataverse

    • To encourage open science, Johns Hopkins maintains a data archive that allows data and code affiliated with the university to be stored and made publicly viewable online. The benefit of storing your data/code in this way is that all collections of data and code posted to the archive are citable and will have their own DOIs associated with them.

    • The BLAM Lab has a “Dataverse” in the data archive that will house “collections” of our data and code. You can see our Dataverse here:

    • Each collection in our Dataverse should only be used to store the data and final versions of the code that will reproduce analyses associated with an accepted paper. It should not be used to store code/data while you are still working on the paper. This is because once you publish a collection, you will not be able to change it.

    • If you are writing a paper and are planning to upload your code/data to the Dataverse, here are the basic steps of what you should do:

      1. Hold off on starting a new collection until you are about to submit your paper. In the meantime, clean up your code and comment it well for readability. The code you plan to upload should only be analysis code, not code used to run the experiment.

        • Hopkins will archive any collection of data and code under 1 TB for free, but if your collection is over 1 TB, then they will charge a fee. If you’re in the latter scenario, consider compactifying your files in a way that will keep the necessary raw data but will be under the size threshold.

      2. Once your analyses are finalized but before submitting the paper, email saying you would like to add a new collection to the BLAM Lab Dataverse. They will have you fill out some forms to describe your dataset. They will also give you some recommendations of changes they would like you to make to the collection. Here are some things that you will probably have to do:

        • Include README files that clearly describe the organization of the collection, how to run analyses, and other pertinent information

        • List any dependencies your code uses (MATLAB toolboxes, R packages, etc.)

        • Include any licenses for code created by other people (e.g., files from MATLAB file exchange)

      3. Once the collection has been finalized, Hopkins will send you a link via which you can privately review your collection in their archive. At this stage, the collection is not public so you can still make changes to it if you wish. Ask that they wait to publish the dataset until your manuscript has been accepted. This is so that if reviewers for the manuscript ask you to change your analyses, you can still do so.

      4. You will also receive a DOI for the collection, which you should include in the Data and Code Availability section of your manuscript. Note that the DOI will not be active until you ask Hopkins to publish the dataset. If you would like to give the manuscript reviewers access to your private collection, Hopkins will also send you a link to share with them (different from the link for your personal review, which requires a JHU account).

      5. Submit your paper. If you need to make any changes to your analyses during this process, do so.

      6. After your paper is accepted, you can ask Hopkins to upload revised versions of your analyses to the data collection. Once you’re happy with the collection, ask them to publish it.
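    • The compactifying mentioned in step 1 can be as simple as bundling and compressing the raw-data folder before upload; a sketch, with hypothetical paths and file names, is:

```shell
#!/bin/sh
# Sketch: bundle and compress a raw-data folder so the collection stays
# under the 1 TB threshold. Folder and file names are hypothetical.
set -e

mkdir -p ./collection/rawdata
echo "t,x,y" > ./collection/rawdata/subj01.csv   # placeholder raw data

# -c create, -z gzip-compress, -f write to the named archive file;
# -C changes directory first so the archive contains relative paths.
tar -czf ./collection/rawdata.tar.gz -C ./collection rawdata

# Check the compressed size before uploading.
ls -lh ./collection/rawdata.tar.gz
```

      For already-compressed formats (videos, images), gzip gains little; trimming unneeded channels or downsampling may be more effective, provided the data required to reproduce the published analyses are kept.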