Break
This is a time for you to get some refreshment and clear your mind a little in preparation for the next part of the workshop.
Workflow checklist
Overview
Teaching: 5 min
Exercises: 20 min
How to use this checklist
- Copy the checklist to a new document.
- For each item, think about what it involves, and when you will do it.
- Consider how you might remind yourself to do it, or persuade yourself to make time for it.
- As you come up with your answers, write them in the space provided, or tick one of the existing suggestions.
- The suggestions are roughly in the order of recommended best practice.
- When you’ve answered all the points in a section, tick it off.
Checklist
Licensing [ ]
- I will find out who will be the rights holder for the software
- When I apply for funding
- At the beginning of the project
- When I put the code on a repository
- When I make the repository public
- When _____ gets back to me, and I’ll chase them if they haven’t by ______.
- ___________________________.
- I will decide on what license to apply to the software
- When I’ve found out who the rights holder is
- When I make the repository public
- ___________________________.
- I will review the license applied to the software
- When I add new dependencies
- ___________________________.
Documentation [ ]
- I will write instructions for using the code
- When I start to write code
- When I put my code on a repository
- ___________________________.
- I will document the code using
- Recognised standard tools (e.g. Doxygen)
- Separate documents
- Inline comments
- ___________________________.
- I will update the documentation
- As I make changes to the code
- At a set time each day/week/month/_____
- Before I publish a new version
- ___________________________.
- I will publish the documentation
- Automatically using continuous integration
- Whenever I publish a new version of the software
- ___________________________.
Code [ ]
- I will make the code readable
- As I write it
- Once things are working like they should
- Before I publish a new version
- ___________________________.
- I will organise my files
- Using a well-planned and clear directory structure
- ___________________________.
Dependencies [ ]
- I will make sure my dependencies are listed
- Using a full environment management system (e.g. Docker, Apptainer)
- Using automated dependency management tools (e.g. conda, renv)
- By automatically documenting them when the code runs
- By including them in the documentation
- ___________________________.
- I will review my dependency list
- Automatically using dependency management tools
- When I add or remove a dependency
- Before I publish a new version
- ___________________________.
Tests [ ]
- I will test my software
- Using a testing framework (e.g. Cypress, pytest, testthat)
- By having a set of things I do every time
- ___________________________.
- I will run my software tests
- Automatically using continuous integration
- Before I push to the repository
- Before I publish a new version
- ___________________________.
Repository [ ]
- I will upload my code to
- A dedicated code repository (e.g. GitHub)
- A well-resourced repository (e.g. OSF, FigShare, Zenodo, university repositories)
- A personal website
- ___________________________.
- I will make the code publicly accessible
- By enabling the appropriate permissions in the repository
- By publishing specific versions
- ___________________________.
Publishing [ ]
- I will publish my code by
- Issuing releases on a code repository
- Ensuring the code repository is publicly accessible
- ___________________________.
- I will do this
- Whenever milestones are reached
- Whenever I publish a related output (e.g. research paper)
- Automatically whenever I push to the repository
- ___________________________.
Referencing [ ]
- I will make my code citable by
- Publishing specific releases with a DOI (e.g. via Zenodo)
- Writing a technical/methods paper and submitting it to a journal
- Citing specific releases directly
- Citing the repository with a time/version stamp
- Citing the repository only
- ___________________________.
- I will do this
- When I publish the code
- When I publish a related output
- ___________________________.
Collaboration [ ]
- I will make it easy for others to collaborate on my code by (tick all that apply)
- Using collaboration-friendly services like GitHub
- Writing a welcome guide for new contributors
- Having easy templates for users to submit bug reports and feature requests
- Ensuring a welcoming attitude for contributors
- ___________________________.
- ___________________________.
- ___________________________.
- ___________________________.
Rights and Licensing
Overview
Teaching: 10 min
Exercises: 10 min
Questions
What is licensing?
Do I need to license my work?
Who owns the rights to my work?
Objectives
Understand what parts of your university’s rights and licensing policy apply to you.
Know where to get help.
What is licensing?
When software is created it is automatically copyrighted. The owner of the copyright (the ‘rights holder’) can do things with it that other people can’t. A license lets people do things that they couldn’t otherwise do.
Normally, we can’t legally copy someone else’s intellectual property, for example. We normally can’t sell it. And we definitely can’t sell it and claim that we made it ourselves. These are examples of rights that we would infringe upon without an appropriate license. Most software licenses you’re familiar with will let you copy software, at least once to put it on your computer. Other licenses will let you make more copies, and distribute them – some will even let you sell the software or versions of it that you extend. Certain very ‘broad’ licenses waive all rights (i.e. you still have those rights but you’ve agreed that nothing people do will infringe them).
Do I really need to license stuff?
Yes. Without a license, no one can legally copy, modify, or share your code or software. And if you have collaborators, ‘no one’ includes you, because you have no license to use the parts they wrote.
What kind of licenses are there?
The major kinds of licenses we’re concerned about are open source licenses. These licenses embrace four freedoms:
- The freedom to use
- The freedom to modify
- The freedom to distribute
- The freedom to distribute modifications
You can get a good idea of which open source license is right for you by looking at https://choosealicense.com/.
Of course, you can choose any of a large number of different licenses (or write your own), but established licenses give people confidence to run and build on your software. The more custom a license is, the more people will worry about unintended effects.
Who decides what license to use?
The license must be decided by the rights holder. If you’re writing software on your own at home, this is almost certainly you. There may, however, be other considerations. While it’s not a strict hierarchy, the rough order to check is:
- Your software might include components that place restrictions on modifications
- E.g. ‘Copyleft’ licenses mean you can’t apply a closed license to software that includes copylefted work
- Whoever funds your research may have rules about who owns what you produce and how it’s licensed
- Your employer (i.e. university) will probably be the rights holder for software you develop as part of your job
- Universities are usually happy to grant you the rights to software you’ve written provided you can show that it has no realistic commercial development opportunities
Finding out
10 min
Take ten minutes now to find out who is the rights holder for your current project. If you don’t have a current software/code-writing project, who would hold the rights if you did?
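- If you have already written software, does it have dependencies that have copyleft licensing?
- If so, those licenses may restrict which license you can apply to your own software.
- If a dependency has no license at all, it’s possible you may be forbidden from building on it.
- This is unlikely, but it can happen.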
- Check your funding arrangement if you have one – it might say who is the rights holder for work you produce, or perhaps state which licenses you can use.
- Check your university policy.
- This may be different for students and faculty.
- If none of these say anything relevant, then you are most likely the rights holder and you can use any license you like. If you don’t care, use a public domain license like the Unlicense.
Totally lost?
Take this time to send an email to your supervisor, librarian, administrator, or someone else who might be able to answer the question “who owns code/software that I write in my research?”
By now you should know who will be the rights holder for code and software you write. You should know how to ask them to let you choose an appropriate license for your code or software. If you run into difficulties, you should know where to get help.
Version control
Overview
Teaching: 10 min
Exercises: 20 min
We’ll start by exploring how version control can be used to keep track of what one person did and when. Even if you aren’t collaborating with other people, automated version control is much better than this situation:
“Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com
We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes, Google Docs’ version history, or LibreOffice’s Recording and Displaying Changes.
Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.
Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.
Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.
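The idea of playing back independent sets of changes on one base document can be sketched in a few lines of Python. This is a toy illustration only, not how Git or any real version control system stores data: the names `apply_changes` and `merge` are hypothetical, a document is assumed to be a list of lines, and a change set is assumed to be a mapping from line index to replacement text.

```python
def apply_changes(base, changes):
    """Return a new document with each (line index -> new text) change applied."""
    doc = list(base)  # copy, so the base version is never modified
    for index, text in changes.items():
        doc[index] = text
    return doc


def merge(changes_a, changes_b):
    """Combine two change sets; raise if both edit the same line differently."""
    conflicts = sorted(
        i for i in changes_a.keys() & changes_b.keys()
        if changes_a[i] != changes_b[i]
    )
    if conflicts:
        raise ValueError(f"conflict on line(s): {conflicts}")
    return {**changes_a, **changes_b}


# Hypothetical example: two users edit different lines of the same base document.
base = ["Title", "Introduction", "Methods"]
alice = {1: "Introduction (revised by Alice)"}
bob = {2: "Methods (revised by Bob)"}

merged_doc = apply_changes(base, merge(alice, bob))
print(merged_doc)
```

Because the two users touched different lines, both change sets apply cleanly. If both had edited line 1 with different text, `merge` would raise an error, which is the toy equivalent of a conflict that a person has to resolve.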
A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
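To make the vocabulary concrete, here is a minimal sketch of a commit and a repository as plain Python data structures. It is an assumption-laden toy model, not Git's actual design: the `Commit` and `Repository` classes, their fields, and the `checkout` method are all hypothetical names chosen for illustration, and each change is simplified to "replace the line at this index".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Commit:
    """One recorded set of changes plus metadata about who made it and why."""
    author: str
    message: str
    changes: dict  # line index -> new text (simplified model of a change)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Repository:
    """A base document plus the complete history of commits made to it."""
    base: list
    history: list = field(default_factory=list)

    def commit(self, author, message, changes):
        self.history.append(Commit(author, message, changes))

    def checkout(self, version):
        """Rebuild the document as it was after the first `version` commits."""
        doc = list(self.base)
        for recorded in self.history[:version]:
            for index, text in recorded.changes.items():
                doc[index] = text
        return doc


# Hypothetical usage: record two changes, then rewind to the earlier version.
repo = Repository(base=["First draft of the methods"])
repo.commit("Alice", "Tighten the methods section", {0: "Clear methods"})
repo.commit("Alice", "Rework the methods again", {0: "Confusing methods"})

latest = repo.checkout(len(repo.history))
earlier = repo.checkout(1)
print(latest, earlier)
```

Because every commit is kept along with its author, message, and timestamp, any earlier version can be rebuilt by replaying only part of the history.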
The Long History of Version Control Systems
Automated version control systems are nothing new. Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities. More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.
Paper Writing
Imagine you drafted an excellent paragraph for a paper you are writing, but later ruin it. How would you retrieve the excellent version of that paragraph? Is it even possible?
Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper? If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the Track Changes option? Do you have a history of those changes?
Solution
Recovering the excellent version is only possible if you created a copy of the old version of the paper. The danger of losing good versions often leads to the problematic workflow illustrated in the PhD Comics cartoon at the top of this page.
Collaborative writing with traditional word processors is cumbersome. Either every collaborator has to work on a document sequentially (slowing down the process of writing), or you have to send out a version to all collaborators and manually merge their comments into your document. The ‘track changes’ or ‘record changes’ option can highlight changes for you and simplifies merging, but as soon as you accept changes you will lose their history. You will then no longer know who suggested that change, why it was suggested, or when it was merged into the rest of the document. Even online word processors like Google Docs or Microsoft Office Online do not fully resolve these problems.
[modified from https://github.com/swcarpentry/git-novice/blob/gh-pages/_episodes/01-basics.md]
Why share?
Overview
Teaching: 0 min
Exercises: 30 min
Enumerating the benefits
5 min
In small groups, try to think of reasons people share code and software. Using the collaborative notes document, jot down any ideas you come up with, so that we have an overall list at the end.
Ideas
Some reasons that are commonly cited are:
- Shared software can be built on by others
- Shared code can be checked by others
- Shared software can be maintained by others
- Extensions to some software have to be shared due to copyleft licensing
- It’s possible to audit and verify shared software
- Sharing software helps you hold yourself to higher coding standards
Considering the concerns
5 min
In the same groups, try to think of reasons people might hesitate to share code or software. Again, use the collaborative notes document to jot down any ideas you come up with.
Ideas
Some reasons that are commonly cited are:
- If I share my code, people will find mistakes in it, and that would be embarrassing
- My software is badly built, or my code is messy, and it’s embarrassing
- It takes time to learn how to share and do the sharing – time I don’t have
- I don’t want to maintain my code or software, and I’d feel obliged to if I shared it
- My software won’t be as impressive if people can see how it works
- People could adapt my software for purposes I don’t agree with
- I worked hard on this code and I don’t want other people to benefit from my work
Your experiences
10 min
Now that we have listed some benefits and concerns of sharing, go around your small group and have each group member identify two items that they have experience with. You can choose a benefit and a concern, two benefits, or even two concerns. For each one, share your experience of it with your colleagues. If you have time, add a brief note about your experience to the collaborative notes document, but the focus should be on sharing with your colleagues and listening to what they have to say.
Addressing concerns
10 min
We have a list of concerns that people in your workshops might have. It’s helpful to be able to provide some perspective on those concerns – we want to acknowledge that sharing can be frightening or difficult, but highlight that it is worth doing anyway!
Again, in your group, choose two of the concerns. If they are concerns you have or have had yourself, that’s ideal, but they could also be ones that you just find interesting. Go around the group, discussing each person’s concerns. Try to think of reasons why the concern should not hold you back from sharing your code and software. If you can’t think of anything, try asking the larger group or one of the workshop instructors or helpers.
Remember, it’s not wrong to have concerns! We believe that sharing data and code is important and that the benefits outweigh the concerns, and we’d like to help you understand why and give you any encouragement you need.
Key Points
People share code and software for many reasons
Shared software can be extended
Shared software can be evaluated
Bugs can be found and fixed