Break

This is a time for you to get some refreshment and clear your mind a little in preparation for the next part of the workshop.


Workflow checklist

Overview

Teaching: 5 min
Exercises: 20 min

How to use this checklist

Checklist

Licensing [ ]

Documentation [ ]

Code [ ]

Dependencies [ ]

Tests [ ]

Repository [ ]

Publishing [ ]

Referencing [ ]

Collaboration [ ]


Rights and Licensing

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What is licensing?

  • Do I need to license my work?

  • Who owns the rights to my work?

Objectives
  • Understand what parts of your university’s rights and licensing policy apply to you.

  • Know where to get help.

What is licensing?

When software is created it is automatically copyrighted. The owner of the copyright (the ‘rights holder’) can do things with it that other people can’t. A license lets people do things that they couldn’t otherwise do.

Normally, we can’t legally copy someone else’s intellectual property, for example. We normally can’t sell it. And we definitely can’t sell it and claim that we made it ourselves. These are examples of rights that we would infringe upon without an appropriate license. Most software licenses you’re familiar with will let you copy software, at least once to put it on your computer. Other licenses will let you make more copies, and distribute them – some will even let you sell the software or versions of it that you extend. Certain very ‘broad’ licenses waive all rights (i.e. you still have those rights but you’ve agreed that nothing people do will infringe them).

Do I really need to license stuff?

Yes. If you don’t, no one can legally do anything with your code or software. And, if you have collaborators, that “no one” includes you.

What kind of licenses are there?

The major kinds of licenses we’re concerned about are open source licenses. These licenses embrace four freedoms:

  1. The freedom to use
  2. The freedom to modify
  3. The freedom to distribute
  4. The freedom to distribute modifications

You can get a good idea of which open source license is right for you by looking at https://choosealicense.com/.

Of course, you can choose any of a large number of different licenses (or write your own), but established licenses give people confidence to run and build on your software. The more custom a license is, the more people will worry about unintended effects.

Who decides what license to use?

The license must be decided by the rights holder. If you’re writing software on your own at home, this is almost certainly you. There may, however, be other considerations. While it’s not a strict hierarchy, the exercise below lists the rough order in which to check them.

Finding out 10 min

Take ten minutes now to find out who is the rights holder for your current project. If you don’t have a current software/code-writing project, who would hold the rights if you did?

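  1. If you have already written software, does it have dependencies that have copyleft licensing?
    • If so, the copyleft terms may restrict which licenses you can choose for your own work.
    • If a dependency has no license at all, it’s possible you may be forbidden from building on it. This is unlikely, but it can happen.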
  2. Check your funding arrangement if you have one – it might say who is the rights holder for work you produce, or perhaps state which licenses you can use.
  3. Check your university policy.
    • This may be different for students and faculty.
  4. If none of these say anything relevant, then you are the rights holder and you can use any license you like. If you have no preference, you can dedicate the work to the public domain with something like the Unlicense.
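
One possible way to start on the dependency question in step 1 is to list the licenses that your installed Python packages declare. The sketch below assumes a Python project. The function names and the `COPYLEFT_MARKERS` keyword list are made up for this illustration and are only a rough heuristic. A keyword match is a prompt to read the actual license, not a legal answer.

```python
from importlib import metadata
from typing import Dict, List

# Hypothetical, incomplete keyword list: a rough hint, not a legal test.
COPYLEFT_MARKERS = ("GPL", "General Public License", "MPL", "Mozilla Public License")


def installed_licenses() -> Dict[str, str]:
    """Map each installed package name to the license it declares, if any."""
    licenses = {}
    for dist in metadata.distributions():
        info = dist.metadata
        name = info.get("Name")
        if not name:
            continue  # skip installations with incomplete metadata
        declared = info.get("License-Expression") or info.get("License")
        licenses[name] = declared or "not declared"
    return licenses


def possibly_copyleft(licenses: Dict[str, str]) -> List[str]:
    """Return package names whose declared license mentions a copyleft marker."""
    return sorted(
        name
        for name, text in licenses.items()
        if any(marker in text for marker in COPYLEFT_MARKERS)
    )
```

Calling `possibly_copyleft(installed_licenses())` in your project’s environment gives a list of packages to look into by hand. Packages reported as “not declared” also deserve a closer look.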

Totally lost?

Take this time to send an email to your supervisor, librarian, administrator, or someone else who might be able to answer the question “who owns code/software that I write in my research?”

By now you should know who will be the rights holder for code and software you write. You should know how to ask them to let you choose an appropriate license for your code or software. If you run into difficulties, you should know where to get help.


Version control

Overview

Teaching: 10 min
Exercises: 20 min

We’ll start by exploring how version control can be used to keep track of what one person did and when. Even if you aren’t collaborating with other people, automated version control is much better than this situation:

"Piled Higher and Deeper" by Jorge Cham, http://www.phdcomics.com


We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes, Google Docs’ version history, or LibreOffice’s Recording and Displaying Changes.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

Changes Are Saved Sequentially
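
The idea of a base document plus a recorded sequence of changes can be sketched in a few lines of Python. This is a toy model for illustration only. The helper names (`append_line`, `replace_line`, `play_back`) are invented here, and real version control systems store their history differently.

```python
from typing import Callable, List, Optional

# A "change" is modelled as a function from the old lines to the new lines.
Change = Callable[[List[str]], List[str]]


def append_line(text: str) -> Change:
    """Build a change that adds one line at the end of the document."""
    return lambda lines: lines + [text]


def replace_line(index: int, text: str) -> Change:
    """Build a change that overwrites the line at `index`."""
    def apply(lines: List[str]) -> List[str]:
        updated = list(lines)
        updated[index] = text
        return updated
    return apply


def play_back(base: List[str], changes: List[Change],
              upto: Optional[int] = None) -> List[str]:
    """Replay the first `upto` changes (all of them if None) on the base."""
    document = list(base)
    for change in changes[:upto]:
        document = change(document)
    return document


base = ["Title"]
history = [
    append_line("First draft of the intro."),
    replace_line(1, "Polished intro."),
    append_line("Conclusion."),
]

print(play_back(base, history))     # ['Title', 'Polished intro.', 'Conclusion.']
print(play_back(base, history, 1))  # ['Title', 'First draft of the intro.']
```

Because the base and the changes are kept separately, "rewinding" is just replaying fewer changes.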

Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.

Different Versions Can be Saved

Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.

Multiple Versions Can be Merged
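
Merging and conflicts can be illustrated with a deliberately simplified model, in which each user’s set of changes is a mapping from line number to replacement text. The `merge` function and `MergeConflict` exception are hypothetical names for this sketch. Real tools such as Git compare regions of text rather than single numbered lines.

```python
from typing import Dict, List


class MergeConflict(Exception):
    """Raised when two change sets edit the same line differently."""


def merge(base: List[str], ours: Dict[int, str],
          theirs: Dict[int, str]) -> List[str]:
    """Apply two independent change sets to the same base document."""
    # A conflict exists only where both sides touch a line and disagree.
    clashes = sorted(i for i in ours.keys() & theirs.keys()
                     if ours[i] != theirs[i])
    if clashes:
        raise MergeConflict(f"both change sets edit line(s) {clashes}")
    merged = list(base)
    for index, text in {**ours, **theirs}.items():
        merged[index] = text
    return merged


base = ["Introduction", "Methods", "Results"]
alice = {0: "A better introduction"}
bob = {2: "Updated results"}

print(merge(base, alice, bob))
# ['A better introduction', 'Methods', 'Updated results']
```

If both users had edited line 0 with different text, `merge` would raise `MergeConflict` instead of silently picking one version, which is roughly the point at which a version control system asks a person to decide.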

A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
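
To make the terms concrete, here is a minimal sketch of commits and a repository as plain Python data structures. The `Commit` and `Repository` classes and their fields are assumptions chosen for illustration. They are not how Git or any other system is actually implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    """One recorded set of changes plus metadata about it."""
    author: str
    message: str
    timestamp: datetime
    changes: Dict[str, str]  # file name -> new content of that file


@dataclass
class Repository:
    """The complete, ordered history of commits for a project."""
    commits: List[Commit] = field(default_factory=list)

    def commit(self, author: str, message: str,
               changes: Dict[str, str]) -> Commit:
        record = Commit(author, message, datetime.now(timezone.utc),
                        dict(changes))
        self.commits.append(record)
        return record

    def checkout(self, version: int) -> Dict[str, str]:
        """Rebuild the files as they were after the first `version` commits."""
        files: Dict[str, str] = {}
        for record in self.commits[:version]:
            files.update(record.changes)
        return files

    def log(self) -> List[str]:
        return [f"{c.author}: {c.message}" for c in self.commits]


repo = Repository()
repo.commit("Ada", "Start paper", {"paper.txt": "Draft intro"})
repo.commit("Grace", "Add data notes", {"notes.txt": "Sample size: 40"})
repo.commit("Ada", "Rewrite intro", {"paper.txt": "Polished intro"})

print(repo.log())
# ['Ada: Start paper', 'Grace: Add data notes', 'Ada: Rewrite intro']
print(repo.checkout(2))
# {'paper.txt': 'Draft intro', 'notes.txt': 'Sample size: 40'}
```

The metadata is what lets you answer who changed something, when, and why, while `checkout` shows how any earlier version can be rebuilt from the history.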

The Long History of Version Control Systems

Automated version control systems are nothing new. Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities. More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

Paper Writing

  • Imagine you drafted an excellent paragraph for a paper you are writing, but later ruined it. How would you retrieve the excellent version of that paragraph? Is it even possible?

  • Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper? If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the Track Changes option? Do you have a history of those changes?

Solution

  • Recovering the excellent version is only possible if you created a copy of the old version of the paper. The danger of losing good versions often leads to the problematic workflow illustrated in the PhD Comics cartoon at the top of this page.

  • Collaborative writing with traditional word processors is cumbersome. Either every collaborator has to work on a document sequentially (slowing down the process of writing), or you have to send out a version to all collaborators and manually merge their comments into your document. The ‘track changes’ or ‘record changes’ option can highlight changes for you and simplifies merging, but as soon as you accept changes you will lose their history. You will then no longer know who suggested that change, why it was suggested, or when it was merged into the rest of the document. Even online word processors like Google Docs or Microsoft Office Online do not fully resolve these problems.

[modified from https://github.com/swcarpentry/git-novice/blob/gh-pages/_episodes/01-basics.md]


Why share?

Overview

Teaching: 0 min
Exercises: 30 min

Enumerating the benefits 5 min

In small groups, try to think of reasons people share code and software. Using the collaborative notes document, jot down any ideas you come up with, so that we have an overall list at the end.

Ideas

Some reasons that are commonly cited are:

  • Shared software can be built on by others
  • Shared code can be checked by others
  • Shared software can be maintained by others
  • Extensions to some software have to be shared due to copyleft licensing
  • It’s possible to audit and verify shared software
  • Sharing software helps you hold yourself to higher coding standards

Considering the concerns 5 min

In the same groups, try to think of reasons people might hesitate to share code or software. Again, use the collaborative notes document to jot down any ideas you come up with.

Ideas

Some reasons that are commonly cited are:

  • If I share my code, people will find mistakes in it, and that would be embarrassing
  • My software is badly built, or my code is messy, and it’s embarrassing
  • It takes time to learn how to share and do the sharing – time I don’t have
  • I don’t want to maintain my code or software, and I’d feel obliged to if I shared it
  • My software won’t be as impressive if people can see how it works
  • People could use or adapt my software for purposes I don’t agree with
  • I worked hard on this code and I don’t want other people to benefit from my work

Your experiences 10 min

Now that we have listed some benefits and concerns of sharing, go around your small group and have each group member identify two items that they have experience with. You can choose a benefit and a concern, two benefits, or even two concerns. For each one, share your experience of it with your colleagues. If you have time, add a brief note about your experience to the collaborative notes document, but the focus should be on sharing with your colleagues and listening to what they have to say.

Addressing concerns 10 min

We have a list of concerns that people in your workshops might have. It’s helpful to be able to provide some perspective on those concerns – we want to acknowledge that sharing can be frightening or difficult, but highlight that it is worth doing anyway!

Again, in your group, choose two of the concerns. If they are concerns you have or have had yourself, that’s ideal, but they could also be ones that you just find interesting. Go around the group, discussing each person’s concerns. Try to think of reasons why the concern should not hold you back from sharing your code and software. If you can’t think of anything, try asking the larger group or one of the workshop instructors or helpers.

Remember, it’s not wrong to have concerns! We believe that sharing data and code is important and that the benefits outweigh the concerns, and we’d like to help you understand why and give you any encouragement you need.

Key Points

  • People share code and software for many reasons

  • Shared software can be extended

  • Shared software can be evaluated

  • Bugs can be found and fixed