May 18, 2023
R is maintained by the R Core Team
Members of the R Community can contribute in various ways:
In this demo we’ll focus on bug fixing!
For code bugs, it helps to have a minimal reproducible example that demonstrates the bug, using only core R packages.
Check the issue really is a bug in base R
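One way to check that a reprex relies only on core packages is to ask R which installed packages have priority "base" and which namespace a function comes from. This is a minimal sketch rather than an official check; the variable names are illustrative.

```r
# Packages shipped as part of base R have priority "base"
base_pkgs <- rownames(installed.packages(priority = "base"))

# Find the namespace in which each function is defined
where_zapsmall <- environmentName(environment(zapsmall))
where_factanal <- environmentName(environment(factanal))

# TRUE if both functions are defined in base-priority packages
all(c(where_zapsmall, where_factanal) %in% base_pkgs)
```

If a function used in the reprex comes from a contributed package, the issue may belong with that package's maintainer rather than on R's Bugzilla.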
https://bugs.r-project.org/show_bug.cgi?id=15971
na1.csv
a, b, c
1, "b", 1
2, "", 2
, "b", 3
4, , 4
5, "NA", 5
na2.csv
b, c
"b", 1
"", 2
"b", 3
, 4
"NA", 5
Error in Ops.factor(df1$b, df2$b) : level sets of factors are different
For confirmed bugs, we need to check if they are still an issue in the current R development version.
Once the bug is confirmed in the development version of R, the bug should be analysed
Once the bug is fully understood, there should be a discussion about how to fix the code/documentation.
If a member of R Core agrees how to fix a bug, but does not commit to fixing it themselves, you may propose a fix
You do not need an account to browse bugs on Bugzilla.
Good: a bug report where the next step is clear.
Even better: an R Core member supports the next step in a comment.
When reviewing bug reports, it is helpful to know who is on R Core
Main R Core members active on Bugzilla:
We will use Vevox to find out what you think about example bug reports.
Open vevox.app in a browser.
Enter the code: 116-836-295
(You don’t need to type the dashes).
https://bugs.r-project.org/show_bug.cgi?id=18199
Summary: “zapsmall is wrong when vector has Inf”
Bug report:
If a vector contains Inf, all the values but infinite become zero.
zapsmall(c(0.1, 0.01)) # correct
[1] 0.10 0.01
zapsmall(c(0.1, 0.01, Inf)) # incorrect
[1] 0 0 Inf
The report has been open for 13.5 hours without comment.
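One way to start analysing this report is to look at the scale that the rounding is relative to. The sketch below assumes, as the zapsmall help page describes, that values are zapped relative to the largest absolute value, so an Inf entry pushes the number of decimal places down to zero. zap_finite() is a hypothetical helper written for illustration only; it is not part of base R and not a fix agreed with R Core.

```r
x <- c(0.1, 0.01, Inf)

# With Inf present the largest magnitude is Inf, so log10() of it is Inf
# and "digits minus that" bottoms out at zero decimal places
mx <- max(abs(x))
max(0L, 7 - ceiling(log10(mx)))

# Hypothetical alternative: scale by the largest *finite* magnitude
zap_finite <- function(x, digits = getOption("digits")) {
  finite <- is.finite(x)
  if (!any(finite)) return(x)
  mx <- max(abs(x[finite]))
  if (mx <= 0) return(x)
  round(x, digits = max(0L, digits - as.numeric(ceiling(log10(mx)))))
}

zap_finite(x, digits = 7)
zap_finite(c(1, 1e-20, Inf), digits = 7)
```

Whether infinite values should be ignored in this way is exactly the kind of question to raise in a comment before proposing a patch.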
https://bugs.r-project.org/show_bug.cgi?id=17616
Summary: Anomaly with contrast functions
Report:
If you supply a contrast function to a factor, results depend on whether you pass the name or the actual function. This applies to C(), contrasts()<-, as well as lm(…., contrasts=list()).
Call:
lm(formula = uptake ~ C(Treatment, "contr.treatment"), data = CO2)
Coefficients:
(Intercept)
30.64
C(Treatment, "contr.treatment")chilled
-6.86
A contributor has commented to confirm the bug.
The last comment was a month ago.
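To check a report like this yourself, a small comparison can be run in the current version of R. The sketch below fits the same model twice, once passing the contrast by name and once as a function object; the object names fit_name and fit_fun are illustrative. No expected coefficient labels are shown because they may depend on the R version.

```r
# Same contrast coding, supplied by name and as a function object
fit_name <- lm(uptake ~ C(Treatment, "contr.treatment"), data = CO2)
fit_fun  <- lm(uptake ~ C(Treatment, contr.treatment), data = CO2)

# Compare how the coefficients are labelled in each case
names(coef(fit_name))
names(coef(fit_fun))

# The fitted values should agree, since the coding is the same
all.equal(unname(fitted(fit_name)), unname(fitted(fit_fun)))
```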
https://bugs.r-project.org/show_bug.cgi?id=16305
Summary: Seeking more consistent package descriptions and citations
Report:
The following command provides a convenient way of citing R packages
[1] "Warnes GR, Bolker B, Gorjanc G, Grothendieck G, Korosec A, Lumley T, MacQueen D, Magnusson A, Rogers J and others (2014).\\emph{gdata: Various R programming tools for data manipulation}.R package version 2.13.3, \\url{http://CRAN.R-project.org/package=gdata}."
However, some packages cannot be formatted this way [2 examples]
Would it be possible to improve consistency in package descriptions and citations?
Bug has been open for 7 years 9 months
https://bugs.r-project.org/show_bug.cgi?id=18362
Summary: head(letters, 1:2) should give better error message
Report from R Core member:
The error messages from checkHT() are really really not nice because they mention checkHT() instead of its caller’s call, e.g.,
head(letters, 1:2)
Error in checkHT(n, dx <- dim(x)) : invalid 'n' - must have length one when dim(x) is NULL, got 2
The reporter suggests using errorCondition() with its own class and points to some existing examples for reference.
A contributor volunteered to work on this, but it is nearly 1 year since they volunteered.
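The general technique mentioned in this report can be sketched as follows: a helper signals an error built with errorCondition(), giving it a class of its own and attaching the caller's call instead of the helper's. The names check_n(), my_head() and the class "invalidNError" are hypothetical and only illustrate the idea; this is not the code used in R.

```r
# Hypothetical checking helper, called from another function
check_n <- function(n) {
  caller <- sys.call(-1)  # the call of the function that called check_n()
  if (length(n) != 1L) {
    stop(errorCondition(
      sprintf("invalid 'n' - must have length one, got %d", length(n)),
      class = "invalidNError",  # hypothetical class name
      call = caller
    ))
  }
  invisible(n)
}

# Hypothetical user-facing function that relies on the helper
my_head <- function(x, n = 6L) {
  check_n(n)
  x[seq_len(min(n, length(x)))]
}

# Capture the condition to inspect its class and call
err <- tryCatch(my_head(letters, 1:2), error = function(e) e)
inherits(err, "invalidNError")
conditionCall(err)
```

Because the condition has its own class, calling code can handle it specifically with tryCatch(), and the message reports the user-facing call.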
Get help finding a good first issue:
Use the R Contributors Slack #work-out-loud channel or Office Hours to get help/feedback before posting on Bugzilla.
If we want to post anything on Bugzilla (make a bug report or comment on one), we need to get an account.
https://bugs.r-project.org/show_bug.cgi?id=17863
Good
Call:
factanal(x = mtcars[, 1:4], factors = 1)
Uniquenesses:
mpg cyl disp hp
0.199 0.078 0.120 0.261
Loadings:
Factor1
mpg -0.895
cyl 0.960
disp 0.938
hp 0.859
Factor1
SS loadings 3.342
Proportion Var 0.835
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 0.5 on 2 degrees of freedom.
The p-value is 0.777
Bad
Call:
factanal(x = mtcars[, 1:4], factors = 1)
Uniquenesses:
mpg cyl disp hp
0.199 0.078 0.120 0.261
Loadings:
[1] -0.895 0.960 0.938 0.859
Factor1
SS loadings 3.342
Proportion Var 0.835
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 0.5 on 2 degrees of freedom.
The p-value is 0.777
A single object matching 'print.factanal' was found
It was found in the following places
registered S3 method for print from namespace stats
namespace:stats
with value
function (x, digits = 3, ...)
{
    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")
    cat("Uniquenesses:\n")
    print(round(x$uniquenesses, digits), ...)
    print(x$loadings, digits = digits, ...)
    if (!is.null(x$rotmat)) {
        tmat <- solve(x$rotmat)
        R <- tmat %*% t(tmat)
        factors <- x$factors
        rownames(R) <- colnames(R) <- paste0("Factor", 1:factors)
        if (TRUE != all.equal(c(R), c(diag(factors)))) {
            cat("\nFactor Correlations:\n")
            print(R, digits = digits, ...)
        }
    }
    if (!is.null(x$STATISTIC)) {
        factors <- x$factors
        cat("\nTest of the hypothesis that", factors, if (factors ==
            1)
            "factor is"
        else "factors are", "sufficient.\n")
        cat("The chi square statistic is", round(x$STATISTIC,
            2), "on", x$dof, if (x$dof == 1)
            "degree"
        else "degrees", "of freedom.\nThe p-value is", signif(x$PVAL,
            3), "\n")
    }
    else {
        cat(paste("\nThe degrees of freedom for the model is",
            x$dof, "and the fit was", round(x$criteria["objective"],
                4), "\n"))
    }
    invisible(x)
}
<bytecode: 0x10c0156d0>
<environment: namespace:stats>
A single object matching 'print.loadings' was found
It was found in the following places
registered S3 method for print from namespace stats
namespace:stats
with value
function (x, digits = 3L, cutoff = 0.1, sort = FALSE, ...)
{
    Lambda <- unclass(x)
    p <- nrow(Lambda)
    factors <- ncol(Lambda)
    if (sort) {
        mx <- max.col(abs(Lambda))
        ind <- cbind(1L:p, mx)
        mx[abs(Lambda[ind]) < 0.5] <- factors + 1
        Lambda <- Lambda[order(mx, 1L:p), ]
    }
    cat("\nLoadings:\n")
    fx <- setNames(format(round(Lambda, digits)), NULL)
    nc <- nchar(fx[1L], type = "c")
    fx[abs(Lambda) < cutoff] <- strrep(" ", nc)
    print(fx, quote = FALSE, ...)
    vx <- colSums(x^2)
    varex <- rbind(`SS loadings` = vx)
    if (is.null(attr(x, "covariance"))) {
        varex <- rbind(varex, `Proportion Var` = vx/p)
        if (factors > 1)
            varex <- rbind(varex, `Cumulative Var` = cumsum(vx/p))
    }
    cat("\n")
    print(round(varex, digits))
    invisible(x)
}
<bytecode: 0x10c031968>
<environment: namespace:stats>
Call:
factanal(x = mtcars[, 1:4], factors = 1)
Uniquenesses:
mpg cyl disp hp
0.199 0.078 0.120 0.261
debugging in: print.loadings(x$loadings, digits = digits, ...)
debug: {
Lambda <- unclass(x)
p <- nrow(Lambda)
factors <- ncol(Lambda)
if (sort) {
mx <- max.col(abs(Lambda))
ind <- cbind(1L:p, mx)
mx[abs(Lambda[ind]) < 0.5] <- factors + 1
Lambda <- Lambda[order(mx, 1L:p), ]
}
cat("\nLoadings:\n")
fx <- setNames(format(round(Lambda, digits)), NULL)
nc <- nchar(fx[1L], type = "c")
fx[abs(Lambda) < cutoff] <- strrep(" ", nc)
print(fx, quote = FALSE, ...)
vx <- colSums(x^2)
varex <- rbind(`SS loadings` = vx)
if (is.null(attr(x, "covariance"))) {
varex <- rbind(varex, `Proportion Var` = vx/p)
if (factors > 1)
varex <- rbind(varex, `Cumulative Var` = cumsum(vx/p))
}
cat("\n")
print(round(varex, digits))
invisible(x)
}
Browse[2]>
Browse[2]>
debug: Lambda <- unclass(x)
Browse[2]>
debug: p <- nrow(Lambda)
Browse[2]>
debug: factors <- ncol(Lambda)
Browse[2]>
debug: if (sort) {
mx <- max.col(abs(Lambda))
ind <- cbind(1L:p, mx)
mx[abs(Lambda[ind]) < 0.5] <- factors + 1
Lambda <- Lambda[order(mx, 1L:p), ]
}
Browse[2]>
debug: mx <- max.col(abs(Lambda))
Browse[2]>
debug: ind <- cbind(1L:p, mx)
Browse[2]>
debug: mx[abs(Lambda[ind]) < 0.5] <- factors + 1
Browse[2]>
debug: Lambda <- Lambda[order(mx, 1L:p), ]
Browse[2]> Lambda[order(mx, 1L:p), ]
mpg cyl disp hp
-0.8947285 0.9603623 0.9381177 0.8594404
Browse[2]> Lambda
Factor1
mpg -0.8947285
cyl 0.9603623
disp 0.9381177
hp 0.8594404
Browse[2]> Lambda[order(mx, 1L:p), , drop = FALSE]
Factor1
mpg -0.8947285
cyl 0.9603623
disp 0.9381177
hp 0.8594404
Browse[2]>
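The function listings above have the form printed by getAnywhere(), and the step-through is what a debugging session on print.loadings looks like. As a sketch, an unexported S3 method can also be retrieved programmatically with getS3method() to inspect its arguments; the variable name pl is illustrative.

```r
# Retrieve the registered S3 print method for objects of class "loadings"
pl <- getS3method("print", "loadings")

# Inspect its arguments and the namespace it is defined in
names(formals(pl))
environmentName(environment(pl))

# In an interactive session, debugonce(stats:::print.loadings) arranges
# a step-through the next time the method is called
```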
print.loadings <- function (x, digits = 3L, cutoff = 0.1, sort = FALSE, ...)
{
    Lambda <- unclass(x)
    p <- nrow(Lambda)
    factors <- ncol(Lambda)
    if (sort) {
        mx <- max.col(abs(Lambda))
        ind <- cbind(1L:p, mx)
        mx[abs(Lambda[ind]) < 0.5] <- factors + 1
        Lambda <- Lambda[order(mx, 1L:p), , drop = FALSE]
    }
    cat("\nLoadings:\n")
    fx <- setNames(format(round(Lambda, digits)), NULL)
    nc <- nchar(fx[1L], type = "c")
    fx[abs(Lambda) < cutoff] <- strrep(" ", nc)
    print(fx, quote = FALSE, ...)
    vx <- colSums(x^2)
    varex <- rbind(`SS loadings` = vx)
    if (is.null(attr(x, "covariance"))) {
        varex <- rbind(varex, `Proportion Var` = vx/p)
        if (factors > 1)
            varex <- rbind(varex, `Cumulative Var` = cumsum(vx/p))
    }
    cat("\n")
    print(round(varex, digits))
    invisible(x)
}
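The root cause can be shown in isolation: subsetting rows of a one-column matrix drops it to a plain vector unless drop = FALSE is given. The sketch below uses a small matrix with rounded loadings from the output above; the object names are illustrative.

```r
# A one-column matrix of loadings, as in the single-factor model
Lambda <- matrix(c(-0.895, 0.960, 0.938, 0.859), ncol = 1,
                 dimnames = list(c("mpg", "cyl", "disp", "hp"), "Factor1"))
ord <- order(c(1, 1, 1, 1), 1:4)

# Default subsetting drops the matrix to a named vector
dropped <- Lambda[ord, ]
is.matrix(dropped)

# drop = FALSE keeps the matrix structure and the column name
kept <- Lambda[ord, , drop = FALSE]
dim(kept)
colnames(kept)
```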
https://bugs.r-project.org/show_bug.cgi?id=17699
Summary: trivial error in persp example
Reprex:
# (1) The Obligatory Mathematical surface.
# Rotated sinc function.
x <- seq(-10, 10, length = 30)
y <- x
f <- function(x, y) { r <- sqrt(x^2+y^2); 10 * sin(r)/r }
z <- outer(x, y, f)
z[is.na(z)] <- 1
Reported Problems:
is.na(z) implies that there are NAs - but there aren’t!
Suggested Solutions:
Change the length of x and y to 31 so that z contains NaN, and handle NAs within the function f.
Remove the scaling 10 *.
(Prior to R 4.2.2) Yes, ?graphics::persp has the code as shown in the bug report.
(NAs)
There are no NAs in z.
If we change the length of x and y to 31, z does have NA.
What should happen where z is equal to NA?
Check the definition of the sinc function, e.g. on Wikipedia:
\[ \text{sinc } x = \frac{\sin x}{x} \]
The value at \(x = 0\) is defined to be the limiting value
\[ \text{sinc } 0 := \lim_{x \rightarrow 0} \frac{\sin x}{x} = 1 \]
(The full definition of f is the “rotated sinc function”, which computes the sinc function for the radius of a circle centred at co-ordinates 0,0)
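The analysis can be checked numerically. The sketch below reuses f from the reprex and compares grids of length 30 and 31; the names x30, z30, x31 and z31 are illustrative. With 30 points the grid skips the origin, while with 31 points it includes it, where sin(r)/r evaluates to 0/0.

```r
f <- function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r) / r }

# Length 30: no grid point at the origin, so no missing values
x30 <- seq(-10, 10, length.out = 30)
z30 <- outer(x30, x30, f)
anyNA(z30)

# Length 31: the middle point is the origin, giving a single NaN
x31 <- seq(-10, 10, length.out = 31)
z31 <- outer(x31, x31, f)
sum(is.na(z31))

# Near the origin f approaches 10, i.e. 10 times the sinc limit of 1
f(1e-8, 0)
```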
Continuing through the example, a second persp plot is created from the data with axis ticks, lines and points.
(NAs)
Option 1: Change the length of x and y to 31 and handle NAs within the function f.
This changes x and y.
Option 2 (new idea): Keep the length at 30 and don’t handle NAs at all.
There are no NAs in this case.
(Scaling)
Option 1: Set z to 10 if there are any NAs.
Option 2: Remove the scaling by 10.
Contributor proposal (via patch) | R Core Reviewer |
---|---|
Remove handling of NAs for being unnecessary | Agreed with this, after considering both options |
Remove the scale of function f as not needed by definition | Agreed scaling is not necessary, but also did not see a need to remove it |
Changed the axes and label font size to 0.62 and 0.8 respectively to make the second graph clear | Thought this a matter of taste, but did make a simpler change along these lines |
Modify the z-axis values of trans3d in both points and lines to account for the removal of the scale in f | Thought it simpler to keep the scale |
Tip
Make the minimal change needed to fix the issue.
For bug 17863 (print.loadings bug) it was enough to propose the fix in a comment.
Alternatively, create a patch using the r-svn mirror of the R sources: https://github.com/r-devel/r-svn
This will create a fork of the r-svn repo on your GitHub account.
Committing changes will create a branch on your fork
Add .diff to the URL for your PR, e.g. https://github.com/r-devel/r-svn/pull/124.diff
Right-click to save the .diff file.
This patch can be attached to the Bugzilla report, with a comment.
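The step of saving the patch can also be scripted. The sketch below builds the .diff URL from a pull request URL and wraps the download in a function; pr_diff_url() and save_pr_diff() are hypothetical helpers, and the download itself needs network access so it is not run here.

```r
# Hypothetical helper: build the .diff URL for a GitHub pull request
pr_diff_url <- function(pr_url) {
  paste0(sub("/+$", "", pr_url), ".diff")
}

diff_url <- pr_diff_url("https://github.com/r-devel/r-svn/pull/124")
diff_url

# Hypothetical helper: save the patch to a local file (needs network access)
save_pr_diff <- function(pr_url, destfile) {
  download.file(pr_diff_url(pr_url), destfile = destfile)
}
```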
An advantage of creating the patch on GitHub is that you can ask another contributor to review the changes before posting on Bugzilla.
See https://github.com/r-devel/r-svn/pull/103 for an example discussing the change to the persp documentation (Bug 17699).
This demo has focused on good first issues
There is still a lot of scope for new contributors!
R Contributor Office Hours: https://contributor.r-project.org/events/office-hours/
R Contributor Slack: https://contributor.r-project.org/slack
How to get a Bugzilla account: https://contributor.r-project.org/rdevguide/BugTrack.html#RCorePkgBug
R Development Guide: https://contributor.r-project.org/rdevguide/
Slides for this demo: https://hturner.github.io/contributing-demo