Ask HN: Best empirical papers on software development?

 1 year ago
source link: https://news.ycombinator.com/item?id=32789507

81 points by KingOfCoders 8 hours ago | 27 comments
There are some good empirical papers, but I only know a few. Which empirical paper on software development do you consider the best?
These days I've been really keen on qualitative studies, where scientists work on building a better understanding of software instead of trying to validate theories. Some examples of that:

https://link.springer.com/article/10.1007/s10664-016-9464-2 - scientists followed a team of developers for three years, recorded all of their sprint retrospectives, and found that people kept forgetting what they had already learned.

https://jlubin.net/assets/oopsla21.pdf - researchers compiled 23 hours of Zoom sessions and 15 hours of Twitch programming livestreams to study how people write code in statically typed functional programming languages.

In addition to that, there's a lot of good work on how we teach programming. "Commonsense Computing" [1] found that students understand concurrency a lot faster when it is presented as a "human" problem, such as selling tickets for a concert. I'd recommend reading Teaching Tech Together (http://teachtogether.tech), which references a lot of empirical papers on teaching programming.

[1]: https://cseweb.ucsd.edu/classes/fa08/cse599/Papers/ICERConcu...

In terms of methodological quality, "Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names" [0] is a favorite of Hillel Wayne's [1].

> Manageable scope.

> They did their homework.

> They mix qualitative and quantitative methods.

> Objective measure of Defects.

> Really, _really_ good experimental setup.

> And then an ethnography.

[0] http://www2.unibas.it/gscanniello/Giuseppe_Scanniello%40unib...

[1] https://www.hillelwayne.com/post/the-best-se-paper/

Hard question; there are a lot. One favourite I came across years ago in a university course, when I was finishing my Software Engineering MSc and had already worked in global software development for years. The research was very well put together on a topic that is difficult to measure:

An empirical study of speed and communication in globally distributed software development

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.324...

You might like to review the book Evidence-based Software Engineering, which is freely available online: http://www.knosof.co.uk/ESEUR/

Here are a few I like:

Why Do Computers Stop and What Can Be Done About It? https://pages.cs.wisc.edu/~remzi/Classes/739/Fall2018/Papers...

Not a paper, but the whole book "Accelerate" presents many empirical findings related to automating software operations.

Hidden Technical Debt in Machine Learning Systems - NIPS https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f...

The Mythical Man-Month: a short book written decades ago, but in my opinion it is timeless.
Not sure why you were downvoted. Parts of the book are indeed timeless!

My favorite bit is the idea that, out of the jumble of documents produced, a few emerge as the pivot around which most of the team’s communication revolves.

I’ve seen project leads use this to great effect. Rather than robotically filling in boilerplate document templates, they discover the most precise format for the project at hand.

On this topic, Peopleware is worth reading as well. It references a lot of studies and is extremely well researched.

Are there studies comparing total development effort to build and maintain software using types vs. no types? You would need to use closely related languages like TypeScript vs. JavaScript, or something like Python with vs. without type hints.
There are some, but they're not... great. The two highest-profile studies I know of are:

"A Large-Scale Study of Programming Languages and Code Quality in Github" [1]: Finds significant differences in defect rate between typed and untyped languages. It stood for a while before replicators tore it apart. Among other things, they counted the v8 engine as a JavaScript project, counted seventeen forks of Bitcoin as distinct projects, and counted commits like "add infix operator" as a sign of a defect (because it has "fix" in it). I wrote a bunch about the study here: https://www.hillelwayne.com/post/this-is-how-science-happens...

"To Type Or Not to Type" [2]: Finds that TypeScript catches 15% of bugs missed in JavaScript repos. I haven't dug into this one as much, but I'm suspicious of it because they didn't seem to do any quality control. One of the codebases they studied, for example, is a junior dev's minesweeper game. [3]

Right now there are a lot of case studies of companies happy with TypeScript/mypy, which I take as tentative evidence that type systems are on the whole beneficial, but I'm also open to the tide changing in 4-5 years.

[1] https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-...

[2] https://earlbarr.com/publications/typestudy.pdf

[3] https://github.com/anabarasan/Minesweeper
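The "add infix operator" problem in the GitHub study above is easy to reproduce. Below is a minimal sketch, assuming the defect classifier flagged commits by raw substring matching on "fix" (the thread doesn't describe the study's exact procedure). Apart from "add infix operator", the commit messages and the keyword list are hypothetical. It contrasts naive substring matching with whole-word matching:

```python
import re

# Hypothetical commit messages; only "add infix operator" comes from the comment above.
COMMITS = [
    "add infix operator",
    "fix null pointer in parser",
    "prefix cache keys with tenant id",
    "Fixed off-by-one error in pagination",
    "update README",
]


def naive_is_bugfix(message: str) -> bool:
    """Flag a commit if 'fix' appears anywhere, even inside another word."""
    return "fix" in message.lower()


# Hypothetical keyword list, matched only as whole words (\b = word boundary).
BUGFIX_WORDS = re.compile(r"\b(fix|fixes|fixed|bug|defect)\b", re.IGNORECASE)


def word_is_bugfix(message: str) -> bool:
    """Flag a commit only if a bug-related keyword appears as a whole word."""
    return BUGFIX_WORDS.search(message) is not None


if __name__ == "__main__":
    for msg in COMMITS:
        print(f"{msg!r:45} naive={naive_is_bugfix(msg)!s:5} word={word_is_bugfix(msg)}")
```

Whole-word matching removes the "infix" and "prefix" false positives, but it does nothing about the other problems mentioned, such as counting forks as separate projects or misattributing a project's language.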

I have always appreciated https://www.cs.cmu.edu/~NatProg/papers/Ko2008JavaWhyline.pdf - an empirical analysis of debugging, a tool based on those findings, and an evaluation of that tool.

Technically not a paper, but Casey Muratori takes a very empirical approach to software development, which I find extremely practical:

- https://caseymuratori.com/blog_0015

- https://caseymuratori.com/contents

Can you list the ones you know/found?
You can find some in (the very old) "Facts and Fallacies of Software Engineering".
2003 is “very old”? I would have expected that description for something like Structured Programming from 1972, not 2003.
"Very old" is subjective; some people consider the Antikythera mechanism very old.

But when I think back to developing software at my startup around 2000, the world of software development was completely different. So, for me, very old.
