So many replies, this is fantastic! (Excuse me for the lag, I was stuck in bed with a fever.)
(TL;DR: jump to the final section, with apologies! I hope the length doesn't bore you into skipping.)
My initial post did overlook GitHub papers, blogs and IPython* a little. Great stuff is happening with these tools, my personal favourite being Jake Vanderplas's aggressive integration of the notebook in his blog posts (also hosted on GitHub pages). @ctb I didn't know about Haldane's Sieve (not my area of study), but it looks really promising! (Took me some time to figure out that yours is one of the blogs I read regularly -- honoured to meet you here, and everybody else!)
GitHub + .py figures or IPython papers + forks and pull requests for peer-review/comments/corrections
> it should be easy for scientists to achieve the things you're talking about - the technology needs to get out of the way
Very true, missed that.
> let's start from an ipython notebook in a GitHub repo, with enough metadata to point at the actual data
I think that idea (be it an IPython notebook or the .py figures and the .md text, all built on GitHub's ease of use) is a great proof of concept and something to really build on to move forward. The fact that it's already in use (embedded in blogs), even if not widespread, also shows everybody that this kind of model works for real discussion and publication.
I would try not to tie a proposal to GitHub though, or to git or Python for that matter, and that's why I see it as a proof of concept. To me the problem isn't so much one of tools, languages, or workflows -- people have those, and we all prefer the ones we already know -- as one of markup/parsing, linking, and usage of whatever information that provides. In a way, all this we're thinking about would already be possible if
- we could parse all PDFs to reliably extract the information needed (this is doable, but not completely automatic, to my knowledge).
If we could do that (= turn a PDF into a webpage), then it would be mostly coding from there on. A lot of coding, but only coding, and most notably no need to change people's practices.
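As a rough illustration of the "only coding" part, here is a minimal sketch (all names and the metadata convention are my assumptions, not an existing tool) that takes metadata already extracted from a PDF and renders it as a webpage carrying machine-readable tags:

```python
# Minimal sketch: render already-extracted publication metadata as an
# HTML page with machine-readable meta tags. The meta names follow the
# citation_* convention used by scholarly indexers; the hard extraction
# step (PDF -> dict) is assumed to have happened elsewhere.
from html import escape

def render_publication_page(meta):
    """Build a tiny HTML page whose <head> exposes the metadata."""
    tags = [f'<meta name="citation_doi" content="{escape(meta["doi"])}">']
    tags += [f'<meta name="citation_author" content="{escape(a)}">'
             for a in meta["authors"]]
    head = "\n  ".join(tags)
    return (f"<html><head>\n  <title>{escape(meta['title'])}</title>\n"
            f"  {head}\n</head>\n"
            f"<body><h1>{escape(meta['title'])}</h1></body></html>")

# Hypothetical metadata, as a PDF-parsing step might have produced it.
page = render_publication_page({
    "title": "A Toy Paper",
    "doi": "10.1234/example.5678",
    "authors": ["Ada Lovelace", "Charles Babbage"],
})
```

The point is only that once the information is out of the PDF, turning it into a linkable, parseable webpage is straightforward glue code.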
Finally, if someone wants to build a service/startup around all this, they'll be a lot freer if they're not tied to GitHub's API or to other choices we would make now (though they could choose to use GitHub's API for material published through GitHub).
So go GitHub/git/python! But even better if we're tool- and content-agnostic.
Stricter peer-review, code-review
> the promises of future benefit seem more nebulous than the very concrete technical hurdle of learning this new workflow
> Making the peer review process more strict [...]
Personally I'd love that, but I don't see anybody moving to it on their own. Using GitHub is daunting, as you say (I'm trying to work on GitHub with psychologists right now: some things are obvious to them, others feel really backwards -- like branches in pull requests, or syncing forks).
Next, code review and code quality: adding to @codersquid's remarks, writing good code also takes a lot of time away from the science. In my opinion, this is because scientific code is often, by nature, the result of a long feature creep that mirrors the underlying scientific process, so good code needs constant refactoring (except for well-defined tools). Code quality/review could maybe emerge as a requirement once quick reproducibility becomes an incentive in the reputation metrics (great strides are being made with Code as a Research Object!). Maybe the future is a market where businesses provide affordable free/open-source code -- compatible with feature creep and the research process -- to researchers without expert coding skills (or time).
Go for standards + implementations built on existing GitHub* practices
Why this: no group can invent the perfect solution, but what we can do is build the greatest common divisor of everything we'd like to see, then build what we'd like to see on top of that. Plus, let more creativity build on top of that GCD with the freedom it provides. So it won't be solved in 2 years, but once it is solved I hope the basic blocks will last as long as the Web does. Finally, building on non-opinionated standards also leaves room to solve all the edge cases @BillMills mentioned (non-interpreted languages, big data and computations, etc.).
The GCD I see is:
- Some way to declare a (web) document (webpage) as a publication, i.e. a way to automatically attach a DOI to it and include that in the markup of the page (similar to what Code as a Research Object does for code)
- And some way to declare more metadata: the DVCS used if any, the URLs to repositories, the authors (already exists for blogs, maybe it's included in the requirements for a DOI? -- I don't know)
And that's enough to declare any HTML document on the web as being a research object. That includes publications, reviews, comments, maybe even diffs on publications, figures, code for figures. So any web document with this information says to the world (by that I mean all browsers):
> Hey! I'm part of scientific discussions/publications! Other documents link to me, I link to other documents, and look how this cool XYZ tool makes it so easy to view all the conversation and participate!
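To make that "says to the world" part concrete, here is a minimal sketch of how a browser tool or crawler could detect that a page declares itself a research object. It uses only the Python standard library; the citation_* meta names are an assumed convention, not a finished standard:

```python
# Minimal sketch: detect research-object metadata in an HTML page using
# only the standard library. The citation_* meta names are an assumed
# convention for the markup described above.
from html.parser import HTMLParser

class ResearchObjectParser(HTMLParser):
    """Collect citation_* meta tags from a page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name", ""), a.get("content", "")
        if name.startswith("citation_"):
            self.meta.setdefault(name, []).append(content)

parser = ResearchObjectParser()
parser.feed("""
<html><head>
  <meta name="citation_doi" content="10.1234/example.5678">
  <meta name="citation_author" content="Ada Lovelace">
</head><body>A toy research object.</body></html>
""")

# A page "is" a research object here simply if it declares a DOI.
is_research_object = "citation_doi" in parser.meta
```

Anything that can run this kind of check -- a browser extension, an indexer, a comment service -- can then discover the DOI, the authors, and the links to other research objects, regardless of which tools produced the page.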
Next, make it easy-peasy to use with GitHub practices:
- Build plugins for Jekyll and/or Pelican that automatically add that markup to e.g. a researcher's blog post (works with existing posts!). More precisely, when publishing a new post with Jekyll or Pelican:
- If asked to, the plugin automatically creates a DOI for the post, adds author metadata and anything else we decide can be added to the Front matter (i.e. DVCS, URLs to repos, ...).
- Get all DOIs that link to this document (or even the whole conversation tree down to a maximum depth), and modify the webpage to show the discussion/reviews/comments/published modifications of graphs/etc. Same for the DOIs this page links to.
- Allow instant forking of the code behind the figures (e.g. in the background, create a repo on GitHub, fork it, and open an inline editor in the webpage if the code is lightweight and the browser can directly compile/run it with e.g. repl.it; if the DVCS is not git, a compatible DVCS service, or a Gist-like service such as GitHub's, would be needed). Same for typo corrections.
- Allow creating a new comment linking to the webpage or any part of it (e.g. any paragraph -- though I can't imagine how to standardize a weblink pointing to part of another webpage -- and how could this map back to the source code, if the source for the publication is available?)
- Allow navigating previous versions of this document (found e.g. because later versions say they're later versions of another DOI in their metadata)
- Show reputation/reading/comment stats, link to comment authors, #feature-creep
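For the Jekyll/Pelican plugin idea, the post's front matter could look something like this. This is only a sketch: the `doi`, `replies_to`, `dvcs` and `repositories` keys are hypothetical plugin options, not existing Jekyll features, and the DOIs and URLs are placeholders:

```yaml
# Hypothetical front matter for a blog post published as a research object.
layout: post
title: "Reanalysing a published figure"
doi: auto                      # ask the plugin to mint and embed a DOI on publish
replies_to:
  - 10.1234/example.5678       # DOIs this post comments on
dvcs: git
repositories:
  - https://github.com/username/figure-code   # code behind the figures
```

On publish, the plugin would turn these keys into the machine-readable markup in the generated HTML, so existing posts become research objects just by adding a few lines of front matter.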
Apologies for being so sprawling! I'll try and answer faster and shorter in the future.
EDIT: also left out all the code environment questions in the figure-editing part.
EDIT2: does this sound easy enough to use? I imagine services like WriteLatex could integrate the metadata features to make this easy for their users.