Open Notebook Science ONSSP #1:

As promised, I slowly set out to explore ONSSPs (Open Notebook Science Service Providers). I do not have a full overview of solutions yet, but I found LabTrove and Open Notebook Science Network. The latter is more clearly an ONSSP, while the former seems to be the software.

So, my first experiment is with Open Notebook Science Network (ONSN). The platform uses WordPress, a proven technology. I am not a huge fan of the setup, which has so many features that it is sometimes hard to find what you need. Indeed, my first write-up ended up as a Page rather than a Post. On the upside, there is a huge community around it, with experts in every city (literally!). But my ONS is now online and you can monitor my Open research with this RSS feed.

One of the downsides is that the editor is not geared toward structured data, though there is a feature for Forms which I may need to explore later. My first experiment was a quick, small hack: upgrade Bioclipse with OPSIN 1.6. As discussed in my #jcbms talk, I think it may be good for cheminformatics if we really start writing up step-by-step descriptions of common tasks.

My first observations are that it is an easy platform to work with. Embedding images is easy, and there should be options for chemistry extensions. For example, there is a Jmol plugin for WordPress, there are plugins for Semantic Web support (no clue which one I would recommend), and extensions for bibliographies are available too, if I am not mistaken. We also already see my ORCID prominently listed, and I am not sure whether I did this or whether the ONSN people added it as a default feature.

Even better is the GitHub support by @benbalter that @ONScience made me aware of. The instructions were not crystal clear to me (see issues #25 and #26), but after some suggested fixes (pull request #27) it started working, and I now have a backup of my ONS on GitHub!

So, it looks like I am going to play with this ONSSP a lot more.

Open Notebook Science: also for cheminformatics

Last Monday the Jean-Claude Bradley Memorial Symposium was held in Cambridge (slide decks). Jean-Claude was a remarkable man, and at the meeting I spoke about several things, including how he made me jealous with his Open Notebook Science work. I had the pleasure of working with him on an RDF representation of solubility data.

It took me a long time to group my thoughts and write the abstract I submitted to the meeting:
    I always believed that with Open Data, Open Source, and Open Standards I was doing the right thing; that it was enough for a better science. However, I have come to the realization that these features are not enough. Surely, they aid Open collaborations, though not even sufficient there, but they fail horribly in the "scientific method." Because while ODOSOS makes work reproducible, it lacks the context needed by scholars to understand what it solved. That is, it details out in much detail how some scientific question is answered, but not what question that was. As such, it fails to follow the established practices in scholarly research. In this presentation I will show how I should have done some of my research, and ponder on reasons why I had not done so.
And it also took me a long time and a lot of stress to get some slides together, but I managed in the end.

During the talk I promised to start doing Open Notebook Science (ONS) for my research, and I am currently exploring ONS platforms.

The meeting itself was great. There was a group of about 40 people in Cambridge and another 15 online, and most of them into Open Science or at least wanting to learn what it is about. I met old friends and new people, including a just-graduated Maastricht Science Programme student (one that I did not have in my class last year). Coverage on Twitter was pretty good (using the #jcbms hashtag, an archive) with some 90 people using the hashtag.
Several initiatives seem to be evolving, including an ONS initiative and a memorial special issue. All these will need help from the community. The time is right.

#JChemInf Volume 5 as PDF on @FigShare

One of the things I do to prepare for a holiday is get some reading material together. I haven't finished Gödel, Escher, Bach yet (a suggestion from the blogosphere), with a bit of luck there are new chapters of HPMOR, and I normally try to catch up with literature. One advantage of Open Access is that you can remix. So, I created a single PDF of all JChemInf Vol. 5 articles (last year I did volumes 1, 2, 3, and 4). This PDF is about 75 MB in size, and therefore fits on most smartphones. The PDF has an index, though without entries for each paper, but jumping from abstract to abstract works fine. It has a bit over fifty peer-reviewed papers.

Another advantage of Open Access is that you can reshare. And so I did, and the volumes are available from FigShare:
  1. JChemInf Vol.1
  2. JChemInf Vol.2
  3. JChemInf Vol.3
  4. JChemInf Vol.4
  5. JChemInf Vol.5
Of course, a clear downside is that it interferes with #altmetrics. And I am wondering if a similar thing can be done with ePubs.

Journal Open Data Guidelines: plenty of room for clarifications

J. Gray, Wikipedia. CCZero.
Several journals are playing with statements about Open Data, and, for example, F1000Research and require Open Data. Just as publishers are judged on their implementation of Open Access, so should we critically analyze journals that claim to be an Open Data journal. Well, such claims I have not seen, but some journals have promising statements, like:
BioMed Central
    Data associated with the article are available under the terms of the CCZero.
However, this claim is vague, or, at least, too vague for a paper I am currently reviewing. The fuzziness lies in the word "associated". What defines associated data? How does this relate to reproducibility? If the purpose of Open Data is that the results of the paper can be reproduced, does it mean all data? And what happens if some of the data is from a previous paper? Or from a proprietary database? Is a paper that uses data from a proprietary database for key steps in its argumentation acceptable to a journal that demands Open associated Data? What if the authors do not have control over the license? Or is it limited to new data? But what defines new data here? That is a really hard question in an era where data has very limited provenance (versioning, author attribution, etc).

International Conference on Chemical Structures 2014

This Sunday the International Conference on Chemical Structures starts. If you aren't joining, it is important you know how to keep track of things online. First, follow the #iccs2014 hashtag on Twitter, and use the hashtag on any social platform. For example, I will bookmark papers mentioned in presentations on CiteULike. And slides that speakers put online, as well as coverage of other kinds, I'll link on Lanyrd. If you want to know what to expect, read this abstract book. And, of course, if you are attending the meeting, you can still join the online discussion.

Pathway analysis for Malaria research

A recurrent theme in my blog is that an easy way to support Open Science is to just join the show. You do not have to contribute a lot to have some impact. Of course, sometimes what you do has more impact than other times, and sometimes something with little initial impact gets high impact later. This is hard to predict, though maybe about as well as the stock exchange. In the past I have contributed effort to many Open projects, often in small bits, and some things never get noticed (like my Ant man page in Debian, which is more than 10 years old :).

One project I have long wanted to contribute to is the Open Source Malaria project, which is brilliantly led by Matt Todd. I had three principal ideas:

  1. use Bioclipse to run the Decision Support against the OSM compounds
  2. do pathway analysis on malaria data
  3. use AMBIT-JS to put all the OSM compounds online as an HTML page
The first and third I still have not gotten around to finishing. The first is a very simple way for you to contribute. The key question here is just to see how the compounds can be made less toxic and have fewer side effects. And Bioclipse can visualize this easily, based on various toxicity models, including all those from OpenTox. Really, a four-hour job.

PCA results for the four sample groups.
The other task is more difficult, and I am really happy that Patricia Zaandam started a ten-week internship with me to work on this task. She has been blogging her progress, and I strongly invite you to read her blog and comment (ask questions, post ideas, give criticism), as Open projects are driven by Open communication. Because WikiPathways has most pathways for human, Patricia is looking at human expression data. In five weeks' time, she preprocessed the raw data and then did the pathway analysis using PathVisio, resulting in this shortlist of pathways. And now the hard part starts: biological and methodological validation of her approach.

There is plenty of room for feedback. I am not at all a malaria expert, and I am learning a lot from her study. Some questions on which we welcome expert input (as independent test set validation, so to say):
  • what key pathways and genes do we expect to see for treated-versus-ill malaria patients?
  • what transcriptomics/proteomics/metabolomics data would you like us to consider too?
Etc, etc...

Jean-Claude Bradley, Blue Obelisk award winner

Chemistry in Second Life. DOI:10.1186/1752-153X-3-14
Nowadays there are a lot of people talking about Open: about open access, open data, open source. In fact, some discussion on Twitter resulted in the realization that it is highly unlikely that any scholar has not taken advantage of Open in some way in their research in the last few years. However, this is mostly thanks to the people who actually do Open, not to those who only talk about it or use it.

One of the few people in chemistry who both promoted Open and did Open was Jean-Claude Bradley. Yesterday, I heard the sad news that he passed away. This is a great loss to many of us and certainly to the open chemistry community. Jean-Claude received the Blue Obelisk award for his Open Notebook Science work back in 2007 (I handed him the obelisk at the ACS meeting in Chicago; thanx to Chris for taking the picture, and digging it up!) and he contributed much to the community, including his melting point and solubility data for organic compounds.

A proud me handing out the Blue Obelisk award to Jean-Claude in Chicago in 2007.
CC-BY 2007 Christoph Steinbeck.
Jean-Claude and I did some work together, including a book chapter, which I liked, being a trained organic chemist myself (well, just a 6-month minor during my M.Sc. on supramolecular chemistry). I was really pleased that he had agreed to become part of the eNanoMapper scientific advisory board, and I was very much looking forward to working with him again on the journal side of dissemination of nanosafety research, in his role as editor-in-chief of Chemistry Central Journal.

Few people leave a big impression on me, but he was certainly one of them. Let his extensive work not go unnoticed; there is still a lot to do in Open chemistry.

Other posts about this loss.

Changes in CDK 1.6 #5: the SMILES generator

User:Fdardel and User: DMacks
CC-BY-SA at Wikipedia.
I won't say much about this, as John already did. It's much faster and more functional than what the CDK had before. There are some things to keep in mind, which I ran into when proofreading my Groovy Cheminformatics with the CDK book. Importantly, make sure to read the SMILESGeneration documentation, as it has many cool new options and, as with much of the new CDK code, performance was a design goal.

Canonical SMILES
Generating unique SMILES is done slightly differently, but elegantly:

generator = SmilesGenerator.unique();

"Aromatic" SMILES
Because SMILES with lower-case element symbols reflecting aromaticity carries less explicit information, I do not recommend using it. Still, I know that some of you are keen on using it, for various, sometimes logical, reasons, so here goes. Previously, you would use the setUseAromaticityFlag(true) method for this, but you can now use this instead:

generator = SmilesGenerator.generic().aromatic()
smiles = generator.createSMILES(mol)

Of course, you can combine things.
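
As a minimal sketch of such a combination, assuming a CDK 1.5.x/1.6 jar is on the Groovy classpath, the unique and aromatic options can be chained. The input molecule (phenol, parsed from SMILES) is only an illustrative choice. As far as I know, aromatic() only writes out aromaticity flags that are already set on the atoms and does not perceive aromaticity itself, which is why the sketch starts from an aromatic input SMILES:

```groovy
import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.smiles.SmilesGenerator
import org.openscience.cdk.smiles.SmilesParser

// Assumption: a CDK 1.5.x/1.6 jar is available on the classpath.
// Illustrative input only: phenol, written with lower-case (aromatic) symbols
// so that the parsed atoms carry aromaticity flags.
parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
mol = parser.parseSmiles("c1ccccc1O")

// Combine the two options discussed above: canonical ("unique") output
// that also uses lower-case symbols for atoms flagged as aromatic.
generator = SmilesGenerator.unique().aromatic()
smiles = generator.createSMILES(mol)
println smiles
```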


Changes in CDK 1.6 #4: IsotopeFactory and Isotopes

A major CDK API change happened around the IsotopeFactory. Previously, this class was used to get isotope information, which it reads from a configurable XML file. This functionality is now available from the XMLIsotopeFactory class. However, to improve the speed of getting basic isotope information as well as to reduce the size of the core modules, CDK 1.6 introduces an Isotopes class, which contains information extracted from the XML file but is available as a pure Java class. The API for getting isotope information is mostly the same, but the instantiation is much simpler, and also no longer requires an IChemObjectBuilder (in Groovy):
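    import org.openscience.cdk.config.*;

    // Isotopes is a pure Java class; no IChemObjectBuilder is needed
    isofac = Isotopes.getInstance();
    uranium = 92;
    // look up each element from hydrogen (1) up to uranium (92)
    for (atomicNumber in 1..uranium) {
      element = isofac.getElement(atomicNumber)
    }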

Open PHACTS community workshop

Of course, you can start hacking with Open PHACTS any day, but please see this invitation (I will not be present myself):
    You are cordially invited to the upcoming Open PHACTS Community Workshop on 26.
    The Open PHACTS Discovery Platform is a freely accessible infrastructure that semantically integrates publicly available data for applied life science R&D. The Platform provides a powerful Application Programming Interface (API) which allows application builders and researchers to query the integrated data using existing applications, to build new applications and to access the API using workflows tools (e.g. KNIME). Examples of such applications, which illustrate what can be achieved, include the Open PHACTS Explorer, ChemBioNavigator, and PharmaTrek.

    The Open PHACTS Community Workshop in London on 26 June aims to introduce members of the academic community to the Open PHACTS Discovery Platform. The workshop will be of interest to: 
    • Researchers who would benefit from directly querying the Open PHACTS API using scripting languages or by developing applications to consume the data.
    • Lecturers & Principal Investigators who can use the Open PHACTS application ecosystem to access the data within the Open PHACTS Discovery Platform.

    The Community Workshop will introduce attendees to the Open PHACTS API and showcase how it can be used to create new or enhance existing applications. We will demonstrate, using real life use-cases, how universities can use the Open PHACTS API and associated tools for teaching and research in drug discovery.

    The Workshop is free to attend, please register your interest by replying to Feel free to forward this email to interested members of the academic community.

    Please find a preliminary agenda attached. We very much look forward to seeing you in London.

    Kind regards
The next chance to discuss Open PHACTS with me in person is the International Conference on Chemical Structures, in the first week of June. And if you would like to learn about and/or contribute to the R client "ropenphacts" I have been hacking on, just let me know!