NRNB: GSoC Ideas Page

This is an old listing of project ideas. Our current list is at GitHub.

Idea 1: Submit your own original proposal

Feel free to propose your own idea. As long as it relates to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers, but make sure your proposal is also relevant. Projects can address any software idea related to network biology. Consider some of the NRNB coordinated technologies.

Idea 2: Co-authorship Social network App

Background: Biological networks can be visualized and analyzed using Cytoscape. Often researchers want to go beyond the network of proteins or genes and also look at the inter-connectedness between colleagues and institutions. Who tends to publish together? What institutions are most collaborative? Are there inter-disciplinary connections in my institution?

Goal: Create a visual summary of how individuals are connected. An existing App currently displays such a network, where nodes are authors and edges represent the number of publications two authors share. It uses publication information obtained by querying PubMed over the web or from a local file downloaded from the Scopus or InCites databases. Different data sources have different depths of data. For instance, PubMed, the most widely accessible of the three main sources, has citation counts, but they are limited to articles cited in PubMed Central, and its institution data is limited to the first author. Because PubMed is the easiest source to access, we would like to expand the app's PubMed functionality to capture more data about author institutions and publication citations, as well as to calculate the H-index for individuals or subsets of papers.

We would like to expand our co-authorship Cytoscape App to support generic file formats so that networks can easily be created from any new data source. We would also like to expand publication network visualization and summarization by integrating with other Cytoscape apps such as WordCloud, for example to create customized node images representing the tag cloud of all of an author’s publications, or to adjust node size to account for the impact factors of the author’s publications.
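
As a rough illustration of the two computations mentioned above, here is a minimal Python sketch that counts shared publications between author pairs (the edge weights) and computes an H-index from citation counts. The input format and the function names `coauthor_edges` and `h_index` are assumptions made for this sketch; they are not taken from the existing App.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count shared publications for every pair of co-authors.

    `publications` is a list of author-name lists, one per paper
    (a hypothetical input format chosen for this sketch).
    """
    edges = Counter()
    for authors in publications:
        # Sort and de-duplicate so (A, B) and (B, A) map to the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    ranked = sorted(citation_counts, reverse=True)
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```

A real implementation would also need author-name disambiguation, which this sketch ignores by treating each distinct string as one author.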

Technology and Skills: Cytoscape, Java, XML, PubMed e-utils

Potential Mentors: Sergio Baranzini, Gary Bader, Ruth Isserlin

Contact: cytoscape-discuss@googlegroups.com

Idea 3: Upgrading Ontology Tagging at WikiPathways

Background: WikiPathways is a wiki, like Wikipedia, but for biological pathway diagrams. We ripped out the text editor of MediaWiki and put in a custom pathway drawing tool. In addition to capturing pathway models, we have a number of systems to annotate pathways using standardized identifiers and ontology terms.

Goal: To upgrade the current ontology tagging system at WikiPathways to take full advantage of the latest services from NCBO’s BioPortal.

Description: The upgrade work will include overhauling the current tagging system from the database schema up to the user interface. Promising proposals will include an assessment of the current system (found on any given pathway page at WikiPathways), research into the available web services at BioPortal, and a set of features to overhaul and add.

Technology and Skills: WikiPathways, PHP, JavaScript, web services, MediaWiki

Potential Mentors: Alex Pico, Sravanthi Sinah, Martina Summer-Kutmon

Contact: wikipathways-devel@googlegroups.com

Idea 4: Computing patient similarity networks for 100K samples

Background: A major goal of modern biomedical research is to tailor patient treatment based on individual genomic profiles and/or other patient information. Network-guided approaches can integrate several types of patient data (e.g. genetic, RNA, clinical, brain imaging) to identify patient similarity networks (PSNs). These networks could identify biological pathways reflecting different disease subcategories or explain why responses to the same medication vary among people. Software for computing PSNs exists, but currently requires the user to use a patchwork of different tools. Moreover, existing machine learning operations don’t easily scale to the size of contemporary datasets (~100K patients).

Goal: To create fast, streamlined software to compute and validate PSNs. The software will compute PSNs from different types of data (e.g. molecular profiles, pathway information, clinical data) using a similarity measure such as Pearson correlation. It will perform K-fold cross-validation for training. Finally, it will generate ROC curves to measure the sensitivity and specificity of the prediction system.
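
To make the similarity step concrete, here is a minimal pure-Python sketch that builds a patient similarity network from Pearson correlation. The data layout, the `threshold` parameter and the function names are assumptions for illustration only; the existing pipeline may compute and filter similarities differently.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption of this sketch: constant profiles count as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_network(profiles, threshold=0.5):
    """Return weighted edges between patients with correlation >= threshold.

    `profiles` maps a patient ID to a list of measurements
    (a hypothetical layout; all lists must have the same length).
    """
    ids = sorted(profiles)
    edges = {}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            r = pearson(profiles[p], profiles[q])
            if r >= threshold:
                edges[(p, q)] = r
    return edges
```

The all-pairs loop grows quadratically with the number of patients, which is exactly why a compiled, memory-efficient implementation is needed at the ~100K scale.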

Existing Software: A software pipeline for this workflow already exists, so the algorithm itself will not need to be developed. Development will focus on optimizing the workflow and creating a separate software package with computing and visualization functionality. Rcpp (an R interface to C++) will be used to program the software. C++ matrix libraries (e.g. Armadillo) will be used to reduce the memory footprint and speed up computations.

Publications:

Wang B. et al. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods 11(3): 333-7.
Mostafavi S. et al. (2008). GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9 Suppl 1:S4.

Technology and Skills: R statistical programming language, C++, Java

Potential Mentors: Gary Bader, Shraddha Pai

Contact: Shraddha Pai

Idea 5: Develop a Method for Aligning Pathways and Visualizing these Alignments

Background: The number of groups generating machine-readable representations of biological pathways (such as Reactome or WikiPathways) continues to grow, as do efforts to aggregate these pathways into unified repositories, such as Pathway Commons. Aggregation is hampered when multiple versions of the same pathway differ because their authors represented them differently.

Goal: The goal of this project is to develop a method for aligning pathways and then visualizing the results of these alignments. A starting point could be to find interactions that use the same substrates and produce the same products. Such a tool has many uses, including helping to validate pathways as they are added to repositories, merging pathways across different resources, and comparing pathways across organisms. One way to think about this project: if you're given multiple representations of the same pathway (for example, several human Wnt pathways), can you develop a method to show a unified pathway and how the original representations map to it?
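
The suggested starting point, matching interactions by their substrates and products, could be sketched in Python as follows. The dictionary layout and the names `reaction_key` and `align_pathways` are hypothetical, and the sketch assumes participant identifiers have already been mapped to a common namespace, which is a hard problem in its own right.

```python
def reaction_key(reaction):
    """Order-independent signature: the sets of substrates and products."""
    return (frozenset(reaction["substrates"]), frozenset(reaction["products"]))


def align_pathways(pathway_a, pathway_b):
    """Pair up reactions that share the same substrates and products.

    Each pathway is a list of dicts with "id", "substrates" and "products"
    keys (a hypothetical layout). Returns (matched, only_a, only_b).
    """
    index_b = {}
    for rxn in pathway_b:
        index_b.setdefault(reaction_key(rxn), []).append(rxn["id"])

    matched, only_a, seen = [], [], set()
    for rxn in pathway_a:
        key = reaction_key(rxn)
        if key in index_b:
            matched.append((rxn["id"], index_b[key]))
            seen.add(key)
        else:
            only_a.append(rxn["id"])

    only_b = [rid for key, ids in index_b.items() if key not in seen for rid in ids]
    return matched, only_a, only_b
```

Exact set matching is deliberately strict. A fuller method would likely score partial overlaps so that reactions differing by a cofactor or a complex member can still be aligned.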

Technology and Skills: Reactome, PathwayCommons, WikiPathways, Java, XML

Potential Mentors: Augustin Luna (MSKCC)

Contact: lunaa@cbio.mskcc.org

Idea 6: Drag and Drop or Palette Editor for SBGN and other graphical notations in Cytoscape

Description: The goal is to design and implement a drag-and-drop or palette-based editor for Cytoscape that would include nodes, edges, and graphical annotations. In the ideal case, this would be extensible so that apps add their own nodes or edges with specific graphical attributes. The Systems Biology Graphical Notation (SBGN) is a standard graphical notation for presenting pathways and protein interactions. Currently, Cytoscape lacks the node symbols and edge types required for SBGN. In this project, those notations will be added to the Cytoscape core, and a simple palette editor developed for user interaction.

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris, Alex Pico

Contact: Scooter Morris

Idea 7: Improve and expand Cytoscape’s color mapping options

Description: Currently, the color options for Cytoscape’s discrete color mappings are very limited. Brewer palettes provide a luminosity-neutral set of colors that would be extremely useful to add to Cytoscape, and there are several other approaches to choosing sets of colors that contrast nicely. Creating these color palettes and providing them for use within Cytoscape’s visual style interface would be extremely useful to Cytoscape users.
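
For illustration, a discrete mapping built from a Brewer palette might look like the Python sketch below. The hex values are the first five colors of the ColorBrewer "Set1" qualitative palette; the function name and the cyclic reuse of colors are assumptions of this sketch, not Cytoscape behavior.

```python
# First five colors of the ColorBrewer "Set1" qualitative palette.
SET1 = ["#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00"]


def discrete_color_mapping(values, palette=SET1):
    """Assign one palette color to each distinct attribute value.

    Values are sorted so the mapping is reproducible. Colors are reused
    cyclically when there are more values than colors (an assumption of
    this sketch; a real tool might warn or pick a larger palette instead).
    """
    distinct = sorted(set(values))
    return {v: palette[i % len(palette)] for i, v in enumerate(distinct)}
```

For example, `discrete_color_mapping(["kinase", "receptor", "kinase", "ligand"])` assigns the first three palette colors to "kinase", "ligand" and "receptor" in alphabetical order.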

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris, Michael Heuer, Alex Pico

Contact: Scooter Morris

Idea 8: Constrained layout algorithms in Cytoscape

Description: Most layout algorithms in Cytoscape rely on inherent network topology and only a crude set of spatial constraints (e.g., grid, circle). There are many cases when working with networks in Cytoscape where you want to be able to constrain layouts to user- or attribute-defined spaces. This idea involves implementing layouts that are constrained by:

specific shapes: basic shapes; cellular templates (e.g., with subcellular organelles); attribute-driven Venn diagrams (e.g., to visualize attribute groupings and overlaps)

specific “anchor” nodes: user-defined (e.g., manually selected); attribute-defined (e.g., hubs or column values)

etc...

Technology and Skills: Cytoscape, Java, and a strong math background

Potential Mentors: Scooter Morris, Alex Pico, Bruce Aronow

Contact: Scooter Morris

Idea 9: Collaborative System for Creating/Sharing Biological Icons

Background: Biological pathways provide intuitive diagrams of the interactions underlying biological processes. For example, the SIDS Susceptibility Pathways diagram shows interactions between metabolites, genes and proteins, such as DDC catalyzing the conversion of L-DOPA to Dopamine (look for the items highlighted in yellow on the lower right). The site WikiPathways is an open, collaborative platform dedicated to drawing and sharing biological pathways. These pathways often use icons to represent items like mitochondria or cardiac myocytes (heart muscle cells). The WikiPathways drawing editor has a limited set of icons built in, but it does not allow pathway authors to create and re-use custom icons. For example, the cardiac myocyte icon is repeated in the SIDS Susceptibility Pathways for mouse and for human, but the only way to re-use the icon is to manually redraw or copy/paste it. Updates to the icon in one pathway do not appear in the other.

Goal: Build a biological icon library that allows pathway authors to create icons, share them with other WikiPathways users and reuse them across pathways. It could work like GitHub for icons, allowing users to create, fork and merge their work, with the best icons rising to the top via a rating system. The library will work as a plugin for the current WikiPathways viewer/editor pathvisiojs, allowing pathway authors to create new icons from scratch or to create icons from sections of existing pathways, e.g., the cardiac myocyte from the previously referenced pathways. It will have a backend component to store the icons and a front-end component to allow WikiPathways users to view and search for icons.

Links: An SVG drawing tool with a large number of icons available might provide some ideas.

Technology and Skills: WikiPathways, JavaScript required. Some backend experience (PHP, Node.js or Python) to create a backend for saving the icons. SVG or Canvas a plus.

Potential Mentors: Anders Riutta, Alex Pico, Kristina Hanspers

Contact: wikipathways-discuss@googlegroups.com

Idea 10: Biological Pathway Styling Themes

Background: Biological pathways provide intuitive diagrams of the interactions underlying biological processes. For example, the SIDS Susceptibility Pathways diagram shows interactions between metabolites, genes and proteins, such as DDC catalyzing the conversion of L-DOPA to Dopamine (look for the items highlighted in yellow on the lower right). The site WikiPathways is an open, collaborative platform dedicated to drawing and sharing biological pathways. These diagrams are viewed in many different environments, such as laptop screens, presentation slides and printed science journals. At WikiPathways today, the default styling theme for these diagrams is best suited for viewing by an individual on a computer screen, but it would be useful to export the diagrams with themes optimized for print and for presentations.

Goal: Develop a custom theming system to allow for exporting pathways with themes optimized for print and for projected presentation slides. For example, the presentation export option might make pathway diagrams easier to read by removing some text, enlarging the remaining text and inverting the colors of the diagram. Optionally, the presentation option could integrate with one or more existing online presentation framework(s).

Links:

WikiPathways pathway viewer/editor
Example online presentation frameworks:

Technology and Skills: WikiPathways, User experience, graphical design, CSS, some JavaScript.

Potential Mentors: Anders Riutta, Alex Pico, Kristina Hanspers

Contact: wikipathways-discuss@googlegroups.com

Idea 11: Details Panel for Semantic Terms

Background: Biological pathways provide intuitive diagrams of the interactions underlying biological processes. For example, the SIDS Susceptibility Pathways show interactions between metabolites, genes and proteins, such as DDC catalyzing the conversion of L-DOPA to Dopamine (look for the items highlighted in yellow on the lower right). The site WikiPathways is an open, collaborative platform dedicated to drawing and sharing biological pathways.

A common challenge in biological research is dealing with an ever-growing number of terms, some of them synonyms and others near-synonyms with important differences in meaning. For instance, WikiPathways uses the term “GeneProduct” in its annotation panels (to see an example, click the “TNF” box in the upper left corner of the SIDS pathway). A WikiPathways user might be unsure of the exact definition of “GeneProduct”, so it would be useful to point the user to the definition of the term in the WikiPathways vocabulary. This could be done with a link or a mouse-over display. Another example from the SIDS pathway is the Extracellular Space, which could display information from this IRI: http://identifiers.org/go/GO:0005615.

The first two examples are easy to display because each IRI has an HTML representation, but a more challenging example is Protein. This IRI returns RDF but does not have an HTML representation, so it would be necessary to extract information from the RDF, such as the definition, and display it in a small, easy-to-read panel.

Goal: Develop a browser-based display system for dereferenced IRIs. The system will use content-type negotiation to request an HTML representation of a resource, but it will be able to convert JSON-LD, RDF and OWL representations into an HTML representation for display in a small details panel or pop-over. Additionally, develop a plugin for the WikiPathways viewer/editor pathvisiojs that integrates the display system. Optionally, if time allows, add functionality to allow pathvisiojs users to switch vocabularies for the GUI display, mapping from the built-in vocabularies to the user-preferred vocabularies with SKOS files.
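
The negotiation logic can be sketched independently of the browser. The project itself targets JavaScript; the Python below only illustrates the idea of asking for HTML first and falling back to formats that must be converted. The `Accept` weights, the renderer labels and the function names are assumptions of this sketch, and OWL documents are assumed to be served as `application/rdf+xml`.

```python
import urllib.request

# Preference order: HTML first, then machine-readable formats to convert.
ACCEPT = "text/html, application/ld+json;q=0.9, application/rdf+xml;q=0.8"


def build_request(iri):
    """Create (but do not send) an HTTP request that negotiates content type."""
    return urllib.request.Request(iri, headers={"Accept": ACCEPT})


def choose_renderer(content_type):
    """Pick a display strategy from the response's Content-Type header."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "show-html"
    if media_type in ("application/ld+json", "application/rdf+xml"):
        return "convert-to-html"
    return "unsupported"
```

A caller would pass the request to `urllib.request.urlopen`, read the `Content-Type` header of the response, and hand the body to whichever renderer `choose_renderer` selects.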

Technology and Skills: WikiPathways, JavaScript, JSON-LD, RDF, OWL, optionally SKOS

Potential Mentors: Anders Riutta, Alex Pico, Kristina Hanspers

Contact: wikipathways-discuss@googlegroups.com

Idea 12: Development of Experiment Data Curation Tools using the NDEx infrastructure and the OpenBEL Knowledge Representation Language

Background: Biological knowledge can be represented in network form, a methodology that has proved valuable in communicating and computing on biological mechanisms. In some cases, the biological relationships encoded in the networks correspond directly to experimental results, as in protein binding assays that result in protein-protein interaction assertions. In other cases, scientific findings have been encoded as qualitative causal relationships, providing networks of facts that can be used to build causal reasoning models. These encodings have frequently been performed by manual curation or ad hoc software, so there is an opportunity to develop more general tools for abstracting experimental results into reusable, portable knowledge encoded as biological networks. Recent progress in experimental omics techniques allows scientists to generate large data sets that need to be carefully validated and curated. Automating the interpretation and abstraction of these data into computable scientific findings is therefore a valuable way to bridge the gap between data generation and availability to the scientific community.

Goal: To develop and test a set of utilities to process selected types of experimental data into more abstract biological knowledge as networks in the OpenBEL language and to compare the resulting networks with existing, manually curated knowledge networks. This transformation of data to computable scientific findings represents the automation of a basic scientific process; the abstraction of data tables into statements of inferred causality in a given biological context makes those results more reusable, more computable, and easier to integrate with findings from other studies. This helps scientists take better advantage of one class of “big data” and can potentially help use the experimental data to identify new hypotheses in the biomedical/pharmaceutical fields.

Project breakdown:

Phase 1 of the project aims to develop new software utilities to capture experimental data as OpenBEL statements.

In Phase 2, the newly developed utilities will be tested with experimental data sets derived from microarray and mass spec analysis. One source of testing will be a set of experimental data generated and validated by the Itkin-Ansari lab at UC San Diego and focused on the involvement of bHLH transcription factor Id3 in pancreatic cancer.

Phase 3 will focus on generating additional utilities for the comparison of OpenBEL networks.

Finally, Phase 4 will use these tools to compare and contrast the OpenBEL networks captured from experimental data with existing OpenBEL networks created by manual curation.

Project milestones and deliverables will include:

Update/expansion of existing related networks available in public databases and repositories.
Pre-publication of all new network data using the NDEx infrastructure (www.ndexbio.org).
Generation of new leads to support research project progress.
Publication of at least 1 research article in an international peer-reviewed journal showcasing the newly developed tools, their use and potential applications.

Technology and Skills: NDEx, Python, JavaScript

Potential Mentors: Dexter Pratt, Pamela Itkin-Ansari

Contact: Rudolf T. Pillich

Idea 13: Integration of WikiPathways Visualization with PathVisio into Bioclipse

Background: Bioclipse is a platform with a powerful scripting language for writing data analysis and visualization workflows, which can easily be shared on, for example, MyExperiment.org. While it has various components to support metabolomics in general, it lacks support for visualization of metabolic pathways.

Goal: To integrate the PathVisio visualization widget into Bioclipse for visualization of pathway information from WikiPathways. Selecting proteins and metabolites in the pathway visualization will show those chemical entities in other Bioclipse components, using the Jmol and JChemPaint plugins for Bioclipse. Preferably, interaction can be scripted, allowing integration with other metabolomics features in Bioclipse.

Description: You will use Java to write a plugin for Bioclipse that uses PathVisio as a library to visualize metabolic pathways from WikiPathways inside Bioclipse. You will use the Eclipse selection event propagation mechanism so that when users select proteins and metabolites, Bioclipse will automatically visualize 2D or 3D structural information about those molecules (only the event creation has to be implemented). Preferably, a scripting language extension will also be written to allow extracting information from the pathway and highlighting parts of the pathway. This involves extending an existing interface with methods that can be run from the scripting language.

Technology and Skills: WikiPathways, Bioclipse, Java

Potential Mentors: Egon Willighagen, Jonathan Alvarsson

Contact: wikipathways-discuss@googlegroups.com

Idea 14: Help scientists share data with the Synapse client for Cytoscape

Background: Synapse is a web-based platform that makes it easy for scientists to share datasets. Users can request a DOI for datasets, which allows them to be cited and promoted in publications. This encourages scientists to share data with the public, even after the data is no longer relevant to them but could still be useful to others. Synapse also provides extensive provenance tools for tracking progressive changes to datasets as they are transformed from raw to publishable data. This allows scientists to collectively establish, review, and share data processing pipelines.

The Synapse client app for Cytoscape allows users to browse and search their Synapse files and to import them into Cytoscape. Cytoscape is a powerful network analysis tool that can play an important role in data processing pipelines. The current Synapse app lacks facilities for putting processed data back into Synapse.

Goal: The goal of this project is to expand the Synapse app's abilities so that it can submit data back to Synapse and annotate it using its provenance tools.

First phase: create menu items and commands for submitting networks, tables, and sessions to a Synapse account. The app would present a very simple UI for asking the user to which project the file belongs.

Second phase: provide an intuitive, easy-to-use UI for annotating submissions with Synapse's provenance tools. When the user submits files to Synapse, the user can specify a name and description of what was done to the submission and what files were used to create the submission. These files could be from Synapse or from any URL.

Technology and Skills: Cytoscape, Java, OSGi, Using REST APIs in Java, Swing

Potential Mentors: Alex Pico

Contact: Alex Pico

Idea 15: Export as D3.js-Based Web Application

Background: Starting with Cytoscape 3.2, users can export their network visualizations as a simple web application. This feature uses cytoscape.js and other popular libraries such as Angular.js to reproduce visualizations generated with the Cytoscape 3 desktop application using modern web technologies.

Goal: This project extends this idea. In addition to exporting Cytoscape-generated visualizations as simple node-link diagrams, you will add new options for this feature, including:

TreeMap
Dendrogram
Circle Packing
Node-Link Tree
Adjacency Matrix

This enables users to visualize the same data set in different ways. In this project, you need to extend the existing D3.js Exporter App and add new export options. Note that you need knowledge of both Java and JavaScript to implement this feature. We also assume you are comfortable with modern web application development tools, such as command-line tools running on node.js, JavaScript frameworks like Angular.js, testing frameworks, and, of course, D3.js.

Technology and Skills: Cytoscape, JavaScript, D3.js, Java, knowledge of modern web development tools such as node.js/grunt/gulp/yeoman.

Potential Mentors: Kei Ono

Contact: cytoscape-discuss@googlegroups.com

Idea 16: BridgeDb config plugin in PathVisio

Background: BridgeDb is an identifier mapping framework used in PathVisio and WikiPathways. Currently it is possible to load mapping databases for gene products and metabolites. This should be extended with interaction mapping databases.

Goal: The existing BridgeDbConfig plugin provides functionality to load more mapping databases or to use other web services for the mappings. This functionality should be extended and integrated into the PathVisio application as a core module. The configuration dialog has to be improved so that it is more intuitive and user-friendly. Furthermore, the plugin should allow users to download and update BridgeDb mapping databases from within PathVisio. Currently, users have to download the databases first and then load them into PathVisio. This step should be automated by the plugin.

Technology and Skills: BridgeDb, PathVisio, Java

Potential Mentors: Kutmon, Jonathan Melius

Contact: wikipathways-devel@googlegroups.com

Idea 17: Citing publications in biological pathways - PathVisio and WikiPathways

Background: PathVisio is a commonly-used pathway editor, visualization and analysis tool implemented in Java and is linked with the collaborative pathway database WikiPathways. Currently, PathVisio and WikiPathways allow users to add publication references to PubMed.

Goal: The goal of this project is to extend this functionality to allow references to other identifiers such as DOIs, to export references in standard formats such as BibTeX or EndNote, and to provide statistics about citations used on WikiPathways (total number of citations, average number of citations per pathway, top cited papers/journals, ...).
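
As a rough sketch of the statistics and export features described above, the Python below summarizes citations per pathway and formats one reference as a minimal BibTeX entry. The project itself would be written in Java and PHP; the input layout, field selection and function names here are illustrative assumptions.

```python
from collections import Counter


def citation_stats(pathway_refs, top_n=3):
    """Summarize references given {pathway_id: [publication_id, ...]}.

    The mapping layout is a hypothetical one chosen for this sketch.
    """
    counts = Counter(ref for refs in pathway_refs.values() for ref in refs)
    total = sum(counts.values())
    average = total / len(pathway_refs) if pathway_refs else 0.0
    return {
        "total": total,
        "average_per_pathway": average,
        "top_cited": counts.most_common(top_n),
    }


def to_bibtex(key, doi, title, year):
    """Format one reference as a minimal BibTeX @article entry."""
    return (
        f"@article{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  year = {{{year}}},\n"
        f"  doi = {{{doi}}}\n"
        f"}}"
    )
```

A complete exporter would also need authors and journal names, and would have to escape characters that are special in BibTeX.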

Technology and Skills: PathVisio, WikiPathways, Java, PHP, MediaWiki

Potential Mentors: Sravanthi Sinah, Martina Summer-Kutmon, Anwesha Bohler

Contact: wikipathways-devel@googlegroups.com

Idea 18: Enhanced Layout Algorithms

Description: As Cytoscape is used for increasingly large networks, the performance of our layout algorithms becomes a limiting factor. A number of layout algorithms that perform very well in real-world scenarios have not been implemented in Cytoscape. Examples include the Yifan Hu multilevel layout algorithm, a Fast Multipole Multilevel layout (see OGDF for an example), Yifan Hu proportional, and potentially some of the better algorithms from Gephi. Cytoscape provides a framework for adding new layout algorithms as Apps, so this work could all be done as a series of separate apps.

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris

Contact: Scooter Morris

Idea 19: Compound Graph Visualization in Cytoscape 3

Description: Cytoscape 3 has introduced two concepts into the core, groups and annotations, that could be combined to provide a compound graph visualization. Currently groups can be either in a “collapsed” or “expanded” state.

Goal: For a compound graph, a new visualization state needs to be introduced that allows the compound node (group) to be visualized as a shape (concave or convex hull) around its children. Moving that shape would move all of its children, etc.
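
The convex case can be illustrated with a standard algorithm. The Python sketch below computes a convex hull around child node positions using Andrew's monotone chain and shows that translating the children translates the hull. It is a geometric illustration only: the function names are hypothetical, no Cytoscape API is involved, and concave hulls are not covered.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull via Andrew's monotone chain.

    Returns hull vertices in counter-clockwise order for a y-up
    coordinate system; collinear and interior points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def move_group(children, dx, dy):
    """Translate every child position by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in children]
```

In a real implementation the hull would probably be padded by the node sizes so that the shape encloses the node glyphs and not just their center points.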

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris

Contact: cytoscape-discuss@googlegroups.com

Idea 20: Visualization of Time-series Cluster Data

Description: Currently, the clusterMaker app uses standard heat maps to visualize attribute cluster data (based on Java TreeView). Increasingly, time series data is becoming important for clustering applications and it would be useful to expand the provided visualization options to explicitly support time series data. For these purposes, we’re thinking about time-series that have multiple conditions in each series, which means we need to move beyond the standard two-dimensional heat map.

Goal: This project can explore utilizing animation, small multiples, or other standard visualization techniques (or your own novel techniques) to provide new visualization opportunities to users. Ideally, this would be integrated with network-based animation tools such as DynNetwork or CyAnimator.

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris

Contact: cytoscape-discuss@googlegroups.com

Idea 21: Develop a Converter between Visual and Textual Representations of Biological Pathways

Background: Pathway Commons seeks to aggregate all pathway data available in literature, public databases, and curation/crowd sourced efforts. Our goal is to accelerate this process by leveraging pathway diagram editors that support a widely used visual representation for biological pathways.

Problem:

Pathway Commons data is represented in the Biological Pathway Exchange (BioPAX) community standard format, but the BioPAX format lacks a simple graphical editor. On the other hand, the Systems Biology Graphical Notation (SBGN) is a visual representation for pathways that is supported in about 20 tools.

Goal: We'd like you to build a converter between the BioPAX and SBGN formats. A similar project has been developed with the editor CellDesigner (which uses a format similar to SBGN), which leads us to believe that this should be a reasonably straightforward project; many of the mapping issues have already been addressed. The project will make use of two Java libraries: Paxtools (for BioPAX) and LibSBGN (for SBGN).

Technology and Skills: BioPAX, SBGN, PathwayCommons, Java and XML

Potential Mentors: Augustin Luna

Contact: Augustin Luna

Idea 22: Visualizing complex data with sets-aware layout algorithms

Background: The Sets app for Cytoscape is a powerful tool where users can define individual sets for nodes and edges. Users create sets from a selection or from some discrete criteria. For instance, given a network that consists of gene and protein interactions, the user can define a set of all gene nodes and another set of protein nodes. Selecting all gene nodes is a snap, as the user can just click on the gene set. New sets can be made from existing ones by using the union, intersect, and subtract operations.

The app also provides two set-aware layout algorithms:

Grid layout: creates grids of node sets without taking network topology into account.
FDL (force-directed layout): strikes a balance between set membership and network topology, so unconnected nodes that belong in the same set move closer together.

Goal: The goal of this project is to expand sets-based layout algorithms. Layout algorithms that take user-defined set membership into consideration can help elucidate complex network data in ways that purely automatic layout algorithms cannot. Here are some planned sets-based layout algorithms:

Inner-FDL: Creates a grid of each set, then applies FDL on each set individually.
Ordered grid: Within a set, order each node based on an attribute. This is useful for visualizing genomic locations, so that nodes in a set will be ordered by where their representative genes are in the genome. The layout algorithm could also place nodes in some other shape besides a grid, like a circle or spiral.
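
The ordered grid idea above can be sketched in a few lines of Python. The node layout (dicts with a "name" key plus the sort attribute), the near-square grid shape and the `spacing` parameter are assumptions of this sketch; an actual Cytoscape layout would be written in Java against the layout framework.

```python
import math


def ordered_grid_layout(nodes, key, spacing=50.0, origin=(0.0, 0.0)):
    """Place the nodes of one set on a near-square grid, sorted by `key`.

    `nodes` is a list of dicts with a "name" entry and the attribute
    named by `key` (a hypothetical layout). Nodes fill the grid row by
    row. Returns {node name: (x, y)}.
    """
    ordered = sorted(nodes, key=lambda n: n[key])
    columns = max(1, math.ceil(math.sqrt(len(ordered))))
    positions = {}
    for i, node in enumerate(ordered):
        row, col = divmod(i, columns)
        positions[node["name"]] = (
            origin[0] + col * spacing,
            origin[1] + row * spacing,
        )
    return positions
```

Swapping the row and column arithmetic for polar coordinates would give the circle or spiral variants mentioned above.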

Technology and Skills: Cytoscape, Java

Potential Mentors: Alex Pico, Scooter Morris

Contact: Alex Pico

Idea 23: Improve the Text-Mining Functionality of Factoid

Background: Factoid aims to be a tool that helps authors annotate, in a machine-readable manner, the biological interactions that occur in the text of academic publications. Currently, Factoid uses basic entity recognition to identify genes, proteins, and biological processes, and then to identify interactions between these entities.

Goal: The goal of this project is to integrate more sophisticated algorithms into the extraction of interactions from text content. A starting point for some of these would be the algorithms outlined by the EBI website WhatIzIt.

Technology and Skills: PathwayCommons, experience connecting web services, JavaScript

Potential Mentors: Augustin Luna (MSKCC), Max Franz (U of Toronto)

Contact: Augustin Luna, Max Franz

Idea 24: Export to GPML with WikiPathways app

Background: WikiPathways is an open, collaborative wiki for biological pathways. (Pathways are diagrams representing biochemical processes.) Like Wikipedia articles, all pathways are open and free to everyone, and anyone can contribute content.

The WikiPathways app for Cytoscape allows users to take pathway content from WikiPathways and import it into Cytoscape. Users can then modify the pathway's appearance using Cytoscape's myriad visualization tools.

The native format for representing pathway content in WikiPathways is GPML, an XML-based format. GPML files can be edited online on WikiPathways or can be edited using the PathVisio application. The WikiPathways Cytoscape app provides the means to import GPML files into Cytoscape networks.

Goal: Users often need to update pathway content to reflect new insights from literature or data gleaned from Cytoscape's powerful visualization and data integration tools. The goal of this project is to add the ability to export GPML files, so that users can effortlessly submit new pathway content to the WikiPathways website.

When pathway content is imported from GPML into Cytoscape, the WikiPathways app reads the GPML content and creates a series of columns in the node table representing the pathway node's visual information. This information is then used by Cytoscape to style the network. The exporter would use this node table data to reconstruct the GPML file. Additional node table data could be needed to complete the translation to and from Cytoscape and GPML.

Technology and Skills: WikiPathways, Cytoscape, Java, XML

Potential Mentors: Alex Pico

Contact: Alex Pico

Idea 25: Developing Web Interface For The Online Resource for Interface-Based Interaction Networks (TOR-IBIN) Using Cytoscape.js

Background: A protein-protein interaction (PPI) is a physical contact between two or more proteins that results from biochemical reactions or electrostatic forces. The interaction usually takes place through an interface, a part of the protein (e.g., a protein domain) whose molecular structure enables and mediates the interaction between proteins. Interface-based PPI predictions help in understanding the mechanisms of interactions between proteins and in deciphering the functional roles of the proteins.

Focus: TOR-IBIN is an online resource for gathering and storing published interface-based (IB) protein-protein interaction networks (IN) predicted using any PPI prediction method. TOR-IBIN aims to make these networks available in a central repository with a unified format and comprehensive search functions that maximize the use of these data. The TOR-IBIN web-based interface aims to implement browse and search functions as well as network visualization using Cytoscape.js and network download in several formats, including JSON and tab-delimited formats. TOR-IBIN is a relational database implemented in MySQL. The web interface and functionality will be similar to those of the cancer variants database (Can-VD), canvd.baderlab.org.

Goals:

Develop a web-based interface for interface-based PPI data retrieval with comprehensive search functions and data download options.
Develop a visualization and annotation module using Cytoscape.js to visualize the retrieved PPI data as a network, with annotations that highlight features of the predicted PPIs such as prediction confidence and validation.

Inputs: The student will be provided with the database design and sample data as well as an introduction to the topic and the different possible visualizations and annotations needed. The student will have the freedom to choose among them or to propose other options.

Technology and Skills: Cytoscape.js, JavaScript, JSON, PHP, jQuery

Potential Mentors: Mohamed Helmy, Gary Bader

Contact: Mohamed Helmy, Gary Bader

Idea 26: Full Compound Support in Cytoscape 3

Background: The notion of compound graphs is used to represent more complex types of relationships or varying levels of abstractions in data visualization. Networks from almost all domains need such abstractions and nesting, including biological networks. The recently developed standard notation of SBGN makes heavy use of such structures for molecular complexes and cellular locations.

Goal: We would like to first implement full compound support in the Cytoscape 3 core. This will not only allow users to nest nodes of a network but also automatically adjust the size of a compound node as its child nodes are moved, resized or deleted. Then, we'd like to implement a native spring-embedder-based layout algorithm named CoSE that takes such structures into account during layout. A Java implementation of CoSE is already publicly available and needs to be integrated into Cytoscape 3.
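
The automatic resizing of a compound node can be illustrated with a small sketch. It assumes axis-aligned rectangular children given as (x, y, width, height) with (x, y) the top-left corner, and a fixed padding; `compound_bounds` is a hypothetical helper, not a Cytoscape or CoSE function.

```python
def compound_bounds(children, padding=10.0):
    """Smallest padded rectangle enclosing all child rectangles, or None if empty."""
    if not children:
        return None
    left = min(x for x, y, w, h in children)
    top = min(y for x, y, w, h in children)
    right = max(x + w for x, y, w, h in children)
    bottom = max(y + h for x, y, w, h in children)
    return (left - padding, top - padding,
            (right - left) + 2 * padding, (bottom - top) + 2 * padding)

# Two hypothetical child nodes; recompute whenever a child moves, resizes or is deleted.
children = [(0.0, 0.0, 20.0, 10.0), (30.0, 40.0, 10.0, 10.0)]
bounds = compound_bounds(children, padding=5.0)
```

For nested compounds, the same function could be applied bottom-up, using each inner compound's computed bounds as a child rectangle of its parent.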

Technology and Skills: Cytoscape, Java, Data Structures and Algorithms

Potential Mentors: Ugur Dogrusoz (Bilkent University & MSKCC) and Gerardo Huck (U of Toronto)

Contact: ugur@cs.bilkent.edu.tr, gerardohuck@gmail.com

Idea 27: Extending and improving PathVisio Data Import

Background: PathVisio is a pathway editor and analysis tool. PathVisio allows users to upload and visualize data on biological pathways. Currently, it is possible to upload only one dataset at a time as a comma-separated file.

Goal: This functionality should be extended to allow uploading several data files at the same time, as well as uploading multiple data file formats (e.g., Excel sheets). This will involve extending the gex and data modules of PathVisio. These modules are part of the PathVisio core and take care of importing experimental data and mapping it to the elements in a pathway.

Features of the new plugin:

Uploading multiple datasets at once: these could be any kind of quantitative or qualitative data about any pathway element, such as genes, proteins, metabolites, and interactions.
Linking each individual dataset to a particular element on a pathway, e.g., linking a gene expression dataset to genes on pathways and a flux dataset to interactions on pathways.
Importing data from multiple file formats (e.g., .xls, .ods).
Create a pop-up panel accessible from the menu bar of the PathVisio main window to show:
- all the currently loaded datasets
- the biological elements they are linked to
- the currently loaded annotation databases

Technology and Skills: PathVisio, Java

Potential Mentors: Anwesha Bohler, Martina Summer-Kutmon

Contact: wikipathways-devel@googlegroups.com

Idea 28: Over-representation module for WikiPathways

Background: PathVisio is a commonly used pathway editor, visualization and analysis tool implemented in Java. Currently, PathVisio provides an advanced statistics module that allows users to define criteria to perform pathway statistics, a level of control that is not always needed.

Goal: In this project, the student will extend the statistics module with a basic over-representation method that takes a gene list as input and provides ranked pathways as output. The genes in the pathways are highlighted when they are in the input gene list. Additionally, this basic analysis module will be integrated into WikiPathways. The student will develop a page where the user can provide a list of genes; the PathVisio statistics module is then called to calculate the over-representation, and the result is shown on WikiPathways. The pathways will be shown in the pathvisiojs viewer and the genes in the input list will be highlighted.
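
As a rough illustration of a basic over-representation calculation, the sketch below ranks pathways by a one-sided hypergeometric p-value. The choice of the hypergeometric test is an assumption (the text above does not name a statistic), and the function names, pathway contents and background size of 20 genes are invented for the example.

```python
from math import comb

def hypergeom_pvalue(population, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def rank_pathways(gene_list, pathways, population):
    """Return (pathway, matched genes, p-value) tuples, most significant first."""
    genes = set(gene_list)
    results = []
    for name, members in pathways.items():
        hits = genes & set(members)
        p = hypergeom_pvalue(population, len(members), len(genes), len(hits))
        results.append((name, sorted(hits), p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical pathways and input list; `population` is the background gene count.
pathways = {
    "glycolysis": ["HK1", "PFKM", "PKM", "GAPDH"],
    "apoptosis": ["CASP3", "BAX", "TP53", "BCL2", "CASP9"],
}
ranked = rank_pathways(["HK1", "PFKM", "PKM", "TP53"], pathways, population=20)
```

The matched genes returned for each pathway are the ones a viewer would highlight.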

Technology and Skills: WikiPathways, Java, PHP

Potential Mentors: Martina Summer-Kutmon, Anwesha Bohler

Contact: wikipathways-devel@googlegroups.com

Idea 29: Tutorial mode for PathVisio

Background: PathVisio is a commonly used pathway editor, visualization and analysis tool implemented in Java. The analyses in PathVisio and the different advanced visualization methods can become very complex, and an integrated tutorial mode in the application is a powerful way to give detailed instructions while the user performs a task.

Goal: Documentation and tutorials are crucial elements of software development, and the goal of this project is to implement a tutorial mode in PathVisio. This will give users a step-by-step guide with hints and additional information while they learn how to use PathVisio. PathVisio can be extended through plugins, so the tutorial mode should allow other modules to provide additional tutorials that can be selected if the plugins are installed.

Technology and Skills: PathVisio, Java, development of tutorials

Potential Mentors: Martina Summer-Kutmon, Kristina Hanspers, Sravanthi Sinha

Contact: wikipathways-devel@googlegroups.com

Idea 30: Principal Component Analysis

Description: One of the common multidimensional analysis tools is principal component analysis. At this point, there is no good PCA tool integrated with Cytoscape. The ideal tool would support performing the analysis, visualizing the results of the analysis using standard approaches, and then visualizing the results in the context of the Cytoscape network itself. There are many tools that could assist in this task and the analysis itself is pretty well understood. The challenge to this idea is looking for ways to integrate the results of the analysis into Cytoscape.

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris

Contact: Scooter Morris

Idea 31: Incremental Loading of Networks

Background: Cytoscape now supports using JSON as a format for loading networks. To improve exploration of extremely large networks, it would be useful to be able to load parts of the network on request. One use case for this is loading networks from the Structure-Function Linkage Database, which creates Cytoscape networks representing similarity between groups of proteins. Some of these networks are extremely large (millions of edges), and users are usually interested only in local neighborhoods of the network.

Goal: This project would involve writing a Cytoscape app that would load an abstraction of these networks (where multiple nodes are represented by a single "meta" node) into Cytoscape and then allow users to incrementally load all of the nodes within that metanode.
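
The metanode idea can be sketched as a view over the full network in which collapsed groups appear as single nodes. In this illustration the complete membership table and edge list are held in memory as a stand-in for a remote source that a real app would query on demand; the class and method names are hypothetical.

```python
class IncrementalNetwork:
    def __init__(self, membership, edges):
        # membership: metanode id -> list of member node ids
        # edges: full list of (source, target) pairs between member nodes
        self._membership = membership
        self._edges = edges
        self._owner = {n: meta for meta, members in membership.items() for n in members}
        self.expanded = set()

    def expand(self, meta):
        """Mark a metanode as loaded so its members become visible."""
        self.expanded.add(meta)

    def _visible(self, node):
        meta = self._owner[node]
        return node if meta in self.expanded else meta

    def view(self):
        """Return the (nodes, edges) currently shown to the user."""
        nodes = set()
        for meta, members in self._membership.items():
            nodes.update(members if meta in self.expanded else [meta])
        edges = set()
        for source, target in self._edges:
            a, b = self._visible(source), self._visible(target)
            if a != b:  # edges inside a collapsed metanode are hidden
                edges.add(tuple(sorted((a, b))))
        return nodes, edges

net = IncrementalNetwork(
    {"M1": ["a", "b"], "M2": ["c", "d"]},
    [("a", "b"), ("b", "c"), ("c", "d")],
)
collapsed_nodes, collapsed_edges = net.view()
net.expand("M1")
expanded_nodes, expanded_edges = net.view()
```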

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris, Samad Lotia

Contact: Scooter Morris

Idea 32: Add t-SNE to clusterMaker

Background: clusterMaker is a Cytoscape app that unifies different clustering techniques and displays into a single interface. Current clustering algorithms include hierarchical, k-medoid, AutoSOME, and k-means for clustering expression or genetic data; and MCL, transitivity clustering, affinity propagation, MCODE, community clustering (GLAY), SCPS, and AutoSOME for partitioning networks based on similarity or distance values. Hierarchical, k-medoid, AutoSOME, and k-means clusters may be displayed as hierarchical groups of nodes or as heat maps. All of the network partitioning cluster algorithms create collapsible "meta nodes" to allow interactive exploration of the putative family associations within the Cytoscape network, and results may also be shown as a separate network containing only the intra-cluster edges, or with inter-cluster edges added back.

Goal: t-SNE is a technique for dimensionality reduction that is widely used. It would be extremely useful to add this algorithm into the existing clusterMaker app.

Technology and Skills: Cytoscape, Java

Potential Mentors: Scooter Morris

Contact: Scooter Morris

Idea 33: cyRecommender - New Cytoscape 3 app for recommending network ontologies

Background: Cytoscape 3 is powerful network software that provides diverse apps for combining external knowledge with network connectivity. There are a few apps that link ontology details with node labels. However, existing apps require the user to decide what ontology or ontologies are best for each network, which may be a hard and time-consuming task. The NCBO Recommender is a service developed by Stanford University that has been designed to recommend the most appropriate ontologies for a corpus or a list of keywords. Integrating the NCBO Recommender with Cytoscape will make it possible to automatically obtain the best ontological representation for a set of network nodes. The NCBO is developing a new version of the Recommender, which will be available in a few months. The proposed Cytoscape app will be one of the first tools using the new version of the Recommender.

Goal: To integrate the NCBO Recommender service into a new Cytoscape app. After selecting a set of network nodes, the Recommender will be invoked to obtain the best ontologies to represent the node labels, and the network will be presented using the recommended ontologies. The integration will consist of an easy-to-use graphical interface suitable for any level of ontology experience.

Description: You will use Java to write an app for Cytoscape 3 that will access the NCBO Recommender REST services. Your tool will make it possible to:

Get the keywords from the selected node labels, send them to NCBO Recommender REST services, and obtain the recommended ontologies (the selection could be a subnetwork or the entire network).
Visualize the network using the recommended ontologies.
Save the recommendations as CSV or TAB-separated files.
Save the network with the newly added ontology information for each node.

This work involves programming a new app that will interact with Cytoscape in the node selection, network plotting, etc. The code will be available as an open source GitHub project.

Technology and Skills: Cytoscape, Java, GitHub

Potential Mentors: Cristian R Munteanu, Marcos Martinez, Janna Hastings and Alejandro Pazos

Contact: Cristian R Munteanu

Idea 34: Advanced search on WikiPathways

Background: WikiPathways is a collaborative pathway database that is built on MediaWiki. Currently, our pathways can be annotated with ontology tags from the pathway ontology, cell type ontology and disease ontology.

Goal: The advanced search should provide an interface to search for pathways containing one or more pathway elements (genes, proteins, metabolites), for pathways that are annotated with specific ontology tags, or for pathways that contain specific interactions. The advanced search should also allow users to filter pathways based on species, curation tags, authors, creation date, number of edits, etc.

Technology and Skills: WikiPathways, PHP, MediaWiki

Potential Mentors: Martina Summer-Kutmon, Alex Pico

Contact: wikipathways-devel@googlegroups.com

Idea 35: SIR model for CentiScaPe

Background: CentiScaPe is a Cytoscape app that computes network centrality parameters. It is the most complete Cytoscape app for network centralities, calculating Degree, Average Shortest Path, Eccentricity, Closeness, Betweenness, Centroid, Stress, Radiality, Eigenvector, Edge Betweenness, and Bridging Centrality for directed and undirected networks. CentiScaPe allows identifying network nodes that are relevant from both experimental and topological viewpoints. CentiScaPe also provides a Boolean logic-based tool that allows easy characterization of nodes whose topological relevance depends on more than one centrality. Finally, different graphic outputs and the included description of biological significance for each computed centrality facilitate the analysis by end users who are not experts in graph theory, thus allowing easy node categorization and experimental prioritization.

Goal: CentiScaPe can currently provide only static analysis of a network. We want to develop a tool that can compute semi-dynamic simulations, particularly the SIR model. The SIR model is a compartmental model used in epidemiology to simulate the establishment and spread of infectious diseases. It serves as a base mathematical framework for understanding the complex dynamics of such systems. The population is divided into three states: susceptible to infection by the pathogen (often denoted by S), infected by the pathogen (given the symbol I) and recovered/removed/immune (denoted R). The way that these compartments interact is often based on a network model, hence the idea to include the SIR model in the CentiScaPe tool for network analysis.
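
A network-based SIR simulation could look like the following sketch. It assumes a discrete-time, synchronous, stochastic update in which each infected node infects each susceptible neighbor with probability `beta` per step and recovers with probability `gamma`; these parameters and function names are illustrative choices, not part of CentiScaPe.

```python
import random

def sir_step(adjacency, state, beta, gamma, rng):
    """Advance every node by one time step, reading only the previous state."""
    new_state = dict(state)
    for node, status in state.items():
        if status != "I":
            continue
        for neighbor in adjacency[node]:
            if state[neighbor] == "S" and rng.random() < beta:
                new_state[neighbor] = "I"
        if rng.random() < gamma:
            new_state[node] = "R"
    return new_state

def simulate_sir(adjacency, seeds, beta, gamma, steps, seed=0):
    """Return the list of states, starting with the initial one."""
    rng = random.Random(seed)
    state = {n: ("I" if n in seeds else "S") for n in adjacency}
    history = [state]
    for _ in range(steps):
        state = sir_step(adjacency, state, beta, gamma, rng)
        history.append(state)
    return history

# Three-node path a - b - c; with beta = gamma = 1 the run is deterministic.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
history = simulate_sir(path, seeds={"a"}, beta=1.0, gamma=1.0, steps=3)
```

Counting how often each node is infected over many runs with smaller `beta` and `gamma` would give a per-node measure that can be compared with the static centralities.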

Technology and Skills: Cytoscape, CentiScaPe, Java

Potential Mentors: Giovanni Scardoni (CBMC, University of Verona)

Contact: cytoscape-discuss@googlegroups.com

Idea 36: Random networks for CentiScaPe

Background: CentiScaPe is a Cytoscape app that provides a very plain interface for computing different network centrality parameters in single networks and in multiple networks at the same time. It can plot the results using real and normalized values in different plots. Several centralities are available: Degree, Average Shortest Path, Eccentricity, Closeness, Betweenness, Centroid, Stress, Radiality, Eigenvector, Edge Betweenness, Bridging Centrality. Each one can be computed for directed and undirected networks. CentiScaPe makes it possible to assign a weight to the edges, and we recently added a feature that assigns a weight to each node by using an attribute.

Goal: The new version permits an in-depth analysis of biological networks from the topological point of view. The main issue with this methodology is that it requires a lot of literature mining to validate and verify the results, which in general remain speculative. To overcome this limitation, we decided to add a feature that validates the results against random networks, i.e., by adding a form of statistical evaluation. By randomizing a network, we expect to highlight the differences between the real and the random process [1]: this step strengthens the findings that come from a topological analysis and adds a statistical background that could complement the biological information from experimental data or text mining.

We will integrate into CentiScaPe some algorithms that were already developed in RandomNetworks [2][3], an existing but no longer supported app for version 2.6 of the Cytoscape platform. We will develop algorithms that generate networks following a specific model: Watts-Strogatz, Erdős-Rényi, Barabási-Albert and some others. Another interesting feature will be the degree-preserving algorithm, which randomizes the edges while keeping the degree of each node fixed. Finally, the centralities before and after the randomization can be compared using the topological measures of CentiScaPe.

[1] Network Motifs: Simple Building Blocks of Complex Networks. Milo et al, Science 25 October 2002.

[2] RandomNetworks App

[3] RandomNetworks Project Website
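
The degree-preserving randomization mentioned above is commonly done by repeated edge swaps. The sketch below assumes an undirected simple graph and rejects swaps that would create self-loops or duplicate edges; it is an illustration of the general technique, not the RandomNetworks implementation, and all names are hypothetical.

```python
import random
from collections import Counter

def degree_preserving_shuffle(edges, swaps, seed=0):
    """Rewire (a, b), (c, d) into (a, d), (c, b) while keeping every degree fixed."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would create a self-loop or change nothing
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degree_counts(edges):
    return Counter(node for edge in edges for node in edge)

# Hypothetical test network: an 8-node ring with two chords.
ring = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
shuffled = degree_preserving_shuffle(ring, swaps=20, seed=1)
```

Centralities computed on many such shuffled networks give a null distribution against which the values from the real network can be compared.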

Technology and Skills: Cytoscape, CentiScaPe, Java

Potential Mentors: Giovanni Scardoni (CBMC, University of Verona)

Contact: cytoscape-discuss@googlegroups.com

Idea 37: Implement support for the SBML Multistate/Multicomponent Species package

Description: One of the many packages for SBML Level 3 is Multistate and multicomponent species. This package defines constructs for models and modelers to represent biochemical species that have internal structure or state properties. These may involve molecules that have multiple potential states, such as a protein that may be covalently modified, and molecules that combine to form heterogeneous complexes located among multiple compartments. The JSBML team has already initiated an implementation of the multi package, which will be a useful starting point for this project.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java, some exposure to biochemistry.

Potential Mentors: Nicolas Rodriguez, Nicolas Le Novère

Contact: Nicolas Rodriguez, Nicolas Le Novère

Idea 38: Add support for Schema-based validation of SBML

Description: SBML files need to be validated carefully to ensure that they conform to the specification. Currently, the most complete implementation of SBML validation is embodied in libSBML, although the rules of SBML validity are defined in the SBML specification documents. It is possible to validate SBML from JSBML using either the Online SBML Validator or a Java package we provide for calling libSBML locally (i.e., without a network connection), but we want to move toward capturing all of SBML's validity rules in schema languages such as RELAX NG and Schematron, then having both libSBML and JSBML (and any other SBML-using system) use schema validation engines instead of hardcoded validation. This will be especially important as more SBML Level 3 packages are implemented. We have already started to define the RELAX NG schemas for SBML Level 3, but we need to implement all of the validation rules for SBML core and the different packages and provide the hooks in JSBML for using those schemas to validate SBML files.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java, XML, RELAX NG, Schematron

Potential Mentors: Sarah Keating, Andreas Dräger

Contact: Sarah Keating, Andreas Dräger

Idea 39: Representation and evaluation of mathematical formulas

Description: JSBML uses the concept of abstract syntax trees to work with mathematical expressions. In Google Summer of Code 2014, a new math package was implemented that captures each kind of tree node that can occur in formulas (e.g., real numbers or algebraic symbols such as 'plus' or 'minus') in its own specialized class, i.e., a specific abstract syntax tree node. In this way, the handling of formulas has become much more straightforward and even more efficient. However, this new package requires some adaptation of dependent parts of JSBML. This project focuses on optimizing the performance of mathematical operations when used in large-scale simulations, as is done, for instance, by the Systems Biology Simulation Core Library. When parsing infix formulas, JSBML relies on the Java Compiler Compiler (JavaCC) project to create abstract syntax trees. With the availability of the new math package, some adaptation of existing grammar files for the JavaCC parser generator will also be required and will be part of this project.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java, JavaCC

Potential Mentors: Victor Kofia, Andreas Dräger, Alex Thomas, Sarah Keating

Contact: Victor Kofia, Andreas Dräger, Alex Thomas, Sarah Keating

Idea 40: Improving the plugin interface for CellDesigner

Description: One of the most frequently used programs in computational systems biology is CellDesigner. JSBML provides an interface that facilitates the development of plugins for this program. This interface has recently been revised and improved. Now, test cases and plugins for CellDesigner are to be implemented in order to make use of it and ensure its correct behavior. It is, for instance, possible to use CellDesigner's complex canvas user interface to create or manipulate biochemical networks and to conduct numerical computation. Complex drawing styles for biological networks can be encoded in SBML through its rendering extension. JSBML's current plugin interface to CellDesigner does not, however, support the rendering extension. In this project, important rendering styles that are used in CellDesigner should be implemented in the plugin interface.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java, some basic understanding of visualization algorithms

Potential Mentors: Akira Funahashi, Ibrahim Y. Vazirabad, Andreas Dräger

Contact: Akira Funahashi, Ibrahim Y. Vazirabad, Andreas Dräger

Idea 41: Tests for JSBML

Description: JSBML's class and method structure largely corresponds to that of libSBML, our C++ language counterpart. One area that will need to be standardized between the two parsing libraries is the test cases for proper SBML syntax. LibSBML has a set of files (and corresponding tests) which test a large set of these SBML syntax rules. This project will require the programmer to create important tests for the core and extension libraries of JSBML that correspond to the libSBML rules. The programmer could also suggest further tests and would be essential in building the code base that enacts quality control for JSBML.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java

Potential Mentors: Andreas Dräger, Alex Thomas

Contact: Andreas Dräger, Alex Thomas

Idea 42: Flattening of hierarchically structured models

Description: The hierarchical modeling package (comp) enables SBML files to be composed of multiple individual submodels. These submodels can be stored in the same file, be distributed in a file system, or even be linked through resource identifiers. Hence, comp allows models in SBML to be composed similarly to HTML documents. While this format gives the model creator many benefits and much freedom, it also significantly increases the complexity of the SBML documents. In order to run regular analysis methods on those composite models, a flattening procedure, i.e., a conversion into one large single model, is a prerequisite. The specification of the comp package describes how such a flattening algorithm can work. The aim of this project is to implement such a flattening routine in JSBML in order to create an SBML file without comp. An additional goal of this project is the implementation of basic validation rules for the comp package.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java

Potential Mentors: Lucian Smith, Chris Myers, Leandro Watanabe

Contact: Lucian Smith, Chris Myers, Leandro Watanabe

Idea 43: Better support for constraint-based modeling

Description: One popular modeling technique in systems biology is Constraint-Based Reconstruction and Analysis (COBRA). The Flux Balance Constraints (fbc) package has been defined in order to represent the specific requirements of this modeling technique in SBML. The COBRA toolbox for MATLAB and its Python counterpart COBRApy are among the most widely used programs for this modeling approach. Since these programs preceded the development of fbc, an SBML dialect has evolved. The aim of this project is to better support constraint-based modeling in JSBML. To this end, a converter from COBRA's SBML dialect to fbc needs to be implemented. This also includes an update of the fbc package implementation in JSBML to the next version.

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java

Potential Mentors: Alex Thomas, Andreas Dräger

Contact: Alex Thomas, Andreas Dräger

Idea 44: Converter of CellDesigner's annotations to pure SBML

Description: One very popular software package in systems biology is CellDesigner. Long before the SBML extensions for Layout and Render were defined, this program already provided mechanisms to display network diagrams. To this end, the developers of CellDesigner created their own SBML dialect. In this project, a parser should be created that understands CellDesigner's SBML dialect and converts it to SBML with the layout and render extensions without the need to use a CellDesigner plug-in. If time permits, it would also be nice to have a converter from SBML with layout and render to SBGN-ML and vice versa. Those could be integrated into the Systems Biology Format Converter (SBFC).

More information about the SBML and JSBML projects at SBML.org.

Technology and Skills: JSBML, SBML, Java

Potential Mentors: Akira Funahashi, Andreas Dräger

Contact: Akira Funahashi, Andreas Dräger

Idea 45: Spatial Analysis of Functional Enrichment in Biological Networks

Background: Biological networks connect genes, proteins and other biological entities that interact with one another to perform their function. These networks tend to have a non-random structure and often form densely connected structures (clusters) that involve closely related elements. Depending on the nature of the network, these clusters may correspond to, for example, genes acting in the same biological pathway or proteins sharing the same evolutionary ancestor. Identifying clusters and uncovering their biological role is an important and challenging aspect of understanding biological networks and making use of the information they contain.

To address this challenge directly, we have developed an algorithm to test for local functional enrichment in biological networks. Here, functional enrichment indicates statistical overrepresentation of members of a known functional group within a particular area of the network. Using this method, we are able to quickly and automatically annotate a biological network without defining cluster boundaries. This method, referred to as Spatial Analysis of Functional Enrichment (SAFE), is currently being used to understand the structure of genetic interaction networks in the yeast Saccharomyces cerevisiae (figure attached).

Goal: To make a SAFE app for Cytoscape that would allow researchers to run the SAFE method on any network of their interest and/or with any set of functional annotations. Specifically, the app should:

Use the pre-loaded network
Load Gene Ontology or a custom set of functional annotations (e.g., a list of node labels with categorical assignment to any number of groups)
Perform functional enrichment on the network using the annotations provided (the algorithm will be available in MATLAB and/or in pseudo-code)
Visually represent the enrichment results on the network (the algorithm will be available in MATLAB and/or in pseudo-code)
Provide the enrichment results in a list form for browsing and/or export as text file

Links: Baryshnikova lab

Technology and Skills: Cytoscape, Java, basic understanding of biological networks

Potential Mentors: Anastasia Baryshnikova (Princeton), Dmitriy Gorenshteyn (Princeton)

Contact: Anastasia Baryshnikova

Idea 46: Cytoscape.js quadtree support

Background: Cytoscape.js is a JS library that allows for the analysis and visualization of graphs (a.k.a. networks). This project would give the opportunity for the student to work on and with Cytoscape.js, as follows:

Goal: A quadtree is a data structure that allows for efficient lookup of sets of spatial co-ordinates. For Cytoscape.js, this could be applied to element (node and edge) hit tests: The quadtree could be queried to determine the closest element to the cursor/mouse or finger in an efficient way. It might not be warranted for many use cases because of the additional cost of maintaining the tree, but it could be useful in other cases. This project need not be constrained to quadtrees specifically, as a different particular tree structure may be more appropriate.
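
To make the hit-test idea concrete, here is a minimal point-quadtree sketch. It indexes only node centre points (edges and node shapes are ignored), and the class, the `capacity` and `max_depth` parameters and the `hit_test` helper are assumptions for illustration, not Cytoscape.js API.

```python
class Quadtree:
    def __init__(self, x, y, w, h, capacity=4, depth=0, max_depth=8):
        self.bounds = (x, y, w, h)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = []      # (px, py, payload) stored at this cell
        self.children = None  # four sub-cells once the cell has been split

    def _contains(self, px, py):
        x, y, w, h = self.bounds
        return x <= px < x + w and y <= py < y + h

    def insert(self, px, py, payload):
        if not self._contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((px, py, payload))
                return True
            self._split()
        if not any(c.insert(px, py, payload) for c in self.children):
            self.points.append((px, py, payload))  # fallback for rounding at cell edges
        return True

    def _split(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [
            Quadtree(cx, cy, hw, hh, self.capacity, self.depth + 1, self.max_depth)
            for cx in (x, x + hw) for cy in (y, y + hh)
        ]
        old, self.points = self.points, []
        for point in old:
            self.insert(*point)

    def query(self, qx, qy, qw, qh, found=None):
        """Collect all points inside the query rectangle."""
        found = [] if found is None else found
        x, y, w, h = self.bounds
        if qx > x + w or qx + qw < x or qy > y + h or qy + qh < y:
            return found  # no overlap, so this whole subtree is skipped
        for px, py, payload in self.points:
            if qx <= px <= qx + qw and qy <= py <= qy + qh:
                found.append((px, py, payload))
        for child in self.children or []:
            child.query(qx, qy, qw, qh, found)
        return found

def hit_test(tree, cx, cy, radius):
    """Return the payload closest to (cx, cy) within `radius`, or None."""
    best, best_d2 = None, radius * radius
    for px, py, payload in tree.query(cx - radius, cy - radius, 2 * radius, 2 * radius):
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 <= best_d2:
            best, best_d2 = payload, d2
    return best

tree = Quadtree(0, 0, 100, 100, capacity=2)
for name, px, py in [("n1", 10, 10), ("n2", 80, 20), ("n3", 50, 50), ("n4", 52, 49), ("n5", 90, 90)]:
    tree.insert(px, py, name)
hit = hit_test(tree, 51, 50, radius=5)
miss = hit_test(tree, 30, 30, radius=5)
```

The maintenance cost mentioned above shows up here as re-inserting a point whenever its node moves; this sketch does not implement removal.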

Technology and Skills: Cytoscape.js, JavaScript

Potential Mentors: Max Franz, Christian Lopes

Contact: cytoscape-discuss@googlegroups.com

Idea 47: Scrollbar in Cytoscape.js

Goal: This project would entail the creation of a Cytoscape.js extension that would show scrollbars in a manner similar to that of touch devices; i.e., the scrollbar would show momentarily on scroll movement and fade out thereafter. This extension would be useful for showing the relative panning position of the graph in a way that is economical with regard to computational expense and screen space. You would have to create new scrollbar logic and possibly also package the code as a reusable extension.

Technology and Skills: Cytoscape.js, JavaScript

Potential Mentors: Max Franz, Christian Lopes

Contact: cytoscape-discuss@googlegroups.com

Idea 48: PubMANIA: Advanced literature search using network inference

Background: Finding all the relevant literature when researching a topic is a common problem. There are many potentially useful methods for finding relevant works, including: using search terms, looking through papers that cite a publication of particular interest, and looking at papers written by a specific researcher. However, all of these approaches can be extremely slow, and will often miss many relevant works. A similar problem exists in genetics; often, you have a set of related genes and want to find all the genes that are related to these genes in the same way. GeneMANIA solves this problem by integrating different types of gene-gene interaction networks, learning what networks are most important in connecting your set of genes, and suggesting new genes that appear to be related based on the network structure.

Goal: We would like to apply the GeneMANIA approach to publications, creating a service that can recommend publications to the user based on a set of provided publications. This requires that we extend the GeneMANIA codebase to handle extremely large, sparsely-connected networks, and apply this to our collection of publication-interaction networks (based on PubMed). By integrating networks of citations, journals, authors, and common terms, we hope to assist researchers in quickly and easily finding all of the publications relevant to their work.

Technology and Skills: Java, SQL, JavaScript, Data Structures and Algorithms

Potential Mentors: Carl de Boer (the Broad Institute)

Contact: Carl de Boer

Idea 49: Analyzing bio-network dynamics on the web

Background: ANIMO is a Cytoscape App to analyse the dynamic behavior of biological networks. As an example of its usefulness, consider a biologist studying the signaling events of a cell. ANIMO is built to help answer questions like "If I now give this stimulus to the cell, what happens? Which pathways get activated?". ANIMO can show how signals travel along the pathways represented by Cytoscape networks: nodes are colored to reflect the activity changes during the course of a simulated experiment. Graphs and answers to non-trivial questions can also be obtained, and all of this is done without the need to understand the complexities of mathematical modeling. However, the power of ANIMO comes at a cost: an additional tool needs to be installed in order to automatically analyse the network models. Some biologists are not comfortable installing and configuring software, so it would be much easier if they could access their tools directly from a web page, without the need to install anything. Cytoscape provides a way to export networks as (static) web pages, thanks to the features of Cytoscape.js. It would be great if these web pages could be extended with the dynamic information generated by ANIMO. Things would get even more interesting if the biologist could analyse their network models through the same web interface.

Goal: Convert ANIMO into a web-app based on Cytoscape.js, making the most useful features of ANIMO easily accessible from any web-enabled device without requiring the installation of additional software.

Technology and Skills: Cytoscape.js, JavaScript, Cytoscape, Java, basic client-server interactions

Potential Mentors: Stefano Schivo, Rom Langerak, Max Franz

Contact: Stefano Schivo

Idea 50: New Layout for Cytoscape.js and esyN Optimised for Large Networks

Background: esyN is a web tool to facilitate the exchange of biological network models between researchers. esyN acts as a searchable database of user-created networks from any field. esyN aims to make it very easy for everybody to build biological networks. Users can upload a list of genes and retrieve interactions from BioGRID to populate the network. One major limitation at the moment is that the current layouts struggle to display medium and large networks.

Goal: Implement a new Cytoscape.js layout that is optimised for large networks yet fast enough to run in the user’s browser.

Technology and Skills: Cytoscape.js, JavaScript

Potential Mentors: Dan Bean, Giorgio Favrin

Contact: Giorgio Favrin

Idea 51: Cytoscape App for Downloading and Uploading esyN Networks

Background: esyN is a web tool to facilitate the exchange of biological network models between researchers. esyN acts as a searchable database of user-created networks from any field. esyN allows the user to easily construct networks by uploading a list of genes and retrieving their interactions from BioGRID. In addition to its basic tools, esyN contains a number of logical templates that can be used to create models more easily. The ability to use previously published models as building blocks makes esyN a powerful tool for the construction of models and network graphs. Users are able to save their own projects online and share them either publicly or with a list of collaborators.

Goal: Implement a Cytoscape app that downloads esyN networks (both published and private ones) into Cytoscape and, conversely, uploads Cytoscape networks to esyN.

Technology and Skills: Cytoscape, Java

Potential Mentors: Dan Bean, Giorgio Favrin, Giovanni Scardoni

Contact: Giorgio Favrin

Idea 52: Enhancing Cytoscape Network Data Import

Background: Cytoscape is a network analysis and visualization tool. Users commonly want to import data files that contain both network interactions and node annotations. Currently, you have to invoke two different import dialogs multiple times to accomplish this task.

Goal: The network import dialog and file processing should be overhauled and enhanced to provide a single interface to perform network and node data import. After choosing a source file, the UI would present an interactive preview (as it does currently) with additional features like specifying columns that are associated with either source or target nodes, in addition to edge (interaction) annotation columns. Confirming the dialog would then trigger network and data table import tasks. There are many subtle aspects to this project:

Designing UI and UX
Using Cytoscape Tasks to support command line control
Improving logic behind data type and list handling

And there are many extensions to this project that might be proposed:

Improvements to the data table importer
Expanding supported file formats
Batch import of multiple files sharing a common column structure
Persistent UI settings within a session
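The core of the single-pass import described in the goal can be sketched as follows: one delimited file is read once, and each column is routed to the source node, the target node, or the edge. This is an illustration in Python, not Cytoscape's Java import code or Task API; the function name, its parameters, and the column names in the sample are all hypothetical. Data type detection and list handling are left out, so every value stays a string.

```python
import csv
import io

# Hypothetical sample file: one interaction per row, with a node annotation
# for the source node and an annotation for the edge.
SAMPLE = (
    "source\ttarget\tinteraction\tsource_type\tscore\n"
    "TP53\tMDM2\tbinds\tprotein\t0.9\n"
    "MDM2\tTP53\tinhibits\tprotein\t0.7\n"
)


def import_network(text, source_col, target_col, interaction_col=None,
                   source_attr_cols=(), target_attr_cols=(),
                   edge_attr_cols=(), delimiter="\t"):
    """Build node and edge tables from one delimited file in a single pass.

    Returns (nodes, edges), where nodes maps a node name to its annotations
    and edges is a list of dictionaries, one per row.
    """
    nodes = {}
    edges = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        source, target = row[source_col], row[target_col]

        # Columns tied to the source or target node become node annotations.
        nodes.setdefault(source, {}).update(
            {col: row[col] for col in source_attr_cols})
        nodes.setdefault(target, {}).update(
            {col: row[col] for col in target_attr_cols})

        # The remaining selected columns annotate the edge itself.
        edge = {"source": source, "target": target}
        if interaction_col is not None:
            edge["interaction"] = row[interaction_col]
        edge.update({col: row[col] for col in edge_attr_cols})
        edges.append(edge)
    return nodes, edges
```

Calling import_network(SAMPLE, "source", "target", interaction_col="interaction", source_attr_cols=["source_type"], edge_attr_cols=["score"]) yields two nodes and two edges from a single file, which is the outcome that currently requires separate import dialogs.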

Technology and Skills: Cytoscape, Java, Swing

Potential Mentors: Scooter Morris, Alex Pico, Christian Lopes

Contact: cytoscape-discuss@googlegroups.com

Idea 53: Implement a PathLinker Annotation Plugin for PathVisio

Background: Cells respond to their environment through a series of molecular interactions that form biological pathways; understanding these pathways is a major area of research in Systems Biology. PathLinker is an algorithm that reconstructs biological pathways from experimental data, and it has been very useful in annotating existing biological pathways as well as discovering new pathway additions. PathVisio is free open-source software to build, visualize, annotate, and analyze pathways. It integrates tightly with WikiPathways, a "Wikipedia" for biological pathways.

Goal: To implement PathLinker as an annotation plugin for PathVisio.

This plugin will assist a pathway author who is using PathVisio to create or expand a pathway. When the author selects a protein, the plugin will suggest possible physical, regulatory, and signaling interactions involving that protein and the literature supporting each interaction. The author will then click or drag the suggestion over into the pathway to confirm its inclusion. The plugin will collect statistics on the number of suggestions accepted, ignored, or rejected by authors.

The PathLinker graph algorithm is the technology underlying this plugin. PathLinker takes as input a comprehensive network of physical, regulatory, and signaling interactions among proteins. PathLinker also requires a set of "sources" (e.g., receptors) and a set of "targets" (e.g., transcription factors) that the pathway author will provide. PathLinker computes multiple shortest paths from any source to any target and ranks these paths in increasing order of length. A user-defined parameter controls the number of paths computed. If the pathway author asks for (say) 50 paths, then for each protein, the plugin will store which interactions in these 50 paths connect with the protein, the ranks of these paths, and the literature support for each interaction.
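The ranking and per-protein bookkeeping described above can be sketched as follows. This is a naive best-first enumeration of loop-free paths, swapped in for illustration; it is not PathLinker's actual implementation and would be too slow on a full interactome. It assumes non-negative edge lengths and that a path ends at the first target it reaches. The function names and the toy protein names in the usage note are hypothetical, and literature support is omitted.

```python
import heapq
from collections import defaultdict


def k_shortest_paths(graph, sources, targets, k):
    """Return up to k loop-free source-to-target paths, shortest first.

    graph: {u: {v: length}} for a directed network with non-negative lengths.
    Partial paths are expanded in order of increasing length, so complete
    paths are found in rank order.
    """
    targets = set(targets)
    heap = [(0.0, [s]) for s in sorted(set(sources))]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets and len(path) > 1:
            found.append((length, path))
            continue  # assumption: a path stops at the first target reached
        for neighbor, edge_length in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths loop-free
                heapq.heappush(heap, (length + edge_length, path + [neighbor]))
    return found


def index_by_protein(ranked_paths):
    """For each protein, list (path rank, interaction) pairs that touch it."""
    index = defaultdict(list)
    for rank, (_, path) in enumerate(ranked_paths, start=1):
        for u, v in zip(path, path[1:]):
            index[u].append((rank, (u, v)))
            index[v].append((rank, (u, v)))
    return index
```

For example, with a toy network {"R": {"A": 1.0, "B": 3.0}, "A": {"TF": 1.0, "B": 1.0}, "B": {"TF": 1.0}}, source "R" and target "TF", the three paths come back in order of length, and the index for "B" lists the interactions from the second- and third-ranked paths. A plugin could look up the selected protein in such an index to decide which interactions to suggest first.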

Links:

PathLinker implementation in Python
PathLinker implementation in Java (release planned by the end of May 2015)

Technology and Skills: graph algorithms, Java, PathVisio, WikiPathways

Potential Mentors: T. M. Murali and Anna Ritz

Contact: T. M. Murali, Anna Ritz