Artificial Intelligence Masters Projects

By taking on a Rosalind Franklin Institute challenge, students will help solve a complex, real-world problem faced by scientists at the Franklin. The purpose of these projects is to engage students who are looking to embark on research careers with Franklin science, and to allow them to build links and networks outside their host institutions. In addition, the projects aim to strengthen the links between researchers at member universities and the Franklin.

Students will complete their projects at their host university and will be primarily supervised by an academic member of staff there. In addition to the support they receive from the host institution, students will also receive support from the project team at the Franklin. They will be in e-mail contact with project organisers, who will provide further information on any problems. A virtual group meeting will kick off the projects, and students will receive online training organised by the Franklin on topics such as research software engineering and poster and oral presentations. Students will also get the opportunity to visit the Franklin at the end of the projects for a celebration event, where they can present their work to Franklin staff and network with academics from other member universities. The Franklin will cover the costs of students attending this event.

These projects are designed to last for three or six months, to fit the length of most masters course projects. Any masters student studying for a relevant degree is eligible to apply.

What we are looking for in each project:

1. Highest standard of code quality and software engineering practices (Utility)

2. Most innovative choice of solution (Novelty)

3. Best engagement with the rest of the community and project partners (Engagement)

4. Best use of controlled risk applied to a project (Adventure)

Projects for 2020/2021

For the coming academic year, we are pleased to announce five masters projects. We are looking for multiple students to complete each project. After the projects have been completed, we will award prizes to students based on the above criteria.

Correlated Imaging – Multem optimisation project

Run by James Parkhurst

Simulations of electron micrographs of frozen hydrated biological samples can be used to determine optimal data collection parameters and provide ideal test data to aid in the development of data processing programs for CryoEM. The electron-matter interaction is simulated by using the multislice method where an electron wave is transmitted through thin slices of the sample and then propagated by Fresnel diffraction. These calculations are performed on a fine 2D grid which gives the resolution of the output simulated electron micrograph. The software being used in this project makes extensive use of GPU acceleration in order to speed up calculations; however, in order to simulate thicker, more complex samples, higher performance is necessary.

The aim of this project is to optimise the simulation algorithms, to get improved performance and to parallelise those parts of the code that are still run in serial on the CPU.
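To make the multislice loop described above concrete, here is a minimal NumPy sketch. It is not Multem's implementation: the function names, the parameter names and the numerical values are all assumptions chosen for illustration. The sketch treats each slice as a pure phase object and uses a Fourier-space Fresnel propagator between slices.

```python
import numpy as np


def fresnel_propagator(n, pixel_size, wavelength, dz):
    """Fourier-space Fresnel propagator for an n x n grid and slice thickness dz."""
    k = np.fft.fftfreq(n, d=pixel_size)  # spatial frequencies, in 1/length
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return np.exp(-1j * np.pi * wavelength * dz * (kx**2 + ky**2))


def multislice(projected_potentials, sigma, pixel_size, wavelength, dz):
    """Send a plane wave through a stack of thin slices.

    projected_potentials: array of shape (n_slices, n, n), one projected
        potential per slice (hypothetical input format).
    sigma: interaction constant, so each slice transmits as exp(i * sigma * V).
    """
    n = projected_potentials.shape[-1]
    propagator = fresnel_propagator(n, pixel_size, wavelength, dz)
    wave = np.ones((n, n), dtype=complex)  # incident plane wave
    for potential in projected_potentials:
        wave = wave * np.exp(1j * sigma * potential)  # transmit through the slice
        wave = np.fft.ifft2(np.fft.fft2(wave) * propagator)  # propagate to the next slice
    return wave


# Illustrative, not physically calibrated, values: 8 random slices on a 64 x 64 grid.
rng = np.random.default_rng(0)
potentials = rng.random((8, 64, 64))
exit_wave = multislice(potentials, sigma=1e-3, pixel_size=1.0, wavelength=0.02, dz=5.0)
image = np.abs(exit_wave) ** 2  # exit-wave intensity, before any lens or detector model
```

In this sketch the loop is sequential in the slice index, because each slice needs the wave leaving the previous one. The per-slice FFTs and element-wise products are therefore the likely places to look for speed-ups.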

Artificial Intelligence and Informatics – Image compression

Run by Mark Basham

In scientific imaging, data is often very different from the natural images that we are used to, and that many software algorithms are designed to work best with. In bio-imaging of cells, for example, data is stored in a high bit depth, grayscale form, and is often 3D rather than 2D. In addition, the images are often noisy, because the dose has to be kept low: too much dose from the imaging technique can damage the sample. These differences mean that traditional image compression algorithms don't work very well for this data, either introducing unwanted and significant artefacts or not making best use of the 3D nature of the data.

This project is about writing a Python software library that can compress and decompress these 3D bio-images better than the existing standard methods, by taking the special nature of these images into account. You will start with a GitHub repository and an existing testing framework in which to build your project. This will provide you with some example data and the metrics against which this challenge will be tested.
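As a rough illustration of what "making use of the 3D nature of the data" can mean, the sketch below builds a simple lossless baseline: it stores the difference between neighbouring slices and then applies DEFLATE via the standard zlib module. This is an assumption-laden example, not the project's testing framework or metric. The function names are hypothetical and the input is assumed to be a 3D uint16 NumPy array.

```python
import zlib

import numpy as np


def compress_volume(volume):
    """Losslessly compress a 3D uint16 volume: delta-code along axis 0, then DEFLATE."""
    if volume.ndim != 3 or volume.dtype != np.uint16:
        raise ValueError("expected a 3D uint16 array")
    # Differences between neighbouring slices; uint16 arithmetic wraps modulo 2**16,
    # which the cumulative sum in decompress_volume undoes exactly.
    deltas = np.diff(volume, axis=0, prepend=np.zeros_like(volume[:1]))
    return zlib.compress(np.ascontiguousarray(deltas).tobytes(), 9)


def decompress_volume(blob, shape):
    """Invert compress_volume; the caller supplies the original shape."""
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.uint16).reshape(shape)
    return np.cumsum(deltas, axis=0, dtype=np.uint16)


# Synthetic smooth volume standing in for real data (illustrative only).
z, y, x = np.indices((8, 16, 16))
volume = (1000 + 3 * z + y + x).astype(np.uint16)

blob = compress_volume(volume)
restored = decompress_volume(blob, volume.shape)
ratio = volume.nbytes / len(blob)  # compression ratio achieved on this toy volume
```

Real bio-images are noisy, so slice-to-slice differences will compress far less well than in this toy volume. That gap is part of what makes the problem interesting.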

Next Generation Chemistry for Medicine – Separating signals and DOSY NMR

Run by Mark Basham and Ben Gaunt

Nuclear magnetic resonance (NMR) spectroscopy can be an extremely powerful tool for probing chemical structure and behaviour, particularly in small molecules. It can produce such a wealth of information that it becomes problematic; a lot of effort has gone into trying to separate out the information in NMR spectra, as overlapping peaks can make the spectra too cluttered to interpret. The problem is only compounded when using NMR to analyse mixtures, particularly those of similar compounds, where signals from nuclei in similar chemical environments in different species will often overlap. It would, however, be useful to be able to analyse such systems clearly, particularly in fields such as high-throughput fragment screening, where a large number of similar compounds need to be analysed and combining them into mixtures cuts down on the experimental time substantially.

One technique which would benefit particularly from this is diffusion-ordered spectroscopy (DOSY). This 2D NMR technique uses pulsed field gradients to extract information about the diffusion of compounds in solution. By varying the length of time for which the field gradient is applied, the diffusion constant of the molecules can be determined: the longer the gradient is applied, the more the peaks in the spectrum are attenuated, with peaks corresponding to molecules with a higher diffusion constant attenuating more rapidly. Fourier transformation then separates the data into a 2D spectrum, with a 1D spectrum for each compound type separated according to its diffusion constant. However, if peaks overlap, the measured intensity may in fact be the sum of several signals, each attenuating at a different rate, making such analysis far less accurate. It would thus be very useful if signals could be easily separated.

This project is about writing a Python software library that can identify the diffusion coefficients of the various compounds in the sample. You will start with a GitHub repository and an existing testing framework in which to build your project. This will provide you with some example data and the metrics against which this challenge will be tested.
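The effect of overlap on the fitted diffusion coefficient can be seen in a small sketch. It assumes a single-exponential attenuation model, I(b) = I0 * exp(-D * b), where b is a lumped factor that grows with the gradient strength and duration. The model, the function name and the numbers are assumptions for illustration, not the project's data or metric.

```python
import math


def fit_diffusion_coefficient(b_values, intensities):
    """Least-squares fit of ln(I) = ln(I0) - D * b; returns (D, I0)."""
    logs = [math.log(i) for i in intensities]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_b
    return -slope, math.exp(intercept)


b = [0.0, 0.5, 1.0, 1.5, 2.0]  # arbitrary units

# An isolated peak: the fit recovers D = 2.0 and I0 = 5.0 up to rounding.
single = [5.0 * math.exp(-2.0 * bi) for bi in b]
d_fit, i0_fit = fit_diffusion_coefficient(b, single)

# Two overlapped peaks with D = 1.0 and D = 4.0: the single-exponential fit
# returns an apparent D between the two, matching neither compound.
overlapped = [math.exp(-1.0 * bi) + math.exp(-4.0 * bi) for bi in b]
d_apparent, _ = fit_diffusion_coefficient(b, overlapped)
```

Separating the overlapped case into its two components means fitting a sum of exponentials, which is a much less well-conditioned problem than the single-peak fit shown here.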

Biological Mass Spectrometry – Sample stage

Run by Mark Basham

As scientific detectors become faster and faster, the challenge for experimental setups is often making sure there is something new to look at quickly enough. For many techniques, this involves moving a sample around and looking at it, just as you would move a glass slide under a microscope to see its different parts. Doing this automatically requires a high-precision motorized stage and associated control software. The control software is the critical part here, as driving stages quickly and precisely can involve significant complexity, especially if you want to move in unconventional ways.

This project will focus on a simple test system consisting of an Arduino motor controller and some 3D-printable or laser-cut components, which together make an x-y scanning stage that can be placed under a microscope if desired. The aim of the project is to make this system compatible with a standard scanning system written in Python, and to make the system go as fast and as accurately as possible.
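One small example of how the scan path affects speed: a serpentine (boustrophedon) scan reverses the x direction on alternate rows, which avoids the long fly-back move of a plain raster. The sketch below only generates target coordinates. The function names are hypothetical, and it does not talk to an Arduino or to any particular scanning system.

```python
import math


def serpentine_scan(x_positions, y_positions):
    """Yield (x, y) targets row by row, reversing the x direction on alternate rows."""
    for row, y in enumerate(y_positions):
        xs = x_positions if row % 2 == 0 else reversed(x_positions)
        for x in xs:
            yield (x, y)


def travel_distance(points):
    """Total straight-line distance travelled when visiting the points in order."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


xs, ys = [0, 1, 2], [0, 1]
snake = list(serpentine_scan(xs, ys))
# snake == [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
raster = [(x, y) for y in ys for x in xs]  # plain raster, with fly-back between rows

snake_travel = travel_distance(snake)  # 5.0 grid units
raster_travel = travel_distance(raster)  # longer, because of the diagonal fly-back
```

Distance is only a proxy for time. A real stage also has acceleration limits and backlash, so the fastest accurate path may differ from the shortest one.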

Structural Biology – Sequence matching

Run by Laura Shemilt

In biology, sequence alignment is the process that looks for mismatches in DNA sequences that are produced from the same source. These mismatches can be interpreted as point mutations or insertion/deletion mutations. Sequence alignment is performed in a whole host of biological experiments and applications; however, there is often a manual part to this process. Automatic multiple sequence alignment is one of the oldest problems in computational biology[i], and there are many software packages available that attempt to solve this problem[ii].

Many modern machine learning algorithms are used to try to solve the problem of multiple sequence alignment; one popular example is hidden Markov models[iii]. Deep learning methods are starting to be applied to these problems[iv].

The challenge is to apply a deep learning algorithm to perform multiple sequence alignment on a given data set. Your algorithm should be able to detect mutations from a template sequence. The accuracy of the output will be benchmarked against FASTA[v], a common method of sequence alignment. We would like to investigate the efficacy of modern deep and machine learning methods applied to this problem.
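For orientation, the sketch below shows what "detecting mutations from a template" looks like with a classical, non-learning method: Needleman-Wunsch global alignment of one query against the template. It is neither the deep learning approach the challenge asks for nor the FASTA benchmark. The scoring values and function names are assumptions chosen for illustration.

```python
def needleman_wunsch(template, query, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; returns the two gapped strings."""
    rows, cols = len(template) + 1, len(query) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if template[i - 1] == query[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two characters
                score[i - 1][j] + gap,             # gap in the query
                score[i][j - 1] + gap,             # gap in the template
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = len(template), len(query)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            a.append(template[i - 1]); b.append(query[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(template[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(query[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def find_mutations(aligned_template, aligned_query):
    """List (kind, template_position, detail); positions are 1-based in the template.

    For an insertion, the position is that of the template base just before it.
    """
    mutations = []
    pos = 0
    for t, q in zip(aligned_template, aligned_query):
        if t != "-":
            pos += 1
        if t == "-":
            mutations.append(("insertion", pos, q))
        elif q == "-":
            mutations.append(("deletion", pos, t))
        elif t != q:
            mutations.append(("substitution", pos, t + ">" + q))
    return mutations


aligned = needleman_wunsch("GATTACA", "GACTACA")
point_mutations = find_mutations(*aligned)  # [("substitution", 3, "T>C")]
```

When a deletion falls inside a run of identical bases, several alignments score equally and the traceback simply picks one of them. Any benchmark of detected mutations needs a convention for such ties.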

[i] https://arxiv.org/ftp/arxiv/papers/1808/1808.07717.pdf

[ii] Michael Nute, Ehsan Saleh, Tandy Warnow, Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets, Systematic Biology, Volume 68, Issue 3, May 2019, Pages 396–4

[iii] Eddy, Sean R. “Multiple alignment using hidden Markov models.” ISMB. Vol. 3. 1995.

[iv] Jafari, R., Javidi, M. & Kuchaki Rafsanjani, M. Using deep reinforcement learning approach for solving the multiple sequence alignment problem. SN Appl. Sci. 1, 592 (2019). https://doi.org/10.1007/s42452-019-0611-4

[v] Lipman, DJ; Pearson, WR (1985). “Rapid and sensitive protein similarity searches”. Science. 227 (4693): 1435–41