Increase the velocity of your researchers through the use of modern tools in the cloud
The customer
Solynta is a potato breeding company that has developed a unique hybrid breeding technique. It enables Solynta to offer potato seeds instead of potato seed tubers to start a cultivation. As a result, one hectare of land requires 25 grams of seeds instead of 2,500 kilos of potato tubers.
Solynta’s innovations rest on the work of bioinformaticians who analyse the DNA and genes of potato plants. The bioinformaticians process very large amounts of data and face two recurring challenges. First, the data volumes are such that the data cannot reside on their workstations or on traditional servers. This creates the need for storage infrastructure that has to be built and maintained while also scaling to ever larger sizes. The second challenge is that processing of this data comes in bursts. When processing needs to be done, you want to minimise processing times to reduce idle time for the bioinformaticians. Ideally you would have the right-sized machine for each calculation, but the right size varies per analysis. Even if the size requirements were uniform, buying the biggest possible machine would make no economic sense, because that capacity would not be used constantly.
Another challenge that has emerged as Solynta grows is that other Solynta users wish to consume data generated by the bioinformaticians. For this purpose, the bioinformaticians have built applications for internal use within Solynta. Building, deploying and maintaining software introduces a new type of complexity for bioinformatics researchers. There is a lot of heavy lifting involved in building and maintaining a software repository, compiling code, managing software dependencies, arranging infrastructure, deploying software and so on. This is not the core competency of bioinformatics scientists, and all this complexity starts claiming considerable amounts of the researchers' time. It is also a cause of constant interruptions and continuous context switching.
Genomics data was moved to a data lake in AWS using Amazon S3 for data storage. The data lake offers virtually unlimited, scalable storage with rich functionality, including intelligent tiering and backups. The processing of data was moved into containers containing all the tooling used by the bioinformaticians. The creation of a container in AWS was scripted so that a researcher can select the size of a container and have the container built automatically within minutes. Once a calculation is completed, the container is destroyed.
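To illustrate the idea of letting a researcher pick a container size, here is a minimal Python sketch. It is not Solynta's actual script: the preset names, the function `container_request` and the chosen sizes are hypothetical, and it assumes the analysis containers run on AWS Fargate, which the text only confirms for the applications. The CPU and memory pairs shown are valid Fargate task sizes.

```python
# Hypothetical size presets; names and choices are illustrative assumptions.
# Each pair is (CPU units, memory in MiB) and is a valid AWS Fargate task size.
SIZE_PRESETS = {
    "small": (1024, 8192),      # 1 vCPU, 8 GB
    "medium": (4096, 30720),    # 4 vCPU, 30 GB
    "large": (16384, 122880),   # 16 vCPU, 120 GB
}


def container_request(size, command):
    """Build the settings for a one-off analysis container of the chosen size."""
    if size not in SIZE_PRESETS:
        raise ValueError(f"unknown size {size!r}; choose from {sorted(SIZE_PRESETS)}")
    cpu, memory = SIZE_PRESETS[size]
    # CPU and memory are returned as strings, the form used in task definitions.
    return {"cpu": str(cpu), "memory": str(memory), "command": list(command)}
```

A deployment script could pass the resulting settings to whatever tooling launches the container, and tear the container down once the calculation finishes.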
The applications built by bioinformaticians for broader use within Solynta were moved into a CI/CD pipeline. Once a new version of the software was committed to the software repository, a new version of the application was built in a container, including all required software dependencies and access to S3 or databases such as MySQL running on the Amazon RDS service. In addition, SAML 2.0 integration with Okta was included to ensure secure access to the applications. Finally, the application containers were managed by the Fargate container management service at AWS. With Fargate, blue/green deployment is available out of the box and the health of the containers is monitored continuously.
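The principle behind a blue/green deployment can be sketched in a few lines of Python. This is a conceptual illustration only, not the AWS implementation: the function `blue_green_deploy` and its parameters are hypothetical, and the health check stands in for whatever monitoring the platform provides.

```python
def blue_green_deploy(live, candidate, is_healthy):
    """Return the version that should receive traffic after a deployment attempt.

    live:       the version currently serving users ("blue")
    candidate:  the newly built version ("green")
    is_healthy: callable that checks a version before it receives traffic
    """
    if is_healthy(candidate):
        return candidate  # cut traffic over to the new version
    return live           # keep the old version serving; the candidate is discarded
```

Because the old version keeps running until the new one passes its check, a faulty build never replaces a working application.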
Having genomics data in a platform with unlimited capacity where you only pay for what you use saves time and provides security and peace of mind. Being able to process data with the required computing capacity, which depends on the analysis and data set, and at the same time pay only for what you use, compresses waiting times and accelerates innovation. Furthermore, scripting and managing infrastructure through code gives freedom of action to the bioinformaticians and allows them to operate independently from the Cambrian cloud management engineers.
In the area of software development, all heavy lifting has been automated. Each application runs in isolated containers, eliminating dependency conflicts. The building and deployment of new versions is fully automated, and even the risk of a non-functioning version being brought online is mitigated through the use of blue/green deployments. These improvements save the research scientists considerable amounts of time while ensuring security and high availability of the applications for the users.