From cash registers in Istanbul to RNA therapeutics in Cambridge.
Before I knew what bioinformatics was, I was 22 in Istanbul, writing C for embedded Linux devices that talked to electronic cash registers and printed personalized coupons at the point of sale. The architecture I helped ship at KoçSistem in 2002 became the foundation for retail loyalty programs that millions of people in Turkey use today. I didn't know it then, but "predict what a customer wants from their last basket" and "predict what a cell will do from its expression profile" turned out to be variants of the same problem.
I caught the bioinformatics bug at Sabancı University, where my PhD was on graph-theoretic discrimination of native protein folds. The math was elegant. The biology was messy. The gap between a model that's elegant and an answer that's correct is the gap I have been working in ever since.
The first real shock came as a postdoc in Yang Zhang's lab at the University of Kansas. We were building I-TASSER, a unified platform for automated protein structure and function prediction that combined template-based modeling with ab initio refinement. The Nature Protocols paper we put out in 2010 has since been cited more than 7,800 times. That taught me what it actually takes for a good method to become a standard: a scientist has to be able to run it on their own data, on their own infrastructure, without a computational biologist next to them.
I came to UMass Chan Medical School in 2009 to work in the labs of Melissa J. Moore and Phillip Zamore, two of the most respected RNA biologists working today. Neither of them needed me to reinvent their field. They needed someone who could turn Solexa reads into answers a biologist could trust at the bench. That is where the rest of my career started.
The next decade was tools and pipelines. ASPeak for RIP-seq. GUIDEseq for CRISPR off-target analysis. DEBrowser for differential expression. DolphinNext for distributed pipeline execution. An institutional NGS stack supporting 80+ labs and 240+ scientists at UMass Chan. Most of it open source. All of it built because someone needed it and there was no good alternative within reach.
In 2023 I co-founded Via Scientific to take that stack out of the academic core and into pharma. We licensed Via Foundry from UMass Chan, raised a $5M seed in January 2024 from G20 Ventures and Innospark Ventures, and brought on advisors I had been reading and quoting for years: Melissa J. Moore (former CSO of Moderna), Rob Hickey (former EVP Engineering at DataRobot), Shah Nawaz (former VP Digital Transformation at Regeneron). Via Foundry is the multi-omics and AI analytics core: drag-and-drop pipeline authoring on top of Nextflow, Kubernetes and ShinyProxy orchestrating the interactive apps (RStudio, JupyterLab, CellxGene, IGV, Shiny), MySQL plus MongoDB underneath, React and TypeScript on the front, Node and Express on the back.
Inside the platform, AI Insights is a multi-provider assistant layer (OpenAI, Anthropic Claude, Google Gemini) with page-aware chat, run-report summarization, and log analysis. It is backed by a custom RAG system with a user-facing Knowledge Builder, token-aware chunking, embeddings stored in MySQL, two-phase semantic search for low-latency retrieval, and a multimodal pipeline that extracts and embeds text, images, SVGs, and chart screenshots described by vision models. Via Foundry both exposes and consumes the Model Context Protocol: a Python FastMCP server with around 41 tools lets external clients like Claude Desktop and Cursor drive the platform, while a backend MCP Client Manager and Tool Orchestrator let in-product AI chats discover and call external MCP tools with user confirmation. A "Model Smith" registry brings domain-specific bio-AI models (AbGPT, AbLang2, hosted on Tamarind, Vertex AI, and SageMaker) directly into bioinformatics workflows.
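To make "token-aware chunking" concrete, here is a minimal sketch of the idea: split a document into overlapping windows measured in tokens rather than characters, so no chunk overflows the embedding model's context. This is an illustration, not Via Foundry's actual code; a real system would count tokens with the embedding model's own tokenizer (e.g. tiktoken), where this sketch uses a whitespace split so it stays self-contained.

```python
def chunk_by_tokens(text, max_tokens=128, overlap=16):
    """Split text into overlapping chunks of at most max_tokens tokens.

    A whitespace split stands in for a real tokenizer here. Overlap keeps
    a sentence that straddles a chunk boundary retrievable from either side.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = max_tokens - overlap  # advance less than a full window
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Each chunk is then embedded and stored alongside its source reference, so retrieval can hand the chat layer both the matching text and a pointer back to the original document.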
Once Via Foundry was shipping, the next layer became obvious. ArfAI is the conversational interface on top of the same scientific stack. A multi-agent system (planner, executor, analyst, reporter) takes a question in plain English, generates the Python or R, runs it inside sandboxed Docker, and streams plots and intermediate results back over WebSocket in real time. An embedded MLflow stack tracks every experiment, parameter sweep, dataset-drift signal, and prediction log, and the platform ships full reproduction bundles, plan-review gates, and classified failure recovery with auto-fix loops. The point is to make rigorous, reproducible scientific computing as easy as writing a Slack message, without giving up the auditability scientists actually need.
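The planner, executor, analyst, reporter division of labor can be sketched as a simple pipeline. Every function below is a hypothetical stand-in, not ArfAI's actual implementation: in the real system the executor generates and runs Python or R inside a sandboxed container and streams results back, where this sketch just threads a question through the four roles.

```python
def planner(question):
    # Break a plain-English question into ordered analysis steps.
    return [f"load data for: {question}", "run analysis", "summarize results"]

def executor(step):
    # Stand-in for sandboxed code generation and execution.
    return {"step": step, "status": "ok"}

def analyst(results):
    # Gate: every step must have succeeded before anything is reported.
    return all(r["status"] == "ok" for r in results)

def reporter(question, results, passed):
    # Condense the run into a user-facing summary line.
    verdict = "complete" if passed else "needs review"
    return f"{question}: {len(results)} steps, {verdict}"

def run_session(question):
    plan = planner(question)
    results = [executor(step) for step in plan]
    return reporter(question, results, analyst(results))
```

The value of the split is that each role can be audited independently: the plan can be reviewed before execution, and the analyst's verdict is recorded alongside the results rather than implied by them.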
AiDrift is the safety rail underneath all of that. More biotech code over the next five years is going to be written by LLM-powered assistants than was written by humans in the last twenty, and the rails for that have not been built yet. AiDrift watches AI coding sessions in real time, scores each turn against drift heuristics (scope creep, contradiction, churn, sub-agent overlap, build and test outcomes), refuses commits and pushes when the session goes red, blocks writes outside declared paths, and gives you a one-click rollback to a stable checkpoint when the assistant goes off the rails. It ties every git commit back to the session and turn that produced it, with a revert-graph DAG and secret scanning. It ships as a CLI, a Claude Code plugin, a VSCode extension, and a React dashboard. Vibe-coding needs a seatbelt. AiDrift is the one I built.
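The turn-scoring idea reduces to a weighted sum over heuristic signals with a red-line threshold. The heuristic names below come from the description above, but the weights and threshold are invented for illustration; AiDrift's actual tuning is not shown here.

```python
# Hypothetical weights: which drift signals matter most per turn.
WEIGHTS = {
    "scope_creep":   0.30,  # edits land outside the declared task or paths
    "contradiction": 0.25,  # assistant reverses an earlier decision
    "churn":         0.20,  # the same lines get rewritten repeatedly
    "overlap":       0.10,  # sub-agents touch the same files
    "test_failure":  0.15,  # build or tests went red this turn
}
RED_THRESHOLD = 0.5  # illustrative cutoff, not AiDrift's real value

def score_turn(signals):
    """Combine fired boolean heuristics into one drift score in [0, 1]."""
    return sum(WEIGHTS[name] for name, fired in signals.items() if fired)

def gate_commit(turn_scores):
    """Allow the commit only if the latest turn stayed below the red line."""
    return turn_scores[-1] < RED_THRESHOLD
```

Because every git commit is tied back to the session and turn that produced it, a gate like this can refuse the commit at exactly the turn that went red, instead of letting the damage land and hunting for it later.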
I am still on the faculty at UMass Chan. I still co-direct the Bioinformatics Core. I still get late-night Slack messages from PhD students. I think you build better engineering when the people who actually use the software can also yell at you in person.
What I am trying to do now is simple to say and hard to do. Shorten the distance between a dataset and a medicine. That is the whole point. Everything else is detail.