Learn about our journey and work.

Challenges and opportunities for a university spin-off

by Martin Rosvall

Turning a novel research idea into a university spin-off is a long and challenging endeavor, but fortunately, many people want to see you succeed. For us, the journey started almost ten years ago.

The ability of our algorithms to simplify and highlight structures in large networks had proven useful across several scientific disciplines. Motivated by feedback and thank-you messages from researchers around the world, we worked hard to improve our algorithms, develop new features, and simplify the interface. We knew that turning an overload of relational data into insightful maps that can reveal stories in the data is a universal challenge, so we couldn't resist asking: How much value can we generate for users outside academia? We had to find out.

Luckily, the business developers at Umeå University's tech transfer, Uminova Innovation, saw the potential and shared our curiosity. By discussing business ideas and providing resources to test and refine them, they kick-started our journey into the unknown business world.

However, business developers' support and backslapping can only take you so far. Reality is a different game. When we had talked the talk and excited someone who knew someone else whose problem we might be able to solve, it was time to walk the walk. Outcome: a research paper but no customer.

Indeed, the real challenge is to champion both research and entrepreneurship at the same time. Science requires full commitment. The creative endeavor of turning novel research ideas into published papers that get cited is not a part-time job. Simultaneously pushing the research frontier and exploring business opportunities requires a business companion.

After a fortunate turn of events, an experienced business captain came on board. He could see things from a different perspective and help us navigate across the gap between academia and business. Still, distilling years of research and technical expertise into a solution that solves customers' problems, and then communicating that solution successfully within short windows of attention, is a venture in itself. Trying new directions and experiencing setbacks hurts, but there is no other way forward.

Moreover, the process of tweaking a business idea through several pilot projects and iteratively developing a solution also requires a no-bullshit investor and an outstanding team with both a shared vision and a mindset that accepts delayed gratification. For us, a key to the process is a research lab that attracts brilliant students and young researchers who answer "Yes!" to every "Is it possible to build that?" and springboards them into the business endeavor. Optimism and hunger are our recipe for progress.

Our journey hasn't been straight and has sometimes been rough in shallow waters, but now we are sailing on the open sea. It is time to raise the mainsail.

Customer segmentation by mapping networks for understanding and applications that generate value

by Martin Rosvall

Ever since Aristotle, organization and classification have been cornerstones of science for understanding the world. In network science, where we conduct most of our research, categorization of nodes into modules with so-called community-detection algorithms has proven indispensable to comprehending the structure of large interconnected systems.

With understanding comes powerful applications. For example, geographical maps both depict what we know about the world in the clearest way and aid navigation. Life in unfamiliar cities is an entirely different thing with Google Maps. That is why our vision is to build Google Maps for networks. Our approach is to develop and combine the best clustering algorithms and visualization tools.

Applied to e-commerce transaction data, where the purchasing network consists of customers and products, a good segmentation provides valuable understanding and helps predict which products customers are most likely to buy next, and when. These predictions can be exploited in automated workflows and personalized communication through email, Facebook, and so forth. Like navigating unfamiliar cities with good maps, turning data into value with powerful tools leads to radically improved efficiency.
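To make the idea concrete, here is a minimal Python sketch of one simple way a segmentation can feed next-purchase suggestions: recommend the products that are most popular within a customer's segment but that the customer has not bought yet. It is an illustration under stated assumptions, not Infobaleen's method. The segment labels are assumed to come from a clustering algorithm, the function name `recommend_from_segments` and the toy purchases are hypothetical, and the sketch only addresses which products a customer might buy next, not when.

```python
from collections import Counter


def recommend_from_segments(purchases, segment_of, top_k=3):
    """Hypothetical sketch: suggest products popular in a customer's segment.

    purchases: dict customer -> set of purchased products.
    segment_of: dict customer -> segment id (assumed to come from clustering).
    Returns dict customer -> up to top_k unbought products, most popular first.
    """
    # Count how many customers in each segment bought each product.
    popularity = {}
    for customer, products in purchases.items():
        popularity.setdefault(segment_of[customer], Counter()).update(products)

    recommendations = {}
    for customer, products in purchases.items():
        counts = popularity[segment_of[customer]]
        unseen = [p for p in counts if p not in products]
        # Most popular first; ties broken alphabetically for stable output.
        unseen.sort(key=lambda p: (-counts[p], p))
        recommendations[customer] = unseen[:top_k]
    return recommendations


# Hypothetical toy data: two segments with different interests.
purchases = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "sleeping bag"},
    "cat": {"stove", "sleeping bag", "lamp"},
    "dan": {"novel"},
    "eve": {"novel", "poetry"},
}
segment_of = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1}
print(recommend_from_segments(purchases, segment_of))
```

Here ann is offered the sleeping bag before the lamp because more customers in her segment bought it, while dan, in the other segment, is offered poetry instead.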

One of the clustering approaches that we use is based on information theory, and hence we have named it Infomap. We have designed and developed both the underlying mathematics, which defines the clustering problem for a specific set of relational data, and the algorithm that solves it.

The underlying mathematics of Infomap identifies modules by compressing the modular description of a complete walk across the network. Applied to e-commerce transaction data, the walk corresponds to a succession of random steps between customers and products: from a customer, the walk continues to one of her purchased products, selected at random, and then to a randomly selected customer who purchased the same product, and so on to infinity, like an e-commerce analyst exploring the transaction network. If the transaction network has clusters of tightly interconnected customers and products, the walk will spend relatively long periods within those clusters. Using fundamental information theory, Infomap is designed to capitalize on these structures: the description length of the walk is minimized when the clusters capture most of the structure in the underlying network.
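The description length that Infomap minimizes is known as the map equation. The Python sketch below computes its two-level form for a toy customer-product network, assuming the network is undirected and unweighted. Under that assumption, the walk described above visits each customer or product at a long-run rate proportional to its number of purchases (its degree), so the sketch uses those rates directly instead of simulating the walk step by step. The function names and the toy data are hypothetical; the real Infomap also handles weights, directions, multiple levels, and much more.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length (bits per step) of a random walk on an
    undirected, unweighted network, given a partition of its nodes.

    edges: list of (customer, product) purchases.
    module_of: dict node -> module id.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    two_m = 2 * len(edges)
    # Long-run visit rate of each node: its degree over twice the edge count.
    visit = {node: d / two_m for node, d in degree.items()}
    module_visit = defaultdict(float)
    for node, p in visit.items():
        module_visit[module_of[node]] += p
    # Each link carries flow 1/(2m) per direction, so every link that crosses
    # a module boundary adds 1/(2m) to the exit rate of both its modules.
    exit_rate = defaultdict(float)
    for u, v in edges:
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1 / two_m
            exit_rate[module_of[v]] += 1 / two_m
    # L = q log q - 2 sum_i q_i log q_i - sum_a p_a log p_a
    #     + sum_i (q_i + p_i) log (q_i + p_i),
    # where q_i is module i's exit rate, q their sum, p_a a node's visit rate,
    # and p_i the total visit rate of the nodes in module i.
    return (
        plogp(sum(exit_rate.values()))
        - 2 * sum(plogp(q) for q in exit_rate.values())
        - sum(plogp(p) for p in visit.values())
        + sum(plogp(exit_rate[m] + module_visit[m]) for m in module_visit)
    )


# Hypothetical toy purchases: two tight groups joined by one purchase (c2, p3).
edges = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p2"),
         ("c3", "p3"), ("c3", "p4"), ("c4", "p3"), ("c4", "p4"),
         ("c2", "p3")]
one_module = {n: 0 for edge in edges for n in edge}
two_modules = {n: 0 if n in {"c1", "c2", "p1", "p2"} else 1 for n in one_module}
print(map_equation(edges, one_module))   # about 2.975 bits
print(map_equation(edges, two_modules))  # about 2.607 bits
```

The partition that follows the two tightly interconnected groups gives the shorter description, in line with the intuition that the walk lingers within clusters.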

Infomap's unique algorithm for solving the clustering problem consists of three components: the core algorithm, sub-module movements, and single-node movements. The core algorithm is a proven method for quickly achieving an approximate solution: Neighboring nodes are joined into clusters, which subsequently are joined into superclusters and so on. To improve the clustering accuracy, repeated and recursive runs of sub-module movements and single-node movements break the clusters into smaller components to enable fine tuning at different scales.
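The first step of the core algorithm, joining neighboring nodes into clusters, can be sketched as a greedy local-moving loop: start with every node in its own cluster and repeatedly move single nodes into a neighbor's cluster whenever that shortens the description length. The Python below is a simplified, hypothetical illustration of that step only, not Infomap's implementation. It recomputes the two-level map equation from scratch for every candidate move, whereas a practical implementation would update it incrementally, and it omits both the aggregation of clusters into superclusters and the refinement steps.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def description_length(edges, degree, module_of):
    """Two-level map equation (bits per step), undirected unweighted network."""
    two_m = 2 * len(edges)
    exit_rate = defaultdict(float)
    module_visit = defaultdict(float)
    for node, d in degree.items():
        module_visit[module_of[node]] += d / two_m
    for u, v in edges:
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1 / two_m
            exit_rate[module_of[v]] += 1 / two_m
    return (
        plogp(sum(exit_rate.values()))
        - 2 * sum(plogp(q) for q in exit_rate.values())
        - sum(plogp(d / two_m) for d in degree.values())
        + sum(plogp(exit_rate[m] + module_visit[m]) for m in module_visit)
    )


def local_moving(edges):
    """Hypothetical sketch: greedily move single nodes into neighboring
    clusters while that shortens the description length."""
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)
    module_of = {node: node for node in degree}  # every node starts alone
    best = description_length(edges, degree, module_of)
    improved = True
    while improved:
        improved = False
        for node in degree:
            current = module_of[node]
            # Only clusters that already contain a neighbor are candidates.
            candidates = sorted({module_of[n] for n in neighbors[node]} - {current})
            for candidate in candidates:
                module_of[node] = candidate
                length = description_length(edges, degree, module_of)
                if length < best - 1e-12:  # accept strict improvements only
                    best, current, improved = length, candidate, True
                module_of[node] = current  # keep the best choice found so far
    return module_of


# Hypothetical toy purchases: two tight groups joined by one purchase (c2, p3).
edges = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p2"),
         ("c3", "p3"), ("c3", "p4"), ("c4", "p3"), ("c4", "p4"),
         ("c2", "p3")]
print(local_moving(edges))
```

Because only strict improvements are accepted, the loop always terminates, but like any greedy search it can stop in a local optimum; that is one reason the refinement steps described above matter.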

Our investment in a powerful framework that enables straightforward mathematical generalizations, and in a continuously refined clustering algorithm, has paid off. Our approach has been widely heralded as one of the best of the many dozens of network-clustering algorithms used in thousands of scientific studies.

Academic glory is merely a means to an end. At Infobaleen, we want to leverage the power of Infomap and other clustering algorithms to help companies turn their data into understanding and applications that generate value.

Discovering stories in complex data requires innovative visualizations

by Martin Rosvall

Rich, relational data of who did what when is a goldmine for understanding a system with many connected things. However, it takes multiple steps of processing, refining, and massaging the data to reach the desired understanding. Even the best clustering algorithms that can identify significant structure do not take you all the way. I learned this as a postdoctoral researcher about ten years ago when I was analyzing the output of a network clustering algorithm that I was developing.

The input was a network with more than six million citations between about six thousand scientific journals and the output a list with 90 clusters, each representing a scientific area. Did the results make sense? I searched up and down the list for journals that I assumed should be clustered together. It took hours, and I did not learn anything new since I could only confirm what I already knew. Moreover, the cluster list lacked essential information about how the clusters were related to each other. I needed a visualization that highlighted scientific areas and their relationships like road maps depict cities connected by highways.

Because there was no tool available for creating such a map, I started with whiteboard sketches and simple scripts for testing different ideas. Step by step, and with feedback from collaborators, I made my first map of science (see figure above). It depicts scientific areas with circles and their relationships with bidirectional arrows. Their sizes indicate importance. However, when we showed the maps to colleagues, they complained: "Something is wrong with the algorithm, chemistry is too small." Yes, indeed, they were chemists. In any case, I went back and checked the clustering algorithm and the map script. No change. Then I went one step further and replaced the input with citation data that was ten years older. Bingo! We showed the alternative map to the chemists, and they were happy: "Now it looks good! What was wrong in the algorithm?" The algorithm was not wrong; their perception of science was. It was outdated.
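Behind a map like this sits a simple aggregation step: collapse the journal-level network into one node per scientific area and one link per pair of areas. The Python sketch below shows one way to do this, assuming citations arrive as weighted (citing, cited, count) triples and using citations received as a stand-in for importance; the maps described here may size areas by a different measure. The function name `module_map` and the four toy journals are hypothetical.

```python
from collections import defaultdict


def module_map(citations, module_of):
    """Aggregate a journal-level citation network into an area-level map.

    citations: iterable of (citing_journal, cited_journal, count).
    module_of: dict journal -> area (cluster) id.
    Returns (size, links): size[area] counts citations received by journals
    in the area; links[(a, b)] counts citations from area a to a different
    area b. A link in each direction can be drawn as a bidirectional arrow.
    """
    size = defaultdict(int)
    links = defaultdict(int)
    for citing, cited, count in citations:
        a, b = module_of[citing], module_of[cited]
        size[b] += count
        if a != b:  # citations within an area stay inside its circle
            links[(a, b)] += count
    return dict(size), dict(links)


# Hypothetical toy data: four made-up journals in two areas.
citations = [("chem1", "chem2", 5), ("chem2", "chem1", 4),
             ("bio1", "bio2", 7), ("bio2", "bio1", 6),
             ("chem1", "bio1", 2), ("bio2", "chem2", 1)]
module_of = {"chem1": "chemistry", "chem2": "chemistry",
             "bio1": "biology", "bio2": "biology"}
print(module_map(citations, module_of))
```

The output is exactly the information the cluster list lacked: how large each area is and how strongly the areas are connected.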

This experience taught me two things. First, complex data require powerful visualizations for comprehending and communicating the results. Second, rich data are dynamic and change over time. We needed an efficient visualization to capture that change. With none available, we set out to create one. We call them alluvial diagrams because they look like the alluvial fans of sediment that streams deposit. With the alluvial diagrams, we have discovered, for example, how neuroscience emerged as a standalone scientific area mainly from cell biology, neurology, and psychology, and how dramatic changes in lending patterns occurred after the Federal Reserve began paying interest on reserve balances during the financial crisis in 2008.
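The data behind an alluvial diagram can be sketched as a cross-tabulation: for two snapshots of a clustering, count how much of each old cluster ends up in each new cluster, and draw each nonzero count as a stream. The Python below is a minimal sketch under that assumption. The function name `alluvial_streams` and the five toy journals are hypothetical (they only mimic the shape of the neuroscience example, not real results), and the sketch ignores nodes that appear in only one snapshot.

```python
from collections import Counter


def alluvial_streams(before, after, weight=None):
    """Measure how clusters split and merge between two snapshots.

    before, after: dict node -> cluster label at two points in time.
    weight: optional dict node -> size (for example, citations); defaults to 1.
    Returns a Counter mapping (old_cluster, new_cluster) -> total weight of the
    nodes present in both snapshots. Each nonzero entry is one stream.
    """
    streams = Counter()
    for node in before.keys() & after.keys():
        w = 1 if weight is None else weight[node]
        streams[(before[node], after[node])] += w
    return streams


# Hypothetical toy snapshots of five made-up journals, for illustration only.
before = {"j1": "cell biology", "j2": "cell biology", "j3": "neurology",
          "j4": "psychology", "j5": "psychology"}
after = {"j1": "cell biology", "j2": "neuroscience", "j3": "neuroscience",
         "j4": "neuroscience", "j5": "psychology"}
for (old, new), w in sorted(alluvial_streams(before, after).items()):
    print(f"{old} -> {new}: {w}")
```

Streams whose old and new labels differ are the interesting ones: here, three of them converge on a new area.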

Because we, as well as other researchers, needed these visualizations over and over again to discover stories in complex data, we built interactive tools to transform days of scripting into minutes of customization. They are available online for anyone to use.

At Infobaleen, our customers have the same desire as researchers to discover stories in their data. However, because they all work with some type of transaction data, we can further streamline the visualization tools and eliminate the need for customization. As an example with open data, this interactive map of movies listed on Wikipedia makes it easy to explore and discover movies related to your favorite ones by zooming and dragging. In this case, the transactions come from editors editing movie pages, but with customer transactions, you can find groups of customers with similar and nonoverlapping interests and use them in targeted campaigns.
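A simple way to find items related to a favorite one, swapped in here for illustration and simpler than the clustering-based maps described in this post, is to count co-occurrences in the transaction data: two movies are related when the same editors worked on both, just as two products are related when the same customers bought both. The Python sketch below assumes transactions come as (user, item) pairs; the function name `related_items` and the toy data are hypothetical.

```python
def related_items(edits, favorite, top_k=3):
    """Hypothetical sketch: rank items by how many users acted on both them
    and `favorite`.

    edits: iterable of (user, item) pairs, e.g. (editor, movie page) or
    (customer, product). Returns up to top_k (item, shared_users) pairs.
    """
    users_of = {}
    for user, item in edits:
        users_of.setdefault(item, set()).add(user)
    fans = users_of.get(favorite, set())
    shared = {}
    for item, users in users_of.items():
        overlap = len(users & fans)
        if item != favorite and overlap > 0:
            shared[item] = overlap
    # Most shared users first; ties broken alphabetically for stable output.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy edits: three editors working on four movie pages.
edits = [("ed1", "movie_a"), ("ed1", "movie_b"),
         ("ed2", "movie_a"), ("ed2", "movie_b"), ("ed2", "movie_c"),
         ("ed3", "movie_c"), ("ed3", "movie_d")]
print(related_items(edits, "movie_a"))
```

Items with no shared users, such as movie_d for a fan of movie_a, never appear, which hints at how groups with nonoverlapping interests separate.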

Whether you are a researcher or a sales manager, we want to empower you with the best algorithms and visualizations for discovering stories in your data.