How Google's PageRank Algorithm Changes Architectural Space
- Chokoon
- Dec 31, 2021
- 7 min read
Updated: May 21
When architects analyze an urban site and attempt to visualize its potential, they typically overlay critical and contextually relevant information onto the site plan: property boundaries, setbacks, access points, parking arrangements, landscaping elements, and surrounding urban fabric. Among these layers, one particular element consistently stands out, often regarded as essential or at least highly valuable in numerous design scenarios.

That element is the level of integration of a space: a measure of public flow that pinpoints how heavily each area is circulated and by how many people. "Space" here can mean many things: rooms, streets, walkways, corridors, or public plazas. The goal is essentially to draw a heat map that visualizes how "busy" or "private" each space is compared to the others.
This information raises the question, "What can space do?" A well-designed space can organize movement, distribute land use, influence crime and safety, affect a city's carbon footprint, and determine land value, which is the bottom line for property developers and homebuyers alike. It's not difficult to imagine the productivity that could emerge simply from being able to measure a space's integration level.
However, accurately modeling spatial integration remains a complex challenge, even with advances in artificial intelligence, because people are unpredictable and their behavior often defies even the most basic principles. Or so we once believed.
Spatial Layout Efficiency by Tim Stoner
It turns out there is an intriguing publication by a British architect who proposed a way to describe space: measure the total depth of each space, or in this scenario, each room. This metric represents the cumulative number of steps required to travel from one room to every other room within the layout. By assigning numerical values to these distances and visualizing them through a color-coded system, he produced an effective heat map of the simultaneous spatial interrelations within a plan. Lower scores indicate rooms that are more accessible, fewer steps away from other spaces, and therefore busier, while higher scores correspond to more secluded, less trafficked areas. His conclusion was compelling: approximately 70% of human movement within a built environment is influenced by the spatial configuration itself, rather than by the functions or contents of the spaces.
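To make the total-depth idea concrete, here is a minimal sketch of how such a metric could be computed with a breadth-first search over a room adjacency dictionary; the layout, room names, and function are illustrative assumptions, not details from the publication.

from collections import deque

def total_depth(adjacency, start):
    # Sum of step counts (shortest paths) from `start` to every other room.
    distances = {start: 0}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        for neighbour in adjacency[room]:
            if neighbour not in distances:
                distances[neighbour] = distances[room] + 1
                queue.append(neighbour)
    return sum(distances.values())

# A tiny hypothetical layout: the hall opens onto the kitchen and the bedroom.
layout = {
    "Hall": {"Kitchen", "Bedroom"},
    "Kitchen": {"Hall"},
    "Bedroom": {"Hall"},
}
print({room: total_depth(layout, room) for room in layout})
# Lower totals mean more accessible, busier rooms; higher totals mean more secluded ones.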
However, we encountered an alternative and compelling approach: the PageRank algorithm, originally developed by Google’s co-founders, including Larry Page, for whom the algorithm was named. While primarily designed to rank web pages in search engine results, PageRank offers valuable insights when applied to spatial analysis.
When search engines like Google display search results, they place more "important," higher-quality pages above less important ones. But how does the search engine know which pages are more important than others? In the PageRank algorithm, a page is more important if it is linked to by other important pages, while links from less important pages carry less weight. This definition sounds circular, but it turns out there are multiple strategies for calculating these rankings.

One strategy is the random surfer model: imagine a surfer who starts on a random page and then repeatedly clicks one of the links on the current page at random; the share of time spent on each page becomes an estimate of that page's PageRank. Consider a small corpus in which Page 5 links only to Page 6 and Page 6 links only back to Page 5. If we randomly started by sampling Page 5, we'd have no choice but to go to Page 6, then no choice but to return to Page 5, then Page 6 again, and so forth. We'd end up with an estimate of 0.5 for the PageRank of Pages 5 and 6, and an estimate of 0 for all the remaining pages, since we spent all our time on Pages 5 and 6 and never visited any of the others.
To ensure we can always get somewhere else in the corpus of web pages, we introduce a damping factor d into the model. With probability d (usually set around 0.85), the random surfer chooses one of the links on the current page at random. Otherwise, with probability 1 - d, the surfer chooses any page in the corpus at random (including the one they are currently on).

Our random surfer now starts by choosing a page at random, and then, for each additional sample we'd like to generate, follows a link from the current page at random with probability d or jumps to any page at random with probability 1 - d. If we keep track of how many times each page shows up as a sample, we can treat the proportion of samples that landed on a given page as its PageRank.
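As a minimal sketch of this sampling procedure (the function name, damping default, and sample count are assumptions of mine, not details from the post), the random surfer can be simulated directly on a corpus dictionary:

import random

def sample_pagerank(corpus, damping=0.85, n_samples=10000):
    # Simulate the random surfer and count how often each page is visited.
    pages = list(corpus)
    counts = {page: 0 for page in pages}
    current = random.choice(pages)                 # first sample: a page chosen at random
    for _ in range(n_samples):
        counts[current] += 1
        links = corpus[current]
        if links and random.random() < damping:
            current = random.choice(list(links))   # follow one of the page's links
        else:
            current = random.choice(pages)         # jump to any page in the corpus
    return {page: count / n_samples for page, count in counts.items()}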

We can also define a page’s PageRank using a recursive mathematical expression. Let PR(p) be the PageRank of a given page p: the probability that a random surfer ends up on that page. How do we define PR(p)? Well, we know there are two ways that a random surfer could end up on the page:
With probability 1 - d, the surfer chose a page at random and ended up on page p.
With probability d, the surfer followed a link from a page i to page p.
The first condition is fairly straightforward to express mathematically: it’s 1 - d divided by N, where N is the total number of pages across the entire corpus. This is because the 1 - d probability of choosing a page at random is split evenly among all N possible pages.
For the second condition, we need to consider each possible page i that links to page p. For each of those incoming pages, let NumLinks(i) be the number of links on page i. Each page i that links to p has its own PageRank, PR(i), representing the probability that we are on page i at any given time. And since from page i we travel to any of that page’s links with equal probability, we divide PR(i) by the number of links NumLinks(i) to get the probability that we were on page i and chose the link to page p.
This gives us the following definition for the PageRank of a page p:

PR(p) = (1 - d) / N + d * Σ [ PR(i) / NumLinks(i) ]
In this formula, d is the damping factor, N is the total number of pages in the corpus, i ranges over all pages that link to page p, and NumLinks(i) is the number of links present on page i.
How would we go about calculating PageRank values for each page, then? We can do so via iteration: start by assuming the PageRank of every page is 1 / N (i.e., equally likely to be on any page). Then, use the above formula to calculate new PageRank values for each page, based on the previous PageRank values. If we keep repeating this process, calculating a new set of PageRank values for each page based on the previous set of PageRank values, eventually the PageRank values will converge.
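A minimal sketch of that iteration (again with assumed names and a convergence threshold of my own choosing) might look like this:

def iterate_pagerank(corpus, damping=0.85, tolerance=0.001):
    # Start with every page equally likely, then apply the PageRank formula
    # until no rank changes by more than the tolerance.
    n = len(corpus)
    ranks = {page: 1 / n for page in corpus}
    while True:
        new_ranks = {}
        for page in corpus:
            # Sum PR(i) / NumLinks(i) over every page i that links to this page.
            # A page with no links is treated as linking to every page.
            incoming = sum(
                ranks[other] / len(links) if links else ranks[other] / n
                for other, links in corpus.items()
                if page in links or not links
            )
            new_ranks[page] = (1 - damping) / n + damping * incoming
        if max(abs(new_ranks[p] - ranks[p]) for p in corpus) < tolerance:
            return new_ranks
        ranks = new_ranks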
The formula seems promising. But how does it relate to the integration level of the space? The physical space?
By examining the core mechanics of the algorithm, we can draw meaningful parallels and consider how to adapt this heuristic approach for spatial analysis. The primary function of PageRank is to rank web pages according to their relative importance. Translating this idea into architecture, it becomes valuable to explore how the algorithm might help us identify which spaces within a built environment hold greater significance or possess higher qualitative value compared to others.
Take our case study as an example:

This is the architectural floor plan of a section inside a building. The plan includes 10 rooms connected via open passageways. Room 1 connects to rooms 2, 3, and 4; Room 2 connects only to Room 1; Room 3 connects to rooms 1 and 5; and so forth. With this information, we can construct a corpus: a Python dictionary mapping each room to the set of all rooms it connects to.
corpus = {
    "Room1": {"Room2", "Room3", "Room4"},
    "Room2": {"Room1"},
    "Room3": {"Room1", "Room5"},
    "Room4": {"Room1", "Room5", "Room6"},
    "Room5": {"Room3", "Room4", "Room7", "Room8"},
    "Room6": {"Room4", "Room8"},
    "Room7": {"Room5", "Room9"},
    "Room8": {"Room5", "Room6", "Room9"},
    "Room9": {"Room7", "Room8", "Room10"},
    "Room10": {"Room9"},
}
The iterative function begins by assigning each room a rank of 1 / N, where N is the total number of rooms in the corpus (here, 10). The function then repeatedly calculates new rank values from the current ones according to the PageRank formula (i.e., each room's rank is computed from the ranks of all rooms that link to it). A room that has no links at all is interpreted as having one link to every room in the corpus (including itself).
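Under the same assumptions as the earlier iterate_pagerank sketch (the post runs a fixed 100 iterations, while the sketch stops at a convergence threshold, so the numbers may differ slightly), applying it to the room corpus would look roughly like this:

ranks = iterate_pagerank(corpus, damping=0.85)
for room, rank in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{room}: {rank:.4f}")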
The algorithm ran for 100 iterations, and this was the result:

As illustrated in the visualization, Room 5 exhibits the highest level of spatial engagement, owing to its numerous connections with adjacent rooms and its direct links to other highly significant spaces. Conversely, Room 2 and Room 10, located at the periphery and each connected by only a single access point, demonstrate lower engagement levels and naturally tend toward more private, secluded functions.
Let's take a look at another example on an urban scale:

The city plan visualizes road rankings through a color gradient: red indicates higher-ranked routes and blue indicates lower-ranked ones. The calculation considers only the primary, secondary, and tertiary road networks. While the outcome appears compelling, it is important to recognize that this example has been significantly simplified. In reality, highways feature multilane configurations, and intersections do not necessarily imply full connectivity among all converging roads. Traffic regulations and vehicle movement restrictions further complicate the spatial dynamics.
These factors highlight the constraints inherent in applying this heuristic model. Ultimately, the precision of the dataset directly influences the accuracy of the resulting analysis.
Here's the final example:

The algorithm has also been applied to retail floor plan analysis, with shelf content intentionally excluded to focus purely on spatial dynamics. The visualization maps engagement quality across aisles, where red indicates higher engagement and blue lower. Primary aisles show elevated values and superior connectivity, signifying multiple access paths and higher foot traffic. In contrast, secondary aisles, connected solely to primary routes, show lower average probabilities of visitation, resulting in reduced exposure for products displayed there.
A key challenge in this application lies in defining spatial boundaries, determining precisely where an aisle begins and ends, and how much physical area it occupies. Unlike architectural spaces with clear partitions, retail interiors often lack doors or explicit barriers, complicating the delineation of zones within the spatial dataset. This ambiguity introduces complexities in establishing the initial parameters of the spatial corpus for accurate algorithmic modeling.
Nevertheless, all three case studies highlight promising outcomes. The algorithm’s strength lies not only in accurately visualizing existing spatial conditions but also in revealing latent solutions that may otherwise go unnoticed. Its purpose extends beyond merely accepting the presented results; instead, it challenges designers to imagine what the space could become.
In conclusion, this algorithm remains a work in progress, requiring ongoing refinement informed by empirical studies of public behavior within physical environments. The ability to quantify and describe spatial engagement is undeniably crucial for designing high-performance architecture. Moreover, the scope of defining space transcends conventional methods and is limited only by the bounds of imagination. Artificial intelligence, with its vast potential, is not confined solely to the realm of computer engineers but also offers transformative possibilities for daring architects and designers.