- Chokoon

# How Google's PageRank algorithm changes architectural space

Updated: Apr 25

When architects analyze a site in the city and try to visualize it, they place the most "relevant" and critical pieces of information on the site plan: property lines, setbacks, driveways, parking, landscape features, and urban elements. However, one element stands out, and many would consider it of utmost importance, or at least very useful on many occasions.

That's the level of **integration** of the space. This means measuring the public flow inside the space to pinpoint how well each area is circulated and by how many people. Here, "space" can refer to many things: rooms, roads, walkways, streets, or corridors. The goal is essentially to draw a heat map that shows how "busy" or "private" each space is compared with the others.

This information raises the question, "What can space do?" A well-designed space may organize movement, distribute land use, influence crime and safety, affect the urban carbon footprint, and determine land value, which is the bottom line for property developers and homebuyers alike. It's not difficult to see how much productivity could emerge just from being able to measure a space's integration level.

However, it's starting to dawn on us that such a thing would be especially tricky to calculate, even with the aid of artificial intelligence. This is because people are unpredictable, and their behavior often defies even the most basic principles. At least, that's what we used to believe.

Spatial Layout Efficiency by Tim Stoner

It turns out there was an intriguing publication by a British architect in which he proposed a way to describe space: measuring its total depth, where the space in this scenario is a room. The number is the combined count of how many steps each room is from every other room. The colors represent numerical values, which give an effective visualization of simultaneous relations in spatial layouts. The lower a room's score, the fewer steps it takes to reach it and the busier it tends to be, and vice versa. He concluded that the movement of people inside a space is mainly (about 70%) determined by the layout of the space, not its content.
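
To make the idea of total depth concrete, here is a minimal sketch in Python. It assumes "steps" means the shortest number of room-to-room moves, and it uses a hypothetical four-room layout (rooms A to D in a row) that is not taken from the publication. The function name `total_depth` is our own.

```python
from collections import deque

# Hypothetical layout for illustration only: four rooms in a row, A - B - C - D.
layout = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def total_depth(layout, start):
    """Sum of the shortest step counts from `start` to every other room."""
    steps = {start: 0}
    queue = deque([start])
    # Breadth-first search reaches each room by the fewest possible steps.
    while queue:
        room = queue.popleft()
        for neighbour in layout[room]:
            if neighbour not in steps:
                steps[neighbour] = steps[room] + 1
                queue.append(neighbour)
    return sum(steps.values())

depths = {room: total_depth(layout, room) for room in layout}
print(depths)  # {'A': 6, 'B': 4, 'C': 4, 'D': 6}
```

The two middle rooms get the lower score, which matches the intuition that they are the easiest to reach and therefore the busiest.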

However, we came across another plausible solution: the **PageRank** algorithm, which approaches the problem from a different perspective and was created by Google's co-founders (including Larry Page, for whom the algorithm was named).

When search engines like Google display search results, they do so by placing more "important" and higher-quality pages higher in the search results than less important pages. But how does the search engine know which pages are more important than other pages? In the PageRank algorithm, a website is more important if it is linked to by other important websites, and links from less important websites are weighted less. This definition seems a bit circular, but it turns out that there are multiple strategies for calculating these rankings.

One strategy is to imagine a random surfer who starts on some page and keeps following links at random, and to treat the share of time spent on each page as its PageRank. This breaks down when pages link only to each other. Suppose Pages 5 and 6 link only to one another, and imagine we happened to start by sampling Page 5. We’d then have no choice but to go to Page 6, and then no choice but to go to Page 5 after that, and then Page 6 again, and so forth. We’d end up with an estimate of 0.5 for the PageRank of Pages 5 and 6, and an estimate of 0 for the PageRank of all the remaining pages, since we spent all our time on Pages 5 and 6 and never visited any of the other pages.

To ensure we can always get to somewhere else in the corpus of web pages, we’ll introduce to our model a damping factor **d**. With probability **d** (where **d** is usually set around **0.85**), the random surfer will choose from one of the links on the current page at random. But otherwise (with probability **1 - d**), the random surfer chooses one out of all of the pages in the corpus at random (including the one they are currently on).

Our random surfer now starts by choosing a page at random, and then, for each additional sample we’d like to generate, chooses a link from the current page at random with probability **d**, and chooses any page at random with probability **1 - d**. If we keep track of how many times each page has shown up as a sample, we can treat the proportion of samples that landed on a given page as its PageRank.
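
The sampling procedure can be sketched roughly as follows. The function name `sample_pagerank`, the sample count, the fixed seed, and the tiny `web` corpus are all assumptions made for illustration. Jumping to a random page when the current page has no links is also our own choice.

```python
import random

def sample_pagerank(corpus, d=0.85, n=10_000, seed=0):
    """Estimate PageRank as the share of n samples that land on each page."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    pages = sorted(corpus)
    counts = {page: 0 for page in pages}
    current = rng.choice(pages)  # the surfer starts on a random page
    for _ in range(n):
        counts[current] += 1
        links = sorted(corpus[current])
        if links and rng.random() < d:
            # With probability d, follow one of the links on the current page.
            current = rng.choice(links)
        else:
            # Otherwise jump to any page in the corpus, including this one.
            current = rng.choice(pages)
    return {page: counts[page] / n for page in pages}

# Hypothetical corpus: pages 5 and 6 link only to each other.
web = {
    "1": {"2"},
    "2": {"1", "5"},
    "5": {"6"},
    "6": {"5"},
}

ranks = sample_pagerank(web)
for page, rank in ranks.items():
    print(page, round(rank, 3))
```

Because of the damping factor, pages 1 and 2 still receive a share of the samples even though pages 5 and 6 form a loop.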

We can also define a page’s PageRank using a recursive mathematical expression. Let **PR(p)** be the PageRank of a given page **p**: the probability that a random surfer ends up on that page. How do we define **PR(p)**? Well, we know there are two ways that a random surfer could end up on the page:

1. With probability **1 - d**, the surfer chose a page at random and ended up on page **p**.
2. With probability **d**, the surfer followed a link from a page **i** to page **p**.

The first condition is fairly straightforward to express mathematically: it’s **1 - d** divided by **N**, where **N** is the total number of pages across the entire corpus. This is because the **1 - d** probability of choosing a page at random is split evenly among all **N** possible pages.

For the second condition, we need to consider each possible page **i** that links to page **p**. For each of those incoming pages, let **NumLinks(i)** be the number of links on page **i**. Each page **i** that links to **p** has its own PageRank, **PR(i)**, representing the probability that we are on page **i** at any given time. And since from page **i** we travel to any of that page’s links with equal probability, we divide **PR(i)** by the number of links **NumLinks(i)** to get the probability that we were on page **i** and chose the link to page **p**.

This gives us the following definition for the PageRank of a page **p**:

$$PR(p) = \frac{1 - d}{N} + d \sum_{i} \frac{PR(i)}{\text{NumLinks}(i)}$$

In this formula, **d** is the damping factor, **N** is the total number of pages in the corpus, **i** ranges over all pages that link to page **p**, and **NumLinks(i)** is the number of links present on page **i**.

How would we go about calculating PageRank values for each page, then? We can do so via iteration: start by assuming the PageRank of every page is **1 / N** (i.e., equally likely to be on any page). Then, use the above formula to calculate new PageRank values for each page, based on the previous PageRank values. If we keep repeating this process, calculating a new set of PageRank values for each page based on the previous set, the values will eventually converge, meaning they stop changing by more than a small amount from one iteration to the next.
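
A minimal sketch of that iteration might look like this. The stopping threshold `tolerance`, the function name `iterate_pagerank`, and the small `web` corpus are assumptions for illustration. The sketch also assumes every page has at least one outgoing link.

```python
def iterate_pagerank(corpus, d=0.85, tolerance=1e-6):
    """Repeat the PageRank update until no value changes by more than tolerance."""
    n = len(corpus)
    ranks = {page: 1 / n for page in corpus}  # start from equal ranks of 1 / N
    while True:
        new_ranks = {}
        for page in corpus:
            # Each page that links here passes on its rank, split across its links.
            incoming = sum(
                ranks[other] / len(links)
                for other, links in corpus.items()
                if page in links
            )
            new_ranks[page] = (1 - d) / n + d * incoming
        change = max(abs(new_ranks[page] - ranks[page]) for page in corpus)
        ranks = new_ranks
        if change < tolerance:
            return ranks

# Hypothetical corpus: pages 5 and 6 link only to each other.
web = {
    "1": {"2"},
    "2": {"1", "5"},
    "5": {"6"},
    "6": {"5"},
}

ranks = iterate_pagerank(web)
for page, rank in ranks.items():
    print(page, round(rank, 4))
```

Stopping on a threshold rather than a fixed number of rounds is one possible design choice; a fixed iteration count works just as well when the corpus is small.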

The formula seems promising. But how does it relate to the integration level of a physical space?

If we look at the main idea behind the algorithm, we can begin to translate it into a heuristic for space. The purpose of the algorithm is to rank web pages to figure out which ones are more important than others. It's worthwhile to think about how we could use the same algorithm to find out which spaces are more important, or of higher quality, than others.

Take our case study as an example:

This is the architectural floor plan of a section inside a building. The plan includes 10 rooms connected by open passageways. Room 1 connects to rooms 2, 3, and 4. Room 2 only connects to room 1. Room 3 connects to rooms 1 and 5, and so forth. This information is sufficient for us to construct a corpus, which is basically a Python dictionary mapping each room to the set of all rooms it connects to.

    corpus = {
        "Room1": {"Room2", "Room3", "Room4"},
        "Room2": {"Room1"},
        "Room3": {"Room1", "Room5"},
        "Room4": {"Room1", "Room5", "Room6"},
        "Room5": {"Room3", "Room4", "Room7", "Room8"},
        "Room6": {"Room4", "Room8"},
        "Room7": {"Room5", "Room9"},
        "Room8": {"Room5", "Room6", "Room9"},
        "Room9": {"Room7", "Room8", "Room10"},
        "Room10": {"Room9"},
    }

The iterative function begins by assigning each room a rank of **1 / N**, where **N** is the total number of rooms in the corpus, 10.

The function then repeatedly calculates new rank values based on all of the current rank values, according to the PageRank formula (i.e., calculating a room’s rank based on the ranks of all rooms that link to it). A room that has no links at all is interpreted as having one link for every room in the corpus (including itself).
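
The procedure described above can be sketched as follows. This is not the exact code behind the case study: the function name `room_ranks` is our own, and the damping factor of 0.85 is an assumption based on the usual value mentioned earlier.

```python
corpus = {
    "Room1": {"Room2", "Room3", "Room4"},
    "Room2": {"Room1"},
    "Room3": {"Room1", "Room5"},
    "Room4": {"Room1", "Room5", "Room6"},
    "Room5": {"Room3", "Room4", "Room7", "Room8"},
    "Room6": {"Room4", "Room8"},
    "Room7": {"Room5", "Room9"},
    "Room8": {"Room5", "Room6", "Room9"},
    "Room9": {"Room7", "Room8", "Room10"},
    "Room10": {"Room9"},
}

def room_ranks(corpus, d=0.85, iterations=100):
    """Apply the PageRank update a fixed number of times (d=0.85 is assumed)."""
    n = len(corpus)
    # A room with no links is treated as linking to every room, itself included.
    links = {room: (out if out else set(corpus)) for room, out in corpus.items()}
    ranks = {room: 1 / n for room in corpus}  # every room starts at 1 / N
    for _ in range(iterations):
        # The right-hand side reads the old ranks before the name is rebound.
        ranks = {
            room: (1 - d) / n + d * sum(
                ranks[other] / len(links[other])
                for other in corpus
                if room in links[other]
            )
            for room in corpus
        }
    return ranks

ranks = room_ranks(corpus)
# List the rooms from most to least engaged.
for room, rank in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{room}: {rank:.4f}")
```

Under these assumptions, Room5 comes out on top and Room2 and Room10 share the lowest rank, since the layout is symmetric around Room5 and Room6.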

We ran the algorithm for 100 iterations, and this is the result that we received:

As seen from the visualization, room 5 has the highest level of engagement because it has the most connections to other rooms, several of which are themselves well-connected. On the other hand, rooms 2 and 10, which sit in the corners and hold only a single connection each, lag in engagement and tend to be more private.

Let's take a look at another example on an urban scale:

The city plan displays the ranking of the roads through color: red indicates higher quality and blue lower quality. The calculation takes into account only the primary, secondary, and tertiary road lines. The result may seem alluring, but keep in mind that this example in particular has been vastly simplified. In the physical world, highways have multiple lanes, and an intersection does not necessarily mean that all roads leading to it are linked together. Streets are governed by traffic laws, and automobile movements are restricted.

These are only some of the factors that constrain our heuristic approach. The more precise the information in the corpus, the more accurate the calculation is going to be.

Here's the final example:

The algorithm has also been applied to retail floor plan management, with the content of the shelves disregarded. Similarly, the plan illustrates the engagement quality of each aisle: red indicates higher quality and blue lower quality. The main aisles exhibit a higher value and better connectivity, meaning they have more paths leading to them. In contrast, the sub-aisles, which only connect to the primary aisles, have the probability split between them and therefore receive lower exposure. As a result, the products displayed there are likely to have fewer visitors.

The challenge with this experiment is identifying the boundary of each space: at what point does an aisle end, and how much area does it cover? The space inside a retail store is unusual in that it has no doors and no clear barriers. This makes it tricky to decide how to set up the content of the corpus in the first place.

Nevertheless, all three case studies have shown promising results. The value of the algorithm lies not only in visualizing the existing problem as precisely as possible but also in surfacing solutions that would otherwise go unseen. The purpose is not to accept the result as it stands but to imagine what the space could be.

In conclusion, the algorithm is a work in progress and can only be developed further by studying public engagement behavior in physical facilities. It's clear why the ability to describe space would play such a vital role in designing high-performance architecture, and the ways of describing it are limited only by one's imagination. Artificial intelligence is a powerful field, and it is not confined to the realm of computer engineers.