Science is about asking questions and looking for answers. It is not a bad thing when you get exactly the answer you were expecting to exactly the question you were asking. On the other hand, it seems even better to get a completely surprising answer to a question that is almost, but not quite, the one you were asking.
The question I was not quite asking was: who is the most connected actor in the history of cinema? The answer that I did not expect was Matsunosuke Onoe.
The question that I was trying to answer was how well can we model a successful lifetime career trajectory given only the graph structure of a professional network. How can we predict the fame of a successful person?
We who work define our careers by a sequence of gigs, jobs or other professional works. This is not just personal in most cases, because most of us do not work alone, but in collaboration. Hence, a personal career trajectory is actually part of a professional graph where two people are connected together if they have both worked on the same job or contributed to the same work. Our careers can be seen as a timestamped sequence of connections in a social network.
Three well-studied examples of professional networks are those defined by citations between scholarly publications, those defined by musical discographies and those defined by movie and television credits. The example I have been studying is the one defined by movie credits.
IMDb has courteously provided a limited subset of its data for research and education. The professional graph that I have been looking at is the graph of actors, writers, directors and producers credited for movies in that IMDb data subset. In detail, this graph is a bipartite graph whose vertices are people and movies, where people are connected to movies if they are listed as principal contributors for that movie. To simplify matters, I chose the connected component containing Fred Astaire.
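As a rough illustration of the graph described above, here is a minimal sketch in plain Python. It is not the code used for the experiments. The function names `build_bipartite_graph` and `component_of` are made up for this example, and the credits are toy data rather than anything from IMDb.

```python
from collections import defaultdict, deque


def build_bipartite_graph(credits):
    """Build an undirected person-movie graph from (person, movie) pairs."""
    graph = defaultdict(set)
    for person, movie in credits:
        # Tag each node with its kind so a person and a movie
        # that share a name never collide.
        p, m = ("person", person), ("movie", movie)
        graph[p].add(m)
        graph[m].add(p)
    return graph


def component_of(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Toy, made-up credits (hypothetical, not IMDb data).
toy_credits = [
    ("Alice", "Movie A"), ("Bob", "Movie A"),
    ("Bob", "Movie B"), ("Carol", "Movie B"),
    ("Dave", "Movie C"),
]
toy_graph = build_bipartite_graph(toy_credits)
alice_component = component_of(toy_graph, ("person", "Alice"))
```

In this toy graph, Alice reaches Carol through Bob, while Dave and his movie sit in a separate component and are left out. Choosing the component containing Fred Astaire works the same way, only on a much larger graph.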
My first attempt to measure the importance of people in this professional graph was to apply the PageRank algorithm. To understand this algorithm, please imagine a dedicated cinema buff randomly browsing through the history of movies and movie makers. The majority of the time, this dedicated cinema buff selects a connection in the network uniformly at random and follows that connection. If the cinema buff is looking at a movie, she will choose a person from the credits and look at that person’s bio. If the cinema buff is looking at a person’s bio, she will choose a movie from that person’s filmography. Then every once in a while, the bored cinema buff will decide to choose a completely unrelated movie or person.
Now, we have already said that this cinema buff is completely dedicated. In fact, she is so dedicated that she will do this forever. After browsing for a sufficiently long time, it will probably not matter where she started browsing. Still, some movie makers are in more highly connected parts of the graph than others. If more professional paths pass through a particular movie maker, then there is a higher probability that the dedicated cinema buff is currently looking at the bio of that movie maker than at the bio of one with fewer connections. The PageRank algorithm ranks most highly those movie makers the dedicated cinema buff is most likely to be looking at after a long period of browsing.
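The browsing process above can be sketched as a simple power iteration. This is a minimal sketch under stated assumptions, not the implementation behind the results in this post. The damping value of 0.85 is a commonly used default and is my assumption here, the function name `pagerank` and the toy graph are made up for illustration, and ranks are computed for movies and people alike.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}."""
    nodes = list(graph)
    n = len(nodes)
    # Start the cinema buff at a uniformly random node.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # With probability 1 - damping she jumps to an unrelated node.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbors = graph[node]
            if neighbors:
                # Otherwise she follows one connection chosen uniformly.
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # A node with no connections spreads its rank everywhere.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): one person credited on three movies,
# each of which has one other credited person.
toy_graph = {
    "Hub": {"M1", "M2", "M3"},
    "M1": {"Hub", "P1"},
    "M2": {"Hub", "P2"},
    "M3": {"Hub", "P3"},
    "P1": {"M1"},
    "P2": {"M2"},
    "P3": {"M3"},
}
ranks = pagerank(toy_graph)
top_node = max(ranks, key=ranks.get)
```

In the toy graph, the person credited on all three movies ends up with the highest rank, because more browsing paths pass through that node. To rank only movie makers, one would keep the person nodes and sort them by their rank.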
And here are the top people in the cinema graph by PageRank.
- Matsunosuke Onoe
- Kinya Ogawa
- Sakae Nitta
- Jirô Yoshino
- Floyd Elliott
- Michael Blackwood
- Minoru Inao
- Satoru Kobayashi
- William Shakespeare
- Wui Ng
I have to admit that while the algorithm makes sense, it is not immediately obvious what this list means to me. Many of these names are completely unfamiliar to me. Intuitively, I understand why William Shakespeare might be on this list. His plays have very commonly become movies, which means he is clearly one of the top writers in the history not just of cinema, but of theater in all forms.
However, Matsunosuke Onoe was a surprise to me, not just because I had never heard of him before this, but because he was born in 1880 and he died in 1926. This means that his works were from very early in the history of cinema, and most of them are at least a hundred years old.
Here are a few details which are not in the database, but from other sources. Matsunosuke Onoe began his career in kabuki theater. He later became a prolific movie maker, acting in more than 1000 films. He is considered the first star in Japanese cinema. That he was prolific is not the only reason he is at the top. In this limited dataset, there are only 565 movies by Matsunosuke Onoe, while Brahmanandam has 808 and William Shakespeare 509. That Matsunosuke Onoe is at the top must also be because his collaborators were prolific or had connections through the rest of the graph.
Interestingly, when I searched YouTube for Matsunosuke Onoe I found only one of his movies out of those more than 1000. The name of that movie was Gōketsu Jiraiya, which means Jiraiya the Hero. There are only two reviews of this on IMDb, and both of those reviews said more or less that the reviewers found the movie incomprehensible. With no additional context, I also would most likely have found the movie incomprehensible.
The story of Gōketsu Jiraiya is the story of a legendary ninja and practitioner of toad magic. It is the story of his love for Tsunade, a great practitioner of slug magic, and his conflict with the villain Orochimaru, once his friend, but now possessed by a serpent demon.
This was a story that had its roots in folklore and was published as a multi-volume graphic novel, Jiraiya Gōketsu Monogatari, between 1839 and 1868. From this, the story was adapted into a kabuki play, and eventually into this 1921 movie, among others.
While this knowledge may be obscure from here, from the time and place where I happened to learn it, there are people who know this story of Jiraiya the Hero and Matsunosuke Onoe the Actor. Those people remember, and those people care, and now I am one of those people. I would not have learned anything about this had I not done these PageRank experiments on this professional graph.
How many other stories may we learn this way?