site:{university domain (A)}
linkdomain:{university domain (B)}
and, to obtain the total number of pages indexed in the
university domain (A):
site:{university domain (A)}
A SQL routine was used to submit the 1 001 000 queries
needed to build the link matrix.
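The size of the query set follows from the design: one link
query per ordered pair of domains plus one page-count query
per domain, i.e. 1 000 × 1 000 + 1 000 = 1 001 000. A minimal
Python sketch of how such a list could be generated is given
here. It is not the SQL routine used in the study; the
function names are hypothetical, and it assumes that the two
operators are combined in a single query string and that the
pair (A, A) is included, which is what the stated total
implies.

```python
from itertools import product
from typing import Iterator, Sequence


def link_query(source: str, target: str) -> str:
    """Query for pages in `source` that link to `target` (assumed format)."""
    return f"site:{source} linkdomain:{target}"


def size_query(domain: str) -> str:
    """Query for the total number of pages indexed in `domain`."""
    return f"site:{domain}"


def all_queries(domains: Sequence[str]) -> Iterator[str]:
    """Yield one link query per ordered pair, then one size query per domain."""
    for source, target in product(domains, repeat=2):
        yield link_query(source, target)
    for domain in domains:
        yield size_query(domain)


def query_count(n_domains: int) -> int:
    """Number of queries needed for n_domains: n*n link queries + n size queries."""
    return n_domains * n_domains + n_domains
```

With 1 000 domains, `query_count(1000)` gives the 1 001 000
queries mentioned in the text.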
Geographical Map
We built a geographical map to show the distribution of
pages and link flows at the country level. Designing such
a map requires a base map containing the political
boundaries of the world. This base map was downloaded from
the Blue Marble Geographics web site
(www.bluemarblegeo.com). We then used the Geographical
Information System (GIS) software MapViewer 6 to build
the final map. This map has two layers: a hatch map,
which represents the number of web pages by country, and
a flow map, which shows the links between countries. The
classification method used in both layers was Jenks’
natural breaks (Jenks, 1963). This method determines the
best arrangement of values into classes by iteratively
comparing the sums of squared differences between the
observed values within each class and the class means. It
improves the visualization and interpretation of the
results because it creates more significant differences
between classes.
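The criterion behind natural breaks can be illustrated with
a short Python sketch. Note that this is not the routine
implemented in MapViewer 6: instead of the iterative
procedure described above, it uses an exact dynamic
programming search (the Fisher-Jenks variant) for the
partition of the sorted values that minimises the total
within-class sum of squared deviations. The function name
`natural_breaks` is an assumption made for illustration.

```python
from typing import List, Sequence


def natural_breaks(values: Sequence[float], n_classes: int) -> List[List[float]]:
    """Split values into n_classes contiguous groups (after sorting) that
    minimise the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back through the stored split points to recover the classes.
    classes: List[List[float]] = []
    j = n
    for k in range(n_classes, 0, -1):
        i = start[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes
```

The search is cubic in the number of values, which is
acceptable for a few hundred country-level figures but not
for large data sets.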
Network Graph
A network graph was built with the in-links between the
1 000 university web domains. Several variables were used
to add information about the network configuration: node
size shows the volume of web pages that each university
publishes on the Web, colours represent the nationality
of each higher education organization, and arc size shows
the frequency of links between two university domains.
The software used to visualise the network was Pajek 1.02.
We selected a cut-off of a minimum of 50 links to improve
the network visualization. We also used the
Fruchterman-Reingold algorithm to lay out the network
because it is the fastest for large networks (de Nooy,
Mrvar & Batagelj, 2005).
Several social network indicators were used to describe
the network topology and the main characteristics of the
nodes:
- K-Core: a sub-network in which each node has at least
degree k. K-Cores allow us to detect groups with a strong
link density. In scale-free networks, e.g. the Web, the
core with the highest degree is the central core of the
network, identifying the set of nodes the network rests on
(Seidman, 1983).
- Degree: the number of lines connecting a node. This can
be normalized (nDegree) by the total number of nodes in
the network. In a directed network such as the Web we can
count only the incoming links (InDegree) or the outgoing
links (OutDegree). In Webometrics, InDegree allows us to
detect the visibility of a web domain (Cothey, 2005;
Kretschmer & Kretschmer, 2006).
- Betweenness: the capacity of one node to help connect
those nodes that are not directly connected to each
other. Its normalization is the percentage over the total
number of nodes in the network. From a webometric point
of view, this measure allows us to detect hubs or gateways
that connect different web networks (Faba-Pérez,
Zapico-Alonso, Guerrero-Bote & Moya-Anegón, 2005).
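The degree and k-core definitions above can be made concrete
with a small Python sketch. It is an illustration only, not
the Pajek procedure used in the study: the function names
are hypothetical, self-links are ignored, degree counts
distinct neighbours, and the k-core is computed by treating
every arc as an undirected line, all of which are
assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Arc = Tuple[str, str]  # (source domain, target domain)


def in_out_degree(arcs: Iterable[Arc]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count the distinct incoming and outgoing neighbours of each node."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    targets: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in arcs:
        if src != dst:  # assumption: self-links are ignored
            targets[src].add(dst)
            sources[dst].add(src)
    nodes = set(sources) | set(targets)
    indegree = {node: len(sources[node]) for node in nodes}
    outdegree = {node: len(targets[node]) for node in nodes}
    return indegree, outdegree


def normalised(degree: int, n_nodes: int) -> float:
    """Degree as a percentage of the number of nodes in the network."""
    return 100.0 * degree / n_nodes


def k_core(arcs: Iterable[Arc], k: int) -> Set[str]:
    """Nodes of the k-core, treating every arc as an undirected line."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in arcs:
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    alive = set(neighbours)
    changed = True
    while changed:
        # Repeatedly drop nodes with fewer than k surviving neighbours.
        changed = False
        for node in list(alive):
            if len(neighbours[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive
```

For example, an InDegree of 781 in a network of 1 000 nodes
normalises to `normalised(781, 1000)`, i.e. 78.1.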
Results
Descriptive analysis
Prior to the link analysis, we computed the frequency
distribution of the 1 000 universities by country.
Countries | Universities | %
United States | 369 | 36.9
United Kingdom | 68 | 6.8
Germany | 66 | 6.6
France | 50 | 5
Spain | 41 | 4.1
Canada | 39 | 3.9
Japan | 35 | 3.5
Italy | 34 | 3.4
Australia | 30 | 3
China | 17 | 1.7
Taiwan | 17 | 1.7
Sweden | 15 | 1.5
Brazil | 14 | 1.4
The Netherlands | 13 | 1.3
Finland | 12 | 1.2
Rest of the World | 180 | 18
TOTAL | 1 000 | 100
Table 1. Distribution of universities by country (first
15)
Table 1 shows the number of universities by country,
listing only the first 15 countries. United States (US)
universities make up 36.9% of the entire sample, followed
by those of the United Kingdom (UK) (6.8%) and Germany
(6.6%). This distribution is also observed in the Top 200
of the ranking, which suggests that there is a digital
divide in favour of US universities. The low performance
of emerging countries such as Russia (0.6%) and India
(0.4%) is also clear.
Geographical Map
Figure 1. Geographical map of the distribution of pages by
country and their link flows
Figure 1 shows the geographical distribution of web pages
by country and the incoming and outgoing links among these
countries. Two regions stand out for their large number of
web pages: North America (USA and Canada) and the European
Union (EU) zone. The USA is the country with the most web
pages (50.57%), holding half of the world's academic web
pages indexed in Yahoo! Search. It is followed by Germany
(7.14%) and the UK (4.28%) in the EU. Besides these zones,
note the web development of Japan (2.35%), Australia
(2.35%) and China (2.33%) in the East and Brazil (0.94%)
in South America. In contrast, two zones have no
universities in the sample: Africa (with the exception of
South Africa) and the Middle East (with the exception of
Israel and Saudi Arabia).
From the US position,
the upper loops show the outgoing links and the lower
loops the incoming ones. The most important link flows
are between North American countries and EU countries,
while in a second ring are links between East Asian and
Oceanic countries and the US.
Network Graph
The World-class network (Figure 2) shows small-world
properties because its clustering coefficient (C = 527.25)
is considerably higher than that of a random network
(C = 35.14) (Watts & Strogatz, 1998). Furthermore, its
average path length (l = 2.26) is also rather low.
Visually, small-world properties can be seen in the
transversal links that run across the network, connecting
distant clusters (Figure 2). The in- and out-degree
frequency distributions follow a power-law trend
(γin = .81; γout = .73), which allows us to state that this
network has scale-free properties as well (Barabasi,
Albert & Jeong, 2000).
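The two quantities behind the small-world test can be
sketched in Python as follows. This is only an illustration
under stated assumptions: the graph is treated as undirected
and unweighted, the function names are hypothetical, and the
clustering coefficient is the mean local coefficient of
Watts and Strogatz, which lies between 0 and 1 and therefore
is not on the same scale as the values reported above.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def clustering_coefficient(graph: Graph) -> float:
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for node, nbrs in graph.items():
        if len(nbrs) < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Count the links that exist between pairs of neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (len(nbrs) * (len(nbrs) - 1))
    return total / len(graph)


def average_path_length(graph: Graph) -> float:
    """Mean shortest-path length over all connected ordered pairs."""
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search gives shortest distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

A network is usually called small-world when the first value
is much higher than in a random graph with the same number
of nodes and lines while the second stays comparably low.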
Figure 2. Network graph of the World-class universities on
the Web (N = 1 000; arcs ≥ 50 links)
Figure 2 shows the graph of the 1 000 higher education
institutions. First, each university is linked with the
universities of its own country. Thus, we can visually
detect homogeneous national groups such as Germany (red),
the UK (light green) or Japan (orange). However, we can
also see that some countries do not form a compact group,
such as France (dark blue), Canada (white) and other
countries with a small set of universities such as the
Netherlands (dark red). This may be because some countries
are included in other, larger national sub-networks:
Canada is related to the US and the Netherlands to the UK.
This describes a cumulative process in which each national
sub-network is aggregated to another one, as in an
accretion model.
The graph also shows linguistic (Thelwall, Tang & Price,
2003) and geographical relationships (Thelwall, 2002a).
The European countries are located on the right side of
the picture, while the left side is mainly taken up by
Asian and American ones. It shows, for example, that
Spanish universities lie between the European and the
Latin-American ones, combining linguistic affinity with
geographical proximity. In a similar way, Australia is
located between the USA and the UK.
Note that size is related to link attraction, because the
large universities are located in the core of the network.
Nevertheless, some countries, specifically Asian ones
(China, Japan and Taiwan), have large universities that
are far from the core. This may be caused by the limited
development of English-language pages in these countries
(Vaughan & Thelwall, 2004).
Figure 3. Detailed view of the
central core of the network
The main core of the World network was detected with the
k-cores method. The central core consists of 116 nodes
with degree 93. This highly connected cluster contains 98
American universities; the rest are from Canada (7) and
Europe (11). Figure 3 shows this central core in detail,
highlighting universities such as Harvard, Stanford and
the Massachusetts Institute of Technology (MIT), which are
located in the centre of the graph and attract a huge
number of links from the entire network. Next, the
important European universities in the core of the network
pull their national networks, as Cambridge does for the
British network, Trier for the German one and the Swiss
Federal Institute of Technology Zurich (ETHZ) for the
Swiss one. However, despite the closeness of the
Australian universities (purple), no Asian, African or
Latin-American universities are present, with the
exception of the Israeli ones, which are located around
the United States sub-network.
We also calculated the in- and out-degree of each
university and ranked them. United States universities are
the most interconnected in the network. MIT (78.1) and the
universities of Berkeley (73.5) and Stanford (73.1) are
the most linked web domains in the network (Table 2). The
universities that keep the network most connected by
making outgoing links are from the US as well,
particularly the universities of Wisconsin-Madison (47),
Stanford (41.8) and Florida (41.2) (Table 3). Note that
both tables include only US universities: the first
European universities in the indegree rank are Cambridge
(18th) and Leeds (19th), and in the outdegree rank ETHZ
(15th) and the University of Amsterdam (22nd).
University | Domain | InDegree | nInDegree
Massachusetts Institute of Technology | mit.edu | 781 | 78.1
University of California, Berkeley | berkeley.edu | 735 | 73.5
Stanford University | stanford.edu | 731 | 73.1
University of Illinois at Urbana-Champaign | uiuc.edu | 666 | 66.6
Harvard University | harvard.edu | 634 | 63.4
University of Michigan | umich.edu | 634 | 63.4
University of Wisconsin-Madison | wisc.edu | 629 | 62.9
University of Texas at Austin | utexas.edu | 589 | 58.9
Cornell University | cornell.edu | 557 | 55.7
University of Washington | washington.edu | 555 | 55.5
Table 2. First 10 universities by
their InDegree
University | Domain | OutDegree | nOutDegree
University of Wisconsin-Madison | wisc.edu | 470 | 47
Stanford University | stanford.edu | 418 | 41.8
University of Florida | ufl.edu | 412 | 41.2
University of California, Berkeley | berkeley.edu | 411 | 41.1
University of Washington | washington.edu | 390 | 39
Massachusetts Institute of Technology | mit.edu | 378 | 37.8
University of Illinois at Urbana-Champaign | uiuc.edu | 369 | 36.9
Carnegie Mellon University | cmu.edu | 365 | 36.5
University of Pennsylvania | upenn.edu | 360 | 36
Harvard University | harvard.edu | 356 | 35.6
Table 3. First 10 universities by
their OutDegree
As noted above, the World network is the aggregated union
of national sub-networks. The betweenness centrality index
detects the gateway universities that connect these
national sub-networks with the remaining ones. Table 4
shows the principal university in each country according
to betweenness centrality. Outstanding universities stand
out in each country, such as MIT in the US, Cambridge in
the UK and ETHZ in Switzerland; these universities connect
local web spaces with international ones. However, there
are no German or Spanish universities in the top
positions, although both countries have a good position in
the network. We suggest that, because there is a
linguistic factor in the relationships between countries,
the German-speaking network is represented by ETHZ and the
Spanish-speaking one by the Autonomous National University
of Mexico (UNAM). Moreover, the betweenness index is
rather close to the degree indicators, so we can state
that these universities are the most important in their
national or linguistic sub-network.
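To illustrate how a betweenness score identifies gateway
nodes, the following Python sketch computes raw betweenness
for a directed, unweighted graph with Brandes' algorithm.
It is not the Pajek routine used in the study, the function
name is hypothetical, and no normalization is applied, so
the sketch should not be expected to reproduce the figures
in Table 4.

```python
from collections import deque
from typing import Dict, List

# node -> nodes it links to; every node must appear as a key and
# neighbour lists must not contain duplicates.
Graph = Dict[str, List[str]]


def betweenness(graph: Graph) -> Dict[str, float]:
    """Raw betweenness: number of shortest paths passing through each node."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order: List[str] = []  # nodes in order of non-decreasing distance
        preds: Dict[str, List[str]] = {node: [] for node in graph}
        paths = {node: 0 for node in graph}  # shortest-path counts
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if dist[nbr] < 0:  # first visit
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:  # node precedes nbr
                    paths[nbr] += paths[node]
                    preds[nbr].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in preds[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    return score
```

In a chain in which A links to B and B links to C, only B
lies on a shortest path between two other nodes, so it is
the only node with a non-zero score.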
Country | University | Web domain | Betweenness | nBetweenness
US | Massachusetts Institute of Technology | mit.edu | 65422 | 6.54
UK | University of Cambridge | cam.ac.uk | 20037 | 2.00
CH | Swiss Federal Institute of Technology Zurich | ethz.ch | 18584 | 1.86
FR | Jussieu Campus | jussieu.fr | 13280 | 1.32
JP | University of Tokyo | u-tokyo.ac.jp | 12529 | 1.25
FI | University of Helsinki | helsinki.fi | 9489 | 0.95
MX | Autonomous National University of Mexico | unam.mx | 7019 | 0.70
CA | University of British Columbia | ubc.ca | 6813 | 0.68
TW | National Taiwan University | ntu.edu.tw | 6604 | 0.66
IT | University of Bologna | unibo.it | 6397 | 0.63
Table 4. First 10 universities by
their Betweenness in their countries
Discussion
For some time now, the use of search engine data has been
questioned because of the instability of their results
over short time periods (Bar-Ilan, 1998; Rousseau, 1997),
the weakness of their search operators (Ingwersen, 1998)
and the unreliability of their databases (Sullivan, 2003).
However, recent studies have shown that current search
engines have improved their consistency and reliability
(Bar-Ilan, 2002; Bar-Ilan, 2004; Bar-Ilan, 2005a).
Although their technical features have considerably
improved, the coverage of their databases and the
harvesting process remain key issues. Bar-Ilan (2005b)
found that some search engines have serious problems
indexing and retrieving non-Latin scripts such as
Japanese, Chinese or Russian. Vaughan and Thelwall (2004)
showed that there is a local bias in favour of US web
sites and against East Asian ones, which are
underrepresented in the search engines. Our work may be
affected by these biases because the web domains of large
East Asian universities are located far from the core of
the graph (Figure 2), although they have a large number of
web pages. The strong presence of US universities may be
slightly affected by these coverage biases as well. Any
interpretation of these results must take these biases
into account.
The link flows and web page distribution in the
geographical map (Figure 1) follow a pattern similar to
that of the European Union (Ortega & Aguillo, 2008b).
Countries with many web pages attract and make more links
than others, confirming the strong relationship between
web pages and links (Thelwall & Harries, 2003; Katz &
Cothey, 2006). The network graph also shows results
similar to those of previous works. The World-class
universities are grouped in local or national
sub-networks, which are connected with other sub-networks
for linguistic or geographical reasons (Heimeriks & van
den Besselaar, 2006; Ortega et al., 2008). These local or
national sub-networks structurally fit the community model
of the Web suggested by Flake et al. (2000; 2002): several
“gateway” universities act as hubs/authorities that
connect the national communities or sub-networks with each
other (Barabasi & Albert, 1999; Kleinberg, 1999). This
reduces the distances between nodes and explains the
emergence of small-world phenomena on the Web (Björneborn,
2003).
Conclusions
The World-class university network graph is composed of
national sub-networks that merge in a central core where
the principal universities of each country pull their
networks toward international link relationships. This
network rests on the United States, which dominates the
world network together with the aggregated European
sub-networks, especially the British and the German ones.
This situation may be caused mainly by the technological
development of these countries and the production of
international content, that is, English web pages. This
second reason might explain the apparent lagging position
of some East Asian countries.
References
Aguillo, I. F., Granadino,
B., Ortega, J. L., & Prieto, J. A. (2006).
Scientific Research Activity and
Communication Measured With Cybermetrics Indicators.
Journal of the American Society for Information
Science and Technology,
57(10),1296-1302.
Aguillo, I. F., Ortega, J.
L., & Fernandez, M. (2008). Webometrics Ranking of World Universities:
Introduction, Methodology and Future Developments.
Higher Education in Europe,
33(2-3)
Barabasi, A. L.,
Albert, R., & Jeong, H. (2000). Scale-Free
Characteristics of Random Networks: the Topology of the
World-Wide Web. Physica A, 281(1-4),
69-77.
Barabasi, A. L., &
Albert, R. (1999). Emergence of
Scaling in Random Networks. Science, 286(5439),
509-512.
Bar-Ilan, J.
(1998). On the Overlap, the Precision and Estimated
Recall of Search Engines, a Case Study of the Query
"Erdos". Scientometrics, 42(2),
207-228.
Bar-Ilan, J.
(2002). Methods for Measuring Search Engine Performance
Over Time. Journal of the American Society for
Information Science and Technology, 53(4),
308-319.
Bar-Ilan, J.
(2004). The Use of Web Search Engines in Information
Science Research. Annual Review of Information
Science and Technology, 38,
231-288.
Bar-Ilan, J.
(2005a). Expectations versus reality – Search engine
features needed for Web research at mid 2005.
Cybermetrics, 9(1), http://www.cindoc.csic.es/cybermetrics//articles/v9i1p2.html
Bar-Ilan, J.
(2005b). Comparing Rankings of Search Results on the
Web. Information Processing & Management,
41(6), 1511-1519.
Björneborn, L.
(2003). Small-World Link Structures across an
Academic Web Space: A Library and Information Science
Approach. Copenhagen: Royal School of Library and
Information Science. http://vip.db.dk/lb/phd/phd-thesis.pdf
Chen, C. (2003).
Mapping Scientific Frontiers: The Quest for Knowledge
Visualization. London:
Springer-Verlag.
Cothey, V.
(2005). Some preliminary results from a link-crawl of
the European Union Research Area Web. In P. Ingwersen
& B. Larsen (Eds.), Proceeding of the 10th
International Conference of the International Society
for Scientometrics and Informetrics. Stockholm:
Karolinska University Press.
Faba-Pérez, C.,
Zapico-Alonso, F., Guerrero-Bote, V. P., & De
Moya-Anegón, F. (2005). Comparative Analysis of
Webometric Measurements in Thematic Environments.
Journal of the American Society for Information
Science and Technology, 56(8),
779-785.
Heimeriks, G., & Van Den
Besselaar, P. (2006). Analyzing
hyperlinks networks: The meaning of hyperlink based
indicators of knowledge production. Cybermetrics,
10(1,1). http://www.cindoc.csic.es/cybermetrics/articles/v10i1p1.html
Ingwersen, P.
(1998). The Calculation of Web Impact Factors.
Journal of Documentation, 54(2),
236-243.
Jenks, G. F.
(1963). Generalization in statistical mapping. Annals
of the Association of American Geographers, 53,
15-26.
Katz, J. S.,
& Cothey, V. (2006). Web indicators for complex
innovation systems. Research Evaluation, 15(2),
85-95.
Kleinberg, J.
(1999). Authoritative sources in a hyperlinked
environment. Journal of the ACM, 46(5),
604-632.
Kretschmer, H.,
& Kretschmer, T. (2006). Application of a New
Centrality Measure for Social Network Analysis to
Bibliometric and Webometric Data. In Proceeding of
the IEEE International Conference on Digital Information
Management (ICDIM). Bangalore, India: IEEE
Nooy, W. de, Mrvar, A.,
& Batagelj, V. (2005). Exploratory Social Network Analysis with
Pajek. Cambridge, UK:
Cambridge University Press.
Ortega, J. L., &
Aguillo, I. F. (2007). La Web académica española en el
contexto del Espacio Europeo de Educación Superior:
Estudio exploratorio. El profesional de la
información, 16(5), 417–425.
Ortega, J. L.,
Aguillo, I. F., Cothey, V., & Scharnhorst, A.
(2008). Maps of the academic web in the European Higher
Education Area - an exploration of visual web
indicators. Scientometrics, 74(2),
295-308.
Ortega, J. L.
& Aguillo, I. F. (2008a). Visualization of the
Nordic academic web: Link analysis using social network
tools. Information Processing & Management,
44(4), 1624-1633.
Ortega, J. L.
& Aguillo, I. F. (2008b). Linking patterns in the
European Union’s Countries: geographical maps of the
European academic web space. Journal of Information
Science (in press) http://internetlab.cindoc.csic.es/cv/11/Ortega_Aguillo_2008.pdf
Polanco, X.,
Boudourides, M., Besagni, D., & Roche, I. (2001).
Clustering and Mapping European University Web Sites
Sample for Displaying Associations and Visualizing
Networks. In Proceeding of the NTTS&ETK 2001
Conference. Hersonissos, Crete
Rousseau, R.
(1997). Sitations: an Exploratory Study.
Cybermetrics, 1(1). http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html
Seidman, S. B.
(1983). Network structure and minimum degree. Social
Networks, 5, 269–287.
Smith, A. G.
(2008). Benchmarking Google Scholar with the New Zealand
PBRF research assessment exercise.
Scientometrics, 74(2), 309-316.
Sullivan, D.
(2003). Google Dance Syndrome Strikes Again.
SearchEngineWatch.Com. http://searchenginewatch.com/showPage.html?page=3114531.
Thelwall, M.
(2002a). Evidence for the existence of geographic trends
in university web site interlinking. Journal of
Documentation, 58(5), 563-574.
Thelwall, M.
(2002b). A research and institutional size based model
for national university web site interlinking,
Journal of Documentation, 58(6), 683-694.
Thelwall, M.,
& Aguillo, I. F. (2003). La salud de
las Web universitarias españolas. Revista Española De
Documentación Científica,
26(3),
Thelwall, M.,
& Harries, G. (2003). The Connection Between the
Research of a University and Counts of Links to Its Web
Pages: an Investigation Based Upon a Classification of
the Relationships of Pages to the Research of the Host
University. Journal of the American Society for
Information Science and Technology, 54(7),
594-602.
Thelwall, M.,
& Harries, G. (2004). Do The Web Sites of Higher
Rated Scholars Have Significantly More Online
Impact? Journal of the American
Society for Information Science and Technology,
55(2), 149-159.
Thelwall, M.,
Tang, R., & Price, L. (2003). Linguistic Patterns of
Academic Web Use in Western Europe.
Scientometrics, 56(3), 417-432.
Thelwall, M.,
& Zuccala, A. (2008). A University-Centred European
Union Link Analysis.
Scientometrics, 75(3),
407-420.
Vaughan, L.
(2006). Visualizing linguistic and cultural differences
using Web co-link data. Journal of the American
Society for Information Science and Technology,
57(9), 1178-1193.
Vaughan, L., &
Thelwall, M. (2004). Search engine coverage bias:
evidence and possible causes, Information Processing
& Management, 40(4), 693-707.
Vaughan, L. &
Thelwall, M. (2005). A modeling approach to uncover
hyperlink patterns: The case of Canadian universities.
Information Processing & Management, 41(2),
347-359.
Watts, D. J., &
Strogatz, S. H. (1998). "Collective dynamics of
'small-world' networks". Nature,
393, 440-442.