Title: Paper review: Statistical and Combinatorial Analysis of the TOR Routing Protocol
Date: 2021-01-17 13:45

<script defer src="./files/katex.min.js" type="text/javascript"></script>
<link rel="stylesheet" href="./files/katex.min.css">
<script defer src="./files/auto-render.min.js" type="text/javascript"></script>
<script defer src="./files/katex_config.js" type="text/javascript"></script>


My [previous article]({filename}/paper_notes/osint_analysis_tor_foundation.md) on a paper
from [Éric Filiol](https://fr.wikipedia.org/wiki/%C3%89ric_Filiol) about
[Tor](https://torproject.org) was well-received and sparked a lot of
interesting conversations, so here is a review of a similar one: [Statistical
and combinatorial analysis of the TOR routing protocol: structural weaknesses
identified in the TOR
network](https://link.springer.com/article/10.1007/s11416-019-00334-x) ([local
mirror](XXX)), written again by people from the [Laboratoire de Virologie et de
Cryptologie Opérationnelles  (C +
V)<sup>O</sup>](https://en.esiea.fr/information-and-systems-security/) from the
[ESIEA](https://en.esiea.fr/): É. Filiol, and two of [his
interns](https://www.esiea.fr/wp-content/uploads/2018/05/Rapport-activit%C3%A9s_cvo2017.compressed.pdf):
Nicolas Job and [Maxence Delong](https://dblp.org/pid/217/1983.html)
(now [pursuing a doctorate](https://www.esiea.fr/expertise-confiance-numerique-securite/equipe-cns/) on the topic).

Nicolas Job was credited as being both from the (C + V)<sup>O</sup> laboratory,
but also from the "Department of Defense, Paris, France": this is plain wrong.
He was doing a 6 months internship for his
[ESSI](https://www.ssi.gouv.fr/en/cybersecurity-in-france/formations/)
*degree*, a certification in 7 months plus an internship, delivered by the
[ANSSI](https://en.wikipedia.org/wiki/Agence_nationale_de_la_s%C3%A9curit%C3%A9_des_syst%C3%A8mes_d%27information).
The ANSSI is reporting to the [Secretariat-General for National Defence and
Security](https://en.wikipedia.org/wiki/Secretariat-General_for_National_Defence_and_Security_(France)) (SGDSN),
which despite its confusing name, doesn't report to the [Ministry of Armed
Forces](https://en.wikipedia.org/wiki/Ministry_of_Armed_Forces_(France)) (the equivalent of the [USA's
DoD](https://en.wikipedia.org/wiki/United_States_Department_of_Defense) or
[UK's MoD](https://en.wikipedia.org/wiki/Ministry_of_Defence_(United_Kingdom))) at all, but
to the [prime minister](https://en.wikipedia.org/wiki/Prime_Minister_of_France) instead. For the
curious minds, the reason to have such a separation in France is that the
SGDSN's role, amongst others, is to define and enforce rules regarding
[Classified information](https://en.wikipedia.org/wiki/Classified_information):
it wouldn't be possible for it to inspect the Army had it to report to it.

I [previously wrote]({filename}/paper_notes/osint_analysis_tor_foundation.md)
the following about É. Filiol, and I'm still standing by my word:

> [Eric Filiol](https://sites.google.com/site/ericfiliol/) is
known for pretending to have broken AES in 2002 (he [didn't](https://eprint.iacr.org/2002/149)),
and in 2003 (he still [didn't]( https://eprint.iacr.org/2003/022.pdf )) and Tor
in 2011
(he [didn't either]( https://blog.torproject.org/rumors-tors-compromise-are-greatly-exaggerated)),
and for being the *architect and designer* of [DAVFI]( http://www.davfi.fr/index_en.html ),
a French "new generation anti-malware solution",
known for a being a phenomenal (and extraordinary expensive)
[source of fun]( https://news0ft.blogspot.com/2016/06/le-gachis.html ).

The paper was
[published](https://link.springer.com/journal/11416/volumes-and-issues/16-1) in
the [Journal of Computer Virology and Hacking Techniques](https://www.springer.com/journal/11416)
in March 2020, apparently because it
was presented at the
[2nd International Workshop on FORmal methods for Security Engineering](http://www.icissp.org/Abstracts/2018/ForSE_2018_Abstracts.htm) in
2018. It was also
[presented](https://www.semanticscholar.org/paper/Statistical-and-Combinatorial-Analysis-of-the-TOR-Filiol-Nicolas/ff748491209db3106d79fb4a44133ae626effa7a)
in 2018 at the [The International Conference on Information Systems
Security and Privacy 2018](http://www.icissp.org/Home.aspx?y=2018).
It was also presented at the [13th International Conference on Cyber Warfare and SecurityICCWS 2018](https://www.academic-conferences.org/pdf/download-info/iccws-2018-full-timetable/) the same year. É. Filiol has been presenting
at this conference in 2017 ("Automated Intelligence Gathering Through Comparison of JPEG Images and their Thumbnails"),
2016 ("Combinatorial Optimization of Operational (Cyber) Attacks against Large-scale Critical Infrastructures: The Vertex Cover Approach"),
and 2014 ("Critical Infrastructures: Where we stand today?") as well.

All of those conferences claim to have strict review processes.

---

## Introduction

> The various computer networks such as the Internet today allow everyone to
have many means of communication. However, this access is only possible if the
user has an IP address. This condition therefore allows entities that have
access to the network flows, to analyze it and thus to retrieve information
about particular users.

"IP" stands for [Internet
Protocol](https://en.wikipedia.org/wiki/Internet_Protocol): you can do
networking without IP address, like
[Appletalk](https://en.wikipedia.org/wiki/AppleTalk). Moreover, there are
ways/systems to achieve [decent
anonymity](https://en.wikipedia.org/wiki/Degree_of_anonymity) even against an
attacker who has a global view of the network, like
[pond](https://github.com/agl/pond) for example.

> Communities have been formed to defend the anonymity of each other and have
developed networks whose purported purpose is to enable people to navigate
without anyone being able to identify the origin and destination of the
communication. These networks therefore seem to be an ideal tool for
criminals and terrorists of all kinds, as they can carry out their actions with
little trace.

Privacy is a [fucking elementary human right](https://en.wikipedia.org/wiki/Right_to_privacy).
Moreover, if we want to go the bad faith way:

- Road enable people to transport goods without oversight, making them ideal to
	transport drugs, weapons and bodies.
- Butchers are wielding knifes all day long, making it an ideal profession
  for wannabe murderers.
- Bar and pubs are loud and anonymous places, making them ideal for
	conspirations.
- Libraries are full of [horrible books](https://en.wikipedia.org/wiki/Lists_of_banned_books),
  making them ideal places to radicalise the youth.
- …

> The purpose of this article is therefore to study a particular network of
anonymization, both for clients and services: The Onion Router (TOR for short).
Its purpose is to protect the confidentiality, integrity and availability of
exchanges within its organization. This network relies on a community that uses
a particular network protocol.

As *usual*, it seems that there is a confusion about the terminology:

- *The Tor Project* is a non-profit (charity) organization that maintains and develops the Tor software.
- `tor` is a program to use the Tor network
- *The Tor Network* is the set of all Tor relays

Amusingly, there is the following related [paragraph on the Tor Project
website](https://support.torproject.org/about/):

> Note: even though it originally came from an acronym, Tor is not spelled
"TOR". Only the first letter is capitalized. In fact, we can usually spot
people who haven't read any of our website (and have instead learned everything
they know about Tor from news articles) by the fact that they spell it wrong.

<br>

> The open source code is documented, but many aspects such as route
construction are only briefly documented. This is contrary to the principles
of security, since TOR’s security of this network is based on a considerable
amount of obscurity and secrecy.

This is just plain bullshit. The main implementation is [completely
open-source](https://gitlab.torproject.org/tpo/core/tor), and people even
wrote a [lot of different implementations](https://gitlab.torproject.org/legacy/trac/-/wikis/doc/ListOfTorImplementations) in
[java](https://subgraph.com/orchid/index.en.html),
[erlang](https://ahf.me/assets/talks/SHA2017-NL-17-Talla.pdf),
[haskell](https://hackage.haskell.org/package/haskell-tor),
[go](https://github.com/mmcloughlin/pearl), in [go
again](https://www.tvdw.eu/blog/2015/01/24/implementing-a-tor-relay-from-scratch/),
[C++](https://github.com/wbenny/mini-tor) in a ~20kb binary,
[javascript](https://github.com/Ayms/node-Tor/)
[Python]( https://github.com/torpyorg/torpy ),
[Python](https://github.com/Marten4n6/TinyTor) in ~1200 lines of code,
[Python again](https://github.com/cea-sec/TorPylle/blob/master/torpylle.py) in ~1000 lines of code,
[C#]( https://github.com/neoeinstein/purpleonion ),
[rust](https://gitlab.torproject.org/tpo/core/arti), … 

If the specification was cryptic or unclear, it's unlikely that so many people
would have written this amount of implementations.

The gotor author even [said](https://www.tvdw.eu/blog/2015/01/24/implementing-a-tor-relay-from-scratch/):

> The network has a specification document, but carefully following it will
result in an implementation that does not work (this has now been corrected).
This posed some challenges, but luckily the CTor source code is relatively easy
to read.

[Viva64](https://www.viva64.com/en/about/), the company behind the static
analysis suite PVS Studio, known to regularly publish
[blogposts](https://www.viva64.com/en/b/) showcasing errors/mistakes they found
in [popular open-source projects](https://www.viva64.com/en/inspections/)
[wrote](https://www.viva64.com/en/b/0507/):

>  My congratulations to the authors of the Tor project. I didn't manage to
find any errors after the analysis by PVS-Studio static code analyzer. We write
such words very rarely, so the authors may really be proud. They do deserve a
medal for the high-quality code.

And even if there wasn't any kind of documentation, the consensus format is
in plain-text, and pretty obvious:

[![Consensus illustration]({static}/images/tor_filiol2/consensus.png)]({static}/images/tor_filiol2/consensus.png)

<br>

> Therefore, would it be possible for an attacker who controls a limited number
of routers to be able to control and impact a significant part of the traffic
and hence more or less would be able to damage TOR's security?

Why the "therefore"? Even if tor wasn't open-source (it is),
with no documentation, nor specification (tor is [heavily publicly
specified](https://gitlab.torproject.org/tpo/core/torspec)), it wouldn't
automatically means that an attacker controlling nodes would be able to perform
attacks.

> In a second part, the analysis of this problem from the point of view of
graph theory and sub-problems of optimal paths and vertex cover has made it
possible to identify particular reduced subsets of nodes on which a large
proportion of the TOR traffic depends. Taking control of these nodes by an
attacker could significantly damage the security of the TOR network.

If the goal was to find what nodes are worth compromising, those data are
[already public](https://metrics.torproject.org/rs.html#toprelays) and don't
involve fancy mathematics.


## Description of the tor protocol

> The Tor Project Inc. is a non-profit foundation located in Massachusetts.
It was founded in December 2006 by seven computer scientists, including Roger
Dingledine and Nick Mathewson. It is primarily responsible for software
maintenance — including the Tor Browser and Tail (sic.) — for the Tor anonymous
net- work.

While the Tor Project Inc it is indeed maintaining the Tor Browser, Tails is
[developed independently](https://tails.boum.org/doc/about/trust/index.en.html#index4h1).

> Today the organization is funded by donors and government agencies, including
the US Department of State Bureau of Democracy and the German Federal Foreign
Office (see (Delong et al., 2018) for a deep OSINT analysis of the Tor project
and foundation).

Tor's sponsors are [*way* more diverse](
https://www.torproject.org/about/sponsors/ ) than just two clickbaity USA
agencies, and was only funded in 2015 by the GFFO.

The cited paper is [rubbish]({filename}/paper_notes/osint_analysis_tor_foundation.md),
and the main author isn't Delong, but Filiol: he's citing his own paper.

> The Tor network consists of about 12,000 routers divided into two families,
relays and bridges. The Tor Foundation publishes almost all the information in
the first category, and quite nothing for the second category in order to
prevent a state from denying access to the network because it does not have the
IP addresses of these hidden nodes. The Tor network is organized into several
levels where routers have multiple roles.

It seems that there is another terminology confusion:

- A bridge is a non-public relay
- A relay is a router and is a node

> Four Authority levels are identified:
- Nine Directory authorities and Bridge authorities.
- Directory caches
- Fallback directory mirrors
- HSDir

It's a weird custom way to classify things, since there are no mentions of any "Authority levels"
anywhere in the specification nor in any implementation that I'm aware of.
It seems that there is an other confusion here, since the four "authority
levels" are in fact
[flags](https://metrics.torproject.org/glossary.html#relay-flag), and there are
not four of them, but twelve main ones: "Authority", "BadExit", "Exit", "Fast",
"Guard", "HSDir", "NoEdConsensus", "Stable", "StableDesc", "Running", "Valid"
and "V2Dir", as well as various additional ones, like "ReacheableIPv6"
"IPv6Exit", "FallbackDir", …

> the Guard node is an input router to the Tor net-
work. It also has the role of non exit router and
can therefore process internal network flows.

An exit relay [can be a
guard](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2450), it's
just that since they're scarce, they are not prioritized.

> the Bridge node (TOR Foundation, 2014d) is a router whose information is only
partially published and whose purpose is to allow customers to bypass
censorship.

There are no customers, only users.

> As for this particular class, we have developed an automatic ex- traction
procedure which is non public. At the present time, a first list of more than
2,500 bridges (on a total number of 3,000-3,500) have been extracted (Filiol
et al., 2017) as well as many information about the way bridges are managed
behind the scene. Recently the Tor foundation has increased the number of
bridges but this does not change anything with respect to our extraction
procedure.

Just like E. Filiol found a [magical way to break
AES](https://eprint.iacr.org/2003/022.pdf), but it's non public, pinky swear.
This kind of rubbish affirmations has no place in a scientific paper.

> The introduction point is a router (relay type) connected to a service hidden
by a particular circuit and which allows a client to submit an appointment
request for establishing a connection.

There is no such thing as "appointment request" in the protocol, this looks
like a bad double-translation (English → French → English) of "[rendezvous
point](https://community.torproject.org/onion-services/overview/)".

> The Tor protocol manages two types of communications, either to the Internet
network (external circuits, three routers) or to its internal network (internal
circuits, six routers) through which an end-to-end flow will be exchanged
between a client and a server.

"Exchanging an end-to-end-flow" isn't proper English, and I guess should be
understood as "an end-to-end encrypted connection will be established between a
client and a server".

> To build TOR routes (or circuits), clients and servers
of the Tor network build routes consisting of three
second level routers

No idea what a "second level router" could be.


## Statistical analysis of the tor routing protocol

> We essentially focuses on the public routers in the rest
of the papers (external circuits). As far as bridges are
concerned, it is sufficient to say that we have obtained
similar results (6-node circuits) as for public routers.
Other information are not public.

Pinky swear!

> The Tor routing protocol relies on the consensus file
which contains all the available TOR routers

All of them except bridges.

> A very first analysis (see Figure 1) clearly shows that
the distribution of routers is far from being uniform
and exhibits a wide dispersion.

Here is the Figure 1.

[![Fig 1.]({static}/images/tor_filiol2/fig1.png)]({static}/images/tor_filiol2/fig1.png)

There are no units on both axis, a green line name `g(x)`, and apparently the
violet crosses are `"proba_middle.dat" u 2`, whatever this means.

> In the minds of the general public, the choice of each router is made
randomly. This popular belief is false insofar as it implies that each node has
the same probability of being chosen.

I call bullshit on this. Of course, nodes don't have an equiprobable chance of
being selected: stabler, bigger, older, … relays have more chance of being
picked than a brand new slowpoke blinkenlight potato, if only for scalability's
sake which the paper actually agrees with later on.
 
> Building a circuit requires the choice of three separate routers. In the case
of an external route, the protocol will have to choose a guard, a middle and a
exit node. To make this choice, the protocol groups routers by type. If
necessary, it eliminates routers that are not specification-compliant. Then, it
computes the different sums of the weights and chooses a random value $$r$$
between $$0$$ and this sum. Finally, it scrolls through the list of the type
considered by summing up to $$i$$-th rank such as $$\sum_{j=0}^{j=i-1} weight_j
< r$$ and $$sum_{j=0}^{j-i} weight_j > r$$

The mathematical formula only means that the client will split the list of
relays in two, but doesn't explain why.

> This choice is not a priori questionable since it weighs certain routers
according to their stability, duration of network activity and available
bandwidth. However, this weighting is already partly undermining the supposed
equiprobability.

There is no "supposed equiprobability". 

Moreover, there are also other factors, like the
[AS](https://en.wikipedia.org/wiki/Autonomous_system_(Internet)) and the `/16`
because you don't want to have all your 3 relays too close to each other for
example.

> Indeed, if it is possible to favor certain routers, it is therefore possible
for the Tor foundation to restrict the choice of routers to particular subsets.

The Tor foundation doesn't operate any relays, let alone
[authorities](https://metrics.torproject.org/rs.html#search/flag:authority).
And of course, should 6 or more authority operators conspire together to serve
a skewed view of the network, nothing would prevent them from doing so, by
design, since they're the ones creating the consensus.


## Theoretical Statistical Model

> Since the open source code of Tor is very large and complex, we first
implement the automatic extraction of all relevant data the writing of routers
given by the details file and to analyze them.

"We were too lazy to actually read the source code, so we tried to make sense
of the raw consensus data instead".

<blockquote>
%%
 C_{middleguard} = \frac{C_{guard} × C_{middleguard}}{ C_{guard} − C_{middleguard}}
%%
</blockquote>

This one is actually hilarious:

%%
\begin{aligned}
C_{middleguard} &= \frac{ C_{guard} \times C_{middleguard} }{ C_{guard} - C_{middleguard} } \\\\
(C_{guard} - C_{middleguard}) \times C_{middleguard} &= C_{guard} \times C_{middleguard} \\\\ 
C_{guard} \times C_{middleguard} - C_{middleguard}^2 &= C_{guard} \times C_{middleguard}\\\\
- C_{middleguard}^2 &= 0\\\\
C_{middleguard} &= 0\\\\
\end{aligned}
%%


The other formula don't make more sense.


> $$C_{middle\\_seul}$$

Nice *frenglish* here, "seul" meaning "alone".

> Indeed, the specifications of the Tor network require that two routers on the
same circuit do not belong to the same x.y.z.w/16 subnetwork (class B) as well
as to the same (Johnson et al., 2013) family.

Nobody is using [classful
networks](https://en.wikipedia.org/wiki/Classful_network) anymore.  Also, it's
weird to cite a paper from 2013 instead of referring to the [current
specification](https://gitlab.torproject.org/tpo/core/torspec).

> This has the effect of influencing client choices. Let us precise that a
highly multi-threaded implementation enables to process any new version of the
consensus file in less than 5 minutes (around 10 billions of possible routes).

This is pure non-sense: what does "processing means"? [Metabase is able to
load the consensus instantly]({filename}/privacy/tor_metabase.md). This reads
like a bragging tentative from the intern who implemented whatever processing
was "needed" for the paper. And where is the number "10 billions" coming from?
With ~[7000](https://metrics.torproject.org/networksize.html) relays and
~[1500]( https://metrics.torproject.org/networksize.html ) bridges, even by not
using relays in the same `/16`, in the same `AS`, with the same family, the
number of possible routes is __way__ higher than 10¹⁰.

> Our theoretical model must now be compared with what is observed within the
Tor network in order to validate it or not.

It's not a theoretical model, since it is *entirely* based on the consensus' data.
Taking data, deriving some pseudo-formula from it, and then check if the
formula matches the same data is in fact an *interesting* approach.

> In order to have a representative sample with a margin of error of 1% and a
confidence level of 99%, we must consider at least 16,500 TOR routes.

Where are those number coming from‽ And how can taking 16,500 routes be enough
to have a confidence level of 99% when there are "around 10 billions of
possible routes"?

> Figure 2 represents the occurrences obtained for the different types of
routers when generating routes and shows that, according to the established
model, they follow a power law. Therefore, the weight assigned to each router
is the parameter that influences the choice made by the protocol (see Table 2).

Here is Figure 2.

[![Fig 2.1]({static}/images/tor_filiol2/fig2.1.png)]({static}/images/tor_filiol2/fig2.1.png)
[![Fig 2.2]({static}/images/tor_filiol2/fig2.2.png)]({static}/images/tor_filiol2/fig2.2.png)
[![Fig 2.3]({static}/images/tor_filiol2/fig2.3.png)]({static}/images/tor_filiol2/fig2.3.png)

It's interesting to notice that this is a logarithmic scale, but instead of
having the axis labelled as such, it's on the data itself that the logarithm is
computed, forming a weird variant of [log-log
plot](https://en.wikipedia.org/wiki/Log%E2%80%93log_plot) that is usually used
in economy, because a straight line would indicate a power-law:
it's obviously not the case here.

> It is also important to look at the case of internal circuits (used to access
to hidden services). In this case, the probability of the third router on the
circuit cannot be established directly from the data provided by the Tor
foundation. It was therefore necessary to identify the proportions of each type
of router (guard, middle, exit) positioned in third position.

It's [more complicated than
that](https://community.torproject.org/onion-services/overview/), since you
want to avoid reusing the same relay anywhere in your 6 (or 3 when using
[non-anonymous hidden
services](https://blog.torproject.org/whats-new-tor-0298)) relay-chains when
accessing a hidden service.

> In addition, the study was conducted on a small but representative sample
size but it is acceptable that on a large number of simulations, the results
would converge to the theory.

"We only did a few tests that kinda matched, but we believe that should we have
done it the right way, it should still work."

This is not how research is supposed to be done.

## Combinatorial analysis of the tor routing protocol

> Indeed, the previous study of routes has
focused only on routers and not on the links between
nodes.

This is non-sense, people have been studying tor graphs in almost every way,
including routes, just search for "path" or "flow" on
[anonbib](https://www.freehaven.net/anonbib/).

> Without loss of generality, this new model deals with at external circuits
only and when the input router is a guard. Additional results can be obtained
by contacted the first author.

Notwithstanding the broken English, it seems that we need to contact an author
to get the actual data/model used. Next step might be as
[DLC](https://en.wikipedia.org/wiki/Downloadable_content) with a season pass to
get everything that should have been included in the paper?

The rest of this part is about computing various distributions of the routes,
to find that the more bandwidth a node has, the bigger is its probability 
of being picked for routes.

## Discussion about the tor security

> Results presented in Section 4 (especially Tables 4 and 5) prove that it is
enough to target a reduced subset of nodes (only 1,463 nodes (1,217 middle, of
which 1,030 guards and 246 exits) to control 50% of the traffic in Table 4) .

I wouldn't use the word "only" to talk about ~1500 nodes: we're talking about
20% of the tor network.

> Thus, it would be able to remove the anonymity of users — clients and hidden
services — by tracking packets end-to-end through special patterns identified
or set up in the packets. It seems that the FBI operated in this way when it
dismantled the silk road in 2014

This is [pure fantasy](https://en.wikipedia.org/wiki/Ross_Ulbricht#Silk_Road,_arrest_and_trial).

> These node subsets of higher interest are then privileged targets for DDoS
(mandatorily from outside the Tor network) or coordinated targeted malware
attacks. Therefore, it is interesting to carry out a security analysis of the
ports and services which are opened for each of them, since they can constitute
potential entry points (a vulnerability scan to look for 0-day
vulnerabilities is also a required step).

If you have an RCE in tor, there is no need to target specific nodes since you
can just spray. If you have an RCE in Linux, there are easier and better ways
to get control of the network…

Also, I'm curious about what the authors meant by "a vulnerability scan to look
for 0-day vulnerabilities".

> We have used a massive scan tool designed in our lab to analyze top
significant routers involved in 50% of the Tor traf- fic, as well as the first
thousand ports and a few ot- her frequently used ones

I guess [`nmap`](https://nmap.org/) isn't hype enough?

> In addition, these ports are possible targets that can be overwhelmed by syn flooding attacks.

We're not in the 90s, [SYN flood](https://en.wikipedia.org/wiki/SYN_flood) isn't a thing anymore.

> As far as bridge routers are concerned, a corrupted machine in middle
position could count the number of network frames it is relaying when
constructing a circuit. It could then determine its own position and hence
compare the IP address of the machine right before its own position with
those listed by the foundation. It would enable to determine whether the
input router is a bridge or not. In order to prevent such an attack, it is
envisaged to always insert a guard router between the bridge and the middle
router. Thus, hidden nodes would be drowned within network users.

This attack is outlined in Roger Dingledine's
[Ten ways to discover Tor bridges](https://research.torproject.org/techreports/ten-ways-discover-tor-bridges-2011-10-31.pdf)
from 2011, summarized in a [blogpost]( https://blog.torproject.org/research-problems-ten-ways-discover-tor-bridges )
from the Tor project. Also, the solution proposed here is naïve compared to the
one from Dingledine's paper.

> Our study tends to prove that the security of the Tor network is not optimal.
Indeed, since overall security is based on the individual good practices of the
router owners, the maximum level that will be reached will not exceed that of
the weakest link.

This is also bullshit: the tor network *is designed* to accommodate
a minority of malicious/weak/misconfigured/potato/… nodes. Imagine if this
wasn't the case, and if the proposition was true: a single malicious node
would bring "the security of the Tor network" down to zero.

> It is also important to note that local security (of nodes and servers,
protocol strength...) is not sufficient as soon as we deal with a worldwide
infrastructure.

I don't understand where this is coming from, or even what it means.

> Having a more global and higher point of view is also important. In this
respect, our study has showed the Tor infrastructure too much relies actually
on a reduced number of nodes.

The [diversity problem](https://metrics.torproject.org/bubbles.html#as-exits-only) has been known for years,
and there is nothing novel presented here.

Amusingly, a couple of graphs are presented in the appendix of the paper, to
illustrate that whatever they are showing (there are no units on them) is a
power-law. Unfortunately, it looks way more like a normal distribution:

> Figure 5: Inverse cumulative density functions of data compared to the fitted law. The red dotted line (resp. green and blue)
describes the power law (resp. log-normal and exponential law).

[![Fig0]({static}/images/tor_filiol2/figX0.jpg)]({static}/images/tor_filiol2/figX0.jpg)
[![Fig1]({static}/images/tor_filiol2/figX1.jpg)]({static}/images/tor_filiol2/figX1.jpg)
[![Fig2]({static}/images/tor_filiol2/figX2.jpg)]({static}/images/tor_filiol2/figX2.jpg)
[![Fig3]({static}/images/tor_filiol2/figX3.jpg)]({static}/images/tor_filiol2/figX3.jpg)

> Figure 6: Inverse cumulative density functions of data compared to the fitted law (1-billion top TOR routes). The red dotted
line (resp. green and blue) describes the power law (resp. log-normal and exponential law).

[![Fig4]({static}/images/tor_filiol2/figX4.jpg)]({static}/images/tor_filiol2/figX4.jpg)
[![Fig5]({static}/images/tor_filiol2/figX5.jpg)]({static}/images/tor_filiol2/figX5.jpg)
[![Fig6]({static}/images/tor_filiol2/figX6.jpg)]({static}/images/tor_filiol2/figX6.jpg)
[![Fig7]({static}/images/tor_filiol2/figX7.jpg)]({static}/images/tor_filiol2/figX7.jpg)

They are respectively designated as "All OR", "OR guard", "OR middle", "OR
exit", "All OR", "OR guard", "OR middle" and "OR exit".

---

# Conclusion

Like the [previous
paper]({filename}/paper_notes/osint_analysis_tor_foundation.md), this one is
also written in broken English, full of silly mistakes (which could be excusable
if the paper wasn't reviewed by 3 different conferences and a journal.),
filled with approximations, sheer clickbaity inventions and bad approximations about Tor.
