The whole article is quite funny, especially the lists of the most-used "tankie" words and the branding of foreignpolicy.com as a left-wing news source.
So far, no data or source code has been provided alongside the paper on its arXiv page. I’d expect Jupyter notebooks, or something like that at least, for reproducibility. Perhaps they will be added later, or they are linked from a URL in a part of the paper I have not yet read.
In any case, the screenshot is of Table 11, and it is found in Appendix D, Domain Analysis:
Describing foreignpolicy.com as left-wing is an example of miscategorization by the authors, as is calling redsails.org a “Chinese far-left platform.” Neither description is accurate, and both undercut trust that the authors are correctly and thoroughly labeling and interpreting their data. There are other glaring oversights in Table 12, which purports that domains like “redditsave.com,” “ko-fi.com,” “twimg.com,” and “archive.is” are “representative domains of tankies” specifically, and supposedly not heavily found in other similar far-left communities (as per the authors’ description of the TF-IDF algorithm and their motivation for its use). Taken together, there is a compelling case that the authors (1) do not themselves possess a sufficient understanding of left-wing ideology, much less Marxist-Leninist ideology, to label it accurately, and (2) may have been sloppy with their data analysis (though this can’t be definitively known without access to the underlying datasets and analysis source code).
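To make the TF-IDF point concrete, here is a rough sketch of how such a score behaves when the "terms" are domains and the "documents" are communities. This is not the authors' code (none has been published so far); the weighting formula, the function name `tf_idf`, and the `.example` domains are all my own assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(domain_counts_by_community):
    """Score each domain per community with a plain TF-IDF weighting.

    Assumption: tf = share of the community's links, idf = log(N / df),
    where df is the number of communities that link to the domain at all.
    The paper may use a different variant.
    """
    n = len(domain_counts_by_community)
    # Document frequency: in how many communities does each domain appear?
    df = Counter()
    for counts in domain_counts_by_community.values():
        df.update(counts.keys())

    scores = {}
    for community, counts in domain_counts_by_community.items():
        total = sum(counts.values())
        scores[community] = {
            domain: (count / total) * math.log(n / df[domain])
            for domain, count in counts.items()
        }
    return scores


# Hypothetical link counts, made up for illustration only.
counts = {
    "community_a": Counter({"shared.example": 50, "only-a.example": 5}),
    "community_b": Counter({"shared.example": 40, "only-b.example": 10}),
}
scores = tf_idf(counts)
```

With this weighting, a domain linked from every community gets a score of zero no matter how often it appears, so a generic utility domain can only rank highly if it is rare or absent in the links crawled from the comparison communities.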
Majestic is described on the cited URL as: “The million domains we find with the most referring subnets.” Basically, of the 7,049 different domains contained in the 146,078 URLs the authors found in their crawl, they removed any that appear in the top 1,000 domains as ranked by Majestic, such as google.com, facebook.com, and reddit.com. (Whether the authors recognize the potential problem with excluding reddit.com in particular from the table is unknown at this point; I have not finished reviewing the paper.)
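The filtering step described above might look something like the sketch below. Again, this is my guess at the procedure, not the authors' implementation: the helper names, the naive "strip `www.`" hostname handling, and the tiny stand-in ranking list are all assumptions, and a real run would load the ranked domains from the Majestic list instead.

```python
from collections import Counter
from urllib.parse import urlsplit


def hostname(url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    Assumption: this is a naive normalization; the paper may instead
    reduce hosts to their registered domain.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_popular(urls, ranked_domains, top_n=1000):
    """Count links per domain, dropping domains in the top_n of a ranking."""
    popular = set(ranked_domains[:top_n])
    counts = Counter(hostname(u) for u in urls)
    return {d: n for d, n in counts.items() if d and d not in popular}


# Stand-in for the first rows of a popularity ranking (hypothetical order).
ranked = ["google.com", "facebook.com", "reddit.com"]

# Made-up URLs for illustration only.
urls = [
    "https://www.reddit.com/r/example",
    "https://redsails.org/some-essay",
    "https://foreignpolicy.com/article",
    "https://redsails.org/another-essay",
]

remaining = filter_popular(urls, ranked, top_n=3)
```

Note that every reddit.com link vanishes from the result regardless of how heavily the community links to it, which is the potential problem with this kind of exclusion for a study of Reddit communities.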
Thanks