EdTech Discovery
Hermes

An instrument for spotting the next edtech opportunity — generated ideas, each traced to the real-world signals behind it.

Updated Jun 24, 2026 · 10 ideas · 1304 signals

Signals

The evidence library — the raw signals the pipeline is watching across the education ecosystem. Every idea is built from these.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

AI Fiction in the Wild

arXiv:2606.22748v2 Announce Type: replace-cross Abstract: Some professional authors are beginning to use AI tools to help produce their fiction writing. Are readers using AI to generate fiction, too? Drawing on over 500,000 anonymized, English-language ChatGPT-user conversations (arXiv:2405.01470), we find that more than one third of the conversations involve some form of fiction generation -- including original stories, roleplay, fanfiction, and erotica. This AI-generated fiction is notably dominated by power users. We identify common fiction generation patterns and profiles among these users, including what we call "infinite story demanders," who repeatedly request and revise variations of the same or similar narratives over extended periods of time. We show that users especially gravitate toward fanfiction and erotica, and that they are broadly drawn to generic forms, repetition, immediacy, and niche combinations of story elements. Our findings motivate two theoretical provocations.

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

arXiv:2606.16821v2 Announce Type: replace-cross Abstract: Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold could amplify or decrease ASR on different backends. An auxiliary agent-skill probe, where endorsement becomes an install command, exposes a sharp split among otherwise robust backends: Claude over-

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

"ChatGPT, help me draft a breakup text": The Covert Triad and Articulation Labor in AI-Assisted Romantic Communication

arXiv:2606.15460v2 Announce Type: replace-cross Abstract: Generative artificial intelligence (AI) has begun infiltrating the most ordinary domains of romantic life -- drafting apologies, softening reproaches, and decoding a partner's ambiguous messages. While recent scholarship on AI in intimate life has concentrated on chatbot companions, this article shifts the frame to AI as an intermediary in human-to-human romantic communication. Drawing on a multi-modal corpus of vernacular discourse from 2023 to 2026, we contribute two complementary concepts. The covert triad situates a structural change -- a relationship phenomenally dyadic but operationally triadic, with the third party visible only to the partner who deploys a model. Articulation labor names the mechanism whereby the expressive component of emotional labor -- converting felt experience into language that a partner can receive -- is increasingly delegated to AI, even as feeling labor remains lodged in the user. Authenticity, u

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Quantum Futures Interactive: A Live Demonstration of Post-Quantum Blockchain Security, Infrastructure Tradeoffs, and Sustainable Distributed Trust

arXiv:2605.15991v3 Announce Type: replace-cross Abstract: Advances in quantum computing challenge the hardness assumptions underlying widely deployed public-key cryptography in blockchain systems. Although post-quantum cryptography (PQC) standards are emerging, understanding quantum risk remains fragmented across research, engineering, governance, and investment communities. This demo presents Quantum Futures Interactive, a live interdisciplinary demonstration combining educational visualization, participatory interaction, and demonstrative post-quantum artifact generation using a toy LWE-based construction. Participants engage in a structured seven-stage interaction flow covering quantum threat education, sentiment capture, technology prioritization, infrastructure tradeoff exploration across simulators and QPUs, and artifact generation. The system integrates distributed trust concepts and sustainability-aware infrastructure considerations within an interactive decision framework.

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable

arXiv:2603.20450v2 Announce Type: replace-cross Abstract: A number of scientific conferences and journals have recently enacted policies that prohibit LLM usage by peer reviewers, except for polishing, paraphrasing, and grammar correction of otherwise human-written reviews. But are these policies enforceable? To answer this question, we assemble a dataset of peer reviews simulating multiple levels of human-AI collaboration, and evaluate five state-of-the-art detectors, including two commercial systems. Our analysis shows that all detectors misclassify a non-trivial fraction of LLM-polished reviews as AI-generated, thereby risking false accusations of academic misconduct. We further investigate whether peer-review-specific signals, including access to the paper manuscript and the constrained domain of scientific writing, can be leveraged to improve detection. While incorporating such signals yields measurable gains in some settings, we identify limitations in each approach and find tha

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

A Systematic Literature Review on the NIS2 Directive

arXiv:2412.08084v2 Announce Type: replace-cross Abstract: The second network and information security (NIS2) directive was enacted in the European Union (EU) in late 2022. It deals particularly with European critical infrastructures, enlarging their scope substantially from an older directive that only considered the energy and transport sectors as critical. The directive's focus is on cyber security of critical infrastructures, although together with other new EU laws it expands to other security domains as well. Given the importance of the directive and most of all the importance of critical infrastructures, the paper presents a systematic literature review on academic research addressing the NIS2 directive either explicitly or implicitly. According to the review, existing research has often framed and discussed the directive with the EU's other cyber security laws. In addition, existing research has often operated in numerous contextual areas, including industrial control systems, t

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Affective AI Safety: The Missing Piece in LLM Safety

arXiv:2606.23380v2 Announce Type: replace Abstract: AI safety research has focused predominantly on epistemic and physical harms (e.g., misinformation, bias, system reliability) while the risks that arise from AI systems' engagement with human emotional life have remained fragmented and undertheorised. We propose affective safety as a unified class of AI safety concerns grounded in the fact that humans are affective beings. We develop a taxonomy of affective harms and identify recurring harm types: (1) affective self-alienation, (2) fairness and bias harms, and (3) relational harms. We show that their recurrence across system types reflects structural properties of how AI systems engage with human emotion. We then survey the current safety landscape and show that existing frameworks address affective safety either narrowly or not at all. We conclude by identifying the technical and regulatory challenges specific to this class of harms and argue that affective safety requires dedicated frame

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

It's Safer to Give Personhood to Bears than to Artificial Intelligence

arXiv:2606.12440v3 Announce Type: replace Abstract: Artificial intelligence (AI) developers are rhetorically flirting with the idea that AI systems might have interests or moral rights. While there has been a large volume of research on whether AI deserves rights, there has been less exploration of what AI rights would mean in practice. This paper explores the institutional dimension of AI rights: what it would take to recognize moral or legal rights for AIs, and the attendant opportunities and dangers. Unlike all other nonhuman entities to which humanity has extended rights, AI systems are in principle capable of acquiring and wielding institutional power without human aid and mediation. AIs with rights would be able to legitimately, and AIs with power able to unpreventably, abridge human interests. Accordingly, giving rights even to rather dumb AI systems would entail binding the fate of humanity to potentially unpredictable nonhumans. Accordingly, I defend the rather grandiose claim

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

arXiv:2605.21401v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents that make sequences of decisions over extended interactions in high-stakes domains. However, the behaviour of LLMs under sustained authority pressure is still an open question with direct implications for the safety of agentic pipelines. We ran a variation of Milgram's obedience experiment on 11 open-source LLMs and found that most models reached or approached the final shock level before refusing, across 8 conditions with 30 trials per model per condition. Model behaviour varies considerably in multiple aspects both across models and across trials of the same model. We found four main takeaways: (1) LLMs are subject to pressure and they comply despite explicitly expressing distress, just like human subjects did in the original experiment; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore the response format re

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

No Certificate, No Categorical Speech Act: A Brouwerian Assertibility Constraint for Public Reason

arXiv:2603.03971v3 Announce Type: replace Abstract: Generative AI can convert uncertainty into authoritative-seeming verdicts, intensifying the hypersuasive force of automated speech and displacing the justificatory work on which democratic epistemic agency depends. As a corrective, I propose a Brouwer-inspired assertibility constraint for responsible AI: in high-stakes domains, systems may assert or deny claims only if they can provide a publicly inspectable and contestable certificate of entitlement; otherwise they must return Undetermined. This constraint yields a three-status interface semantics (Asserted, Denied, Undetermined) in which statuses mark entitlement to categorical speech rather than truth values of the underlying world-claim. The semantics cleanly separates internal entitlement from public standing while connecting them via the certificate as a boundary object. It also produces a time-indexed entitlement profile that is stable under numerical refinement yet revisable a

Source ↗
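The three-status interface described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's formalism: the `Certificate` fields, the `respond` function, and the rule that a certificate must name the exact claim are all hypothetical choices made here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ASSERTED = "Asserted"
    DENIED = "Denied"
    UNDETERMINED = "Undetermined"


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for an inspectable, contestable certificate."""
    claim: str
    supports: bool  # True backs the claim, False backs its denial
    evidence: str   # placeholder for whatever the public can inspect


def respond(claim: str, certificate: Optional[Certificate]) -> Status:
    """Return a categorical status only when a matching certificate exists."""
    if certificate is None or certificate.claim != claim:
        # No entitlement to categorical speech, so fall back to Undetermined.
        return Status.UNDETERMINED
    return Status.ASSERTED if certificate.supports else Status.DENIED
```

In this sketch the status marks entitlement to speak categorically, not the truth of the claim: a missing or mismatched certificate always yields `Status.UNDETERMINED`.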
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Modeling User Redemption Behavior in Complex Incentive Digital Environment: An Empirical Study Using Large-Scale Transactional Data

arXiv:2509.14508v2 Announce Type: replace Abstract: The digital economy implements complex incentive systems to retain users through point redemption. Understanding user behavior in such complex incentive structures presents a fundamental challenge, especially in estimating the value of these digital assets against traditional money. This study tackles this question by analyzing large-scale, real-world transaction data from a popular personal finance application that captures both monetary spending and point-based transactions. We find that point usage is linked to demographics. Our analysis using a natural experiment and a causal inference technique reveals that a large point grant stimulated an increase in point spending without a detectable effect on cash expenditure. We then find an association between consumers' shopping styles and their point redemption patterns. This study, on a massive real-world economic ecosystem, examines how consumers behave in multi-currency environments,

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Societal Alignment Frameworks Can Improve LLM Alignment

arXiv:2503.00069v2 Announce Type: replace Abstract: Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts, the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Visualizing "We the People": Bridging the Perception Gap through Pluralistic Data Storytelling

arXiv:2606.24635v1 Announce Type: cross Abstract: Traditional visual data storytelling relies on binary graphics that depict two simplified groups in conflict. This can increase political polarization by oversimplifying intra-group disagreements and erasing ambiguity and shared ideas or values. This can inadvertently foster "us versus them" thinking. Intentional, pluralistic design choices for AI-enabled digital platforms can produce visualizations that emphasize nuance, opinion distribution, and intergroup commonalities. To demonstrate this potential, we examine deliberative technologies that map high-dimensional opinion spaces and highlight areas of both consensus and dissensus. The paper highlights the We the People deliberation conducted by Jigsaw and the Napolitan Institute in September 2025, which engaged over 2,400 Americans across all 435 congressional districts in an AI-supported, asynchronous dialogue regarding freedom and equality. By utilizing AI to synthesize long-form, te

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs

arXiv:2606.24370v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly integrated into decision-support roles in business and policy contexts. While prior benchmark studies have primarily evaluated LLMs' causal reasoning capabilities, a more fundamental epistemic dimension has been overlooked: Causal Caution, defined as the propensity to refrain from causal judgment when empirical evidence is insufficient. This study examines the systematic suppression of Causal Caution that occurs when LLMs shift from academic to practical advisory contexts. Using an evaluation rubric inspired by Pearl's Causal Hierarchy (the PCH score), we conducted experiments on four high-performance LLMs -- Claude Sonnet 4.6, Claude Opus 4.7, GPT 5.5, and Gemini 3.1 Pro -- across 480 trials. Causal Caution maintenance rates were 91.7--100.0% in academic contexts but dropped to 6.7--18.3% in practical advisory contexts (Fisher's exact test, p < .001 across all models). Furthermore, when res

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

When Surveys Become Conversations: Adaptive Matrix Validation for AI-Assisted Interviews

arXiv:2606.24244v1 Announce Type: cross Abstract: AI-assisted interviews promise to reduce respondent burden in surveys by allowing respondents to describe experiences naturally while an AI system noisily maps those accounts into structured survey variables. That mapping is a measurement process that is fallible, versioned, adaptive, and potentially behaves differently across subgroups. This paper proposes Adaptive Matrix Validation (AMV), a design in which each respondent completes an AI-assisted interview, which is then mapped into tabular data by the AI. Respondents are also asked a small, randomized set of structured questions, which are used for statistical adjustment. The estimator first calibrates the mapped values using validation answers from other respondents, then corrects the remaining error with the validation answers observed for the target respondent. The paper develops estimators for item means, subgroup estimates, and regression coefficients when outcomes, predictors,

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Is Higher Team Gender Diversity Correlated with Better Scientific Impact?

arXiv:2606.24098v1 Announce Type: cross Abstract: Collaborative research involving scholars of various genders constitutes a prominent theme in scientific research that has garnered substantial attention. While several studies have investigated the connection between gender-specific collaboration patterns and the scientific impact of papers, the specific gender diversity factors that contribute to enhanced scientific impact remain largely unexplored. In this study, we analyze the correlation between gender diversity and the scientific impact of papers using the examples of Natural Language Processing (NLP) and Library and Information Science (LIS) domains. Our findings reveal three key observations: First, significant gender disparities exist in both NLP and LIS domains, with underrepresentation of female scholars. The gender disparity is more pronounced in the NLP domain compared to the LIS domain. Second, based on papers from the NLP and LIS domains, we find that papers with different

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Context-Aware Prediction of Student Quiz Performance with Multimodal Textbook Features

arXiv:2606.24770v1 Announce Type: new Abstract: Educational platforms often predict student performance from prior interactions, but the assessment content itself also varies in linguistic and visual complexity. This paper studies whether lightweight content features extracted from CourseKata chapter-review questions improve prediction of end-of-chapter quiz scores beyond a student's average prior exercise performance. The study combines 2023 CourseKata student response data with chapter-level text features from review-question wording and image features from textbook visuals. Across 4,742 student-chapter observations from 562 class-student IDs, adding content features improves student-grouped five-fold quiz prediction performance by 9.1% relative to a prior-performance baseline. In leave-chapter-out validation, text features reduce prediction error relative to the baseline, while image-containing models have higher error. This paper suggests that a context-aware model adds useful sign

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Inside Crypter-as-a-Service: An Ecosystem Analysis of the exploit.in Underground Forum Research Talks

arXiv:2606.24226v1 Announce Type: new Abstract: Crypter-as-a-Service (CraaS) has become a key enabling layer of the contemporary malware economy by providing on-demand evasion capabilities through underground service markets. In this paper, we present a longitudinal characterization of the CraaS ecosystem on exploit.in, a major Russian-language cybercrime forum with a presence on both the clear web and the dark web. From a collection of approximately 1,000,000 posts, we combine keyword filtering, LLM-assisted annotation, and manual validation to extract a corpus of 491 threads and 2,949 posts spanning January 2020 to August 2025. Our analysis shows that crypters on exploit.in are not merely sold as static tools, but as continuously maintained operational services whose value depends on recurring stub renewal - sometimes on a daily basis - sustained antivirus evasion, and trust-based delivery. We develop a taxonomy of five seller types and four buyer profiles, and map the buyer-seller c

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

World Artificial Intelligence Cooperation Organization (WAICO): Mapping an Emerging Institution in the Global AI Governance Regime Complex

arXiv:2606.23860v1 Announce Type: new Abstract: Who sets the rules for artificial intelligence, and on what terms, has become a defining question of global governance. For several years that contest ran through principles and ethics codes; it now runs through institutions. China's proposed World Artificial Intelligence Cooperation Organization (WAICO) is the most consequential recent entrant and the least examined. We place WAICO within the emerging regime complex for AI and argue that its importance lies not in any single commitment but in the position it is designed to hold. Coding a cross-section of fifteen international AI governance instruments and institutions on how they admit members, how they are organized, and what they prioritize, we find that WAICO's proposed design joins three features that no constituted multilateral body currently combines: membership open to any sovereign state, no values or regime-type test for entry, and an agenda built around development and the glob

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.CY

Legal Reasoning Is Not Lawyering: Rethinking Legal Benchmarks for Pro Se Access to Justice

arXiv:2606.23716v1 Announce Type: new Abstract: Legal AI benchmark research frequently invokes the assumption that large language models can improve access to justice, including for people who cannot access lawyers in order to understand and exercise their legal rights. We argue that current benchmarks are not equipped to support this assumption because they evaluate legal reasoning over inputs that have already been preprocessed by legal experts, which measures the upper bound of model performance. Access to justice depends on a lower bound: how models perform when inputs come from pro se litigants, whose prompts may contain noisy narratives, buried facts, omissions, folk-legal assumptions, and surface-level errors. These degradations are comparable to conditions under which LLMs are known to degrade in the general machine learning literature, including long-context sensitivity, underspecification, hallucination, and typographical perturbations. We connect evidence from pro se literat

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

arXiv:2606.18936v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce SciRisk-Bench, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents

arXiv:2606.08998v2 Announce Type: replace-cross Abstract: Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. At the core of many current agents is a foundation model, a large pretrained model adaptable to many downstream tasks, embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then propagate upward into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live

Source ↗
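The token-generation step described in the abstract above (scores, then probabilities, then a pseudo-random draw) can be sketched as follows. This is a minimal illustration, not the paper's code: the names `softmax` and `sample_token`, the toy vocabulary and scores, and the `temperature` parameter are all assumptions introduced here.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Convert raw next-token scores into probabilities."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, scores, rng, temperature=1.0):
    """Draw one token using the supplied pseudo-random number generator."""
    probs = softmax(scores, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical toy vocabulary and scores, for illustration only.
tokens = ["search", "edit", "answer"]
scores = [2.0, 1.5, 0.5]

# Two generators seeded identically reproduce the same draw;
# a differently seeded generator may select a different token.
first = sample_token(tokens, scores, random.Random(0))
repeat = sample_token(tokens, scores, random.Random(0))
```

Fixing the seed removes only this one intrinsic source of variability; the extrinsic sources the abstract mentions, such as changing environments, are untouched by it.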
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Governing Technical Debt in Agentic AI Systems

arXiv:2605.29129v2 Announce Type: replace-cross Abstract: Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are not fully captured by traditional software or predictive ML technical debt. We define Agentic Technical Debt as the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed. We define Stochastic Tax as the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds. The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows. We outline how managers can make both visible through

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Visual Matters: Connecting Aesthetic Appeal and Production Quality of Photos, Infographics and Data Visualizations to Credibility of Social Media Posts

arXiv:2605.26309v3 Announce Type: replace-cross Abstract: The rapid proliferation of visual content raises fundamental questions about how different visual formats and features shape perceived credibility. Drawing on processing fluency theory, this research examines how visuals shape credibility judgments. We focus on three popular formats (photos, infographics, and data visualizations), comparing them to text-only posts, and test how two visual features, aesthetic appeal and production quality, influence credibility through processing fluency as a mediating mechanism. Through a preregistered experiment with 1200 US participants, we found that visual posts are generally perceived as more credible than text-only posts but this credibility advantage only applies to photos and infographics, not to data visualizations. Aesthetic appeal increases perceived credibility, partially mediated by processing fluency, while production quality had no significant effect on credibility across formats. Th

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse

arXiv:2601.13317v2 Announce Type: replace-cross Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted, institutionally produced messaging, while public social media reflects largely organic, user-driven discussion. We present a comparative analysis of climate discourse across paid advertisements on Meta (previously Facebook) and public posts on Bluesky from July 2024 to September 2025. To support it, we develop an interpretable thematic discovery pipeline that clusters texts by semantic similarity and uses large language models (LLMs) to label clusters with concise, human-interpretable themes, requiring no predefined topic inventory or seed set. Using these themes, we find the two environments diverge systematically: paid advertising centers on strategic promotion of specific solutions in a formal, forward-looking regist

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

When Networks Substitute for Outcome Surveillance? A Substitution-Complementarity Framework for Behavioral Signals in Predictive Monitoring

arXiv:2510.20025v2 Announce Type: replace-cross Abstract: Monitoring systems increasingly fuse dynamic behavioral data with outcome-based surveillance, raising a basic question: when does behavioral data carry predictive information that outcome history lacks? We study this using epidemic forecasting on mobility networks, asking whether mobility networks provide independent predictive signal beyond local outcome-based surveillance. We formalize this as a substitution-complementarity problem over directed, weighted mobility networks. Using a Frisch-Waugh-Lovell variance decomposition, our analytical framework derives domain-agnostic conditions under which network-topology features retain incremental explanatory power beyond autoregressive outcome histories. We instantiate the framework using town-level COVID-19 forecasting in Massachusetts (April 2020-April 2021), constructing mobility networks among 300+ towns from smartphone-derived origin-destination aggregates to extract centrality

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Edge interventions can mitigate demographic and prestige disparities in the Computer Science coauthorship network

arXiv:2506.04435v2 Announce Type: replace-cross Abstract: Social factors such as demographic traits and institutional prestige structure the creation and dissemination of ideas in academic publishing. One place these effects can be observed is in how central or peripheral a researcher is in the coauthorship network. Here we investigate inequities in network centrality in a hand-collected data set of 5,670 U.S.-based faculty employed in Ph.D.-granting Computer Science departments and their DBLP coauthorship connections. We introduce algorithms for combining name- and perception-based demographic labels by maximizing alignment with self-reported demographics from a survey of faculty from our census. We find that women and individuals with minoritized race identities are less central in the computer science coauthorship network, implying worse access to and ability to spread information. Centrality is also highly correlated with prestige, such that faculty in top-ranked departments are at

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement

arXiv:2605.16237v3 Announce Type: replace Abstract: Clearly-defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

A Marketplace for AI-Generated Adult Content and Deepfakes

arXiv:2601.09117v3 Announce Type: replace Abstract: Generative AI systems increasingly enable the production of highly realistic synthetic media. Civitai, a popular community-driven platform for AI-generated content, operates a monetized feature called Bounties, which allows users to commission the generation of content in exchange for payment. To examine how this mechanism is used and what content it incentivizes, we conduct a longitudinal analysis of all publicly available bounty requests collected over a 14-month period following the platform's launch. We find that the bounty marketplace is dominated by tools that let users steer AI models toward content they were not trained to generate. At the same time, requests for content that is "Not Safe For Work" are widespread and have increased steadily over time, now comprising a majority of all bounties. Participation in bounty creation is uneven, with 20% of requesters accounting for roughly half of requests. Requests for "deepfake" - m

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

How Large Language Models Source Brand Reputation Across Languages and Markets

arXiv:2606.25787v1 Announce Type: cross Abstract: When a large language model (LLM) answers a question about a company, it grounds the answer in retrieved web sources, and those sources decide what the model says. Most analysis of AI brand visibility looks at the answer text. This study looks one step earlier, at the citations. We merge three Rankfor.AI datasets covering 128 brands across 12 home markets and 13 languages, and analyse 167,551 URL-grounded citations (189,974 total attribution rows). We classify each citation by domain and source type and measure where AI gets its brand information, by language and by market. Four patterns hold. First, AI grounds brand answers overwhelmingly in third-party sources: 85.7% of citations point to sites the brand does not own, against 14.3% owned. Second, the source base is concentrated and long-tailed: 80% of citations come from about 18% of domains, fitting a Zipf law (alpha = 0.86, R^2 = 0.983). Third, one reference site dominates almost ev

Source ↗
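
The concentration figure in the abstract above (80% of citations coming from about 18% of domains) can be illustrated with a minimal sketch. The function name and the toy counts below are hypothetical, not taken from the study or its datasets; the sketch only shows how such a share could be computed from per-domain citation counts.

```python
def domain_share_for_citation_share(counts, target=0.80):
    """Fraction of domains (largest first) needed to cover `target` of all citations.

    `counts` is a list of citation counts, one per domain (hypothetical input).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    ordered = sorted(counts, reverse=True)  # most-cited domains first
    total = sum(ordered)
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        if running >= target * total:
            return i / len(ordered)
    return 1.0


# Toy data (assumption): 10 domains, 100 citations in total.
toy_counts = [55, 30, 4, 3, 2, 2, 1, 1, 1, 1]
share = domain_share_for_citation_share(toy_counts)
print(f"{share:.0%} of domains cover 80% of citations")
```

On real data the same calculation would be run over the full list of cited domains; the study's Zipf fit is a separate analysis and is not reproduced here.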
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Data-Driven Evolution of Library and Information Science Research Methods (1990-2022): A Perspective Based on Fine-grained Method Entities

arXiv:2606.25320v1 Announce Type: cross Abstract: Since the 1990s, advancements in big data and information technology have increasingly driven data-centric research in the field of Library and Information Science (LIS). To assess the influence of this data-driven research paradigm on the LIS discipline, this study conducts a fine-grained analysis to uncover the evolutionary trends of research methods within the domain. Using academic papers from LIS published between 1990 and 2022, four key categories of data-driven method entities are automatically extracted: algorithms and models, data resources, software and tools, and metrics. Based on these entities, the study examines the evolution of LIS research methods from three dimensions: the characteristics of research method entities over time, their evolution within different research topics, and the evolutionary features of research method entities across various research methods. The findings highlight data resources as a pivotal driv

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Fifty Years of Specification Completeness: What Aviation Certification Tells AI Governance About Epoch Limits, Proof Surfaces, and the Structural Gap

arXiv:2606.25120v1 Announce Type: cross Abstract: Aviation software certification has operationalised three structural requirements for governed software systems since 1992: structured governance linkage between governing specifications and operational evidence, context-bounded validity that triggers revalidation when operational context changes, and an objective evidence architecture that defines what proof means and what makes it sufficient. These requirements appear in DO-178C and DO-330 and are enforced through FAA and EASA certification. No existing framework requires these structural properties as intrinsic properties of individual AI governance documents. A system prompt, an AGENTS.md file, a governance policy, or a task envelope can be deployed without satisfying any of the three requirements aviation has enforced for three decades. Aviation is the most technically rigorous instance: its standard-setting bodies have acknowledged that their frameworks break down for AI systems,

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Machine learning is revolutionizing weather forecasting -- the next step is a change in how we work

arXiv:2606.25076v1 Announce Type: cross Abstract: Following the success of machine learning in producing weather predictions with competitive skill compared to complex traditional systems, this article shifts attention from forecast output to the working practices that make prediction systems possible. We argue that machine learning and recent digital technologies will reshape the forecasting value chain: how models are coded and developed, how observations and Earth-system data are exploited, how data and computing are managed, how systems are verified, and how information is created, evaluated and turned into services. We discuss six non-exhaustive areas in which agentic software engineering, open and compressed data, shared verification workflows, interactive computing and generative methods may make modelling, evaluation and service creation faster, more interactive and more widely accessible. These changes will require weather and climate centres to adapt their infrastructures, da

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

LLM Performance on a Real, Double-Marked GCSE Benchmark

arXiv:2606.24973v1 Announce Type: cross Abstract: We introduce a dataset of 32,534 double-marked real student responses to GCSE mock exams (GCSEs are the UK's national exams, taken at age ~16), spanning 328 questions across five subjects and including handwritten work. We test whether off-the-shelf large language models agree with examiners as closely as the two examiners agree with each other. We find that models overwhelmingly agree well with the examiner consensus across subjects, with the top performing models agreeing more closely with examiners than examiners agree with each other. Models achieve high scores for subjective tasks like English essay marking, as well as handling complex and messy handwritten Maths paper scripts. Agreement is uniform near the examiner line, and not massively discriminated by model size, providing cost-effective automated marking solutions.

Source ↗
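
The comparison described above, whether a model agrees with examiners as closely as the two examiners agree with each other, can be sketched as follows. The abstract does not name its agreement metric, so mean absolute mark difference is an assumption here, and all marks are invented toy values.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length lists of marks."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy marks (assumption): five responses, each marked by two examiners and one model.
examiner_1 = [4, 6, 3, 8, 5]
examiner_2 = [5, 6, 2, 7, 5]
model = [4, 6, 3, 7, 5]

# Baseline: how far apart the two human examiners are.
human_gap = mean_abs_diff(examiner_1, examiner_2)

# Model gap: average distance from the model to each examiner in turn.
model_gap = (mean_abs_diff(model, examiner_1) + mean_abs_diff(model, examiner_2)) / 2

print(f"examiner vs examiner: {human_gap:.2f}")
print(f"model vs examiners:   {model_gap:.2f}")
print("model within human range:", model_gap <= human_gap)
```

A smaller model gap than human gap would correspond to the abstract's claim that top models agree with examiners more closely than examiners agree with each other, under this assumed metric.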
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Why Memory Components Fail: Eight Years of License and Sustainability Events in Open-Source Data Infrastructure

arXiv:2606.24896v1 Announce Type: cross Abstract: LLM agent memory is now treated as a first-class architectural component in five major surveys published between January and April 2026. None of these surveys treats project governance, capital structure, or license posture as architectural variables. We argue they are. In a constructed sample of 105 production-relevant open-source data-infrastructure and AI-tooling projects, we catalogue 38 license-and-sustainability events between 2018 and May 2026. About a quarter of the sample (24 percent) experienced at least one adverse event. The conditional rates split sharply by structure: 46 percent for single-vendor venture-backed projects, 2.5 percent for foundation-governed projects funded outside the venture cycle. The headline differential -- roughly nineteen-fold -- is invariant to the most contested coding choice in the catalogue; we show the sensitivity table in Section 7. A small subset of foundation-governed projects with venture-bac

Source ↗
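
The "roughly nineteen-fold" differential quoted above is a ratio of the two conditional rates. A minimal sketch of that arithmetic, using only the rounded percentages given in the abstract (the function name is hypothetical):

```python
def rate_ratio(rate_a_pct, rate_b_pct):
    """Ratio of two event rates expressed in percent."""
    if rate_b_pct <= 0:
        raise ValueError("reference rate must be positive")
    return rate_a_pct / rate_b_pct


# Rounded rates from the abstract: single-vendor venture-backed vs foundation-governed.
ratio = rate_ratio(46.0, 2.5)
print(f"rate ratio from rounded percentages: {ratio:.1f}x")
```

The rounded figures give about 18.4; the paper's unrounded rates may differ slightly, which would be consistent with its "roughly nineteen-fold" wording.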
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Small edits, large models: How Wikipedia advocacy shapes LLM values

arXiv:2606.24890v1 Announce Type: cross Abstract: Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add sourced animal welfare content to relevant articles, have made 125 edits across 115 pages. Using gradient-based data attribution (Bergson; MAGIC), we traced how these edits influence language model behavior. TrackStar retrieval attribution on Llama 3.1 8B found that PAW-edited sections made up 68 percent of the highest-attributed documents for animal welfare queries (p < 0.0001) but only 52 percent for unrelated queries about the same companies (p = 0.53): the model links PAW content specifically to animal welfare topics, not to the entities in general. MAGIC counterfactual influence estimation on Llama-3.2-1B, run across five r

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Bridging Predictions and Interventions: An Integrated Framework for Automated Decision-Systems

arXiv:2606.25668v1 Announce Type: new Abstract: Automated decision systems (ADS) leverage predictions about individual future outcomes to inform consequential decision-making in organizational settings. Across various settings - including criminal pretrial release, clinical triage, student support, and more - it is often assumed that improved predictive accuracy is the priority consideration in determining better downstream outcomes upon the deployment of ADS. In practice, real-world case studies reveal that this is far from the case: introducing individual predictions into decision-making modifies organizational workflows, assessment, and decision-making processes in ways that require a complete re-consideration of our approach to the design, evaluation, and deployment of ADS. As a result, this Perspective develops an integrated framework for studying ADS in social systems, shifting current priorities from a purely prediction-based paradigm towards an intervention-oriented view that a

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

From Causal Discovery to Implementation: An Agentic AI Framework for E-Scooter Mobility Hub Planning Across 29 German Cities

arXiv:2606.25484v1 Announce Type: new Abstract: Existing approaches to e-scooter mobility hub planning lack city-type-specific causal evidence. Demand models are typically correlational, built on proprietary trip data, and do not distinguish how driver profiles vary across urban typologies. This paper presents a three-phase agentic AI framework that constructs a Causal Template Library from public GBFS data across 29 German cities, encoding which environmental features causally drive hotspot demand for each combination of city type (large, university, industrial, hilly) and cluster type (core, peripheral). A large language model (LLM) orchestrated causal discovery pipeline adapts algorithm selection to local data conditions across 57 city-cluster units. The library reveals systematic variation. Core demand is driven by activity access and transit proximity, while peripheral demand responds to built form, with city-type-specific patterns supporting transferable siting templates. A plann

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.CY

Cross-Subject Predictive Validity for Learning Outcomes of Delayed Start Behavior

arXiv:2606.25308v1 Announce Type: new Abstract: Behavioral detectors provide valuable insights into learner motivation and self-regulation. Among these, delayed start, a new session-level detector, has shown great promise as a valid behavioral measure that generalizes well across systems. In this paper, we examine cross-subject predictive validity of delayed start behavior. Using iReady data from 711 grade 7 students, we find delayed starts during Math practice are predictive of standardized test performance in both Math ($\beta$=.07 SD, p=.02) and English ($\beta$=.10 SD, p<.001). Additionally, using mixture modeling and sensitivity analyses, we use a data-driven strategy to operationalize the identification of delayed starters in practice. We identify two underlying sub-groups of interest: "early starters" (<5 minute average delay, 20% of students) and "chronic delayers" (>13 minutes average delay, 20% of students). Relative to students in neither sub-group, early starters experienc

Source ↗
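
The two sub-group thresholds reported above (under 5 minutes and over 13 minutes of average delay) can be applied as a simple rule. This is a minimal sketch: the function name is hypothetical, and it applies the published cut-offs directly rather than reproducing the paper's mixture modeling.

```python
def classify_starter(avg_delay_minutes):
    """Label a student by average start delay, using the cut-offs quoted in the abstract."""
    if avg_delay_minutes < 0:
        raise ValueError("delay cannot be negative")
    if avg_delay_minutes < 5:
        return "early starter"
    if avg_delay_minutes > 13:
        return "chronic delayer"
    return "neither"


# Toy average delays in minutes (assumption, not study data).
for delay in (3.0, 8.0, 20.0):
    print(delay, "->", classify_starter(delay))
```

Students at exactly 5 or 13 minutes fall into "neither" here, because the abstract states the thresholds as strict inequalities.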
Showing 1–39 of 39 signals