Journal of Philosophical Investigations

Article type: Research article

Author

Lomonosov Moscow State University, Immanuel Kant Baltic Federal University, Russia

Abstract

The problem of AI alignment has parallels in Kantian ethics and can benefit from its concepts and arguments. The Kantian framework helps us to better answer three questions: what exactly AI is being aligned to, what problems the alignment of rational agents faces in general, and what the prospects are for achieving a state of alignment. Having described the state of the discussions about AI alignment, I will reformulate them in Kantian terms: the process of alignment is captured by the concept of enlightenment, and the final state of alignment by the concept of the “kingdom of ends.” I will argue that the discourse of alignment and the Kantian ethical program 1) are devoted to the same general end of harmonizing the thinking and acting of rational agents, 2) encounter similar difficulties, well known from the Kantian discussions with their comparatively longer history, and 3) for a number of reasons lying on the side of humanity, do not have and, despite the hopes and attitudes of some participants in the AI discussions, will not have a theoretically rigorous, harmonious, practically implementable, and conflict-free solution. Alignment will remain a regulative idea in the Kantian sense, but will not become a reality.

Article title

Kantian Fallibilist Ethics for AI Alignment

Author

  • Vadim Chaly

Lomonosov Moscow State University, Immanuel Kant Baltic Federal University, Russia

Keywords

  • AI alignment
  • moral deliberation
  • moral fallibilism
  • specification gaming
  • kingdom of ends
  • categorical imperative
  • misgeneralization