Kantian Fallibilist Ethics for AI alignment

Chaly, Vadim

doi:10.22034/jpiut.2024.62766.3837

Article type: Research article

Author

Vadim Chaly

Lomonosov Moscow State University, Immanuel Kant Baltic Federal University, Russia

Abstract

The problem of AI alignment has parallels in Kantian ethics and can benefit from its concepts and arguments. The Kantian framework helps to answer three questions: what exactly AI is being aligned to, what problems attend the alignment of rational agents in general, and what the prospects are for achieving a state of alignment. Having described the state of the discussion of alignment in AI, I reformulate it in Kantian terms: the process of alignment is captured by the concept of enlightenment, and the final state of alignment corresponds to what Kant calls the “kingdom of ends.” I argue that the discourse of alignment and the Kantian ethical program (1) are devoted to the same general end of harmonizing the thinking and acting of rational agents; (2) encounter similar difficulties, which are well known in Kantian scholarship with its comparatively longer history; and (3) for a number of reasons lying on the side of humanity, do not have and, despite the hopes and attitudes of some participants in the AI discussions, will not have a theoretically rigorous, harmonious, practically implementable, and conflict-free solution. Alignment will remain a regulative idea in the Kantian sense, but will not become a reality.

Subjects

Philosophy

Keywords

AI alignment
moral deliberation
moral fallibilism
specification gaming
kingdom of ends
categorical imperative
misgeneralization

Volume 18, Issue 47 (Serial Number 47)
Special Issue: Kant's Philosophy in the 21st Century (Summer 1403)
1403
Pages 303-318
