Arai Lab. 2006

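About the Arai Laboratory

Built on the following three keywords, the laboratory centers its research on speech communication and works across a broad range, from human sciences such as speech science, speech perception, and auditory science to acoustics, speech engineering, and signal processing:

・Interdisciplinary: valuing connections not only with engineering but also with a wide range of disciplines, including mathematics, physics, medicine, pathology, psychology, and linguistics
・International: research at an international level, together with researchers from around the world
・For people: keeping in mind research that leads to assistive (welfare) engineering

[Current research themes]

The research themes are listed below, and each is outlined later on this page: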

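1) Research on the temporal properties of speech

2) Preventing the degradation of intelligibility when speech is amplified in halls and other rooms

3) Speech processing for hearing-impaired listeners, aimed at hearing-aid applications

4) Support for people with disabilities using speech technology such as speech synthesis

5) Visualization of the vocal tract using computer models and physical vocal-tract models

6) Automatic detection of speech intervals for adding captions to digital video

7) Research on real-time speech processing using DSPs

8) Clarifying the roles of the left and right cerebral hemispheres in speech perception

9) Research on speaker individuality in speech

10) Other topics: phonetics (including acoustic phonetics), phonology, and related areas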

・If you have any questions, please contact us by e-mail.

Arai Laboratory home page

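[Outline of the research themes]

1) Research on the temporal properties of speech

It has become common to see machines that act on spoken input (car navigation systems, for example) and computers that convert spoken input into on-screen text (dictation software, for example). In such automatic speech recognition, the recognition rate is not necessarily 100%. Possible reasons include differences between speakers (gender, dialect, accent, and so on), variation within the same speaker (speaking style and emotion, for example), ambient noise (background noise, surrounding voices, reverberation, and so on), and various other factors.

Humans, however, can understand speech even in noisy or reverberant environments. In contrast, when a machine recognizes speech automatically, the recognition rate drops markedly under noise and reverberation. Current technology is still far from achieving machine speech recognition as reliable as human hearing. For this reason, in automatic speech recognition and many other speech technologies, much effort is devoted to understanding the mechanisms of human hearing and imitating them. Among these efforts, one technique that has recently attracted particular attention is the "modulation spectrum representation of speech."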

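As shown in the upper panel of the figure, a speech signal is represented as a time waveform with time on the horizontal axis and amplitude on the vertical axis. Representing it as a spectrogram (horizontal axis: time, vertical axis: frequency), as in the lower panel, makes its features clearer.

The spectrogram of a speech signal shows that its information is distributed in a complex way over the time-frequency plane. Even so, regularities inherent to speech can be found in it. For example, slice the spectrogram along the time axis (the horizontal axis). If the slice is taken at a frequency of 1 kHz, its cross-section shows how the power at 1 kHz changes over time. This temporal change has a certain periodicity, and the strongest component of that periodicity lies around 4 to 5 Hz. This component can be estimated by applying spectral analysis to the temporal change of amplitude or power in a given band. This "spectrum of the temporal change of amplitude or power" is called the "modulation spectrum," and its horizontal axis is called the "modulation frequency." In the modulation spectrum, the peak appears at 4 to 5 Hz. This frequency is thought to reflect the movement of the articulators (jaw, tongue, and so on), and, interestingly, it corresponds to the range of modulation frequencies to which human hearing is sensitive.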
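The procedure described above (take one frequency band, follow its power over time, then analyze the spectrum of that trajectory) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the laboratory's analysis code: the function name `modulation_spectrum`, the 200 Hz bandwidth, the 100 Hz envelope sampling rate, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np

def modulation_spectrum(x, fs, f_center=1000.0, bandwidth=200.0, env_fs=100):
    """Return (modulation frequencies in Hz, magnitude) for one band of x.

    All parameter defaults are illustrative assumptions.
    """
    # 1) Isolate the band around f_center with an FFT-domain band-pass mask.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = np.abs(freqs - f_center) <= bandwidth / 2
    band = np.fft.irfft(spectrum * mask, n=len(x))

    # 2) Power envelope: square, then average over non-overlapping blocks,
    #    which also lowers the sampling rate to env_fs.
    hop = int(fs // env_fs)
    n_blocks = len(band) // hop
    power = (band[: n_blocks * hop] ** 2).reshape(n_blocks, hop).mean(axis=1)

    # 3) The spectrum of the mean-removed envelope is the modulation spectrum.
    power = power - power.mean()
    mod_mag = np.abs(np.fft.rfft(power))
    mod_freqs = np.fft.rfftfreq(n_blocks, d=1.0 / env_fs)
    return mod_freqs, mod_mag

# Synthetic check: a 1 kHz tone whose amplitude is modulated at 4 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
mod_freqs, mod_mag = modulation_spectrum(x, fs)
peak_hz = mod_freqs[np.argmax(mod_mag)]
print(f"modulation peak near {peak_hz:.1f} Hz")
```

For real speech, `x` would be a recorded waveform, and the analysis would normally be repeated over several bands and longer recordings before the peak location is interpreted.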

References:

: T. Arai, H. Hermansky, M. Pavel and C. Avendano, ``Intelligibility of speech with filtered time trajectories of spectral envelopes,'' Proc. of the International Conf. on Spoken Language Processing (ICSLP), Vol. 4, pp. 2490-2493, Philadelphia, 1996.
: T. Arai and S. Greenberg, ``The temporal properties of spoken Japanese are similar to those of English,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 2, pp. 1011-1014, Rhodes, 1997.
: T. Arai, M. Pavel, H. Hermansky and C. Avendano, ``Syllable intelligibility for temporally filtered LPC cepstral trajectories,'' Journal of the Acoustical Society of America, Vol. 105, No. 5, pp. 2783-2791, 1999.
: D. Behne, T. Arai, P. Czigler and K. Sullivan, ``Vowel duration and spectra as perceptual cues to vowel quantity: A comparison of Japanese and Swedish,'' Proc. of the International Congress of Phonetic Sciences (ICPhS), Vol. 2, pp. 857-860, San Francisco, 1999.
: T. Arai and N. Warner, ``Word level timing in spontaneous Japanese speech,'' Proc. of the International Congress of Phonetic Sciences (ICPhS), Vol. 2, pp. 1055-1058, San Francisco, 1999.
: N. Warner and T. Arai, ``Japanese Mora-Timing: A Review,'' Phonetica, Vol. 58, pp. 1-25, 2001.
: N. Warner and T. Arai, ``The role of the mora in the timing of spontaneous Japanese speech,'' Journal of the Acoustical Society of America, Vol. 109, No. 3, pp. 1144-1156, 2001.
: S. Greenberg and T. Arai, ``The relation between speech intelligibility and the complex modulation spectrum,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 1, pp. 473-476, Aalborg, 2001.
: M. Komatsu and T. Arai, ``Acoustic realization of prosodic types: Constructing average syllables,'' Meeting Handbook of the Linguistic Association of Canada and the United States (LACUS) Forum, p. 39, 2002.
: M. Komatsu and T. Arai, ``Acoustic realization of prosodic types: Constructing average syllables,'' LACUS Forum XXIX: Linguistics and the Real World, edited by D. W. Coleman, W. J. Sullivan and A. Lommel, pp. 259-269, Houston, 2003.
: S. Greenberg and T. Arai, ``What are the essential cues for understanding spoken language?,'' International Workshop on Speech Dynamics by Ear, Eye, Mouth and Machine, Technical Report of IEICE Japan, Vol. SP2003-48, pp. 27-36, 2003.
: S. Greenberg and T. Arai, ``What are the essential cues for understanding spoken language?,'' IEICE Trans. on Information and Systems, Vol. E87-D, No. 5, pp. 1059-1070, 2004.
: T. Arai, ``Degradation of speech intelligibility in time-reversed reverberation,'' Trans. Tech. Comm. Psychol. Physiol. Acoust., The Acoustical Society of Japan, Vol. 35, No. 4, H-2005-41, pp. 237-242, 2005 (in Japanese).
: R. Drullman, J. M. Festen, and R. Plomp, ``Effect of Temporal Envelope Smearing on Speech Reception,'' J. Acoustic. Soc. Amer., Vol. 95, pp. 1053-1064, 1994.
: R. Drullman, J. M. Festen, and R. Plomp, ``Effect of Reducing Slow Temporal Modulations on Speech Reception,'' J. Acoustic. Soc. Amer., Vol. 95, pp. 2670-2680, 1994.
: T. Houtgast, H. J. M. Steeneken, ``A Review of the MTF Concept in Room Acoustics and its Use for Estimating Speech Intelligibility in Auditoria,'' J. Acoustic. Soc. Amer., Vol. 77, pp. 1069-1077, 1985.
: S. Greenberg, ``Understanding speech understanding: towards a unified theory of speech perception,'' Proceedings of the ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception, W.A. Ainsworth and S. Greenberg (eds.), Keele University, UK, pp. 1-8, 1996.

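2) Preventing the degradation of intelligibility when speech is amplified in halls and other rooms

Speech becomes harder to understand in reverberant rooms. It is known that this loss of speech intelligibility can be expressed objectively by measuring the room's modulation transfer function (MTF). Measured MTFs show that reverberation generally has a low-pass characteristic. One cause of the loss of intelligibility is overlap-masking, in which the reverberant tail of a preceding sound masks the sound that follows. That is, when the preceding sound is a high-energy phoneme such as a vowel, the following phoneme is strongly affected by the reverberated preceding sound.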
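The low-pass behavior of reverberation can be illustrated with the closed-form MTF commonly used for an idealized room, in which the reverberant energy decays exponentially and there is no background noise: m(F) = 1 / sqrt(1 + (2πF·T/13.8)²), where F is the modulation frequency and T the reverberation time. The sketch below simply evaluates that expression. The function name and the example reverberation times are assumptions for illustration, and a measured MTF of a real room will deviate from this idealization.

```python
import math

def mtf_exponential_reverb(mod_freq_hz, rt60_s):
    """MTF of an ideal exponentially decaying reverberant field (no noise)."""
    # 13.8 ~= 6 * ln(10): the energy decays by 60 dB over rt60_s seconds.
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)

# Compare a short and a long (assumed) reverberation time.
for rt60 in (0.5, 2.0):
    row = ", ".join(
        f"{f:>4.1f} Hz: {mtf_exponential_reverb(f, rt60):.2f}"
        for f in (1.0, 4.0, 16.0)
    )
    print(f"RT60 = {rt60} s -> {row}")
```

The value falls as either the modulation frequency or the reverberation time grows, which is the low-pass characteristic described above: the faster modulations of speech are attenuated more strongly in more reverberant rooms.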

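In large spaces such as lecture halls, churches, multipurpose halls, station concourses, and tunnels, speech is amplified electrically, and the speech radiated from the loudspeakers picks up long reverberation. If the speech could be processed to be robust against reverberation before it is radiated from the loudspeakers, the loss of intelligibility caused by reverberation might be prevented.

This research aims to develop pre-processing algorithms that make speech robust against reverberation. To this end, the Arai Laboratory has proposed methods based on modulation filtering and methods based on "steady-state suppression," which aims to reduce overlap-masking by suppressing steady-state portions of speech such as vowels, and has demonstrated experimentally that speech intelligibility can be improved through pre-processing. Progress in this area is expected to help with "hearing" problems faced by elderly people, hearing-impaired people, and non-native listeners, and to contribute to the design of higher-performance hearing aids.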
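The idea of suppressing steady-state portions can be sketched as follows. This is a simplified illustration and not the laboratory's published algorithm: the spectral-change measure (RMS difference of log spectra between neighboring frames), the 20 ms frame length, the threshold of 0.5, and the gain of 0.4 are all assumptions chosen for this example, and the function name `suppress_steady_state` is hypothetical.

```python
import numpy as np

def suppress_steady_state(x, fs, frame_ms=20.0, change_threshold=0.5, gain=0.4):
    """Attenuate frames whose log spectrum changes little between neighbours.

    Needs at least three frames; all defaults are illustrative assumptions.
    """
    n = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)

    # Log-magnitude spectrum per frame (Hann window, small floor avoids log(0)).
    log_spec = np.log(np.abs(np.fft.rfft(frames * np.hanning(n), axis=1)) + 1e-6)

    # Spectral change: RMS difference between the previous and the next frame.
    change = np.zeros(n_frames)
    change[1:-1] = np.sqrt(np.mean((log_spec[2:] - log_spec[:-2]) ** 2, axis=1))
    change[0], change[-1] = change[1], change[-2]

    # Steady-state frames are scaled by `gain`; transitions pass unchanged.
    # (A practical version would smooth the gain to avoid audible clicks.)
    gains = np.where(change < change_threshold, gain, 1.0)
    y = x.astype(float).copy()
    y[: n_frames * n] = (frames * gains[:, None]).ravel()
    return y, gains

# Synthetic check: a 500 Hz tone followed by a 1500 Hz tone (0.2 s each).
fs = 8000
t = np.arange(3200) / fs
x = np.concatenate([np.sin(2 * np.pi * 500 * t[:1600]),
                    np.sin(2 * np.pi * 1500 * t[1600:])])
y, gains = suppress_steady_state(x, fs)
print("frames kept at full level:", int(np.sum(gains == 1.0)))
```

In this toy signal only the frames around the change of tone are left at full level, while the stationary stretches are attenuated. That mirrors the intent of reducing the energy of vowel-like portions whose reverberant tails would otherwise mask the sounds that follow.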

References:

: A. Kusumoto, T. Arai, T. Kitamura, M. Takahashi and Y. Murahara, ``Speech processing on the room acoustics for the hearing-impaired,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 389-390, 1999 (in Japanese).
: A. Kusumoto, T. Arai, T. Kitamura, M. Takahashi and Y. Murahara, ``Modulation enhancement of speech as a preprocessing for reverberant chambers with the hearing-impaired,'' Proc. of the IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. 853-856, Istanbul, 2000.
: T. Kitamura, K. Kinoshita, T. Arai, A. Kusumoto and Y. Murahara, ``Designing modulation filters for improving speech intelligibility in reverberant environments,'' Proc. of the International Conf. on Spoken Language Processing (ICSLP), Vol. 3, pp. 586-589, Beijing, 2000.
: T. Kitamura, T. Arai, A. Kusumoto and Y. Murahara, ``Modulation filtering in a robust speech processing against reverberation for the hearing-impaired,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 333-334, 2000 (in Japanese).
: T. Arai, K. Kinoshita, N. Hodoshima, A. Kusumoto and T. Kitamura, ``Effects of suppressing steady-state portions of speech on intelligibility in reverberant environments,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 449-450, 2001 (in Japanese).
: T. Arai, K. Kinoshita, N. Hodoshima, A. Kusumoto and T. Kitamura, ``Effects on suppressing steady-state portions of speech on intelligibility in reverberant environments,'' Acoustical Science and Technology, Vol. 23, No. 4, pp. 229-232, 2002.
: N. Hodoshima, T. Arai and A. Kusumoto, ``Enhancing temporal dynamics of speech to improve intelligibility in reverberant environments,'' Proc. of the Forum Acusticum Sevilla, 2002.
: N. Hodoshima, T. Inoue, T. Arai and A. Kusumoto, ``Suppressing steady-state portions of speech for improving intelligibility in various reverberant environments,'' Proc. of China-Japan Joint Conference on Acoustics, pp. 199-202, Nanjing, 2002.
: N. Hodoshima, T. Inoue, T. Arai, K. Kinoshita and A. Kusumoto, ``Suppressing steady-state portions of speech for improving intelligibility as pre-processing: Under various reverberant environments,'' Technical Report of IEICE Japan, Vol. SP2002-65, pp. 47-51, 2002 (in Japanese).
: T. Inoue, N. Hodoshima, T. Arai, K. Kinoshita and A. Kusumoto, ``Improvement of speech intelligibility under various reverberant environments by the steady-state suppression,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 377-378, 2002 (in Japanese).
: N. Hodoshima, T. Arai, T. Inoue, K. Kinoshita and A. Kusumoto, ``Improving speech intelligibility by steady-state suppression as pre-processing in small to medium sized halls,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), pp. 1365-1368, Geneva, 2003.
: T. Arai, N. Hodoshima, T. Goto, T. Inoue, N. Ohata, K. Kinoshita, T. Kitamura and A. Kusumoto, ``Pre-processing technique to prevent degradation of speech intelligibility in reverberant environments,'' Trans. Tech. Comm. Psychol. Physiol. Acoust., The Acoustical Society of Japan, Vol. 33, No. 5, H-2003-59, pp. 341-346, 2003 (in Japanese).
: N. Hodoshima, T. Arai, T. Inoue, K. Kinoshita and A. Kusumoto, ``Improving intelligibility of speech by steady-state suppression as pre-processing in small to medium sized halls,'' International Workshop on Speech Dynamics by Ear, Eye, Mouth and Machine, Technical Report of IEICE Japan, Vol. SP2003-53, pp. 61-66, 2003.
: N. Hodoshima, T. Arai, T. Inoue, K. Kinoshita and A. Kusumoto, ``Improving intelligibility of speech through PA systems by steady-state suppression in small to medium sized halls,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 2, pp. 1073-1074, 2003 (in Japanese).
: T. Goto, T. Inoue, N. Ohata, N. Hodoshima and T. Arai, ``The effect of pre-processing for improving speech intelligibility in the Sophia University lecture hall,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 613-614, 2003 (in Japanese, Poster Award).
: N. Hodoshima, T. Inoue, T. Arai, A. Kusumoto and K. Kinoshita, ``Suppressing steady-state portions of speech for improving intelligibility in various reverberant environments,'' Acoustical Science and Technology, Vol. 25, No. 1, pp. 58-60, 2004.
: N. Hodoshima, T. Goto, N. Ohata, T. Inoue and T. Arai, ``The effect of pre-processing for improving speech intelligibility in the Sophia University lecture hall,'' Proc. of the International Congress on Acoustics, Vol. III, pp. 2389-2392, Kyoto, 2004.
: N. Hodoshima, T. Goto, N. Ohata, T. Inoue and T. Arai, ``The effect of pre-processing for improving speech intelligibility in the Sophia University lecture hall,'' Proc. of the International Symposium on Room Acoustics: Design and Science, Hyogo, 2004.
: A. Kusumoto, T. Arai, K. Kinoshita, N. Hodoshima and N. Vaughan, ``Modulation enhancement of speech by a pre-processing algorithm for improving intelligibility in reverberant environments,'' Speech Communication, Vol. 45, No. 2, pp. 101-113, 2005.
: N. Hodoshima, T. Goto, N. Ohata, T. Inoue and T. Arai, ``The effect of pre-processing approach for improving speech intelligibility in a hall: Comparison between diotic and dichotic listening conditions,'' Acoustical Science and Technology, Vol. 26, No. 2, pp. 212-214, 2005.
: T. Arai, ``Padding zero into steady-state portions of speech as a preprocess for improving intelligibility in reverberant environments,'' Acoustical Science and Technology, Vol. 25, No. 5, pp. 459-461, 2005.
: N. Hayashi, T. Arai, N. Hodoshima, Y. Miyauchi and K. Kurisu, ``Steady-state pre-processing for improving speech intelligibility in reverberant environments: Evaluation in a hall with an electrical reverberator,'' Proc. of the Interspeech, pp. 1741-1744, Lisbon, 2005.
: Y. Miyauchi, N. Hodoshima, K. Yasu, N. Hayashi, T. Arai and M. Shindo, ``A preprocessing technique for improving speech intelligibility in reverberant environments: The effect of steady-state suppression on elderly people,'' Proc. of the Interspeech, Lisbon, pp. 2769-2772, 2005.
: N. Hodoshima and T. Arai, ``Investigating an optimum suppression rate of steady-state portions of speech that improves intelligibility the most as a pre-processing approach in reverberant environments,'' J. Acoust. Soc. Am., Vol. 118, No. 3, Pt. 2, p. 1930, 2005.
: K. Yasu, Y. Miyauchi, N. Hodoshima, N. Hayashi, T. Inoue, T. Arai and M. Shindo, ``Evaluation of steady-state suppression of speech for elderly people in reverberant environments,'' Technical Report of IEICE Japan, Vol. SP2004-154, pp. 1-6, 2005 (in Japanese).
: Y. Miyauchi, N. Hodoshima, K. Yasu, N. Hayashi, T. Inoue, T. Arai and M. Shindo, ``Pre-processing for improving speech intelligibility in reverberant environments: The effect of steady-state suppression on elderly people,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 319-320, 2005 (in Japanese).
: N. Hayashi, N. Hodoshima, T. Inoue, T. Goto, F. Tadokoro, Y. Miyauchi, T. Arai and K. Kurisu, ``Evaluation of steady-state suppression in a hall with an electrical reverberator: Introducing a pre-processing approach on intelligibility into real acoustic environments,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 537-538, 2005 (in Japanese).
: N. Hodoshima and T. Arai, ``Investigating an optimum suppression rate of steady-state portions of speech that improves intelligibility as a pre-processing approach in reverberant environments,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 607-608, 2005 (in Japanese).
: Y. Nakata, Y. Murakami, N. Hayashi, Y. Miyauchi, N. Hodoshima, T. Arai and K. Kurisu, ``Evaluation of two steady-state processing methods for improving speech intelligibility in reverberant environments,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 693-694, 2005 (in Japanese).

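3) Speech processing for hearing-impaired listeners, aimed at hearing-aid applications

Since hearing aids became digital, a variety of algorithms have been tried in hearing-aid technology. The Arai Laboratory has proposed a processing scheme that uses "critical-band compression" to compensate for the broadening of auditory filters in hearing-impaired listeners. We are also conducting experiments that apply the "steady-state suppression" pre-processing described above to hearing aids.

References: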

: K. Yasu, K. Kobayashi, K. Shinohara, M. Hishitani, T. Arai and Y. Murahara, ``Critical-band compression method for digital hearing aids,'' Proc. of the Forum Acusticum Sevilla, 2002.
: K. Yasu, K. Kobayashi, K. Shinohara, M. Hishitani, T. Arai and Y. Murahara, ``Frequency compression of critical band for digital hearing aids,'' Proc. of China-Japan Joint Conference on Acoustics, pp. 159-162, Nanjing, 2002.
: M. Hishitani, K. Kobayashi, K. Shinohara, K. Yasu and T. Arai, ``Compressing critical bands for digital hearing aids,'' Handbook of the International Hearing Aid Research Conference (IHCON), pp. 64-65, Lake Tahoe, 2002.
: K. Yasu, M. Hishitani, T. Arai and Y. Murahara, ``Critical-band compression algorithms for digital hearing aids,'' Technical Report of IEICE Japan, Vol. SP2002-102, pp. 41-45, 2002 (in Japanese).
: K. Yasu, M. Hishitani, T. Arai, Y. Murahara and K. Shinohara, ``Critical-band compression in the frequency domain for digital hearing aids,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 379-380, 2002 (in Japanese).
: K. Yasu, M. Yasuda, T. Arai, Y. Murahara and M. Hishitani, ``An evaluation of the critical-band compression algorithm for the wider auditory filter of hearing impaired people: A case study of one profound hearing-impaired person,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 415-416, 2003.
: K. Yasu, M. Hishitani, T. Arai and Y. Murahara, ``Critical-band based frequency compression for digital hearing aids,'' Acoustical Science and Technology, Vol. 25, No. 1, pp. 61-63, 2004.
: T. Arai, K. Yasu and N. Hodoshima, ``Effective speech processing for various impaired listeners,'' Proc. of the International Congress on Acoustics, Vol. II, pp. 1389-1392, Kyoto, 2004 (Invited Paper).
: K. Yasu, K. Kobayashi, T. Arai, ``The modification of critical-band based frequency compression using cepstral analysis,'' Handbook of the International Hearing Aid Research Conference (IHCON), p. 55, Lake Tahoe, 2004 (Student Scholarship Award).
: K. Kobayashi, Y. Hatta, K. Yasu, N. Hodoshima, T. Arai and M. Shindo, ``A study on monosyllable enhancement for elderly listeners by steady-state suppression,'' Technical Report of IEICE Japan, Vol. SP2004-155, pp. 7-12, 2005 (in Japanese).
: K. Kobayashi, Y. Hatta, K. Yasu, N. Hodoshima, T. Arai and M. Shindo, ``Consonant enhancement of monosyllable for elderly listeners by steady-state suppression,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 321-322, 2005 (in Japanese).
: K. Yasu, K. Kobayashi, T. Arai and M. Shindo, ``Evaluation of the speech excited by the critical-band limited noise aligned at the center of each band by hearing impaired listeners,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 517-518, 2005 (in Japanese).

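4) Speech-related support for people with disabilities

Support for people with disabilities using speech technology such as speech synthesis, and acoustic analysis of cleft-palate speech (palatalized articulation, hypernasality, hoarseness, and so on) and of other articulation and voice disorders.

References: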

: S. Hirai, K. Okazaki and T. Arai, ``A quantitative evaluation of hypernasality in children: Using the slope of spectrum envelope,'' Japan Journal of Logopedics and Phoniatrics, Vol. 35, No. 2, pp. 199-206, 1994 (in Japanese).
: T. Arai, K. Okazaki and S. Imatomi, ``Palatalized articulation of [s] sounds using synthetic speech,'' Japan Journal of Logopedics and Phoniatrics, Vol. 36, No. 3, pp. 350-354, 1995 (in Japanese).
: T. Arai, K. Okazaki and S. Imatomi, ``Analysis for palatalized articulation of [s] sounds using synthetic speech,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 3, pp. 1725-1728, Madrid, 1995.
: S. Imatomi, T. Arai, Y. Mimura and M. Kato, ``Effects of hoarseness on hypernasality ratings,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 3, pp. 1075-1078, Budapest, 1999.
: S. Imatomi, T. Arai, Y. Mimura, M. Kato, F. Okubo and Y. Hosaka, ``Kaibi-sei no choukaku hantei ni okeru sasei no eikyou,'' Journal of Japanese Cleft Palate Association, Vol. 24, No. 2, p. 209, 1999 (in Japanese).
: S. Imatomi, T. Arai and M. Kato, ``How hoarseness affects on rating of hypernasality: Source-filter-theory approach,'' Meeting of the International Clinical Phonetics and Linguistics Association, Edinburgh, 2000.
: C. Oda, M. Komatsu, T. Arai, S. Imatomi, A. Kawahara, F. Shusse and K. Okazaki, ``Dysarthria ni okeru hatsuwa no togire,'' Japan Journal of Logopedics and Phoniatrics, Vol. 41, No. 1, p. 70, 2000 (in Japanese).
: N. Saika, T. Arai, S. Imatomi, Y. Murahara and M. Kato, ``Synthesis of hypernasal voice by manipulating spectral envelopes and its effect on hoarseness,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 361-362, 2001 (in Japanese).
: S. Imatomi and T. Arai, ``The relation between perceived hypernasality of speech and its hoarseness,'' Proc. of the Forum Acusticum Sevilla, 2002.
: Y. Kaneko, T. Sugawara, T. Arai, K. Okazaki and K. Iitaka, ``Categorical perception of Japanese geminate consonant /Q/ in children: Its developmental relations with reading its kana letter,'' Joint Conf. of the IX International Congress for the Study of Child Language and the Symposium on Research in Child Disorders, p.145, Madison, 2002.
: O. Fukazawa, M. Shindo, T. Arai and K. Kaga, ``Roujin-sei nanchou-sha no go-on no choushu ni kansuru kenkyuu,'' Jpn. J. Commun. Disord., Vol. 19, No. 3, p. 191, 2002 (in Japanese).
: R. Watanabe, K. Iitaka, K. Okazaki, K. Ooishi and T. Arai, ``Nihon-go youon no chikaku to yomi ni kansuru hattatsu-teki kenkyuu,'' Jpn. J. Commun. Disord., Vol. 19, No. 3, p. 196, 2002 (in Japanese).
: Y. Kaneko, K. Okazaki, K. Iitaka and T. Arai, ``Youji no sokuon no choukaku-benbetsu ni kansuru kiso-teki kenkyuu: Kotoba no kyoushitsu ni kayou jidou no 2 jirei,'' Jpn. J. Commun. Disord., Vol. 19, No. 3, p. 206, 2002 (in Japanese).
: S. Imatomi, T. Arai, N. Saika and M. Kato, ``Kaibisei no choukaku-hantei ni okeru sasei no eikyou,'' Jpn. J. Commun. Disord., Vol. 19, No. 3, p. 211, 2002 (in Japanese).
: S. Imatomi, T. Arai and M. Kato, ``Effects of hoarseness on ratings of hypernasality: Source-filter-theory approach,'' Japan Journal of Logopedics and Phoniatrics, Vol. 44, No. 4, pp. 304-314, 2003 (in Japanese).
: M. Irie, M. Shindo, N. Nagatsuka and T. Arai, ``Shitsugo-shou-sha ni okeru gengo-teki purosodii-ninchi ni tsuite no kenkyuu: Akusento-idou-benbetsu-nouryoku wo chuushin toshite,'' Jpn. J. Commun. Disord., Vol. 20, No. 3, p. 172, 2003 (in Japanese).
: M. Irie, M. Shindo, N. Nagatsuka and T. Arai, ``The discrimination of pitch-accent in aphasic patients,'' Jpn. J. Commun. Disord., Vol. 21, No. 3, pp. 165-171, 2004 (in Japanese).
: Y. Hirano, M. Shiroma, M. Shindo, T. Arai and K. Kaga, ``Jinkou-naiji-souyou-ji no kouon: Onkyou-bunseki to choukaku-inshou wo moto ni (kenjou-ji to hikaku shite),'' The Japan Journal of Logopedics and Phoniatrics, Vol. 45, No. 1, p. 39, 2004 (in Japanese).
: E. Ohki, K. Hara, K. Iitaka, M. Shindo and T. Arai, ``Kenjou-ji ni okeru nihongo-chouon no kategori-chikaku no hattatsu: Chouon no ishiki to hyouki-hou tono hattatsuteki-kanren,'' Jpn. J. Commun. Disord., Vol. 22, No. 3, p. 204, 2005 (in Japanese).
: S. Hirai, K. Yasu, T. Arai and K. Iitaka, ``Acoustic cues in fricative perception for Japanese native speakers,'' Technical Report of IEICE Japan, Vol. SP2004-168, pp. 25-30, 2005 (in Japanese).
: F. Murai, T. Arai and T. Kimura, ``The utterances of the cleft palate children: The comparison between before and after palatal surgery,'' Technical Report of IEICE Japan, Vol. SP2004-171, pp. 41-46, 2005 (in Japanese).

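5) Visualization of the vocal tract using computer models and physical vocal-tract models

Visualizing the shape of the vocal tract has long been an important theme in speech science. The vocal-tract shapes measured by Chiba and Kajiyama in "The Vowel: Its Nature and Structure" more than 60 years ago were pioneering work in this area.

For X-ray motion pictures, the cineradiographic data measured by Joseph S. Perkell at MIT in the 1960s are well known. Now that such data have become rare and valuable, re-examining them is highly worthwhile.

Recent advances in imaging technologies such as MRI have made it possible to turn such measurements into computer models. This research theme aims to visualize the vocal tract in this way and to clarify the mechanisms of speech production.

Research using physical vocal-tract models is a more direct approach than computer simulation and has the potential to capture more accurately various phenomena that are not yet understood. Our research has also shown that the vocal-tract models themselves are effective teaching materials. Another major goal is therefore to build a wider variety of vocal-tract models and to explore the research and educational fields in which they can be applied. The results developed here are also expected to lead to support for people with disabilities.

Page on vocal-tract models

References: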

: T. Arai, ``The replication of Chiba and Kajiyama's mechanical models of the human vocal cavity,'' Journal of the Phonetic Society of Japan, Vol. 5, No. 2, pp. 31-38, 2001.
: T. Arai, N. Usuki and Y. Murahara, ``Prototype of a vocal-tract model for vowel production designed for education in speech science,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 4, pp. 2791-2794, Aalborg, 2001.
: N. Usuki, T. Arai and Y. Murahara, ``Prototype of vocal-tract models for education in acoustics of vowel production,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 399-400, 2001 (in Japanese).
: N. Usuki, M. Yoshida, Hasan A. Alwi, T. Arai and Y. Murahara, ``Usefulness of a mechanical model of the human vocal tract for education in speech science: Perturbation theory in vowel production,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 403-404, 2001 (in Japanese).
: T. Arai, ``An effective method for education in acoustics and speech science: Integrating textbooks, computer simulation and physical models,'' Proc. of the Forum Acusticum Sevilla, 2002.
: N. Saika, E. Maeda, N. Usuki, T. Arai and Y. Murahara, ``Developing mechanical models of the human vocal tract for education in speech science,'' Proc. of the Forum Acusticum Sevilla, 2002.
: E. Maeda, N. Usuki, T. Arai and Y. Murahara, ``The importance of physical models of the human vocal tract for education in acoustics in the digital era,'' Proc. of China-Japan Joint Conference on Acoustics, pp. 163-166, Nanjing, 2002.
: T. Arai, E. Maeda, N. Saika and Y. Murahara, ``Physical models of the human vocal tract as tools for education in acoustics,'' Proc. of the First Pan-American/Iberian Meeting on Acoustics, Cancun, 2002.
: E. Maeda, T. Arai, N. Saika and Y. Murahara, ``Lab experiment using physical models of the human vocal tract for high-school students,'' Proc. of the First Pan-American/Iberian Meeting on Acoustics, Cancun, 2002.
: T. Kitamura, T. Arai and P. Connors, ``A study on education tools with an animated talking agent for the hearing impaired,'' Technical Report of IEICE Japan, Vol. SP2002-113, pp. 27-32, 2002 (in Japanese).
: M. Yoshida, N. Usuki, T. Arai, Y. Murahara and T. Sugawara, ``A challenge for visualizing vocal tract resonance: Devising a tool for acoustic education in speech science,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 399-400, 2002 (in Japanese).
: T. Arai, ``Incorporating more intuitive acoustic education into speech science,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 2, pp. 1219-1220, 2002 (in Japanese).
: E. Maeda, T. Arai, N. Saika and Y. Murahara, ``Education in acoustics using mechanical models of the human vocal tract in high school,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 299-300, 2002 (in Japanese).
: T. Arai, N. Saika, E. Maeda and Y. Murahara, ``Chiba-Kajiyama ni yoru seidou-mokei no fukugen to sono kyouzai to shite no ouyou,'' Proc. of the General Meeting of the Phonetic Society of Japan, pp. 23-28, 2002 (in Japanese).
: T. Arai, ``Physical and computer-based tools for teaching Phonetics,'' Proc. of the International Congress of Phonetic Sciences (ICPhS), Vol. 1, pp. 305-308, Barcelona, 2003.
: T. Lander and T. Arai, ``Using Arai's vocal tract models for education in Phonetics,'' Proc. of the International Congress of Phonetic Sciences (ICPhS), Vol. 1, pp. 317-320, Barcelona, 2003.
: T. Arai and E. Maeda, ``Acoustics education in speech science using physical models of the human vocal tract,'' Trans. Tech. Comm. Education in Acoustics, The Acoustical Society of Japan, Vol. EDU-2003-08, pp. 1-5, 2003 (in Japanese).
: E. Maeda and T. Arai, ``Education in acoustics for high-school students using mechanical vocal tract,'' Trans. Tech. Comm. Education in Acoustics, The Acoustical Society of Japan, Vol. EDU-2003-09, pp. 1-6, 2003 (in Japanese).
: E. Maeda, T. Arai and N. Saika, ``Study of mechanical models of the human vocal tract having nasal cavity,'' Technical Report of IEICE Japan, Vol. SP2003-55, pp. 1-5, 2003 (in Japanese).
: E. Maeda, T. Arai, N. Saika and Y. Murahara, ``Studying the sound source of a mechanical vocal tract using a driver unit of horn speaker,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 417-418, 2003 (in Japanese).
: E. Maeda, T. Arai, N. Saika and Y. Murahara, ``The effects of nasal cavity and paranasal sinuses on sounds using mechanical vocal tract models,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 339-340, 2003.
: T. Arai, E. Maeda and N. Umeda, ``Education in Acoustics using Umeda and Teranishi's mechanical model of the human vocal tract,'' Proc. Autumn Meet. Acoust. Soc. Jpn., Vol. 1, pp. 341-342, 2003.
: T. Kitamura, T. Arai, P. Connors, P. Stone, K. Iitaka and M. Shindo, ``A study on educational tools with an animated talking agent for the hearing impaired,'' Sophia Linguistica, Vol. 50, pp. 221-233, 2003 (in Japanese).
: E. Maeda, N. Usuki, T. Arai, N. Saika and Y. Murahara, ``Comparing the characteristics of the palate and cylinder type vocal tract models,'' Acoustical Science and Technology, Vol. 25, No. 1, pp. 64-65, 2004.
: T. Arai, ``Education in Acoustics using physical models of the human vocal tract,'' Proc. of the International Congress on Acoustics, Vol. III, pp. 1969-1972, Kyoto, 2004 (Invited Paper).
: E. Maeda and T. Arai, ``Additional physical models of the human vocal tract as tools for education in language learning,'' Proc. of the International Congress on Acoustics, Vol. III, pp. 2321-2322, Kyoto, 2004 (Invited Demonstration).
: T. Arai, ``Visualizing vowel-production mechanism using simple education tools,'' J. Acoust. Soc. Am., Vol. 118, No. 3, Pt. 2, p. 1862, 2005.
: T. Arai, ``Lung model and head-shaped model with visible vocal tract as educational tools in Acoustics,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 273-274, 2005 (in Japanese).

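6) Automatic detection of speech intervals for adding captions to digital video

As the information society advances, digital video is increasingly distributed over the network. With internationalization, it is also becoming necessary to distribute multilingual caption information as secondary data together with the video and audio. Demand for captioning is growing as well, to support elderly people and hearing-impaired people. Against this background, techniques for automatically detecting speech intervals in a speech signal are being re-examined. In this research theme we are developing automatic algorithms for this purpose. Future challenges include handling diverse speaking styles, background noise and music, multiple speakers, and multiple languages.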
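As a rough illustration of what speech-interval detection involves, the sketch below marks frames whose short-time energy exceeds a threshold and merges intervals separated by short pauses. It deliberately uses a simple energy criterion rather than the linear-prediction-based method named in the reference for this theme. The function name `detect_speech_intervals`, the 10 ms frames, the -30 dB threshold, and the 200 ms merge gap are assumptions for this example only.

```python
import numpy as np

def detect_speech_intervals(x, fs, frame_ms=10.0, threshold_db=-30.0,
                            merge_gap_ms=200.0):
    """Return [(start_s, end_s), ...] where short-time energy is above threshold.

    Energy-based sketch; all defaults are illustrative assumptions.
    """
    n = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)

    # Frame energy in dB relative to the loudest frame.
    energy = np.mean(frames ** 2, axis=1) + 1e-12
    level_db = 10.0 * np.log10(energy / energy.max())
    active = level_db > threshold_db

    # Collect runs of consecutive active frames as [start, end) frame indices.
    runs = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, n_frames])

    # Merge runs separated by pauses no longer than merge_gap_ms.
    max_gap = int(merge_gap_ms / frame_ms)
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)
    frame_s = n / fs
    return [(s * frame_s, e * frame_s) for s, e in merged]

# Synthetic check: two tone bursts separated by half a second of silence.
fs = 8000
t = np.arange(2 * fs) / fs
x = np.zeros(2 * fs)
x[4000:8000] = 0.5 * np.sin(2 * np.pi * 200 * t[4000:8000])
x[12000:14400] = 0.5 * np.sin(2 * np.pi * 200 * t[12000:14400])
intervals = detect_speech_intervals(x, fs)
for start_s, end_s in intervals:
    print(f"speech from {start_s:.2f} s to {end_s:.2f} s")
```

Each returned interval could serve as the time span of one caption. The challenges listed above (background noise, music, multiple speakers) are exactly the cases in which a plain energy threshold like this one breaks down.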

References:

: Y. Fujikashi, A. Koga, T. Arai, N. Kanedera and J. Yoshii, ``Linear-prediction based end-point detection of speech for captioning system,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 33-34, 2005 (in Japanese).

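7) Research on real-time speech processing using DSPs

When signal processing for speech and other signals is put to practical use, real-time operation is inevitably required. DSPs (digital signal processors), which are processors designed specifically for signal processing, are well suited to this. DSPs are now built into all kinds of digital devices, and this research theme aims to implement specific signal-processing algorithms in real time on a DSP. In particular, if the "steady-state suppression" described above can run in real time, it can be built into the PA (public address) system of an actual hall to limit the loss of speech intelligibility in real time. We are also attempting to implement digital pattern playback in real time.

Page on digital pattern playback

References: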

: T. Arai, K. Yasu and T. Goto, ``Digital pattern playback,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 429-430, 2005 (in Japanese).
: T. Goto, T. Arai and K. Yasu, ``Teijou-bu yokuatsu-shori no riaru-taimu-ka he mukete: DSP ni yoru kaihatsu,'' Proc. of DSPS Educators Conference, pp. 91-94, 2005 (in Japanese).
: T. Arai, K. Yasu and T. Goto, ``Digital pattern playback: Converting spectrograms to sound for educational purposes,'' Acoustical Science and Technology, Vol. 27, No. 5, 2006.

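8) Clarifying the roles of the left and right cerebral hemispheres in speech perception

This research theme is an important example of "interdisciplinary," one of the Arai Laboratory's keywords, and is carried out jointly with Prof. Michimata of the Cognitive Psychology Laboratory at Sophia University. Its aims are to investigate left-right differences in the brain when speech is presented and how the information perceived by the two hemispheres is subsequently integrated. Conducting the work as joint research combines knowledge of electrical and electronic engineering with knowledge of psychology, which makes research from a new perspective possible.

References: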

: M. Chait, S. Greenberg, T. Arai, J. Simon and D. Poeppel, ``Two time scales in speech segmentation,'' Proc. of ISCA Workshop on Plasticity in Speech Perception, p. 158, London, 2005.
: M. Chait, S. Greenberg, T. Arai and D. Poeppel, ``Binding mechanisms in speech processing,'' Annual Meeting of the Cognitive Neuroscience Society, p. 82, New York, 2003.
: M. Chait, S. Greenberg, T. Arai, J. Z. Simon and D. Poeppel, ``Brain mechanisms for speech segmentation,'' KIT International Symposium on Brain and Language, Tokyo, 2003.

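9) Research on speaker individuality in speech

Besides the linguistic information that conveys the intended message, human speech also carries characteristics of the individual speaker (speaker individuality). If features that represent individuality can be extracted, speech can be used for telephone banking and security purposes. Investigating how humans perceive speaker individuality not only clarifies human cognitive abilities but can also contribute to forensic applications such as criminal investigation. This research aims to determine which speech sounds are effective when humans identify speakers and to clarify how those sounds relate to the features that represent individuality. We also consider speaker individuality from the viewpoint of human language and pursue linguistic analysis.

References: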

: K. Amino, T. Sugawara and T. Arai, ``The correspondences between the perception of the speaker individualities contained in speech sounds and their acoustic properties,'' Proc. of the Interspeech, pp. 2025-2028, Lisbon, 2005.
: K. Amino, T. Sugawara and T. Arai, ``The correspondences between the differences among the phones in human speaker identification and their acoustic properties,'' Technical Report of IEICE Japan, Vol. SP2004-164, pp. 1-6, 2005 (in Japanese).
: K. Amino, T. Sugawara and T. Arai, ``The relationship between the asymmetry of the nasal and the oral sounds in human speaker identification and their acoustic distances,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 277-278, 2005 (in Japanese).
: K. Amino, T. Sugawara and T. Arai, ``A study on perception of cold-affected speech,'' Proc. Autumn Meet. Acoust. Soc. Jpn., pp. 431-432, 2005 (in Japanese).

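10) Other topics: phonetics (including acoustic phonetics), phonology, and related areas

Research on the acoustic analysis of speech and its perception, and on phonetics (including acoustic phonetics) and phonology, beginning with formant-frequency shifts in nasalized vowels and their compensation.

References: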

: T. Arai, ``A case study of spontaneous speech in Japanese,'' Proc. of the International Congress of Phonetic Sciences (ICPhS), Vol. 1, pp. 615-618, San Francisco, 1999.
: M. Komatsu, W. Tokuma, S. Tokuma and T. Arai, ``The effect of reduced spectral information on Japanese consonant perception: Comparison between L1 and L2 listeners,'' Proc. of the International Conf. on Spoken Language Processing (ICSLP), Vol. 3, pp. 750-753, Beijing, 2000.
: M. Komatsu, T. Shinya, M. Takasawa and T. Arai, ``LPC zansa-shingou ni yoru shiin-ninshiki to onso-hairetsu,'' Proc. of the General Meeting of the Phonetic Society of Japan, pp. 49-54, 2000 (in Japanese).
: A. Masaki, M. Takasawa and T. Arai, ``Go-akusento no chikaku ni okeru bogo no eikyou ni tsuite: Nihon-go, doitsu-go, supein-go washa-kan no hikaku,'' Proc. of the General Meeting of the Phonetic Society of Japan, pp. 165-170, 2000 (in Japanese).
: M. Komatsu, K. Mori, T. Arai and Y. Murahara, ``Human language identification with reduced segmental information: Comparison between monolinguals and bilinguals,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 1, pp. 149-152, Aalborg, 2001.
: M. Komatsu, S. Tokuma, W. Tokuma and T. Arai, ``Modelling the perceptual identification of Japanese consonants from LPC cepstral distances,'' Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 1, pp. 391-394, Aalborg, 2001.
: N. Warner and T. Arai, ``Accentual phrase rises as a cue to word boundaries,'' Meeting Handbook of the Linguistic Society of America, p. 95, Washington, D.C., 2001.
: T. Arai, N. Warner and S. Greenberg, ``Analysis of spontaneous Japanese in OGI Multi-Language Telephone-Speech Corpus,'' Proc. Spring Meet. Acoust. Soc. Jpn., Vol. 1, pp. 361-362, 2001 (in Japanese).
: A. Masaki, M. Takasawa and T. Arai, ``Akzentwahrnehmung von Japanern bei technisch kontrollierten F0-Konturen,'' Sophia Linguistica, No. 48, pp. 213-224, 2001 (in German).
: M. Komatsu, K. Mori, T. Arai, M. Aoyagi and Y. Murahara, ``Human language identification with reduced segmental information,'' Acoustical Science and Technology, Vol. 23, No. 3, pp. 143-153, 2002.
: M. Komatsu, S. Tokuma, W. Tokuma and T. Arai, ``Multi-dimensional analysis of sonority: Perception, acoustics, and phonology,'' Proc. of the International Conf. on Spoken Language Processing (ICSLP), pp. 2293-2296, Denver, 2002.
: M. Komatsu, T. Arai and T. Sugawara, ``Inritsu-ruikei-ron he no onkyou-teki apuroochi,'' Proc. of the General Meeting of the Phonetic Society of Japan, pp. 75-80, 2003 (in Japanese).
: M. Komatsu, T. Arai and T. Sugawara, ``Perceptual discrimination of prosodic types,'' Proc. of the International Conference: Speech Prosody, pp. 725-728, Nara, 2004.
: M. Komatsu, T. Arai and T. Sugawara, ``Perceptual discrimination of prosodic types and their preliminary acoustic analysis,'' Proc. of the Interspeech, pp. 1280-1283, Jeju Island, 2004.
: T. Arai, ``Formant shift in nasalization of vowels,'' J. Acoust. Soc. Am., Vol. 115, No. 5, p. 2541, New York, 2004.
: T. Arai, ``Comparing tongue position of vowels in oral and nasal contexts,'' Sophia Symposium on New Technology and the Investigation of the Articulatory Process, pp. 33-49, Sophia University, 2004.
: K. N. Stevens, translated by T. Arai, ``The acoustic/articulatory interface,'' J. Acoust. Soc. Jpn., Vol. 61, No. 9, pp. 524-531, 2005.
: T. Arai, ``Comparing tongue positions of vowels in oral and nasal contexts,'' Proc. of the Interspeech, pp. 1033-1036, Lisbon, 2005.
: T. Arai, ``Onsei-shingou no kashi-ka to sono onsei-gaku-teki kachi,'' Proc. of the General Meeting of the Phonetic Society of Japan, p. 3, 2005 (in Japanese).
: K. Takeuchi and T. Arai, ``Nihon-jin gakushuu-sha no furansu-go biboin-seisei no tame no housaku: F1-henka wo motarasu chouon-houhou no kansatsu,'' Proc. of the General Meeting of the Phonetic Society of Japan, pp. 179-184, 2005 (in Japanese).

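Arai Laboratory, Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University. Research field: speech communication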