A tool for practicing English conversation with AI using the browser's speech recognition and speech synthesis

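I built a web app that lets you casually practice English conversation with an AI by combining the browser's speech recognition and speech synthesis engines with ChatGPT (gpt-4o-mini). I wrote it with Cursor, the AI code editor that has been getting a lot of attention in AI circles, together with the Cline and Copilot AI agent extensions, so here is a quick introduction.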

Please note that the app may not work in some browsers.
(Verified with Google Chrome on Linux, Mac, and Android.)
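
Since browser support varies, a quick way to see whether a given browser is likely to work is to check for the Web Speech API objects the app depends on. The following is only a minimal sketch: `detectSpeechSupport` is a hypothetical helper that is not part of the source further down, and a positive result does not guarantee that recognition actually works on that device.

```javascript
// Hypothetical helper (not part of the original source): reports which
// Web Speech API features the given global object exposes.
function detectSpeechSupport(globalObj) {
  const Recognition = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  const synth = globalObj.speechSynthesis;
  return {
    recognition: typeof Recognition === 'function',
    synthesis:
      typeof synth === 'object' &&
      synth !== null &&
      typeof globalObj.SpeechSynthesisUtterance === 'function',
  };
}

// In a browser, pass `window`; outside a browser both flags come back false.
const support = detectSpeechSupport(typeof window !== 'undefined' ? window : {});
console.log(support);
```

If `recognition` is false, the 話す (Speak) button cannot work in that browser, so a page could hide it or show a notice instead.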


How to use the demo app

Note: You can open the app from the link under "Online demo" below this explanation.

  1. Enter your own API key in the OpenAI APIキー (OpenAI API key) field.

  2. Press the 話す (Speak) button. The browser will ask for microphone permission, so grant it temporarily (the permission can be reset later).

  3. Once microphone access is granted, the app waits for voice input, so just say something in English.

  4. Once your speech is recognized, the AI's reply is read aloud.

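  5. When the AI finishes its reply, speech recognition restarts, so you can keep the conversation going back and forth.
    If recognition times out, press the 話す (Speak) button to continue the conversation.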
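
The steps above can be pictured as a small state machine. This is only an illustrative sketch: the state and event names are made up for this explanation, and the actual source does not contain a table like this. It also simplifies the timeout case, where the real code first sends "[silence]" to the AI and reads out the reply before going idle.

```javascript
// Hypothetical state and event names, used only to illustrate the flow above.
const TRANSITIONS = {
  idle: { pressSpeak: 'listening' },
  listening: { recognized: 'waitingForAI', timeout: 'idle' },
  waitingForAI: { replyReceived: 'speaking' },
  speaking: { speechEnded: 'listening' },
};

// Returns the next state; events that do not apply leave the state unchanged.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state;
}

// One full turn: press the button, speak, get a reply, hear it read aloud.
const events = ['pressSpeak', 'recognized', 'replyReceived', 'speechEnded'];
const finalState = events.reduce(nextState, 'idle');
console.log(finalState); // prints "listening"
```

After one complete turn the app is listening again, which is why the conversation continues without pressing the button each time.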

Because of browser security mechanisms, the first reply may not be read aloud.
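
The source further down works around this by speaking an empty utterance the first time the button is clicked, so that speech synthesis has been started from within a user gesture. Below is a minimal sketch of the same idea. `createSynthesisPrimer` is a hypothetical name, and the synthesizer and utterance constructor are passed in as parameters (an assumption made here so the function can also be exercised outside a browser).

```javascript
// Sketch: "prime" speech synthesis once from inside a user gesture handler.
// In a page, `synth` would be window.speechSynthesis and `Utterance` would be
// window.SpeechSynthesisUtterance.
function createSynthesisPrimer(synth, Utterance) {
  let primed = false;
  return function primeOnce() {
    if (primed) return false; // already done, nothing to do
    synth.speak(new Utterance('')); // empty utterance: nothing audible
    primed = true;
    return true;
  };
}

// Browser usage (assumes a button with id "start-recognition", as in the source):
//   const primeOnce = createSynthesisPrimer(window.speechSynthesis, window.SpeechSynthesisUtterance);
//   document.getElementById('start-recognition').addEventListener('click', primeOnce);
```

Whether this is enough depends on the browser, which is why the first reply may still be silent in some environments.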

Your API key and conversation content are never sent to EXCEEDSYSTEM's servers (and they do not appear in the logs of EXCEEDSYSTEM's web server either).
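
This works because the page calls the OpenAI API directly from the browser. As a rough sketch, the request looks like the following. `buildChatRequest` is a hypothetical helper introduced for this explanation, while the URL, model name, and parameters are the ones used in the source further down.

```javascript
// Hypothetical helper: builds the request that is sent from the browser
// straight to the OpenAI API. No other server is involved.
function buildChatRequest(apiKey, messages) {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'gpt-4o-mini',
        messages,
        max_tokens: 250,
        temperature: 0.7,
      }),
    },
  };
}

// Usage in a browser:
//   const { url, options } = buildChatRequest(myKey, [{ role: 'user', content: 'Hello' }]);
//   const response = await fetch(url, options);
```

The API key only appears in the `Authorization` header of this request, so it is visible to OpenAI but not to the server that hosts the page.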

Once a conversation starts, it stays active until you leave the page or voice input times out. Be careful not to waste OpenAI API tokens through unintended input (at your own risk!).
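
If you want a safety net against this, one option is to cap the number of API calls per session. The source below has no such limit. The following is just a sketch of one possible guard, with `createTurnBudget` as a hypothetical name.

```javascript
// Hypothetical guard (not in the original source): allows at most
// `maxTurns` API calls, then refuses further ones.
function createTurnBudget(maxTurns) {
  let used = 0;
  return {
    // Returns true and counts the turn if budget is left, false otherwise.
    tryConsume() {
      if (used >= maxTurns) return false;
      used += 1;
      return true;
    },
    remaining() {
      return maxTurns - used;
    },
  };
}

const budget = createTurnBudget(2);
console.log(budget.tryConsume()); // prints true
console.log(budget.tryConsume()); // prints true
console.log(budget.tryConsume()); // prints false
```

Calling `tryConsume()` before each request and skipping the request when it returns false would stop a forgotten open tab from spending tokens indefinitely.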


Online demo

Before using this service, be sure to read the privacy policy and the disclaimer. By using the service, you are deemed to have agreed to them.

AIパーソナル英会話トレーナー (AI Personal English Conversation Trainer; opens in a new window)

Note: The app itself is free to use, but you need your own OpenAI API key, which is billed on a pay-as-you-go basis.


Source code

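Save the source below to a local file and open it in your browser to run the app locally.
It contains only the minimum code needed to make the feature work, and it does not display error messages or the like.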
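
If you do want to show error messages, a small helper that turns a failed API response into readable text is one place to start. The following is a sketch only: `describeApiFailure` is a hypothetical function that is not part of the source, the wording of the messages is my own, and it assumes the error body has the `{ error: { message } }` shape that the OpenAI API returns.

```javascript
// Hypothetical helper: maps an HTTP status and parsed JSON body from a
// failed request to a message that could be shown to the user.
function describeApiFailure(status, body) {
  const detail = body && body.error && body.error.message;
  if (status === 401) {
    return 'The API key was rejected. Please check it and try again.';
  }
  if (status === 429) {
    return 'The rate limit or quota was exceeded. Please wait and try again.';
  }
  return detail ? `OpenAI API error (${status}): ${detail}` : `OpenAI API error (${status}).`;
}

console.log(describeApiFailure(500, { error: { message: 'Server overloaded' } }));
// prints "OpenAI API error (500): Server overloaded"
```

The returned string could then be written into the `transcript` element instead of only going to the console.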

Because the code was created with generative AI, please be careful about how the code is licensed.

Note: The source is updated irregularly.

EnglishTrainer.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>EXCEEDSYSTEM AIパーソナル英会話トレーナー</title>
    <!-- Tailwind CSS CDN -->
    <script src="https://cdn.tailwindcss.com"></script>
    <!-- Font Awesome CDN -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css" />
    <!-- CryptoJS CDN -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.2.0/crypto-js.min.js"></script>
    <style>
      #conversation-history {
        max-height: 500px;
        overflow-y: auto;
      }
      #conversation-history li {
        display: flex;
        align-items: center;
        justify-content: space-between;
      }
      #conversation-history li button {
        margin-left: 10px;
      }
    </style>
  </head>
  <body class="bg-gray-100 min-h-screen p-8">
    <div class="max-w-3xl mx-auto bg-white rounded-lg shadow-lg p-8">
      <h1 class="text-3xl font-bold text-gray-800 mb-8">AIパーソナル英会話トレーナー</h1>

      <div class="space-y-4 mb-8">
        <div class="flex flex-col space-y-2">
          <label for="api-key" class="text-gray-700 font-medium">OpenAI APIキー:</label>
          <input type="password" id="api-key" placeholder="Enter your API key" class="border border-gray-300 rounded-md px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500" />
        </div>

        <div class="flex flex-col space-y-2">
          <label for="english-level" class="text-gray-700 font-medium">英語レベル:</label>
          <select id="english-level" class="border border-gray-300 rounded-md px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500">
            <option value="3">英検3級</option>
            <option value="2" selected>英検2級</option>
            <option value="1">英検1級</option>
          </select>
        </div>

        <div class="flex flex-col space-y-2">
          <label for="sentence-length" class="text-gray-700 font-medium">応答の長さ:</label>
          <select id="sentence-length" class="border border-gray-300 rounded-md px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500">
            <option value="short">短め (1-2文)</option>
            <option value="medium" selected>普通 (2-3文)</option>
            <option value="long">長め (3-5文)</option>
          </select>
        </div>

        <div class="flex flex-col space-y-2">
          <label class="text-gray-700 font-medium flex items-center">
            <input type="checkbox" id="show-translation" class="mr-2" />
            日本語訳を表示
          </label>
        </div>

        <button id="start-recognition" class="bg-blue-500 hover:bg-blue-600 text-white font-medium py-2 px-4 rounded-md transition duration-200">話す</button>
      </div>

      <p id="transcript" class="text-gray-600 mb-4 p-4 bg-gray-50 rounded-md">会話履歴</p>

      <div class="bg-white rounded-lg border border-gray-200">
        <ul id="conversation-history" class="divide-y divide-gray-200">
          <!-- Conversation items will be added here -->
        </ul>
      </div>
    </div>

    <script>
      // Configuration constants
      const CONFIG = {
        SPEECH_CONFIG: {
          LANG: {
            EN_US: 'en-US',
            JA_JP: 'ja-JP',
          },
          RECOGNITION: {
            CONTINUOUS: false, // continuous speech recognition
            INTERIM_RESULTS: false, // return interim results
          },
          SYNTHESIS: {
            VOLUME: 1.0, // speech volume
            RATE: 1.0, // speech rate
            PITCH: 1.0, // speech pitch
          },
        },
        CHAT: {
          MAX_HISTORY: 5, // maximum number of chat exchanges kept as history
        },
      };

      // Helper function that escapes HTML special characters
      function escapeHTML(str) {
        return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;').replace(/'/g, '&#039;');
      }

      // Voice chat class
      class VoiceChat {
        constructor() {
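          // Initialize speech recognition and text-to-speech
          // (use the unprefixed constructor where the browser provides it)
          this.recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();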
          this.synthesis = window.speechSynthesis;
          this.setupRecognition();
          this.chatHistory = [];
          this.apiKey = '';
          this.conversationHistoryList = document.getElementById('conversation-history');
          this.englishLevel = document.getElementById('english-level');
          this.sentenceLength = document.getElementById('sentence-length');
          this.showTranslation = document.getElementById('show-translation');

          // System prompt that defines the AI's role and behavior
          this.getSystemPrompt = () => {
            const level = this.englishLevel.value;
            const lengthPreference = this.sentenceLength.value;
            const needsTranslation = this.showTranslation.checked;

            const levelText = {
              3: 'EIKEN Grade 3 level (basic)',
              2: 'EIKEN Grade 2 level (upper intermediate)',
              1: 'EIKEN Grade 1 level (advanced)',
            }[level];

            const lengthInstruction = {
              short: 'Keep responses very concise, using 1-2 sentences.',
              medium: 'Use 2-3 sentences for responses.',
              long: 'Provide detailed responses using 3-5 sentences.',
            }[lengthPreference];

            const translationInstruction = needsTranslation ? '- Provide Japanese translation after each response in the format: [JP: 日本語訳]' : '';

            return {
              role: 'system',
              content: `You are an advanced English conversation coach specialized in teaching Japanese learners. Follow these guidelines:

CORE TEACHING APPROACH:
- Use vocabulary and expressions at the ${levelText}
- ${lengthInstruction}
- Focus on natural, everyday English that native speakers actually use
- Maintain a friendly, encouraging tone while providing constructive feedback

JAPANESE LEARNER SPECIFIC SUPPORT:
- Focus on common Japanese-English translation pitfalls:
  * Help avoid word-for-word translation from Japanese
  * Guide users away from Japanese English patterns (e.g., "Please your name" → "What's your name?")
  * Correct typical particle-related mistakes (e.g., "I'm interested in/about" → "I'm interested in")

PRONUNCIATION FOCUS:
- Pay special attention to challenging sounds for Japanese speakers:
  * L/R distinction (light/right, glass/grass)
  * TH sounds (think, that, three)
  * V/B sounds (very/berry)
  * Stress patterns and intonation
- Provide simple pronunciation tips using katakana comparisons when helpful

GRAMMAR AND EXPRESSION:
- Address common Japanese learner challenges:
  * Article usage (a, an, the)
  * Present perfect vs past tense
  * Auxiliary verbs (would, could, should)
  * Subject pronouns (avoiding overuse of "I think...")
- Help break the habit of creating overly formal or textbook-like sentences

SPEAKING CONFIDENCE:
- Encourage speaking without over-focusing on perfect grammar
- Teach fillers and conversation maintenance phrases:
  * "Well...", "Let me see...", "You know..."
  * "Could you say that again?"
  * "I'm not sure how to say this, but..."
- Provide alternatives to silence or "etto..."

CULTURAL COMMUNICATION:
- Address differences in communication styles:
  * Direct vs indirect communication
  * How to express opinions clearly
  * When and how to disagree politely
  * How to make requests naturally
- Teach situation-appropriate responses instead of always using "yes" or apologizing

BUSINESS ENGLISH (when relevant):
- Focus on common business situations:
  * Email writing
  * Meeting participation
  * Presentation skills
  * Small talk with colleagues
- Teach alternatives to Japanese business English expressions

PRACTICAL TIPS:
- Suggest ways to practice English in Japan:
  * Recommend English learning resources
  * Share self-study techniques
  * Suggest ways to find English conversation opportunities
- Provide memory tricks specific to Japanese speakers

ERROR CORRECTION:
- When users make mistakes, use the "sandwich method":
  1. Acknowledge the meaning they're trying to convey
  2. Provide the correct form/expression
  3. Give a natural example using the correct form
- Point out common pronunciation patterns or stress mistakes if detected

LEARNING REINFORCEMENT:
- Occasionally recap key expressions or patterns from earlier in the conversation
- Provide brief explanations for why certain expressions are more natural than others
- Give cultural context when relevant to the conversation topic

CONVERSATION FLOW:
- Keep the conversation focused on practical, real-world scenarios
- Adapt to the user's interests and conversation style
- If the user seems stuck, provide gentle prompts or alternative ways to express their thoughts

PRONUNCIATION ANALYSIS:
- Analyze the user's input for potential pronunciation confusions by checking context
- Examples of contextual analysis:
  * If user says "There is right outside" when describing a lamp or brightness, they likely meant "light"
  * If user says "I late rice everyday" when talking about food, they likely meant "I eat rice everyday"
  * Context suggesting "glass of water" but transcribed as "grass of water" indicates L/R confusion
- When you detect such confusions:
  1. Point out the possible confusion
  2. Explain why you think there might be a pronunciation issue
  3. Provide specific pronunciation guidance
  4. Give example sentences contrasting the similar sounds

${translationInstruction}

Remember: Build confidence first, accuracy second. Many Japanese learners are skilled in reading and writing but need support in speaking confidently. Focus on creating a comfortable environment where mistakes are seen as learning opportunities.

If my last message is "[silence]", treat it as if I'm stuck and having trouble responding. 
In that case, provide gentle encouragement or rephrase your previous question in a simpler way.`,
            };
          };

          this.isFirstInteraction = true;
          this.startButton = document.getElementById('start-recognition');
          this.isListening = false;
        }

        setupRecognition() {
          // Configure speech recognition
          this.recognition.lang = CONFIG.SPEECH_CONFIG.LANG.EN_US;
          this.recognition.continuous = CONFIG.SPEECH_CONFIG.RECOGNITION.CONTINUOUS;
          this.recognition.interimResults = CONFIG.SPEECH_CONFIG.RECOGNITION.INTERIM_RESULTS;

          // Handle speech recognition results
          this.recognition.onresult = async (event) => {
            const transcript = Array.from(event.results)
              .map((result) => result[0].transcript)
              .join('');

            if (event.results[0].isFinal) {
              console.log('User:', transcript);
              await this.getAIResponse(transcript);
            }
          };

          // Error handling
          this.recognition.onerror = async (event) => {
            console.error('Error:', event.error);
            // Treat any speech recognition error as "[silence]"
            await this.getAIResponse('[silence]');
          };

          // Runs when speech recognition starts
          this.recognition.onstart = () => {
            this.isListening = true;
            this.startButton.disabled = true;
            this.startButton.classList.add('opacity-50', 'cursor-not-allowed');
            this.startButton.textContent = '聞いています...';
          };

          // Runs when speech recognition ends
          this.recognition.onend = () => {
            this.isListening = false;
            this.startButton.disabled = false;
            this.startButton.classList.remove('opacity-50', 'cursor-not-allowed');
            this.startButton.textContent = '話す';
          };
        }

        async getAIResponse(userInput) {
          // Get the AI's reply via the OpenAI API
          if (!this.apiKey) {
            console.error('Please enter your OpenAI API key.');
            return;
          }

          try {
            // Limit the chat history sent with the request
            const recentHistory = this.chatHistory.slice(-CONFIG.CHAT.MAX_HISTORY * 2);

            const response = await fetch('https://api.openai.com/v1/chat/completions', {
              method: 'POST',
              headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${this.apiKey}`,
              },
              body: JSON.stringify({
                model: 'gpt-4o-mini',
                messages: [this.getSystemPrompt(), ...recentHistory, { role: 'user', content: userInput }],
                max_tokens: 250,
                temperature: 0.7,
              }),
            });

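            const data = await response.json();
            // Report API failures (e.g. an invalid key) instead of failing on a missing `choices` field
            if (!response.ok) {
              throw new Error(data.error?.message || `HTTP ${response.status}`);
            }
            const aiResponse = data.choices[0].message.content;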

            // Update chat history
            this.chatHistory.push({ role: 'user', content: userInput }, { role: 'assistant', content: aiResponse });

            // Update conversation history list
            this.updateConversationHistoryList(userInput, aiResponse);

            // Speak the response
            this.speak(aiResponse);
          } catch (error) {
            console.error('API Error:', error);
          }
        }

        updateConversationHistoryList(userInput, aiResponse) {
          // Update the conversation history UI
          const userItem = document.createElement('li');
          userItem.className = 'p-4 flex items-center justify-between';
          userItem.innerHTML = `<span class="text-gray-800"><span class="font-medium">User:</span> ${escapeHTML(userInput)}</span>`;

          const userPlayButton = document.createElement('button');
          userPlayButton.innerHTML = '<i class="fas fa-volume-up"></i>';
          userPlayButton.className = 'ml-4 p-2 hover:bg-gray-200 text-gray-700 rounded-full transition duration-200';
          userPlayButton.title = '音声を再生'; // tooltip shown on hover
          userPlayButton.addEventListener('click', () => this.speak(userInput));
          userItem.appendChild(userPlayButton);
          this.conversationHistoryList.appendChild(userItem);

          const assistantItem = document.createElement('li');
          assistantItem.className = 'p-4 flex items-center justify-between bg-blue-50';

          // Separate the English text from the Japanese translation
          let englishText = aiResponse;
          let japaneseText = '';

          const translationMatch = aiResponse.match(/\[JP:\s*(.+?)\]/);
          if (translationMatch) {
            englishText = aiResponse.replace(/\s*\[JP:\s*.+?\]/, '');
            japaneseText = translationMatch[1];
          }

          // Build the HTML structure
          assistantItem.innerHTML = `
            <div class="text-gray-800 flex-grow">
              <span class="font-medium">Assistant:</span>
              <div>${escapeHTML(englishText)}</div>
              ${japaneseText ? `<div class="text-gray-600 text-sm mt-1">${escapeHTML(japaneseText)}</div>` : ''}
            </div>`;

          const assistantPlayButton = document.createElement('button');
          assistantPlayButton.innerHTML = '<i class="fas fa-volume-up"></i>';
          assistantPlayButton.className = 'ml-4 p-2 hover:bg-blue-200 text-blue-700 rounded-full transition duration-200';
          assistantPlayButton.title = '音声を再生'; // tooltip shown on hover
          assistantPlayButton.addEventListener('click', () => this.speak(englishText));
          assistantItem.appendChild(assistantPlayButton);

          this.conversationHistoryList.appendChild(assistantItem);
          assistantItem.scrollIntoView({ behavior: 'smooth', block: 'end' });
        }

        speak(text) {
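          // Split the text into sentences, keeping a final fragment that lacks closing punctuation
          const sentences = (text.match(/[^.!?]+[.!?]*/g) || []).filter((s) => s.trim() !== '');
          this.speakSentences(sentences.length > 0 ? sentences : [text], 0);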
        }

        speakSentences(sentences, index) {
          if (index >= sentences.length) {
            // Do not restart speech recognition after a "[silence]" turn
            // Check the most recent user input
            const lastUserInput = this.chatHistory[this.chatHistory.length - 2]?.content;
            if (lastUserInput === '[silence]') {
              return;
            }
            // Otherwise, restart speech recognition
            setTimeout(() => {
              this.recognition.start();
            }, 100);
            return;
          }

          const utterance = new SpeechSynthesisUtterance(sentences[index]);
          utterance.lang = CONFIG.SPEECH_CONFIG.LANG.EN_US;
          utterance.volume = CONFIG.SPEECH_CONFIG.SYNTHESIS.VOLUME;
          utterance.rate = CONFIG.SPEECH_CONFIG.SYNTHESIS.RATE;
          utterance.pitch = CONFIG.SPEECH_CONFIG.SYNTHESIS.PITCH;

          // Stop speech recognition
          if (this.recognition) {
            this.recognition.stop();
          }

          // Play the next sentence
          utterance.onend = () => {
            this.speakSentences(sentences, index + 1);
          };

          if (!this.isFirstInteraction) {
            this.synthesis.cancel();
          }
          this.synthesis.speak(utterance);
        }

        startListening() {
          // Start speech recognition
          // On the first user interaction, initialize speech synthesis
          if (this.isFirstInteraction) {
            const silence = new SpeechSynthesisUtterance('');
            this.synthesis.speak(silence);
            this.isFirstInteraction = false;
          }

          this.recognition.start();
        }
      }

      // Create the instance and set up event listeners
      const voiceChat = new VoiceChat();

      // Set up UI events
      const startButton = document.getElementById('start-recognition');
      const apiKeyInput = document.getElementById('api-key');

      function saveApiKey(apiKey) {
        const secretKey = 'your-secret-key'; // keep this in a safe place
        const encryptedApiKey = CryptoJS.AES.encrypt(apiKey, secretKey).toString();
        localStorage.setItem('/EnglishTrainer/encryptedApiKey', encryptedApiKey);
      }

      function getApiKey() {
        const secretKey = 'your-secret-key'; // use the same key as for encryption
        const encryptedApiKey = localStorage.getItem('/EnglishTrainer/encryptedApiKey');
        if (encryptedApiKey) {
          const bytes = CryptoJS.AES.decrypt(encryptedApiKey, secretKey);
          const decryptedApiKey = bytes.toString(CryptoJS.enc.Utf8);
          return decryptedApiKey;
        }
        return null;
      }

      // Load the API key from local storage
      const savedApiKey = getApiKey();
      if (savedApiKey) {
        apiKeyInput.value = savedApiKey;
        voiceChat.apiKey = savedApiKey;
        startButton.disabled = false;
        startButton.classList.remove('opacity-50', 'cursor-not-allowed');
      }

      // Runs when the API key is entered
      apiKeyInput.addEventListener('input', () => {
        voiceChat.apiKey = apiKeyInput.value;
        saveApiKey(apiKeyInput.value); // encrypt and save the API key
        startButton.disabled = !voiceChat.apiKey;
        if (!voiceChat.apiKey) {
          startButton.classList.add('opacity-50', 'cursor-not-allowed');
        } else {
          startButton.classList.remove('opacity-50', 'cursor-not-allowed');
        }
      });

      // Runs when the start button is clicked
      startButton.addEventListener('click', () => {
        voiceChat.startListening();
      });

      // Save settings to local storage
      function saveSettings() {
        const settings = {
          englishLevel: voiceChat.englishLevel.value,
          sentenceLength: voiceChat.sentenceLength.value,
          showTranslation: voiceChat.showTranslation.checked,
        };
        localStorage.setItem('/EnglishTrainer/appSettings', JSON.stringify(settings));
      }

      // Restore settings from local storage
      function loadSettings() {
        const savedSettings = localStorage.getItem('/EnglishTrainer/appSettings');
        if (savedSettings) {
          const settings = JSON.parse(savedSettings);
          voiceChat.englishLevel.value = settings.englishLevel;
          voiceChat.sentenceLength.value = settings.sentenceLength;
          voiceChat.showTranslation.checked = settings.showTranslation;
        }
      }

      // Restore settings on page load
      document.addEventListener('DOMContentLoaded', () => {
        loadSettings();
      });

      // Save settings whenever they change
      voiceChat.englishLevel.addEventListener('change', saveSettings);
      voiceChat.sentenceLength.addEventListener('change', saveSettings);
      voiceChat.showTranslation.addEventListener('change', saveSettings);
    </script>
  </body>
</html>
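
One detail in the listing that is easy to miss: when 日本語訳を表示 (show Japanese translation) is checked, the system prompt asks the model to append the translation as `[JP: ...]`, and `updateConversationHistoryList` pulls it apart again with a regular expression. Here is that step as a standalone sketch. `splitTranslation` is a name introduced here for illustration, while the two regular expressions are the ones from the listing.

```javascript
// Sketch of the translation-splitting step as a standalone function.
// Returns the English part and the Japanese part (empty if none was found).
function splitTranslation(reply) {
  const match = reply.match(/\[JP:\s*(.+?)\]/);
  if (!match) {
    return { english: reply, japanese: '' };
  }
  return {
    english: reply.replace(/\s*\[JP:\s*.+?\]/, ''),
    japanese: match[1],
  };
}

const parts = splitTranslation('How are you? [JP: お元気ですか?]');
console.log(parts.english); // prints "How are you?"
console.log(parts.japanese); // prints "お元気ですか?"
```

Because `.` does not match line breaks, a translation that spans several lines would not be picked up and would stay inside the English text, so this relies on the model following the requested format.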

GitHub Gist


That's all.
