Research Intern - Azure Cognitive Services: Speech

Last updated 29 days ago
Location: Redmond, Washington
Job Type: Full Time

This description has been designed to indicate the general nature and level of work performed within this position. The actual duties, responsibilities, and qualifications may vary based on assignment.

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers. Our researchers and engineers pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

The Azure Cognitive Services team is on a mission to advance the state of the art in AI and deliver against our company vision for how intelligent cloud and intelligent edge will shape the next phase of innovation. The Cognitive Services team includes top researchers from across Microsoft to create a center of excellence in speech, computer vision and natural language.

As one of Microsoft’s most exciting AI initiatives, speech recognition is driving the accuracy and robustness of natural user interfaces based on spoken language and their deep integration into user-facing products and services, from Microsoft first-party to third-party applications. The Speech Group develops speech recognition features for Enterprise, Entertainment, Desktop, and Mobile products, particularly the voice platform that powers Microsoft 365 conversational transcription services, the Search and Assistant experience on desktop and mobile computing devices, Bing voice search, Microsoft Office dictation, and Skype speech-to-speech translation.

The Speech Group brings together talent in signal processing, speech recognition, statistical modeling, and deep learning to develop and deliver robust, natural, and scalable speech recognition across a rich set of scenarios and languages. Our team pioneered the industrialization of deep-learning-based speech technologies and has contributed key innovations to the speech community.

We are looking for Research Interns in speech recognition, speech separation and enhancement, speaker recognition and diarization, and audio processing. In this position, you will conduct both fundamental and applied research under the supervision of Microsoft mentors. As an intern, you will have the opportunity to prototype, demonstrate, and publish your results and, more importantly, to advance state-of-the-art technologies that impact millions of users.


Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, interns learn, collaborate, and network for life. Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, students are paired with mentors and expected to collaborate with other interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.


In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter.

Required Qualifications

  • Ph.D. candidate in speech recognition, speech separation and enhancement, speaker recognition and diarization, audio processing, deep learning, machine learning, AI, or a related field.
  • At least a year of research experience in speech recognition, speech separation and enhancement, speaker recognition and diarization, audio processing, deep learning, machine learning, AI, or a related field.

Preferred Qualifications

  • Strong algorithmic problem-solving and software development skills (C/C++, Python, etc.).
  • Experience with open source tools such as PyTorch, TensorFlow, etc.
  • Publication(s) in top-tier conferences or journals in related fields (e.g., ICASSP, Interspeech, ASRU, SLT, IEEE/ACM Transactions on Audio, Speech and Language Processing, Speech Communication, Computer Speech and Language, etc.).
  • Excellent communication and writing skills.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.