A Department of Homeland Security Center of Excellence led by the University of Maryland

START fosters a collaborative, multidisciplinary environment to create a team to meet the needs of the homeland security enterprise and train the next generation of scholars and practitioners. 

Improving Arabic Text Processing for Studying Social Unrest - Spring 2019



Deadline: November 11, 2018
Type: Internships
Semester: Spring
Semester Year: 2019

Description

START is excited to be collaborating with the University of Maryland Center for Advanced Study of Language (CASL) to offer the following project as part of our program.  CASL serves as the premier strategic research partner for the U.S. Intelligence Community (IC), solving the most critical and challenging language problems.

Large-scale analysis of text (including social media text) for computational social science benefits from the use of natural language processing (NLP) tools such as sentiment analysis, emotion detection, and authorship attribution/verification algorithms.  These, in turn, benefit from lower-level language processing tools (such as language normalization, morphological analysis, and named-entity detection) as well as from databases (such as sentiment lexicons, knowledge bases, and lists of named entities).

Both language processing tools and the databases they rely on work best in the domain for which they were developed.  When analyzing language in a new domain (such as tweets related to social unrest or crisis communication), the tools need to be adapted to that domain.  Likewise, as languages change and new words come into use, language tools need to be kept up-to-date or they begin to suffer from out-of-vocabulary problems. One way to improve NLP tools is through targeted annotation for previously unknown vocabulary items, such as domain-specific terms or newly coined words.
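As a rough illustration of the out-of-vocabulary problem described above, the following Python sketch counts tokens that are missing from a lexicon so that the most frequent unknown words can be reviewed first. The function name `find_oov`, the toy lexicon, and the sample tokens are all hypothetical and are not taken from the project's tools or data.

```python
from collections import Counter


def find_oov(tokens, lexicon):
    """Count tokens that do not appear in the lexicon (hypothetical helper)."""
    return Counter(tok for tok in tokens if tok not in lexicon)


# Toy lexicon and pre-tokenized text, for illustration only.
lexicon = {"في", "من", "الشارع"}
tokens = ["في", "الشارع", "هاشتاق", "من", "هاشتاق"]

oov_counts = find_oov(tokens, lexicon)

# Unknown words sorted by frequency: natural first candidates for annotation.
candidates = oov_counts.most_common()
```

A real pipeline would use the output of a morphological analyzer rather than exact string matching, since Arabic words carry clitics and inflection that a simple set lookup does not handle.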

Responsibilities

As an intern on this project, you will learn ways to use NLP tools to study social media Arabic, and how to improve and retrain those tools for better coverage of the data. Depending on your skills and language background, you may analyze and annotate Arabic words that failed to parse in state-of-the-art NLP tools.  Annotation will include a basic description of the reason for the failed parse (e.g., variant spelling, dialect terms, named entity), the probable meaning of the novel term, grammatical (e.g., part of speech) information for the term, and (if applicable and discoverable) the dialect(s) and/or domain(s).
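The annotation fields listed above could be captured in a simple record such as the one sketched below. This is only an assumed layout for illustration: the class names, field names, and the example entry are hypothetical and do not describe the project's actual annotation format.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import List


class FailureReason(Enum):
    """Reasons a word failed to parse, following the examples in the text."""
    VARIANT_SPELLING = "variant spelling"
    DIALECT_TERM = "dialect term"
    NAMED_ENTITY = "named entity"
    OTHER = "other"


@dataclass
class Annotation:
    """One annotated word that an NLP tool could not parse (assumed schema)."""
    token: str
    reason: FailureReason
    probable_meaning: str
    part_of_speech: str
    dialects: List[str] = field(default_factory=list)  # if discoverable
    domains: List[str] = field(default_factory=list)   # if applicable


# Illustrative entry only; not drawn from the project's corpus.
example = Annotation(
    token="هاشتاق",
    reason=FailureReason.OTHER,
    probable_meaning="hashtag (loanword)",
    part_of_speech="noun",
    domains=["social media"],
)

record = asdict(example)  # plain dict, convenient for export to a file
```

Keeping the dialect and domain fields as optional lists mirrors the "if applicable and discoverable" wording: an annotator can leave them empty when the information cannot be determined.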

Annotation will begin on a corpus of social media Arabic collected by CASL’s Computational Cultural Assessment research team related to social unrest, and may include additional corpora on other topics during the course of the internship.  Depending on skills and interests, you may help collect new corpora related to Arabs’ reactions to current events or other topics of interest.

Supervisor(s):  C. Anton Rytting (crytting@umd.edu), Julie Yelle (jyelle@umd.edu), and Paul Rodrigues (prr@umd.edu)

Deadline: Sunday, November 11, 2018, 11:59pm

Citizenship Requirement: US citizenship is not required.

Qualifications:

Required

  • Basic proficiency (at least 3 semesters’ instruction if not native) in Modern Standard Arabic, with additional proficiency highly preferred
  • Ability to type in the Arabic script
  • Interest in human languages or language technology

Preferred

  • Familiarity with dialects of Saudi Arabia (Najdi, Hijazi) or Egypt (e.g., Cairene) highly preferred.
  • Structural understanding of Arabic grammar (especially morphology) preferred.
  • Basic familiarity with Linux and/or with natural language processing (NLP) preferred.
  • Some basic programming skills (e.g., Python or Perl) are helpful, but not required.

Team Meeting Times: TBD

Work Location: UMD Patapsco Building (near the College Park Metro station) or CASL’s main building (7005 52nd Ave, College Park, MD). Working remotely is not permitted, except by prior approval and arrangement with supervisors.


General Information for all START Internships

Location:

START Headquarters is located in the Discovery District in College Park, MD. Our exact address will be provided when you are invited for an interview. All internship hours must be completed at this office unless otherwise specified. Working remotely is not permitted.

Schedule Requirements:

Orientation Date: Thursday, January 24th, 2019. All interns are required to attend orientation. You may be required to attend an additional day of orientation on Friday, January 25th, 2019. Your supervisor will inform you if you are required to attend both days.
Internship Duration: Thursday, January 24th, 2019 to Friday, May 10th, 2019. All interns must be able to commit to the duration of the whole program.
Work Hours: All interns must work at least 10 hours per week during the Spring 2019 program. Work hours are scheduled from Monday to Friday, 9:00am-5:00pm. Interns may not work shifts longer than 8 hours.

Other Information:

All internships are UNPAID and START is unable to provide travel stipends or housing arrangements.
We strongly encourage and recommend that interns seek academic credit for their internship through their home institution or department, if possible.
If undertaking the internship for credit, you must indicate this on your application form. Be sure to notify your internship supervisor if you need to work more than 10 hours per week for this reason.
Applicants interested in applying for an internship for any semester other than or in addition to Spring 2019 must submit a separate application for each semester with the correct application form for that semester.

How to Apply for START Internships:

START is currently accepting applications for Spring 2019. The spring application form will be open until Sunday, November 11th, 2018 at 11:59pm. However, some projects for the spring semester have an earlier application deadline of Sunday, October 28th, 2018 at 11:59pm. Late, incomplete, or incorrectly submitted applications will not be considered. To access the application: click here.

Notes:

Applicants must pay close attention to the requirements of each internship they are applying for, including attendance at team meetings and minimum time commitment. Inability to attend compulsory meetings or work the minimum required hours will result in the revocation of any offer made.
Address your cover letter to the internship supervisor of your first choice project.
Failure to complete the application form in full, including the selection of 1-3 internship preferences, could result in your application being rejected without further consideration.
Failure to submit the proper materials according to the directions provided in the project description could result in your application being rejected without further consideration.
Due to the high volume of applicants, only top candidates selected for an interview will be contacted.
Applicants may be asked to attend more than one interview.
Any successful candidate will be asked to respond with a firm acceptance within 48 hours of the offer being made. Failure to respond could result in the vacancy being passed to another candidate.
Any questions regarding the specific requirements for the internship vacancy should be directed to the supervisor(s) listed for the project.
Any questions regarding the application process should be directed to the START Education Team at internships-start@umd.edu.

Application Materials:

All internship applicants must submit all materials in one .pdf file using the file name format:

LastName, FirstName_InternCandidate.pdf or .doc.

The internship application packet should include the following documents in the following order:

One-page cover letter
One-page resume
Official or unofficial transcript(s)
Two-page writing sample (Communications applicants must submit two writing samples.)

Note for International Students:

START welcomes applications from international students for all of our internships where US citizenship is not a requirement (see the qualifications listed for each project for details).

It is, however, the responsibility of the applicant to ensure that their visa or immigration status permits them to undertake an unpaid internship. It is also the responsibility of the applicant to ensure that all proper paperwork, such as documented approval from their home institution, is available and processed in time for the start of the internship. Failure to comply with these stipulations, or to provide the paperwork required to verify their status, will result in the internship offer being rescinded without further consideration. START is unable to sponsor visas for non-US citizens due to the short timeline of our program and the lengthy processing time for visas. Unfortunately, this largely limits our ability to accept anything other than F-1 visas on regular, not OPT, status.