Find out more about studying Mathematics at Sussex.

Subject and course videos

Important note on the information outlined on these pages

Some information in older videos may be out of date. See our course pages for the latest information, including fees, modules, scholarships, placements and our rankings. It indicates what you can learn based on recent teaching.

There may be some changes due to staffing availability, different areas of research interests, student feedback and timetabling demands. Depending on the nature of the Covid-19 pandemic, we may also have to make changes to respond to this – please visit our advice for applicants pages for the latest information in relation to this.

If you would like to request a transcript of one of the videos shown on these pages, please email

Data Science webinars

Watch our recent taster lectures focusing on our Masters courses in Data Science and Human and Social Data Science.

  • Webinar transcript

    Rob Batchelor

    Good afternoon, everyone. Or good evening, maybe, depending on where you are. Welcome to the session on Data Science. I'm joined by Professor Enrico Scalas who's the convener of the Data Science programme, and he will be presenting for 40-45 minutes with a taster lecture. So I'm going to hand over to Enrico.

    Enrico Scalas

    Thank you very much. So I hope you can see the screen and hear me.

    I would like to apologise in advance for the following reason: where I am standing now, sometimes I have small breaks in internet connection and my voice is typically fading. So if this is happening, maybe Rob will tell me and I can repeat the part that faded. So welcome to this taster session on the MSc Data Science. I think I can probably start straight away, so the topic I would like to address is the future of data science.

    Of course, this is a taster session, and so I am putting together several topics that typically are covered in modules. This is not essential, it's sort of an overview seminar. It is not the typical lecture you would receive on one side, but on the other side, it gives you an idea of what is going on during lectures.

    So here you have my contacts in case you would like to write to me separately. I am the convener of MSc Data Science, as Rob mentioned, and I'm also working in the Department of Mathematics as a probabilist. So, this is some self-promotion of the University of Sussex to start with. And indeed, ours is one of the most beautiful campuses in the United Kingdom, I would say. It is in the heart of the South Downs National Park, so you are surrounded by a natural landscape of typical English countryside. It's not what it used to be, maybe 2000 or 3000 years ago, but still it is very typical and characteristic of England. Essentially, when you are in Brighton and you go north towards London, there is a small range of hills which are called the South Downs, and Falmer, where the University is located, is essentially in this National Park.

    Just to mention also that the connection by train from Brighton Station is quick. Normally it takes about 9 minutes from Brighton Station to Falmer Station, and that's why I am saying here we are only minutes away from the lively, diverse and student-friendly seaside city of Brighton, and also we are quite close to London. We do not have a direct connection to London, but you can find trains to London both in Brighton and in Lewes.

    These are the nearest stations with direct connections with London, and Lewes is even closer than Brighton. I think it's 5 minutes from Falmer Station. So these are pictures of the campus. This is the chapel; it's a multi-confessional chapel. And here you have some service buildings, and this is the slope going to the library.

    It's a picture taken from the library. Okay. So let's now go into more details, and essentially focus a little bit on the future of artificial intelligence. Of course, there is a famous joke that has been attributed to several people, but Quote Investigator (which is a website) has attributed this quote to the Danish politician Karl Kristian Steincke, who is actually a Danish politician with a German surname.
    However, I don't know what the correct Danish pronunciation would be. It is in his 1948 autobiography, which had the nice title 'Bye and Thank You', and the quote is: 'It is difficult to make predictions, especially about the future'. So in Danish it would be something like 'Det er vanskeligt at spaa, isaer naar det gaelder Fremtiden.' Something like that.

    Sorry to any Danes who are here for my bad pronunciation. However, I have identified three main important trends both in artificial intelligence and in data science. The first one is probabilistic causality. The second one is the approximation of solutions of partial differential equations using machine learning tools. And the third one is rigorous results on machine learning and deep learning, to really understand why the methods work. Let us start from point one, probabilistic causality.

    So if you remember how all the fuss with data science started, there is a famous article in a popular science magazine. Actually it's more than a popular science magazine, it's a magazine that is magnifying a little bit the Silicon Valley, etc. So Wired, yes. In Wired they wrote, maybe a little bit more than ten years ago if I remember correctly, this article or this column in which the author was essentially claiming that with the large amount of data we have available, as you have seen in the presentation by Julie Weeds in the previous talk, we do not need theories or models anymore, but we just analyse the data and we extract the truth from the data. So this is the idea of that particular article: science without theory and without models. To be fair, this is completely wrong, in my opinion, for several philosophical and also practical reasons. But luckily there has been, in computer science and artificial intelligence and data science, this trend of using probabilistic causation models in order to understand what is going on, how data is really generated, and what the relationships are between the variables that make up that data.

    So, for instance, connected to probabilistic causation, there is a theory of Directed Acyclic Graphs that is represented in this picture here, where essentially every node, these circles here, represents a variable and every link here represents a causal relationship. So for instance, here you have a red node labeled with D, which is the direct cause of this green node labeled by S.

    So this could be, for instance, a disease causing some condition. And this is the most straightforward case of a Directed Acyclic Graph, where you have essentially a simple direct cause and effect. So the analysis of this situation is rather straightforward. There might be functional relationships between S and D. So typically, for instance, if you are considering statistics, statisticians use what they call general linear models, and maybe there is a linear relationship between S and D. But in the theory of probabilistic causality, whose main contributor is Judea Pearl, it is generally not necessary to define the specific functional relationship between variables. You may have situations in which you, for instance, find correlations between D and S. So these two variables may be strongly correlated, but there is no direct causal relationship between D and S; they are correlated just because they have a common cause. This is the case in which you have a common cause, and this is represented by this variable R, which is a cause of D and a cause of S.

    So this is a typical situation where you have a so-called spurious correlation, and if you do not have a theory or an idea of how your data is organised, you will make the mistake of assuming that one of these two variables is the cause of the other, especially if you do not have specific measurements of R. The third case that I am representing here is the common cause plus a direct cause. So it's another possible Directed Acyclic Graph, where R is a common cause of both D and S, but then D also has a causal relationship with S; in particular, D is a cause of S.

    So if you can suppress by experiment, for instance, the effect of R, you will be able to see the direct effect of D on S, but if you cannot, R is acting as a confounding variable. And again you are in trouble if you do not know anything about R and you see strong correlations between D and S, and maybe you even have a theory that D is the cause of S, but you cannot understand the perturbation of the confounding factor.
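As a rough illustration of the common-cause case described above, the following Python sketch simulates a variable R that drives both D and S, with no direct link between D and S. The linear relationships and Gaussian noise are assumptions made only for this example (the talk notes that the theory itself does not require a specific functional form), and the function names are hypothetical.

```python
import numpy as np

def simulate_common_cause(n=100_000, seed=0):
    """Simulate R -> D and R -> S, with no direct link between D and S."""
    rng = np.random.default_rng(seed)
    r = rng.normal(size=n)        # common cause R
    d = r + rng.normal(size=n)    # D depends on R plus independent noise
    s = r + rng.normal(size=n)    # S depends on R plus independent noise
    return r, d, s

def residual(y, x):
    """Subtract the best linear fit of y on x, i.e. adjust y for x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r, d, s = simulate_common_cause()

# Correlation between D and S when R is ignored (spurious correlation).
raw_corr = np.corrcoef(d, s)[0, 1]

# Correlation between D and S after adjusting both for the measured R.
adjusted_corr = np.corrcoef(residual(d, r), residual(s, r))[0, 1]

print(f"corr(D, S) ignoring R:      {raw_corr:.3f}")
print(f"corr(D, S) adjusting for R: {adjusted_corr:.3f}")
```

In this toy model D and S are clearly correlated when R is ignored, and the correlation largely disappears once R is measured and adjusted for, which is the situation the speaker warns about when R is not observed.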

    So this is a very important part of the future of data science in my opinion. A second topic that I would like to cover is the approximation of solutions to differential equation problems, in particular, partial differential equations. Why am I focusing on this? Because essentially deep learning, and in general neural networks and other machine learning tools, are quite powerful interpolation tools.

    And so they are very good at solving mathematical problems for us. And they do it very quickly (of course, after a training phase). If you are interested a little bit in machine learning, you know that neural networks are layers of neurons connected to other neurons, with activation functions, which may be non-linear, and weights. And if you have, say, some input variables here and some output variables here, you start teaching the neural network to recognise the output variables, given the correct input variables, using several methods, including the famous backpropagation algorithm, which starts from the output and computes the weights, optimising essentially some distance between the output of the network and the actual outputs. If you do all this, it takes a lot of time, but once you have done that, it takes a second, even less than a second, microseconds or below, to get the result of a very complex computation. So this is an example from the paper of some colleagues of ours from Airbus and Imperial College, who are using neural networks to solve partial differential equations, and in particular the partial differential equations that you would have to solve to study the profile of a wing for an aircraft. So, Sanjiv Sharma, I think, is working for Airbus and Francesco Montomoli is working for Imperial College.
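To make the training loop described above a little more concrete, here is a minimal sketch of backpropagation for a network with one hidden layer, written with NumPy only. The target function, layer size, learning rate and number of steps are all arbitrary choices made for illustration; this is not the method of the paper mentioned in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = x^2 on [-1, 1] (an arbitrary target for illustration).
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = x ** 2

# One hidden layer of 16 tanh neurons; the sizes are arbitrary.
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros(1)

def forward(inputs):
    """Return hidden activations and network output for the given inputs."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

learning_rate = 0.05
losses = []
for step in range(2000):
    h, pred = forward(x)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))  # mean squared error

    # Backpropagation: start at the output and apply the chain rule backwards.
    d_pred = 2.0 * err / len(x)
    grad_W2 = h.T @ d_pred
    grad_b2 = d_pred.sum(axis=0)
    d_hidden = (d_pred @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = x.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)

    # Gradient-descent update of the weights and biases.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The slow part is the loop; once the weights are fixed, a call to `forward` is a couple of matrix products, which is the speed advantage the speaker refers to.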

    So this is an example of this. And to be fair, the methods are very good. However, you have to be very careful when you use them. So I am really very worried for the future, again, for this reason. So we are now building these beautiful machine learning tools, and they are able to solve very complicated problems.

    However, they work very well only in a narrow range of parameters and variables. If you use them in another range, they will give you a completely wrong answer. And of course, if you are an expert in the field and you are using these methods, you know that. But now let us consider a situation where people are not studying these things deeply and they are just users.

    And a generation passes, and maybe they transfer this knowledge, right? And then another generation passes and this knowledge is transferred. But maybe then this knowledge is lost and you use the machine learning tool in some wrong regime. And if you are using it for critical problems such as the design of an aircraft, you can cause disastrous outcomes.
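The warning about using a fitted model outside the range it was trained on can be sketched with a much simpler interpolator than a neural network. The example below deliberately swaps in a degree-5 polynomial least-squares fit as a stand-in; the target function sin(x) and the two ranges are assumptions chosen only to show the effect.

```python
import numpy as np

# "Training": fit the surrogate on x in [0, pi] only.
# sin(x) stands in for an expensive computation we want to approximate.
x_train = np.linspace(0.0, np.pi, 200)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=5)

def surrogate(x):
    """Evaluate the fitted polynomial surrogate at x."""
    return np.polyval(coeffs, x)

# Compare the surrogate with the true function inside and outside
# the range it was fitted on.
x_inside = np.linspace(0.0, np.pi, 50)
x_outside = np.linspace(2.0 * np.pi, 3.0 * np.pi, 50)

error_inside = np.max(np.abs(surrogate(x_inside) - np.sin(x_inside)))
error_outside = np.max(np.abs(surrogate(x_outside) - np.sin(x_outside)))

print(f"max error inside the fitted range:  {error_inside:.2e}")
print(f"max error outside the fitted range: {error_outside:.2e}")
```

Inside the fitted range the surrogate is accurate; outside it the error is larger than the values of the function itself, and nothing in the surrogate's output signals that it has left its valid regime.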

    Okay. So, lurking in the background here, there is a point: if your knowledge is not deep enough, if you do not study enough, you may be the source of a disaster. And finally, the third point I wanted to briefly mention, because it's probably outside the knowledge of most of you, is proving rigorous theorems on some aspects of deep learning. It is very important because it is only by using mathematics and using theorems that you can be sure that something is correct.

    Okay. And this is also resonating with the previous aspect. Okay. So I see that there are several questions coming in the chat and I will stop sharing the screen for a second and have a look at questions. So the first question is about the deposit fee and I cannot reply but I see that there has been a reply in the discussion.

    Let me see if I can find any questions. So hi to everyone, by the way, again. Assistance on applications. Yes. So scholarships are available and I will post a link. Essentially, if you look at the links in the main session, there is a link where you can find the information on the MSc programmes. Maybe I can show you how you can do that.

    So, for instance, let me share the screen once more.

    And let us abandon the presentation for a second and let us go to a web browser. So I'm looking for the website of the University of Sussex, here, for instance, and I want to study Data Science Master's. Search and you will see the programmes in data science here. And let us pick, for instance, the data science MSc here, and here you have plenty of information including the typical requests for application.

    Like you should normally have an upper second class undergraduate degree or above. And the qualifications are clearly physics, engineering, science, computing, mathematics or life sciences. You might have other professional qualifications that can be taken into account, but of course if your background is, I don't know, in history, psychology, philosophy, economics, we have to check whether you may stand a chance of passing our exams.

    But if you go down here, you pass the modules, you pass our pictures, and here there is information on fees and scholarships. And here is 'How can I fund my course?' So in here you can see several opportunities for scholarships. It seems that there are scholarships for students coming from certain countries here. This is depending on where you are coming from.

    So I wanted also to point to the Artificial Intelligence and Data Science postgraduate conversion scholarships for people from groups currently underrepresented in the fields of artificial intelligence and data science. And I think if you go here and find out more, you can find if you are eligible for this particular funding. So you can see also the number of scholarships available this year, 25, the deadline for applying and how to apply.

    Good. So let me stop sharing again for a second.

    Yes. What degree background is necessary to study data science? I think I answered this question just now.

    What are the fundamental topics that one needs to prepare for September? I would like to mention some programming: for instance, if you have experience with Python or R, you could revise this, and probability and statistics, basic probability and statistics. If you are applicants, you will receive a specific letter with a link to where you can find further information. The online version of these MScs is not yet available.

    We are working on an online Data Science MSc, but this has not yet been approved by the University of Sussex. Fundamental subjects they should know before starting this program: so I was telling you, Python, R, other programming languages and also probability and statistics. Many of these questions are on the same topic, so I will go on with the presentation.

    I have 10 minutes more, roughly. Maybe a little bit more. It should be 1330. So it is time to share screen again.

    So just to give you an idea of the things that you will learn in the programme, I have here created a first Monte Carlo R program. So now you are probably aware of the debates in data science between Python and R. Now we have a series of modules using Python, and we also have included for the next academic year, especially for those students coming from backgrounds where they have not [worked with] Python before, Programming through Python modules.

    But if you are not a programmer or you have never programmed, you might consider that this is not really the MSc program for you. Because we cannot teach you to program as if you were a student at the beginning of your bachelor's studies. So we have to [assume] that at least you have seen some programming in the course of your studies.

    However, we give you plenty of examples. What you can see here is code in R. By the way, even if Python is present in many modules that you might follow, you might have to learn how to program in R if you are not yet able to do so. And even MATLAB or even other programming languages.

    So you need to have great flexibility and a great will to learn, and also to self-learn how to program, even though we will try to provide you with all the material which is needed. But be careful that again, if programming is not your cup of tea, these particular Masters are not suitable for you if you do not like programming or you do not like to work for hours on a computer.

    So this example here is taken from my Monte Carlo simulation module. It is very trivial, so let me run it first and then I will comment. Let's see if it gives some error. No. So what you see on the screen is a histogram of random numbers that are uniformly distributed between zero and one.

    So the random numbers uniformly distributed between zero and one are generated by this command here, 'runif', in R. In other languages the syntax is slightly different, but more or less this is the situation. This is 1 million points. So the capital N is the sample size. And this command 'hist' is creating the histogram. This 'floor' of the square root of N is computing the square root of N and taking its integer part, which is the number of bins. And probability 'equal to true' in the command 'hist' means that you normalise the histogram so that the integral below the histogram from zero to one is equal to one. So it is providing a probability density function, for those of you who know what a probability density function is.

    Incidentally, if you do not know what a probability density function is, you might again reconsider and think, if you have never heard about the probability density function then perhaps this is not the right program for you. Or at least you might be willing to study probability and statistics using some textbook, some links, etc. before you come. And then the rest is just to plot the theoretical value, which is just one. From zero to one and zero elsewhere.

    And this first plot command is plotting the histogram points here as points between zero and one. And in the Y and the X axis and between zero and the maximum 0.3 on this end. And then this lines command here is superimposing the theoretical value of the probability density function. So you see perhaps this blue line here. And this blue line here is the theoretical value of the probability density function.

    And you see that the histogram of the empirical Monte Carlo simulation gives you fluctuations around this line. And this is a typical behavior of histograms generated with Monte Carlo simulations. And as you increase the sample size, the dispersion becomes smaller and smaller as a consequence of a theorem by Glivenko and Cantelli, which is in itself a consequence of the strong law of large numbers.
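The R program shown in the webinar is not reproduced on this page. As a hedged Python analogue of the steps the speaker describes (uniform random numbers, a number of bins equal to the integer part of the square root of N, and a histogram normalised to a probability density), one could write something like the sketch below. The function name and seed are hypothetical, and plotting is omitted so that the example needs only NumPy.

```python
import numpy as np

def uniform_histogram(n, seed=0):
    """Histogram of n uniform(0, 1) samples, normalised to a density."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0.0, 1.0, size=n)   # plays the role of runif in R
    bins = int(np.floor(np.sqrt(n)))          # integer part of sqrt(N)
    density, edges = np.histogram(
        samples, bins=bins, range=(0.0, 1.0), density=True
    )
    return density, edges

# N = 1 million, as in the talk.
density, edges = uniform_histogram(1_000_000)

# The area under a density-normalised histogram is one.
area = np.sum(density * np.diff(edges))

# The theoretical density is 1 on [0, 1]; the bin heights fluctuate around it.
max_deviation = np.max(np.abs(density - 1.0))

# The fluctuations shrink as the sample size grows.
spread_small = np.std(uniform_histogram(10_000)[0])
spread_large = np.std(density)

print(f"area under histogram: {area:.6f}")
print(f"largest deviation from 1 (N = 10^6): {max_deviation:.3f}")
print(f"spread of bin heights: N = 10^4 -> {spread_small:.3f}, "
      f"N = 10^6 -> {spread_large:.3f}")
```

Here `density=True` corresponds to the normalisation option the speaker describes for 'hist' in R, and comparing the two sample sizes shows the dispersion around the theoretical value decreasing.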

    Yes. So this is an example of what we teach. So it's a good time to go back to the chat and see if there are questions. So there is a question: if students can have extra credits to offer more courses during the semester. You mean if you can follow more than the prescribed modules or courses?

    In principle you can, but it is not recommended for the main reason that already following the standard modules, your time will be full essentially. So I would not recommend to follow extra modules. You can follow them as an auditor, so someone who goes there to watch the lectures, but I would not recommend taking the exam. Secondly, I wanted to find out if we can get resources to prepare ahead of the session commencement. Yes. So as I was telling you before, the applicants (those who have applied and accepted the offer, etc.) will receive, after the acceptance of the offer etc., a refresh letter from the university signed by the conveners. So it is formally signed by me in the case of data science, where there are several links and hints on this. I see that Rob is also adding something.

    Is there any possibility of coming with your spouse or family, etc.? Be careful because this is the United Kingdom. It's a country with a Home Office and it has visa rules. So go to the website; at least they are quite clear. So they give you most of the details that you need to know.

    And I will now go back to sharing the screen to conclude my presentation. I still have, say, 5 minutes maximum. Just to say that if you come to Sussex you will be sort of embedded in a research environment where many people are working, using data science as a tool, or their main research is data science. We have a research program which is called DISCUS, on which I will tell you a couple of words, and then research in data science is present in the Department of Informatics, of course, Mathematics, Physics and Astronomy, but also in the Sussex Humanities Lab.

    DISCUS is the Data-Intensive Science Centre at the University of Sussex, and they offer several helpful elements and tools for MSc students. For instance, they collect interdisciplinary MSc projects for your final dissertation. As mentioned by Julie, there is the Industry and Peer Mentoring programme, and a Student Challenge set of events, and also informal data science seminars for PhD students and Masters students. Final words on your career. Of course, there are two kinds of data scientists: developers of new ideas, whom we might call developers and creators, and then there is a set of users of existing methods. So my advice to you is to become developers and creators. It might be more difficult, but if you become just a user of existing methods, there is a very fast obsolescence in data science.

    And in five, ten years you may face difficulties and even be out of the job market if you are only users. As mentioned by Julie, we have contacts with several companies. This does not, however, guarantee for those of you who are interested in an industrial placement that you will be getting the industrial placement, because the industrial placement is like applying for a job.

    Essentially you have to prepare your CV, you have to submit your CV, the company has to come back and invite you for an interview, and they have to like you at the interview before offering you the placement. So this is the reason why the placement is not guaranteed. And in the past, there were several questions about professional accreditation in the United Kingdom.

    Currently, there is no professional accreditation for data scientists and data science, but it is forthcoming, as you can see. The Royal Statistical Society is leading efforts to create the profession of data scientists, which will have specific requirements. And finally, as I was telling you before, you can always go in and check our webpage.

    You can contact me and people in admissions, etc. And so thank you and goodbye. Actually, this is the end of the presentation and I can stop sharing the screen. And we have 3 minutes for further comments.

    Rob Batchelor

    And just to answer a quick question that's on the chat, English language requirements from Ghana - we have those requirements listed on our website for all countries, for English language.

    So even if you're from the UK, we still need to see evidence that you can use English to the correct level. So you can find that on our website. For Ghana we accept the results from WASSCE so you will be able to use your school requirements for your master's application.

    Deferrals are usually possible. I recently joined Sussex, so I may not be up to speed with the exact requirements of whether it's possible to defer, and if so, for how many times. So do check the details on your offer letter once you've received that and we'll let you know. Last recommended date for deposit payments?

    Well, if you need to have a CAS then you need to provide that for us as soon as possible so we can get your CAS because we won't release that until we've had the deposit. So that's worth bearing in mind, you need to plan in time for your visa application, which can take a long time depending on when. And the global pandemic obviously delayed things. Just running through the list of questions. Thank you for the extra link there. The scholarship requirements and the amounts are vastly different depending on the awards. We have scores of them, so I'm not fully up to speed with exactly how much each one is worth. But generally speaking, if you're awarded more, if you're successful for more than one Sussex scholarship, we will usually give you the higher value awards, not the lower.
    However, all scholarships are, as you would expect, competitive, so it's probably quite unlikely. You may be successful in getting an external award, for example, achieving a scholarship that could be combined with Sussex Awards, but normally not to more than the value of the tuition fees. How long is the application process? We like to think it's pretty swift. That will vary.

    Swift will vary depending on what time of year it is and how many applications we have. The earlier you apply, the faster you get a decision, generally speaking, because the longer you leave it, the more applications we have to deal with.

    Scholarship requirements are all listed online. It does vary from award to award. There are some scholarships which are only for certain nationalities. And then I'm just going to get the link back to the main session as well, so I'm going to paste it in the chat box so that you can follow the link.

    I've just pasted that. It'll take you back to the main Zoom session. And so we are just finished here. So that is us. Thank you very much for listening. I certainly learned a lot. I'm not sure I'm going to be joining data science anytime soon. So it's a good thing I've already got employment with the university. Thank you Enrico for that presentation, it was really interesting. Nice to see all of you from so many different places. We hope that you will join us at Sussex.


  • Webinar transcript

    Isobel Hussey

    OK. So I think that's probably everyone now.

    And I just want to take the opportunity to say hello. So I'm Isobel and I work here in the international office and I help students join us from overseas.

    So predominantly I work with students who are joining us from Central and South Asia.

    But I can also help with various different queries and things like that.

    But enough from me. I'm going to hand back over to Julie now.

    Julie Weeds

    Thank you. OK, thank you, Isobel. Hello, everybody, again.

    You've got me again for this session as human and social data science potential students.

    Just to say that I'm convenor for the human and social data science course.

    So I would be a point of contact for you on the course and would certainly be talking to you in induction and be talking to you at a time of setting up projects, etc. Luke is saying he hasn't got audio. Can other people hear me?

    Luke, I think it's just you.

    OK, so the point of this session is just to talk a little bit more about what it would be like to be doing the human and social data science course.

    And we set this up as a taster lecture, so what I thought I would do is talk to you a little bit about one module in particular which is a module that actually students on any of the three degrees might take.

    But it is one that is on offer to human and social data science.

    I'm sure this might give you a bit of a flavour of what it would be like to be on this course and what kind of skills you might want to work on in advance or what skills you will be learning once you get here.

    So a module that I teach in the first term is applied natural language processing, so that's what I'm going to be talking to you a little bit about this morning.

    It may be that you come and do this degree and don't do this module at all.

    So that's, you know, this is not a core module to human and social data science.

    However, it is an option and it is quite a popular option. A lot of students do choose this option, and students may want to know more about it to be able to make an informed choice as to whether it's the right option for them. And also, whether you do this module or not, I think it will introduce you to the kind of skills you will need.

    Because one of the things that we'll be talking about over the next half an hour is Python programming, which is something we do in this module, and is something that you will be doing on at least three of your core modules, I think, in the first term, and also in machine learning in the second term, and you'll also be doing that on your dissertation projects.

    So learning to programme in Python is something which you will be doing a lot of next year.

    And again, as human and social data scientists coming in, we don't expect you to have done any programming at all beforehand, although we are, as of this year - and you will hear more about this should you accept your offers - offering an optional pre-sessional course in Python programming to help you hit the ground running.

    But you will be learning to programme in Python through the different core modules and optional modules that you will be taking.

    OK, so I said I was going to talk about applied natural language processing.

    Let me just share my screen and let me run my slides.

    OK. So as I said, applied natural language processing may well be an option that you might choose to do in the autumn, so some of these slides are taken actually from the first lecture that I would give as part of that module.

    I've adapted them slightly for today's purposes, but it's quite similar to what you might find yourself hearing in a lecture in the first week in the autumn.

    OK, so first thing that we would need to think about on this module is: what actually is applied natural language processing?
    So I invite you to kind of try to break that phrase down a bit more.

    What actually is natural language? Well, natural languages are languages which were invented by humans in order to communicate with humans.

    So they include English. We're talking in natural language now, but obviously they also include other languages French, German, Chinese, Arabic, all of the languages that exist in the world, which humans have invented in order to communicate with other humans.
    When we're talking about natural language processing, what we're doing is we're trying to get computers to handle natural language inputs and outputs.

    Maybe only one of those. It might not be doing both inputs and outputs.

    But certainly, we are thinking about that automatic processing of natural language, which might be in the analysis of the natural language, so it might be trying to... mine text for information, or it may be that we're trying to do some natural language generation as well.

    When we're thinking about applying natural language processing on this module, what we're really doing is we're thinking of it in quite an applied way.

    We're thinking about using computer tools and applications to do interesting things with natural language.

    So it's not necessarily about the... we need to know something about the underlying techniques to understand how to use them, but it's more like driving the car rather than kind of building the car ourselves.

    If you want to learn more about building the car of natural language processing, then you would want to take the advanced natural language processing module in the second term.

    But here, what we're doing is we're thinking about: what can we get computers to do in this area?

    How can we use computers to process text and be able to analyse it in order to gain insights from large amounts of text data?

    Maybe we've got millions of tweets that we've collected on a certain hashtag, and we want to know what people think about a certain situation, maybe a sort of political situation in the world, or maybe it's a product a company's just released.

    So that would be one application.

    But I'd like you to think a little bit, as you should hopefully already be doing, what other applications of natural language processing are there?

    And if we were actually in a lecture, I would probably invite you to talk to each other about it.

    Or certainly, I would ask you to kind of put some ideas in the chat if we were doing this online, but hopefully we won't be in a hybrid or online situation in the autumn and we'll be in a lecture theatre or a seminar room, and we could actually just have a chat about this. What applications of natural language processing can you think of? I'm just going to go to Luca's question because he's just asked whether we can take the advanced module as part of this programme.

    We don't usually encourage our human and social data science students to take the advanced natural language processing module.

    That is a good point, partly because it is more mathematical and more programming orientated than this first kind of introductory module.

    But it would be possible, if you and I, as the course convenor/module convenor, felt that you had the appropriate skills to be able to take that module, it could be swapped in to your degree.

    That would be possible if you wanted to do that.

    Or it might be if you've got those skills, you actually think, Well, actually, I want to do data science rather than human and social data science.

    So that's something that we can talk about before you apply or even once you've applied.

    OK, so hopefully you've been thinking about what applications of natural language processing there are.
    And here are quite a few that I thought of.

    So natural language processing applications include: being able to retrieve and rank documents on the web relevant to the query, providing answers to questions that people ask automatically to be able to understand the question, be able to retrieve the correct information and give an answer to those questions.

    Translating documents or speech from one natural language to another counts as natural language processing.
    We might want to simplify our documents for a child or non-native speaker.

    We might want to summarise documents because we want to take something which is pages long and reduce it to a summary that's much shorter to read.

    One that I started, which is, I think, probably very relevant to human and social data science is opinion monitoring.
    So trying to find products or companies with good or bad reviews.

    There's user content moderation. So trying to identify when somebody says something which they shouldn't on social media or chat, trying to automatically anonymise things or filter out that inappropriate content automatically.

    Automatic recommendation, trying to match up products and jobs and people.

    So maybe doing some targeted advertising. Then there's another one which many people think of, I think we're all getting used to chat bots and virtual assistants such as Siri and Alexa. They are applications of natural language processing.

    And there are many more which you may have thought about. OK, so why would you study applied NLP?

    Well, I've crossed this one out because this one probably doesn't apply to you as human and social data scientists.

    This applies more to the kind of artificial intelligence people. So I've crossed that one out.

    But what really does probably apply to you is that, lots of the data that we want to process comes as text.

    We want to be able to extract information and usable insights through text and answer questions such as: what do people think about... Donald Trump? Or Vladimir Putin? Or to come out of the political domain, what do people think about the new Apple MacBook? Whatever we might want to be able to extract. Identify the opinions that people have based on what is being said, potentially on the internet, on social media, and be able to analyse those opinions.

    It's also an important part of policy for companies or policy for government in terms of knowing what people are saying.

    What do people think? What do people think about their new policy? This will also drive potentially how the policy itself or potentially how that policy is managed and advertised.

    OK, so what will you actually do in applied NLP?

    Well, you would learn about some of the problems in NLP and the potential solutions. Because NLP's hard, let's say that, you know, in terms of perfect natural language processing, it's still a research area in terms of improving it.

    So you'll learn about some of those, why it's difficult and what we can do at the moment.

    A lot of the time you'll be deploying off the shelf NLP technology via the Python programming language, using libraries of methods and routines.

    You'll be working with pre-existing technology and applying it to realistic sized datasets to try to extract insights from. You will be, and this is why I think this is relevant to everybody here, learning the Python programming language and developing an appreciation of the challenges of NLP.

    What would I expect you to know already coming onto the module? I don't assume any knowledge of Python.

    I will teach you the Python you need to know to apply NLP. However, I'm going to say this, and I will say this again at the beginning of next term. We do expect, you know, over the first couple of weeks that you will pick up what you need in terms of basic Python programming to be able to use Python within your different modules to be able to actually then learn more about natural language processing itself.

    So while we're not assuming knowledge of Python, it wouldn't hurt for you to start building that experience with programming, particularly in Python before you come.

    So if you've got any time over the summer, one of the best things you could do - people always ask about, What can I read?

    I say you don't need to read anything but go do a basic Python programming course, and that will be really good preparation.
    We're not going to expect that you have necessarily done that, but it will really, definitely help you.

    It certainly doesn't assume any knowledge of machine learning. You won't be learning about machine learning as a module until the second term.

    But a lot of natural language processing methods do use machine learning.

    So we will be using machine learning, but we'll be using it very much in the kind of using it kind of way.

    Applying it. But we will obviously be explaining some of those machine learning techniques.

    So that would be your kind of first introduction, potentially, to machine learning techniques, which you then learn more about in more detail in the second term.

    It doesn't assume any knowledge of linguistics beyond basic familiarity with the English language, vocabulary and grammar, so most of my examples would be from the English language. We occasionally have some foreign language examples in there as well, depending what we're doing. But it generally just assumes you're familiar with the English language.

    OK. The other thing that I really therefore wanted to talk to you about today was Python, because we will be, on this module, applying NLP through Python programming.

    So I wanted to do a little bit of an introduction to Python to actually get you going even now before you are here.

    So. You're going to be starting to programme in Python.

    What is all this about, if you haven't done any programming at all before? Well, Python code, you can think of that as a set of instructions that you want the computer to execute.

    Often many times. This is why we programme: because basically we're lazy. We don't want to do things over and over again.

    We just want to hit run and have the computer do it over and over again, so it will be much quicker.

    We can do a lot more if we can code it using our programming knowledge that we're going to be building up and get the computer to do these things automatically.

    The first thing we need to think about when we're thinking about coding or programming is how do we interact with that code? Because we need to be able to edit it.

    We need to be able to change those instructions. We need to be able to give it inputs. If we want to run the code, what are we running it on? Are we running it on a set of documents that we want to process? Or are we running it on a set of web URLs that we want to process in some way? We also want to see the results.

    It may be that there's some analysis that we're doing on a set of documents, and we want to know how many of a set of tweets are positive about a certain product, and we want to therefore see the results.

    So what we use is what is referred to in computer science as an integrated development environment. Or abbreviated to IDE. And you will see this when people talk about, what IDE are you using?

    This is your environment that you use to help manage your code.

    What we will be using on this module and across, I think, all of the modules that do coding in this degree, will be something called Python notebooks.

    And we can access these. One of the nice things - There're many reasons, and we'll talk more about this probably next year, why we use notebooks - one of the advantages of a notebook environment, other than the fact that it allows us to do all of these things together in terms of editing, making notes, giving inputs and seeing the results, is that we can access our notebooks online via a service, a cloud service called Google CoLab or through dedicated software running on our own machines.

    And one which we tend to use and which is installed on our lab machines, and which, if you want to install it on your own machine at home, is one that we would recommend, which you can install for free, is Anaconda.

    Now what I've got for the rest of the session, we've got about 20 minutes left, is a notebook.

    And actually what I'm going to do, is I'm going to put that link into the chat. What you should be able to do is go to this link and you will actually be able to see this notebook and actually you can interact with it yourself and actually you can use this as a kind of first programming tutorial.

    And you can work through it after the session as well in your own time.

    So hopefully you can all access that link.

    I tested it out on my partner yesterday. He never does any Python programming, and he could access it on his own machine.

    So whatever machine you're working on, it should open up.

    And what you should see is something which looks a little bit like this.

    I think I've changed it in terms of I know I changed the title. It doesn't say NLE 2021 Lab 1 anymore.

    It says something like a taster, but in general, it will look something like this.

    And what you see is that there is a mix of different things in the notebook.

    There are what we call text cells and then there are what we call code cells.

    So and I've sort of highlighted them on this copy of the first bit of the notebook.

    So we can add text cells by clicking here and then we can just type text. Things that we want to write down.

    So this text here was just typed into a text cell.

    And then we have these code cells, which again, we can add by pressing plus code and we get something here, which is then a runnable code cell.

    So and here we've got the first line of Python programming code, which is the function call print.

    And what we're going to try to do is ask the computer to print this bit of text: Hello World. What happens is then we would run that code cell, to see what happens.

    And I can bring this up actually on my machine and do it live in a second.

    It's probably easier than actually looking at here. So wait a minute. Let's just switch.

    I've got it over here. So what we've got is this one here. And so when I'm here, there're different ways in which I can run a code. So you can actually just click the play cell there, but actually when you get used to this, you kind of go for this shift + enter, which is the kind of keyboard shortcut. It's so much easier than always finding the little play button at the side.

    But either of those will work. If I run that - actually, first of all I'm going to clear all my outputs from when I ran it earlier. When you load these things up, it kind of gives you your last outputs.

    OK, so if I click on that, what you see is the output from running that piece of code. And you can see that the computer thought about it a little bit and realised that in order to complete this instruction, it needs to put in the output here, this piece of text 'Hello World'.
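    As a minimal sketch of that first code cell (the exact text in the shared notebook may differ slightly), running the following in a code cell should show the text underneath it:

```python
# A single function call: print() writes its argument to the cell's output.
print("Hello World")
```

    In a notebook, pressing shift + enter with the cursor in the cell runs it and moves on to the next cell.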

    It's under edit clear all inputs. Oh, thank you. That's what I was looking for.

    Yeah, it's different between Google CoLab and between Anaconda. On Anaconda, it's on the kind of runtime menu. On Google CoLab it's here, yes. If I did clear all outputs, we can see all of those outputs are gone, and now we have to actually run them to be able to see what actually happens. And what you should have, if you've gone to that link, you probably need to make a copy of it in order to be able to edit it in any way.

    And therefore you would need to save a copy somewhere so that you can actually edit the file yourself.

    But yes, we can run these cells, and what we find is we get the output of the different cells.

    I can just flick back to my slides. I'll come back to that in a moment, but if I bring my slides back up.

    So yes, that's what we see. We've got our code cells, we've got our outputs.

    And then, yes, this is actually what's changed slightly. This is CoLab, but they must have changed it because it used to have 'restart runtime' there.

    And then you still need to clear the cells for what you would do. Slightly different on Anaconda; it's on something called the kernel menu.

    That's useful if you want to turn something off and on again and start again. Restarting the runtime is a useful thing to know how to find.

    OK. I've noticed there's a question coming in, I'm going to try to get to that question at the end of the session before we go back to the main session.

    With any other questions that come in like that.

    But I just want to kind of go through a little bit more on the Python programming as a kind of introductory session.

    OK, so that was our notebook functionality.

    The first thing that we learn about with our first introductory Python notebook, and again, this is something which you would do in that first week of term, but again, we would expect you to kind of make good progress with it.

    This is all things you're going to learn quite quickly.

    So the first thing that you're going to need to be thinking about are the different kinds of data types, and to know that any piece of data has an associated data type which tells the computer how to interpret the value. Because actually in the computer this is stored as what we call 'bits'. Ones and zeros.

    So it has a location in memory. There're ones and zeroes there, and we need to tell the computer how to interpret that.

    And so data has what we call a type. The basic data types that we deal with are integers, so these are whole numbers; floating point numbers / floats, which are decimal numbers; we have strings which get abbreviated to str, so it's something like 'my name is Julie' is a string; and then we also have this type called boolean, which are true and false, so this is just two possible values. And again, it's another kind of important basic data type that we have.
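    The four basic types just listed can be sketched as follows. The variable names and values here are made up for illustration and are not taken from the notebook; `type()` is Python's built-in way of asking which type a value has.

```python
# One example value for each basic data type (names are illustrative only).
whole_number = 42               # int: a whole number
decimal_number = 3.14           # float: a number with a decimal point
sentence = "my name is Julie"   # str: a string of characters
is_enrolled = True              # bool: either True or False

# type() reports how Python will interpret each value.
print(type(whole_number))
print(type(decimal_number))
print(type(sentence))
print(type(is_enrolled))
```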

    We then have to differentiate between variables and values.

    And this, again, is something which is an important distinction to make because quite often what we want to do is, we have a value and we want to remember it.

    We want to be able to use it later. So what do we do? We put it in a box, we put it in a variable.

    We store it there for later and then we can use it again so many times later.

    Each variable has to have a name. Some way to identify it, so we know where something is. It's basically a kind of basic filing system on the computer. Variable names can be absolutely anything you like.

    But it does really help if you name your variables in a way which helps you know what's in the box, what's in the variable, so it helps you to organise your code. So here I've got a variable called 'student name', and what I'm doing here is using the equals symbol to say, store the string 'Adam' in the variable called 'student name', and then that will be stored there for later.

    And then I can do operations on this variable that are appropriate to the thing in the box.

    So I might do string operations on it. What kind of operations might we want to do?

    Well, we might want to print 'Hello'- whatever that is stored in that 'student name'. If I go back to the piece of the code that I've got over here, I've got some other bits here going through the different types, which again, you can kind of have a look at in your own time.
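    A small sketch of storing a value and using it later. Python variable names cannot contain spaces, so the spoken 'student name' is written here as `student_name`; that spelling, and the trailing space inside "Hello ", are assumptions about how the notebook writes it.

```python
# Store the string "Adam" in a variable, then use it in a string operation.
student_name = "Adam"
greeting = "Hello " + student_name  # + joins two strings together
print(greeting)  # Hello Adam
```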

    Some basic operations. So here we're talking about the fact that we could join strings together using the + symbol.

    So we're using that within what I had on the slide as well.

    So here, we are joining together two string values 'Hello' and 'World' to make a single string.

    'Hello world'. We've got operations that we can do on integers and floats. Kind of fairly obvious mathematical ones.

    And again, you start to kind of see the kind of notation that we might use here.

    We use the * for times, and the / for divide.

    A kind of important one that you'll need to get used to seeing, we're not really talking about it now, is this double = to test if two things are the same.

    So that's telling me, yes, 5x4 is the same as 2x10.
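    The operations mentioned above might look like this in a code cell. This is a sketch rather than the notebook's exact contents. Note that `+` joins strings exactly as given, so the space has to be part of one of the strings.

```python
# Joining strings with +
joined = "Hello " + "World"
print(joined)           # Hello World

# Arithmetic on integers and floats
product = 5 * 4         # * means times
quotient = 20 / 4       # / means divide, and always gives a float
print(product)          # 20
print(quotient)         # 5.0

# Double = tests whether two values are the same
same = (5 * 4 == 2 * 10)
print(same)             # True
```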

    I've even got some exercises for you in there. The one that I was talking about here is the fact that we've got 'student name' there.
    You can print it out. I don't want to do all of that now.

    A bit that I wanted you to think about is, what do you think the output would be for this piece of code?

    So when we had 'print' and then we had 'Hello'+'Student name', what's going to happen there?

    And well, if you don't know, you can press run on that cell.

    But what you should see is 'Hello Adam'. And it's whatever we've stored in 'student name'. If I add in another cell here - let's move that up because it's before - and I change what student name is, so say actually student name is now saying Julie.

    Now, if I run that same piece of code, it will say 'Hello Julie'.

    I can change it again. Whatever is in the box is what I print 'Hello' to. So when we come back to my slides, here, we should see that when we've done that, we get the 'Hello Adam' because the student name was Adam at the time.
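    The 'whatever is in the box' idea can be sketched like this, again assuming the variable is spelled `student_name`. The same line of code gives a different result once the stored value changes.

```python
student_name = "Adam"
first_greeting = "Hello " + student_name
print(first_greeting)    # Hello Adam

# Put a new value in the same box, then run the same operation again.
student_name = "Julie"
second_greeting = "Hello " + student_name
print(second_greeting)   # Hello Julie
```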

    In the kind of first lecture we would have, and as you're learning Python, you're going to learn about a lot more complex data structures that we don't have time to talk all about in the next ten minutes.

    I was going to just do a very brief introduction to lists. Just as a kind of harder data structure that you'd be learning about very early on in the course, and this is how we group together our basic data types to make some kind of collection of data types.

    So within that notebook, if you keep working through it, it takes you through some basic list functionality.

    First of all, what is a list? It's an ordered collection of other data types, which can vary in length, and we use them a lot in natural language processing because we have a lot of sequences of textual objects: characters, words, sentences, paragraphs, pages, documents. There's always an order. And so a list makes sense. If I want a sentence, I can think of it as a list of words.

    If I want a document, I can think of it as a list of sentences, and if I want a whole document collection, well, that's a list of documents.
    And quite often, we want to do the same thing to everything in the collection, and that's where lists come in really useful.

    So here I've just got some code going through how we would set up a list. And see, lists are things which come in square brackets.
    So if I want the list of numbers 2, 3, 5, 7, 11, I separate them by commas with square brackets at the start and the end to say, this is a list. I'm then storing this list in a variable I've called 'Primes'. But remember, I've called it 'Primes' because I wanted to remember that they're prime numbers, but I could've called it anything. I could have called it 'monkeys', I could have called it 'cheese', I could have called it absolutely anything I wanted, but to help me with my programming, I've called it 'Primes' to help me remember that's what I'm storing in there. And we can store anything in a list. I've got numbers here, we can have strings, we could have a mixture of numbers and strings. That would all be fine in Python.
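    A sketch of setting up lists as described. The lowercase name `primes` follows the indexing examples later in the session; the `sentence` and `mixed` lists are made-up illustrations of a list of words and a list of mixed types.

```python
# A list is written with square brackets, with items separated by commas.
primes = [2, 3, 5, 7, 11]

# A sentence can be thought of as a list of words (illustrative example).
sentence = ["lists", "keep", "words", "in", "order"]

# A list can mix types, for example numbers and strings.
mixed = [2, "three", 5.0]

# len() tells us how many items a list holds.
print(len(primes))    # 5
print(len(sentence))  # 5
print(len(mixed))     # 3
```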

    It's important to note that the order in a list matters. We do have other data types where order doesn't matter, such as sets.

    You'll learn more about those next year. But here, a list which is 2, 3, 5, 7, 11 is different to a list which is 3, 7, 11, 5, 2.
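    To illustrate that order matters for lists but not for sets, here is a small sketch using the two orderings just mentioned. The variable names are illustrative.

```python
primes = [2, 3, 5, 7, 11]
shuffled = [3, 7, 11, 5, 2]

# Same items in a different order: as lists, these are not equal.
lists_equal = (primes == shuffled)
print(lists_equal)   # False

# Converted to sets, order is ignored, so they are equal.
sets_equal = (set(primes) == set(shuffled))
print(sets_equal)    # True
```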

    What kind of things might you do with lists? We're going to talk a little bit about indexing and iteration.

    So first of all, list indexing. We want to get an item out of a list at a particular position.

    So that is what we refer to as the list index. A thing to remember in computer science is that we always start at zero, right?
    So the first thing in your list is not in position one. It's in position zero.

    It's just one of those things that computer scientists do, they start counting at zero, so the first item in the list is index zero.

    So if I want the first item in my list of primes, I would say 'primes', and then to index into it I use the square brackets again, say primes[0].

    And if I run that piece of code in the notebook, I would get back 2. Because that is what zero is pointing at in my list. If I want one of the others, if I want the second item, I would say primes[1].

    If I want the third item, I say primes[2], etc. all the way until the end. And you can have a little play with that in the notebook.

    Also, some other interesting things we could do with list indexing; we could actually index from the other end of the list as well. So I might not know how long the list is, or I might not care how long the list is, I just want the last thing on the list, and I can just say primes[-1]. That gives me the last thing. I can count backwards through the list as well, so I can have the third-from-end thing, that's primes[-3].
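    The indexing described above can be tried out with the following sketch, which uses the same list of primes.

```python
primes = [2, 3, 5, 7, 11]

first = primes[0]           # counting starts at zero
second = primes[1]
last = primes[-1]           # negative indexes count from the end
third_from_end = primes[-3]

print(first)           # 2
print(second)          # 3
print(last)            # 11
print(third_from_end)  # 5
```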

    So that would give me five when I run that bit of code there. So that's indexing. My last thing I wanted to mention to hopefully not have blown your heads too much in terms of introducing you to Python coding, is something which starts to make coding become more useful.

    It's this idea of being able to iterate over a list. We take each thing in a list, do the same thing to it. So I don't have to give an instruction every time, I can just say to the computer, for everything that's in this list, do this operation.

    And that's what a 'for' loop, which is the most simple kind of list iteration there is, will do for you. Here what I'm saying is I have a list, and this would be any list variable I could put here, so this is the one that I want to iterate over. Do the same thing to everything in it.
    So what I'm saying is 'for prime in primes'. See this again could be called anything.

    'Item in primes' might be a good way of referring to that.

    I'm going to take each one of those items and then I'm going to go through this looping process where the first time, prime is the first thing, that's index 0.

    Then it's at index 1, then it's at index 2. I don't have to tell it to increase that index point.

    I can just tell Python I want to do the same thing for everything in the list.

    And the thing that I want to do goes in the body of the loop. What I want to do is print.

    I want to print that number out. I know it's a number.

    We don't actually. It might be anything. It could be another string. And then we're going to print 'is a prime' as well.

    It's a very, very simple iteration. What we can see is that these two lines of code allow me to, however long that list, this list could have hundreds of numbers in it, would print a line for every single item in the list, which starts with the item in the list and then 'is a prime'. As we do more programming you'll learn more interesting things that you might actually want to do to those items in the list.
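    The two-line loop described above might look like this. It is a sketch based on the description, and the loop variable name `prime` could be anything.

```python
primes = [2, 3, 5, 7, 11]

# For every item in the list, run the indented body once.
for prime in primes:
    print(prime, "is a prime")
```

    Running this prints one line per item, starting with "2 is a prime" and ending with "11 is a prime", however long the list is.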

    OK. That was my kind of mini introduction to Python and to applied natural language processing. I'll put a few words at the end of that mini introductory lecture there as well. You might want to sort of look at and think, did I pick up on that as a kind of keyword that might be useful in programming? And think about what they mean.

    If you've got any questions for me, we've got about three minutes before we have to go back and join the other session for the general Q&A.

    I can see there's one question on the Q&A. This is an interesting question coming in. It says what is the background of students that are normally on this stream and what type of roles would they go into?

    So typically we actually, again, we have a wide range of students.

    What we imagine is that most of the students coming into this will have a background in business or in humanities and social sciences.
    So that's the typical background.

    But actually, we do have students who actually convert the other way as well, who kind of actually have a background in computer science.

    But they're really interested in the human and social side of data science, and they want to work in essentially in business or in policy, and therefore they want to learn more about innovation and policy.

    And so they take this degree as well. Typically our students might go on and work, and again, we will talk to Emily, who's one of our alumni of this course shortly, and we've got another student coming in.

    I mean, typically you would imagine that you might go on to work in policy, in government, or in a company that's dealing with kind of human and social aspects. We had a number of students, they were probably a mixture of human and social data science and data science students, but that's partly because we didn't have many human and social data science students, I know whenever I talk to Brandwatch they're always really interested. They've massively supported this as a course that we deliver, saying that they find it easier to train up people in the technical aspects, but find it hard to train up people in the domain aspect.

    The fact that we are taking students into this degree who have more knowledge about the kind of social science aspects of data in the real world.

    And then we can guide their learning to include programming and more statistics on their course that they can carry on to do this at a company such as Brandwatch. So I know very much that, you know, Brandwatch are very keen to look at our graduates of this course.
    So I hope that answers that question. How much experience do you need for data science dealing with social sciences, e.g., psychology?
    It's hard to say how much experience. I think everybody comes in with different experience.

    You don't need a certain amount of psychology or a certain amount of business.

    It is the fact that people are coming from different backgrounds, but you may well find that your background and your experience may then influence which optional modules that you choose and also what project that you choose to do and who you choose to work on your project with.

    Because when you do your project, you want to obviously showcase your skills.

    And if you can do a project which brings together skills that you already had before you came into the university, and also the skills you've learnt on the course, that's where, you know, the best projects come from.

    So I'm not sure I've completely answered that question, but I've answered that question the best that I can. Question from Geraldine: how many people are typically on the course? This year we've had 24 or 25 students on the course, I believe, and we'd certainly be looking to maintain that kind of number next year, maybe a few more, depending who applies.

    But yeah, I would be thinking 20 to 30 students in the course.

    Any other questions, because I believe we should be rejoining? OK.

    Isobel's put the link back. So let's finish there.

    If you have got more questions, we can put all those into the general Q&A as well.

    So please keep hold of those questions and ask them in a moment. OK.

    See you in a moment.


Free public lectures

Sussex Universe is the free public lecture series from the Schools of Mathematical and Physical Sciences and Life Sciences at the University of Sussex.

The lectures cover a range of topics across the research carried out in both schools including astronomy, quantum technology, experimental and theoretical particle physics, mathematics, and materials science.
