
Sexual Orientation and Gender Identity (SOGI) Data Collection

Learning Objectives
1. Describe the utility of complete demographic data
2. Recall terminology and concepts related to SOGI data
3. Describe best-practice SOGI data collection methods
0.5 Credit CME

This video is an excerpt from the AMA Advancing Equity through Quality & Safety Peer Network session on Sexual Orientation and Gender Identity (SOGI) Data. This section provides an overview of important SOGI data topics, concepts, and best practices provided by Dr Carl Streed Jr.


Education from AMA Center for Health Equity
AMA’s online education to empower individuals and organizations, in health care and beyond, in advancing racial justice and equity.

Video Transcript

Carl Streed Jr., MD, MPH: I've been tasked with, in a short period of time, to try and cover the how, why, and what of SOGI data collection: sexual orientation, gender identity data collection.

Streed: I'm really going to do a bit of a whirlwind here, because I think a lot of the gray area around this is going to come out in the Q&A. But that being said, I want to try to provide the best evidence that we have for the time being to answer a few questions up front.

First things first in basic disclosures: As mentioned, I do hold a variety of leadership positions in organizations or associations that are germane to broadly LGBTQ health and SOGI data collection in particular, and I have grant funding specific to a lot of issues in LGBTQ health, as well as best practices in SOGI data collection, and also serve as a consultant, but not germane to this work.

Also, I want to start with a land acknowledgment. Normally I'd be coming to you from Boston, which is the ancestral home and current home actually of the Massachusett Tribe, but I'm actually currently coming to you from Albuquerque, which is also the homes of... the Pueblo and the Tigua.

I think it's always important that when we talk about broader social justice work—which I do think health in itself is part of social justice work—we need to really acknowledge the history of what has happened on our continent, because we have to first acknowledge it to then be able to begin to address it. And so I want to start there.

Some of the learning objectives I'm going to try and cover quickly. I will talk about the value and utility of actually doing this demographic data collection, and then I will talk about some key concepts briefly. Then I will talk about what has been demonstrated to be helpful in convincing folks around the utility of data collection. I will also talk about the implementation efforts around this.

We have penned a number of perspectives on SOGI data collection. The main thing I always want people to really acknowledge is that data is political and data reflects a set of priorities, and that when we do not have data on populations in particular, we are essentially saying there are no problems... specific to that issue.

This has been the issue—particularly around race/ethnicity, until more recently—where if we weren't collecting the data robustly, we couldn't identify any kind of disparities or inequities. This is what we're trying to translate into LGBTQI+ health, where we're trying to collect sexual orientation, gender identity, and intersex status as well, which I'll talk about more.

Collecting the data allows us to identify problems, then actually engage in action. This is really the crux of a lot of health disparities in health services research: We define the data, we define the problem to detect what's happening, better understand what's going on, and then actually develop interventions.

The National Institute on Minority Health and Health Disparities has created this great graphic to really understand the multilevel ways in which data around populations and communities actually can be broken down into a number of components to better understand what is happening.

Particularly in the work that I do that utilizes SOGI data collection, that utilizes better categories around sexual orientation and gender identity, I can understand the individual-level stressors, and I can understand some individual-level issues that may be happening, even at the biological level.

But I can also then focus on community-level factors, and then develop interventions that essentially intercalate at each of these levels: individual, interpersonal, community, and then higher up in society. I think it's important to think of data collection as really what undergirds all of our efforts to better serve individuals, communities, and broader populations.

That's the utility of doing it. We can't identify problems, essentially, without having complete demographic data.

I want to jump into some terminology; I've been bandying around a lot of terms already. You've heard, essentially, the alphabet soup, which some people describe as LGBTQ. Those terms have clear parts within history, including lesbian, gay, bi, trans.

"Transsexual" is a term we don't really use in the US or the English context. That said, transsexual, when translated, is actually the term preferred by South American trans individuals. Again, I think it's important to recognize that there are cultural differences in some of these terminologies.

"Queer," again, is a term that was previously derogatory and has been kind of reappropriated by communities, but it is a category around sexual orientation and gender identity that people may be using as well.

"Intersex" reflects somebody's sexual anatomy or a variety of characteristics as it relates to their sexual development. This is a more clinical diagnosis, but it can also be part of somebody's identity when they better understand what's going on.

"Ally" is what I'm hoping everybody here on the call aspires to be through behaviors and actions to really kind of support LGBTQ communities—and, again, SOGI data collection is one of those steps.

"Asexual" is one of the new "As" that people are beginning to more properly understand as we collect information around sexual orientation. Asexual individuals are people who do not particularly seek sexual intercourse.

"Pansexual" is kind of an additional term, which some people mistakenly put with bisexual, but pansexual kind of describes a broader sexual attraction to individuals.

Those are the terminologies, but I think it's more important to understand the concepts around gender identity, gender expression, sex assigned at birth, and physical attraction and emotional attraction that play into all these major components that we're talking about around SOGI data collection.

I think many folks on the call are probably familiar, but just to reiterate: "Gender identity" is the internal sense of self as it relates to man, woman, and additional gender categories specific to somebody's culture or society.

"Gender expression" is the set of social and cultural cues people use to actually express their gender identity. And, again, I really want to stress that these are derived from how somebody was raised, where they grew up, and the culture with which they identify. These can vary significantly across time and significantly across groups of people geographically as well.

I always like to use the example from The Daily Show with Trevor Noah, where Trevor Noah actually had a non-binary activist, Jacob Tobia, on and Jacob was trying to help Trevor Noah—essentially "trans-as-gender expression," trying to help him, essentially, be more feminine—and gives Trevor Noah an earring.

Trevor puts it on, and he's like, "This is great! You know what, my grandmother would actually be very proud because this is actually a symbol of masculine expression in my South African culture." I like that example to really highlight that we can't make assumptions around somebody's gender expression as it relates to their gender identity.

"Sex assigned at birth," I think, is what people are most familiar with in the medical context. This is what we seem to think we're capturing when we have the sex variable in our electronic health record system. This is typically a reflection of what a clinician said somebody's sex was based on their external genitalia at birth ("It's a boy," "It's a girl") and left it at that, when we know there's actually more complexity around this as it relates to differences in sex development or intersex characteristics, which also need to be taken into consideration.

As more laws evolve to better reflect that it's not just male/female at birth, that there are additional sex categories, this is something that the electronic health record system has to be prepared to absorb. This is actually work that we're doing in Massachusetts, where we now have laws to allow an X category for sex markers.

"Physical attraction," "emotional attraction": I don't have to get into too much depth here. These are really more of the components that define sexual orientation or sexual identity.

With regard to all those concepts and what we need to collect now—thanks to the National Institutes of Health for asking the National Academies to draft a report on best practices around SOGI data collection—these are the main categories that they are recommending that we look at:

"Sex assigned at birth," as I described. "Intersex characteristics" as a separate category to query.

An "anatomy inventory" allows us to best characterize somebody's physical body as separate from necessarily their gender identity, sexual orientation, or their sex assigned at birth, so we're not making assumptions around their anatomy that dictates clinical decision making.

This is actually germane to, for example, straight, cisgender individuals who may have had a hysterectomy that removed their cervix. That would be important for us to know so we stop querying them about doing cervical cancer screening and the like. So again, that is an additional layer that we're trying to add to electronic health records, and this was a strong recommendation from the National Academies.
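The anatomy inventory idea can be sketched in code. This is a minimal illustration under stated assumptions, not any EHR vendor's logic: the `AnatomyInventory` class, its single field, and the `cervical_screening_action` helper are hypothetical names, and real screening eligibility also depends on age, history, and clinical guidelines that are left out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnatomyInventory:
    """Hypothetical organ inventory; None means 'not yet documented'."""
    cervix_present: Optional[bool] = None


def cervical_screening_action(inventory: AnatomyInventory) -> str:
    """Choose a reminder action from documented anatomy, not from a sex marker.

    Returns one of three illustrative strings: "prompt", "skip", or "ask".
    """
    if inventory.cervix_present is None:
        # Nothing documented yet: ask the patient rather than assume.
        return "ask"
    return "prompt" if inventory.cervix_present else "skip"
```

The point of the sketch is the design choice: the decision reads from the inventory and never from the sex or gender fields, so a patient who has had a hysterectomy is not repeatedly queried.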

Again, they highlight gender identity and sexual orientation, but within all of this, they also want to make sure that electronic health records accurately record somebody's name. I want to highlight the difference between somebody's name and the legal name that may be on their documents, which may not reflect their actual name.

Same with pronouns: electronic health records are rife with automated messaging that utilizes pronouns based on assumptions from somebody's sex marker within the electronic health record. We need to actually collect somebody's correct pronouns and ensure that those get incorporated into all kinds of communication. That will improve not only general workflow, but it will also improve patient trust in the system for utilizing the correct terminology.
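As a rough sketch of what using collected pronouns in automated messaging could look like: the `Patient` class, the `PRONOUN_FORMS` table, and `reminder_message` below are hypothetical and do not describe any real EHR's messaging API. The assumption is that the name a patient uses and their pronouns are stored as their own fields, separate from the legal name and the sex marker.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from patient-reported pronouns to word forms.
PRONOUN_FORMS = {
    "he/him": {"possessive": "his"},
    "she/her": {"possessive": "her"},
    "they/them": {"possessive": "their"},
}


@dataclass
class Patient:
    legal_name: str                 # name on legal documents
    name: Optional[str] = None      # name the patient actually uses
    pronouns: Optional[str] = None  # as reported by the patient


def reminder_message(patient: Patient) -> str:
    """Build a message from collected fields, never from a sex marker."""
    display_name = patient.name or patient.legal_name
    forms = PRONOUN_FORMS.get(patient.pronouns)
    if forms is None:
        # Pronouns missing or not in the table: use pronoun-free wording.
        return f"{display_name} has an upcoming appointment."
    return f"{display_name} has confirmed {forms['possessive']} upcoming appointment."
```

When pronouns are missing, the sketch falls back to wording that needs no pronoun at all instead of guessing.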

I want to highlight here that people may have noticed I'm not saying "preferred name" or "preferred pronoun." I don't think those are the correct ways of describing it. "Carl" is not my preferred name, it is my name; "he/him" are not my preferred pronouns, they are the pronouns that I use. "Preferred" can be seen as an implicit way of essentially belittling somebody's name or pronouns.

That was a whirlwind around the concepts and highlighting that these concepts are high-level recommendations from the National Academies and the National Institutes of Health for appropriate data collection. I want to talk about some of the basics and the tools that are available to actually implement this within electronic health records.

First things first—there's so much evidence out there at this point, there are numerous articles, this is just a handful of the more recent ones that really highlight the benefits and the ease with which SOGI data collection can be implemented in a number of settings, and I will highlight a few key components of that— so first things first:

This is work done by the Fenway Institute, one of the leaders in LGBTQ health research, and particularly one of the go-to sources for a lot of folks to best understand how to implement SOGI data collection. They had done a survey in ambulatory care settings in the... late 2000s, early 2010s, across a number of geographic sites—so making sure they not only looked at New England, where they're based in Boston, but also looking at Alabama, looking at Chicago, looking at additional sites within the plains states and Denver, and then also looking at the West Coast to try and ensure that there's geographic diversity in these ambulatory care settings.

They essentially asked, "What did people think of SOGI data collection in general? Did they understand the question? Did they think it was important to the patient? Did the patient think the question was easy to answer?" and so forth.

I'm highlighting the "somewhat agree" or "strongly agree" responses to a lot of these questions to really highlight that, across the board, SOGI data collection questions are understood, patients recognize it's important for their health, that if they had any questions the clinician was able to answer them and explain why the SOGI data question was needed, and the questions were essentially very easily understood across a number of settings.

This same work was then recapitulated in an emergency care setting. This is a study funded by the Patient-Centered Outcomes Research Institute, PCORI. That was a multi-site study looking at Baltimore and Boston... and I apologize, I'm forgetting the third site. This was led by Dr. Haider, who is now dean of a medical school in Pakistan.

We were looking at: how do people feel about actually asking SOGI data questions—particularly sexual orientation—in emergency room settings and as it related to their particular health concern?

I'm highlighting a number of characteristics of respondents to highlight that, across the board, you have overwhelming majorities of folks who would not refuse to answer the question, who really understood the value of providing SOGI data collection.

Patients—regardless of education, age, sexual orientation, or even racial/ethnic background—were willing to answer these questions. I think that's often one of the first things that people say: "We're worried about people not wanting to answer this question."

Further work from the study went on to show that providers, clinicians, were actually more fearful of offending patients than patients were unwilling to answer the question. So, it's us as clinicians; we're actually one of the barriers to SOGI data collection, because we're afraid of offending patients. Patients are like, "Oh, no, we totally want to answer these questions. We understand the value of these questions, and we would answer these questions, so long as it's appropriate for our clinical care."

The same work has been done even earlier. I like to highlight this great graph by Dr. Fredriksen-Goldsen from the early 2000s/2010s, which has been recently redone using the Behavioral Risk Factor Surveillance System national data, and I want to highlight here differences across age.

Older generations across the board have started coming down in terms of their nonresponse or refusal to answer around sexual orientation. What we have found, what I've always found quite funny in this data, and what has been repeated multiple times, is that people are more likely to refuse answering questions around their income than they are around their sexual orientation or gender identity. Income actually seems to be one of the more sensitive topics across any survey or health surveillance system in terms of how comfortable people feel answering those questions and sharing that information. Sexual orientation: less controversial than asking somebody how much they make.

Now, the steps to actually ensuring that SOGI gets incorporated into electronic health records... there are multiple steps. This is not like a "flip a switch and done" kind of situation. You really have to create a team. This has to be a committee effort—multiple committees, most likely—that incorporates leadership from the top having buy-in, recognizing the value of SOGI data collection, and really trying to move mountains to make sure it happens.

Of course you have to have clinicians and clinical staff across the board, and I'm talking about everybody, from registrars all the way through to clinicians. The registrars, medical assistants, nurses, and so forth, really need to be part of this team, both to get everybody on board and get buy-in from all these groups, and to make sure that their perspectives are shared in terms of how to navigate SOGI data collection as it relates to their tasks.

And then, arguably, one of the most important parts: health information tech folks. Because our electronic health record systems are not monoliths—they communicate with a lot of other systems—we have to make sure that SOGI data collection can crosstalk with a number of components.

This is something that I, as a clinician working on implementation within our own health system, recognize: we have not only our own electronic health record, which happens to be Epic and has tools for collecting SOGI, but it also has to communicate with at least a dozen other systems: insurance, billing, pathology reports, radiology reports, making sure that it communicates correctly to patients, and so forth. Each of these is a different system that has to incorporate this SOGI data collection from the EHR itself.

You need to come up with a particular timeline, and you need to be realistic about this. Again, this is not something we expect to happen overnight. We have seen situations where that's been done: there was a health system within the boroughs of New York and suburban Connecticut that essentially said, "We flipped the switch and SOGI was immediately available."

That created some level of chaos, particularly on the registrar side of things. I don't recommend doing it that way. I think you have to have buy-in from the registrars first to make sure they understand that this information will be available and that they need to be prepared to collect that information.

Last but not least, and not least important: I want to make sure community is involved. Large healthcare systems, I think, would benefit from having community advisory boards. A lot of electronic health record systems have had community involvement in terms of how they craft their SOGI forms for their EHRs. I encourage that we do the same. We are pulling together a community advisory board for our own health system to ensure that our current SOGI categories make sense now as well as moving forward.

I want to highlight that these are requirements, this is not something we can pick and choose anymore. SOGI really needs to happen at this point. This is a great report from the United States Core Data for Interoperability. This is part of the [Office of the National Coordinator for Health Information Technology] really highlighting that SOGI data displays need to be understood not only on the clinical side, but on the patient side. And that this should actually be customized to reflect the community that you're providing care for.

I don't think the generic SOGI questionnaire for Boston, for example, makes a whole lot of sense out here in Albuquerque, where I'm at right now, because there's a large Indigenous population that has different terms as it relates to their sexual orientation and gender identity.

While the term "two-spirit," for example, has often been used, it's still seen as a little bit of a Western name or Western construct, when a number of Indigenous tribes that receive care within the healthcare systems here have their own terms. Therefore, the SOGI forms and templates should reflect that terminology correctly.

Again, I highlight the importance of making sure that these concepts and categories are able to talk to other components of the electronic health record system. I think most important, at least from my perspective as a clinician researcher, this has to make sense on the pathology side. So this needs to make sense when we're using certain reference ranges for lab results and when we're looking at particular types of pathology reports as it relates to cancers and such. We need to make sure that whatever we collect for SOGI makes sense to the pathologist as well.

Then I will highlight a graph in terms of how SOGI can be incorporated into different parts of the workflow for the patient, as well as for staff. And then again, highlighting staff training is really important.

This is a great workflow in terms of— it notes different points where SOGI could be collected. There isn't just the one spot, and that's it. It could be done at the home where patients are accessing an electronic health record portal. For example, Epic has MyChart, Fenway has their own EHR where the people were able to access the webpage and provide the information.

This, of course, is not easy for everyone. There are going to be generational and socioeconomic differences in terms of who feels comfortable or has access to the technology to do this. But, that being said, research has shown that whenever people can provide information about themselves in their own home and their own setting, and not have to do it in front of somebody looking right at them, that's where they're going to feel most comfortable.

When they actually arrive, for example, for an in-person clinical encounter, they can do it with the registrar. Registrars are trained, competent, and compassionate about making sure that information is collected. It is in a standard intake form, which indicates that everybody's going to be asked these questions. This isn't just because you happen to look like a gay man, therefore I'm going to ask you these questions. No, this is for everyone. And that needs to be really clearly done.

Further on, if somebody chooses not to answer those questions, or they answer those questions, the clinician can also verify if it's important for the clinical encounter, and then it gets entered into the electronic health record system.

The same pathway really exists for telehealth, where again, with the clinician—or depending on how people are setting up their telehealth encounters, whether they meet with a medical assistant first—people are able to offer that information in the clinical encounter, it gets recorded into the electronic health record system, and then downstream communication can happen.

I'm sharing a basic template, the template that we use in our electronic health record system. As you can see, it includes pronouns; we're getting people's gender identity, how they self-identify; we're getting their sex assigned at birth; we're getting their sexual orientation; and we can describe a number of components in regard to their sexual partners. That way, it's not just: Do you have sex with men, women, or both? It's actually a little bit more in depth than that. And then we have the organ inventory.
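The fields named in this template can be sketched as a simple record type. This is only an illustration of the categories listed in the talk, not the actual template shown on the slide: the class name `SogiRecord`, the field names, the free-text value types, and the `unanswered` helper are assumptions. Every field is optional because patients may choose not to answer.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class SogiRecord:
    """Hypothetical record mirroring the categories named in the talk."""
    pronouns: Optional[str] = None
    gender_identity: Optional[str] = None
    sex_assigned_at_birth: Optional[str] = None
    sexual_orientation: Optional[str] = None
    sexual_partners: List[str] = field(default_factory=list)
    organ_inventory: Dict[str, bool] = field(default_factory=dict)


def unanswered(record: SogiRecord) -> List[str]:
    """List fields with no answer yet, in declaration order."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        # None, an empty list, or an empty dict all count as unanswered.
        if value is None or value == [] or value == {}:
            missing.append(f.name)
    return missing
```

A helper like `unanswered` could support the workflow described earlier, where a clinician verifies or fills in items that were skipped at intake, without forcing any field to be mandatory.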

This is something that took many years to develop in terms of making sure this made sense for our community and making sure it made sense for our system. But I want to highlight that there are anticipated concerns, and I know this will probably come up more during the Q&A. People are worried about offending patients. I've highlighted that patients are actually more comfortable answering these questions than clinicians are comfortable asking them, so I think we need to get clinicians to be more comfortable and patients will follow.

Also, patients will see that this data collection is an effort to better serve them. I would describe it as... us trying to keep our promise of precision medicine. For us to be more accurate, more precise in the care that we provide patients, we have to have this additional information.

Further, this will improve communication with patients. As I mentioned, when we collect pronouns correctly, when we collect information about people's names correctly, we actually can then automate systems that will minimize potential friction. And I will tell you that systems, as they're currently built, create friction and distrust for patients. Every time I've had to correct pronouns within prior Epic templates, it was a step that I had to take to make sure that a patient wasn't offended or didn't distrust the system.

I highlighted that this is medically relevant. There are a number of health disparities across LGBTQ populations. I think that goes without saying at this point. You're here as a part of a meeting with the American Medical Association. This work is supported by, essentially, every respected major medical association and health professional organization across the US and internationally, and now has more guidance from the National Institutes of Health and the National Academies. So again, this is not something that... we can't say "this isn't supported."

I think what I hear the most concerns around are privacy and confidentiality. There's a number of steps here that have to be taken with regards to making sure that all staff are trained to be caring and understand the importance of confidentiality. This is the same for all of our protected health information. I don't treat this as anything more sensitive than other demographic information, honestly. That being said, that means I treat all that other information... to the highest level of confidential restrictions and such.

I sit on an institutional review board and we really take all demographic information very seriously and not least of which SOGI. That means making sure that there are certain restricted views around demographic information or data in general, particularly on the backend for when researchers are trying to understand what's happening at the population level, you don't just give them all the demographic information upfront. They have to have a research question that makes sense for that information.
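The idea of restricted views can be sketched as a field-level filter keyed by role. The roles, field names, and the `restricted_view` function below are hypothetical and chosen only for illustration; a real system would rely on the access controls of its EHR and its review board's procedures. The `approved_fields` argument stands in for fields a review board has approved for a specific research question.

```python
from typing import Any, Dict, Iterable, Set

# Hypothetical mapping from role to the fields that role may see by default.
VISIBLE_FIELDS: Dict[str, Set[str]] = {
    "clinician": {"name", "pronouns", "gender_identity", "sexual_orientation"},
    "billing": {"name", "insurance_id"},
    "researcher": set(),  # nothing by default; access is granted per protocol
}


def restricted_view(
    record: Dict[str, Any],
    role: str,
    approved_fields: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return only the fields this role may see.

    Unknown roles see nothing. `approved_fields` adds fields that were
    explicitly approved, for example for one research question.
    """
    allowed = VISIBLE_FIELDS.get(role, set()) | set(approved_fields)
    return {key: value for key, value in record.items() if key in allowed}
```

The default for researchers is an empty view, which mirrors the point that demographic fields are not handed over up front and are released only when the research question justifies them.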

But also on the clinical side, thinking about who can go into the chart and see what information is there. I like to bring up the example of pediatric care. Pediatric patients can, ideally, as they grow, become more autonomous, take more control over their own health information, and enter this information into their chart.

That being said, oftentimes, parents or guardians have the same access or view of their child's chart, so we have to think about ways of maybe protecting that information upfront. For example, minors are able to access STI testing and certain reproductive health care services without parent or guardian approval. We have to make sure that the chart reflects that protection as well.

Lastly, with regard to HIPAA—and again, I think of this more on the research side, as well as the clinical side—we have to incorporate this into our definition of Protected Health Information as well. Right now, SOGI is really not... technically defined as a Protected Health Information category. But for those of us in research, and myself on the research ethics side of this, on a review board, we do treat SOGI as that, and therefore people have to have a justified reason for accessing that individual-level information.

There have been numerous reports out there—this is another report from the National Academies that highlights the value of collecting this data, why we need to be able to do it, how we can do it. When we do this, we can actually improve a variety of measures, not only around SOGI, but also all the downstream health components, and particularly around social determinants of health, which we are beginning to better understand for sexual- and gender-minority individuals.

This will fill in some of the data gaps we've had. I highlighted some of the research from the Behavioral Risk Factor Surveillance System, which I think is a great source of data for broadly LGBTQ populations. That said, it only goes back so many years, and currently, not every state is asking about sexual orientation and gender identity. We don't actually have a fully national sample of LGBTQ folks because this data is not being collected just yet.

Finally, we're trying to facilitate the comfort and ease of collecting this data. This is the purpose of this meeting... it's for me and other folks in the field to answer questions for you.

Lastly, like I said at the very beginning, when we have the data, we can identify the problems and then actually develop programming and interventions. Some examples I like to bring up that were hyperlocal were work done by Dr. Phoenix Matthews in Chicago, who developed, for example, smoking intervention and smoking cessation programs for lesbian and bi women in Chicago.

That work showed that when you actually incorporated somebody's identity and understood triggers as it relates to their identity, you could actually then intervene and reduce the disparity between straight and lesbian/bi smoking rates.

That's what we're hoping a lot of this data will allow us to do: Identify the problem and create a tailored intervention. There are many more resources out there. I'm actually a faculty member for the Fenway Institute—I want to highlight that. But that being said, even before I joined the faculty this year, Fenway has always been one of the go-to's around sexual orientation and gender identity data collection.

They recently updated their guide for the data collection implementation, as well as just the basics of some of the tools. And this just got updated a few months ago. So this is where I think people should really be looking for a lot of resources.

As I mentioned, I've talked a lot, I want to make sure we have time for questions, and I will stop there.

Speaker 2: And just want to thank you for that really wonderful talk. We've been thinking a lot about this approach at MD Anderson, and one of the things that we've been wrestling with is the concept of this being sensitive information that only a physician or nurse practitioner or someone part of the core clinical team can talk to the patient about.

And certainly, when patients come to our institution, there are opportunities for them to interact with patient access specialists upfront that could collect the data. And even though those people aren't directly part of their care team, they are collecting a lot of the demographic things that we would consider PHI and protected information from a HIPAA perspective.

I'm wondering what your thoughts are on who the folks are that are asking the questions, and whether or not that's come up, as you thought about rolling this out at your institution?

Streed: Great question. We've made a point of actually having a number of representatives from registrars across different disciplines to make sure that they are providing input for how they would think this needs to be rolled out on their end. Because, as I mentioned, for the Fenway example, which has been implemented across dozens of health centers, they have a standard form. And what we're trying to get to at this point is making sure that this information is part of the standard intake process.

The registrars and the patient navigators, as you describe them, should be comfortable and should be able to incorporate this information into their interactions with the patient. People come in, more often, not by themselves, and we should be able to acknowledge who's with them.

Too often I've heard people say, "Oh, is this your brother? Or is this your sister?" and making assumptions about who's there with them rather than, based on information we already have, say, "Oh, so glad that your spouse could join you today to make sure we have a full conversation around your health care goals."

Again, this really gets to the need for staff training as part of this implementation. This is not just a data rollout. This is a system-wide effort to improve the quality of care that we're providing from, I would argue, even before somebody walks in the clinic, but especially from when they walk in the clinic to when they actually have their clinical encounter.

Speaker 2: Got it. Thank you.

Video Information

CME Disclosure Statement: Unless noted, all individuals in control of content reported no relevant financial relationships.

If applicable, all relevant financial relationships have been mitigated.

AMA CME Accreditation Information

Credit Designation Statement: The American Medical Association designates this Enduring Material activity for a maximum of 0.50 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

Successful completion of this CME activity, which includes participation in the evaluation component, enables the participant to earn up to:

  • 0.50 Medical Knowledge MOC points in the American Board of Internal Medicine's (ABIM) Maintenance of Certification (MOC) program;
  • 0.50 Self-Assessment points in the American Board of Otolaryngology – Head and Neck Surgery’s (ABOHNS) Continuing Certification program;
  • 0.50 MOC points in the American Board of Pediatrics’ (ABP) Maintenance of Certification (MOC) program;
  • 0.50 Lifelong Learning points in the American Board of Pathology’s (ABPath) Continuing Certification program; and
  • 0.50 credit toward the CME [and Self-Assessment requirements] of the American Board of Surgery’s Continuous Certification program

It is the CME activity provider's responsibility to submit participant completion information to ACCME for the purpose of granting MOC credit.

